Installing and Running Llama and Gemma Models Using Ollama

Mehmet Ozkaya

We’re going to explore how to install and run Llama and Gemma language models using Ollama. Ollama is a powerful platform that allows you to run large language models (LLMs) directly on your local machine.

Get Udemy Course with limited discounted coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB

This means you can leverage advanced AI capabilities without relying on cloud services, ensuring privacy, low latency, and full control over your data.

What is Ollama?

Ollama is a platform designed to enable users to run LLMs and Small Language Models (SLMs) locally. Unlike cloud-based solutions, Ollama ensures that all data processing happens on your machine, enhancing privacy and security. This is particularly beneficial for developers and businesses that require offline AI capabilities with minimal latency and greater control over their AI models.

Key Features of Ollama

  • Local Execution: Run LLMs and SLMs directly on your device.
  • Pre-built Models: Access optimized models for coding, chat, creative tasks, and more.
  • Privacy-First: Keep all data on your machine, protecting sensitive information.
  • Customization: Fine-tune and adapt models to meet specific needs.
  • Low Latency: Experience quick responses without network dependency.

Available Models on Ollama

Ollama offers a variety of models tailored for different use cases:

  • Chat Models: Models like Llama, Gemma, Qwen, Phi, and Mistral excel at human-like conversations.
  • Code Generation Models: Assistants specialized in code generation, such as CodeLlama.
  • Creative Models: Models capable of text-to-image generation, story creation, and poetry.
  • Domain-Specific Models: Specialized models for industries like finance and healthcare, such as MedLlama.
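
Models are pulled by name before first use. The commands below are only a sketch — the tags shown (such as codellama or gemma:2b) are examples, so check each model’s page in the Ollama library for the current names and size options:

ollama pull llama3.1     # Chat model
ollama pull codellama    # Code generation model
ollama pull gemma:2b     # Small chat model (2 billion parameters)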

Installing Ollama

Let’s walk through the process of installing Ollama and running the Llama and Gemma models on your local machine.

Step 1: Visit the Ollama Website

Navigate to the Ollama website (ollama.com) to explore the platform and its offerings.

Step 2: Download Ollama

  1. Go to the Download Section: On the Ollama website, find the download link suitable for your operating system.
  2. Choose Your OS Version: Select the version for Windows, macOS, or Linux.
  3. Follow Installation Guide: Carefully follow the installation instructions provided for your specific platform.
  4. Check System Requirements: Ensure your device meets the necessary hardware requirements, especially in terms of memory and storage.
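
On Linux, the download page also offers a one-line install script. The command below reflects the script published on the Ollama site at the time of writing; confirm the current command on the download page before running it:

curl -fsSL https://ollama.com/install.sh | sh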

Step 3: Verify Installation

After installing Ollama, you can verify the installation by running the following commands in your terminal or command prompt:

ollama -h       # Display help information
ollama -v       # Show the installed Ollama version
ollama list     # List models downloaded to your machine
ollama ps       # Show models currently running
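
Ollama also runs a local HTTP server, by default on port 11434. As a quick sanity check (assuming the default configuration), you can confirm the server is up:

curl http://localhost:11434     # Should respond that Ollama is running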

Running the Llama Model Locally

Now that Ollama is installed, let’s run the Llama model.

Step 1: Open Terminal or Command Prompt

Open your command line interface on your computer.

Step 2: Run the Llama Model

Execute the following command:

ollama run llama3.1

Note: Replace llama3.1 with the specific version of the Llama model you wish to use.

This command downloads the model weights (on first run) and loads the model into memory, so the initial startup may take some time; subsequent runs start much faster.
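
You can also pass a prompt directly on the command line for a one-shot response instead of starting an interactive session; the prompt below is only an example:

ollama run llama3.1 "Summarize the benefits of running LLMs locally in two sentences."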

Step 3: Interact with the Model

Once the model is running, you can start interacting with it. For example:

> Why is the sky blue?

The model will generate a response directly on your local machine. You’ll notice the quick response time due to local processing.
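
Besides the interactive prompt, the same model can be called over Ollama’s local REST API, which is handy for scripting. A minimal sketch, assuming the default port 11434 and that llama3.1 has already been pulled:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'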

Running the Gemma Model Locally

Similarly, you can run the Gemma model.

Step 1: Check Available Gemma Models

Visit the Gemma model page to see the available versions. Gemma models come in different sizes, such as 2B (2 billion parameters) or 7B (7 billion parameters).

Step 2: Run the Gemma Model

For a smaller model that uses less memory, you can run:

ollama run gemma:2b

This command specifies the 2B parameter version of the Gemma model, which is suitable for machines with limited resources.
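
If your machine has more memory, you can try a larger variant, and you can inspect a downloaded model’s details (parameter count, quantization, context length) with ollama show. The tags below are examples; check the Gemma page on Ollama for the versions currently published:

ollama run gemma:7b      # Larger 7B-parameter variant
ollama show gemma:2b     # Inspect model details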

Step 3: Interact with Gemma

Ask the model a question:

> How does photosynthesis work?

Again, the response is generated locally, ensuring privacy and quick interaction.
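
The chat-style REST endpoint works the same way for Gemma. A minimal sketch, again assuming the default port 11434:

curl http://localhost:11434/api/chat -d '{
  "model": "gemma:2b",
  "messages": [{"role": "user", "content": "How does photosynthesis work?"}],
  "stream": false
}'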

Monitoring Models with Ollama

You can monitor the models running on your machine:

  • List Downloaded Models:
ollama list
  • Check Running Models:
ollama ps

These commands help you manage and switch between different models as needed.
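
When you are finished with a model, you can free resources. ollama rm deletes the downloaded weights from disk, and ollama stop (available in recent Ollama versions) unloads a running model from memory:

ollama stop llama3.1    # Unload a running model from memory
ollama rm gemma:2b      # Delete a downloaded model from disk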

Benefits of Running Models Locally with Ollama

  • Privacy: All data and interactions stay on your machine.
  • Control: Fine-tune models to suit specific requirements without external dependencies.
  • Performance: Reduced latency as there’s no need to communicate with a remote server.
  • Cost-Effective: No recurring cloud service fees.

Conclusion

By installing and running Llama and Gemma models using Ollama, you gain the advantage of powerful AI capabilities right at your fingertips. This setup is ideal for those who prioritize data privacy, require offline access, or need quick response times without relying on internet connectivity.

Get Udemy Course with limited discounted coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB

EShop Support App with AI-Powered LLM Capabilities

You’ll get hands-on experience designing a complete EShop Customer Support application, integrating LLM capabilities such as Summarization, Q&A, Classification, Sentiment Analysis, Embedding-based Semantic Search, and Code Generation into enterprise applications.


Mehmet Ozkaya

Software Architect | Udemy Instructor | AWS Community Builder | Cloud-Native and Serverless Event-driven Microservices https://github.com/mehmetozkaya