Setting up Ollama for Local LLMs

System Requirements 

- At least 8GB RAM (16GB recommended for better performance).

- 4GB free disk space for the base installation.

- Additional disk space for models (Mistral typically requires 4GB to 5GB).
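
If you are not sure whether your machine meets these requirements, you can check quickly from a terminal (the commands below are for Linux; macOS users can check About This Mac, and Windows users can use Task Manager):

    free -h    # shows total and available RAM (Linux)
    df -h      # shows free disk space per drive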

Installation Process 

For Windows Users:

1. Download the Ollama installer from https://ollama.ai/download/windows 

2. Run the downloaded .exe installer and follow the installation wizard.
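
Once the wizard finishes, you can confirm that the command-line tool is available by opening PowerShell and running the following (it should print the installed version):

    ollama --version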

For macOS Users:

1. Download the Ollama .dmg file from https://ollama.ai/download/mac

2. Open the downloaded .dmg file and drag Ollama to your Applications folder.

3. Set up Ollama:

   - Open Ollama from the Applications folder.

   - Grant the necessary permissions.

   - The Ollama icon now appears in your menu bar.

For Linux Users:

1. Install Ollama using the official install script.

    curl -fsSL https://ollama.ai/install.sh | sh 

2. Start the Ollama service (the install script usually sets it up and starts it automatically).

    sudo systemctl start ollama
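
To confirm that the service is running and the CLI is installed, you can run the following (assuming a systemd-based distribution):

    systemctl status ollama
    ollama --version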

Running the Mistral Model 

After you install Ollama, follow these steps to download and run the Mistral model:

1. Open your terminal (Command Prompt or PowerShell for Windows, Terminal for macOS/Linux).

2. Pull and run the Mistral model.

    ollama run mistral 

The model files (approximately 4GB to 5GB) will now download.

The download time depends on your internet connection speed.

3. Test the model with a simple prompt to verify the installation.

    >>> Hello, can you introduce yourself?

   A sample response appears, confirming that the model is working.
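
To leave the interactive session, type /bye. You can also send a single prompt without opening a session and list the models you have downloaded, for example:

    ollama run mistral "Explain what a context window is in one sentence."

    ollama list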

 

Context Window Size in Mistral 

The context window size in language models like Mistral determines how much text the model can process and remember during a conversation or task. 

Think of it as the model's working memory: the amount of previous conversation it can draw on when generating a response.

You can modify the context window size from inside an interactive session by setting the num_ctx parameter.

 ollama run mistral
 >>> /set parameter num_ctx 4096
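
If you want a larger context window to persist across sessions, one option is to create a custom model from a Modelfile that sets the parameter. A minimal sketch (the model name mistral-4k is just an example):

 # Modelfile
 FROM mistral
 PARAMETER num_ctx 4096

Create and run the custom model:

 ollama create mistral-4k -f Modelfile
 ollama run mistral-4k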

Limitation 

Larger context window sizes come with increased computational costs: they require more system memory and slow down the model's response time.

On resource-constrained systems like laptops or older computers, you might want to reduce the context size to improve performance.

Commonly Used Context Window Sizes 

2048 tokens: Suitable for simple conversations and basic tasks. A good fit for systems with limited RAM or when fast responses matter most.

4096 tokens: A balanced option for most use cases, providing good context retention while maintaining reasonable performance.

8192 tokens: Ideal for complex tasks requiring extensive context, such as document analysis or technical discussions. Requires more system resources.

When you choose a context window size, consider both your hardware capabilities and use case requirements. Monitor your system's memory usage and model performance to find the optimal balance for your specific needs.
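
In recent Ollama versions, you can check how much memory a loaded model is using with the command below (availability depends on your Ollama version):

 ollama ps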

Environment Variables 

Ollama supports several environment variables that allow you to customize its behaviour. Two important variables are OLLAMA_HOST and OLLAMA_MODELS.

OLLAMA_HOST 

The OLLAMA_HOST variable defines the address and port on which the Ollama API listens for incoming connections.

 export OLLAMA_HOST=0.0.0.0:11434  

 (the default port is 11434)

This setting is crucial when you want to access Ollama from other computers on your network or when you need to run multiple Ollama instances on different ports. 

The default value of OLLAMA_HOST (127.0.0.1) allows connections only from your local machine. However, setting it to 0.0.0.0 allows connections from any network interface.  

This is useful in development environments when accessing the API from other devices, or when running Ollama in a containerized environment.
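
Once Ollama is listening on all interfaces, other machines on your network can reach the API over HTTP. A minimal sketch using the generate endpoint, assuming the server's IP address is 192.168.1.50 (replace it with your machine's address):

 curl http://192.168.1.50:11434/api/generate -d '{
   "model": "mistral",
   "prompt": "Why is the sky blue?"
 }'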

OLLAMA_MODELS 

 export OLLAMA_MODELS=/path/to/models  

This setting is crucial when you want to store models in a location other than the default one. 

It is useful when moving models from the local drive to a larger drive, sharing them across different Ollama installations, or keeping them in a specific location for backup or compliance purposes.
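
To make the change permanent on Linux or macOS, you can add the export line to your shell profile (a sketch using an example path; adjust it to your setup):

 # in ~/.bashrc or ~/.zshrc
 export OLLAMA_MODELS=/mnt/storage/ollama/models

Note that if Ollama runs as a systemd service on Linux, the variable must be set for the service (for example with systemctl edit ollama) rather than in your shell.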

Troubleshooting 

Here are the common issues and their solutions.

1. "Command not found" error:

   - Ensure Ollama is properly installed.

   - Verify that the PATH environment variable includes the Ollama installation directory (see the quick checks after this list).

   - Restart the terminal.

2. Model download fails:

   - Check your internet connection.

   - Verify you have enough disk space.

   - Try running the ollama pull mistral command to retry the download.

3. High RAM usage:

   - Reduce the context size by lowering the num_ctx parameter (see Context Window Size in Mistral).

   - Close other resource-intensive applications.

   - Consider using a lighter model variant.
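
A few quick checks for the issues above (macOS/Linux commands shown; on Windows, run ollama --version in PowerShell):

    which ollama       # confirms the binary is on your PATH
    ollama --version   # confirms the installation works
    df -h              # shows free disk space before pulling a model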

 Getting Help 

- Visit the official documentation: https://ollama.ai/docs

- Check the GitHub repository: https://github.com/ollama/ollama

- Join the Discord community for support

 Best Practices 

 Resource Management 

   - Monitor system resources while running models.

   - Close the model when not in use to free up memory.

   - Use appropriate context window sizes for your hardware.

 Security Considerations and Performance Optimization 

   - Keep Ollama updated to the latest version.

   - Use GPU acceleration if available.

   - Consider using quantized models for better performance.
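
For example, Mistral is published in several quantizations on the Ollama model library. The tag below is only illustrative; check the model's page for the tags that are currently available:

    ollama run mistral:7b-instruct-q4_0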