This document explains how the AI Assist custom widget integrates with locally deployed Large Language Models (LLMs) using Ollama. It also highlights the advantages of using Ollama as an on-premises AI solution compared to cloud providers such as OpenAI and Gemini.
For a detailed guide on how to install local LLMs with Ollama, click here.
Click here for step-by-step instructions to configure the widget with OpenAI and Gemini providers.
To enable Ollama, set the enabled flag to true in the widget's config.json (a minimal example is shown below). To switch providers manually, select the provider from the Using drop-down in the widget.
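The exact structure of config.json is not reproduced here; the snippet below is only a minimal sketch of what the Ollama entry might look like. Keys other than enabled (such as baseUrl and model) are assumptions for illustration.

```json
{
  "ollama": {
    "enabled": true,
    "baseUrl": "http://localhost:11434",
    "model": "llama3"
  }
}
```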

The widget communicates with Ollama through its HTTP API endpoints. For example, the endpoint for generating AI responses is /api/generate.
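For reference, the sketch below shows a minimal non-streaming call to Ollama's /api/generate endpoint. The model name and prompt are placeholders, and the widget's actual request payload and error handling may differ.

```typescript
// Minimal sketch: non-streaming request to a local Ollama server.
// The model name and prompt are placeholders; the widget's actual
// payload and error handling may differ.
async function generate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",  // any locally pulled model
      prompt,           // the text to analyze or answer
      stream: false     // return a single JSON response
    })
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text
}
```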
- Text Analysis: Analyzes the request to identify key issues, priority levels, requester sentiment, and other insights that assist technicians.
- Request Resolution Planning: Generates step-by-step action plans to resolve requests based on the request description and context.
- General AI Queries: Allows users to ask free-form questions about any aspect of a request.
- Context-aware Responses: Draws on relevant request details and history to provide contextual responses.
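These features rely on prompts sent to the local model. As a purely illustrative sketch (the widget's actual prompt construction is not documented here, and the field names are hypothetical), context-aware behavior can be achieved by folding request details and history into the prompt text:

```typescript
// Purely illustrative: one way request context could be folded into the
// prompt sent to /api/generate. Field names are hypothetical.
interface RequestContext {
  subject: string;
  description: string;
  history: string[]; // prior conversations or notes on the request
}

function buildPrompt(ctx: RequestContext, question: string): string {
  return [
    `Request subject: ${ctx.subject}`,
    `Description: ${ctx.description}`,
    `History:\n${ctx.history.join("\n")}`,
    `Question: ${question}`
  ].join("\n\n");
}
```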
- All data stays within your infrastructure: Sensitive data never leaves the organization's network.
- No data sharing with external cloud providers: Eliminates the risk of data exposure to third-party cloud services.
- Complete control over data handling and retention: Define and enforce your own data retention and handling policies.
- Compliance with data protection regulations: Easier compliance with GDPR and HIPAA by keeping data in-house.
- No per-token or per-request pricing: Provides a fixed infrastructure cost instead of usage-based billing.
- Predictable infrastructure costs: Gives a clear understanding of expenses based on your hardware investment.
- Unlimited API calls within your hardware capabilities: Make as many API requests as your hardware can handle, without additional charges.
- Full control over model selection and versions: Switch between any compatible open-source models.
- Customizable response parameters: Adjust token limits and other parameters for optimal outputs (see the sketch after this list).
- No vendor lock-in: Freedom to switch between different models and architectures.
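As an illustration of parameter tuning, Ollama's generate API accepts an options object alongside the prompt. The values below are examples only, not recommended settings for the widget.

```typescript
// Illustrative only: tuning response parameters via Ollama's "options" field.
// These values are examples, not recommended settings for the widget.
const body = {
  model: "llama3",    // any locally pulled model
  prompt: "Draft a resolution plan for the request described below...",
  stream: false,
  options: {
    temperature: 0.2, // lower values give more deterministic output
    num_predict: 512, // cap on the number of generated tokens
    top_p: 0.9        // nucleus sampling threshold
  }
};
```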
- No internet dependency for inference: Allows you to continue operations even during internet outages.
- Reliable performance without API rate limits: Consistent response times without external API throttling.
- Works in air-gapped environments: Ideal for high-security environments with no internet access.
- Monitor system resources: Track CPU, GPU, and memory usage to ensure optimal performance.
- Scale hardware based on usage patterns: Upgrade the infrastructure based on actual usage metrics.
- Choose appropriate models for your use case: Select models that best fit your specific requirements.
- Regular model updates and maintenance: Ensure that the AI models are regularly updated with the latest versions and security patches.