This document explains how the AI Assist custom widget integrates with locally deployed Large Language Models (LLMs) using Ollama. It also highlights the advantages of using Ollama as an on-premises AI solution compared to cloud providers such as OpenAI and Gemini.
For a detailed guide on how to install local LLMs with Ollama, click here.
Click here for step-by-step instructions to configure the widget with OpenAI and Gemini providers.
To enable Ollama, set the enabled flag to true in the widget's config.json (a minimal example is shown below). To switch providers manually, select the provider from the Using drop-down in the widget.
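The exact structure of config.json is not reproduced here; the snippet below is only a minimal sketch of what the Ollama entry might look like. Keys other than enabled (such as baseUrl and model) are assumptions for illustration.

```json
{
  "ollama": {
    "enabled": true,
    "baseUrl": "http://localhost:11434",
    "model": "llama3"
  }
}
```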

The widget communicates with Ollama through its HTTP API endpoints. For example, the endpoint for generating AI responses is /api/generate.
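For reference, the sketch below shows a minimal non-streaming call to Ollama's /api/generate endpoint. The model name and prompt are placeholders, and the widget's actual request payload and error handling may differ.

```typescript
// Minimal sketch: non-streaming request to a local Ollama server.
// The model name and prompt are placeholders; the widget's actual
// payload and error handling may differ.
async function generate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",  // any locally pulled model
      prompt,           // the text to analyze or answer
      stream: false     // return a single JSON response
    })
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text
}
```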
- Text Analysis: Analyzes the request to identify key issues, priority levels, requester sentiment, and other insights that assist technicians.
- Request Resolution Planning: Generates step-by-step action plans to resolve requests based on the request description and context.
- General AI Queries: Allows users to ask free-form questions about any aspect of a request.
- Context-aware Responses: Draws on relevant request details and history to provide contextual responses.
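These features rely on prompts sent to the local model. As a purely illustrative sketch (the widget's actual prompt construction is not documented here, and the field names are hypothetical), context-aware behavior can be achieved by folding request details and history into the prompt text:

```typescript
// Purely illustrative: one way request context could be folded into the
// prompt sent to /api/generate. Field names are hypothetical.
interface RequestContext {
  subject: string;
  description: string;
  history: string[]; // prior conversations or notes on the request
}

function buildPrompt(ctx: RequestContext, question: string): string {
  return [
    `Request subject: ${ctx.subject}`,
    `Description: ${ctx.description}`,
    `History:\n${ctx.history.join("\n")}`,
    `Question: ${question}`
  ].join("\n\n");
}
```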
- All data stays within your infrastructure: Sensitive data never leaves the organization's network.
- No data sharing with external cloud providers: Eliminates the risk of data exposure to third-party cloud services.
- Complete control over data handling and retention: Define and enforce your own data retention and handling policies.
- Compliance with data protection regulations: Easier compliance with GDPR and HIPAA by keeping data in-house.
- No per-token or per-request pricing: Provides a fixed infrastructure cost instead of usage-based billing.
- Predictable infrastructure costs: Gives a clear understanding of expenses based on your hardware investment.
- Unlimited API calls within your hardware capabilities: Make as many API requests as your hardware can handle, without additional charges.
- Full control over model selection and versions: Switch between any compatible open-source models.
- Customizable response parameters: Adjust token limits and other parameters for optimal outputs (see the sketch after this list).
- No vendor lock-in: Freedom to switch between different models and architectures.
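As an illustration of parameter tuning, Ollama's generate API accepts an options object alongside the prompt. The values below are examples only, not recommended settings for the widget.

```typescript
// Illustrative only: tuning response parameters via Ollama's "options" field.
// These values are examples, not recommended settings for the widget.
const body = {
  model: "llama3",    // any locally pulled model
  prompt: "Draft a resolution plan for the request described below...",
  stream: false,
  options: {
    temperature: 0.2, // lower values give more deterministic output
    num_predict: 512, // cap on the number of generated tokens
    top_p: 0.9        // nucleus sampling threshold
  }
};
```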
- No internet dependency for inference: Allows you to continue operations even during internet outages.
- Reliable performance without API rate limits: Consistent response times without external API throttling.
- Works in air-gapped environments: Ideal for high-security environments with no internet access.
- Monitor system resources: Track CPU, GPU, and memory usage to ensure optimal performance.
- Scale hardware based on usage patterns: Upgrade the infrastructure based on actual usage metrics.
- Choose appropriate models for your use case: Select models that best fit your specific requirements.
- Regular model updates and maintenance: Ensure that the AI models are regularly updated with the latest versions and security patches.