Designing EShop Support with LLMs, Vector Databases, and Semantic Search
We’re going to explore how to design a cutting-edge customer support system for an e-commerce platform by leveraging Large Language Models (LLMs), Vector Databases, and Semantic Search.
Get the Udemy course with a limited discount coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB
This integration will enable us to deliver fast, accurate, and context-aware responses, transforming the customer support experience.
Why Vector Databases for EShop Support?
As businesses evolve, the expectations for customer support systems have increased significantly. Customers now demand instant, precise, and personalized responses to their inquiries. To meet these demands, we need to enhance our traditional microservices architecture by incorporating AI-powered technologies.
Traditional Microservices Architecture
Historically, microservices architectures have relied on cloud-native backing services to function efficiently:
- Databases: For structured data storage.
- Distributed Caches: To speed up data retrieval.
- Message Brokers: For asynchronous communication between services.
While these components are essential, they lack the intelligence required to understand and process complex customer queries.
AI-Powered Backing Services
To bridge this gap, we’re introducing LLMs and Vector Databases as new backing services:
- LLMs: Provide language understanding, contextual responses, and real-time AI capabilities.
- Vector Databases: Enable semantic search and similarity matching, crucial for retrieving relevant information quickly.
By integrating these AI-powered services, we empower our microservices to deliver intelligent and efficient workflows.
EShop Support Domain and Use Cases
Before diving into the architecture, let’s revisit the core functionalities of our EShop Support system.
Application Flow
- Customer Ticket Submission: Customers can open support tickets through our system.
- Support Agent Ticket Management:
  - Ticket Listing: Agents view a list of open tickets.
  - Ticket Details: Agents can click on a specific ticket to see detailed information.
  - Q&A Chat with AI: Within the ticket details, agents engage in a Q&A chat powered by AI to assist with resolving the issue.
AI Integration Points
Ticket Summarization and Classification:
- We integrate AI into the ticket list page to automatically summarize and classify tickets.
- This enhances the agent’s ability to prioritize and address tickets effectively.
Retrieval-Augmented Generation (RAG):
- Agents use AI for a Q&A chat that retrieves relevant data from the company’s knowledge base.
- The AI generates accurate and context-aware responses, aiding agents in resolving customer issues swiftly.
Now, let’s design this system from a software architecture perspective.
Application Architecture Overview
Our goal is to embrace a modern microservices-based architecture that seamlessly integrates AI-powered features.
1. Microservices with API Gateway
- Client Applications: Web or mobile clients send requests to the system.
- API Gateway: Acts as the single entry point for all client requests.
- Routes requests to appropriate internal microservices.
- Ensures modularity and scalability.
- We can use YARP (Yet Another Reverse Proxy) as the gateway library to forward requests to the internal microservices, as sketched below.
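As a rough sketch, the gateway's Program.cs can be as small as the following, assuming the Yarp.ReverseProxy NuGet package. The route and cluster definitions (for example, a route that forwards support traffic to the CustomerSupport service) live in the "ReverseProxy" section of appsettings.json, and the names used there are our own.

```csharp
// Program.cs of the API Gateway — minimal sketch assuming the Yarp.ReverseProxy package.
var builder = WebApplication.CreateBuilder(args);

// Load YARP routes and clusters from the "ReverseProxy" section of appsettings.json.
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

// Forward matching requests to the configured internal microservices.
app.MapReverseProxy();

app.Run();
```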
2. CustomerSupport Microservice
- Development Stack: Built with .NET 8.
- Implements the Vertical Slice Architecture for modularity.
- Follows the database-per-service pattern for data isolation.
- Database: Uses PostgreSQL for structured data storage.
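To make the vertical-slice idea concrete, here is a minimal sketch of a single "create ticket" slice in a .NET 8 minimal API backed by PostgreSQL via EF Core. The type names (Ticket, SupportDbContext, CreateTicketRequest) and the connection-string key are illustrative, not part of any framework; a real slice would also include validation and its own tests.

```csharp
// Minimal sketch of one vertical slice; requires Npgsql.EntityFrameworkCore.PostgreSQL.
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

// Database-per-service: the CustomerSupport service owns its own PostgreSQL database.
builder.Services.AddDbContext<SupportDbContext>(options =>
    options.UseNpgsql(builder.Configuration.GetConnectionString("SupportDb")));

var app = builder.Build();

// "Create Ticket" slice: request, endpoint, and persistence live together as one feature.
app.MapPost("/tickets", async (CreateTicketRequest request, SupportDbContext db) =>
{
    var ticket = new Ticket
    {
        Id = Guid.NewGuid(),
        Subject = request.Subject,
        Description = request.Description,
        Status = "Open"
    };

    db.Tickets.Add(ticket);
    await db.SaveChangesAsync();

    return Results.Created($"/tickets/{ticket.Id}", ticket);
});

app.Run();

public record CreateTicketRequest(string Subject, string Description);

public class Ticket
{
    public Guid Id { get; set; }
    public string Subject { get; set; } = "";
    public string Description { get; set; } = "";
    public string Status { get; set; } = "Open";
}

public class SupportDbContext : DbContext
{
    public SupportDbContext(DbContextOptions<SupportDbContext> options) : base(options) { }
    public DbSet<Ticket> Tickets => Set<Ticket>();
}
```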
3. Cloud-Native AI Backing Services
We introduce AI components as backing services to power our support system.
Ollama in Docker Containers
- Hosts different LLM models like Llama, Gemma, Mistral, Phi-3, etc.
- Provides both LLM functionality and embedding creation.
- Runs Llama 2 for generating context-aware responses.
- Uses all-MiniLM for embedding creation.
- Deployed within Docker containers for scalability and isolation.
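Because Ollama exposes a plain REST API, the microservice can talk to it with nothing more than an HttpClient. The sketch below assumes Ollama is reachable at http://localhost:11434 (its default port) and that the llama2 model has already been pulled into the container.

```csharp
// Calling Ollama's /api/generate endpoint directly from C# — a minimal sketch.
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

// stream = false returns a single JSON object instead of a token stream.
var generateResponse = await http.PostAsJsonAsync("/api/generate", new
{
    model = "llama2",
    prompt = "Summarize this support ticket: the customer cannot reset their password.",
    stream = false
});
generateResponse.EnsureSuccessStatusCode();

// The generated text is returned in the "response" field.
using var doc = JsonDocument.Parse(await generateResponse.Content.ReadAsStringAsync());
Console.WriteLine(doc.RootElement.GetProperty("response").GetString());
```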
Chroma Vector Database
- Stores high-dimensional embeddings.
- Performs semantic search and similarity matching.
- Manages chat history.
- Supports RAG workflows.
- Alternatives: Weaviate, Pinecone, Qdrant, Milvus, or the PgVector extension if you prefer to stay within PostgreSQL.
4. The Glue Framework — AI Integration Components
To integrate the microservices with AI backing services, we use an AI integration framework.
Semantic Kernel
- Acts as a glue framework for interacting with LLMs.
- Connects the CustomerSupport Microservice to Ollama.
- Facilitates embedding generation.
- Manages prompts and responses.
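A minimal sketch of that wiring is shown below. It assumes the preview Microsoft.SemanticKernel.Connectors.Ollama package; connector extension methods are still evolving, so the exact method names may differ in the version you use.

```csharp
// Wiring Semantic Kernel to the local Ollama instance — illustrative sketch,
// assuming the preview Microsoft.SemanticKernel.Connectors.Ollama package.
using Microsoft.SemanticKernel;

var kernel = Kernel.CreateBuilder()
    // Chat completion backed by the Llama 2 model served by Ollama.
    .AddOllamaChatCompletion("llama2", new Uri("http://localhost:11434"))
    .Build();

// The kernel handles the prompt/response round trip for us.
Console.WriteLine(await kernel.InvokePromptAsync(
    "Classify this support ticket as Billing, Technical, or Other: 'I was charged twice.'"));
```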
Containerized Deployment
- All services, including LLMs, databases, and microservices, are containerized using Docker.
- They run within a unified Docker network, ensuring robust communication and scalability.
End-to-End Request Flow — Workflows Overview
Our architecture supports two primary workflows that interact with LLMs and Vector Databases:
1. Offline Workflow (Data Ingestion)
- Purpose: Prepares the system by populating the vector database with high-quality data.
- Outcome: Enables retrieval-augmented generation (RAG) by ensuring runtime responses are accurate and efficient.
2. Runtime Workflow (Query and Response)
- Purpose: Processes real-time queries to deliver context-aware responses to support agents.
- Outcome: Provides agents with the necessary information to assist customers effectively.
Now, let's walk through these workflows one by one.
Offline Workflow — Data Ingestion for RAG
Let’s delve into how we prepare the system before it serves real-time queries.
1. Data Collection
- Upload Documents:
- FAQs, manuals, and support guides are uploaded via an admin interface.
- Provides the foundational knowledge base for the system.
2. Embedding Creation
- Pre-processing:
- Documents are split into smaller chunks to enhance retrieval efficiency.
- Embedding Generation:
- Each chunk is converted into vector embeddings using the all-MiniLM model from Ollama.
- Embeddings capture the semantic meaning, making them ideal for similarity searches.
3. Storage in Vector Database
- Chroma Vector Database: Stores the generated embeddings.
- Optimized for fast retrieval and similarity comparisons.
- Indexing: The database indexes embeddings for efficient search operations during runtime.
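Putting the offline steps together, the ingestion job might look roughly like this. It uses naive fixed-size chunking and Ollama's /api/embeddings endpoint with the all-minilm model; the final write into the Chroma collection is left as a placeholder, because that call depends on the Chroma client or connector you choose.

```csharp
// Offline ingestion sketch: chunk a document, embed each chunk via Ollama, then store.
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

string document = File.ReadAllText("faq.md");

// Naive fixed-size chunking; production systems usually split on headings or sentences with overlap.
var chunks = Enumerable.Range(0, (document.Length + 999) / 1000)
    .Select(i => document.Substring(i * 1000, Math.Min(1000, document.Length - i * 1000)))
    .ToList();

foreach (var (chunk, index) in chunks.Select((c, i) => (c, i)))
{
    // /api/embeddings returns a single "embedding" array for the given prompt text.
    var response = await http.PostAsJsonAsync("/api/embeddings",
        new { model = "all-minilm", prompt = chunk });
    response.EnsureSuccessStatusCode();

    using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    float[] embedding = json.RootElement.GetProperty("embedding")
        .EnumerateArray().Select(e => e.GetSingle()).ToArray();

    // Placeholder: persist (id, chunk text, embedding) into the Chroma collection here.
    Console.WriteLine($"chunk {index}: {embedding.Length}-dimensional embedding ready for storage");
}
```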
Runtime Workflow — Query to Context-Aware Response
Now, let’s explore how the system processes support agent queries in real-time.
1. Frontend Interaction: Query Input
- Support Agent Action:
- The agent submits a query through the Q&A chatbox on the frontend.
2. Embedding Generation
- CustomerSupport Microservice:
- Receives the query from the frontend.
- Converts the query into an embedding using the all-MiniLM model in Ollama.
- Process: Utilizes the Semantic Kernel for seamless integration.
- The resulting embedding captures the semantic meaning of the query.
3. Semantic Search
- Chroma Vector Database: Compares the query embedding with stored document embeddings.
- Retrieves the most relevant document chunks.
- Outcome: Provides the necessary context to answer the query accurately.
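Chroma performs this nearest-neighbour ranking for us at scale, but the underlying idea is easy to see in a small in-memory stand-in: score every stored chunk by cosine similarity against the query embedding and keep the top k. The SearchHelpers class below is purely illustrative and is not how Chroma is actually queried.

```csharp
// In-memory stand-in for the semantic search step (Chroma does this at scale).
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchHelpers
{
    // Cosine similarity between two embedding vectors of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // storedChunks: (text, embedding) pairs produced during the offline ingestion step.
    public static IEnumerable<string> TopMatches(
        float[] queryEmbedding,
        IReadOnlyList<(string Text, float[] Embedding)> storedChunks,
        int k = 3) =>
        storedChunks
            .OrderByDescending(c => CosineSimilarity(queryEmbedding, c.Embedding))
            .Take(k)
            .Select(c => c.Text);
}
```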
4. Prompt Creation
- CustomerSupport Microservice: Combines the retrieved context with the original query.
- Forms a structured, context-aware prompt.
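A sketch of that prompt assembly is shown below; the template wording and the PromptBuilder helper are our own choices, not part of any library.

```csharp
// Assembling the RAG prompt from the retrieved context and the agent's question.
using System.Collections.Generic;

public static class PromptBuilder
{
    public static string Build(IEnumerable<string> retrievedChunks, string agentQuestion)
    {
        // Separate the retrieved chunks so the model can tell them apart.
        var context = string.Join("\n---\n", retrievedChunks);

        return
            "You are a customer support assistant for EShop.\n" +
            "Answer the question using ONLY the context below. " +
            "If the context is not sufficient, say that you don't know.\n\n" +
            $"Context:\n{context}\n\n" +
            $"Question: {agentQuestion}\nAnswer:";
    }
}
```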
5. LLM Response Generation
- Ollama’s Llama 2 Model: Receives the prompt.
- Generates a detailed and contextually accurate response.
- Response Delivery: The response is returned to the frontend.
- Enables the support agent to provide effective assistance to the customer.
Technical Components — Semantic Search in Action
Let’s highlight the key components that enable semantic search and context-aware responses, along with alternative tools you might consider.
1. Chroma Vector Database
- Function: Manages embeddings.
- Performs fast similarity searches to retrieve relevant context.
- Alternatives: Milvus, Qdrant, Weaviate, Pinecone, PgVector for PostgreSQL
2. Semantic Kernel Integration
- Function: Acts as the glue between microservices and LLMs.
- Simplifies the integration process.
- Alternatives: LangChain, LlamaIndex, Spring AI for Java applications
3. Ollama’s LLM and Embedding Models
- Function: Powers natural language understanding (via embeddings).
- Handles response generation (via Llama 2).
- Alternatives: Mistral, Gemma, Phi-3
Conclusion — Fully Private and Open-Source Design
Our EShop Support architecture is thoughtfully designed with privacy, scalability, and cost-effectiveness in mind.
1. Fully Private Deployment
- Data Privacy: By using Ollama and Chroma, we avoid reliance on external APIs.
- Keeps sensitive customer data within our system.
- Enhances security and compliance with data protection regulations.
2. Open-Source Tools
- Cost-Effectiveness: Utilizing open-source technologies like Docker, Chroma, and Ollama reduces licensing costs.
- Adaptability: Open-source tools offer flexibility for customization and extension.
- Community Support: Benefit from community-driven improvements and updates.
3. Future-Ready Architecture
- Scalability: Microservices and containerization ensure the system can scale with demand.
- Modular AI Integration: Easy to update or swap out AI components as technologies evolve.
- Innovation: Positions EShop Support for long-term growth and the incorporation of future AI advancements.
This architecture demonstrates how modern enterprises can leverage LLMs and vector databases to create intelligent, scalable customer support systems. By combining retrieval-augmented generation (RAG) workflows, private LLMs, and vector databases, EShop Support sets a benchmark for AI-powered enterprise applications.
Get the Udemy course with a limited discount coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB
You’ll get hands-on experience designing a complete EShop Customer Support application, integrating LLM capabilities such as Summarization, Q&A, Classification, Sentiment Analysis, Embedding-based Semantic Search, and Code Generation into enterprise applications.