Designing EShop Support with LLMs, Vector Databases, and Semantic Search
We’re going to explore how to design a cutting-edge customer support system for an e-commerce platform by leveraging Large Language Models (LLMs), Vector Databases, and Semantic Search.
Get the Udemy course with a limited discount coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB
This integration will enable us to deliver fast, accurate, and context-aware responses, transforming the customer support experience.
Why Vector Databases for EShop Support?
As businesses evolve, the expectations for customer support systems have increased significantly. Customers now demand instant, precise, and personalized responses to their inquiries. To meet these demands, we need to enhance our traditional microservices architecture by incorporating AI-powered technologies.
Traditional Microservices Architecture
Historically, microservices architectures have relied on cloud-native backing services to function efficiently:
- Databases: For structured data storage.
- Distributed Caches: To speed up data retrieval.
- Message Brokers: For asynchronous communication between services.
While these components are essential, they lack the intelligence required to understand and process complex customer queries.
AI-Powered Backing Services
To bridge this gap, we’re introducing LLMs and Vector Databases as new backing services:
- LLMs: Provide language understanding, contextual responses, and real-time AI capabilities.
- Vector Databases: Enable semantic search and similarity matching, crucial for retrieving relevant information quickly.
By integrating these AI-powered services, we empower our microservices to deliver intelligent and efficient workflows.
EShop Support Domain and Use Cases
Before diving into the architecture, let’s revisit the core functionalities of our EShop Support system.
Application Flow
- Customer Ticket Submission: Customers can open support tickets through our system.
- Support Agent Ticket Management:
  - Ticket Listing: Agents view a list of open tickets.
  - Ticket Details: Agents can click on a specific ticket to see detailed information.
  - Q&A Chat with AI: Within the ticket details, agents engage in a Q&A chat powered by AI to assist with resolving the issue.
AI Integration Points
Ticket Summarization and Classification:
- We integrate AI into the ticket list page to automatically summarize and classify tickets.
- This enhances the agent’s ability to prioritize and address tickets effectively.
Retrieval-Augmented Generation (RAG):
- Agents use AI for a Q&A chat that retrieves relevant data from the company’s knowledge base.
- The AI generates accurate and context-aware responses, aiding agents in resolving customer issues swiftly.
Now, let’s design this system from a software architecture perspective.
Application Architecture Overview
Our goal is to embrace a modern microservices-based architecture that seamlessly integrates AI-powered features.
1. Microservices with API Gateway
- Client Applications: Web or mobile clients send requests to the system.
- API Gateway: Acts as the single entry point for all client requests.
- Routes requests to appropriate internal microservices.
- Ensures modularity and scalability.
- We can use YARP (Yet Another Reverse Proxy) as the gateway library to forward requests to the internal microservices, as sketched below.
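As a rough sketch, the gateway's Program.cs can be as small as the following, assuming the Yarp.ReverseProxy NuGet package. The route and cluster definitions (for example, a route that forwards support traffic to the CustomerSupport service) live in the "ReverseProxy" section of appsettings.json, and the names used there are our own.

```csharp
// Program.cs of the API Gateway — minimal sketch assuming the Yarp.ReverseProxy package.
var builder = WebApplication.CreateBuilder(args);

// Load YARP routes and clusters from the "ReverseProxy" section of appsettings.json.
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

// Forward matching requests to the configured internal microservices.
app.MapReverseProxy();

app.Run();
```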
2. CustomerSupport Microservice
- Development Stack: Built with .NET 8.
- Implements the Vertical Slice Architecture for modularity.
- Follows the database-per-service pattern for data isolation.
- Database: Uses PostgreSQL for structured data storage.
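To make the vertical-slice idea concrete, here is a minimal sketch of a single "create ticket" slice in a .NET 8 minimal API backed by PostgreSQL via EF Core. The type names (Ticket, SupportDbContext, CreateTicketRequest) and the connection-string key are illustrative, not part of any framework; a real slice would also include validation and its own tests.

```csharp
// Minimal sketch of one vertical slice; requires Npgsql.EntityFrameworkCore.PostgreSQL.
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

// Database-per-service: the CustomerSupport service owns its own PostgreSQL database.
builder.Services.AddDbContext<SupportDbContext>(options =>
    options.UseNpgsql(builder.Configuration.GetConnectionString("SupportDb")));

var app = builder.Build();

// "Create Ticket" slice: request, endpoint, and persistence live together as one feature.
app.MapPost("/tickets", async (CreateTicketRequest request, SupportDbContext db) =>
{
    var ticket = new Ticket
    {
        Id = Guid.NewGuid(),
        Subject = request.Subject,
        Description = request.Description,
        Status = "Open"
    };

    db.Tickets.Add(ticket);
    await db.SaveChangesAsync();

    return Results.Created($"/tickets/{ticket.Id}", ticket);
});

app.Run();

public record CreateTicketRequest(string Subject, string Description);

public class Ticket
{
    public Guid Id { get; set; }
    public string Subject { get; set; } = "";
    public string Description { get; set; } = "";
    public string Status { get; set; } = "Open";
}

public class SupportDbContext : DbContext
{
    public SupportDbContext(DbContextOptions<SupportDbContext> options) : base(options) { }
    public DbSet<Ticket> Tickets => Set<Ticket>();
}
```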
3. Cloud-Native AI Backing Services
We introduce AI components as backing services to power our support system.
Ollama in Docker Containers
- Hosts different LLM models like Llama, Gemma, Mistral, Phi-3, etc.
- Provides both LLM functionality and embedding creation.
- Runs Llama 2 for generating context-aware responses.
- Uses all-MiniLM for embedding creation.
- Deployed within Docker containers for scalability and isolation.
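Because Ollama exposes a plain REST API, the microservice can talk to it with nothing more than an HttpClient. The sketch below assumes Ollama is reachable at http://localhost:11434 (its default port) and that the llama2 model has already been pulled into the container.

```csharp
// Calling Ollama's /api/generate endpoint directly from C# — a minimal sketch.
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

// stream = false returns a single JSON object instead of a token stream.
var generateResponse = await http.PostAsJsonAsync("/api/generate", new
{
    model = "llama2",
    prompt = "Summarize this support ticket: the customer cannot reset their password.",
    stream = false
});
generateResponse.EnsureSuccessStatusCode();

// The generated text is returned in the "response" field.
using var doc = JsonDocument.Parse(await generateResponse.Content.ReadAsStringAsync());
Console.WriteLine(doc.RootElement.GetProperty("response").GetString());
```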
Chroma Vector Database
- Stores high-dimensional embeddings.
- Performs semantic search and similarity matching.
- Manages chat history.
- Supports RAG workflows.
- Alternatives: Weaviate, Pinecone, Qdrant, Milvus, or the PgVector extension if you prefer to stay within PostgreSQL.
4. The Glue Framework — AI Integration Components
To integrate the microservices with AI backing services, we use an AI integration framework.
Semantic Kernel
- Acts as a glue framework for interacting with LLMs.
- Connects the CustomerSupport Microservice to Ollama.
- Facilitates embedding generation.
- Manages prompts and responses.
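A minimal sketch of that wiring is shown below. It assumes the preview Microsoft.SemanticKernel.Connectors.Ollama package; connector extension methods are still evolving, so the exact method names may differ in the version you use.

```csharp
// Wiring Semantic Kernel to the local Ollama instance — illustrative sketch,
// assuming the preview Microsoft.SemanticKernel.Connectors.Ollama package.
using Microsoft.SemanticKernel;

var kernel = Kernel.CreateBuilder()
    // Chat completion backed by the Llama 2 model served by Ollama.
    .AddOllamaChatCompletion("llama2", new Uri("http://localhost:11434"))
    .Build();

// The kernel handles the prompt/response round trip for us.
Console.WriteLine(await kernel.InvokePromptAsync(
    "Classify this support ticket as Billing, Technical, or Other: 'I was charged twice.'"));
```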
Containerized Deployment
- All services, including LLMs, databases, and microservices, are containerized using Docker.
- They run within a unified Docker network, ensuring robust communication and scalability.
End-to-End Request Flow — Workflows Overview
Our architecture supports two primary workflows that interact with LLMs and Vector Databases:
1. Offline Workflow (Data Ingestion)
- Purpose: Prepares the system by populating the vector database with high-quality data.
- Outcome: Enables retrieval-augmented generation (RAG) by ensuring runtime responses are accurate and efficient.
2. Runtime Workflow (Query and Response)
- Purpose: Processes real-time queries to deliver context-aware responses to support agents.
- Outcome: Provides agents with the necessary information to assist customers effectively.
Now, let's walk through these workflows one by one.
Offline Workflow — Data Ingestion for RAG
Let’s delve into how we prepare the system before it serves real-time queries.
1. Data Collection
- Upload Documents:
- FAQs, manuals, and support guides are uploaded via an admin interface.
- Provides the foundational knowledge base for the system.
2. Embedding Creation
- Pre-processing:
- Documents are split into smaller chunks to enhance retrieval efficiency.
- Embedding Generation:
- Each chunk is converted into vector embeddings using the all-MiniLM model from Ollama.
- Embeddings capture the semantic meaning, making them ideal for similarity searches.
3. Storage in Vector Database
- Chroma Vector Database: Stores the generated embeddings.
- Optimized for fast retrieval and similarity comparisons.
- Indexing: The database indexes embeddings for efficient search operations during runtime.
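Putting the offline steps together, the ingestion job might look roughly like this. It uses naive fixed-size chunking and Ollama's /api/embeddings endpoint with the all-minilm model; the final write into the Chroma collection is left as a placeholder, because that call depends on the Chroma client or connector you choose.

```csharp
// Offline ingestion sketch: chunk a document, embed each chunk via Ollama, then store.
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

string document = File.ReadAllText("faq.md");

// Naive fixed-size chunking; production systems usually split on headings or sentences with overlap.
var chunks = Enumerable.Range(0, (document.Length + 999) / 1000)
    .Select(i => document.Substring(i * 1000, Math.Min(1000, document.Length - i * 1000)))
    .ToList();

foreach (var (chunk, index) in chunks.Select((c, i) => (c, i)))
{
    // /api/embeddings returns a single "embedding" array for the given prompt text.
    var response = await http.PostAsJsonAsync("/api/embeddings",
        new { model = "all-minilm", prompt = chunk });
    response.EnsureSuccessStatusCode();

    using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    float[] embedding = json.RootElement.GetProperty("embedding")
        .EnumerateArray().Select(e => e.GetSingle()).ToArray();

    // Placeholder: persist (id, chunk text, embedding) into the Chroma collection here.
    Console.WriteLine($"chunk {index}: {embedding.Length}-dimensional embedding ready for storage");
}
```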
Runtime Workflow — Query to Context-Aware Response
Now, let’s explore how the system processes support agent queries in real-time.
1. Frontend Interaction: Query Input
- Support Agent Action:
- The agent submits a query through the Q&A chatbox on the frontend.
2. Embedding Generation
- CustomerSupport Microservice:
- Receives the query from the frontend.
- Converts the query into an embedding using the all-MiniLM model in Ollama.
- Process: Utilizes the Semantic Kernel for seamless integration.
- The resulting embedding captures the semantic meaning of the query.
3. Semantic Search
- Chroma Vector Database: Compares the query embedding with stored document embeddings.
- Retrieves the most relevant document chunks.
- Outcome: Provides the necessary context to answer the query accurately.
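Chroma performs this nearest-neighbour ranking for us at scale, but the underlying idea is easy to see in a small in-memory stand-in: score every stored chunk by cosine similarity against the query embedding and keep the top k. The SearchHelpers class below is purely illustrative and is not how Chroma is actually queried.

```csharp
// In-memory stand-in for the semantic search step (Chroma does this at scale).
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchHelpers
{
    // Cosine similarity between two embedding vectors of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // storedChunks: (text, embedding) pairs produced during the offline ingestion step.
    public static IEnumerable<string> TopMatches(
        float[] queryEmbedding,
        IReadOnlyList<(string Text, float[] Embedding)> storedChunks,
        int k = 3) =>
        storedChunks
            .OrderByDescending(c => CosineSimilarity(queryEmbedding, c.Embedding))
            .Take(k)
            .Select(c => c.Text);
}
```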
4. Prompt Creation
- CustomerSupport Microservice: Combines the retrieved context with the original query.
- Forms a structured, context-aware prompt.
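A sketch of that prompt assembly is shown below; the template wording and the PromptBuilder helper are our own choices, not part of any library.

```csharp
// Assembling the RAG prompt from the retrieved context and the agent's question.
using System.Collections.Generic;

public static class PromptBuilder
{
    public static string Build(IEnumerable<string> retrievedChunks, string agentQuestion)
    {
        // Separate the retrieved chunks so the model can tell them apart.
        var context = string.Join("\n---\n", retrievedChunks);

        return
            "You are a customer support assistant for EShop.\n" +
            "Answer the question using ONLY the context below. " +
            "If the context is not sufficient, say that you don't know.\n\n" +
            $"Context:\n{context}\n\n" +
            $"Question: {agentQuestion}\nAnswer:";
    }
}
```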
5. LLM Response Generation
- Ollama’s Llama 2 Model: Receives the prompt.
- Generates a detailed and contextually accurate response.
- Response Delivery: The response is returned to the frontend.
- Enables the support agent to provide effective assistance to the customer.
Technical Components — Semantic Search in Action
Let’s highlight the key components that enable semantic search and context-aware responses, along with alternative tools you might consider.
1. Chroma Vector Database
- Function: Manages embeddings.
- Performs fast similarity searches to retrieve relevant context.
- Alternatives: Milvus, Qdrant, Weaviate, Pinecone, PgVector for PostgreSQL
2. Semantic Kernel Integration
- Function: Acts as the glue between microservices and LLMs.
- Simplifies the integration process.
- Alternatives: LangChain, LlamaIndex, Spring AI for Java applications
3. Ollama’s LLM and Embedding Models
- Function: Powers natural language understanding (via embeddings).
- Handles response generation (via Llama 2).
- Alternatives: Mistral, Gemma, Phi-3
Conclusion — Fully Private and Open-Source Design
Our EShop Support architecture is thoughtfully designed with privacy, scalability, and cost-effectiveness in mind.
1. Fully Private Deployment
- Data Privacy: By using Ollama and Chroma, we avoid reliance on external APIs.
- Keeps sensitive customer data within our system.
- Enhances security and compliance with data protection regulations.
2. Open-Source Tools
- Cost-Effectiveness: Utilizing open-source technologies like Docker, Chroma, and Ollama reduces licensing costs.
- Adaptability: Open-source tools offer flexibility for customization and extension.
- Community Support: Benefit from community-driven improvements and updates.
3. Future-Ready Architecture
- Scalability: Microservices and containerization ensure the system can scale with demand.
- Modular AI Integration: Easy to update or swap out AI components as technologies evolve.
- Innovation: Positions EShop Support for long-term growth and the incorporation of future AI advancements.
This architecture demonstrates how modern enterprises can leverage LLMs and vector databases to create intelligent, scalable customer support systems. By combining retrieval-augmented generation (RAG) workflows, private LLMs, and vector databases, EShop Support sets a benchmark for AI-powered enterprise applications.
Get the Udemy course with a limited discount coupon — Generative AI Architectures with LLM, Prompt, RAG, Fine-Tuning and Vector DB
You’ll get hands-on experience designing a complete EShop Customer Support application, integrating LLM capabilities such as Summarization, Q&A, Classification, Sentiment Analysis, Embedding-based Semantic Search, and Code Generation into enterprise applications.