RAG Architecture
The Knowledge-base Pipeline in the LikAI application represents a streamlined, one-time ingestion process that converts authoritative PDFs—encompassing Good Aquaculture Practices (GAqP) for shrimp and crab, general GAqP guidelines, and the Shrimp Industry Roadmap—into a semantically indexed repository for the AI chatbot's Retrieval-Augmented Generation (RAG) system.
This architecture ensures efficient transformation of static documents into a dynamic knowledge base, facilitating precise, context-aware biosecurity recommendations for shrimp farmers while promoting computational efficiency and data fidelity. The process unfolds sequentially, with each step building upon the previous to optimize retrieval accuracy and system performance.
PDFs (GAqP Shrimp/Crab, GAqP General, Shrimp Roadmap):
These documents serve as the foundational inputs, aggregating regulatory and strategic insights from entities such as the Bureau of Fisheries and Aquatic Resources. Comprising detailed protocols on sustainable aquaculture practices, disease prevention, and industry planning, they are sourced from a dedicated repository within the application. This initial aggregation establishes a credible knowledge corpus, essential for anchoring AI outputs in domain-specific expertise and regulatory compliance.
Load & Split into Chunks:
The pipeline begins by loading the PDFs into memory using a specialized ingestion mechanism, which extracts the core textual content while filtering out non-essential elements such as metadata or visuals. The extracted text is then divided into discrete chunks, each constrained to approximately 1000 characters, with intentional overlaps to preserve narrative continuity. This segmentation keeps the material manageable, preventing overload during subsequent processing and ensuring that lengthy documents remain accessible for granular analysis.
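As a rough illustration, the loading step might look like the sketch below, assuming LangChain's community PDF loader; the package path and file names are assumptions, not LikAI's actual ingestion code.

```typescript
// A minimal loading sketch; PDFLoader and the file paths are assumptions.
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

async function loadSourceDocs() {
  const paths = [
    "knowledge/gaqp-shrimp-crab.pdf",      // hypothetical file names
    "knowledge/gaqp-general.pdf",
    "knowledge/shrimp-industry-roadmap.pdf",
  ];
  // Each loader returns one Document per page, containing the page text
  // plus source metadata (visual elements are not extracted).
  const perFile = await Promise.all(paths.map((p) => new PDFLoader(p).load()));
  return perFile.flat();
}
```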
Text Splitter (RecursiveCharacterTextSplitter, Chunk Size: 1000, Overlap: 200):
Employing a recursive algorithm, this component hierarchically partitions the text, beginning with larger structural divisions and refining to sentence-level granularity. The defined chunk size maintains computational tractability, while the overlap safeguards against the severance of interdependent concepts, thereby upholding semantic coherence. This methodical approach is critical for adapting to varied document complexities, yielding a refined set of textual units optimized for embedding and retrieval.
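A minimal sketch of the splitter configuration with the parameters named above, assuming LangChain's RecursiveCharacterTextSplitter (the exact package wiring used in LikAI is an assumption):

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import type { Document } from "@langchain/core/documents";

async function splitIntoChunks(docs: Document[]) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,   // ~1000 characters per chunk keeps processing tractable
    chunkOverlap: 200, // 200-character overlap preserves context across boundaries
  });
  // Splits hierarchically: paragraphs first, then sentences, then words.
  return splitter.splitDocuments(docs);
}
```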
Embed Chunks:
Each partitioned chunk undergoes semantic encoding into vector representations, distilling its meaning into numerical formats that enable quantitative comparisons. This transformation leverages advanced embedding techniques to encapsulate contextual and thematic elements, facilitating the identification of conceptual similarities across the knowledge base. By converting qualitative text into quantitative vectors, this step lays the groundwork for efficient, meaning-based searches that transcend literal keyword matching.
Embeddings Model (BGE-M3 via Hugging Face):
This model, integrated via an external inference platform, generates dense vectors well suited to multilingual and technical content, producing outputs of 1024 dimensions. It processes chunks iteratively, encoding attributes such as terminology and intent so that related content sits close together in vector space; vectors describing sanitation protocols, for instance, cluster near one another. The model was selected for its balance of precision and efficiency, which is crucial for handling aquaculture-specific jargon and ensuring robust downstream query performance.
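For illustration, embedding the chunks through the Hugging Face Inference API could look like the sketch below; the @huggingface/inference client, the env variable, and the model identifier BAAI/bge-m3 are assumptions about the integration rather than confirmed implementation details.

```typescript
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN); // hypothetical env variable

async function embedChunks(texts: string[]): Promise<number[][]> {
  // featureExtraction returns one dense vector per input text;
  // for BGE-M3 these dense vectors are 1024-dimensional.
  const vectors = await Promise.all(
    texts.map((t) => hf.featureExtraction({ model: "BAAI/bge-m3", inputs: t }))
  );
  return vectors as number[][];
}
```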
Store Embeddings:
The resultant vectors, linked to their source chunks, are archived in a vector database designed for similarity-driven indexing and local persistence. This storage mechanism supports swift retrieval operations and accommodates updates, such as incorporating revised industry roadmaps, while enabling offline functionality. It culminates the pipeline by creating a scalable, query-ready repository that minimizes redundant computations.
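Tying the steps together, persistence might look like this sketch, which assumes a local HNSWLib vector store and LangChain's Hugging Face embeddings wrapper; the specific store and save path LikAI uses are assumptions here.

```typescript
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { HuggingFaceInferenceEmbeddings } from "@langchain/community/embeddings/hf";
import type { Document } from "@langchain/core/documents";

async function buildVectorStore(chunks: Document[]) {
  const embeddings = new HuggingFaceInferenceEmbeddings({
    model: "BAAI/bge-m3",
    apiKey: process.env.HF_TOKEN, // hypothetical env variable
  });
  // Embed every chunk and index the vectors for similarity search.
  const store = await HNSWLib.fromDocuments(chunks, embeddings);
  // Persist locally so queries can run without re-embedding the corpus.
  await store.save("data/vector-store"); // hypothetical path
  return store;
}
```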
In essence, this pipeline orchestrates the metamorphosis of raw PDFs into an actionable knowledge foundation, empowering the LikAI chatbot to provide reliable, tailored guidance. Its design prioritizes efficiency and adaptability, with built-in mechanisms for error handling, positioning it as a cornerstone for advancing aquaculture resilience through informed AI assistance.
This architecture shows how the AI chatbot in LikAI works, from when a user asks a question to when they see the answer. It's designed to give shrimp farmers helpful, personalized advice on biosecurity using a system called Retrieval-Augmented Generation (RAG). The chatbot pulls knowledge from documents like the GAqP guidelines and the Shrimp Industry Roadmap, combines it with the farmer's own farm data (like farm size or location), and generates a smart response. Everything runs through a backend API, and it's built to be secure and efficient. The steps are explained below in simple terms, following the flow in the diagram.
User Prompt + Farmer's Farm Data: This is where it starts. The user types a question, like "How do I prevent diseases in my shrimp pond?" They might also include details from their farm, such as "My farm is 2 hectares in the Philippines." This info comes from the app's frontend (like a chat window) and gets sent to the server.
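A hypothetical shape for what the frontend sends; the field names are illustrative, not the exact LikAI schema.

```typescript
interface ChatCoachRequest {
  message: string;          // e.g., "How do I prevent diseases in my shrimp pond?"
  farmProfile?: {
    sizeHectares?: number;  // e.g., 2
    location?: string;      // e.g., "Philippines"
  };
}

// Example call from the chat window:
async function askCoach(payload: ChatCoachRequest) {
  const res = await fetch("/api/chat-coach", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.json();
}
```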
API Route /api/chat-coach: The question and farm data are packaged and sent to a special server endpoint called /api/chat-coach. This is like a door in the app's backend that handles all chat requests. It uses Next.js to receive the input securely.
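In a Next.js App Router project, the endpoint would live in something like app/api/chat-coach/route.ts; the sketch below is only an outline, with a placeholder where the RAG steps described next would run.

```typescript
// app/api/chat-coach/route.ts (hypothetical location)
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const { message, farmProfile } = await req.json();

  // The handler then runs the steps described below:
  // sanitize -> embed query -> retrieve chunks -> augment -> generate.
  const response = `Received: ${message}`; // placeholder for the generated answer
  return NextResponse.json({ response, farmProfile });
}
```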
API Handler: Once the request arrives, the API handler takes over. It's the main controller that processes everything: it checks the input and starts the steps to build a response.
Sanitize Input: First, the system cleans up the user's question to make sure it's safe. This removes any weird or harmful text that could cause problems, like code injections. It's a security step to protect the app.
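The exact sanitization rules aren't documented here, so the helper below is only a hypothetical illustration of stripping risky input before it reaches the rest of the pipeline.

```typescript
// Hypothetical sanitization helper; the real rules may differ.
function sanitizeInput(raw: string): string {
  return raw
    .replace(/<[^>]*>/g, "")           // strip HTML/script tags
    .replace(/[\u0000-\u001F]/g, " ")  // strip control characters
    .trim()
    .slice(0, 2000);                   // cap length to keep prompts bounded
}
```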
Embed Query (BGE-M3): Next, the cleaned question is turned into a mathematical representation called an embedding. This uses the same BGE-M3 model that indexed the documents, which is good at understanding text in different languages. The embedding is like a digital fingerprint of the question, making it easy to compare with the stored knowledge.
Retrieve Top-K Chunks: Using the embedding, the system searches a vector database (a special storage for these fingerprints) to find the most relevant pieces of information, or "chunks," from the knowledge documents. It picks the top few (like top 5) that match the question best. These chunks are small sections from PDFs about biosecurity rules and shrimp farming guides. If nothing matches well, it might use a backup plan.
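Assuming the local HNSWLib store from the ingestion sketch, query embedding and retrieval could look like this; similaritySearch embeds the question with the same BGE-M3 embeddings and returns the k closest chunks.

```typescript
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { HuggingFaceInferenceEmbeddings } from "@langchain/community/embeddings/hf";

async function retrieveTopK(question: string, k = 5) {
  const store = await HNSWLib.load(
    "data/vector-store", // hypothetical path from the ingestion sketch
    new HuggingFaceInferenceEmbeddings({
      model: "BAAI/bge-m3",
      apiKey: process.env.HF_TOKEN,
    })
  );
  // Returns the k chunks whose embeddings are nearest to the query embedding.
  return store.similaritySearch(question, k);
}
```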
Augment with System Prompt: The retrieved chunks are combined with a preset system prompt. This prompt is like instructions for the AI: "You are a biosecurity coach for shrimp farmers. Use only this context to answer, be helpful, and cite sources. Here's the context from documents, the farmer's farm data, and the question." This produces the augmented prompt described next.
Augmented Prompt: This is the full package: the system's instructions, the matching chunks, the farmer's data for personalization (e.g., "For your 2-hectare farm..."), and the original question. This makes the AI's input more focused and relevant.
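A hypothetical assembly helper that mirrors the structure described above; the actual system prompt wording is the application's own.

```typescript
import type { Document } from "@langchain/core/documents";

function buildAugmentedPrompt(
  question: string,
  chunks: Document[],
  farmProfile: { sizeHectares?: number; location?: string }
): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.pageContent}`)
    .join("\n\n");
  return [
    "You are a biosecurity coach for shrimp farmers.",
    "Answer using only the context below, be helpful, and cite the numbered sources.",
    `Context:\n${context}`,
    `Farmer's farm data: ${JSON.stringify(farmProfile)}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```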
Generate: The augmented prompt is sent to the AI model, which is Ollama running Llama 3.2 (a quantized version for speed). Ollama is a local tool that runs the model on your computer or server. The model thinks about the prompt and creates a response, like step-by-step advice on preventing diseases, formatted nicely with lists or bold text.
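A sketch of the generation call against a local Ollama server; the endpoint and request format are standard Ollama defaults, but the exact model tag and quantized variant LikAI pins are assumptions.

```typescript
async function generateAnswer(augmentedPrompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",   // assumed tag; a specific quantization may be pinned
      prompt: augmentedPrompt,
      stream: false,       // return the whole completion as one JSON payload
    }),
  });
  const data = await res.json();
  return data.response;    // the Markdown-formatted answer text
}
```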
Response JSON: The generated answer is packaged into a JSON format (a structured way to send data). It includes the main response text (in easy-to-read Markdown style), sources (like which document the info came from), and other details like status or timestamp. This makes it simple for the app to handle.
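A hypothetical shape for that JSON; the field names are illustrative rather than the exact contract.

```typescript
interface ChatCoachResponse {
  response: string;           // Markdown answer shown in the chat window
  sources: string[];          // e.g., ["GAqP Shrimp/Crab guidelines"]
  status: "ok" | "fallback";  // "fallback" when no relevant chunks were found
  timestamp: string;          // ISO 8601
}
```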
Chatbot UI Display: Finally, the JSON response is sent back to the app's frontend. The chat interface shows the answer to the user, with any formatting (like bold steps or citations) displayed nicely. The user sees personalized advice right in the chatbot window.
Overall, this setup ensures the chatbot gives accurate, farm-specific answers without making things up—it always bases responses on real document chunks and user data. It's efficient for local use but can scale to a server later. If something goes wrong, like no good chunks found, it has fallbacks to keep things running smoothly.