How Retrieval-Augmented Generation (RAG) works

Tiya Vaj
1 min read · Apr 6, 2024

Here is a summary of how Retrieval-Augmented Generation (RAG) works, broken down into steps:

1. Data Preparation:
— Gather text data from a suitable source like documents, books, or transcripts.
— Use a document-loading library to load the data and split it into manageable chunks.
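
To make the splitting step concrete, here is a minimal sketch that uses only plain Python instead of a loader library. The function name `split_text`, the character-based window, and the `chunk_size` and `overlap` values are all assumptions for illustration; real splitters often cut on sentence or paragraph boundaries instead.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample document, only here so the sketch runs.
document = "RAG retrieves relevant text before generating an answer. " * 10
chunks = split_text(document, chunk_size=120, overlap=20)
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, which tends to help retrieval later on.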

2. Vector Database Creation:
— Use an OpenAI embedding model to generate a vector embedding for each chunk of text.
— Create a Chroma database that stores the chunks indexed by their vector embeddings.
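
The idea behind this step can be sketched without any external service. In the example below, `embed` is a toy stand-in (normalised word counts) for a real embedding model, and `build_store` is a plain Python list standing in for a vector database. Neither is the OpenAI or Chroma API, and the file name `notes.txt` is made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalised word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def build_store(chunks: list[str], source: str) -> list[dict]:
    """Keep each chunk with its vector and metadata (stand-in for a vector DB)."""
    return [
        {
            "id": f"{source}-{i}",
            "text": chunk,
            "vector": embed(chunk),
            "metadata": {"source": source, "chunk_index": i},
        }
        for i, chunk in enumerate(chunks)
    ]


store = build_store(
    ["Chroma stores embeddings.", "RAG retrieves chunks."],
    source="notes.txt",  # hypothetical source name
)
```

Storing the metadata next to each vector at this point is what makes source references possible later in the pipeline.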

3. Querying for Relevant Data:
— Input a query text.
— Search the Chroma database to find the most relevant chunks of information related to the query.
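
Under the hood, "most relevant" usually means "closest vector". The sketch below ranks stored chunks by cosine similarity to a query vector. The three-dimensional vectors are hand-made for illustration (real embeddings have hundreds or thousands of dimensions), and `top_k` is a hypothetical helper, not a Chroma function.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector: list[float], store: list[dict], k: int = 2) -> list[tuple[float, dict]]:
    """Return the k records most similar to the query, best first."""
    scored = [(cosine(query_vector, record["vector"]), record) for record in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors: the first axis loosely means "about vector storage".
store = [
    {"text": "Chroma is a vector database.", "vector": [0.9, 0.1, 0.0]},
    {"text": "GPT models generate text.", "vector": [0.1, 0.9, 0.1]},
    {"text": "Embeddings map text to vectors.", "vector": [0.8, 0.2, 0.1]},
]
query_vector = [1.0, 0.0, 0.0]  # hypothetical embedding of the query text
results = top_k(query_vector, store, k=2)
```

A real vector database does the same ranking, but with an index that avoids comparing the query against every stored vector.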

4. Prompt Creation:
— Use the retrieved data chunks and the query to create a prompt template.
— Format the template with the relevant data and the query to generate a complete prompt.
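
Prompt creation is mostly string formatting. Here is a minimal sketch, where the template wording, the `---` separator, and the `build_prompt` name are assumptions rather than a fixed convention:

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Fill the template with the retrieved chunks and the user's query."""
    context = "\n\n---\n\n".join(chunks)  # visible separator between chunks
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Chroma is a vector database.", "Embeddings map text to vectors."],
    "What is Chroma?",
)
```

Telling the model to rely only on the supplied context is a common way to keep the answer tied to the retrieved data.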

5. Response Generation:
— Send the prompt to an LLM (Large Language Model) such as OpenAI’s GPT.
— Receive a response generated by the model based on the prompt.
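
One way to keep this step independent of any particular provider is to treat the model as a plain function from prompt to text. In the sketch below, `generate_answer`, `fake_llm`, and the `max_prompt_chars` limit are all hypothetical. `fake_llm` only returns a canned string so the example runs offline; in practice you would pass a function that calls your model provider's client.

```python
from typing import Callable


def generate_answer(prompt: str, llm: Callable[[str], str],
                    max_prompt_chars: int = 8000) -> str:
    """Send the prompt to the given model function and return its cleaned reply."""
    if len(prompt) > max_prompt_chars:
        raise ValueError("prompt is longer than the configured limit")
    return llm(prompt).strip()


def fake_llm(prompt: str) -> str:
    """Placeholder model: returns a canned reply instead of calling a real LLM."""
    return "  (model output would appear here)  "


answer = generate_answer("Context: ...\n\nQuestion: What is Chroma?", fake_llm)
```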

6. Source References:
— Extract metadata from the retrieved data chunks to provide references back to the original sources.

7. Output:
— Display the response along with the source references.
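
Steps 6 and 7 can be sketched together. The example assumes each retrieved chunk carries a `metadata` dictionary with a `source` key. That layout, the file names, and the function names are assumptions for illustration.

```python
def collect_sources(results: list[dict]) -> list[str]:
    """Return unique source names in the order they were first retrieved."""
    seen: list[str] = []
    for record in results:
        source = record["metadata"].get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen


def format_output(answer: str, results: list[dict]) -> str:
    """Combine the model's answer with a numbered list of its sources."""
    lines = [f"Response: {answer}", "Sources:"]
    lines += [f"  [{i}] {s}" for i, s in enumerate(collect_sources(results), start=1)]
    return "\n".join(lines)


# Hypothetical retrieved chunks; two of them come from the same file.
results = [
    {"text": "chunk one", "metadata": {"source": "guide.md"}},
    {"text": "chunk two", "metadata": {"source": "faq.md"}},
    {"text": "chunk three", "metadata": {"source": "guide.md"}},
]
print(format_output("Chroma is a vector database.", results))
```

Deduplicating the sources keeps the reference list short when several chunks come from the same document.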

8. Evaluation:
— Evaluate the relevance and quality of the response based on the query and the retrieved data.
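
Evaluation can range from human review to model-based grading. As a very rough starting point, the sketch below measures word overlap: how much of the answer's vocabulary appears in the retrieved context, and how much of the query's vocabulary appears in the answer. This is a crude lexical proxy chosen for illustration, not an established RAG metric, and the names and sample strings are made up.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(text: str, reference: str) -> float:
    """Fraction of the words in `text` that also occur in `reference`."""
    words = tokens(text)
    return len(words & tokens(reference)) / len(words) if words else 0.0


query = "What does Chroma store?"
context = "Chroma is a database that stores vector embeddings for each chunk."
answer = "Chroma stores vector embeddings"

groundedness = overlap_score(answer, context)  # answer words backed by the context
relevance = overlap_score(query, answer)       # query words echoed in the answer
```

Exact word matching misses paraphrases and word forms ("store" versus "stores"), so a low score here is a prompt to look closer, not a verdict.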

This process combines retrieval and generation techniques to provide informative and contextually relevant responses to user queries, leveraging both existing data and AI-generated content.


