From "Deconstructing RAG" and "Building RAG with Open-Source and Custom AI Models"
RAG is an application pattern for LLMs. It uses an information retrieval system to give the LLM extra context, which allows the model to answer user queries not covered in its training data.
The RAG pipeline:
Chunking: Turn your dataset into text documents and break them down into small pieces.
Embed documents: Turn each chunk into a vector representing its semantic meaning.
VectorDB: Store embeddings in a vector database.
Retrieval: Upon receiving a user query, retrieve chunks relevant to the user’s request.
Response Generation: Add the retrieved chunks to the prompt context and have the LLM generate the answer. A minimal end-to-end sketch of these five steps follows.
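The sketch below walks through the pipeline in plain Python. Everything in it is an illustrative assumption rather than anything from the source material: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `VectorDB` is an in-memory substitute for a real vector database, and `llm` is any prompt-to-completion callable.

```python
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Chunking: split a document into overlapping pieces of about `size` chars.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    # Embedding: toy hashed bag-of-words vectors; a real system would call
    # an embedding model here. This function only stands in for one.
    vecs = np.zeros((len(texts), 256))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % 256] += 1.0
    return vecs

class VectorDB:
    # VectorDB: an in-memory stand-in for a real vector database.
    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        self.vectors = embed(chunks)

    def retrieve(self, query: str, k: int = 4) -> list[str]:
        # Retrieval: rank stored chunks by cosine similarity to the query.
        q = embed([query])[0]
        norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9
        sims = self.vectors @ q / norms
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]

def answer(db: VectorDB, query: str, llm) -> str:
    # Response generation: put retrieved chunks in the context and call the
    # LLM (`llm` is assumed to be any prompt -> completion callable).
    context = "\n\n".join(db.retrieve(query))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

Usage would be `answer(VectorDB(chunk(document)), "some question", llm)`.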
How can we make retrieval robust to variability in user input?
Approaches:
Query construction: where does the data live, and what syntax is needed to query it?
Indexing: how do I design the index in the vector database?
Chunk size: controls how much information we load into the context window.
Document embedding strategy
Post-processing: how do we combine the retrieved documents, both in response synthesis and in response evaluation? One concrete technique is sketched after this list.
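One common way to handle variability in user input is multi-query retrieval: have an LLM paraphrase the query a few ways, retrieve for each variant, and fuse the ranked lists in post-processing. The sketch below uses reciprocal rank fusion; it reuses the `VectorDB` from the earlier sketch, and `rewrite` is a hypothetical callable (an LLM prompted to return paraphrases), not an API from the source material.

```python
from collections import defaultdict
from typing import Callable

def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: chunks ranked highly under several query
    # variants float to the top; the constant k dampens any single list.
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk in enumerate(ranking):
            scores[chunk] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def robust_retrieve(db, query: str,
                    rewrite: Callable[[str], list[str]], top: int = 4) -> list[str]:
    # Retrieve with the original query plus paraphrases, then fuse rankings.
    variants = [query] + rewrite(query)   # rewrite() is a hypothetical LLM call
    rankings = [db.retrieve(v, k=10) for v in variants]
    return fuse(rankings)[:top]
```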
In production, RAG systems are a group of AI models, each playing its part in the workflow of data processing and response generation.
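As one way to picture that composition, the sketch below wires the `VectorDB` above together with hypothetical `rerank` and `generate` callables standing in for a cross-encoder reranker and the LLM (both names are assumptions, not components named in the source).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RAGSystem:
    db: "VectorDB"                                 # wraps the embedding model
    rerank: Callable[[str, list[str]], list[str]]  # e.g. a cross-encoder model
    generate: Callable[[str], str]                 # the LLM itself

    def __call__(self, query: str) -> str:
        candidates = self.db.retrieve(query, k=20)  # fast, recall-oriented pass
        best = self.rerank(query, candidates)[:4]   # slower, precision-oriented pass
        prompt = "Context:\n" + "\n\n".join(best) + f"\n\nQuestion: {query}"
        return self.generate(prompt)
```

Keeping each model behind a plain callable makes it possible to swap, scale, or evaluate the embedder, reranker, and generator independently.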
RAG systems