Advanced RAG
Andrew Ng and team offer a variety of short courses at DeepLearning.AI, one of which focuses on improving the retrieval component of RAG. The default approach is to search a vector database for vectors semantically similar to the user query. The course discusses the limitations of this approach and explores several more sophisticated techniques for enhancing that semantic search.
The following are some of the main points mentioned in the course:
- Text Splitting Based on Embedding Model
    - It is important to split the text according to the embedding model used in the application.
    - If the embedding model's context window is smaller than the chunk size, text beyond that window is truncated and lost.
    - Hence, it is better to split chunks further so that each resulting chunk fits within the embedding model's context window.
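As a sketch, chunks from a first-pass splitter can be re-split to a token budget. The whitespace "tokenizer" and the 256-token limit below are placeholder assumptions; a real pipeline would use the embedding model's own tokenizer and limit.

```python
def split_to_token_limit(chunks, max_tokens=256):
    """Re-split chunks so each fits a hypothetical embedding token budget."""
    fitted = []
    for chunk in chunks:
        tokens = chunk.split()  # naive whitespace "tokenizer" (assumption)
        for i in range(0, len(tokens), max_tokens):
            fitted.append(" ".join(tokens[i:i + max_tokens]))
    return fitted
```

For example, a 600-token chunk with a 256-token budget yields three smaller chunks, none of which would be silently truncated by the embedding model.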
- RAG Prompts
    - RAG prompts instruct the LLM to answer based only on the information supplied, reducing the scope for hallucination.
    - This uses the LLM's capability to generate text grounded in the specific context provided.
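A minimal example of such a context-restricted prompt (the exact wording is illustrative, not the course's template):

```python
def build_rag_prompt(query, chunks):
    """Build a prompt that constrains the LLM to the retrieved context."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```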
- Visualization of Embedding Space
    - Visualize a 2D projection of the embedding space and overlay the projections of the query and the retrieved responses.
    - This provides insight into which neighbors are picked up for the query embedding.
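The course uses UMAP for this projection; as a lightweight, dependency-free stand-in, the same idea can be sketched with a PCA projection via NumPy (fit the projection on the document embeddings, then reuse it for the query):

```python
import numpy as np

def project_2d(doc_embs, query_emb):
    """Project document embeddings and a query embedding to 2D via PCA
    (a stand-in for the UMAP projection used in the course)."""
    mean = doc_embs.mean(axis=0)
    centered = doc_embs - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:2].T                      # top-2 principal axes
    return centered @ basis, (query_emb - mean) @ basis
```

Plotting the document points and the query point together shows whether the query's nearest neighbors form a tight, relevant cluster or are scattered.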
- Semantic Search Challenges
    - Simple semantic search has limitations because the embedding space has no knowledge of the specific query or task.
    - Results can therefore be noisy, and even unrelated queries still return nearest neighbors.
- Distractor-Free RAGs
    - RAG pipelines must be made distractor-free: retrieved results should not contain irrelevant content that can mislead the LLM.
- Query Expansion
    - Query expansion with a generated query: ask the LLM to generate a hypothetical answer and use it to retrieve similar chunks.
    - Query expansion with multiple queries: generate new queries related to the original and use them for retrieval.
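Both strategies can be sketched as follows, with `llm` standing in for any prompt-to-text model call (the prompt wordings are hypothetical, not the course's):

```python
def expand_with_hypothetical_answer(query, llm):
    """Retrieve with query + imagined answer, which tends to land
    closer to real answer chunks in embedding space."""
    answer = llm(f"Write a plausible short answer to: {query}")
    return f"{query} {answer}"

def expand_with_related_queries(query, llm, n=3):
    """Generate n related queries and retrieve with all of them."""
    prompt = f"Suggest {n} related search queries for: {query}"
    related = [q.strip() for q in llm(prompt).splitlines() if q.strip()]
    return [query] + related[:n]
```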
- Cross-Encoder Reranking
    - Re-ranking scores each retrieved result jointly with the query, producing an ordering specific to that query.
    - Rerank the results from the vector database, keeping the top results while preserving diversity.
    - Bi-encoders embed the query and documents independently (fast, used for retrieval), while cross-encoders process the query–document pair jointly (slower but more accurate scoring).
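The reranking step itself is simple to sketch; here `score_fn` stands in for a cross-encoder's joint query–document score (with the sentence-transformers library this would typically wrap a `CrossEncoder` model, assumed here rather than shown):

```python
def rerank(query, documents, score_fn, top_k=3):
    """Reorder retrieved documents by a joint (query, doc) relevance
    score and keep the top_k. score_fn is a stand-in for a cross-encoder."""
    return sorted(documents, key=lambda d: score_fn(query, d), reverse=True)[:top_k]
```

Even with a toy word-overlap scorer, documents mentioning more of the query's terms rise to the top; a real cross-encoder does the same with learned relevance.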
- Combining Techniques
    - Combine query expansion and cross-encoder reranking to enhance semantic search results.
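One way to wire the two together, again with injected stand-ins (`llm`, `retrieve`, and `score_fn` are all placeholder assumptions, not a specific API):

```python
def expanded_rerank_search(query, llm, retrieve, score_fn, top_k=3):
    """Query expansion followed by cross-encoder-style reranking."""
    # 1) Expand the query with an LLM-generated hypothetical answer.
    hypothetical = llm(f"Write a plausible short answer to: {query}")
    # 2) Retrieve a generous candidate set using the expanded query.
    candidates = retrieve(f"{query} {hypothetical}")
    # 3) Rerank candidates against the ORIGINAL query; keep the top few.
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_k]
```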
- Embedding Adaptors
    - Alter the query embedding to improve retrieval results.
    - Use user feedback to build annotated datasets and train the adaptor.
    - Use a lightweight model to transform query embeddings into more relevant ones.
    - Utilize an LLM for data labeling.
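A toy version of such an adaptor is a single matrix that reshapes query embeddings so relevant documents score higher. This hypothetical sketch trains it with plain gradient ascent on a dot-product objective (the course trains a comparable single-matrix adaptor, but with a deep-learning framework and a cosine objective):

```python
import numpy as np

def train_adaptor(queries, docs, labels, epochs=100, lr=0.05):
    """Learn W so that (W @ q) . d is high for label +1 (relevant
    per user feedback) and low for label -1. Toy trainer, assumption."""
    dim = queries.shape[1]
    W = np.eye(dim)                        # start from the identity
    for _ in range(epochs):
        for q, d, y in zip(queries, docs, labels):
            W += lr * y * np.outer(d, q)   # gradient of y * (W q . d)
    return W
```

At query time the adaptor is applied before the vector-database lookup: search with `W @ query_embedding` instead of the raw embedding.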
- Other Techniques
    - Fine-tune the embedding model.
    - RA-DIT: Retrieval-Augmented Dual Instruction Tuning
    - InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
    - Implement deep embedding adaptors, deep relevance modeling, and deep chunking.