Andrew Ng and team offer a variety of short courses at DeepLearning.AI, one of which focuses on ways to improve the semantic search component of RAG. The default approach is to retrieve, from a vector database, the chunks whose embeddings are most semantically similar to the user query. The course discusses the limitations of this approach and explores several more sophisticated techniques for improving that retrieval step.

The following are some of the main points mentioned in the course:

  • Text Splitting Based on Embedding Model
    • It is important to split the text based on the embedding model used in the application.
    • If the embedding model's context window is smaller than the chunk size, the tokens beyond the window are truncated and that information is lost.
    • Hence, it is better to split the chunks further so that each resulting chunk fits within the embedding model's context window (see the splitting sketch after this list).
  • RAG Prompts
    • RAG prompts instruct the LLM to answer using only the information supplied in the retrieved context, which reduces the scope for hallucination.
    • This leverages the LLM's capability to generate text grounded in the specific context provided (a sample prompt template is sketched after this list).
  • Visualization of Embedding Space
    • Project the embedding space into two dimensions (for example with UMAP) and overlay the projections of the query and the retrieved results.
    • This provides insight into which neighbors are picked up for the query embedding and how well they match the query (see the visualization sketch after this list).
  • Semantic Search Challenges
    • Simple semantic search has limitations because the embedding space is built without any knowledge of the specific query or task, so nearest neighbors do not always match the user's intent.
    • Results can therefore be noisy, and even queries unrelated to the corpus still return nearest neighbors.
  • Distractor-Free RAGs
    • RAG pipelines should be made distractor-free, meaning the retrieved results should not contain irrelevant content that can mislead the generation step.
  • Query Expansion
    • Query expansion with a generated answer: ask the LLM to generate a hypothetical answer to the query and search with the query plus that answer to retrieve similar chunks.
    • Query expansion with multiple queries: ask the LLM to generate new queries related to the original and use all of them for retrieval (both variants are sketched after this list).
  • Cross-Encoder Reranking
    • Re-ranking orders the retrieved results by their relevance to the query.
    • Retrieve a larger, more diverse candidate set from the vector database, then rerank it and keep only the top results.
    • The initial vector search uses a bi-encoder, which embeds the query and documents separately; a cross-encoder scores each query-document pair jointly, which is slower but more accurate (see the reranking sketch after this list).
  • Combining Techniques
    • Combine query expansion and cross-encoder reranking: retrieve with the expanded queries, then rerank the pooled candidates against the original query (a combined pipeline is sketched after this list).
  • Embedding Adaptors
    • Alter the query embedding itself to improve results.
    • Use user feedback, or an LLM acting as a labeler, to build an annotated dataset of query-result relevance.
    • Train a lightweight model on this dataset to transform query embeddings into more relevant ones (see the adaptor training sketch after this list).
  • Other Techniques
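
To make several of these points concrete, the sketches below illustrate some of the techniques summarized above. They are minimal, illustrative Python examples; the library choices, model names, file names, and parameters are my assumptions, not the course's exact code.

First, text splitting for the embedding model: a two-stage split, first by characters and then by tokens, so that every chunk fits the embedding model's window. This assumes the `langchain` text splitters and a Sentence-Transformers style model with a 256-token limit.

```python
# Two-stage splitting: character-based first, then token-based so that
# every chunk fits the embedding model's context window.
# Assumes `pip install langchain sentence-transformers`; newer versions expose
# these splitters under `langchain_text_splitters`. The 256-token limit is an
# assumption for a typical Sentence-Transformers model.
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    SentenceTransformersTokenTextSplitter,
)

text = open("document.txt").read()  # any long document (placeholder path)

# Stage 1: split on structure (paragraphs, sentences) into ~1000-character pieces.
char_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],
    chunk_size=1000,
    chunk_overlap=0,
)
char_chunks = char_splitter.split_text(text)

# Stage 2: re-split each piece by tokens so nothing exceeds the embedding
# model's window; otherwise the embedder would silently truncate the excess.
token_splitter = SentenceTransformersTokenTextSplitter(
    chunk_overlap=0,
    tokens_per_chunk=256,
)
token_chunks = []
for chunk in char_chunks:
    token_chunks.extend(token_splitter.split_text(chunk))

print(f"{len(char_chunks)} character chunks -> {len(token_chunks)} token-sized chunks")
```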
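
A typical RAG prompt constrains the model to the retrieved context. The sketch below shows one such template with the OpenAI chat API; the model name and the prompt wording are assumptions.

```python
# A minimal RAG prompt: the system message restricts the model to the
# retrieved context, which reduces (but does not eliminate) hallucination.
# Assumes `pip install openai` and OPENAI_API_KEY set; model name is illustrative.
from openai import OpenAI

client = OpenAI()

def rag_answer(query: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. Answer the user's question "
                "using ONLY the information in the provided context. "
                "If the context does not contain the answer, say you don't know."
            ),
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        },
    ]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
```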
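
For the embedding-space visualization, the sketch below projects corpus, query, and retrieved-result embeddings into 2-D with UMAP and overlays them. The `.npy` files stand in for embeddings you have already computed; `umap-learn` and `matplotlib` are assumed to be installed.

```python
# Project the embedding space to 2-D and overlay the query and the
# retrieved results to see which neighbors the query actually picks up.
# Assumes `pip install umap-learn matplotlib numpy` and precomputed embeddings.
import numpy as np
import umap
import matplotlib.pyplot as plt

corpus_embeddings = np.load("corpus_embeddings.npy")        # (n_docs, dim), assumed precomputed
query_embedding = np.load("query_embedding.npy")            # (dim,)
retrieved_embeddings = np.load("retrieved_embeddings.npy")  # (k, dim)

# Fit the projection on the whole corpus, then transform the query and the
# results with the same mapping so they land in the same 2-D space.
reducer = umap.UMAP(random_state=0).fit(corpus_embeddings)
corpus_2d = reducer.transform(corpus_embeddings)
query_2d = reducer.transform(query_embedding.reshape(1, -1))
retrieved_2d = reducer.transform(retrieved_embeddings)

plt.scatter(corpus_2d[:, 0], corpus_2d[:, 1], s=8, color="lightgray", label="corpus")
plt.scatter(retrieved_2d[:, 0], retrieved_2d[:, 1], s=40, color="green", label="retrieved")
plt.scatter(query_2d[:, 0], query_2d[:, 1], s=120, marker="X", color="red", label="query")
plt.legend()
plt.title("Query and retrieved results in the projected embedding space")
plt.show()
```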
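
Both query-expansion variants boil down to asking an LLM for extra text and searching with it. The sketch below implements the two variants with the OpenAI chat API; the model name and prompts are assumptions.

```python
# Query expansion, two variants:
#   1) generate a hypothetical answer and search with "query + answer";
#   2) generate several related queries and retrieve with each of them.
# Assumes `pip install openai` and OPENAI_API_KEY; model and prompts are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption

def expand_with_hypothetical_answer(query: str) -> str:
    """Return 'query + hypothetical answer' to embed and search with."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "Write a short, plausible answer to the question, as it "
                        "might appear in a document. It does not need to be correct."},
            {"role": "user", "content": query},
        ],
    )
    hypothetical_answer = response.choices[0].message.content
    return f"{query} {hypothetical_answer}"

def expand_with_related_queries(query: str, n: int = 5) -> list[str]:
    """Return the original query plus n related queries to retrieve with."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": f"Suggest {n} short search queries related to the user's "
                        "question, one per line, without numbering."},
            {"role": "user", "content": query},
        ],
    )
    related = [line.strip() for line in response.choices[0].message.content.splitlines()
               if line.strip()]
    return [query] + related[:n]
```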
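
Cross-encoder reranking takes only a few lines with `sentence-transformers`. The checkpoint below is a common public reranker and an assumption, not necessarily the one used in the course.

```python
# Rerank candidates from the vector database with a cross-encoder, which
# scores each (query, document) pair jointly instead of comparing embeddings.
# Assumes `pip install sentence-transformers`.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # One (query, candidate) pair per candidate; higher score = more relevant.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Typical usage: over-retrieve (e.g. 20-50 candidates) from the vector
# database for diversity, then keep only the reranked top results.
```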
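
Putting the pieces together: retrieve with the expanded queries, pool and deduplicate the candidates, then rerank them against the original query. The sketch below uses `sentence-transformers` for both retrieval and reranking over a small in-memory corpus; in a real system the retrieval step would hit your vector database, and the model names are assumptions.

```python
# Combined pipeline: query expansion -> retrieval per query -> pooled,
# deduplicated candidates -> cross-encoder reranking against the original query.
# Assumes `pip install sentence-transformers numpy`; the in-memory corpus and
# cosine-similarity retrieval stand in for a real vector database.
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

corpus = ["chunk 1 text", "chunk 2 text"]            # placeholder for your stored chunks
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # bi-encoder (assumption)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # cross-encoder (assumption)
corpus_embeddings = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 10) -> list[str]:
    query_embedding = embedder.encode([query], normalize_embeddings=True)
    similarities = corpus_embeddings @ query_embedding.T  # cosine similarity
    top = np.argsort(-similarities[:, 0])[:k]
    return [corpus[i] for i in top]

def expanded_search(original_query: str, expanded_queries: list[str], top_k: int = 5) -> list[str]:
    # Pool candidates from every expanded query, deduplicating while keeping order.
    pooled = list(dict.fromkeys(doc for q in expanded_queries for doc in retrieve(q)))
    # Rerank the pooled candidates against the ORIGINAL query.
    scores = reranker.predict([(original_query, doc) for doc in pooled])
    ranked = sorted(zip(pooled, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```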
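
Finally, an embedding adaptor can be as small as a single weight vector trained on labelled (query, result, relevant?) triples to reshape query embeddings. The PyTorch sketch below trains such a diagonal adaptor with a cosine-similarity objective; the data layout, file names, and hyperparameters are assumptions.

```python
# Train a lightweight embedding adaptor: a single weight vector that rescales
# each dimension of the query embedding so that relevant results score higher.
# Labels come from user feedback or an LLM labeller: +1 relevant, -1 not relevant.
# Assumes `pip install torch numpy`; data layout and hyperparameters are illustrative.
import numpy as np
import torch

query_embeddings = torch.tensor(np.load("query_embeddings.npy"), dtype=torch.float32)    # (n, dim)
result_embeddings = torch.tensor(np.load("result_embeddings.npy"), dtype=torch.float32)  # (n, dim)
labels = torch.tensor(np.load("labels.npy"), dtype=torch.float32)                         # (n,), +1 / -1

dim = query_embeddings.shape[1]
adaptor = torch.ones(dim, requires_grad=True)  # start as the identity scaling
optimizer = torch.optim.Adam([adaptor], lr=0.01)

def loss_fn(adaptor, queries, results, labels):
    adapted = queries * adaptor  # element-wise rescaling of the query embedding
    cosine = torch.nn.functional.cosine_similarity(adapted, results, dim=1)
    # Push cosine similarity towards +1 for relevant pairs and -1 for irrelevant ones.
    return torch.nn.functional.mse_loss(cosine, labels)

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(adaptor, query_embeddings, result_embeddings, labels)
    loss.backward()
    optimizer.step()

# At query time, multiply the query embedding by the trained adaptor
# before searching the vector database.
adapted_query = lambda q: q * adaptor.detach()
```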