Image: Visualising AI project by Google DeepMind, via Unsplash.
Exploring RAG techniques to improve retrieval accuracy
In Part 1, we dove into improving RAG (retrieval-augmented generation) outcomes by rewriting queries before performing retrieval. This time, we'll look at how re-ranking the results of vector database retrieval improves performance.
While I highly recommend experimenting with promising proprietary options like Cohere's Rerank 3, we'll focus mainly on understanding what researchers have shared on the topic.
Re-Ranking, what’s the point?
First of all, why re-rank at all? Vector databases return results with "similarity" scores computed from the embeddings of the query and each document. These scores can already be used to sort the results, and since they already measure the semantic similarity between document and query, why would we need another step?
There are a few reasons why we would take this approach:
Document embeddings are "lossy". Documents are compressed into vectors before the query is ever seen, which means the document vector is not tailored to any particular query. Re-ranking allows us to better understand the document's meaning specific to the query.
Two-stage systems have become standard in traditional search and recommender systems. They offer improvements in scalability, flexibility, and accuracy. Retrieval models are very fast, whereas ranking models are slow. By building a hybrid system, we can balance the speed and accuracy trade-offs between each stage.
Re-ranking allows us to reduce the number of documents we stuff into the context window which a) reduces costs and b) reduces the chances of relevant data being “lost in the haystack”.
Traditional Methods of Re-Ranking
Information retrieval is not a new field. Before LLMs employed RAG to improve generation, search engines used re-ranking methods to improve search results. Two popular methodologies are TF-IDF (term frequency–inverse document frequency) and BM25 (Best Match 25).
Karen Spärck Jones conceived of IDF (the "inverse document frequency" in TF-IDF) as a statistical interpretation of term specificity in the 1970s. The general concept is that the specificity of a term can be quantified as an inverse function of the number of documents in which it occurs. A toy example is the frequency of terms in Shakespeare's plays: because the term "Romeo" appears in only one play, we consider it more informative about that play's subject than a word like "sweet", which occurs in all of the plays.
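To make the toy example concrete, here is a minimal IDF sketch using the classic idf(t) = log(N / df(t)) formulation (the play contents below are illustrative stand-ins, not real word counts):

```python
import math

# Toy corpus: each "play" is a bag of words (contents are stand-ins).
plays = {
    "Romeo and Juliet": {"romeo", "sweet", "love", "poison"},
    "Hamlet": {"sweet", "ghost", "denmark"},
    "Macbeth": {"sweet", "witches", "crown"},
}

def idf(term: str) -> float:
    """Inverse document frequency: rarer terms score higher."""
    n_docs = len(plays)
    df = sum(1 for words in plays.values() if term in words)
    return math.log(n_docs / df)

print(f"idf('romeo') = {idf('romeo'):.2f}")  # appears in 1 of 3 plays -> high
print(f"idf('sweet') = {idf('sweet'):.2f}")  # appears in all 3 plays -> 0.0
```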
BM25, or Okapi BM25, was developed by Stephen Robertson, Karen Spärck Jones, and others as an improvement on TF-IDF. BM25 is a "bag-of-words" retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of their proximity within the document. It expands on TF-IDF in a few important ways (both mechanisms appear in the scoring sketch after this list):
BM25 uses a saturation function where the importance of a term increases with frequency but with diminishing returns. (Side note: This was important for protecting accuracy when search engine optimization (SEO) became higher stakes. You can’t just spam your keyword with this improvement.)
BM25 includes document length normalization to ensure that longer documents are not unfairly advantaged. (Another improvement to thwart would-be SEO gamers.)
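Here is a minimal, self-contained BM25 sketch showing both mechanisms; the k1 and b defaults are conventional values, not tuned for any particular corpus:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized docs against a query with BM25.

    k1 controls term-frequency saturation (diminishing returns);
    b controls document-length normalization.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            # Saturating term-frequency component, normalized by doc length.
            score += idf * (tf[term] * (k1 + 1)) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "bm25 ranks documents by query term frequency".split(),
    "bm25 bm25 bm25 bm25 spamming a term saturates quickly".split(),
    "document length is normalized so long documents are not favored".split(),
]
print(bm25_scores("bm25 term".split(), docs))
```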
Both of these methods can be used to re-rank results from vector databases before documents are placed in the generation context. This is sometimes called "feature-based" re-ranking.
Neural Re-Ranking Models
Something you should notice about these traditional methods is that they rely on exact term matches: they struggle when documents use semantically similar but different terms. Neural re-ranking methods like SBERT (Sentence Transformers) seek to overcome this limitation.
SBERT (Sentence-BERT) is a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model with a siamese/triplet network architecture, which greatly improves the computational efficiency and latency of calculating sentence similarity. Because it is transformer-based, SBERT uses the context in which terms appear, allowing the model to handle synonyms and words with multiple meanings.
SBERT tends to perform better for semantic similarity ranking due to its specialization. However, using SBERT comes with the downside that you will need to manage the models locally rather than calling an API, as with OpenAI's embedding models. Pick your poison wisely!
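As a sketch of semantic re-ranking with the sentence-transformers library (the checkpoint name is one common public choice, assumed here for illustration):

```python
from sentence_transformers import SentenceTransformer, util

# A small, widely used SBERT checkpoint (swap in any bi-encoder model).
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I reduce hallucinations in RAG pipelines?"
candidates = [
    "Re-ranking retrieved chunks removes irrelevant context.",
    "Our quarterly earnings grew by 4%.",
    "Pruning unrelated sentences lowers hallucination rates.",
]

query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity between the query and each candidate, highest first.
sims = util.cos_sim(query_emb, cand_embs)[0]
for score, doc in sorted(zip(sims.tolist(), candidates), reverse=True):
    print(f"{score:.3f}  {doc}")
```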
Cross-Encoder Re-Ranking
The top K results from a vector database search are the document vectors most similar to the query vector. Another way of describing this ranking method is as "bi-encoder" ranking: vectors are computed up front, and approximate nearest neighbor (ANN) algorithms select the most similar documents, making this a highly efficient ranking method. But that efficiency comes at the expense of some accuracy.
In contrast, cross-encoders use a classification mechanism on data pairs to calculate similarity, which means the model must process a (query, document) pair for every candidate document. This approach can yield much more accurate results, but it is highly inefficient. That is why cross-encoders are best implemented in a hybrid approach: the candidate set is first pruned to the top K results with a bi-encoder, then ranked with a cross-encoder, as in the sketch below. You can read more about using bi-encoders and cross-encoders together in the SBERT documentation.
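Here is a sketch of the two-stage pattern with sentence-transformers; the model names are common public checkpoints, assumed here for illustration, and the corpus is a toy stand-in:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is document length normalization?"
corpus = [
    "BM25 normalizes scores so long documents are not unfairly favored.",
    "SBERT encodes sentences into dense vectors for fast comparison.",
    "Cross-encoders score query-document pairs jointly for accuracy.",
]

# Stage 1: cheap bi-encoder retrieval prunes the corpus to the top K.
corpus_embs = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_embs, top_k=2)[0]

# Stage 2: the expensive cross-encoder re-scores each surviving pair.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)

for score, (_, doc) in sorted(zip(scores, pairs), key=lambda x: -x[0]):
    print(f"{score:.3f}  {doc}")
```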
Figure: diagram from the SBERT documentation explaining how to use bi-encoders and cross-encoders together for Information Retrieval / Question Answering; full citation below.
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Prompt-based Re-Ranking (PBR)
Until now, we have focused on using vectors or other numeric methods to re-rank our RAG results. But does that mean we are under-leveraging the LLM? Feeding the document and the query back to the LLM for scoring can be an effective approach; there is approximately no information loss when you take it. If the LLM is prompted to return only a single token (the score), the latency incurred is often acceptable, although this is one of the more expensive approaches to scale. This is considered "zero-shot" re-ranking; research on the topic is still limited, but we know it must be sensitive to the quality of the prompt.
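As a hedged sketch of single-token scoring with the OpenAI client (the model name and the 0–9 rubric are placeholder choices, not a recommendation; a real system would validate the prompt carefully):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_relevance_score(query: str, document: str) -> int:
    """Ask the LLM for a single-token relevance score from 0 to 9."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        max_tokens=1,  # a single output token keeps latency and cost low
        messages=[
            {
                "role": "system",
                "content": (
                    "Rate how relevant the document is to the query "
                    "on a 0-9 scale. Reply with a single digit only."
                ),
            },
            {"role": "user", "content": f"Query: {query}\n\nDocument: {document}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

query = "What is document length normalization?"
retrieved_docs = ["BM25 normalizes for document length.", "SOL dipped below $160."]
print([llm_relevance_score(query, doc) for doc in retrieved_docs])
```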
Another version of PBR is the DSLR framework (Document Refinement with Sentence-Level Re-ranking and Reconstruction). DSLR proposes an unsupervised method that decomposes retrieved documents into sentences, re-ranks them based on relevance, and reconstructs them into coherent passages before passing them to the LLM. This contrasts with traditional methods that rely on fixed-size passages, which may include redundant or irrelevant information. Pruning non-relevant sentences before generating a response can reduce hallucinations and improve overall accuracy.
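The paper has its own pipeline; the sketch below only illustrates the core decompose–re-rank–reconstruct idea, with a bi-encoder standing in for the scoring step and a naive period split standing in for proper sentence segmentation:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def refine(query: str, passage: str, keep: int = 3) -> str:
    """Split a passage into sentences, keep the most query-relevant ones,
    and reconstruct them in their original order."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    query_emb = model.encode(query, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    sims = util.cos_sim(query_emb, sent_embs)[0].tolist()
    # Take the top-scoring sentence indices, then restore document order.
    top = sorted(sorted(range(len(sentences)), key=lambda i: -sims[i])[:keep])
    return ". ".join(sentences[i] for i in top) + "."
```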
Graph-Based Re-Ranking
Sometimes the answer will not fit cleanly inside a single document chunk. Books and papers are written with the expectation that they'll be read linearly, or at least that the reader can easily refer back to earlier passages. For example, you could be asked to refer back to an earlier chapter on BM25 while reading about SBERT. In a basic RAG application this would be impossible, because the retrieved chunk has no connection to the previous chapters.
G-RAG, an approach proposed by researchers at Google and UCLA, aims to alleviate this issue. G-RAG is a re-ranker that leverages graph neural networks (GNNs) to consider connections between retrieved documents. Documents are represented as nodes, and edges represent shared concepts between documents. The graphs are generated as Abstract Meaning Representation (AMR) graphs, which can be created with tools like AMRBART (https://github.com/goodbai-nlp/AMRBART, MIT License).
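G-RAG's actual pipeline builds AMR graphs and trains a GNN re-ranker; as a toy illustration of the document-graph idea only, here is a sketch where edges connect retrieved documents that share concepts, with shared concepts approximated by shared vocabulary and networkx standing in for the graph machinery:

```python
import networkx as nx

# Toy "concept sets" per retrieved document (real G-RAG derives these
# from AMR graphs, not raw vocabulary).
docs = {
    "d1": {"bm25", "ranking", "retrieval"},
    "d2": {"sbert", "embeddings", "retrieval"},
    "d3": {"graph", "gnn", "ranking"},
}

# Nodes are documents; an edge means two documents share a concept.
graph = nx.Graph()
graph.add_nodes_from(docs)
for a in docs:
    for b in docs:
        if a < b and docs[a] & docs[b]:
            graph.add_edge(a, b, shared=docs[a] & docs[b])

print(list(graph.edges(data=True)))
```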
Experiments with the Natural Questions (NQ) and TriviaQA (TQA) datasets showed that this approach improved Mean Tied Reciprocal Ranking (MTRR) and Tied Mean Hits@10 (TMHits@10) over other state-of-the-art approaches.
Figure from the UCLA and Google researchers: G-RAG uses two graphs for re-ranking documents. The Abstract Meaning Representation (AMR) graph is used as a feature for the document-level graph, which is then used for document re-ranking.
Conclusion
I hope you've enjoyed this overview of techniques you can use to improve the performance of your RAG applications. I look forward to continued advancements in this field, and given the blistering pace of research at the moment, I know there will be many.
Let me know in the comments section if you have favorite re-ranking methods not covered in this article.