Retrieval-Augmented Generation

Retrieval-Augmented Generation

Information Retrieval
Skip Lists
Deep Learning
May 1, 2024
This blog post is based on the video made by YouTuber Umar. It illustrates very well the big picture of how RAG works. Please feel free to also have a look at this video too.
Video preview

What is RAG?

notion image
RAG employs a transformer-based sequence-to-sequence model, combined with a document retrieval component to form its architecture. It retrieves relevant documents from a corpus and uses these documents to condition the generation of its output. This blend of retrieval and generation enables RAG to generate more factual and detailed responses.
notion image
RAG utilizes the BERT transformer model for its sequence-to-sequence component, and it uses the Dense Passage Retriever for document retrieval. The DPR retrieves documents based on their relevance to the input query, which is then used to condition the output generation.
The retrieved documents are then passed to the transformer model where they are concatenated with the input query for further processing. The model generates an output sequence based on this combined input, which can be more relevant and detailed than the output of a traditional sequence-to-sequence model.
In addition to the RAG architecture, another important concept is the SentenceBERT. SentenceBERT is a modification of the pre-trained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings. These embeddings can then be compared using cosine-similarity to determine semantic similarity.
notion image

More about SentenceBERT

A key component of SentenceBERT is the use of a Vector Database (Vector DB). This is used in combination with a K-Nearest Neighbors (K-NN) algorithm to find the sentences that are most similar to a given input. This is crucial for tasks such as semantic search, where the goal is to find the most relevant documents given a query. However, this approach has a main drawback of slow computation with time complexity , where N is the number of the embeddings, and D is the dimension of each embedding.
Another technique that could be used along with SentenceBERT is the Approximate Nearest Neighbors technique, specifically the Hierarchical Navigable Small Worlds (HNSW) method. This method is based on the small world phenomenon, where any two points are likely to be connected by a short path. The HNSW algorithm effectively navigates this small world to find the nearest neighbors to a given point.

HNSW instead of K-NN

The Hierarchical NSW (HNSW) is a combination of Skip-List and NSW. A skip list is a data structure that allows fast search within an ordered sequence of elements. It achieves this by maintaining a layered list of links, with each layer skipping over a reduced number of elements.
notion image
For example, consider the following sequence of elements: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
A skip list for this sequence might look like this:
Level 3: 1 -------------------------> 10
Level 2: 1 ----------> 5 ----------> 10
Level 1: 1 --> 3 --> 5 --> 7 --> 9 --> 10
In this example, the bottom level (Level 1) is an ordinary ordered linked list. The higher levels are express lanes, where each element in Level 2 skips over 2 elements in Level 1, and each element in Level 3 skips over 4 elements in Level 2.
This means that if we're searching for an element, we can start at the highest level and move right until we hit a number greater than our target. Then, we can move down a level and continue moving right. This process continues until we find the target or conclude that it is not in the list.
Below is a basic example of a skip list in Python.
class Node: def __init__(self, height = 0, elem = None): self.elem = elem = [None]*height class SkipList: def __init__(self): self.head = Node() self.len = 0 self.maxHeight = 0 def __len__(self): return self.len def find(self, elem, update = None): if update == None: update = self.updateList(elem) if len(update) > 0: item = update[0].next[0] if item != None and item.elem == elem: return item return None def contains(self, elem, update = None): return self.find(elem, update) != None def randomHeight(self): height = 1 while random.randint(0, 1) != 1: height += 1 return height def updateList(self, elem): update = [None]*self.maxHeight x = self.head for i in reversed(range(self.maxHeight)): while[i] != None and[i].elem < elem: x =[i] update[i] = x return update def insert(self, elem): _node = Node(self.randomHeight(), elem) self.maxHeight = max(self.maxHeight, len( while len( < len( update = self.updateList(elem) if self.find(elem, update) == None: for i in range(len([i] = update[i].next[i] update[i].next[i] = _node self.len += 1 def remove(self, elem): update = self.updateList(elem) x = self.find(elem, update) if x != None: for i in reversed(range(len( update[i].next[i] =[i] if[i] == None: self.maxHeight -= 1 self.len -= 1 def printList(self): for i in range(len(, -1, -1): x = self.head while[i] != None: print([i].elem,) x =[i] print('')
Overall, both Retrieval-Augmented Generation and SentenceBERT represent advanced techniques in the field of natural language processing, leveraging the power of transformer models and clever data structures to generate meaningful and relevant outputs.

RAG in practice

Furthermore, they also showcase how modern NLP techniques can be combined with traditional computer science concepts like skip lists and nearest neighbor algorithms to tackle complex problems in information retrieval and text generation.
The potential applications of these techniques are vast and varied. For instance, they can be used in automated chatbots to generate more contextually accurate and detailed responses. They can also be applied in recommendation systems, where the goal is to retrieve the most relevant content based on user's input or past behavior.
In addition, RAG can also enhance machine translation by utilizing a database of texts to improve accuracy, handle rare words, maintain consistent terminology, increase fluency, and capture cultural nuances. This approach leverages external information to provide contextually relevant translation examples, making translations more precise and context-aware.
Moreover, these models can be utilized in the field of academic research, where they can assist in literature review by retrieving the most relevant papers based on a given query. They can also be used in legal and healthcare industries for document retrieval and analysis, thus saving time and improving accuracy.
On the other hand, it is also important to consider the limitations and challenges associated with these techniques. The quality and relevance of the output from RAG and SentenceBERT models heavily depend on the training data. Therefore, the presence of biases in the training data can lead to biased outputs. Additionally, these models require substantial computational resources for training and inference, which might not be feasible for all applications.


Retrieval-Augmented Generation and SentenceBERT are powerful tools in the field of natural language processing. They demonstrate how the combination of different techniques and concepts can lead to sophisticated solutions, capable of tackling complex problems in information retrieval and text generation. However, it is crucial to consider the potential limitations and challenges, and to continue exploring ways to mitigate them.

Further Reading

  • Active Augmented Retrieval Generation
    • a method that combines generative capabilities with access to external knowledge by using retrieval techniques. It anticipates future content by retrieving relevant documents and regenerating sentences with low-confidence tokens.
  • VectorDB