Understand the Basics

What are Retrievers in LlamaIndex?

Retrievers play a critical role in the LlamaIndex ecosystem: they fetch the most relevant context for a given user query or message. In particular, retrievers:

  • Efficiently retrieve relevant context from an index based on the query.
  • Serve as a core component of query engines and chat engines, supplying them with pertinent information.
  • Can be built on top of indexes or defined independently, which makes them versatile.
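
To make the idea concrete, here is a minimal conceptual sketch of what a retriever does: score every indexed document against the query and return the best matches. It is not LlamaIndex's API. The ToyRetriever class, its retrieve method, and the word-count "index" are all hypothetical names invented for illustration, and simple word overlap stands in for the embeddings a real retriever would typically use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyRetriever:
    """Hypothetical stand-in for a retriever built on top of an index."""

    def __init__(self, documents, top_k=2):
        self.top_k = top_k
        # The "index": one term-count vector per document (an assumption;
        # real retrievers usually store dense embeddings instead).
        self.index = [(doc, Counter(tokenize(doc))) for doc in documents]

    def retrieve(self, query):
        """Return up to top_k (score, document) pairs, best match first."""
        query_vec = Counter(tokenize(query))
        scored = [(cosine(query_vec, vec), doc) for doc, vec in self.index]
        # Drop documents that share no words with the query.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[: self.top_k]


docs = [
    "Pathway processes live data streams.",
    "LlamaIndex retrievers fetch context for a query.",
    "Bananas are rich in potassium.",
]
retriever = ToyRetriever(docs, top_k=1)
results = retriever.retrieve("How do retrievers fetch context?")
```

Here results holds a single (score, document) pair for the sentence about LlamaIndex retrievers, since it is the only document sharing words with the query. A query engine or chat engine would then pass that retrieved text to the LLM as context.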

Pathway Retriever and its Integration with LlamaIndex

So far we've used Pathway's LLM app, but Pathway itself is an open data processing framework, well suited to building data transformation pipelines and machine learning applications that work with live, evolving data sources. It is also described as the world's fastest framework for stream data processing (see the arXiv paper). 😉

The integration with LlamaIndex is provided through PathwayReader and PathwayRetriever. Here we focus on PathwayRetriever, which taps into Pathway's dynamic indexing capabilities to provide always up-to-date answers. The linked documentation is quite comprehensive, but here is a quick walkthrough.

Key Features of the Integration:

  • Live Data Indexing Pipeline: Monitors various data sources for changes, parses and embeds documents using LlamaIndex methods, and builds a vector index.
  • Simple to Complex Pipelines: The basic pipeline indexes files from cloud storage, but Pathway also supports more sophisticated processing, such as SQL-like operations, time-based grouping, and a wide range of connectors for building comprehensive data pipelines.
  • Ease of Setup: The integration process involves installing necessary packages, setting up environment variables, and configuring data sources to be tracked by Pathway.
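
The "live indexing" idea from the first feature can be sketched in a few lines of plain Python. This is only a conceptual illustration and assumes nothing about Pathway's actual API: the LiveIndex class and its sync and search methods are hypothetical, a dictionary stands in for the monitored data sources, and a set of words stands in for the embeddings and vector index. The point is the pattern: detect which sources changed and re-index only those, so queries always see current content.

```python
import hashlib


class LiveIndex:
    """Hypothetical in-memory index that re-indexes only changed documents."""

    def __init__(self):
        self.fingerprints = {}  # source name -> content hash
        self.entries = {}       # source name -> set of lowercase words

    def sync(self, sources):
        """Bring the index in line with `sources` (name -> text).

        Returns the sorted names that were added, updated, or removed.
        """
        changed = []
        for name, text in sources.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            # Re-index only when the content hash differs from the last sync.
            if self.fingerprints.get(name) != digest:
                self.fingerprints[name] = digest
                self.entries[name] = set(text.lower().split())
                changed.append(name)
        # Forget sources that no longer exist.
        for name in list(self.fingerprints):
            if name not in sources:
                del self.fingerprints[name]
                del self.entries[name]
                changed.append(name)
        return sorted(changed)

    def search(self, word):
        """Return the sorted names of documents containing `word`."""
        word = word.lower()
        return sorted(n for n, words in self.entries.items() if word in words)


index = LiveIndex()
sources = {"a.txt": "pathway handles streaming data", "b.txt": "static report"}
first = index.sync(sources)    # both files are new, so both are indexed
sources["b.txt"] = "updated report about streaming"
second = index.sync(sources)   # only b.txt changed, so only it is re-indexed
hits = index.search("streaming")
```

After the second sync, a search for "streaming" also finds b.txt, even though nothing was rebuilt for a.txt. A real pipeline would react to change notifications from its connectors rather than being polled, but the query-side effect is the same.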