
Open-source tool for building and evaluating retrieval pipelines.


Retrieval Evaluation Pipelines (alpha)

RAG evaluation framework for faster iteration

About retri-eval

Evaluating all of the components of a RAG pipeline is challenging. We didn't find an existing solution that:

  1. was flexible enough to sit on top of our document and query processing,
  2. gave us confidence that the database could scale without increasing latency or cost, and
  3. encouraged reuse of components.

retri-eval aims to be unopinionated enough that you can reuse any existing pipelines you have.

Built With

  • MTEB
  • BEIR
  • Pydantic

Getting Started

Installation

pip install retri-eval

Define your data type

We use Pydantic to make sure that the index receives the expected data.

To use MTEB and BEIR datasets, retri-eval expects your data to provide a doc_id field. This field is set inside our retriever and is how BEIR evaluates your results.

Below, we define a QdrantDocument that indexes the chunk text alongside its embedding.

from typing import List

# MTEBDocument is provided by retri-eval; we extend it with Qdrant-specific fields.
class QdrantDocument(MTEBDocument):
    id: str                 # unique id for this chunk
    doc_id: str             # dataset-level document id used by BEIR
    embedding: List[float]  # dense vector for the chunk
    text: str               # chunk text stored alongside the vector
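
Because QdrantDocument is a Pydantic model, malformed records fail fast at construction time. A minimal sketch with illustrative values, assuming MTEBDocument adds no further required fields (real embeddings must match your index dimension, 384 for the bge-small model used below):

doc = QdrantDocument(
    id="c0ffee01",              # illustrative chunk id
    doc_id="q101",              # illustrative dataset document id
    embedding=[0.1, 0.2, 0.3],  # toy 3-dim vector for demonstration only
    text="example chunk text",
)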

Create a Document Processing Pipeline

A document processor encapsulates the logic to translate from raw data to our defined type.

import uuid
from typing import Dict, List

class DocumentProcessor(ProcessingPipeline[Dict[str, str], QdrantDocument]):
    def __init__(self, model, name='', version=''):
        super().__init__(name, version)
        self.model = model

    def process(self, batch: List[Dict[str, str]], batch_size: int = 0, **kwargs) -> List[QdrantDocument]:
        # Identity chunker: treat each document as a single chunk.
        chunker = lambda x: [x]

        results = []
        for x in batch:
            doc = MTEBDocument(**x)

            chunks = chunker(doc.text)
            embedding = self.model.encode(chunks)
            for i, chunk in enumerate(chunks):
                results.append(QdrantDocument(
                    id=uuid.uuid4().hex,  # fresh unique id per chunk
                    doc_id=doc.doc_id,    # preserve the dataset document id
                    text=chunk,
                    embedding=embedding[i],
                ))
        return results
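
The identity chunker above indexes each document as a single chunk. To evaluate a chunking strategy, swap in your own callable; here is a minimal sketch of a fixed-size word-window chunker (a hypothetical helper, not part of retri-eval):

from typing import List

def word_window_chunker(text: str, window: int = 128, overlap: int = 32) -> List[str]:
    # Split text into overlapping windows of roughly `window` words.
    words = text.split()
    if len(words) <= window:
        return [text]
    step = window - overlap
    return [" ".join(words[i:i + window]) for i in range(0, len(words), step)]

Setting chunker = word_window_chunker inside process would then produce several QdrantDocuments per source document, all sharing the same doc_id.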

Create a Query Processing Pipeline

Similar to document processing, we need a way to convert strings to something the index will understand.

For dense retrieval, we return embeddings from a model.

class QueryProcessor(ProcessingPipeline[str, List[float]]):
    def __init__(self, model, name='', version=''):
        super().__init__(name, version)
        self.model = model

    def process(self, batch: List[str], batch_size: int = 0, **kwargs) -> List[List[float]]:
        # encode_queries applies the model's query instruction before embedding.
        return self.model.encode_queries(batch)
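
Query processing is batched; with the query_processor constructed in the next section, a batch of two queries yields one vector per query:

embeddings = query_processor.process([
    "how do I chunk documents?",
    "what is dense retrieval?",
])
assert len(embeddings) == 2  # one embedding per query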

Define a Retriever

The retriever ties the two processing pipelines to the index and defines our search behavior. retri-eval provides a DenseRetriever for MTEB.

from FlagEmbedding import FlagModel
from qdrant_client.models import Distance, VectorParams

model_name = "BAAI/bge-small-en-v1.5"
model = FlagModel(model_name,
                  query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
                  use_fp16=True)

# QdrantIndex and DenseRetriever are provided by retri-eval.
# bge-small-en-v1.5 produces 384-dimensional embeddings.
index = QdrantIndex("CQADupstackEnglish", vector_config=VectorParams(size=384, distance=Distance.COSINE))
doc_processor = DocumentProcessor(model, name=model_name)
query_processor = QueryProcessor(model, name=model_name)

retriever = DenseRetriever(
    index=index,
    query_processor=query_processor,
    doc_processor=doc_processor,
)

Use our MTEB Tasks

MTEB makes it difficult to plug in your own search functionality, so we wrote our own MTEB task class and extended the existing MTEB tasks to use it.

This lets us bring our own indexes and define custom searching behavior. We're hoping to upstream this in the future.

import json

from mteb import MTEB
from retri_eval.evaluation.mteb_tasks import CQADupstackEnglishRetrieval

run_id = model_name  # any identifier that keeps result folders for different runs separate

eval = MTEB(tasks=[CQADupstackEnglishRetrieval()])
results = eval.run(retriever, verbosity=2, overwrite_results=True, output_folder=f"results/{run_id}")

print(json.dumps(results, indent=1))

results:

{
 "CQADupstackEnglishRetrieval": {
  "mteb_version": "1.1.1",
  "dataset_revision": null,
  "mteb_dataset_name": "CQADupstackEnglishRetrieval",
  "test": {
   "ndcg_at_1": 0.37006,
   "ndcg_at_3": 0.39158,
   "ndcg_at_5": 0.4085,
   "ndcg_at_10": 0.42312,
   "ndcg_at_100": 0.46351,
   "ndcg_at_1000": 0.48629,
   "map_at_1": 0.29171,
   "map_at_3": 0.35044,
   "map_at_5": 0.36476,
   "map_at_10": 0.3735,
   "map_at_100": 0.38446,
   "map_at_1000": 0.38571,
   "recall_at_1": 0.29171,
   "recall_at_3": 0.40163,
   "recall_at_5": 0.44919,
   "recall_at_10": 0.49723,
   "recall_at_100": 0.67031,
   "recall_at_1000": 0.81938,
   "precision_at_1": 0.37006,
   "precision_at_3": 0.18535,
   "precision_at_5": 0.13121,
   "precision_at_10": 0.07694,
   "precision_at_100": 0.01252,
   "precision_at_1000": 0.00173,
   "mrr_at_1": 0.37006,
   "mrr_at_3": 0.41943,
   "mrr_at_5": 0.4314,
   "mrr_at_10": 0.43838,
   "mrr_at_100": 0.44447,
   "mrr_at_1000": 0.44497,
   "retrieval_latency_at_50": 0.07202814750780817,
   "retrieval_latency_at_95": 0.09553944145009152,
   "retrieval_latency_at_99": 0.20645513817435127,
   "evaluation_time": 538.25
  }
 }
}
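
When comparing runs, it helps to pull a headline metric out of the nested results dict; for example, with the results printed above:

test_metrics = results["CQADupstackEnglishRetrieval"]["test"]
print(f"ndcg@10: {test_metrics['ndcg_at_10']:.4f}")
print(f"p95 retrieval latency: {test_metrics['retrieval_latency_at_95'] * 1000:.1f} ms")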

Roadmap

retri-eval is still in active development. We're planning to add:

  • Support for reranking models
  • Support for hybrid retrieval baselines
  • Support for automatic dataset generation
  • Support for parallel execution
  • Support for latency and cost benchmarks

What dataset to evaluate on

retri-eval is currently integrated into MTEB for retrieval tasks only, but we're working on more.

MTEB's available tasks

We also recommend building your own internal dataset, but this can be time-consuming and error-prone. We'd love to chat if you're working on this.

License

Distributed under the AGPL-3.0 license. If you need an alternate license, please reach out.

Let's Chat!

Reach out! Our team has experience working on petabyte-scale search and analytics applications. We'd love to hear what you're working on and see how we can help.

Matt - matt [at] deployql.com - or schedule some time to chat on my calendar

