Embeddings in R2R
Introduction
R2R supports multiple Embedding providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various Embedding providers within the R2R framework.
For a quick start on basic configuration, including embedding setup, please refer to our configuration guide.
Providers
R2R currently supports the following cloud embedding providers:
- OpenAI
- Azure
- Cohere
- HuggingFace
- Bedrock (Amazon)
- Vertex AI (Google)
- Voyage AI
And for local inference:
- Ollama
- SentenceTransformers
Configuration
Update the embedding section in your r2r.toml file to configure your embedding provider. Here are some example configurations:
Selecting Different Embedding Providers
R2R supports a wide range of embedding providers through LiteLLM. Here’s how to configure and use them:
export OPENAI_API_KEY=your_openai_key
# Update r2r.toml:
# [embedding]
# provider = "litellm"  # or "openai"
# base_model = "text-embedding-3-small"
# base_dimension = 512
r2r serve --config-path=r2r.toml
Supported models include:
- text-embedding-3-small
- text-embedding-3-large
- text-embedding-ada-002
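For local inference, the embedding section follows the same shape. Here is a sketch for Ollama (the model name and dimension below are placeholders; substitute the model you have actually pulled and its true output size):

```toml
[embedding]
provider = "ollama"
base_model = "mxbai-embed-large"  # placeholder model name
base_dimension = 1024             # must match the model's output size
batch_size = 32
```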
Embedding Service Endpoints
The EmbeddingProvider is responsible for core functionalities in these R2R endpoints:
- update_files: When updating existing files in the system
- ingest_files: During the ingestion of new files
- search: For embedding search queries
- rag: As part of the Retrieval-Augmented Generation process
Here’s how you can use these endpoints with embeddings:
File Ingestion
from r2r import R2R
app = R2R()
# Ingest a file, which will use the configured embedding model
response = app.ingest_files(["path/to/your/file.txt"])
print(f"Ingestion response: {response}")
Search
# Perform a search, which will embed the query using the configured model
search_results = app.search("Your search query here")
print(f"Search results: {search_results}")
RAG (Retrieval-Augmented Generation)
# Use RAG, which involves embedding for retrieval
rag_response = app.rag("Your question or prompt here")
print(f"RAG response: {rag_response}")
Updating Files
# Update existing files, which may involve re-embedding
update_response = app.update_files(["path/to/updated/file.txt"])
print(f"Update response: {update_response}")
Remember that you don’t directly call the embedding methods in your application code. R2R handles the embedding process internally based on your configuration.
Security Best Practices
- API Key Management: Use environment variables or secure key management solutions for API keys.
- Input Validation: Sanitize and validate all inputs before generating embeddings.
- Rate Limiting: Implement rate limiting to prevent abuse of embedding endpoints.
- Monitoring: Regularly monitor embedding usage for anomalies or misuse.
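As an illustration of the input-validation point above, a minimal sanitizer might drop non-printable characters and cap input length before text reaches the embedding model. This is a sketch of our own, not an R2R API:

```python
def sanitize_text(text: str, max_chars: int = 8192) -> str:
    """Drop non-printable characters and cap length before embedding."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:max_chars].strip()

print(sanitize_text("hello\x00 world\x07"))  # -> hello world
```

The character cap (8192 here) is arbitrary; in practice it should reflect the token limit of your chosen embedding model.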
Custom Embedding Providers in R2R
You can create custom embedding providers by inheriting from the EmbeddingProvider class and implementing the required methods. This allows you to integrate any embedding model or service into R2R.
Embedding Provider Structure
The Embedding system in R2R is built on two main components:
- EmbeddingConfig: A configuration class for Embedding providers.
- EmbeddingProvider: An abstract base class that defines the interface for all Embedding providers.
EmbeddingConfig
The EmbeddingConfig class is used to configure Embedding providers:
from r2r.base import ProviderConfig
from typing import Optional
class EmbeddingConfig(ProviderConfig):
    provider: Optional[str] = None
    base_model: Optional[str] = None
    base_dimension: Optional[int] = None
    rerank_model: Optional[str] = None
    rerank_dimension: Optional[int] = None
    rerank_transformer_type: Optional[str] = None
    batch_size: int = 1
    prefixes: Optional[dict[str, str]] = None

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers"]
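To see how validate() behaves, here is a standalone mirror of the class (plain Python with no r2r dependency, so the details are illustrative rather than the actual implementation) showing that an unknown provider is rejected:

```python
from typing import Optional


class EmbeddingConfig:
    """Standalone sketch mirroring r2r's EmbeddingConfig validation logic."""

    def __init__(self, provider: Optional[str] = None,
                 base_model: Optional[str] = None,
                 base_dimension: Optional[int] = None,
                 batch_size: int = 1):
        self.provider = provider
        self.base_model = base_model
        self.base_dimension = base_dimension
        self.batch_size = batch_size

    @property
    def supported_providers(self) -> list:
        return [None, "openai", "ollama", "sentence-transformers"]

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")


EmbeddingConfig(provider="openai", base_model="text-embedding-3-small").validate()  # passes
try:
    EmbeddingConfig(provider="unknown").validate()
except ValueError as e:
    print(e)  # -> Provider 'unknown' is not supported.
```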
EmbeddingProvider
The EmbeddingProvider is an abstract base class that defines the common interface for all Embedding providers:
from abc import abstractmethod
from enum import Enum
from r2r.base import Provider
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult
class EmbeddingProvider(Provider):
    class PipeStage(Enum):
        BASE = 1
        RERANK = 2

    def __init__(self, config: EmbeddingConfig):
        if not isinstance(config, EmbeddingConfig):
            raise ValueError("EmbeddingProvider must be initialized with a `EmbeddingConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_embedding(
        self,
        text: str,
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def get_embeddings(
        self,
        texts: list[str],
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: PipeStage = PipeStage.RERANK,
        limit: int = 10,
    ):
        pass

    @abstractmethod
    def tokenize_string(
        self, text: str, model: str, stage: PipeStage
    ) -> list[int]:
        pass

    def set_prefixes(self, config_prefixes: dict[str, str], base_model: str):
        # Implementation of prefix setting
        pass
Creating a Custom Embedding Provider
To create a custom Embedding provider, follow these steps:
- Create a new class that inherits from EmbeddingProvider.
- Implement the required methods: get_embedding, get_embeddings, rerank, and tokenize_string.
- (Optional) Implement async versions of methods if needed.
- (Optional) Add any additional methods or attributes specific to your provider.
Here’s an example of a custom Embedding provider:
import numpy as np
from r2r.base import EmbeddingProvider, EmbeddingConfig
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult
class CustomEmbeddingProvider(EmbeddingProvider):
    def __init__(self, config: EmbeddingConfig):
        super().__init__(config)
        # Initialize any custom attributes or models here
        self.model = self._load_custom_model(config.base_model)

    def _load_custom_model(self, model_name):
        # Load your custom embedding model here
        pass

    def get_embedding(
        self,
        text: str,
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[float]:
        # Apply prefix if available
        if purpose in self.prefixes:
            text = f"{self.prefixes[purpose]}{text}"
        # Generate embedding using your custom model
        embedding = self.model.encode(text)
        return embedding.tolist()

    def get_embeddings(
        self,
        texts: list[str],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[list[float]]:
        # Apply prefixes if available
        if purpose in self.prefixes:
            texts = [f"{self.prefixes[purpose]}{text}" for text in texts]
        # Generate embeddings in batches
        all_embeddings = []
        for i in range(0, len(texts), self.config.batch_size):
            batch = texts[i:i + self.config.batch_size]
            batch_embeddings = self.model.encode(batch)
            all_embeddings.extend(batch_embeddings.tolist())
        return all_embeddings

    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.RERANK,
        limit: int = 10,
    ) -> list[VectorSearchResult]:
        if not self.config.rerank_model:
            return results[:limit]
        # Implement custom reranking logic here
        # This is a simple example using dot product similarity
        query_embedding = self.get_embedding(query, stage, EmbeddingPurpose.QUERY)
        for result in results:
            result.score = np.dot(query_embedding, result.embedding)
        reranked_results = sorted(results, key=lambda x: x.score, reverse=True)
        return reranked_results[:limit]

    def tokenize_string(
        self, text: str, model: str, stage: EmbeddingProvider.PipeStage
    ) -> list[int]:
        # Implement custom tokenization logic
        # This is a simple example using basic string splitting
        return [ord(char) for word in text.split() for char in word]

    # Optionally implement async versions of methods
    async def async_get_embedding(self, text: str, stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embedding(text, stage, purpose)

    async def async_get_embeddings(self, texts: list[str], stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embeddings(texts, stage, purpose)
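The batching loop inside get_embeddings can be factored into a small helper. The following is our own illustrative helper, not an R2R API, showing the same slicing pattern in isolation:

```python
def iter_batches(texts: list, batch_size: int):
    """Yield successive slices of texts, each at most batch_size long."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]


print(list(iter_batches(["a", "b", "c", "d", "e"], 2)))  # -> [['a', 'b'], ['c', 'd'], ['e']]
```

Note that the final batch may be shorter than batch_size, which is why the provider extends all_embeddings rather than assuming fixed-size outputs.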
Registering and Using the Custom Provider
To use your custom Embedding provider in R2R:
- Update the EmbeddingConfig class to include your custom provider:
class EmbeddingConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers", "custom"]  # Add your custom provider here
- Update your R2R configuration to use the custom provider:
[embedding]
provider = "custom"
base_model = "your-custom-model"
base_dimension = 768
batch_size = 32
[embedding.prefixes]
index = "Represent this document for retrieval: "
query = "Represent this query for retrieving relevant documents: "
- In your R2R application, register the custom provider:
from r2r import R2R
from r2r.base import EmbeddingConfig
from your_module import CustomEmbeddingProvider
def get_embedding_provider(config: EmbeddingConfig):
    if config.provider == "custom":
        return CustomEmbeddingProvider(config)
    # ... handle other providers ...

r2r = R2R(embedding_provider_factory=get_embedding_provider)
Now you can use your custom Embedding provider seamlessly within your R2R application:
# Ingest documents (embeddings will be generated using your custom provider)
r2r.ingest_files(["path/to/document.txt"])
# Perform a search
results = r2r.search("Your search query")
# Use RAG
rag_response = r2r.rag("Your question here")
By following this structure, you can integrate any embedding model or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed. This approach allows for great flexibility in choosing or implementing embedding solutions that best fit your specific use case.
Embedding Prefixes
R2R supports embedding prefixes to enhance embedding quality for different purposes:
- Index Prefixes: Applied to documents during indexing.
- Query Prefixes: Applied to search queries.
Configure prefixes in your r2r.toml or when initializing the EmbeddingConfig.
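Conceptually, a prefix is simply prepended to the text before it is embedded. A sketch of that behavior, using the prefix strings from the configuration example earlier (the helper function is ours, for illustration only):

```python
prefixes = {
    "index": "Represent this document for retrieval: ",
    "query": "Represent this query for retrieving relevant documents: ",
}


def apply_prefix(text: str, purpose: str) -> str:
    """Prepend the purpose-specific prefix, if one is configured."""
    return f"{prefixes.get(purpose, '')}{text}"


print(apply_prefix("What is R2R?", "query"))
```

Models trained with instruction-style prefixes often produce better retrieval quality when index and query texts carry their respective prefixes.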
Troubleshooting
Common issues and solutions:
- API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
- Dimension Mismatch: Verify that the base_dimension in your config matches the actual output of the chosen model.
- Out of Memory Errors: Adjust the batch size or choose a smaller model if encountering memory issues with local models.
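One way to catch a dimension mismatch early is to embed a probe string at startup and compare the vector length against the configured base_dimension. This helper is a sketch of our own, not part of R2R:

```python
def check_dimension(embedding: list, expected_dim: int) -> None:
    """Raise if the model's output size disagrees with the configured base_dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Model returned {len(embedding)}-dim vectors "
            f"but config expects {expected_dim}."
        )


check_dimension([0.1] * 512, 512)  # passes silently
```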
Performance Considerations
- Batching: Use batching for multiple, similar requests to improve throughput.
- Model Selection: Balance between model capability and inference speed based on your use case.
- Caching: Implement caching strategies to avoid re-embedding identical text.
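One simple caching strategy is memoizing on the input text, for example with functools.lru_cache. The embed function below is a stand-in for a real embedding call, and the whole sketch is illustrative rather than an R2R feature:

```python
from functools import lru_cache

calls = {"count": 0}


def embed(text: str) -> tuple:
    """Stand-in for a real embedding call; returns a fake vector."""
    calls["count"] += 1
    return (float(len(text)),)


@lru_cache(maxsize=1024)
def cached_embed(text: str) -> tuple:
    return embed(text)


cached_embed("hello")
cached_embed("hello")  # served from cache; underlying embed runs once
print(calls["count"])  # -> 1
```

For production use, a persistent cache keyed on a hash of (model name, text) is more robust, since an in-process LRU cache is lost on restart.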
Conclusion
R2R’s Embedding system provides a flexible and powerful foundation for integrating various embedding models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage embeddings to enhance your R2R-based projects.
For an advanced example of implementing reranking in R2R, refer to the reranking section of the documentation.