Embeddings in R2R
Introduction
R2R supports multiple Embedding providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various Embedding providers within the R2R framework.
For a quick start on basic configuration, including embedding setup, please refer to our configuration guide.
Providers
R2R currently supports the following cloud embedding providers:
- OpenAI
- Azure
- Cohere
- HuggingFace
- Bedrock (Amazon)
- Vertex AI (Google)
- Voyage AI
And for local inference:
- Ollama
- SentenceTransformers
Configuration
Update the embedding section in your r2r.toml file to configure your embedding provider. Here are some example configurations:
Selecting Different Embedding Providers
R2R supports a wide range of embedding providers through LiteLLM. Here’s how to configure and use them:
export OPENAI_API_KEY=your_openai_key
# Update r2r.toml:
# [embedding]
# provider = "litellm"  # or "openai"
# base_model = "text-embedding-3-small"
# base_dimension = 512
r2r serve --config-path=r2r.toml
Supported models include:
- text-embedding-3-small
- text-embedding-3-large
- text-embedding-ada-002
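For local inference, the embedding section follows the same shape. Here is a sketch for Ollama (the model name and dimension below are placeholders; substitute the model you have actually pulled and its true output size):

```toml
[embedding]
provider = "ollama"
base_model = "mxbai-embed-large"  # placeholder model name
base_dimension = 1024             # must match the model's output size
batch_size = 32
```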
Embedding Service Endpoints
The EmbeddingProvider is responsible for core functionalities in these R2R endpoints:
- update_files: When updating existing files in the system
- ingest_files: During the ingestion of new files
- search: For embedding search queries
- rag: As part of the Retrieval-Augmented Generation process
Here’s how you can use these endpoints with embeddings:
File Ingestion
from r2r import R2R
app = R2R()
# Ingest a file, which will use the configured embedding model
response = app.ingest_files(["path/to/your/file.txt"])
print(f"Ingestion response: {response}")
Search
# Perform a search, which will embed the query using the configured model
search_results = app.search("Your search query here")
print(f"Search results: {search_results}")
RAG (Retrieval-Augmented Generation)
# Use RAG, which involves embedding for retrieval
rag_response = app.rag("Your question or prompt here")
print(f"RAG response: {rag_response}")
Updating Files
# Update existing files, which may involve re-embedding
update_response = app.update_files(["path/to/updated/file.txt"])
print(f"Update response: {update_response}")
Remember that you don’t directly call the embedding methods in your application code. R2R handles the embedding process internally based on your configuration.
Security Best Practices
- API Key Management: Use environment variables or secure key management solutions for API keys.
- Input Validation: Sanitize and validate all inputs before generating embeddings.
- Rate Limiting: Implement rate limiting to prevent abuse of embedding endpoints.
- Monitoring: Regularly monitor embedding usage for anomalies or misuse.
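As an illustration of the input-validation point above, a minimal sanitizer might drop non-printable characters and cap input length before text reaches the embedding model. This is a sketch of our own, not an R2R API:

```python
def sanitize_text(text: str, max_chars: int = 8192) -> str:
    """Drop non-printable characters and cap length before embedding."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:max_chars].strip()

print(sanitize_text("hello\x00 world\x07"))  # -> hello world
```

The character cap (8192 here) is arbitrary; in practice it should reflect the token limit of your chosen embedding model.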
Custom Embedding Providers in R2R
You can create custom embedding providers by inheriting from the EmbeddingProvider class and implementing the required methods. This allows you to integrate any embedding model or service into R2R.
Embedding Provider Structure
The Embedding system in R2R is built on two main components:
- EmbeddingConfig: A configuration class for Embedding providers.
- EmbeddingProvider: An abstract base class that defines the interface for all Embedding providers.
EmbeddingConfig
The EmbeddingConfig class is used to configure Embedding providers:
from r2r.base import ProviderConfig
from typing import Optional
class EmbeddingConfig(ProviderConfig):
    provider: Optional[str] = None
    base_model: Optional[str] = None
    base_dimension: Optional[int] = None
    rerank_model: Optional[str] = None
    rerank_dimension: Optional[int] = None
    rerank_transformer_type: Optional[str] = None
    batch_size: int = 1
    prefixes: Optional[dict[str, str]] = None

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers"]
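To see how validate() behaves, here is a standalone mirror of the class (plain Python with no r2r dependency, so the details are illustrative rather than the actual implementation) showing that an unknown provider is rejected:

```python
from typing import Optional


class EmbeddingConfig:
    """Standalone sketch mirroring r2r's EmbeddingConfig validation logic."""

    def __init__(self, provider: Optional[str] = None,
                 base_model: Optional[str] = None,
                 base_dimension: Optional[int] = None,
                 batch_size: int = 1):
        self.provider = provider
        self.base_model = base_model
        self.base_dimension = base_dimension
        self.batch_size = batch_size

    @property
    def supported_providers(self) -> list:
        return [None, "openai", "ollama", "sentence-transformers"]

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")


EmbeddingConfig(provider="openai", base_model="text-embedding-3-small").validate()  # passes
try:
    EmbeddingConfig(provider="unknown").validate()
except ValueError as e:
    print(e)  # -> Provider 'unknown' is not supported.
```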
EmbeddingProvider
The EmbeddingProvider is an abstract base class that defines the common interface for all Embedding providers:
from abc import abstractmethod
from enum import Enum
from r2r.base import Provider
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult
class EmbeddingProvider(Provider):
    class PipeStage(Enum):
        BASE = 1
        RERANK = 2

    def __init__(self, config: EmbeddingConfig):
        if not isinstance(config, EmbeddingConfig):
            raise ValueError("EmbeddingProvider must be initialized with a `EmbeddingConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_embedding(
        self,
        text: str,
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def get_embeddings(
        self,
        texts: list[str],
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: PipeStage = PipeStage.RERANK,
        limit: int = 10,
    ):
        pass

    @abstractmethod
    def tokenize_string(
        self, text: str, model: str, stage: PipeStage
    ) -> list[int]:
        pass

    def set_prefixes(self, config_prefixes: dict[str, str], base_model: str):
        # Implementation of prefix setting
        pass
Creating a Custom Embedding Provider
To create a custom Embedding provider, follow these steps:
- Create a new class that inherits from EmbeddingProvider.
- Implement the required methods: get_embedding, get_embeddings, rerank, and tokenize_string.
- (Optional) Implement async versions of methods if needed.
- (Optional) Add any additional methods or attributes specific to your provider.
Here’s an example of a custom Embedding provider:
import numpy as np
from r2r.base import EmbeddingProvider, EmbeddingConfig
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult
class CustomEmbeddingProvider(EmbeddingProvider):
    def __init__(self, config: EmbeddingConfig):
        super().__init__(config)
        # Initialize any custom attributes or models here
        self.model = self._load_custom_model(config.base_model)

    def _load_custom_model(self, model_name):
        # Load your custom embedding model here
        pass

    def get_embedding(
        self,
        text: str,
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[float]:
        # Apply prefix if available
        if purpose in self.prefixes:
            text = f"{self.prefixes[purpose]}{text}"
        # Generate embedding using your custom model
        embedding = self.model.encode(text)
        return embedding.tolist()

    def get_embeddings(
        self,
        texts: list[str],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[list[float]]:
        # Apply prefixes if available
        if purpose in self.prefixes:
            texts = [f"{self.prefixes[purpose]}{text}" for text in texts]
        # Generate embeddings in batches
        all_embeddings = []
        for i in range(0, len(texts), self.config.batch_size):
            batch = texts[i:i + self.config.batch_size]
            batch_embeddings = self.model.encode(batch)
            all_embeddings.extend(batch_embeddings.tolist())
        return all_embeddings

    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.RERANK,
        limit: int = 10,
    ) -> list[VectorSearchResult]:
        if not self.config.rerank_model:
            return results[:limit]
        # Implement custom reranking logic here
        # This is a simple example using dot product similarity
        query_embedding = self.get_embedding(query, stage, EmbeddingPurpose.QUERY)
        for result in results:
            result.score = np.dot(query_embedding, result.embedding)
        reranked_results = sorted(results, key=lambda x: x.score, reverse=True)
        return reranked_results[:limit]

    def tokenize_string(
        self, text: str, model: str, stage: EmbeddingProvider.PipeStage
    ) -> list[int]:
        # Implement custom tokenization logic
        # This is a simple example using basic string splitting
        return [ord(char) for word in text.split() for char in word]

    # Optionally implement async versions of methods
    async def async_get_embedding(self, text: str, stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embedding(text, stage, purpose)

    async def async_get_embeddings(self, texts: list[str], stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embeddings(texts, stage, purpose)
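The batching loop inside get_embeddings can be factored into a small helper. The following is our own illustrative helper, not an R2R API, showing the same slicing pattern in isolation:

```python
def iter_batches(texts: list, batch_size: int):
    """Yield successive slices of texts, each at most batch_size long."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]


print(list(iter_batches(["a", "b", "c", "d", "e"], 2)))  # -> [['a', 'b'], ['c', 'd'], ['e']]
```

Note that the final batch may be shorter than batch_size, which is why the provider extends all_embeddings rather than assuming fixed-size outputs.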
Registering and Using the Custom Provider
To use your custom Embedding provider in R2R:
- Update the EmbeddingConfig class to include your custom provider:
class EmbeddingConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers", "custom"]  # Add your custom provider here
- Update your R2R configuration to use the custom provider:
[embedding]
provider = "custom"
base_model = "your-custom-model"
base_dimension = 768
batch_size = 32
[embedding.prefixes]
index = "Represent this document for retrieval: "
query = "Represent this query for retrieving relevant documents: "
- In your R2R application, register the custom provider:
from r2r import R2R
from r2r.base import EmbeddingConfig
from your_module import CustomEmbeddingProvider
def get_embedding_provider(config: EmbeddingConfig):
    if config.provider == "custom":
        return CustomEmbeddingProvider(config)
    # ... handle other providers ...

r2r = R2R(embedding_provider_factory=get_embedding_provider)
Now you can use your custom Embedding provider seamlessly within your R2R application:
# Ingest documents (embeddings will be generated using your custom provider)
r2r.ingest_files(["path/to/document.txt"])
# Perform a search
results = r2r.search("Your search query")
# Use RAG
rag_response = r2r.rag("Your question here")
By following this structure, you can integrate any embedding model or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed. This approach allows for great flexibility in choosing or implementing embedding solutions that best fit your specific use case.
Embedding Prefixes
R2R supports embedding prefixes to enhance embedding quality for different purposes:
- Index Prefixes: Applied to documents during indexing.
- Query Prefixes: Applied to search queries.
Configure prefixes in your r2r.toml or when initializing the EmbeddingConfig.
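Conceptually, a prefix is simply prepended to the text before it is embedded. A sketch of that behavior, using the prefix strings from the configuration example earlier (the helper function is ours, for illustration only):

```python
prefixes = {
    "index": "Represent this document for retrieval: ",
    "query": "Represent this query for retrieving relevant documents: ",
}


def apply_prefix(text: str, purpose: str) -> str:
    """Prepend the purpose-specific prefix, if one is configured."""
    return f"{prefixes.get(purpose, '')}{text}"


print(apply_prefix("What is R2R?", "query"))
```

Models trained with instruction-style prefixes often produce better retrieval quality when index and query texts carry their respective prefixes.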
Troubleshooting
Common issues and solutions:
- API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
- Dimension Mismatch: Verify that the base_dimension in your config matches the actual output of the chosen model.
- Out of Memory Errors: Adjust the batch size or choose a smaller model if encountering memory issues with local models.
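One way to catch a dimension mismatch early is to embed a probe string at startup and compare the vector length against the configured base_dimension. This helper is a sketch of our own, not part of R2R:

```python
def check_dimension(embedding: list, expected_dim: int) -> None:
    """Raise if the model's output size disagrees with the configured base_dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Model returned {len(embedding)}-dim vectors "
            f"but config expects {expected_dim}."
        )


check_dimension([0.1] * 512, 512)  # passes silently
```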
Performance Considerations
- Batching: Use batching for multiple, similar requests to improve throughput.
- Model Selection: Balance between model capability and inference speed based on your use case.
- Caching: Implement caching strategies to avoid re-embedding identical text.
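One simple caching strategy is memoizing on the input text, for example with functools.lru_cache. The embed function below is a stand-in for a real embedding call, and the whole sketch is illustrative rather than an R2R feature:

```python
from functools import lru_cache

calls = {"count": 0}


def embed(text: str) -> tuple:
    """Stand-in for a real embedding call; returns a fake vector."""
    calls["count"] += 1
    return (float(len(text)),)


@lru_cache(maxsize=1024)
def cached_embed(text: str) -> tuple:
    return embed(text)


cached_embed("hello")
cached_embed("hello")  # served from cache; underlying embed runs once
print(calls["count"])  # -> 1
```

For production use, a persistent cache keyed on a hash of (model name, text) is more robust, since an in-process LRU cache is lost on restart.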
Conclusion
R2R’s Embedding system provides a flexible and powerful foundation for integrating various embedding models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage embeddings to enhance your R2R-based projects.
For an advanced example of implementing reranking in R2R, refer to the reranking section of the documentation.