Embedding Provider
By default, R2R uses the LiteLLM framework to communicate with various cloud embedding providers. To customize the embedding settings:
```toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
batch_size = 128
add_title_as_prefix = false
rerank_model = "None"
concurrent_request_limit = 256
```
Let’s break down the embedding configuration options:
- `provider`: Choose from `ollama`, `litellm`, and `openai`. R2R defaults to the LiteLLM framework for maximum embedding provider flexibility.
- `base_model`: Specifies the embedding model to use. The format is typically "provider/model-name" (e.g., `openai/text-embedding-3-small`).
- `base_dimension`: Sets the dimension of the embedding vectors. Should match the output dimension of the chosen model.
- `batch_size`: Determines the number of texts to embed in a single API call. Larger values can improve throughput but may increase latency.
- `add_title_as_prefix`: When `true`, prepends the document title to the text before embedding, providing additional context.
- `rerank_model`: Specifies a model for reranking results. Set to `"None"` to disable reranking (note: not supported by `LiteLLMEmbeddingProvider`).
- `concurrent_request_limit`: Sets the maximum number of concurrent embedding requests, to manage load and avoid rate limiting (see the sketch after this list).
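To make the interplay of `batch_size` and `concurrent_request_limit` concrete, here is a minimal, illustrative sketch of a client-side batching loop. This is not R2R's internal code; it only assumes LiteLLM's async `aembedding` API and the OpenAI model from the config above:

```python
import asyncio
import litellm

BATCH_SIZE = 128                # texts per API call (batch_size)
CONCURRENT_REQUEST_LIMIT = 256  # max in-flight requests (concurrent_request_limit)

async def embed_all(texts: list[str]) -> list[list[float]]:
    semaphore = asyncio.Semaphore(CONCURRENT_REQUEST_LIMIT)

    async def embed_batch(batch: list[str]) -> list[list[float]]:
        async with semaphore:  # cap concurrent requests to avoid rate limiting
            response = await litellm.aembedding(
                model="openai/text-embedding-3-small",
                input=batch,
            )
            return [item["embedding"] for item in response.data]

    # Split the corpus into batches of BATCH_SIZE and embed them concurrently.
    batches = [texts[i:i + BATCH_SIZE] for i in range(0, len(texts), BATCH_SIZE)]
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    return [vector for batch in results for vector in batch]

# asyncio.run(embed_all(["first chunk", "second chunk"]))
```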
Embedding providers for an R2R system cannot be configured at runtime; they are instead configured server-side.
Supported LiteLLM Providers
Support for any of the embedding providers listed below is provided through LiteLLM.
- OpenAI
- Azure
- Anthropic
- Cohere
- Ollama
- HuggingFace
- Bedrock
- Vertex AI
- Voyage AI
OpenAI

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
```

```bash
export OPENAI_API_KEY=your_openai_key
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
Supported models include:
- text-embedding-3-small
- text-embedding-3-large
- text-embedding-ada-002
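As a quick, standalone sanity check (separate from R2R) that `base_dimension` matches your chosen model, you can call LiteLLM directly. The text-embedding-3 models accept a `dimensions` parameter, which should agree with the value in r2r.toml:

```python
import litellm

# Requires OPENAI_API_KEY in the environment.
response = litellm.embedding(
    model="openai/text-embedding-3-small",
    input=["sanity check"],
    dimensions=512,  # must agree with base_dimension in r2r.toml
)
assert len(response.data[0]["embedding"]) == 512
```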
For detailed usage instructions, refer to the LiteLLM OpenAI Embedding documentation.

Azure

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "azure/<your deployment name>"
base_dimension = XXX  # set to the output dimension of your deployed model
```

```bash
export AZURE_API_KEY=your_azure_api_key
export AZURE_API_BASE=your_azure_api_base
export AZURE_API_VERSION=your_azure_api_version
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
The supported models depend on which embedding model you have deployed in Azure. For detailed usage instructions, refer to the LiteLLM Azure Embedding documentation.

Anthropic

Anthropic does not currently offer embedding models. Consider using OpenAI or another provider for embeddings.
Cohere

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "cohere/embed-english-v3.0"
base_dimension = 1_024
```

```bash
export COHERE_API_KEY=your_cohere_api_key
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
Supported models include:
- embed-english-v3.0
- embed-english-light-v3.0
- embed-multilingual-v3.0
- embed-multilingual-light-v3.0
- embed-english-v2.0
- embed-english-light-v2.0
- embed-multilingual-v2.0
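To exercise the Cohere route outside of R2R first, a minimal LiteLLM call looks like the sketch below. The `input_type` hint for v3 embed models is passed through by LiteLLM:

```python
import litellm

# Requires COHERE_API_KEY in the environment.
response = litellm.embedding(
    model="cohere/embed-english-v3.0",
    input=["R2R configuration test"],
    input_type="search_document",  # document-vs-query hint for v3 embed models
)
print(len(response.data[0]["embedding"]))  # expect 1024, matching base_dimension
```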
For detailed usage instructions, refer to the LiteLLM Cohere Embedding documentation.

Ollama

When running with Ollama, additional changes to the r2r.toml file are recommended. In addition to using the `ollama` provider directly, we recommend restricting `concurrent_request_limit` to avoid exceeding the throughput of your Ollama server.

```toml
[embedding]
provider = "ollama"
base_model = "ollama/mxbai-embed-large"
base_dimension = 1_024
batch_size = 32
add_title_as_prefix = true
```

```bash
# Ensure your Ollama server is running
# Default Ollama server address: http://localhost:11434
# <-- OR -->
# Use `r2r --config-name=local_llm serve --docker`
# which bundles Ollama with R2R in Docker by default!
r2r serve --config-path=r2r.toml
```
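Before pointing R2R at Ollama, you can confirm the local embedding path works with a direct LiteLLM call. This is a standalone check; the `api_base` below is simply Ollama's default address:

```python
import litellm

response = litellm.embedding(
    model="ollama/mxbai-embed-large",
    input=["testing the local embedding path"],
    api_base="http://localhost:11434",  # default Ollama server address
)
print(len(response.data[0]["embedding"]))  # expect 1024, matching base_dimension
```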
HuggingFace

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "huggingface/microsoft/codebert-base"
base_dimension = 768
```

```bash
export HUGGINGFACE_API_KEY=your_huggingface_api_key
r2r serve --config-path=r2r.toml
```
LiteLLM supports all feature-extraction embedding models on HuggingFace. For detailed usage instructions, refer to the LiteLLM HuggingFace Embedding documentation.

Bedrock

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "bedrock/amazon.titan-embed-text-v1"
base_dimension = 1_024
```

```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION_NAME=your_region_name
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
Supported models include:
- amazon.titan-embed-text-v1
- cohere.embed-english-v3
- cohere.embed-multilingual-v3
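Because the Titan and Cohere models on Bedrock return different vector sizes, it is worth probing the model directly to confirm what `base_dimension` should be. This standalone sketch assumes your AWS credentials are already exported as above:

```python
import litellm

response = litellm.embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["dimension probe"],
)
# Set base_dimension in r2r.toml to the length printed here.
print(len(response.data[0]["embedding"]))
```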
For detailed usage instructions, refer to the LiteLLM Bedrock Embedding documentation.

Vertex AI

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "vertex_ai/textembedding-gecko"
base_dimension = 768
```

```bash
export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
export VERTEX_PROJECT=your_project_id
export VERTEX_LOCATION=your_project_location
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
Supported models include:
- textembedding-gecko
- textembedding-gecko-multilingual
- textembedding-gecko@001
- textembedding-gecko@003
- text-embedding-preview-0409
- text-multilingual-embedding-preview-0409
For detailed usage instructions, refer to the LiteLLM Vertex AI Embedding documentation.

Voyage AI

Example configuration:

```toml
[embedding]
provider = "litellm"
base_model = "voyage/voyage-01"
base_dimension = 1_024
```

```bash
export VOYAGE_API_KEY=your_voyage_api_key
# .. set other environment variables
r2r serve --config-path=r2r.toml
```
Supported models include:
- voyage-01
- voyage-lite-01
- voyage-lite-01-instruct
For detailed usage instructions, refer to the LiteLLM Voyage AI Embedding documentation.