R2R uses language models to generate responses based on retrieved context. You can configure R2R's server-side LLM generation settings in `r2r.toml`:
[completion]
provider = "litellm"
concurrent_request_limit = 16
[completion.generation_config]
model = "openai/gpt-4o"
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = false
add_generation_kwargs = {}
Key generation configuration options:
provider: The LLM provider (defaults to "litellm" for maximum flexibility).
concurrent_request_limit: Maximum number of concurrent LLM requests.
model: The language model to use for generation.
temperature: Controls the randomness of the output (0.0 to 1.0).
top_p: Nucleus sampling parameter (0.0 to 1.0).
max_tokens_to_sample: Maximum number of tokens to generate.
stream: Enable/disable streaming of generated text.
api_base: The base URL for remote communication, e.g. https://api.openai.com/v1
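For example, pointing generation at an OpenAI-compatible endpoint with streaming enabled might look like the following (a minimal sketch; the api_base value is a placeholder, and placing it under [completion.generation_config] is an assumption):
[completion.generation_config]
model = "openai/gpt-4o"
api_base = "https://api.openai.com/v1"  # placeholder; set to your endpoint
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = true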
Serving select LLM providers
OpenAI
Azure
Anthropic
Vertex AI
AWS Bedrock
Groq
Ollama
Cohere
Anyscale
export OPENAI_API_KEY=your_openai_key
# .. set other environment variables
# Optional - Update default model
# Set '"model": "openai/gpt-4o-mini"' in `r2r.toml`
# then call `r2r serve --config-path=r2r.toml`
r2r serve
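Applied to `r2r.toml`, the default-model override from the comment above might look like this (a minimal sketch):
[completion.generation_config]
model = "openai/gpt-4o-mini"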
Supported models include:
- openai/gpt-4o
- openai/gpt-4-turbo
- openai/gpt-4
- openai/gpt-4o-mini
For a complete list of supported OpenAI models and detailed usage instructions, please refer to the LiteLLM OpenAI documentation.
export AZURE_API_KEY=your_azure_api_key
export AZURE_API_BASE=your_azure_api_base
export AZURE_API_VERSION=your_azure_api_version
# .. set other environment variables
# Optional - Update default model
# Set '"model": "azure/<your deployment name>"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- azure/gpt-4o
- azure/gpt-4-turbo
- azure/gpt-4
- azure/gpt-4o-mini
For a complete list of supported Azure models and detailed usage instructions, please refer to the LiteLLM Azure documentation.
export ANTHROPIC_API_KEY=your_anthropic_key
# export ANTHROPIC_API_BASE=your_anthropic_base_url
# .. set other environment variables
# Optional - Update default model
# Set '"model": "anthropic/claude-3-opus-20240229"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- anthropic/claude-3-5-sonnet-20240620
- anthropic/claude-3-opus-20240229
- anthropic/claude-3-sonnet-20240229
- anthropic/claude-3-haiku-20240307
- anthropic/claude-2.1
For a complete list of supported Anthropic models and detailed usage instructions, please refer to the LiteLLM Anthropic documentation.
export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
export VERTEX_PROJECT=your_project_id
export VERTEX_LOCATION=your_project_location
# .. set other environment variables
# Optional - Update default model
# Set '"model": "vertex_ai/gemini-pro"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- vertex_ai/gemini-pro
- vertex_ai/gemini-pro-vision
- vertex_ai/claude-3-opus@20240229
- vertex_ai/claude-3-sonnet@20240229
- vertex_ai/mistral-large@2407
For a complete list of supported Vertex AI models and detailed usage instructions, please refer to the LiteLLM Vertex AI documentation. Vertex AI requires additional setup for authentication and project configuration. Refer to the documentation for detailed instructions on setting up service accounts and configuring your environment.
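As a rough sketch of that service-account setup (the account name, key file, and project ID below are placeholders, not values from the R2R docs):
# Hypothetical service account and key file; grant it Vertex AI access before use
gcloud iam service-accounts create r2r-vertex --project=your_project_id
gcloud iam service-accounts keys create vertex-credentials.json \
  --iam-account=r2r-vertex@your_project_id.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/vertex-credentials.json"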
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION_NAME=your_region_name
# .. set other environment variables
# Optional - Update default model
# Set '"model": "bedrock/anthropic.claude-v2"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- bedrock/anthropic.claude-3-sonnet-20240229-v1:0
- bedrock/anthropic.claude-v2
- bedrock/anthropic.claude-instant-v1
- bedrock/amazon.titan-text-express-v1
- bedrock/meta.llama2-70b-chat-v1
- bedrock/mistral.mixtral-8x7b-instruct-v0:1
For a complete list of supported AWS Bedrock models and detailed usage instructions, please refer to the LiteLLM AWS Bedrock documentation. AWS Bedrock requires boto3 to be installed (`pip install boto3>=1.28.57`). Make sure to set up your AWS credentials properly before using Bedrock models.
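A quick sketch of the boto3 prerequisite mentioned above:
# Install the AWS SDK that LiteLLM's Bedrock integration depends on
pip install "boto3>=1.28.57"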
export GROQ_API_KEY=your_groq_api_key
# .. set other environment variables
# Optional - Update default model
# Set '"model": "groq/llama3-8b-8192"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- llama-3.1-8b-instant
- llama-3.1-70b-versatile
- llama-3.1-405b-reasoning
- llama3-8b-8192
- llama3-70b-8192
- mixtral-8x7b-32768
- gemma-7b-it
For a complete list of supported Groq models and detailed usage instructions, please refer to the LiteLLM Groq documentation.
Note: Groq supports all models available on their platform. Use the prefix `groq/` when specifying the model name.
Additional features:
- Supports streaming responses
- Function/Tool calling available for compatible models
- Speech-to-Text capabilities with Whisper model
# Ensure your Ollama server is running
# Default Ollama server address: http://localhost:11434
# <-- OR -->
# Use `r2r --config-name=local_llm serve --docker`
# which bundles ollama with R2R in Docker by default!
# Optional - Update default model
# Copy `r2r/examples/configs/local_llm.toml` into `my_r2r_local_llm.toml`
# Set '"model": "ollama/llama3.1"' in `my_r2r_local_llm.toml`
# then call `r2r serve --config-path=my_r2r_local_llm.toml`
Supported models include:
- llama2
- mistral
- mistral-7B-Instruct-v0.1
- mixtral-8x7B-Instruct-v0.1
- codellama
- llava (vision model)
For a complete list of supported Ollama models and detailed usage instructions, please refer to the LiteLLM Ollama documentation.
Ollama supports local deployment of various open-source models. Ensure you have the desired model pulled and running on your Ollama server.
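For example, to make sure the model referenced in the config comment above is available locally (a minimal sketch):
# Pull the model named in your config, then make sure the server is running
ollama pull llama3.1
ollama serve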
See here for more detailed instructions on local RAG setup.
export COHERE_API_KEY=your_cohere_api_key
# .. set other environment variables
# Optional - Update default model
# Set '"model": "command-r"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- command-r
- command-light
- command-r-plus
- command-medium
For a complete list of supported Cohere models and detailed usage instructions, please refer to the LiteLLM Cohere documentation.
export ANYSCALE_API_KEY=your_anyscale_api_key
# .. set other environment variables
# Optional - Update default model
# Set '"model": "anyscale/mistralai/Mistral-7B-Instruct-v0.1"' in `r2r.toml`
r2r serve --config-path=my_r2r.toml
Supported models include:
- anyscale/meta-llama/Llama-2-7b-chat-hf
- anyscale/meta-llama/Llama-2-13b-chat-hf
- anyscale/meta-llama/Llama-2-70b-chat-hf
- anyscale/mistralai/Mistral-7B-Instruct-v0.1
- anyscale/codellama/CodeLlama-34b-Instruct-hf
For a complete list of supported Anyscale models and detailed usage instructions, please refer to the Anyscale Endpoints documentation.
Anyscale supports a wide range of models, including Llama 2, Mistral, and CodeLlama variants. Check the Anyscale Endpoints documentation for the most up-to-date list of available models.
Runtime Configuration of LLM Provider
R2R supports runtime configuration of the LLM provider, allowing you to dynamically change the model or provider for each request. This flexibility enables you to use different models or providers based on specific requirements or use cases.
Combining Search and Generation
When performing a RAG query, you can dynamically set the LLM generation settings:
from r2r import R2RClient

# Assumes a running R2R server; adjust the base URL to match your deployment
client = R2RClient("http://localhost:7272")

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config={
        "stream": False,
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 150
    }
)
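If streaming is enabled instead, the response can be consumed incrementally. A minimal sketch, assuming the client yields text chunks when "stream" is true:
stream = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config={
        "stream": True,
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 150
    }
)
# Assumption: with streaming enabled the result is iterable chunk by chunk
for chunk in stream:
    print(chunk, end="")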
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.
Next Steps
For more detailed information on configuring specific components of R2R, please refer to the following pages: