> ## Documentation Index
> Fetch the complete documentation index at: https://r2r-patch-fix-ingestion.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Language Models (LLMs)

> Configure and use multiple Language Model providers in R2R

## Introduction

R2R's `LLMProvider` supports multiple third-party Language Model (LLM) providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various LLM providers within the R2R framework.

## Architecture Overview

R2R's LLM system is built on a flexible provider model:

1. **LLM Provider**: An abstract base class that defines the common interface for all LLM providers.
2. **Specific LLM Providers**: Concrete implementations for different LLM services (e.g., OpenAI, LiteLLM).

These providers work in tandem to ensure flexible and efficient language model integration.

## Providers

### LiteLLM Provider (Default)

The default `LiteLLMProvider` offers a unified interface for multiple LLM services.

Key features:

* Support for OpenAI, Anthropic, Vertex AI, HuggingFace, Azure OpenAI, Ollama, Together AI, and Openrouter
* Consistent API across different LLM providers
* Easy switching between models

### OpenAI Provider

The `OpenAILLM` class provides direct integration with OpenAI's models.

Key features:

* Direct access to OpenAI's API
* Support for the latest OpenAI models
* Fine-grained control over model parameters

### Local Models

Support for running models locally using Ollama or other local inference engines, through LiteLLM.

Key features:

* Privacy-preserving local inference
* Customizable model selection
* Reduced latency for certain use cases

## Configuration

### LLM Configuration

Update the `completions` section in your `r2r.toml` file:

```
[completions]
provider = "litellm"

[completions.generation_config]
model = "gpt-4"
temperature = 0.7
max_tokens = 150
```

<Note>
  The provided `generation_config` is used to establish the default generation parameters for your deployment. These settings can be overridden at runtime, offering flexibility in your application. You can adjust parameters:

  1. At the application level, by modifying the R2R configuration
  2. For individual requests, by passing custom parameters to the `rag` or `get_completion` methods
  3. Through API calls, by including specific parameters in your request payload

  This allows you to fine-tune the behavior of your language model interactions on a per-use basis while maintaining a consistent baseline configuration.
</Note>

## Security Best Practices

1. **API Key Management**: Use environment variables or secure key management solutions for API keys.
2. **Rate Limiting**: Implement rate limiting to prevent abuse of LLM endpoints.
3. **Input Validation**: Sanitize and validate all inputs before passing them to LLMs.
4. **Output Filtering**: Implement content filtering for LLM outputs to prevent inappropriate content.
5. **Monitoring**: Regularly monitor LLM usage and outputs for anomalies or misuse.

## Custom LLM Providers in R2R

### LLM Provider Structure

The LLM system in R2R is built on two main components:

1. `LLMConfig`: A configuration class for LLM providers.
2. `LLMProvider`: An abstract base class that defines the interface for all LLM providers.

### LLMConfig

The `LLMConfig` class is used to configure LLM providers:

```python
from r2r.base import ProviderConfig
from r2r.base.abstractions.llm import GenerationConfig
from typing import Optional

class LLMConfig(ProviderConfig):
    provider: Optional[str] = None
    generation_config: Optional[GenerationConfig] = None

    def validate(self) -> None:
        if not self.provider:
            raise ValueError("Provider must be set.")
        if self.provider and self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai"]
```

### LLMProvider

The `LLMProvider` is an abstract base class that defines the common interface for all LLM providers:

```python
from abc import abstractmethod
from r2r.base import Provider
from r2r.base.abstractions.llm import GenerationConfig, LLMChatCompletion, LLMChatCompletionChunk

class LLMProvider(Provider):
    def __init__(self, config: LLMConfig) -> None:
        if not isinstance(config, LLMConfig):
            raise ValueError("LLMProvider must be initialized with a `LLMConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        pass

    @abstractmethod
    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletionChunk:
        pass
```

### Creating a Custom LLM Provider

To create a custom LLM provider, follow these steps:

1. Create a new class that inherits from `LLMProvider`.
2. Implement the required methods: `get_completion` and `get_completion_stream`.
3. (Optional) Add any additional methods or attributes specific to your provider.

Here's an example of a custom LLM provider:

```python
import logging
from typing import Generator
from r2r.base import LLMProvider, LLMConfig, LLMChatCompletion, LLMChatCompletionChunk
from r2r.base.abstractions.llm import GenerationConfig

logger = logging.getLogger(__name__)

class CustomLLMProvider(LLMProvider):
    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)
        # Initialize any custom attributes or connections here
        self.custom_client = self._initialize_custom_client()

    def _initialize_custom_client(self):
        # Initialize your custom LLM client here
        pass

    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        # Implement the logic to get a completion from your custom LLM
        response = self.custom_client.generate(messages, **generation_config.dict(), **kwargs)

        # Convert the response to LLMChatCompletion format
        return LLMChatCompletion(
            id=response.id,
            choices=[
                {
                    "message": {
                        "role": "assistant",
                        "content": response.text
                    },
                    "finish_reason": response.finish_reason
                }
            ],
            usage={
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        )

    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> Generator[LLMChatCompletionChunk, None, None]:
        # Implement the logic to get a streaming completion from your custom LLM
        stream = self.custom_client.generate_stream(messages, **generation_config.dict(), **kwargs)

        for chunk in stream:
            yield LLMChatCompletionChunk(
                id=chunk.id,
                choices=[
                    {
                        "delta": {
                            "role": "assistant",
                            "content": chunk.text
                        },
                        "finish_reason": chunk.finish_reason
                    }
                ]
            )

    # Add any additional methods specific to your custom provider
    def custom_method(self, *args, **kwargs):
        # Implement custom functionality
        pass
```

### Registering and Using the Custom Provider

To use your custom LLM provider in R2R:

1. Update the `LLMConfig` class to include your custom provider:

```python
class LLMConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai", "custom"]  # Add your custom provider here
```

2. Update your R2R configuration to use the custom provider:

```toml
[completions]
provider = "custom"

[completions.generation_config]
model = "your-custom-model"
temperature = 0.7
max_tokens = 150
```

3. In your R2R application, register the custom provider:

```python
from r2r import R2R
from r2r.base import LLMConfig
from your_module import CustomLLMProvider

def get_llm_provider(config: LLMConfig):
    if config.provider == "custom":
        return CustomLLMProvider(config)
    # ... handle other providers ...

r2r = R2R(llm_provider_factory=get_llm_provider)
```

Now you can use your custom LLM provider seamlessly within your R2R application:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

response = r2r.get_completion(messages)
print(response.choices[0].message.content)
```

By following this structure, you can integrate any LLM or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed.

## Prompt Engineering

R2R supports advanced prompt engineering techniques:

1. **Template Management**: Create and manage reusable prompt templates.
2. **Dynamic Prompts**: Generate prompts dynamically based on context or user input.
3. **Few-shot Learning**: Incorporate examples in your prompts for better results.

## Troubleshooting

Common issues and solutions:

1. **API Key Errors**: Ensure your API keys are correctly set and have the necessary permissions.
2. **Rate Limiting**: Implement exponential backoff for retries on rate limit errors.
3. **Context Length Errors**: Be mindful of the maximum context length for your chosen model.
4. **Model Availability**: Ensure the requested model is available and properly configured.

## Performance Considerations

1. **Batching**: Use batching for multiple, similar requests to improve throughput.
2. **Streaming**: Utilize streaming for long-form content generation to improve perceived latency.
3. **Model Selection**: Balance between model capability and inference speed based on your use case.

## Server Configuration

The `R2RConfig` class handles the configuration of various components, including LLMs. Here's a simplified version:

```python
class R2RConfig:
    REQUIRED_KEYS: dict[str, list] = {
        # ... other keys ...
        "completions": ["provider"],
        # ... other keys ...
    }

    def __init__(self, config_data: dict[str, Any]):
        # Load and validate configuration
        # ...

        # Set LLM configuration
        self.completions = LLMConfig.create(**self.completions)

        # Override GenerationConfig defaults
        GenerationConfig.set_default(**self.completions.get("generation_config", {}))

        # ... other initialization ...
```

This configuration system allows for flexible setup of LLM providers and their default parameters.

## Conclusion

R2R's LLM system provides a flexible and powerful foundation for integrating various language models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage LLMs to enhance your R2R-based projects.

For further customization and advanced use cases, refer to the [R2R API Documentation](/api-reference) and [configuration guide](/documentation/configuration).