> ## Documentation Index
> Fetch the complete documentation index at: https://r2r-patch-fix-ingestion.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Database

> Learn how to configure and use the database provider in R2R

## Introduction

R2R's `DatabaseProvider` offers a unified interface for both relational and vector database operations. R2R only provides database support through Postgres with the pgvector extension.

Postgres was selected to power R2R because it is free, open source, and widely considered be a stable state of the art database. Further, the Postgres community has implemented pgvector for efficient storage and retrieval of vector embeddings alongside traditional relational data.

## Architecture Overview

The Database Provider in R2R is built on two primary components:

1. **Vector Database Provider**: Handles vector-based operations such as similarity search and hybrid search.
2. **Relational Database Provider**: Manages traditional relational database operations, including user management and document metadata storage.

These providers work in tandem to ensure efficient data management and retrieval.

## Configuration

Update the `database` section in your `r2r.toml` file:

```toml
[database]
provider = "postgres"
user = "your_postgres_user"
password = "your_postgres_password"
host = "your_postgres_host"
port = "your_postgres_port"
db_name = "your_database_name"
your_project_name = "your_project_name"
```

Alternatively, you can set these values using environment variables:

```bash
export POSTGRES_USER=your_postgres_user
export POSTGRES_PASSWORD=your_postgres_password
export POSTGRES_HOST=your_postgres_host
export POSTGRES_PORT=your_postgres_port
export POSTGRES_DBNAME=your_database_name
export R2R_PROJECT_NAME=your_project_name
```

Environment variables take precedence over the config settings in case of conflicts. The R2R Docker includes configuration options that facilitate integration with a combined Postgres+pgvector database setup.

## Vector Database Operations

### Initialization

The vector database is automatically initialized with dimensions that correspond to your selected embedding model when the Database Provider is first created.

### Upsert Vector Entries

```python
from r2r import R2R, VectorEntry

app = R2R()

vector_entry = VectorEntry(id="unique_id", vector=[0.1, 0.2, 0.3], metadata={"key": "value"})
app.providers.database.vector.upsert(vector_entry)
```

### Search

```python
results = app.providers.database.vector.search(query_vector=[0.1, 0.2, 0.3], limit=10)
```

### Hybrid Search

```python
results = app.providers.database.vector.hybrid_search(
    query_text="search query",
    query_vector=[0.1, 0.2, 0.3],
    limit=10,
    full_text_weight=1.0,
    semantic_weight=1.0
)
```

### Delete by Metadata

```python
deleted_ids = app.providers.database.vector.delete_by_metadata(
    metadata_fields=["key1", "key2"],
    metadata_values=["value1", "value2"]
)
```

## Relational Database Operations

### User Management

#### Create User

```python
from r2r import UserCreate

new_user = UserCreate(email="user@example.com", password="secure_password")
created_user = app.providers.database.relational.create_user(new_user)
```

#### Get User by Email

```python
user = app.providers.database.relational.get_user_by_email("user@example.com")
```

#### Update User

```python
user.name = "New Name"
updated_user = app.providers.database.relational.update_user(user)
```

### Document Management

#### Upsert Document Overview

```python
from r2r import DocumentInfo

doc_info = DocumentInfo(
    document_id="doc_id",
    title="Document Title",
    user_id="user_id",
    version="1.0",
    size_in_bytes=1024,
    metadata={"key": "value"},
    status="processed"
)
app.providers.database.relational.upsert_documents_overview([doc_info])
```

#### Get Documents Overview

```python
docs = app.providers.database.relational.get_documents_overview(
    filter_document_ids=["doc_id1", "doc_id2"],
    filter_user_ids=["user_id1", "user_id2"]
)
```

## Advanced Features

### Hybrid Search Function

The Database Provider includes a custom Postgres function for hybrid search, combining full-text and vector similarity search.

### Token Blacklisting

For enhanced security, the provider supports token blacklisting:

```python
app.providers.database.relational.blacklist_token("token_to_blacklist")
is_blacklisted = app.providers.database.relational.is_token_blacklisted("token_to_check")
```

## Security Best Practices

1. **Environment Variables**: Use environment variables for sensitive information like database credentials.
2. **Connection Pooling**: The provider uses connection pooling for efficient database connections.
3. **Prepared Statements**: SQL queries use prepared statements to prevent SQL injection.
4. **Password Hashing**: User passwords are hashed before storage using the configured Crypto Provider.

## Performance Considerations

1. **Indexing**: The vector database automatically creates appropriate indexes for efficient similarity search.
2. **Batch Operations**: Use batch operations like `upsert_entries` for better performance when dealing with multiple records.
3. **Connection Reuse**: The provider reuses database connections to minimize overhead.

## Troubleshooting

Common issues and solutions:

1. **Connection Errors**: Ensure your database credentials and connection details are correct.
2. **Dimension Mismatch**: Verify that the vector dimension in your configuration matches your embeddings.
3. **Performance Issues**: Consider optimizing your queries or adding appropriate indexes.

## Conclusion

R2R's Database Provider offers a powerful and flexible solution for managing both vector and relational data. By leveraging Postgres with pgvector, it provides efficient storage and retrieval of embeddings alongside traditional relational data, enabling advanced search capabilities and robust user management.