Configure your R2R Postgres database
R2R uses PostgreSQL as the sole provider for relational and vector search queries. Postgres therefore handles authentication, document management, and search across R2R. For search, R2R uses the pgvector extension and ts_rank to implement customizable hybrid search.
R2R chose Postgres as its core technology for several reasons:
Read more about Postgres here.
To customize the database settings, modify the database section in your r2r.toml file. You can either set the corresponding environment variables or provide the settings directly in the configuration file.
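As a rough illustration of how the two sources of settings can fit together, the sketch below merges values taken from a parsed database section with values from environment variables. The setting keys, the environment variable names, and the precedence (file value first, environment variable as fallback) are all assumptions made for this example; check the R2R reference for the names and behavior your version actually uses.

```python
import os
from urllib.parse import quote

# Hypothetical mapping from setting key to environment variable name.
# These names are assumptions for illustration, not confirmed R2R names.
ENV_VARS = {
    "user": "POSTGRES_USER",
    "password": "POSTGRES_PASSWORD",
    "host": "POSTGRES_HOST",
    "port": "POSTGRES_PORT",
    "db_name": "POSTGRES_DBNAME",
}


def resolve_database_settings(file_settings, environ=None):
    """Combine settings from the config file with environment variables.

    Assumed precedence: a value given in the file wins; otherwise the
    matching environment variable is used.
    """
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for key, env_name in ENV_VARS.items():
        value = file_settings.get(key) or environ.get(env_name)
        if value is None:
            missing.append(f"{key} (or ${env_name})")
        else:
            resolved[key] = str(value)
    if missing:
        raise ValueError("Missing database settings: " + ", ".join(missing))
    return resolved


def connection_url(settings):
    """Build a PostgreSQL connection URL, percent-encoding the credentials."""
    user = quote(settings["user"], safe="")
    password = quote(settings["password"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{settings['host']}:{settings['port']}/{settings['db_name']}"
    )
```

Keeping credentials in environment variables and non-secret values such as host and port in the file is one common split, but either source can supply any setting in this sketch.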
R2R leverages several advanced PostgreSQL features to provide powerful search and retrieval capabilities:
R2R uses the pgvector extension to enable efficient vector similarity search, which is crucial for semantic search operations. The collection.py file defines a custom Vector type that interfaces with pgvector.
This allows R2R to perform vector similarity searches using different distance measures.
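To make the distance measures concrete, here is a small pure-Python sketch of the three measures pgvector exposes through its operators: `<->` (L2 distance), `<=>` (cosine distance), and `<#>` (negative inner product). It also formats a vector in pgvector's text form. This is not the Vector type from collection.py; the function names are made up for illustration.

```python
import math


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def to_pgvector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def l2_distance(a, b):
    """Euclidean distance, the quantity behind pgvector's <-> operator."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; pgvector's <#> operator returns the negative of this."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind pgvector's <=> operator."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

Smaller values mean closer vectors for L2 and cosine distance, which is why similarity queries typically order by the distance expression in ascending order.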
R2R implements a hybrid search that combines full-text search and vector similarity search. This approach provides more accurate and contextually relevant results. Key components of the hybrid search include:
Full-text search with ts_rank and websearch_to_tsquery.
Vector similarity search with pgvector.
The collection.py file includes methods for building the SQL queries that implement this hybrid search approach.
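The text above does not say how the two result lists are merged. One common technique for combining a full-text ranking with a vector ranking is Reciprocal Rank Fusion (RRF), sketched below purely as an assumption about how such a merge could work. The function name and the constant `k=60` are illustrative choices, not values taken from R2R.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Example: ids ordered by a full-text query and by a vector query.
full_text_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
```

Documents that appear near the top of both lists rise to the top of the fused ranking, without needing the raw ts_rank scores and vector distances to be on comparable scales.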
R2R uses GIN (Generalized Inverted Index) indexing to optimize full-text searches. This indexing strategy allows for efficient full-text search and trigram similarity matching.
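For readers who want to see what such indexes look like, the sketch below generates the kind of PostgreSQL statements this describes: a GIN index over a `to_tsvector` expression for full-text search, and a GIN index with the `gin_trgm_ops` operator class (from the pg_trgm extension) for trigram matching. The table name, column name, and index names are hypothetical, and R2R's own index definitions may differ.

```python
import re

# Accept only simple identifiers so names can be placed in SQL safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def gin_index_statements(table, text_column, language="english"):
    """Return SQL for a full-text GIN index and a trigram GIN index."""
    for name in (table, text_column, language):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # gin_trgm_ops is provided by the pg_trgm extension.
        "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
        # Full-text index over the tsvector computed from the text column.
        f"CREATE INDEX IF NOT EXISTS {table}_{text_column}_fts_idx "
        f"ON {table} USING GIN (to_tsvector('{language}', {text_column}));",
        # Trigram index for similarity and substring matching.
        f"CREATE INDEX IF NOT EXISTS {table}_{text_column}_trgm_idx "
        f"ON {table} USING GIN ({text_column} gin_trgm_ops);",
    ]
```

For the full-text index to be used, queries need to apply the same `to_tsvector` expression, including the same language configuration, that the index was built on.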
R2R leverages PostgreSQL’s JSONB type for flexible metadata storage. This allows for efficient storage and querying of structured metadata alongside vector embeddings.
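As an illustration of querying structured metadata, the sketch below builds a parameterized filter that uses PostgreSQL's JSONB containment operator `@>`. The table name, the `metadata` and `id` column names, and the `%s` placeholder style (as used by psycopg-style drivers) are assumptions for this example rather than details of R2R's schema.

```python
import json
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def metadata_filter_query(table, filters):
    """Return (sql, params) selecting rows whose metadata contains `filters`.

    The filter dict is passed as a bound parameter, never interpolated
    into the SQL text.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    sql = f"SELECT id FROM {table} WHERE metadata @> %s::jsonb"
    return sql, (json.dumps(filters, sort_keys=True),)


# Example: find rows tagged as English-language documents.
sql, params = metadata_filter_query("chunks", {"lang": "en"})
```

A row matches when its metadata holds every key and value in the filter, so extra keys stored on the row do not prevent a match.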
When setting up PostgreSQL for R2R, consider the following performance optimizations:
Indexing: Ensure proper indexing for both full-text and vector searches. R2R automatically creates necessary indexes, but you may need to optimize them based on your specific usage patterns.
Hardware: For large-scale deployments, consider using dedicated PostgreSQL instances with sufficient CPU and RAM to handle vector operations efficiently.
Vacuuming: Regular vacuuming helps maintain database performance, especially for tables with frequent updates or deletions.
Partitioning: For very large datasets, consider table partitioning to improve query performance.
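The indexing and vacuuming points can be sketched as plain SQL statements generated from Python. The example assumes a hypothetical table with an `embedding` column and uses pgvector's HNSW index type with the `vector_cosine_ops` operator class; whether R2R creates this index for you, and which distance measure suits your data, depends on your setup.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Operator classes that pgvector provides for its vector type.
_OPCLASSES = {
    "l2": "vector_l2_ops",
    "cosine": "vector_cosine_ops",
    "inner_product": "vector_ip_ops",
}


def maintenance_statements(table, vector_column="embedding", measure="cosine"):
    """Return SQL for an HNSW vector index and a vacuum with fresh statistics."""
    for name in (table, vector_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    opclass = _OPCLASSES[measure]  # KeyError for an unknown measure
    return [
        f"CREATE INDEX IF NOT EXISTS {table}_{vector_column}_hnsw_idx "
        f"ON {table} USING hnsw ({vector_column} {opclass});",
        # Reclaims dead rows and refreshes planner statistics.
        f"VACUUM (ANALYZE) {table};",
    ]
```

The operator class should match the distance operator used in queries, otherwise the planner cannot use the index for them.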
By leveraging these advanced PostgreSQL features and optimizations, R2R provides a powerful and flexible foundation for building sophisticated retrieval and search systems.