Vector store issues
Troubleshooting Guide: Vector Storage Problems in R2R
Vector storage is a crucial component in R2R (RAG to Riches) for efficient similarity searches. This guide focuses on troubleshooting common vector storage issues, particularly with Postgres and pgvector.
1. Connection Issues
Symptom: R2R can’t connect to the vector database
- Check Postgres Connection:
psql -h localhost -U your_username -d your_database
If this fails, the problem is most likely with Postgres itself (server not running, wrong host or port, or bad credentials) rather than with vector storage.
- Verify Environment Variables: Ensure these are set correctly in your R2R configuration:
POSTGRES_USER
POSTGRES_PASSWORD
POSTGRES_HOST
POSTGRES_PORT
POSTGRES_DBNAME
R2R_PROJECT_NAME
- Check Docker Network: If using Docker, ensure the R2R and Postgres containers are on the same network:
docker network inspect r2r-network
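The environment and connectivity checks above can also be scripted. The following is a minimal Python sketch, assuming the variables are read from the process environment; `missing_postgres_env` and `can_reach` are hypothetical helper names for illustration and are not part of R2R.

```python
import os
import socket

# Variable names taken from the list in this section.
REQUIRED_VARS = [
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DBNAME",
    "R2R_PROJECT_NAME",
]


def missing_postgres_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections, DNS failures, and timeouts;
        # ValueError covers a port that is not a number.
        return False


if __name__ == "__main__":
    missing = missing_postgres_env()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        host, port = os.environ["POSTGRES_HOST"], os.environ["POSTGRES_PORT"]
        print(f"{host}:{port} reachable:", can_reach(host, port))
```

A successful TCP connection only shows that something is listening on the port; the psql command above is still needed to confirm credentials and the database name.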
2. pgvector Extension Issues
Symptom: “extension pgvector does not exist” error, or a related Postgres error such as type "vector" does not exist
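- Check if pgvector is Installed: Connect to your database and run the query below. The extension is registered under the name vector, so an empty result means it is not enabled in this database: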
SELECT * FROM pg_extension WHERE extname = 'vector';
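- Install pgvector: If the query returns no rows, enable the extension. This requires the pgvector package to be installed on the Postgres server and a role that is allowed to create extensions:
CREATE EXTENSION IF NOT EXISTS vector;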
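- Verify Postgres Version: The minimum supported Postgres version depends on the pgvector release. Older releases supported Postgres 11, while recent releases require a newer major version, so compare your server version against the pgvector README for the release you install. Check your version: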
SELECT version();
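If the version check is part of an automated health check, the string returned by SELECT version() can be parsed in code. This is a minimal Python sketch, assuming the string starts with "PostgreSQL" followed by the version number; `postgres_major_version` and `meets_minimum` are hypothetical helpers, and the minimum version is a parameter you supply from the pgvector documentation.

```python
import re


def postgres_major_version(version_string):
    """Extract the leading version number from the output of SELECT version().

    For servers older than Postgres 10 (for example 9.6) this returns only
    the first component, which is enough for a "new enough" comparison.
    """
    match = re.match(r"PostgreSQL (\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1))


def meets_minimum(version_string, minimum_major):
    """Return True if the server's major version is at least minimum_major."""
    return postgres_major_version(version_string) >= minimum_major
```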
3. Vector Dimension Mismatch
Symptom: Errors when inserting vectors or running a similarity search, such as “expected 384 dimensions, not 1536”
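- Check Vector Dimensions: Verify that the dimension of the vectors you are inserting matches your schema. pgvector columns are reported as USER-DEFINED in information_schema, so query the system catalog instead. The type is shown together with its dimension, for example vector(384):
SELECT attname, format_type(atttypid, atttypmod) FROM pg_attribute WHERE attrelid = 'your_vector_table'::regclass AND attnum > 0 AND NOT attisdropped;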
- Verify R2R Configuration: Ensure the vector dimension in your R2R configuration matches both the output size of your embedding model and your database schema.
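- Recreate Table with Correct Dimensions: If dimensions are mismatched, you may need to recreate the table. This deletes every row in the table, so back it up first and re-ingest your documents afterwards. Replace 384 with the dimension of your embedding model:
DROP TABLE your_vector_table;
CREATE TABLE your_vector_table (id bigserial PRIMARY KEY, embedding vector(384));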
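A dimension mismatch is easier to diagnose when it is caught before the insert reaches Postgres. The following Python sketch validates an embedding against the expected dimension and formats it as a pgvector text literal such as [1.0,0.5,-2.0]. The helper name `to_pgvector_literal` is hypothetical, and the expected dimension is assumed to come from your own configuration.

```python
import math


def to_pgvector_literal(embedding, expected_dim):
    """Validate an embedding and format it as a pgvector text literal."""
    values = [float(x) for x in embedding]
    if len(values) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(values)}"
        )
    if not all(math.isfinite(v) for v in values):
        # pgvector rejects NaN and infinite components.
        raise ValueError("embedding contains NaN or infinite values")
    return "[" + ",".join(repr(v) for v in values) + "]"


if __name__ == "__main__":
    print(to_pgvector_literal([1, 0.5, -2], expected_dim=3))
```

Raising the error on the client side shows which document or chunk produced the wrong-sized embedding, which the database error alone does not reveal.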
4. Performance Issues
Symptom: Slow similarity searches
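- Check Index: Ensure you have an appropriate index whose operator class matches the distance operator used in your queries (vector_cosine_ops pairs with <=>). Build an ivfflat index after the table has data, since it derives its cluster centers from the existing rows:
CREATE INDEX ON your_vector_table USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);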
- Analyze Table: Run ANALYZE to update statistics:
ANALYZE your_vector_table;
- Monitor Query Performance: Use EXPLAIN ANALYZE to check query execution plans:
EXPLAIN ANALYZE SELECT * FROM your_vector_table ORDER BY embedding <=> '[your_vector]' LIMIT 10;
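- Adjust Work Memory: If queries sort large result sets or large vectors, increase work_mem. The setting applies per sort or hash operation, so set it for the current session rather than globally and size it to your available RAM. Index builds are governed by maintenance_work_mem instead:
SET work_mem = '1GB';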
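When tuning an ivfflat index, the pgvector documentation suggests starting with rows / 1000 lists for tables of up to 1M rows and sqrt(rows) above that, and querying with roughly sqrt(lists) probes. The Python sketch below turns that rule of thumb into numbers. `suggest_ivfflat_params` is a hypothetical helper, and the results are starting points to benchmark, not guaranteed optima.

```python
import math


def suggest_ivfflat_params(row_count):
    """Return starting values for ivfflat lists and probes.

    Rule of thumb: lists = rows / 1000 up to 1M rows, sqrt(rows) beyond;
    probes = sqrt(lists). Both are clamped to at least 1.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        lists = row_count // 1000
    else:
        lists = math.isqrt(row_count)
    lists = max(lists, 1)
    probes = max(round(math.sqrt(lists)), 1)
    return {"lists": lists, "probes": probes}


if __name__ == "__main__":
    params = suggest_ivfflat_params(500_000)
    print(
        "CREATE INDEX ON your_vector_table USING ivfflat "
        f"(embedding vector_cosine_ops) WITH (lists = {params['lists']});"
    )
    print(f"SET ivfflat.probes = {params['probes']};")
```

More probes improve recall at the cost of speed, so measure both with EXPLAIN ANALYZE after changing either value.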
5. Data Integrity Issues
Symptom: Unexpected search results or missing data
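- Check Vector Normalization: pgvector's cosine distance operator (<=>) does not require unit-length vectors, but if you use inner product (<#>) as a substitute for cosine similarity, ensure vectors are normalized before insertion.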
- Verify Data Insertion: Check if data is being correctly inserted:
SELECT COUNT(*) FROM your_vector_table;
- Inspect Random Samples: Look at some random entries to ensure data quality:
SELECT * FROM your_vector_table ORDER BY RANDOM() LIMIT 10;
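To act on the normalization check above, embeddings can be normalized and verified on the client before insertion. This is a minimal pure-Python sketch; the helper names and the tolerance value are assumptions chosen for illustration.

```python
import math


def l2_norm(vector):
    """Return the Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = l2_norm(vector)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def is_unit_length(vector, tolerance=1e-6):
    """Return True if the vector's length is within tolerance of 1."""
    return abs(l2_norm(vector) - 1.0) <= tolerance


if __name__ == "__main__":
    raw = [3.0, 4.0]
    print(is_unit_length(raw), is_unit_length(l2_normalize(raw)))
```

Running `is_unit_length` over the rows returned by the random sample query is a quick way to spot embeddings that were stored without normalization.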
6. Disk Space Issues
Symptom: Insertion failures or database unresponsiveness
- Check Disk Space:
df -h
- Monitor Postgres Disk Usage:
SELECT pg_size_pretty(pg_database_size('your_database'));
- Identify Large Tables:
SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;
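The disk space check can be automated so that low space is flagged before inserts start failing. The following Python sketch uses the standard library's shutil.disk_usage. The path to check and the 10% threshold are assumptions; point it at the filesystem that holds your Postgres data directory.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the fraction of the filesystem that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def is_low_on_space(path, threshold=0.10):
    """Return True if the filesystem containing path has less free space
    than the given fraction (10% by default)."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < threshold


if __name__ == "__main__":
    # "." is a placeholder; use your Postgres data directory or volume.
    print("Low on space:", is_low_on_space("."))
```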
7. Backup and Recovery
If all else fails, you may need to restore from a backup:
- Create a Backup:
pg_dump -h localhost -U your_username -d your_database > backup.sql
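- Restore from Backup: A plain SQL dump restores cleanly only into a database that does not already contain the dumped objects, so restore into an empty database:
psql -h localhost -U your_username -d your_database < backup.sql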
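Backups are more useful when each run writes to a new, timestamped file instead of overwriting backup.sql. The Python sketch below builds the same pg_dump invocation shown above with a timestamped output path. The helper names and the file naming scheme are assumptions, and the password is expected to come from PGPASSWORD or a .pgpass file.

```python
import subprocess
from datetime import datetime, timezone


def backup_filename(database, now=None):
    """Return a timestamped file name such as mydb_20240102T030405Z.sql."""
    now = datetime.now(timezone.utc) if now is None else now
    return f"{database}_{now.strftime('%Y%m%dT%H%M%SZ')}.sql"


def pg_dump_command(host, user, database, output_path):
    """Build the pg_dump argument list; -f writes the dump to output_path."""
    return ["pg_dump", "-h", host, "-U", user, "-d", database, "-f", output_path]


def run_backup(host, user, database):
    """Run pg_dump and return the path of the dump file.

    Raises subprocess.CalledProcessError if pg_dump exits with an error.
    """
    output_path = backup_filename(database)
    subprocess.run(pg_dump_command(host, user, database, output_path), check=True)
    return output_path
```

Calling `run_backup("localhost", "your_username", "your_database")` before a destructive step such as dropping a table gives you a dump to restore from if the change goes wrong.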
Getting Further Help
If these steps don’t resolve your issue:
- Check R2R logs for more detailed error messages.
- Consult the pgvector documentation for advanced troubleshooting.
- Reach out to the R2R community or support channels with detailed information about your setup and the steps you’ve tried.
Remember to always back up your data before making significant changes to your database or vector storage configuration.