# RAG Troubleshooting ## Auto-Triage: Run First Start with this diagnostic sweep: ```bash echo "=== CONTAINERS ===" && docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null | grep -E "(rag|elasticsearch|milvus|seaweedfs|nim|ingest|redis|etcd|embedding|ranking)" | head -25; echo "=== HEALTH ===" && curl -s http://localhost:8081/v1/health?check_dependencies=true 2>/dev/null || echo "RAG_UNREACHABLE"; curl -s http://localhost:8082/v1/health?check_dependencies=true 2>/dev/null || echo "INGESTOR_UNREACHABLE"; echo "=== LOGS ===" && for svc in rag-server ingestor-server nim-llm-ms nemotron-vlm-embedding-ms nemotron-embedding-ms nemotron-ranking-ms elasticsearch seaweedfs; do echo "--- $svc ---"; docker logs --tail 20 "$svc" 2>/dev/null | grep -iE "(error|fail|exception|timeout|oom|permission)" || echo "OK"; done; echo "=== GPU ===" && nvidia-smi 2>/dev/null | head -20 || echo "NO_GPU"; echo "=== DISK ===" && df -h / | tail -1; echo "=== DOCKER_DISK ===" && docker system df 2>/dev/null; echo "=== VOLUMES ===" && docker volume ls --filter "name=^rag-vol-" 2>/dev/null; echo "=== K8S ===" && kubectl get pods -n rag 2>/dev/null | head -20 || echo "NOT_K8S" ``` Analyze all output, then diagnose and fix. If Auto-Triage doesn't reveal the cause, dig deeper into the specific failing service's logs (`docker logs --tail 100` or `kubectl logs -n rag --tail 100`). Confirm with the user before deleting data (volumes, collections, model cache), changing deployment mode, or modifying API keys. ## Source Documentation for Detailed Diagnosis Read these docs to find specific issue descriptions, causes, and fixes: - `docs/troubleshooting.md` — primary reference: all common issues with detailed symptoms/fixes - `docs/debugging.md` — Pipeline debugging: monitoring deployment, verifying endpoints, tracing requests - `docs/service-port-gpu-reference.md` — Complete port/GPU mapping table for all services ## Expected Deployment Times If user reports "deployment is taking too long," compare against these baselines: | Mode | First Run | Subsequent | |------|-----------|------------| | Docker (self-hosted) | 15--30 min (model downloads) | 2--5 min | | Docker (NVIDIA-hosted) | 5--10 min (no model downloads) | 1--2 min | | K8s/Helm | 60--70 min (NIM cache 40--50 min + init 10--15 min + pod startup 5--10 min) | 10--15 min | If deployment exceeds these times, check NIM container logs: `docker logs nim-llm-ms --tail 50` and model cache disk usage: `watch -n 10 'du -sh ~/.cache/model-cache/'`. ## Service Health Endpoints Read `docs/service-port-gpu-reference.md` for the complete port/GPU mapping. Quick check: | Service | URL | Expected | |---------|-----|----------| | RAG Server | `http://localhost:8081/v1/health?check_dependencies=true` | `{"status":"healthy"}` | | Ingestor | `http://localhost:8082/v1/health?check_dependencies=true` | `{"status":"healthy"}` | | NV-Ingest | `http://localhost:7670/v1/health/ready` | 200 OK | | VLM Embedding NIM (default) | `http://localhost:9081/v1/health/ready` | 200 OK | | LLM NIM | `http://localhost:8999/v1/health/ready` | 200 OK | | Ranking NIM | `http://localhost:1976/v1/health/ready` | 200 OK | | Elasticsearch | `http://localhost:9200/_cluster/health` | `green` or `yellow` | ## Kubernetes Monitoring Commands ```bash kubectl get nimcache -n rag kubectl get pods -n rag kubectl logs -f -n rag kubectl get pvc -n rag kubectl get events -n rag --sort-by='.lastTimestamp' ``` Pods in `ContainerCreating` or `Init` state during model download is expected. Use `kubectl get nimcache -n rag -w` to watch download progress. ## Enable Debug Logging ```bash export LOGLEVEL=DEBUG docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d --no-deps ingestor-server docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d --no-deps rag-server ``` --- ## Symptom-to-Fix Quick Index Match the symptom from Auto-Triage output, then read `docs/troubleshooting.md` for the detailed fix. For pipeline debugging steps, read `docs/debugging.md`. | Symptom | Category | Quick Fix | |---------|----------|-----------| | NIM container stuck at `(health: starting)` >30min | NIM Startup | Check GPU memory, NGC auth, disk space. First-run model downloads are slow — wait and monitor cache size. | | Elasticsearch unhealthy / search returns nothing | Elasticsearch | Restart vectordb compose. Check port 9200, disk, credentials, and `rag-vol-elasticsearch`. | | Document upload fails / ingestor health check fails | NV-Ingest | Check Redis, OCR NIMs. Rate limit (429) → reduce batch vars. Large PDFs → reduce batch size. | | Chat returns errors / /generate fails | RAG Server | Check LLM NIM health, embedding NIM, cloud API key. Verify `APP_LLM_MODELNAME` matches deployed NIM. | | DNS resolution failed for `:` | Networking | Service container not running. Check `docker ps`, restart missing service. | | Port already in use | Networking | `lsof -i :` to find conflicting process. See port table above. | | GPU out of memory / `torch.OutOfMemoryError` | GPU | Kill other GPU processes, use `--profile rag` for fewer NIMs, or set correct `NIM_MODEL_PROFILE`. | | `nvidia-container-cli: unknown device` | GPU | GPU ID exceeds available GPUs. Run `nvidia-smi -L`, adjust `*_GPU_ID` vars to valid IDs. | | Disk full / insufficient space | Disk | `docker system prune -f`, remove unused images, check model cache size. | | `no configuration file provided: not found` | Docker Compose | Run from the repo root directory. | | `too many open files` | Docker Compose | Set `LimitNOFILE=65536` in containerd override, restart containerd. | | PVC stuck in Pending | Helm | Create missing StorageClass or update PVC. | | `ProvisioningFailed` access mode mismatch | Helm | Patch NIMCache to `ReadWriteOnce`. | | Ingestor OOMKilled | Helm | Increase memory limits in values.yaml. Set `SUMMARY_MAX_PARALLELIZATION=1`. | | Elasticsearch timeout during ingestion | Elasticsearch | Increase `ES_REQUEST_TIMEOUT` (default 600s). | | Need to inspect or reset persisted Docker data | Volumes | Use `docker volume ls --filter "name=^rag-vol-"`; see `docs/troubleshooting.md#manage-persistent-data-volumes`. | | Hallucination / out-of-context responses | Quality | Add missing-info handling to prompt in `prompt.yaml`. | | Embedding dimensions mismatch | Models | Set `APP_EMBEDDINGS_DIMENSIONS` to match model output. Re-ingest. | | Hybrid/dense search type mismatch | Search | Align `APP_VECTORSTORE_SEARCHTYPE` on ingestor and rag-server. Re-ingest. | | Confidence threshold filtering all results | Search | Lower `RERANKER_SCORE_THRESHOLD` (range 0.0–1.0, default 0.0). | | OCR not starting / connection errors | OCR | Check GPU memory, NGC auth. Verify `OCR_GRPC_ENDPOINT`/`OCR_HTTP_ENDPOINT` match running service. | | NVIDIA API credits exhausted | Cloud | Contact NVIDIA representative for additional credits. | | Image-only PDFs not ingesting | Ingestion | Enable `APP_NVINGEST_EXTRACTINFOGRAPHICS`. Consider image captioning. | --- ## Troubleshooting Checklists ### Ingestion Checklist - [ ] All required containers running (ingestor-server, nv-ingest-ms-runtime, milvus, redis) - [ ] Vector database accessible (`curl http://localhost:9200/_cluster/health` for default Elasticsearch, or `curl http://localhost:9091/healthz` for Milvus) - [ ] Embedding service healthy (`curl http://localhost:9081/v1/health/ready` for default VLM embedding, or `curl http://localhost:9080/v1/health/ready` for `text-embed`) - [ ] File format supported and size <= 400 MB - [ ] Sufficient disk space (`df -h /`) - [ ] GPU resources available (`nvidia-smi`) ### Retrieval Checklist - [ ] RAG server running and healthy - [ ] LLM service accessible (`curl http://localhost:8999/v1/health/ready`) - [ ] Vector database contains data (collection exists with documents) - [ ] Collection name is correct - [ ] Query format is valid ### Quality Checklist - [ ] Reranker is enabled and healthy - [ ] Top-K values are appropriate - [ ] Collection has sufficient relevant data - [ ] Query rewriting configured correctly - [ ] Prompt template appropriate for use case --- ## Full Reset Destroys all data (volumes, images, caches). Confirm with the user before running. If nothing else works and the user confirms: ```bash cd "$(git rev-parse --show-toplevel)" docker compose -f deploy/compose/docker-compose-nemo-guardrails.yaml down 2>/dev/null docker compose -f deploy/compose/observability.yaml down 2>/dev/null docker compose -f deploy/compose/docker-compose-rag-server.yaml down 2>/dev/null docker compose -f deploy/compose/docker-compose-ingestor-server.yaml down 2>/dev/null docker compose -f deploy/compose/vectordb.yaml down 2>/dev/null docker compose -f deploy/compose/nims.yaml down 2>/dev/null docker volume ls -q --filter "name=^rag-vol-" | xargs -r docker volume rm docker system prune -af ``` Then deploy fresh using the deploy workflow.