# Debugging and Diagnostics
This guide provides comprehensive debugging techniques, log analysis methods, and performance profiling tools for Open Notebook.
## Log Analysis

### Understanding Log Levels

Open Notebook uses structured logging with the following levels:

- **DEBUG**: Detailed information for debugging
- **INFO**: General information about system operation
- **WARNING**: Potentially problematic situations
- **ERROR**: Error events that might still allow the application to continue
- **CRITICAL**: Serious errors that may cause the application to abort
### Accessing Logs

#### Docker Deployment

```bash
# View all service logs
docker compose logs

# Follow logs in real-time
docker compose logs -f

# View logs for a specific service
docker compose logs surrealdb
docker compose logs open_notebook

# View the last 100 lines
docker compose logs --tail=100

# View logs with timestamps
docker compose logs -t
```
#### Source Installation

```bash
# API logs (if running in background)
tail -f api.log

# Worker logs
tail -f worker.log

# Database logs
docker compose logs surrealdb

# Next.js logs go to stdout; run the frontend in the foreground to see them
```
### Log Configuration

#### Enable Debug Logging

```bash
# Add to .env or docker.env
LOG_LEVEL=DEBUG

# Restart services to pick up the change
docker compose restart
```

#### Custom Log Configuration

```python
# For development, configure Python's standard logging directly
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
```
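To confirm a level change has taken effect, you can ask a logger which levels it will actually emit. A minimal sketch (the logger name `open_notebook` is illustrative, not the application's actual logger name):

```python
import logging

# Assume the effective configuration is LOG_LEVEL=WARNING
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("open_notebook")  # illustrative name

# Messages below the configured level are filtered out
print(logger.isEnabledFor(logging.DEBUG))    # False
print(logger.isEnabledFor(logging.WARNING))  # True
```

This is handy when a `LOG_LEVEL` change appears not to work: if `isEnabledFor` disagrees with your expectation, the configuration was not picked up.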
### Common Log Messages

#### Successful Operations

```
INFO - Starting Open Notebook services
INFO - Database connection established
INFO - API server started on port 5055
INFO - React frontend started on port 8502
INFO - Background worker started
INFO - Model configuration loaded
INFO - Source processed successfully
```

#### Warning Messages

```
WARNING - Rate limit approaching for provider: openai
WARNING - Large file upload detected: 50MB
WARNING - Model response truncated due to length
WARNING - Database connection retrying
WARNING - Cache miss for embedding
```

#### Error Messages

```
ERROR - Failed to connect to database: Connection refused
ERROR - API key invalid for provider: openai
ERROR - Model not found: gpt-4-invalid
ERROR - File processing failed: Unsupported format
ERROR - Background job failed: Timeout
ERROR - Memory limit exceeded
```
## Error Diagnosis

### Database Connection Errors

#### Symptoms

```
ERROR - Database connection failed
ERROR - Connection refused at localhost:8000
ERROR - Authentication failed for SurrealDB
```

#### Diagnosis Steps

1. Check SurrealDB status:

   ```bash
   docker compose ps surrealdb
   ```

2. Verify connection settings:

   ```bash
   # Check environment variables
   echo $SURREAL_URL
   echo $SURREAL_USER
   echo $SURREAL_PASSWORD
   ```

3. Test a direct connection:

   ```bash
   curl http://localhost:8000/health
   ```

4. Check database logs:

   ```bash
   docker compose logs surrealdb
   ```

#### Common Solutions

- Restart the SurrealDB container
- Check port availability
- Verify credentials
- Check file permissions for the data directory
### AI Provider Errors

#### API Key Issues

```
ERROR - Invalid API key for provider: openai
ERROR - Authentication failed: API key not found
ERROR - Insufficient credits for provider: anthropic
```

Diagnosis:

```bash
# Test OpenAI key
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

# Test Anthropic key (the API also requires a version header)
curl -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models
```
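The same checks can be scripted. This sketch only builds the auth headers each provider expects, mirroring the curl examples above; the function name is our own, and no request is sent until you attach these headers to one yourself:

```python
def build_auth_headers(provider: str, api_key: str) -> dict:
    """Return the auth header each provider expects (per the curl examples)."""
    if provider == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if provider == "anthropic":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown provider: {provider}")

print(build_auth_headers("openai", "sk-test"))
# {'Authorization': 'Bearer sk-test'}
```

Using the wrong header style (e.g. `Authorization: Bearer` against Anthropic) produces exactly the "Authentication failed" errors listed above, so this distinction is worth checking first.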
#### Model Not Found

```
ERROR - Model not found: gpt-4-invalid
ERROR - Model not available for your account
```

Diagnosis:

- Check the model name spelling
- Verify model availability for your account
- Check the provider documentation for exact model names
#### Rate Limiting

```
ERROR - Rate limit exceeded for provider: openai
ERROR - Too many requests, please retry later
```

Diagnosis:

- Check rate limits in the provider dashboard
- Monitor request frequency
- Implement retry logic with exponential backoff
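The "retry with backoff" advice can be sketched as follows. `RuntimeError` stands in for whatever rate-limit exception your provider client actually raises; the function and parameter names are our own:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter.

    RuntimeError is a stand-in for a provider rate-limit error.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # Delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

With `base_delay=1.0` the waits grow roughly 1s, 2s, 4s, 8s, which is usually enough to ride out a per-minute rate limit window.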
### File Processing Errors

#### Upload Issues

```
ERROR - File upload failed: File too large
ERROR - Unsupported file type: .xyz
ERROR - File processing timeout
```

Diagnosis:

1. Check the file size:

   ```bash
   ls -lh /path/to/file
   ```

2. Verify the file type:

   ```bash
   file /path/to/file
   ```

3. Test with a smaller file:
   - Use a minimal test file
   - Gradually increase complexity
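The size and type checks above can be combined into one helper. The 50 MB threshold is an assumption taken from the "Large file upload" warning earlier in this guide, not a documented limit:

```python
import mimetypes
import os

def inspect_upload(path, max_mb=50):
    """Report file size and guessed MIME type before uploading.

    The 50 MB default mirrors the warning message in this guide;
    the real limit depends on your deployment.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    mime, _ = mimetypes.guess_type(path)
    return {
        "size_mb": round(size_mb, 2),
        "mime": mime or "unknown",
        "too_large": size_mb > max_mb,
    }
```

A `mime` of `unknown` is a good hint that the upload will fail with "Unsupported file type" before you even try it.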
#### Processing Failures

```
ERROR - PDF extraction failed: Encrypted file
ERROR - Audio transcription failed: Unsupported codec
ERROR - Image OCR failed: Invalid image format
```

Diagnosis:

- Check file integrity
- Verify file format compliance
- Test with known-good files
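For the "Encrypted file" case specifically, two quick stdlib-only heuristics help before reaching for a full PDF library: valid PDFs start with the `%PDF-` magic bytes, and encrypted ones usually carry an `/Encrypt` entry. This is a rough pre-check, not a parser:

```python
def pdf_quick_check(path):
    """Heuristic PDF check: magic bytes plus a scan for /Encrypt.

    A hit on /Encrypt strongly suggests a password-protected file,
    but only a real PDF parser can confirm it.
    """
    with open(path, "rb") as f:
        data = f.read()
    return {
        "looks_like_pdf": data.startswith(b"%PDF-"),
        "maybe_encrypted": b"/Encrypt" in data,
    }
```

If `maybe_encrypted` is true, decrypt the file (e.g. with its owner password) before uploading; Open Notebook's extraction will otherwise fail as shown above.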
### Memory and Performance Issues

#### Out of Memory

```
ERROR - Out of memory: Cannot allocate
ERROR - Process killed due to memory limit
ERROR - Docker container OOMKilled
```

Diagnosis:

```bash
# Check per-container memory usage
docker stats

# Check system memory
free -h

# Check Docker memory limits
docker system info | grep Memory
```

#### Performance Degradation

```
WARNING - Response time exceeded threshold: 30s
WARNING - High CPU usage detected: 95%
WARNING - Database query slow: 5s
```

Diagnosis:

```bash
# Monitor resources
htop
iostat -x 1

# Check database performance
docker compose logs surrealdb | grep -i slow
```
## Performance Profiling

### System Resource Monitoring

#### Real-time Monitoring

```bash
# Docker container resources
docker stats

# System resources
htop

# Disk I/O
iostat -x 1

# Network usage per process
nethogs
```

#### Historical Analysis

```bash
# Container resource history
docker logs --since="1h" container_name | grep -i memory

# System logs
journalctl -u docker --since="1 hour ago"
```
### Application Performance

#### Response Time Analysis

```bash
# Measure API response times
time curl http://localhost:5055/api/notebooks

# Measure with detailed timing output (discard the response body)
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:5055/api/notebooks
```

Create `curl-format.txt`:

```
     time_namelookup:  %{time_namelookup}\n
        time_connect:  %{time_connect}\n
     time_appconnect:  %{time_appconnect}\n
    time_pretransfer:  %{time_pretransfer}\n
       time_redirect:  %{time_redirect}\n
  time_starttransfer:  %{time_starttransfer}\n
                     ----------\n
          time_total:  %{time_total}\n
```
#### Database Performance

```bash
# Check database query performance
docker compose logs surrealdb | grep -i "slow\|performance\|query"

# Monitor database processes
docker compose exec surrealdb ps aux
```

#### Memory Profiling

```python
# Trace Python memory allocations with the standard library
import tracemalloc

tracemalloc.start()

# ... your code here ...

current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current / 1024 / 1024:.1f} MB")
print(f"Peak memory usage: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()
```
### AI Provider Performance

#### Response Time Monitoring

```bash
# Monitor AI provider response times
grep -r "provider.*response_time" logs/

# Check for timeouts
grep -r "timeout\|Timeout" logs/
```

#### Usage Analytics

```python
# Track AI call latency in your own monitoring code
import time

start_time = time.time()
# ... AI API call here ...
elapsed = time.time() - start_time
print(f"AI response time: {elapsed:.2f}s")
```
## Support Information Gathering

### System Information Collection

#### Basic System Info

```bash
# System details
uname -a
lsb_release -a   # Linux
sw_vers          # macOS

# Docker information
docker version
docker compose version
docker system info
```

#### Open Notebook Information

```bash
# Version information
grep version pyproject.toml

# Service status
make status

# Environment check (without sensitive info)
env | grep -E "(SURREAL|LOG|DEBUG)" | grep -v "PASSWORD\|KEY"
```
### Log Collection for Support

#### Comprehensive Log Collection

```bash
#!/bin/bash
# collect_logs.sh

echo "Collecting Open Notebook diagnostic information..."

# Create the diagnostic directory once and stay in the project root,
# so `make` and `docker compose` can still find their config files
dir="diagnostic_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$dir"

# System information
echo "Collecting system information..."
uname -a > "$dir/system_info.txt"
docker version >> "$dir/system_info.txt"
docker compose version >> "$dir/system_info.txt"

# Service status
echo "Collecting service status..."
make status > "$dir/service_status.txt"
docker compose ps >> "$dir/service_status.txt"

# Logs
echo "Collecting logs..."
docker compose logs --tail=500 > "$dir/docker_logs.txt"
docker compose logs surrealdb --tail=200 > "$dir/surrealdb_logs.txt"

# Configuration (sanitized)
echo "Collecting configuration..."
env | grep -E "(SURREAL|LOG|DEBUG)" | grep -v "PASSWORD\|KEY" > "$dir/environment.txt"

# Resource usage
echo "Collecting resource information..."
docker stats --no-stream > "$dir/resource_usage.txt"
df -h > "$dir/disk_usage.txt"
free -h > "$dir/memory_info.txt"

echo "Diagnostic collection complete!"
echo "Please compress and share the $dir directory"
```
#### Sanitizing Logs

```bash
# Remove sensitive information from logs before sharing
sed -i 's/sk-[a-zA-Z0-9]*/[REDACTED_API_KEY]/g' logs.txt
sed -i 's/password=[^[:space:]]*/password=[REDACTED]/g' logs.txt
```
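If you prefer Python, or need to sanitize on a system without `sed`, the same redactions can be expressed with `re`:

```python
import re

def sanitize(text: str) -> str:
    """Redact API keys and passwords, mirroring the sed commands above."""
    text = re.sub(r"sk-[a-zA-Z0-9]+", "[REDACTED_API_KEY]", text)
    text = re.sub(r"password=\S+", "password=[REDACTED]", text)
    return text

print(sanitize("key=sk-abc123 password=hunter2"))
# key=[REDACTED_API_KEY] password=[REDACTED]
```

These patterns only cover OpenAI-style `sk-` keys and `password=` pairs; review the output manually before sharing, since other providers use different key formats.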
### Creating Reproduction Cases

#### Minimal Reproduction

1. Start with a clean environment:

   ```bash
   # Fresh installation
   rm -rf surreal_data/ notebook_data/
   docker compose down
   docker compose up -d
   ```

2. Document exact steps:
   - Each click or command
   - The exact file used
   - Configuration settings
   - Expected vs actual behavior

3. Capture evidence:
   - Screenshots of errors
   - Full error messages
   - Log excerpts
   - System state
#### Test Case Template

```markdown
## Bug Report

### Environment
- OS: [e.g., Ubuntu 22.04]
- Docker version: [e.g., 24.0.7]
- Open Notebook version: [e.g., 1.0.0]
- Installation method: [Docker/Source]

### Steps to Reproduce
1. Start Open Notebook
2. Create new notebook named "Test"
3. Add text source: "Hello world"
4. Navigate to Chat
5. Ask: "What is this about?"

### Expected Behavior
Should receive a response about the text content

### Actual Behavior
Error: "Model not found"

### Logs
ERROR - Model not found: gpt-4-invalid

### Additional Context
- Using OpenAI provider
- gpt-5-mini model configured
- First-time setup
```
## Advanced Debugging

### Database Debugging

#### Direct Database Access

```bash
# Connect to SurrealDB directly
docker compose exec surrealdb /surreal sql \
  --conn http://localhost:8000 \
  --user root \
  --pass root \
  --ns open_notebook \
  --db production
```

#### Query Analysis

```sql
-- Check table contents
SELECT * FROM notebook LIMIT 10;

-- Check relationships
SELECT * FROM source WHERE notebook_id = notebook:abc123;

-- Performance analysis
SELECT count() FROM source GROUP BY notebook_id;
```
### Network Debugging

#### Service Communication

```bash
# Test the internal Docker network
docker compose exec open_notebook ping surrealdb

# Test external connectivity
docker compose exec open_notebook curl -I https://api.openai.com

# Check port bindings
netstat -tulpn | grep -E "(8000|5055|8502)"
```

#### DNS Resolution

```bash
# Check DNS from inside the container
docker compose exec open_notebook nslookup api.openai.com

# Check /etc/hosts
docker compose exec open_notebook cat /etc/hosts
```
### Performance Debugging

#### CPU Profiling

```python
# Profile a function with the standard library
import cProfile
import pstats

cProfile.run('your_function()', 'profile_stats')

# Analyze the results
p = pstats.Stats('profile_stats')
p.sort_stats('cumulative').print_stats(10)
```

#### Memory Leak Detection

```python
# Track memory usage over time
import os

import psutil

def log_memory_usage():
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage: {memory_mb:.1f} MB")

# Call periodically, e.g. from a background thread or scheduler
log_memory_usage()
```
## Monitoring and Alerting

### Health Checks

#### Service Health Endpoints

```bash
# Check all health endpoints
curl -f http://localhost:8000/health   # SurrealDB
curl -f http://localhost:5055/health   # API
curl -f http://localhost:8502/healthz  # Next.js
```
#### Automated Health Monitoring

```bash
#!/bin/bash
# health_check.sh

# Map each port to its health endpoint path
# (the frontend uses /healthz, the others /health)
declare -A endpoints=(
  [8000]="/health"   # SurrealDB
  [5055]="/health"   # API
  [8502]="/healthz"  # Next.js
)

for port in "${!endpoints[@]}"; do
  if curl -sf "http://localhost:$port${endpoints[$port]}" >/dev/null; then
    echo "✅ Service on port $port is healthy"
  else
    echo "❌ Service on port $port is unhealthy"
  fi
done
```
### Log Monitoring

#### Real-time Error Monitoring

```bash
# Monitor for errors in real-time
docker compose logs -f | grep -i error

# Monitor specific patterns
docker compose logs -f | grep -E "(ERROR|CRITICAL|timeout)"
```

#### Log Analysis Scripts

```bash
#!/bin/bash
# analyze_logs.sh

echo "Error count (last hour):"
docker compose logs --since="1h" | grep -c "ERROR"

echo "Top error messages:"
docker compose logs --since="1h" | grep "ERROR" | \
  cut -d':' -f4- | sort | uniq -c | sort -nr | head -10

echo "Provider issues:"
docker compose logs --since="1h" | grep -i "provider.*error"
```
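For a finer-grained summary than `grep -c`, a short script can group errors by message. This sketch assumes the `LEVEL - message` log format shown earlier in this guide:

```python
from collections import Counter

def summarize_errors(lines):
    """Count ERROR lines grouped by message (assumes 'ERROR - msg' format)."""
    counter = Counter()
    for line in lines:
        if "ERROR - " in line:
            msg = line.split("ERROR - ", 1)[1].strip()
            counter[msg] += 1
    return counter.most_common()

logs = [
    "ERROR - Model not found: gpt-4-invalid",
    "ERROR - Model not found: gpt-4-invalid",
    "INFO - Source processed successfully",
    "ERROR - Background job failed: Timeout",
]
print(summarize_errors(logs))
# [('Model not found: gpt-4-invalid', 2), ('Background job failed: Timeout', 1)]
```

Feed it real logs with something like `summarize_errors(open("docker_logs.txt"))`; the most frequent message is usually the best place to start debugging.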
## Best Practices for Debugging

### Systematic Approach

1. Reproduce the issue consistently
2. Isolate the problem to specific components
3. Check recent changes that might have caused the issue
4. Gather evidence through logs and monitoring
5. Test hypotheses systematically
6. Document findings for future reference

### Debugging Tools Checklist

- System resource monitoring (htop, docker stats)
- Log aggregation and analysis
- Network connectivity testing
- Database query analysis
- API response time measurement
- Memory usage tracking
- Error rate monitoring

### When to Seek Help

- The issue persists after following the troubleshooting guides
- The problem affects multiple users or systems
- Security-related concerns
- Performance degradation without a clear cause
- Data integrity issues
This debugging guide is continuously updated based on real-world troubleshooting experiences. For additional support, join our Discord community or create a GitHub issue with your diagnostic information.