# Debugging and Diagnostics
This guide provides comprehensive debugging techniques, log analysis methods, and performance profiling tools for Open Notebook.
## Log Analysis

### Understanding Log Levels

Open Notebook uses structured logging with the following levels:

- **DEBUG**: Detailed information for debugging
- **INFO**: General information about system operation
- **WARNING**: Potentially problematic situations
- **ERROR**: Error events that might still allow the application to continue
- **CRITICAL**: Serious errors that may cause the application to abort
### Accessing Logs

#### Docker Deployment

```bash
# View all service logs
docker compose logs

# Follow logs in real-time
docker compose logs -f

# View logs for a specific service
docker compose logs surrealdb
docker compose logs open_notebook

# View the last 100 lines
docker compose logs --tail=100

# View logs with timestamps
docker compose logs -t
```
#### Source Installation

```bash
# API logs (if running in background)
tail -f api.log

# Worker logs
tail -f worker.log

# Database logs
docker compose logs surrealdb

# Next.js logs go to stdout; run the frontend in the foreground to see them
```
### Log Configuration

#### Enable Debug Logging

```bash
# Add to .env or docker.env
LOG_LEVEL=DEBUG

# Restart services to pick up the change
docker compose restart
```

#### Custom Log Configuration

```python
# For development, configure Python's standard logging directly
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
```
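To confirm a level change has taken effect, you can ask a logger which levels it will actually emit. A minimal sketch (the logger name `open_notebook` is illustrative, not the application's actual logger name):

```python
import logging

# Assume the effective configuration is LOG_LEVEL=WARNING
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("open_notebook")  # illustrative name

# Messages below the configured level are filtered out
print(logger.isEnabledFor(logging.DEBUG))    # False
print(logger.isEnabledFor(logging.WARNING))  # True
```

This is handy when a `LOG_LEVEL` change appears not to work: if `isEnabledFor` disagrees with your expectation, the configuration was not picked up.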
### Common Log Messages

#### Successful Operations

```
INFO - Starting Open Notebook services
INFO - Database connection established
INFO - API server started on port 5055
INFO - React frontend started on port 8502
INFO - Background worker started
INFO - Model configuration loaded
INFO - Source processed successfully
```

#### Warning Messages

```
WARNING - Rate limit approaching for provider: openai
WARNING - Large file upload detected: 50MB
WARNING - Model response truncated due to length
WARNING - Database connection retrying
WARNING - Cache miss for embedding
```

#### Error Messages

```
ERROR - Failed to connect to database: Connection refused
ERROR - API key invalid for provider: openai
ERROR - Model not found: gpt-4-invalid
ERROR - File processing failed: Unsupported format
ERROR - Background job failed: Timeout
ERROR - Memory limit exceeded
```
## Error Diagnosis

### Database Connection Errors

#### Symptoms

```
ERROR - Database connection failed
ERROR - Connection refused at localhost:8000
ERROR - Authentication failed for SurrealDB
```

#### Diagnosis Steps

1. Check SurrealDB status:

   ```bash
   docker compose ps surrealdb
   ```

2. Verify connection settings:

   ```bash
   # Check environment variables
   echo $SURREAL_URL
   echo $SURREAL_USER
   echo $SURREAL_PASSWORD
   ```

3. Test a direct connection:

   ```bash
   curl http://localhost:8000/health
   ```

4. Check database logs:

   ```bash
   docker compose logs surrealdb
   ```

#### Common Solutions

- Restart the SurrealDB container
- Check port availability
- Verify credentials
- Check file permissions for the data directory
### AI Provider Errors

#### API Key Issues

```
ERROR - Invalid API key for provider: openai
ERROR - Authentication failed: API key not found
ERROR - Insufficient credits for provider: anthropic
```

Diagnosis:

```bash
# Test OpenAI key
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

# Test Anthropic key (the API also requires a version header)
curl -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models
```
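The same checks can be scripted. This sketch only builds the auth headers each provider expects, mirroring the curl examples above; the function name is our own, and no request is sent until you attach these headers to one yourself:

```python
def build_auth_headers(provider: str, api_key: str) -> dict:
    """Return the auth header each provider expects (per the curl examples)."""
    if provider == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if provider == "anthropic":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown provider: {provider}")

print(build_auth_headers("openai", "sk-test"))
# {'Authorization': 'Bearer sk-test'}
```

Using the wrong header style (e.g. `Authorization: Bearer` against Anthropic) produces exactly the "Authentication failed" errors listed above, so this distinction is worth checking first.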
#### Model Not Found

```
ERROR - Model not found: gpt-4-invalid
ERROR - Model not available for your account
```

Diagnosis:

- Check the model name spelling
- Verify model availability for your account
- Check the provider documentation for exact model names
#### Rate Limiting

```
ERROR - Rate limit exceeded for provider: openai
ERROR - Too many requests, please retry later
```

Diagnosis:

- Check rate limits in the provider dashboard
- Monitor request frequency
- Implement retry logic with exponential backoff
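The "retry with backoff" advice can be sketched as follows. `RuntimeError` stands in for whatever rate-limit exception your provider client actually raises; the function and parameter names are our own:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter.

    RuntimeError is a stand-in for a provider rate-limit error.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # Delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

With `base_delay=1.0` the waits grow roughly 1s, 2s, 4s, 8s, which is usually enough to ride out a per-minute rate limit window.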
### File Processing Errors

#### Upload Issues

```
ERROR - File upload failed: File too large
ERROR - Unsupported file type: .xyz
ERROR - File processing timeout
```

Diagnosis:

1. Check the file size:

   ```bash
   ls -lh /path/to/file
   ```

2. Verify the file type:

   ```bash
   file /path/to/file
   ```

3. Test with a smaller file:
   - Use a minimal test file
   - Gradually increase complexity
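The size and type checks above can be combined into one helper. The 50 MB threshold is an assumption taken from the "Large file upload" warning earlier in this guide, not a documented limit:

```python
import mimetypes
import os

def inspect_upload(path, max_mb=50):
    """Report file size and guessed MIME type before uploading.

    The 50 MB default mirrors the warning message in this guide;
    the real limit depends on your deployment.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    mime, _ = mimetypes.guess_type(path)
    return {
        "size_mb": round(size_mb, 2),
        "mime": mime or "unknown",
        "too_large": size_mb > max_mb,
    }
```

A `mime` of `unknown` is a good hint that the upload will fail with "Unsupported file type" before you even try it.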
#### Processing Failures

```
ERROR - PDF extraction failed: Encrypted file
ERROR - Audio transcription failed: Unsupported codec
ERROR - Image OCR failed: Invalid image format
```

Diagnosis:

- Check file integrity
- Verify file format compliance
- Test with known-good files
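For the "Encrypted file" case specifically, two quick stdlib-only heuristics help before reaching for a full PDF library: valid PDFs start with the `%PDF-` magic bytes, and encrypted ones usually carry an `/Encrypt` entry. This is a rough pre-check, not a parser:

```python
def pdf_quick_check(path):
    """Heuristic PDF check: magic bytes plus a scan for /Encrypt.

    A hit on /Encrypt strongly suggests a password-protected file,
    but only a real PDF parser can confirm it.
    """
    with open(path, "rb") as f:
        data = f.read()
    return {
        "looks_like_pdf": data.startswith(b"%PDF-"),
        "maybe_encrypted": b"/Encrypt" in data,
    }
```

If `maybe_encrypted` is true, decrypt the file (e.g. with its owner password) before uploading; Open Notebook's extraction will otherwise fail as shown above.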
### Memory and Performance Issues

#### Out of Memory

```
ERROR - Out of memory: Cannot allocate
ERROR - Process killed due to memory limit
ERROR - Docker container OOMKilled
```

Diagnosis:

```bash
# Check per-container memory usage
docker stats

# Check system memory
free -h

# Check Docker memory limits
docker system info | grep Memory
```

#### Performance Degradation

```
WARNING - Response time exceeded threshold: 30s
WARNING - High CPU usage detected: 95%
WARNING - Database query slow: 5s
```

Diagnosis:

```bash
# Monitor resources
htop
iostat -x 1

# Check database performance
docker compose logs surrealdb | grep -i slow
```
## Performance Profiling

### System Resource Monitoring

#### Real-time Monitoring

```bash
# Docker container resources
docker stats

# System resources
htop

# Disk I/O
iostat -x 1

# Network usage per process
nethogs
```

#### Historical Analysis

```bash
# Container resource history
docker logs --since="1h" container_name | grep -i memory

# System logs
journalctl -u docker --since="1 hour ago"
```
### Application Performance

#### Response Time Analysis

```bash
# Measure API response times
time curl http://localhost:5055/api/notebooks

# Measure with detailed timing output (discard the response body)
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:5055/api/notebooks
```

Create `curl-format.txt`:

```
     time_namelookup:  %{time_namelookup}\n
        time_connect:  %{time_connect}\n
     time_appconnect:  %{time_appconnect}\n
    time_pretransfer:  %{time_pretransfer}\n
       time_redirect:  %{time_redirect}\n
  time_starttransfer:  %{time_starttransfer}\n
                     ----------\n
          time_total:  %{time_total}\n
```
#### Database Performance

```bash
# Check database query performance
docker compose logs surrealdb | grep -i "slow\|performance\|query"

# Monitor database processes
docker compose exec surrealdb ps aux
```

#### Memory Profiling

```python
# Trace Python memory allocations with the standard library
import tracemalloc

tracemalloc.start()

# ... your code here ...

current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current / 1024 / 1024:.1f} MB")
print(f"Peak memory usage: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()
```
### AI Provider Performance

#### Response Time Monitoring

```bash
# Monitor AI provider response times
grep -r "provider.*response_time" logs/

# Check for timeouts
grep -r "timeout\|Timeout" logs/
```

#### Usage Analytics

```python
# Track AI call latency in your own monitoring code
import time

start_time = time.time()
# ... AI API call here ...
elapsed = time.time() - start_time
print(f"AI response time: {elapsed:.2f}s")
```
## Support Information Gathering

### System Information Collection

#### Basic System Info

```bash
# System details
uname -a
lsb_release -a   # Linux
sw_vers          # macOS

# Docker information
docker version
docker compose version
docker system info
```

#### Open Notebook Information

```bash
# Version information
grep version pyproject.toml

# Service status
make status

# Environment check (without sensitive info)
env | grep -E "(SURREAL|LOG|DEBUG)" | grep -v "PASSWORD\|KEY"
```
### Log Collection for Support

#### Comprehensive Log Collection

```bash
#!/bin/bash
# collect_logs.sh

echo "Collecting Open Notebook diagnostic information..."

# Create the diagnostic directory once and stay in the project root,
# so `make` and `docker compose` can still find their config files
dir="diagnostic_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$dir"

# System information
echo "Collecting system information..."
uname -a > "$dir/system_info.txt"
docker version >> "$dir/system_info.txt"
docker compose version >> "$dir/system_info.txt"

# Service status
echo "Collecting service status..."
make status > "$dir/service_status.txt"
docker compose ps >> "$dir/service_status.txt"

# Logs
echo "Collecting logs..."
docker compose logs --tail=500 > "$dir/docker_logs.txt"
docker compose logs surrealdb --tail=200 > "$dir/surrealdb_logs.txt"

# Configuration (sanitized)
echo "Collecting configuration..."
env | grep -E "(SURREAL|LOG|DEBUG)" | grep -v "PASSWORD\|KEY" > "$dir/environment.txt"

# Resource usage
echo "Collecting resource information..."
docker stats --no-stream > "$dir/resource_usage.txt"
df -h > "$dir/disk_usage.txt"
free -h > "$dir/memory_info.txt"

echo "Diagnostic collection complete!"
echo "Please compress and share the $dir directory"
```
#### Sanitizing Logs

```bash
# Remove sensitive information from logs before sharing
sed -i 's/sk-[a-zA-Z0-9]*/[REDACTED_API_KEY]/g' logs.txt
sed -i 's/password=[^[:space:]]*/password=[REDACTED]/g' logs.txt
```
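If you prefer Python, or need to sanitize on a system without `sed`, the same redactions can be expressed with `re`:

```python
import re

def sanitize(text: str) -> str:
    """Redact API keys and passwords, mirroring the sed commands above."""
    text = re.sub(r"sk-[a-zA-Z0-9]+", "[REDACTED_API_KEY]", text)
    text = re.sub(r"password=\S+", "password=[REDACTED]", text)
    return text

print(sanitize("key=sk-abc123 password=hunter2"))
# key=[REDACTED_API_KEY] password=[REDACTED]
```

These patterns only cover OpenAI-style `sk-` keys and `password=` pairs; review the output manually before sharing, since other providers use different key formats.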
### Creating Reproduction Cases

#### Minimal Reproduction

1. Start with a clean environment:

   ```bash
   # Fresh installation
   rm -rf surreal_data/ notebook_data/
   docker compose down
   docker compose up -d
   ```

2. Document exact steps:
   - Each click or command
   - The exact file used
   - Configuration settings
   - Expected vs actual behavior

3. Capture evidence:
   - Screenshots of errors
   - Full error messages
   - Log excerpts
   - System state
#### Test Case Template

```markdown
## Bug Report

### Environment
- OS: [e.g., Ubuntu 22.04]
- Docker version: [e.g., 24.0.7]
- Open Notebook version: [e.g., 1.0.0]
- Installation method: [Docker/Source]

### Steps to Reproduce
1. Start Open Notebook
2. Create new notebook named "Test"
3. Add text source: "Hello world"
4. Navigate to Chat
5. Ask: "What is this about?"

### Expected Behavior
Should receive a response about the text content

### Actual Behavior
Error: "Model not found"

### Logs
ERROR - Model not found: gpt-4-invalid

### Additional Context
- Using OpenAI provider
- gpt-5-mini model configured
- First-time setup
```
## Advanced Debugging

### Database Debugging

#### Direct Database Access

```bash
# Connect to SurrealDB directly
docker compose exec surrealdb /surreal sql \
  --conn http://localhost:8000 \
  --user root \
  --pass root \
  --ns open_notebook \
  --db production
```

#### Query Analysis

```sql
-- Check table contents
SELECT * FROM notebook LIMIT 10;

-- Check relationships
SELECT * FROM source WHERE notebook_id = notebook:abc123;

-- Performance analysis
SELECT count() FROM source GROUP BY notebook_id;
```
### Network Debugging

#### Service Communication

```bash
# Test the internal Docker network
docker compose exec open_notebook ping surrealdb

# Test external connectivity
docker compose exec open_notebook curl -I https://api.openai.com

# Check port bindings
netstat -tulpn | grep -E "(8000|5055|8502)"
```

#### DNS Resolution

```bash
# Check DNS from inside the container
docker compose exec open_notebook nslookup api.openai.com

# Check /etc/hosts
docker compose exec open_notebook cat /etc/hosts
```
### Performance Debugging

#### CPU Profiling

```python
# Profile a function with the standard library
import cProfile
import pstats

cProfile.run('your_function()', 'profile_stats')

# Analyze the results
p = pstats.Stats('profile_stats')
p.sort_stats('cumulative').print_stats(10)
```

#### Memory Leak Detection

```python
# Track memory usage over time
import os

import psutil

def log_memory_usage():
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage: {memory_mb:.1f} MB")

# Call periodically, e.g. from a background thread or scheduler
log_memory_usage()
```
## Monitoring and Alerting

### Health Checks

#### Service Health Endpoints

```bash
# Check all health endpoints
curl -f http://localhost:8000/health   # SurrealDB
curl -f http://localhost:5055/health   # API
curl -f http://localhost:8502/healthz  # Next.js
```
#### Automated Health Monitoring

```bash
#!/bin/bash
# health_check.sh

# Map each port to its health endpoint path
# (the frontend uses /healthz, the others /health)
declare -A endpoints=(
  [8000]="/health"   # SurrealDB
  [5055]="/health"   # API
  [8502]="/healthz"  # Next.js
)

for port in "${!endpoints[@]}"; do
  if curl -sf "http://localhost:$port${endpoints[$port]}" >/dev/null; then
    echo "✅ Service on port $port is healthy"
  else
    echo "❌ Service on port $port is unhealthy"
  fi
done
```
### Log Monitoring

#### Real-time Error Monitoring

```bash
# Monitor for errors in real-time
docker compose logs -f | grep -i error

# Monitor specific patterns
docker compose logs -f | grep -E "(ERROR|CRITICAL|timeout)"
```

#### Log Analysis Scripts

```bash
#!/bin/bash
# analyze_logs.sh

echo "Error count (last hour):"
docker compose logs --since="1h" | grep -c "ERROR"

echo "Top error messages:"
docker compose logs --since="1h" | grep "ERROR" | \
  cut -d':' -f4- | sort | uniq -c | sort -nr | head -10

echo "Provider issues:"
docker compose logs --since="1h" | grep -i "provider.*error"
```
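For a finer-grained summary than `grep -c`, a short script can group errors by message. This sketch assumes the `LEVEL - message` log format shown earlier in this guide:

```python
from collections import Counter

def summarize_errors(lines):
    """Count ERROR lines grouped by message (assumes 'ERROR - msg' format)."""
    counter = Counter()
    for line in lines:
        if "ERROR - " in line:
            msg = line.split("ERROR - ", 1)[1].strip()
            counter[msg] += 1
    return counter.most_common()

logs = [
    "ERROR - Model not found: gpt-4-invalid",
    "ERROR - Model not found: gpt-4-invalid",
    "INFO - Source processed successfully",
    "ERROR - Background job failed: Timeout",
]
print(summarize_errors(logs))
# [('Model not found: gpt-4-invalid', 2), ('Background job failed: Timeout', 1)]
```

Feed it real logs with something like `summarize_errors(open("docker_logs.txt"))`; the most frequent message is usually the best place to start debugging.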
## Best Practices for Debugging

### Systematic Approach

1. Reproduce the issue consistently
2. Isolate the problem to specific components
3. Check recent changes that might have caused the issue
4. Gather evidence through logs and monitoring
5. Test hypotheses systematically
6. Document findings for future reference

### Debugging Tools Checklist

- System resource monitoring (htop, docker stats)
- Log aggregation and analysis
- Network connectivity testing
- Database query analysis
- API response time measurement
- Memory usage tracking
- Error rate monitoring

### When to Seek Help

- The issue persists after following the troubleshooting guides
- The problem affects multiple users or systems
- Security-related concerns
- Performance degradation without a clear cause
- Data integrity issues
This debugging guide is continuously updated based on real-world troubleshooting experiences. For additional support, join our Discord community or create a GitHub issue with your diagnostic information.