System Architecture
This document provides a comprehensive overview of Open Notebook's architecture, including system design, component relationships, database schema, and service communication patterns.
🏗️ High-Level Architecture
Open Notebook follows a modern layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│ Frontend Layer │
├─────────────────────────────────────────────────────────────┤
│ React frontend (pages/) │ REST API Clients (external) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ API Layer │
├─────────────────────────────────────────────────────────────┤
│ FastAPI Routers (api/routers/) │ Models (api/models.py) │
│ Middleware (auth, CORS) │ Service Layer │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Domain Layer │
├─────────────────────────────────────────────────────────────┤
│ Business Logic (open_notebook/domain/) │
│ Entity Models │ Validation │ Domain Services │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
├─────────────────────────────────────────────────────────────┤
│ Database (SurrealDB) │ AI Services (Esperanto) │
│ File Storage │ External APIs │
└─────────────────────────────────────────────────────────────┘
🧩 Core Components
1. API Layer (api/)
Purpose: HTTP interface for all application functionality
Key Components:
- FastAPI Application (api/main.py): Main application with middleware and routing
- Routers (api/routers/): Endpoint definitions organized by domain
- Models (api/models.py): Pydantic models for request/response validation
- Services (api/*_service.py): Business logic orchestration
- Authentication (api/auth.py): Password-based authentication middleware
Architecture Pattern: Clean API architecture with service layer abstraction
# Example API structure
@router.post("/notebooks", response_model=NotebookResponse)
async def create_notebook(notebook: NotebookCreate):
    """Create a new notebook with validation and error handling."""
    new_notebook = Notebook(name=notebook.name, description=notebook.description)
    await new_notebook.save()
    return NotebookResponse.from_domain(new_notebook)
2. Domain Layer (open_notebook/domain/)
Purpose: Core business logic and domain models
Key Components:
- Base Models (base.py): Abstract base classes with common functionality
- Entities: Notebook, Source, Note, Model, Transformation
- Services: Domain-specific business logic
- Validation: Data integrity and business rules
Architecture Pattern: Domain-Driven Design (DDD) with rich domain models
# Example domain model
class Notebook(BaseModel):
    name: str
    description: str
    archived: bool = False

    @classmethod
    async def get_all(cls, order_by: str = "updated desc") -> List["Notebook"]:
        """Retrieve all notebooks with ordering."""
        # Business logic implementation

    async def save(self) -> None:
        """Save notebook with validation."""
        # Domain validation and persistence
3. Database Layer (open_notebook/database/)
Purpose: Data persistence and query abstraction
Key Components:
- Repository Pattern (repository.py): CRUD operations abstraction
- Connection Management: Async SurrealDB connection handling
- Migrations: Database schema evolution (migrations/)
- Query Builder: SurrealQL query construction helpers
Architecture Pattern: Repository pattern with async/await
# Repository functions
async def repo_create(table: str, data: Dict[str, Any]) -> Dict[str, Any]
async def repo_query(query_str: str, vars: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]
async def repo_update(table: str, id: str, data: Dict[str, Any]) -> List[Dict[str, Any]]
async def repo_delete(record_id: Union[str, RecordID])
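To show how calls with these shapes compose, here is a minimal in-memory stand-in. It is a sketch for illustration and testing only, not the project's implementation: the `InMemoryRepo` class, its "table:counter" ID scheme, and the boolean return of `delete` are all assumptions.

```python
import asyncio
import itertools
from typing import Any, Dict, List


class InMemoryRepo:
    """Hypothetical stand-in for the repository functions; not SurrealDB-backed."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    async def create(self, table: str, data: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed ID format "table:key", loosely modelled on SurrealDB record IDs.
        record_id = f"{table}:{next(self._ids)}"
        row = {"id": record_id, **data}
        self._rows[record_id] = row
        return dict(row)

    async def update(self, table: str, id: str, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Accept either a bare key or a full "table:key" ID.
        record_id = id if ":" in id else f"{table}:{id}"
        if record_id not in self._rows:
            return []
        self._rows[record_id].update(data)
        return [dict(self._rows[record_id])]

    async def delete(self, record_id: str) -> bool:
        # True when a row was actually removed.
        return self._rows.pop(record_id, None) is not None


async def repo_demo() -> List[Dict[str, Any]]:
    repo = InMemoryRepo()
    created = await repo.create("notebook", {"name": "Research", "archived": False})
    return await repo.update("notebook", created["id"], {"archived": True})
```

A stand-in like this lets domain-layer code be exercised in unit tests without a running database.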
4. AI Processing Layer (open_notebook/graphs/)
Purpose: AI workflows and content processing
Key Components:
- LangChain Graphs: Multi-step AI workflows
- Ask System (ask.py): Question-answering pipeline
- Chat System (chat.py): Conversational AI
- Transformations (transformation.py): Content analysis workflows
- Source Processing (source.py): Content ingestion and embedding
Architecture Pattern: LangGraph for workflow orchestration
# Example AI workflow
@create_graph
async def ask_graph(state: AskState):
    """Multi-step question answering workflow."""
    # 1. Strategy generation
    # 2. Search execution
    # 3. Answer synthesis
    # 4. Final response generation
5. Background Processing (commands/)
Purpose: Asynchronous job processing
Key Components:
- Command System: Background job definitions
- Job Queue: SurrealDB-backed job scheduling
- Status Tracking: Real-time job progress monitoring
- Error Handling: Comprehensive error recovery
Architecture Pattern: Command pattern with async job queue
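The command pattern with status tracking can be sketched as below. This is a simplified illustration under stated assumptions: it uses an in-process `asyncio.Queue` rather than the SurrealDB-backed queue described above, and the `Job` dataclass, its status strings, and the `word_count` and `failing` commands are hypothetical names.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, Optional


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative."""
    job_id: str
    command: Callable[..., Awaitable[Any]]
    args: Dict[str, Any] = field(default_factory=dict)
    status: str = "queued"
    result: Any = None
    error: Optional[str] = None


async def worker(queue: "asyncio.Queue[Job]") -> None:
    """Pull jobs off the queue and record each status transition on the job."""
    while True:
        job = await queue.get()
        job.status = "running"
        try:
            job.result = await job.command(**job.args)
            job.status = "completed"
        except Exception as exc:  # keep the worker alive after a failed job
            job.error = str(exc)
            job.status = "failed"
        finally:
            queue.task_done()


async def word_count(text: str) -> int:
    return len(text.split())


async def failing() -> None:
    raise ValueError("boom")


async def run_jobs_demo() -> Dict[str, str]:
    queue: "asyncio.Queue[Job]" = asyncio.Queue()
    jobs = [Job("job-1", word_count, {"text": "open notebook rocks"}), Job("job-2", failing)]
    for job in jobs:
        queue.put_nowait(job)
    task = asyncio.create_task(worker(queue))
    await queue.join()  # wait until every queued job has been processed
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return {job.job_id: job.status for job in jobs}
```

Catching the exception inside the worker is the point of the design: one failed command is recorded on its own job and does not stop later jobs from running.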
🗃️ Database Schema
Open Notebook uses SurrealDB, a multi-model document database. The tables below are defined as SCHEMAFULL, so every field is declared with an explicit type:
Core Tables
notebook
DEFINE TABLE notebook SCHEMAFULL;
DEFINE FIELD name ON TABLE notebook TYPE string;
DEFINE FIELD description ON TABLE notebook TYPE string;
DEFINE FIELD archived ON TABLE notebook TYPE bool DEFAULT false;
DEFINE FIELD created ON TABLE notebook TYPE datetime DEFAULT time::now();
DEFINE FIELD updated ON TABLE notebook TYPE datetime DEFAULT time::now();
source
DEFINE TABLE source SCHEMAFULL;
DEFINE FIELD title ON TABLE source TYPE option<string>;
DEFINE FIELD topics ON TABLE source TYPE option<array<string>>;
DEFINE FIELD asset ON TABLE source TYPE option<object>;
DEFINE FIELD full_text ON TABLE source TYPE option<string>;
DEFINE FIELD notebook_id ON TABLE source TYPE record<notebook>;
DEFINE FIELD embedding ON TABLE source TYPE option<array<number>>;
DEFINE FIELD created ON TABLE source TYPE datetime DEFAULT time::now();
DEFINE FIELD updated ON TABLE source TYPE datetime DEFAULT time::now();
note
DEFINE TABLE note SCHEMAFULL;
DEFINE FIELD title ON TABLE note TYPE option<string>;
DEFINE FIELD content ON TABLE note TYPE option<string>;
DEFINE FIELD note_type ON TABLE note TYPE option<string>;
DEFINE FIELD notebook_id ON TABLE note TYPE record<notebook>;
DEFINE FIELD embedding ON TABLE note TYPE option<array<number>>;
DEFINE FIELD created ON TABLE note TYPE datetime DEFAULT time::now();
DEFINE FIELD updated ON TABLE note TYPE datetime DEFAULT time::now();
model
DEFINE TABLE model SCHEMAFULL;
DEFINE FIELD name ON TABLE model TYPE string;
DEFINE FIELD provider ON TABLE model TYPE string;
DEFINE FIELD type ON TABLE model TYPE string;
DEFINE FIELD created ON TABLE model TYPE datetime DEFAULT time::now();
DEFINE FIELD updated ON TABLE model TYPE datetime DEFAULT time::now();
Specialized Tables
transformation
DEFINE TABLE transformation SCHEMAFULL;
DEFINE FIELD name ON TABLE transformation TYPE string;
DEFINE FIELD title ON TABLE transformation TYPE string;
DEFINE FIELD description ON TABLE transformation TYPE string;
DEFINE FIELD prompt ON TABLE transformation TYPE string;
DEFINE FIELD apply_default ON TABLE transformation TYPE bool DEFAULT false;
episode_profile (Podcast Generation)
DEFINE TABLE episode_profile SCHEMAFULL;
DEFINE FIELD name ON TABLE episode_profile TYPE string;
DEFINE FIELD description ON TABLE episode_profile TYPE option<string>;
DEFINE FIELD speaker_config ON TABLE episode_profile TYPE string;
DEFINE FIELD outline_provider ON TABLE episode_profile TYPE string;
DEFINE FIELD outline_model ON TABLE episode_profile TYPE string;
DEFINE FIELD transcript_provider ON TABLE episode_profile TYPE string;
DEFINE FIELD transcript_model ON TABLE episode_profile TYPE string;
DEFINE FIELD default_briefing ON TABLE episode_profile TYPE string;
DEFINE FIELD num_segments ON TABLE episode_profile TYPE int DEFAULT 5;
speaker_profile (Podcast Generation)
DEFINE TABLE speaker_profile SCHEMAFULL;
DEFINE FIELD name ON TABLE speaker_profile TYPE string;
DEFINE FIELD description ON TABLE speaker_profile TYPE option<string>;
DEFINE FIELD tts_provider ON TABLE speaker_profile TYPE string;
DEFINE FIELD tts_model ON TABLE speaker_profile TYPE string;
DEFINE FIELD speakers ON TABLE speaker_profile TYPE array<object>;
DEFINE FIELD speakers.*.name ON TABLE speaker_profile TYPE string;
DEFINE FIELD speakers.*.voice_id ON TABLE speaker_profile TYPE option<string>;
DEFINE FIELD speakers.*.backstory ON TABLE speaker_profile TYPE option<string>;
DEFINE FIELD speakers.*.personality ON TABLE speaker_profile TYPE option<string>;
Relationships
Record Links (SurrealDB native relationships):
- source.notebook_id → notebook records
- note.notebook_id → notebook records
- episode.command → command records
Embedding Relationships:
- Sources and notes can have vector embeddings for semantic search
- Embeddings are stored as arrays of numbers in the same record
🔄 Service Communication
API Communication Flow
graph TB
A[Client Request] --> B[FastAPI Router]
B --> C[Service Layer]
C --> D[Domain Model]
D --> E[Repository]
E --> F[SurrealDB]
F --> E
E --> D
D --> C
C --> B
B --> A
AI Processing Flow
graph TB
A[Content Input] --> B[Source Processing]
B --> C[Content Extraction]
C --> D[Embedding Generation]
D --> E[Database Storage]
E --> F[Search Index]
G[User Query] --> H[Vector Search]
H --> I[Context Retrieval]
I --> J[AI Model Processing]
J --> K[Response Generation]
Background Job Processing
graph TB
A[API Request] --> B[Command Creation]
B --> C[Job Queue]
C --> D[Background Worker]
D --> E[Job Execution]
E --> F[Status Updates]
F --> G[Result Storage]
G --> H[Client Notification]
🔧 Configuration Management
Environment Variables
Database Configuration:
SURREAL_URL=ws://localhost:8000/rpc
SURREAL_USER=root
SURREAL_PASSWORD=password
SURREAL_NAMESPACE=open_notebook
SURREAL_DATABASE=main
AI Provider Configuration:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AI...
Application Configuration:
APP_PASSWORD=optional_password
DEBUG=false
LOG_LEVEL=INFO
Configuration Loading
Configuration is managed through the open_notebook/config.py module:
import os
from typing import Optional

class Config:
    """Application configuration with environment variable support."""

    # Database settings
    database_url: str = os.getenv("SURREAL_URL", "ws://localhost:8000/rpc")
    database_user: str = os.getenv("SURREAL_USER", "root")
    database_password: str = os.getenv("SURREAL_PASSWORD", "password")

    # AI provider settings
    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
    anthropic_api_key: Optional[str] = os.getenv("ANTHROPIC_API_KEY")

    # Application settings
    app_password: Optional[str] = os.getenv("APP_PASSWORD")
    debug: bool = os.getenv("DEBUG", "false").lower() == "true"
🔍 Search Architecture
Multi-Modal Search System
Open Notebook implements both full-text and vector search:
Full-Text Search:
- SurrealDB native text search capabilities
- Keyword-based matching across content
- Fast and lightweight for exact matches
Vector Search:
- Semantic similarity using embeddings
- Cosine similarity scoring
- Context-aware result ranking
Search Implementation
async def vector_search(
    keyword: str,
    results: int = 10,
    source: bool = True,
    note: bool = True,
    minimum_score: float = 0.2
) -> List[Dict[str, Any]]:
    """Perform vector search across sources and notes."""
    # 1. Generate query embedding
    # 2. Calculate similarity scores
    # 3. Filter by minimum score
    # 4. Rank and return results
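Steps 2 to 4 can be sketched in plain Python as follows. This is a minimal sketch that assumes embeddings are already loaded into memory as lists of floats. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the real search may instead compute similarity inside the database.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    records: List[Dict[str, Any]],
    results: int = 10,
    minimum_score: float = 0.2,
) -> List[Dict[str, Any]]:
    """Score each record, drop low scores, and return the best matches first."""
    scored: List[Dict[str, Any]] = []
    for record in records:
        embedding = record.get("embedding")
        if not embedding:
            continue  # records without an embedding cannot be scored
        score = cosine_similarity(query_embedding, embedding)
        if score >= minimum_score:
            scored.append({"id": record["id"], "score": score})
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:results]
```

Because cosine similarity ignores vector magnitude, a long document and a short note pointing in the same direction in embedding space score the same.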
🎙️ Podcast Generation Architecture
Multi-Speaker Podcast System
The podcast generation feature is a multi-step process driven by two kinds of profile:
Episode Profiles: Define the structure and style of podcasts
- Speaker configuration
- Content outline generation
- Transcript creation
- Audio synthesis
Speaker Profiles: Define individual speaker characteristics
- Voice selection (TTS models)
- Personality traits
- Background information
- Speaking patterns
Podcast Generation Flow
graph TB
A[Content Input] --> B[Episode Profile Selection]
B --> C[Outline Generation]
C --> D[Transcript Creation]
D --> E[Speaker Assignment]
E --> F[Audio Synthesis]
F --> G[Audio Post-Processing]
G --> H[Final Podcast]
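The speaker-assignment step in this flow can be sketched as follows. This is an illustrative assumption rather than the project's code: the `Speaker` dataclass mirrors only the `name` and `voice_id` fields from the `speaker_profile` schema, while `assign_voices`, the `(speaker, text)` transcript shape, and the `default_voice` fallback are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Speaker:
    """Subset of a speaker_profile entry: name plus an optional voice ID."""
    name: str
    voice_id: Optional[str] = None


def assign_voices(
    transcript: Sequence[Tuple[str, str]],
    speakers: Sequence[Speaker],
    default_voice: str = "default",
) -> List[Tuple[str, str]]:
    """Map each (speaker name, text) line to a (voice ID, text) pair."""
    voices = {s.name: s.voice_id or default_voice for s in speakers}
    unknown = {name for name, _ in transcript} - voices.keys()
    if unknown:
        # Fail early instead of synthesizing audio with a wrong voice.
        raise ValueError(f"Transcript references unknown speakers: {sorted(unknown)}")
    return [(voices[name], text) for name, text in transcript]
```

Checking for unknown speakers before synthesis keeps a transcript typo from surfacing only after the audio step has run.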
📊 Performance Considerations
Async/Await Patterns
Open Notebook uses async/await throughout so that I/O-bound work, such as database queries and AI provider calls, can overlap instead of blocking:
async def process_content(content: str) -> ProcessedContent:
    """Process content asynchronously."""
    # Concurrent processing of multiple steps
    embedding_task = asyncio.create_task(generate_embedding(content))
    extraction_task = asyncio.create_task(extract_metadata(content))
    embedding, metadata = await asyncio.gather(embedding_task, extraction_task)
    return ProcessedContent(embedding=embedding, metadata=metadata)
Database Optimization
- Connection Pooling: Efficient database connection management
- Query Optimization: Indexed queries and optimized SurrealQL
- Batch Operations: Bulk insert/update operations where possible
Caching Strategy
- In-Memory Caching: Model instances and configuration
- Result Caching: Expensive AI operations
- Content Caching: Processed documents and embeddings
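Result caching for an expensive AI operation can be sketched as below. This is a minimal in-memory sketch under stated assumptions: the `EmbeddingCache` class, the SHA-256 content key, and the `fake_embed` placeholder are hypothetical, and no eviction or persistence is shown.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict, List


class EmbeddingCache:
    """Hypothetical in-memory cache keyed by a SHA-256 hash of the content."""

    def __init__(self, embed: Callable[[str], Awaitable[List[float]]]) -> None:
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying embed call ran

    async def get(self, content: str) -> List[float]:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = await self._embed(content)
        return self._store[key]


async def fake_embed(text: str) -> List[float]:
    # Placeholder for an expensive provider call.
    return [float(len(text)), float(text.count(" "))]


async def cache_demo() -> int:
    cache = EmbeddingCache(fake_embed)
    await cache.get("hello world")
    await cache.get("hello world")  # identical content: served from the cache
    await cache.get("another document")
    return cache.misses
```

Keying on a hash of the content, rather than on a record ID, means an edited document gets a fresh embedding automatically.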
🔒 Security Architecture
Authentication
Password-Based Authentication:
- Optional application-level password protection
- Middleware-based authentication
- Session management
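The core check behind optional password protection can be sketched as follows. This is an assumption-laden illustration, not the contents of api/auth.py: the `is_authorized` helper and the use of a Bearer-style `Authorization` header are hypothetical. Only the optional `APP_PASSWORD` setting comes from this document.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], app_password: Optional[str]) -> bool:
    """Return True when the request may proceed."""
    # No password configured: protection is disabled, so every request passes.
    if not app_password:
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode("utf-8"), app_password.encode("utf-8"))
```

A middleware would call a check like this once per request and return a 401 response when it yields False.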
Data Security
Privacy-First Design:
- Local data storage by default
- No external data transmission (except to chosen AI providers)
- Configurable AI provider selection
Input Validation:
- Pydantic model validation
- SurrealQL injection prevention
- File upload security
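One common way to prevent query injection is to pass user input as bound variables instead of interpolating it into the query string. The sketch below builds a query and a variables dictionary in the shape that `repo_query(query_str, vars)` accepts. The `build_notebook_lookup` helper is hypothetical, and whether the project builds its queries this way is an assumption.

```python
from typing import Any, Dict, Tuple


def build_notebook_lookup(name: str, include_archived: bool = False) -> Tuple[str, Dict[str, Any]]:
    """Return a SurrealQL query plus its variables; user input never enters the query text."""
    query = "SELECT * FROM notebook WHERE name = $name"
    variables: Dict[str, Any] = {"name": name}
    if not include_archived:
        query += " AND archived = $archived"
        variables["archived"] = False
    return query, variables
```

The returned pair would then be passed along as `await repo_query(query, variables)`, so a hostile notebook name is treated as data and never parsed as SurrealQL.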
🚀 Deployment Architecture
Container Architecture
# Multi-stage build for optimal size
FROM python:3.11-slim AS builder
# Build dependencies
FROM python:3.11-slim AS runtime
# Runtime environment
Service Orchestration
Docker Compose Configuration:
- Application container
- SurrealDB container
- Shared volume for data persistence
- Environment variable management
Scaling Considerations
Horizontal Scaling:
- Stateless API design
- Shared database backend
- Load balancer compatibility
Vertical Scaling:
- Async and background processing to keep the API responsive during long-running tasks
- Memory optimization for large documents
- Efficient embedding storage
This architecture provides a solid foundation for Open Notebook's current capabilities while supporting future enhancements and scaling requirements. The modular design allows for easy extension and modification of individual components without affecting the overall system.