WhisperJAV - Japanese Adult Video Subtitle Generator
WhisperJAV is a subtitle generation tool optimized for Japanese Adult Videos (JAV). It uses custom enhancements tailored to the audio characteristics and sound patterns of JAV media.
🌟 Key Features
- Three Processing Modes: Optimized pipelines for different content types and quality requirements.
- Japanese Language Processing: Custom post-processing for natural dialogue segmentation.
- Scene Detection: Automatic scene splitting for better transcription accuracy.
- VAD Integration: Voice Activity Detection for improved speech recognition.
- Hallucination Removal: Specialized filters for common JAV transcription errors.
- AI Translation: Built-in subtitle translation powered by DeepSeek, Gemini, Claude, and more.
- GUI and CLI: User-friendly interface and command-line options.
- Batch Processing: Process multiple files with progress tracking.
📋 Table of Contents
- Installation
- Quick Start
- AI Translation
- Processing Modes Guide
- Sensitivity Settings
- Advanced Japanese Language Features
- Usage Examples
- Configuration
- GUI Interface
- Troubleshooting
- Contributing
- License
- Acknowledgments
- Disclaimer
🔧 Installation
Prerequisites
See the Prerequisites section at the end of this README for detailed installation instructions.
- Python 3.9 - 3.12 (Python 3.13+ is not compatible with openai-whisper)
- CUDA-capable GPU, drivers, CUDA Toolkit, and cuDNN (CUDA > 11.7)
- CUDA builds of PyTorch, torchaudio, and torchvision
- FFmpeg installed and in your system's PATH
- pip and git
Install from Source
# Standard installation (RECOMMENDED - use the latest commit from main)
pip install git+https://github.com/meizhong986/whisperjav.git@main
# Update an existing installation:
pip install -U --no-deps git+https://github.com/meizhong986/whisperjav.git@main
### ⚠️ Important Note
Make sure you have installed CUDA-enabled PyTorch (and PyAudio) before installing WhisperJAV. Otherwise, openai-whisper will automatically pull in a CPU-only torch build, which is roughly 8 times slower. You don't want that!
Example for CUDA 12.4 with torch 2.5.1 (the combination WhisperJAV has been tested with):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
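To confirm that the CUDA build of PyTorch is active before installing WhisperJAV (and again afterwards), a quick generic PyTorch check like this can help; it is not a WhisperJAV command:

```python
# Generic PyTorch sanity check: confirms the CUDA build is installed and a GPU is visible.
import torch

print("torch version:", torch.__version__)           # e.g. 2.5.1+cu124
print("CUDA available:", torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```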
Dependencies
The main dependencies will be automatically installed:
openai-whisper or faster-whisper, stable-ts, torch (with CUDA support), pysrt, tqdm, numpy, soundfile
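Once installation finishes, you can optionally confirm the core dependencies import cleanly. This is a generic sketch; the module names below are the standard import names for these packages (stable-ts imports as stable_whisper), and depending on the mode you use, only one of whisper / faster_whisper may be needed:

```python
# Import-check the main dependencies; a failure here usually means an incomplete install.
import importlib

for module in ["whisper", "faster_whisper", "stable_whisper",
               "torch", "pysrt", "tqdm", "numpy", "soundfile"]:
    try:
        importlib.import_module(module)
        print(f"OK    {module}")
    except ImportError as exc:
        print(f"FAIL  {module}: {exc}")
```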
🚀 Quick Start
Command Line
# Basic usage with default settings
whisperjav video.mp4
# Specify mode and output directory
whisperjav audio.wav --mode faster --output-dir ./subtitles
# Process multiple files with specific sensitivity
whisperjav *.mp3 --mode balanced --sensitivity aggressive
# Generate English subtitles
whisperjav video.mp4 --subs-language english-direct
GUI
whisperjav-gui
🌐 AI Translation
WhisperJAV includes built-in AI-powered subtitle translation via whisperjav-translate.
Quick Translation Setup
1. Get an API key from DeepSeek (recommended, ~$0.10 per 100k tokens).
2. Set your API key:

   # Windows (PowerShell)
   $env:DEEPSEEK_API_KEY="sk-..."

   # Linux/Mac
   export DEEPSEEK_API_KEY="sk-..."

3. Translate subtitles:

   # Standalone translation
   whisperjav-translate -i movie.srt

   # Generate and translate in one command
   whisperjav video.mp4 --translate
Translation Features
- Multiple AI Providers: DeepSeek (default), Gemini, Claude, GPT-4, OpenRouter
- Smart Caching: Instructions fetched from Gist with local caching
- Settings File: Save your preferences for repeated use
- Tone Styles: Standard or Pornify (explicit content) translation styles
- Multiple Languages: Translate to English, Spanish, Chinese, Indonesian
- Metadata-aware prompts: Optional movie title, plot, and actress name to improve translation
- Sampling controls: Temperature and top_p supported; pornify tone applies sensible defaults
Translation Examples
# Translate to Spanish with pornify style
whisperjav-translate -i movie.srt -t spanish --tone pornify
# Use Gemini provider
whisperjav-translate -i movie.srt --provider gemini
# Configure translation preferences interactively
whisperjav-translate --configure
# View API key setup instructions
whisperjav-translate --print-env
# Advanced: Pass provider-specific options
whisperjav-translate -i movie.srt --provider-option temperature=0.7 --provider-option max_tokens=2000
# Generate subtitles and translate together
whisperjav video.mp4 --translate --translate-provider deepseek
# Use local custom instructions instead of defaults
whisperjav-translate -i movie.srt --instructions-file C:\path\to\my_instructions.txt
# Add helpful metadata context (optional)
whisperjav-translate -i movie.srt \
--movie-title "JAV-123: After-work Massage" \
--actress "Yua Mikami" \
--movie-plot "Office worker gets an after-hours massage that turns intimate"
# Explicitly set sampling params (override tone defaults)
whisperjav-translate -i movie.srt --temperature 0.5 --top-p 0.85
Configuration
WhisperJAV-Translate supports persistent settings to avoid repeating common flags:
# Interactive configuration wizard (RECOMMENDED)
whisperjav-translate --configure
# View API key setup instructions
whisperjav-translate --print-env
# Show current settings
whisperjav-translate --show-settings
Settings are stored in:
- Windows: %AppData%\WhisperJAV\translate\settings.json
- Linux: ~/.config/WhisperJAV/translate/settings.json
- Mac: ~/Library/Application Support/WhisperJAV/translate/settings.json
Configuration Precedence (highest to lowest):
1. CLI flags (e.g., --provider gemini)
2. Environment variables (e.g., DEEPSEEK_API_KEY)
3. Settings file
4. Built-in defaults
Instructions source precedence:
1. Local file via --instructions-file
2. Settings file mapping for source/tone
3. Default remote Gist (with ETag cache) → local cache → bundled fallback
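Conceptually, this precedence is a simple fallback lookup. The sketch below is illustrative only; the environment-variable name and settings path are hypothetical and not WhisperJAV's actual internals:

```python
# Illustrative resolution of one option following: CLI flag > env var > settings file > default.
import json
import os
from pathlib import Path

DEFAULTS = {"provider": "deepseek"}
SETTINGS_PATH = Path.home() / ".config/WhisperJAV/translate/settings.json"  # Linux-style example

def resolve(option, cli_value=None):
    if cli_value is not None:                                    # 1. CLI flag wins
        return cli_value
    env_value = os.environ.get(f"WHISPERJAV_{option.upper()}")   # 2. hypothetical env var name
    if env_value:
        return env_value
    if SETTINGS_PATH.exists():                                   # 3. settings file
        settings = json.loads(SETTINGS_PATH.read_text(encoding="utf-8"))
        if option in settings:
            return settings[option]
    return DEFAULTS.get(option)                                  # 4. built-in default

print(resolve("provider"))            # "deepseek" unless overridden higher in the chain
print(resolve("provider", "gemini"))  # a CLI value always wins
```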
Supported Providers
| Provider | Cost | Setup | API Key Env Var |
|---|---|---|---|
| DeepSeek (default) | ~$0.10/100k tokens | platform.deepseek.com | DEEPSEEK_API_KEY |
| Gemini | Free tier available | makersuite.google.com | GEMINI_API_KEY |
| Claude | Pay-as-you-go | console.anthropic.com | ANTHROPIC_API_KEY |
| GPT-4 | Pay-as-you-go | platform.openai.com | OPENAI_API_KEY |
| OpenRouter | Varies | openrouter.ai | OPENROUTER_API_KEY |
For detailed translation options, run: whisperjav-translate --help
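If you are unsure which provider keys are already configured, a quick environment check using the variable names from the table above can help (this is a convenience script, not part of WhisperJAV):

```python
# Report which provider API keys are present in the current environment.
import os

PROVIDER_KEYS = {
    "DeepSeek":   "DEEPSEEK_API_KEY",
    "Gemini":     "GEMINI_API_KEY",
    "Claude":     "ANTHROPIC_API_KEY",
    "GPT-4":      "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}

for provider, var in PROVIDER_KEYS.items():
    status = "set" if os.environ.get(var) else "missing"
    print(f"{provider:<11} {var:<20} {status}")
```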
📊 Processing Modes Guide
Choose the appropriate mode based on your content type and requirements:
| Mode | Best For | Characteristics | Processing Speed | Accuracy |
|---|---|---|---|---|
| Faster | Whole-file runs where speed matters | Faster-Whisper backend • No scene splitting • Internal VAD (Stable-TS packed) • Batched inference (higher throughput) | ⚡⚡⚡ Fast | Adequate |
| Fast | Mixed content quality with variable audio | Faster-Whisper backend • Scene detection enabled (mandatory splitting) • Internal VAD (Stable-TS packed) • Non-batched inference (batch_size=1) | ⚡⚡ Medium | Satisfactory |
| Balanced | Max accuracy for dialogue timing and noisy audio | Scene detection + separate VAD + WhisperPro (OpenAI Whisper) • Most accurate timestamps | ⚡ Slower | Good |
Content-Specific Recommendations
| Genre | Recommended Mode | Recommended Sensitivity |
|---|---|---|
| Drama/Dialogue Heavy | balanced | aggressive |
| Group/3p/4p Scenes | faster | conservative |
| Amateur/Homemade | fast | conservative |
| Vintage (pre-2000) | fast | balanced |
| ASMR/VR Content | balanced | aggressive |
| Compilation/Omnibus | faster | conservative |
| Heavy Background Music | balanced | conservative |
| Outdoor/Public Scenes | fast | balanced |
🎚️ Sensitivity Settings
The sensitivity parameter controls the trade-off between capturing detail and avoiding noise/hallucinations:
Conservative
- Fewer false positives: Reduces hallucinated text and repetitions.
- Higher confidence threshold: Only includes clearly spoken words.
- Best for:
- Poor audio quality recordings
- Heavy background noise or music
- Vintage/degraded content
- Content with lots of non-speech sounds
- Trade-off: May miss some quiet or unclear speech.
Balanced (Default)
- Optimal balance: Good detection with reasonable filtering.
- Moderate thresholds: Captures most speech while filtering obvious errors.
- Best for:
- Standard quality recordings
- Mixed content types
- General-purpose transcription
- First-time users
- Trade-off: A middle ground; it misses less speech than conservative but filters more aggressively than the aggressive profile.
Aggressive
- Maximum detail capture: Attempts to transcribe everything.
- Lower confidence threshold: Includes uncertain segments.
- Best for:
- High-quality audio
- ASMR or whisper content
- Content where every utterance matters
- Professional recordings with clear audio
- Trade-off: May include more false positives and hallucinations.
Sensitivity Selection Matrix
| Audio Quality | Background Noise | Speech Clarity | Recommended Sensitivity |
|---|---|---|---|
| Poor | High | Unclear | Conservative |
| Average | Moderate | Mixed | Balanced |
| Excellent | Low | Clear | Aggressive |
| Variable | Variable | Variable | Balanced |
🗾 Advanced Japanese Language Features
WhisperJAV includes sophisticated Japanese language processing specifically optimized for adult content dialogue.
Dialogue-Optimized Segmentation
The system uses advanced stable-ts regrouping algorithms customized for Japanese conversational patterns.
# Automatic application of Japanese-specific rules:
# - Sentence-ending particles (ね, よ, わ, の, ぞ, ぜ, さ, か)
# - Polite forms (です, ます, でした, ましょう)
# - Question particles detection
# - Emotional expressions and interjections
# - Casual contractions (ちゃ, じゃ, きゃ)
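For readers curious what this looks like in code, here is a rough sketch using stable-ts regrouping calls. It is illustrative only: WhisperJAV applies its own tuned rules automatically, and exact method names and signatures vary across stable-ts versions:

```python
# Sketch of stable-ts style regrouping for Japanese dialogue (not WhisperJAV's exact pipeline).
import stable_whisper

model = stable_whisper.load_model("large-v2")
result = model.transcribe("audio.wav", language="ja")

(result
 .split_by_gap(0.5)                                    # split on silent gaps
 .split_by_punctuation(["。", "?", "!", "ね", "よ"])    # sentence-ending cues
 .merge_by_gap(0.15, max_words=None))                  # re-merge tiny fragments

result.to_srt_vtt("audio.ja.srt")
```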
Specialized Pattern Recognition
- Aizuchi and Fillers: Automatically identifies and handles:
  - あの, ええと, まあ, なんか (filler words)
  - うん, はい, ええ, そう (acknowledgments)
- Emotional Expressions: Preserves important non-lexical vocalizations:
  - ああ, うう, はあ, ふう (sighs, moans)
  - Maintains timing for emotional context
- Dialect Support: Recognizes common dialect patterns:
  - Kansai-ben endings (わ, で, ねん, や)
  - Feminine speech patterns (かしら, わね, のよ)
  - Masculine speech patterns (ぜ, ぞ, だい)
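As a toy illustration of the categories above (not WhisperJAV's internal implementation), simple pattern matching can tag a segment's text:

```python
# Classify a short Japanese segment as filler, acknowledgment, vocalization, or dialogue.
import re

PATTERNS = {
    "filler":         re.compile(r"^(あの+|ええと|まあ|なんか)$"),
    "acknowledgment": re.compile(r"^(うん|はい|ええ|そう)$"),
    "vocalization":   re.compile(r"^(あ+|う+|は[あぁ]+|ふ[うぅ]+)$"),
}

def classify(text: str) -> str:
    text = text.strip("、。!? ")
    for label, pattern in PATTERNS.items():
        if pattern.match(text):
            return label
    return "dialogue"

print(classify("ええと"))    # filler
print(classify("そう"))      # acknowledgment
print(classify("行こうか"))  # dialogue
```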
Custom Regrouping Strategies
The system automatically selects appropriate regrouping based on content:
# These are applied automatically based on mode and sensitivity:
--mode balanced # Applies comprehensive regrouping
--sensitivity aggressive # Includes more nuanced patterns
Timing Optimization for Natural Speech
- Gap-based merging: Combines segments with natural speech pauses.
- Punctuation-aware splitting: Respects Japanese punctuation (。, 、, !, ?).
- Maximum subtitle duration: Ensures readability (default 7-8 seconds).
- Minimum duration filtering: Removes micro-segments.
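As a rough illustration of gap-based merging and duration capping (a much-simplified version of what the pipeline does), consider merging (start, end, text) segments:

```python
# Merge adjacent segments separated by short pauses, without exceeding a max subtitle length.
MAX_GAP = 0.5        # seconds of silence below which segments are merged (assumed value)
MAX_DURATION = 7.0   # target maximum subtitle duration in seconds

def merge_segments(segments):
    merged = []
    for start, end, text in segments:
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            if start - prev_end <= MAX_GAP and end - prev_start <= MAX_DURATION:
                merged[-1] = (prev_start, end, prev_text + text)
                continue
        merged.append((start, end, text))
    return merged

segments = [(0.0, 1.2, "ねえ"), (1.3, 2.0, "ちょっと待って"), (9.0, 10.5, "はい")]
print(merge_segments(segments))
# [(0.0, 2.0, 'ねえちょっと待って'), (9.0, 10.5, 'はい')]
```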
📖 Usage Examples
Basic Transcription
# Generate Japanese subtitles (default)
whisperjav video.mp4
# Generate English translation
whisperjav video.mp4 --subs-language english-direct
Batch Processing
# Process an entire directory
whisperjav /path/to/videos/*.mp4 --output-dir ./output
# Process with specific settings
whisperjav *.mp4 --mode balanced --sensitivity aggressive --output-dir ./subs
Advanced Options
# Keep temporary files for debugging
whisperjav video.mp4 --keep-temp
# Enable all enhancement features
whisperjav video.mp4 --adaptive-classification --adaptive-audio-enhancement --smart-postprocessing
# Use a custom configuration file
whisperjav video.mp4 --config my_config.json
# Specify a different Whisper model (WiP)
whisperjav video.mp4 --model large-v2
Output Options
# Save processing statistics to a file
whisperjav video.mp4 --stats-file stats.json
# Disable progress bars
whisperjav video.mp4 --no-progress
# Use a custom temporary directory (e.g., on a fast SSD)
whisperjav video.mp4 --temp-dir /fast/ssd/temp
⚙️ Configuration
Configuration File Format (Work in Progress --subject to change)
Create a custom config.json to override default settings:
{
"modes": {
"balanced": {
"scene_detection": {
"max_duration": 30.0,
"min_duration": 0.2,
"max_silence": 2.0
},
"vad_options": {
"threshold": 0.4,
"min_speech_duration_ms": 150
}
}
},
"sensitivity_profiles": {
"aggressive": {
"hallucination_threshold": 0.8,
"repetition_threshold": 3,
"min_confidence": 0.5
}
}
}
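If you hand-edit a config, a quick structural check before passing it with --config can catch simple mistakes. This is a generic sketch; the keys mirror the example above and the schema is still subject to change:

```python
# Load a custom config and sanity-check the scene_detection durations before use.
import json
from pathlib import Path

config = json.loads(Path("my_config.json").read_text(encoding="utf-8"))

scene = config.get("modes", {}).get("balanced", {}).get("scene_detection", {})
if scene and scene.get("min_duration", 0) >= scene.get("max_duration", float("inf")):
    raise ValueError("scene_detection: min_duration must be smaller than max_duration")

print("Config parsed OK; top-level keys:", list(config))
```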
🖥️ GUI Interface
The PyWebView-based GUI provides a modern, responsive interface for users who prefer not to use the command line.
Features
- Modern HTML/CSS/JS interface with professional look and feel
- Drag-and-drop file and folder selection
- Real-time progress monitoring and log streaming
- Visual mode and sensitivity selection with descriptions
- Advanced settings in tabbed interface
- Keyboard shortcuts (Ctrl+O, Ctrl+R, F1, Esc, F5)
- Console output display with real-time updates
System Requirements
- Windows: Requires WebView2 runtime (automatically installed with Microsoft Edge browser)
- macOS: Uses native WebKit (built-in)
- Linux: Uses GTK WebKit2
For detailed GUI usage instructions, see GUI_USER_GUIDE.md.
Status of Adaptive Features (WIP)
The following optional features are present in the UI/CLI switches but are currently work in progress and not yet fully functional end-to-end:
- Adaptive scene classification (--adaptive-classification)
- Adaptive audio enhancement (--adaptive-audio-enhancement)
- Smart post-processing (--smart-postprocessing)
You can toggle them, but expect incomplete behavior or no effect in some pipelines. We’ll remove this note once they’re production‑ready.
GUI Quick Start
- Launch the GUI: whisperjav-gui
- Select files using the "Add Files" button or drag and drop.
- Choose your Processing Mode (Faster/Fast/Balanced).
- Select the Sensitivity (Conservative/Balanced/Aggressive).
- Choose the Output Language (Japanese/English).
- Click "Start" to begin processing.
🔍 Troubleshooting
Common Issues
- Issue: FFmpeg not found
  - Solution: Install FFmpeg and ensure it's in your system's PATH.

    # Ubuntu/Debian
    sudo apt install ffmpeg
    # Windows (using Chocolatey)
    choco install ffmpeg
    # macOS (using Homebrew)
    brew install ffmpeg

- Issue: Slow processing or the GUI appears to hang
  - Solution: This is often caused by the wrong PyTorch build. Remove the CPU-only version of PyTorch, if present, and reinstall the CUDA build (see the Important Note in the Installation section).
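To check whether a CPU-only torch build is the culprit, the same kind of check as in the installation note applies:

```python
# A CPU-only PyTorch build reports no CUDA; reinstall the CUDA wheel if you see None/False here.
import torch

print("torch:", torch.__version__)              # CPU-only builds usually lack a "+cuXXX" suffix
print("built with CUDA:", torch.version.cuda)   # None on CPU-only builds
print("CUDA available:", torch.cuda.is_available())
```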
Performance Tips
- GPU Acceleration: Ensure CUDA is properly installed for a 3-5x speed improvement.
- SSD Storage: Use an SSD for temporary files via the --temp-dir argument for faster I/O.
- Batch Processing: Process multiple files in one run to avoid reloading the model for each file.
- Memory Usage: Close other memory-intensive applications when processing large files.
🤝 Contributing
We welcome contributions! Please see our CONTRIBUTING.md for details on how to get started.
Development Setup
git clone https://github.com/yourusername/whisperjav.git
cd whisperjav
# Install in editable mode with development dependencies
pip install -e .[dev]
# Run tests
python -m pytest tests/
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- The OpenAI Whisper team for the base ASR technology.
- The stable-ts project for enhanced timestamp features.
- The faster-whisper project for optimized inference.
- The JAV transcription community for their invaluable feedback and testing.
⚠️ Disclaimer
This tool is designed for creating accessibility subtitles and for use as language-learning material. Users are solely responsible for compliance with all applicable local and international laws and regulations regarding the content they choose to process.
Prerequisites:
✅ Tools You MUST Install First (Prerequisites)
Install these in the order listed. If you already have them, ensure they are up-to-date.
🎮 NVIDIA CUDA Platform (Drivers, CUDA Toolkit, cuDNN)
What you need:
Your NVIDIA Graphics Card Drivers, the CUDA Toolkit, and the cuDNN library.
All three are essential for WhisperJAV to use your GPU.
🔧 How to install:
1. NVIDIA Graphics Driver
- Ensure you have the latest drivers for your NVIDIA GPU.
- 📥 Download from: https://www.nvidia.com/drivers
2. CUDA Toolkit
- Open Command Prompt (CMD) and type: nvidia-smi
- Note the CUDA Version (e.g., 12.3).
- 📥 Go to: https://developer.nvidia.com/cuda-downloads
- Select Windows, then choose a CUDA Toolkit version equal to or lower than what nvidia-smi showed.
- Download and install it.
3. cuDNN (CUDA Deep Neural Network library)
- 📥 Go to: https://developer.nvidia.com/cudnn
- ⚠️ You need a free NVIDIA Developer Program account.
- Download the cuDNN version that matches your installed CUDA Toolkit (e.g., “cuDNN v9.x.x for CUDA 12.x”).
- Choose the “Windows (x86_64) Zip”.
Extract and Copy (Crucial!)
- Extract the cuDNN .zip file.
- You'll find folders: bin, include, lib
- Copy all contents from cuDNN's bin folder into your CUDA Toolkit's bin folder: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin
- Do the same for the include and lib folders.
📌 Restart your PC after copying cuDNN files.
🐍 Python 3.9 - 3.12
Download:
- 📥 https://www.python.org/downloads/
Install:
- During installation, CHECK THE BOX:
✅ “Add Python.exe to PATH” (on the first screen).
🧬 Git for Windows
Download:
- 📥 https://git-scm.com/download/win
Install:
- The default options are usually fine.
🎞 FFmpeg (For Video & Audio Processing)
Download:
- 📥 https://www.gyan.dev/ffmpeg/builds
- Download ffmpeg-git-full.7z or .zip.
Extract & Move:
- Extract the archive.
- Rename the inner folder to ffmpeg.
- Move it to: C:\ffmpeg
Add to PATH (Crucial!):
1. Open Command Prompt as Administrator.
2. Paste and run: setx /M PATH "C:\ffmpeg\bin;%PATH%"
3. Close and reopen all Command Prompt / PowerShell windows.
Verify:
- Open a new regular Command Prompt and type: ffmpeg -version
- You should see version info if installed correctly.