
WhisperJAV - Japanese Adult Video Subtitle Generator


WhisperJAV is a subtitle generation tool optimized for Japanese Adult Videos (JAV). It applies custom enhancements tailored to the audio characteristics and sound patterns of JAV media.

🌟 Key Features

  • Three Processing Modes: Optimized pipelines for different content types and quality requirements.
  • Japanese Language Processing: Custom post-processing for natural dialogue segmentation.
  • Scene Detection: Automatic scene splitting for better transcription accuracy.
  • VAD Integration: Voice Activity Detection for improved speech recognition.
  • Hallucination Removal: Specialized filters for common JAV transcription errors.
  • AI Translation: Built-in subtitle translation powered by DeepSeek, Gemini, Claude, and more.
  • GUI and CLI: User-friendly interface and command-line options.
  • Batch Processing: Process multiple files with progress tracking.

🔧 Installation

Prerequisites

See the detailed prerequisites section at the end of this README for step-by-step instructions.

  • Python 3.9 - 3.12 (Python 3.13+ is not compatible with openai-whisper)
  • CUDA-capable GPU with drivers, CUDA Toolkit, and cuDNN (CUDA > 11.7)
  • CUDA builds of PyTorch, torchaudio, and torchvision
  • FFmpeg installed and on your system's PATH
  • pip and Git installed
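
The checks in this list can be sketched as a small preflight script. This is a minimal illustration using only the Python standard library, not WhisperJAV's own preflight code; the function names `python_supported` and `missing_tools` are hypothetical.

```python
import shutil
import sys

# Supported interpreter range taken from the prerequisites list above.
MIN_PY = (3, 9)
MAX_PY = (3, 12)


def python_supported(version=None):
    """Return True if the (major, minor) version is within 3.9 to 3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def missing_tools(tools=("ffmpeg", "git")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_supported())
    print("Missing tools:", missing_tools() or "none")
```

Versions are compared as tuples rather than strings so that 3.10 sorts after 3.9.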

Install from Source


# Standard installation (RECOMMENDED - use the latest commit from main)
pip install git+https://github.com/meizhong986/whisperjav.git@main



# For existing installations, update with:
pip install -U --no-deps git+https://github.com/meizhong986/whisperjav.git@main



⚠️ Important Note

Make sure you have installed CUDA-enabled PyTorch and pyaudio before installing WhisperJAV. Otherwise, openai-whisper will automatically install a CPU-only torch build, which is about 8 times slower.

Example for CUDA 12.4 with torch 2.5.1 (the version WhisperJAV has been tested with):

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
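
To confirm which PyTorch build ended up in your environment, a check along the following lines may help. It is a sketch rather than part of WhisperJAV; `torch_cuda_status` is a hypothetical helper name, and it relies only on the public `torch.cuda.is_available()`, `torch.__version__`, and `torch.version.cuda` attributes.

```python
import importlib.util


def torch_cuda_status():
    """Describe the installed PyTorch build without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works before installation

    if torch.cuda.is_available():
        return f"CUDA build detected (torch {torch.__version__}, CUDA {torch.version.cuda})"
    return f"no usable GPU (torch {torch.__version__}); this may be a CPU-only build"


if __name__ == "__main__":
    print(torch_cuda_status())
```

If the script reports no usable GPU, reinstall torch from the CUDA wheel index before installing WhisperJAV.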

Dependencies

The main dependencies will be automatically installed:

  • openai-whisper or faster-whisper
  • stable-ts
  • torch (with CUDA support)
  • pysrt
  • tqdm
  • numpy
  • soundfile

🚀 Quick Start

Command Line

# Basic usage with default settings
whisperjav video.mp4

# Specify mode and output directory
whisperjav audio.wav --mode faster --output-dir ./subtitles

# Process multiple files with specific sensitivity
whisperjav *.mp3 --mode balanced --sensitivity aggressive

# Generate English subtitles
whisperjav video.mp4 --subs-language english-direct

GUI

whisperjav-gui

🌐 AI Translation

WhisperJAV includes built-in AI-powered subtitle translation via whisperjav-translate.

Quick Translation Setup

  1. Get an API key from DeepSeek (recommended, ~$0.10 per 100k tokens)

  2. Set your API key:

    # Windows (PowerShell)
    $env:DEEPSEEK_API_KEY="sk-..."
    
    # Linux/Mac
    export DEEPSEEK_API_KEY="sk-..."
    
  3. Translate subtitles:

    # Standalone translation
    whisperjav-translate -i movie.srt
    
    # Generate and translate in one command
    whisperjav video.mp4 --translate
    

Translation Features

  • Multiple AI Providers: DeepSeek (default), Gemini, Claude, GPT-4, OpenRouter
  • Smart Caching: Instructions fetched from Gist with local caching
  • Settings File: Save your preferences for repeated use
  • Tone Styles: Standard or Pornify (explicit content) translation styles
  • Multiple Languages: Translate to English, Spanish, Chinese, Indonesian
  • Metadata-aware prompts: Optional movie title, plot, and actress name to improve translation
  • Sampling controls: Temperature and top_p supported; pornify tone applies sensible defaults

Translation Examples

# Translate to Spanish with pornify style
whisperjav-translate -i movie.srt -t spanish --tone pornify

# Use Gemini provider
whisperjav-translate -i movie.srt --provider gemini

# Configure translation preferences interactively
whisperjav-translate --configure

# View API key setup instructions
whisperjav-translate --print-env

# Advanced: Pass provider-specific options
whisperjav-translate -i movie.srt --provider-option temperature=0.7 --provider-option max_tokens=2000

# Generate subtitles and translate together
whisperjav video.mp4 --translate --translate-provider deepseek

# Use local custom instructions instead of defaults
whisperjav-translate -i movie.srt --instructions-file C:\path\to\my_instructions.txt

# Add helpful metadata context (optional)
whisperjav-translate -i movie.srt \
  --movie-title "JAV-123: After-work Massage" \
  --actress "Yua Mikami" \
  --movie-plot "Office worker gets an after-hours massage that turns intimate"

# Explicitly set sampling params (override tone defaults)
whisperjav-translate -i movie.srt --temperature 0.5 --top-p 0.85

Configuration

WhisperJAV-Translate supports persistent settings to avoid repeating common flags:

# Interactive configuration wizard (RECOMMENDED)
whisperjav-translate --configure

# View API key setup instructions
whisperjav-translate --print-env

# Show current settings
whisperjav-translate --show-settings

Settings are stored in:

  • Windows: %AppData%\WhisperJAV\translate\settings.json
  • Linux: ~/.config/WhisperJAV/translate/settings.json
  • Mac: ~/Library/Application Support/WhisperJAV/translate/settings.json
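
If you want to locate the settings file from a script, the paths above can be reproduced roughly as follows. This sketch only mirrors the three documented locations; whether WhisperJAV honors overrides such as `XDG_CONFIG_HOME` is not stated here, and `settings_path` is a hypothetical helper.

```python
import os
import sys
from pathlib import Path


def settings_path(platform=None, env=None, home=None):
    """Return the documented settings.json location for a platform."""
    platform = platform or sys.platform
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()

    if platform.startswith("win"):
        # %AppData% normally expands to <home>\AppData\Roaming.
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".config"
    return base / "WhisperJAV" / "translate" / "settings.json"


if __name__ == "__main__":
    print(settings_path())
```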

Configuration Precedence (highest to lowest):

  1. CLI flags (e.g., --provider gemini)
  2. Environment variables (e.g., DEEPSEEK_API_KEY)
  3. Settings file
  4. Built-in defaults
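
The precedence order above amounts to a first-match lookup. The following sketch illustrates the idea under assumed data shapes (plain dictionaries for CLI arguments and the settings file); it is not the tool's actual resolver, and `resolve_setting` and `BUILT_IN_DEFAULTS` are hypothetical names.

```python
import os

BUILT_IN_DEFAULTS = {"provider": "deepseek"}  # DeepSeek is the documented default


def resolve_setting(name, cli_args, settings_file, env=None, env_var=None):
    """Return the first value found, checking sources from highest to lowest priority."""
    env = os.environ if env is None else env
    if cli_args.get(name) is not None:          # 1. CLI flag
        return cli_args[name]
    if env_var and env.get(env_var):            # 2. environment variable
        return env[env_var]
    if settings_file.get(name) is not None:     # 3. settings file
        return settings_file[name]
    return BUILT_IN_DEFAULTS.get(name)          # 4. built-in default
```

For example, `resolve_setting("provider", {"provider": "gemini"}, {"provider": "claude"})` returns `"gemini"` because the CLI value is checked first.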

Instructions source precedence:

  • Local file via --instructions-file
  • Settings file mapping for source/tone
  • Default remote Gist (with ETag cache) → local cache → bundled fallback
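
The fallback chain for instructions can be pictured as below. This is a simplified sketch: ETag revalidation is omitted, the remote fetch is passed in as a callable, and every name (`load_instructions`, `fetch_remote`, `cache_file`, the bundled placeholder text) is an assumption made for illustration.

```python
from pathlib import Path


def load_instructions(instructions_file=None, settings_mapping=None,
                      fetch_remote=None, cache_file=None,
                      bundled="(bundled default instructions)"):
    """Return (source, text) from the first instruction source that yields text."""
    if instructions_file:                      # --instructions-file wins outright
        return "file", Path(instructions_file).read_text(encoding="utf-8")
    if settings_mapping:                       # path mapped in the settings file
        return "settings", Path(settings_mapping).read_text(encoding="utf-8")
    if fetch_remote is not None:               # remote Gist, may fail when offline
        try:
            return "remote", fetch_remote()
        except OSError:
            pass                               # fall through to the local cache
    if cache_file and Path(cache_file).exists():
        return "cache", Path(cache_file).read_text(encoding="utf-8")
    return "bundled", bundled
```

Returning the source label alongside the text makes it easy to log which tier was actually used.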

Supported Providers

| Provider | Cost | Setup | API Key Env Var |
|---|---|---|---|
| DeepSeek (default) | ~$0.10/100k tokens | platform.deepseek.com | DEEPSEEK_API_KEY |
| Gemini | Free tier available | makersuite.google.com | GEMINI_API_KEY |
| Claude | Pay-as-you-go | console.anthropic.com | ANTHROPIC_API_KEY |
| GPT-4 | Pay-as-you-go | platform.openai.com | OPENAI_API_KEY |
| OpenRouter | Varies | openrouter.ai | OPENROUTER_API_KEY |
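
Before running a translation, you may want to see which providers already have a key in your environment. The sketch below uses the environment variable names from the table; the lowercase dictionary keys are illustrative labels and are not confirmed `--provider` values (only `deepseek` and `gemini` appear in the examples in this README).

```python
import os

# Environment variable names as listed in the provider table.
PROVIDER_ENV_VARS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gpt-4": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    print("Providers with keys set:", configured_providers() or "none")
```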

For detailed translation options, run: whisperjav-translate --help

📊 Processing Modes Guide

Choose the appropriate mode based on your content type and requirements:

| Mode | Best For | Characteristics | Processing Speed | Accuracy |
|---|---|---|---|---|
| Faster | Whole-file runs where speed matters | FasterWhisper backend; no scene splitting; internal VAD (StableTS packed); batched inference (higher throughput) | Fast | Adequate |
| Fast | Mixed content quality with variable audio | FasterWhisper backend; scene detection enabled (mandatory splitting); internal VAD (StableTS packed); non-batched inference (batch_size=1) | Medium | Satisfactory |
| Balanced | Max accuracy for dialogue timing and noisy audio | Scene detection + separate VAD + WhisperPro (OpenAI Whisper); most accurate timestamps | Slower | Good |

Content-Specific Recommendations

| Genre | Recommended Mode | Recommended Sensitivity |
|---|---|---|
| Drama/Dialogue Heavy | balanced | aggressive |
| Group/3p/4p Scenes | faster | conservative |
| Amateur/Homemade | fast | conservative |
| Vintage (pre-2000) | fast | balanced |
| ASMR/VR Content | balanced | aggressive |
| Compilation/Omnibus | faster | conservative |
| Heavy Background Music | balanced | conservative |
| Outdoor/Public Scenes | fast | balanced |
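
These recommendations map directly onto the `--mode` and `--sensitivity` flags. As a sketch, a wrapper script could look the pair up and build the command; the short genre keys and the `build_command` helper are hypothetical and exist only for this example.

```python
# (mode, sensitivity) pairs copied from the recommendations table above.
RECOMMENDATIONS = {
    "drama": ("balanced", "aggressive"),
    "group": ("faster", "conservative"),
    "amateur": ("fast", "conservative"),
    "vintage": ("fast", "balanced"),
    "asmr_vr": ("balanced", "aggressive"),
    "compilation": ("faster", "conservative"),
    "background_music": ("balanced", "conservative"),
    "outdoor": ("fast", "balanced"),
}


def build_command(media_path, genre):
    """Return a whisperjav argument list for the given genre key."""
    mode, sensitivity = RECOMMENDATIONS[genre]
    return ["whisperjav", media_path, "--mode", mode, "--sensitivity", sensitivity]
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with file names that contain spaces.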

🎚️ Sensitivity Settings

The sensitivity parameter controls the trade-off between capturing detail and avoiding noise/hallucinations:

Conservative

  • Fewer false positives: Reduces hallucinated text and repetitions.
  • Higher confidence threshold: Only includes clearly spoken words.
  • Best for:
    • Poor audio quality recordings
    • Heavy background noise or music
    • Vintage/degraded content
    • Content with lots of non-speech sounds
  • Trade-off: May miss some quiet or unclear speech.

Balanced (Default)

  • Optimal balance: Good detection with reasonable filtering.
  • Moderate thresholds: Captures most speech while filtering obvious errors.
  • Best for:
    • Standard quality recordings
    • Mixed content types
    • General-purpose transcription
    • First-time users
  • Trade-off: A balanced approach to all aspects.

Aggressive

  • Maximum detail capture: Attempts to transcribe everything.
  • Lower confidence threshold: Includes uncertain segments.
  • Best for:
    • High-quality audio
    • ASMR or whisper content
    • Content where every utterance matters
    • Professional recordings with clear audio
  • Trade-off: May include more false positives and hallucinations.

Sensitivity Selection Matrix

| Audio Quality | Background Noise | Speech Clarity | Recommended Sensitivity |
|---|---|---|---|
| Poor | High | Unclear | Conservative |
| Average | Moderate | Mixed | Balanced |
| Excellent | Low | Clear | Aggressive |
| Variable | Variable | Variable | Balanced |

🗾 Advanced Japanese Language Features

WhisperJAV includes sophisticated Japanese language processing specifically optimized for adult content dialogue.

Dialogue-Optimized Segmentation

The system uses advanced stable-ts regrouping algorithms customized for Japanese conversational patterns.

# Automatic application of Japanese-specific rules:
# - Sentence-ending particles (ね, よ, わ, の, ぞ, ぜ, さ, か)
# - Polite forms (です, ます, でした, ましょう)
# - Question particles detection
# - Emotional expressions and interjections
# - Casual contractions (ちゃ, じゃ, きゃ)
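
To give a feel for what an ending-based rule looks like, here is a deliberately naive sketch that flags text ending in one of the listed particles or polite forms. It is not how stable-ts regrouping or WhisperJAV's rules are implemented; the constant and function names are hypothetical, and a real rule set would need context to avoid false matches (for example, の in the middle of a phrase).

```python
import re

# Endings copied from the rule list above; the real rule set is likely larger.
SENTENCE_ENDING_PARTICLES = "ねよわのぞぜさか"
POLITE_ENDINGS = ("です", "ます", "でした", "ましょう")

_TRAILING_PUNCT = re.compile(r"[。、！？!?…\s]+$")


def ends_sentence(text):
    """Return True if the text ends with a listed particle or polite form."""
    core = _TRAILING_PUNCT.sub("", text)
    if not core:
        return False
    return core.endswith(POLITE_ENDINGS) or core[-1] in SENTENCE_ENDING_PARTICLES
```

A segmenter could use such a predicate as one signal for where a subtitle line may safely end.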

Specialized Pattern Recognition

  • Aizuchi and Fillers: Automatically identifies and handles:
    • あの, ええと, まあ, なんか (filler words)
    • うん, はい, ええ, そう (acknowledgments)
  • Emotional Expressions: Preserves important non-lexical vocalizations:
    • ああ, うう, はあ, ふう (sighs, moans)
    • Maintains timing for emotional context
  • Dialect Support: Recognizes common dialect patterns:
    • Kansai-ben endings (e.g., ねん)
    • Feminine speech patterns (かしら, わね, のよ)
    • Masculine speech patterns (e.g., だい)

Custom Regrouping Strategies

The system automatically selects appropriate regrouping based on content:

# These are applied automatically based on mode and sensitivity:
--mode balanced      # Applies comprehensive regrouping
--sensitivity aggressive # Includes more nuanced patterns

Timing Optimization for Natural Speech

  • Gap-based merging: Combines segments with natural speech pauses.
  • Punctuation-aware splitting: Respects Japanese punctuation marks.
  • Maximum subtitle duration: Ensures readability (default 7-8 seconds).
  • Minimum duration filtering: Removes micro-segments.
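
Gap-based merging, the duration cap, and micro-segment filtering can be combined in a few lines. The sketch below is illustrative only: the thresholds (`max_gap`, `max_duration`, `min_duration`) are assumed values, not WhisperJAV's defaults, and segments are modeled as simple `(start, end, text)` tuples.

```python
def merge_and_filter(segments, max_gap=0.3, max_duration=7.0, min_duration=0.2):
    """Merge segments separated by short pauses, then drop micro-segments.

    Each segment is a (start, end, text) tuple with times in seconds.
    """
    merged = []
    for start, end, text in sorted(segments):
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            gap = start - prev_end
            # Merge only if the pause is short and the result stays readable.
            if gap <= max_gap and end - prev_start <= max_duration:
                merged[-1] = (prev_start, end, prev_text + text)
                continue
        merged.append((start, end, text))
    return [seg for seg in merged if seg[1] - seg[0] >= min_duration]
```

Filtering runs after merging so that a very short utterance can still survive by being absorbed into its neighbor.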

📖 Usage Examples

Basic Transcription

# Generate Japanese subtitles (default)
whisperjav video.mp4

# Generate English translation
whisperjav video.mp4 --subs-language english-direct

Batch Processing

# Process an entire directory
whisperjav /path/to/videos/*.mp4 --output-dir ./output

# Process with specific settings
whisperjav *.mp4 --mode balanced --sensitivity aggressive --output-dir ./subs
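
Shell wildcard expansion differs between platforms (for example, the Windows Command Prompt does not expand `*.mp4` itself), so a script may prefer to collect the files explicitly. The helper below is a sketch; `batch_command` is a hypothetical name, and it only uses the `--output-dir` flag shown above.

```python
from pathlib import Path


def batch_command(directory, pattern="*.mp4", output_dir="./output"):
    """Return one whisperjav command covering every matching file, or [] if none match."""
    files = sorted(str(path) for path in Path(directory).glob(pattern))
    if not files:
        return []
    return ["whisperjav", *files, "--output-dir", output_dir]
```

Passing all files in one invocation matches the batch usage shown above, where a single run handles several inputs.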

Advanced Options

# Keep temporary files for debugging
whisperjav video.mp4 --keep-temp

# Enable all enhancement features
whisperjav video.mp4 --adaptive-classification --adaptive-audio-enhancement --smart-postprocessing

# Use a custom configuration file
whisperjav video.mp4 --config my_config.json

# Specify a different Whisper model (WiP)
whisperjav video.mp4 --model large-v2

Output Options

# Save processing statistics to a file
whisperjav video.mp4 --stats-file stats.json

# Disable progress bars
whisperjav video.mp4 --no-progress

# Use a custom temporary directory (e.g., on a fast SSD)
whisperjav video.mp4 --temp-dir /fast/ssd/temp

⚙️ Configuration

Configuration File Format (work in progress, subject to change)

Create a custom config.json to override default settings:

{
  "modes": {
    "balanced": {
      "scene_detection": {
        "max_duration": 30.0,
        "min_duration": 0.2,
        "max_silence": 2.0
      },
      "vad_options": {
        "threshold": 0.4,
        "min_speech_duration_ms": 150
      }
    }
  },
  "sensitivity_profiles": {
    "aggressive": {
      "hallucination_threshold": 0.8,
      "repetition_threshold": 3,
      "min_confidence": 0.5
    }
  }
}
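
A partial override file like the one above is usually layered over defaults with a recursive merge, so that unspecified keys keep their default values. Whether WhisperJAV merges this way or replaces whole sections is not stated here, so treat the following as a sketch of the general technique; `deep_merge` and `load_config` are hypothetical helpers.

```python
import copy
import json


def deep_merge(defaults, overrides):
    """Return a new dict where nested keys in `overrides` replace those in `defaults`."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def load_config(path, defaults):
    """Read a JSON config file and layer it over the defaults."""
    with open(path, encoding="utf-8") as handle:
        return deep_merge(defaults, json.load(handle))
```

Because the merge works on copies, the defaults dictionary is left untouched and can be reused for other files.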

🖥️ GUI Interface

The PyWebView-based GUI provides a modern, responsive interface for users who prefer not to use the command line.

Features

  • Modern HTML/CSS/JS interface with professional look and feel
  • Drag-and-drop file and folder selection
  • Real-time progress monitoring and log streaming
  • Visual mode and sensitivity selection with descriptions
  • Advanced settings in tabbed interface
  • Keyboard shortcuts (Ctrl+O, Ctrl+R, F1, Esc, F5)
  • Console output display with real-time updates

System Requirements

  • Windows: Requires WebView2 runtime (automatically installed with Microsoft Edge browser)
  • macOS: Uses native WebKit (built-in)
  • Linux: Uses GTK WebKit2

For detailed GUI usage instructions, see GUI_USER_GUIDE.md.

Status of Adaptive Features (WIP)

The following optional features are present in the UI/CLI switches but are currently work in progress and not yet fully functional end-to-end:

  • Adaptive scene classification (--adaptive-classification)
  • Adaptive audio enhancement (--adaptive-audio-enhancement)
  • Smart postprocessing (--smart-postprocessing)

You can toggle them, but expect incomplete behavior or no effect in some pipelines. We'll remove this note once they're production-ready.

GUI Quick Start

  1. Launch the GUI: whisperjav-gui
  2. Select files using the "Add Files" button or drag and drop.
  3. Choose your Processing Mode (Faster/Fast/Balanced).
  4. Select the Sensitivity (Conservative/Balanced/Aggressive).
  5. Choose the Output Language (Japanese/English).
  6. Click "Start" to begin processing.

🔍 Troubleshooting

Common Issues

  • Issue: FFmpeg not found
    • Solution: Install FFmpeg and ensure it's in your system's PATH.
      # Ubuntu/Debian
      sudo apt install ffmpeg
      # Windows (using Chocolatey)
      choco install ffmpeg
      # macOS (using Homebrew)
      brew install ffmpeg
      
  • Issue: Slow processing, or the GUI appears to hang
    • Solution: This is often caused by a CPU-only PyTorch build.
      Uninstall the CPU version of PyTorch if present, then install the CUDA-enabled build.
      

Performance Tips

  • GPU Acceleration: Ensure CUDA is properly installed for a 3-5x speed improvement.
  • SSD Storage: Use an SSD for temporary files via the --temp-dir argument for faster I/O.
  • Batch Processing: Process multiple files in one run to avoid reloading the model for each file.
  • Memory Usage: Close other memory-intensive applications when processing large files.

🤝 Contributing

We welcome contributions! Please see our CONTRIBUTING.md for details on how to get started.

Development Setup

git clone https://github.com/yourusername/whisperjav.git
cd whisperjav
# Install in editable mode with development dependencies
pip install -e .[dev]
# Run tests
python -m pytest tests/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • The OpenAI Whisper team for the base ASR technology.
  • The stable-ts project for enhanced timestamp features.
  • The faster-whisper project for optimized inference.
  • The JAV transcription community for their invaluable feedback and testing.

⚠️ Disclaimer

This tool is designed for creating accessibility subtitles and for use as language-learning material. Users are solely responsible for compliance with all applicable local and international laws and regulations regarding the content they choose to process.

Tools You Must Install First (Prerequisites)

Install these in the order listed. If you already have them, ensure they are up-to-date.


🎮 NVIDIA CUDA Platform (Drivers, CUDA Toolkit, cuDNN)

What you need:
Your NVIDIA Graphics Card Drivers, the CUDA Toolkit, and the cuDNN library.
All three are essential for WhisperJAV to use your GPU.

🔧 How to install:

1. NVIDIA Graphics Driver

2. CUDA Toolkit

  • Open Command Prompt (CMD) and type:
    nvidia-smi
    
  • Note the value shown after "CUDA Version:" (e.g., 12.3).
  • 📥 Go to: https://developer.nvidia.com/cuda-downloads
  • Select Windows, then choose a CUDA Toolkit version equal to or lower than what nvidia-smi showed.
  • Download and install it.

3. cuDNN (CUDA Deep Neural Network library)

  • 📥 Go to: https://developer.nvidia.com/cudnn
  • ⚠️ You need a free NVIDIA Developer Program account.
  • Download the cuDNN version that matches your installed CUDA Toolkit (e.g., “cuDNN v9.x.x for CUDA 12.x”).
  • Choose the “Windows (x86_64) Zip”.

Extract and Copy (Crucial!)

  • Extract the cuDNN .zip file.
  • You'll find folders: bin, include, lib
  • Copy all contents from cuDNN's bin folder into your CUDA Toolkit's bin folder:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin
    
  • Do the same for the include and lib folders.

📌 Restart your PC after copying cuDNN files.
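
The version rule from step 2 (pick a toolkit equal to or lower than what nvidia-smi reports) can be checked in code. This sketch parses the "CUDA Version" field from text you have captured from `nvidia-smi`; both function names are hypothetical, and the sample string in the usage note is made up for illustration.

```python
import re


def parse_cuda_version(nvidia_smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_is_compatible(toolkit_version, driver_cuda_version):
    """True if the toolkit version is equal to or lower than the driver's CUDA version."""
    return toolkit_version <= driver_cuda_version
```

For example, `parse_cuda_version("Driver Version: 545.84  CUDA Version: 12.3")` returns `(12, 3)`, and `toolkit_is_compatible((12, 1), (12, 3))` is true. Versions are compared as integer tuples because comparing them as strings or floats would order 12.10 before 12.9.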


🐍 Python 3.9 - 3.12 (3.13+ is not compatible with openai-whisper)

Download:

Install:

  • During installation, CHECK THE BOX:
    “Add Python.exe to PATH” (on the first screen).

🧬 Git for Windows

Download:

Install:

  • The default options are usually fine.

🎞 FFmpeg (For Video & Audio Processing)

Download:

Extract & Move:

  • Extract the archive.
  • Rename the inner folder to ffmpeg.
  • Move it to:
    C:\ffmpeg
    

Add to PATH (Crucial!):

  1. Open Command Prompt as Administrator

  2. Paste and run:

    setx /M PATH "C:\ffmpeg\bin;%PATH%"
    
  3. Close and reopen all Command Prompt / PowerShell windows.

Verify:

  • Open a new regular Command Prompt and type:
    ffmpeg -version
    
  • You should see version info if installed correctly.