WhisperJAV - Japanese Adult Video Subtitle Generator
WhisperJAV is a subtitle generation tool optimized for Japanese Adult Videos (JAV). It uses custom enhancements tailored to the audio characteristics and sound patterns of JAV media.
🌟 Key Features
- Three Processing Modes: Optimized pipelines for different content types and quality requirements.
- Japanese Language Processing: Custom post-processing for natural dialogue segmentation.
- Scene Detection: Automatic scene splitting for better transcription accuracy.
- VAD Integration: Voice Activity Detection for improved speech recognition.
- Hallucination Removal: Specialized filters for common JAV transcription errors.
- AI Translation: Built-in subtitle translation powered by DeepSeek, Gemini, Claude, and more.
- GUI and CLI: User-friendly interface and command-line options.
- Batch Processing: Process multiple files with progress tracking.
📋 Table of Contents
- Installation
- Quick Start
- AI Translation
- Processing Modes Guide
- Sensitivity Settings
- Advanced Japanese Language Features
- Usage Examples
- Configuration
- GUI Interface
- Troubleshooting
- Contributing
- License
- Acknowledgments
- Disclaimer
🔧 Installation
Prerequisites
See the Prerequisites section at the end of this README for detailed installation instructions.
- Python 3.9 - 3.12 (Python 3.13+ is not compatible with openai-whisper)
- CUDA-capable GPU, drivers, CUDA Toolkit, and cuDNN (CUDA > 11.7)
- CUDA builds of PyTorch, torchaudio, and torchvision
- FFmpeg installed and in your system's PATH
- pip and git
Install from Source
# Standard installation (RECOMMENDED - use the latest commit from main)
pip install git+https://github.com/meizhong986/whisperjav.git@main
# Update an existing installation:
pip install -U --no-deps git+https://github.com/meizhong986/whisperjav.git@main
### ⚠️ Important Note
Make sure you have installed CUDA-enabled PyTorch (and PyAudio) before installing WhisperJAV. Otherwise, openai-whisper will automatically pull in a CPU-only torch build, which is roughly 8 times slower. You don't want that!
Example for CUDA 12.4 with torch 2.5.1 (the combination WhisperJAV has been tested with):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
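To confirm that the CUDA build of PyTorch is active before installing WhisperJAV (and again afterwards), a quick generic PyTorch check like this can help; it is not a WhisperJAV command:

```python
# Generic PyTorch sanity check: confirms the CUDA build is installed and a GPU is visible.
import torch

print("torch version:", torch.__version__)           # e.g. 2.5.1+cu124
print("CUDA available:", torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```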
Dependencies
The main dependencies will be automatically installed:
openai-whisper or faster-whisper, stable-ts, torch (with CUDA support), pysrt, tqdm, numpy, soundfile
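Once installation finishes, you can optionally confirm the core dependencies import cleanly. This is a generic sketch; the module names below are the standard import names for these packages (stable-ts imports as stable_whisper), and depending on the mode you use, only one of whisper / faster_whisper may be needed:

```python
# Import-check the main dependencies; a failure here usually means an incomplete install.
import importlib

for module in ["whisper", "faster_whisper", "stable_whisper",
               "torch", "pysrt", "tqdm", "numpy", "soundfile"]:
    try:
        importlib.import_module(module)
        print(f"OK    {module}")
    except ImportError as exc:
        print(f"FAIL  {module}: {exc}")
```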
🚀 Quick Start
Command Line
# Basic usage with default settings
whisperjav video.mp4
# Specify mode and output directory
whisperjav audio.wav --mode faster --output-dir ./subtitles
# Process multiple files with specific sensitivity
whisperjav *.mp3 --mode balanced --sensitivity aggressive
# Generate English subtitles
whisperjav video.mp4 --subs-language english-direct
GUI
whisperjav-gui
🌐 AI Translation
WhisperJAV includes built-in AI-powered subtitle translation via whisperjav-translate.
Quick Translation Setup
1. Get an API key from DeepSeek (recommended, ~$0.10 per 100k tokens).
2. Set your API key:

   # Windows (PowerShell)
   $env:DEEPSEEK_API_KEY="sk-..."

   # Linux/Mac
   export DEEPSEEK_API_KEY="sk-..."

3. Translate subtitles:

   # Standalone translation
   whisperjav-translate -i movie.srt

   # Generate and translate in one command
   whisperjav video.mp4 --translate
Translation Features
- Multiple AI Providers: DeepSeek (default), Gemini, Claude, GPT-4, OpenRouter
- Smart Caching: Instructions fetched from Gist with local caching
- Settings File: Save your preferences for repeated use
- Tone Styles: Standard or Pornify (explicit content) translation styles
- Multiple Languages: Translate to English, Spanish, Chinese, Indonesian
- Metadata-aware prompts: Optional movie title, plot, and actress name to improve translation
- Sampling controls: Temperature and top_p supported; pornify tone applies sensible defaults
Translation Examples
# Translate to Spanish with pornify style
whisperjav-translate -i movie.srt -t spanish --tone pornify
# Use Gemini provider
whisperjav-translate -i movie.srt --provider gemini
# Configure translation preferences interactively
whisperjav-translate --configure
# View API key setup instructions
whisperjav-translate --print-env
# Advanced: Pass provider-specific options
whisperjav-translate -i movie.srt --provider-option temperature=0.7 --provider-option max_tokens=2000
# Generate subtitles and translate together
whisperjav video.mp4 --translate --translate-provider deepseek
# Use local custom instructions instead of defaults
whisperjav-translate -i movie.srt --instructions-file C:\path\to\my_instructions.txt
# Add helpful metadata context (optional)
whisperjav-translate -i movie.srt \
--movie-title "JAV-123: After-work Massage" \
--actress "Yua Mikami" \
--movie-plot "Office worker gets an after-hours massage that turns intimate"
# Explicitly set sampling params (override tone defaults)
whisperjav-translate -i movie.srt --temperature 0.5 --top-p 0.85
Configuration
WhisperJAV-Translate supports persistent settings to avoid repeating common flags:
# Interactive configuration wizard (RECOMMENDED)
whisperjav-translate --configure
# View API key setup instructions
whisperjav-translate --print-env
# Show current settings
whisperjav-translate --show-settings
Settings are stored in:
- Windows: %AppData%\WhisperJAV\translate\settings.json
- Linux: ~/.config/WhisperJAV/translate/settings.json
- Mac: ~/Library/Application Support/WhisperJAV/translate/settings.json
Configuration Precedence (highest to lowest):
1. CLI flags (e.g., --provider gemini)
2. Environment variables (e.g., DEEPSEEK_API_KEY)
3. Settings file
4. Built-in defaults
Instructions source precedence:
1. Local file via --instructions-file
2. Settings file mapping for source/tone
3. Default remote Gist (with ETag cache) → local cache → bundled fallback
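Conceptually, this precedence is a simple fallback lookup. The sketch below is illustrative only; the environment-variable name and settings path are hypothetical and not WhisperJAV's actual internals:

```python
# Illustrative resolution of one option following: CLI flag > env var > settings file > default.
import json
import os
from pathlib import Path

DEFAULTS = {"provider": "deepseek"}
SETTINGS_PATH = Path.home() / ".config/WhisperJAV/translate/settings.json"  # Linux-style example

def resolve(option, cli_value=None):
    if cli_value is not None:                                    # 1. CLI flag wins
        return cli_value
    env_value = os.environ.get(f"WHISPERJAV_{option.upper()}")   # 2. hypothetical env var name
    if env_value:
        return env_value
    if SETTINGS_PATH.exists():                                   # 3. settings file
        settings = json.loads(SETTINGS_PATH.read_text(encoding="utf-8"))
        if option in settings:
            return settings[option]
    return DEFAULTS.get(option)                                  # 4. built-in default

print(resolve("provider"))            # "deepseek" unless overridden higher in the chain
print(resolve("provider", "gemini"))  # a CLI value always wins
```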
Supported Providers
| Provider | Cost | Setup | API Key Env Var |
|---|---|---|---|
| DeepSeek (default) | ~$0.10/100k tokens | platform.deepseek.com | DEEPSEEK_API_KEY |
| Gemini | Free tier available | makersuite.google.com | GEMINI_API_KEY |
| Claude | Pay-as-you-go | console.anthropic.com | ANTHROPIC_API_KEY |
| GPT-4 | Pay-as-you-go | platform.openai.com | OPENAI_API_KEY |
| OpenRouter | Varies | openrouter.ai | OPENROUTER_API_KEY |
For detailed translation options, run: whisperjav-translate --help
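If you are unsure which provider keys are already configured, a quick environment check using the variable names from the table above can help (this is a convenience script, not part of WhisperJAV):

```python
# Report which provider API keys are present in the current environment.
import os

PROVIDER_KEYS = {
    "DeepSeek":   "DEEPSEEK_API_KEY",
    "Gemini":     "GEMINI_API_KEY",
    "Claude":     "ANTHROPIC_API_KEY",
    "GPT-4":      "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}

for provider, var in PROVIDER_KEYS.items():
    status = "set" if os.environ.get(var) else "missing"
    print(f"{provider:<11} {var:<20} {status}")
```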
📊 Processing Modes Guide
Choose the appropriate mode based on your content type and requirements:
| Mode | Best For | Characteristics | Processing Speed | Accuracy |
|---|---|---|---|---|
| Faster | Whole-file runs where speed matters | Faster-Whisper backend • No scene splitting • Internal VAD (Stable-TS packed) • Batched inference (higher throughput) | ⚡⚡⚡ Fast | Adequate |
| Fast | Mixed content quality with variable audio | Faster-Whisper backend • Scene detection enabled (mandatory splitting) • Internal VAD (Stable-TS packed) • Non-batched inference (batch_size=1) | ⚡⚡ Medium | Satisfactory |
| Balanced | Max accuracy for dialogue timing and noisy audio | Scene detection + separate VAD + WhisperPro (OpenAI Whisper) • Most accurate timestamps | ⚡ Slower | Good |
Content-Specific Recommendations
| Genre | Recommended Mode | Recommended Sensitivity |
|---|---|---|
| Drama/Dialogue Heavy | balanced | aggressive |
| Group/3p/4p Scenes | faster | conservative |
| Amateur/Homemade | fast | conservative |
| Vintage (pre-2000) | fast | balanced |
| ASMR/VR Content | balanced | aggressive |
| Compilation/Omnibus | faster | conservative |
| Heavy Background Music | balanced | conservative |
| Outdoor/Public Scenes | fast | balanced |
🎚️ Sensitivity Settings
The sensitivity parameter controls the trade-off between capturing detail and avoiding noise/hallucinations:
Conservative
- Fewer false positives: Reduces hallucinated text and repetitions.
- Higher confidence threshold: Only includes clearly spoken words.
- Best for:
- Poor audio quality recordings
- Heavy background noise or music
- Vintage/degraded content
- Content with lots of non-speech sounds
- Trade-off: May miss some quiet or unclear speech.
Balanced (Default)
- Optimal balance: Good detection with reasonable filtering.
- Moderate thresholds: Captures most speech while filtering obvious errors.
- Best for:
- Standard quality recordings
- Mixed content types
- General-purpose transcription
- First-time users
- Trade-off: A middle ground; it misses less speech than conservative but filters more aggressively than the aggressive profile.
Aggressive
- Maximum detail capture: Attempts to transcribe everything.
- Lower confidence threshold: Includes uncertain segments.
- Best for:
- High-quality audio
- ASMR or whisper content
- Content where every utterance matters
- Professional recordings with clear audio
- Trade-off: May include more false positives and hallucinations.
Sensitivity Selection Matrix
| Audio Quality | Background Noise | Speech Clarity | Recommended Sensitivity |
|---|---|---|---|
| Poor | High | Unclear | Conservative |
| Average | Moderate | Mixed | Balanced |
| Excellent | Low | Clear | Aggressive |
| Variable | Variable | Variable | Balanced |
🗾 Advanced Japanese Language Features
WhisperJAV includes sophisticated Japanese language processing specifically optimized for adult content dialogue.
Dialogue-Optimized Segmentation
The system uses advanced stable-ts regrouping algorithms customized for Japanese conversational patterns.
# Automatic application of Japanese-specific rules:
# - Sentence-ending particles (ね, よ, わ, の, ぞ, ぜ, さ, か)
# - Polite forms (です, ます, でした, ましょう)
# - Question particles detection
# - Emotional expressions and interjections
# - Casual contractions (ちゃ, じゃ, きゃ)
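For readers curious what this looks like in code, here is a rough sketch using stable-ts regrouping calls. It is illustrative only: WhisperJAV applies its own tuned rules automatically, and exact method names and signatures vary across stable-ts versions:

```python
# Sketch of stable-ts style regrouping for Japanese dialogue (not WhisperJAV's exact pipeline).
import stable_whisper

model = stable_whisper.load_model("large-v2")
result = model.transcribe("audio.wav", language="ja")

(result
 .split_by_gap(0.5)                                    # split on silent gaps
 .split_by_punctuation(["。", "?", "!", "ね", "よ"])    # sentence-ending cues
 .merge_by_gap(0.15, max_words=None))                  # re-merge tiny fragments

result.to_srt_vtt("audio.ja.srt")
```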
Specialized Pattern Recognition
- Aizuchi and Fillers: Automatically identifies and handles:
  - あの, ええと, まあ, なんか (filler words)
  - うん, はい, ええ, そう (acknowledgments)
- Emotional Expressions: Preserves important non-lexical vocalizations:
  - ああ, うう, はあ, ふう (sighs, moans)
  - Maintains timing for emotional context
- Dialect Support: Recognizes common dialect patterns:
  - Kansai-ben endings (わ, で, ねん, や)
  - Feminine speech patterns (かしら, わね, のよ)
  - Masculine speech patterns (ぜ, ぞ, だい)
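As a toy illustration of the categories above (not WhisperJAV's internal implementation), simple pattern matching can tag a segment's text:

```python
# Classify a short Japanese segment as filler, acknowledgment, vocalization, or dialogue.
import re

PATTERNS = {
    "filler":         re.compile(r"^(あの+|ええと|まあ|なんか)$"),
    "acknowledgment": re.compile(r"^(うん|はい|ええ|そう)$"),
    "vocalization":   re.compile(r"^(あ+|う+|は[あぁ]+|ふ[うぅ]+)$"),
}

def classify(text: str) -> str:
    text = text.strip("、。!? ")
    for label, pattern in PATTERNS.items():
        if pattern.match(text):
            return label
    return "dialogue"

print(classify("ええと"))    # filler
print(classify("そう"))      # acknowledgment
print(classify("行こうか"))  # dialogue
```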
Custom Regrouping Strategies
The system automatically selects appropriate regrouping based on content:
# These are applied automatically based on mode and sensitivity:
--mode balanced # Applies comprehensive regrouping
--sensitivity aggressive # Includes more nuanced patterns
Timing Optimization for Natural Speech
- Gap-based merging: Combines segments with natural speech pauses.
- Punctuation-aware splitting: Respects Japanese punctuation (。, 、, !, ?).
- Maximum subtitle duration: Ensures readability (default 7-8 seconds).
- Minimum duration filtering: Removes micro-segments.
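As a rough illustration of gap-based merging and duration capping (a much-simplified version of what the pipeline does), consider merging (start, end, text) segments:

```python
# Merge adjacent segments separated by short pauses, without exceeding a max subtitle length.
MAX_GAP = 0.5        # seconds of silence below which segments are merged (assumed value)
MAX_DURATION = 7.0   # target maximum subtitle duration in seconds

def merge_segments(segments):
    merged = []
    for start, end, text in segments:
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            if start - prev_end <= MAX_GAP and end - prev_start <= MAX_DURATION:
                merged[-1] = (prev_start, end, prev_text + text)
                continue
        merged.append((start, end, text))
    return merged

segments = [(0.0, 1.2, "ねえ"), (1.3, 2.0, "ちょっと待って"), (9.0, 10.5, "はい")]
print(merge_segments(segments))
# [(0.0, 2.0, 'ねえちょっと待って'), (9.0, 10.5, 'はい')]
```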
📖 Usage Examples
Basic Transcription
# Generate Japanese subtitles (default)
whisperjav video.mp4
# Generate English translation
whisperjav video.mp4 --subs-language english-direct
Batch Processing
# Process an entire directory
whisperjav /path/to/videos/*.mp4 --output-dir ./output
# Process with specific settings
whisperjav *.mp4 --mode balanced --sensitivity aggressive --output-dir ./subs
Advanced Options
# Keep temporary files for debugging
whisperjav video.mp4 --keep-temp
# Enable all enhancement features
whisperjav video.mp4 --adaptive-classification --adaptive-audio-enhancement --smart-postprocessing
# Use a custom configuration file
whisperjav video.mp4 --config my_config.json
# Specify a different Whisper model (WiP)
whisperjav video.mp4 --model large-v2
Output Options
# Save processing statistics to a file
whisperjav video.mp4 --stats-file stats.json
# Disable progress bars
whisperjav video.mp4 --no-progress
# Use a custom temporary directory (e.g., on a fast SSD)
whisperjav video.mp4 --temp-dir /fast/ssd/temp
⚙️ Configuration
Configuration File Format (Work in Progress --subject to change)
Create a custom config.json to override default settings:
{
"modes": {
"balanced": {
"scene_detection": {
"max_duration": 30.0,
"min_duration": 0.2,
"max_silence": 2.0
},
"vad_options": {
"threshold": 0.4,
"min_speech_duration_ms": 150
}
}
},
"sensitivity_profiles": {
"aggressive": {
"hallucination_threshold": 0.8,
"repetition_threshold": 3,
"min_confidence": 0.5
}
}
}
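If you hand-edit a config, a quick structural check before passing it with --config can catch simple mistakes. This is a generic sketch; the keys mirror the example above and the schema is still subject to change:

```python
# Load a custom config and sanity-check the scene_detection durations before use.
import json
from pathlib import Path

config = json.loads(Path("my_config.json").read_text(encoding="utf-8"))

scene = config.get("modes", {}).get("balanced", {}).get("scene_detection", {})
if scene and scene.get("min_duration", 0) >= scene.get("max_duration", float("inf")):
    raise ValueError("scene_detection: min_duration must be smaller than max_duration")

print("Config parsed OK; top-level keys:", list(config))
```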
🖥️ GUI Interface
The PyWebView-based GUI provides a modern, responsive interface for users who prefer not to use the command line.
Features
- Modern HTML/CSS/JS interface with professional look and feel
- Drag-and-drop file and folder selection
- Real-time progress monitoring and log streaming
- Visual mode and sensitivity selection with descriptions
- Advanced settings in tabbed interface
- Keyboard shortcuts (Ctrl+O, Ctrl+R, F1, Esc, F5)
- Console output display with real-time updates
System Requirements
- Windows: Requires WebView2 runtime (automatically installed with Microsoft Edge browser)
- macOS: Uses native WebKit (built-in)
- Linux: Uses GTK WebKit2
For detailed GUI usage instructions, see GUI_USER_GUIDE.md.
Status of Adaptive Features (WIP)
The following optional features are present in the UI/CLI switches but are currently work in progress and not yet fully functional end-to-end:
- Adaptive scene classification (--adaptive-classification)
- Adaptive audio enhancement (--adaptive-audio-enhancement)
- Smart post-processing (--smart-postprocessing)
You can toggle them, but expect incomplete behavior or no effect in some pipelines. We’ll remove this note once they’re production‑ready.
GUI Quick Start
- Launch the GUI: whisperjav-gui
- Select files using the "Add Files" button or drag and drop.
- Choose your Processing Mode (Faster/Fast/Balanced).
- Select the Sensitivity (Conservative/Balanced/Aggressive).
- Choose the Output Language (Japanese/English).
- Click "Start" to begin processing.
🔍 Troubleshooting
Common Issues
- Issue: FFmpeg not found
  - Solution: Install FFmpeg and ensure it's in your system's PATH.

    # Ubuntu/Debian
    sudo apt install ffmpeg
    # Windows (using Chocolatey)
    choco install ffmpeg
    # macOS (using Homebrew)
    brew install ffmpeg

- Issue: Slow processing or the GUI appears to hang
  - Solution: This is often caused by the wrong PyTorch build. Remove the CPU-only version of PyTorch, if present, and reinstall the CUDA build (see the Important Note in the Installation section).
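To check whether a CPU-only torch build is the culprit, the same kind of check as in the installation note applies:

```python
# A CPU-only PyTorch build reports no CUDA; reinstall the CUDA wheel if you see None/False here.
import torch

print("torch:", torch.__version__)              # CPU-only builds usually lack a "+cuXXX" suffix
print("built with CUDA:", torch.version.cuda)   # None on CPU-only builds
print("CUDA available:", torch.cuda.is_available())
```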
Performance Tips
- GPU Acceleration: Ensure CUDA is properly installed for a 3-5x speed improvement.
- SSD Storage: Use an SSD for temporary files via the --temp-dir argument for faster I/O.
- Batch Processing: Process multiple files in one run to avoid reloading the model for each file.
- Memory Usage: Close other memory-intensive applications when processing large files.
🤝 Contributing
We welcome contributions! Please see our CONTRIBUTING.md for details on how to get started.
Development Setup
git clone https://github.com/yourusername/whisperjav.git
cd whisperjav
# Install in editable mode with development dependencies
pip install -e .[dev]
# Run tests
python -m pytest tests/
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- The OpenAI Whisper team for the base ASR technology.
- The stable-ts project for enhanced timestamp features.
- The faster-whisper project for optimized inference.
- The JAV transcription community for their invaluable feedback and testing.
⚠️ Disclaimer
This tool is designed for creating accessibility subtitles and for use as language-learning material. Users are solely responsible for compliance with all applicable local and international laws and regulations regarding the content they choose to process.
Prerequisites:
✅ Tools You MUST Install First (Prerequisites)
Install these in the order listed. If you already have them, ensure they are up-to-date.
🎮 NVIDIA CUDA Platform (Drivers, CUDA Toolkit, cuDNN)
What you need:
Your NVIDIA Graphics Card Drivers, the CUDA Toolkit, and the cuDNN library.
All three are essential for WhisperJAV to use your GPU.
🔧 How to install:
1. NVIDIA Graphics Driver
- Ensure you have the latest drivers for your NVIDIA GPU.
- 📥 Download from: https://www.nvidia.com/drivers
2. CUDA Toolkit
- Open Command Prompt (CMD) and type: nvidia-smi
- Note the CUDA Version (e.g., 12.3).
- 📥 Go to: https://developer.nvidia.com/cuda-downloads
- Select Windows, then choose a CUDA Toolkit version equal to or lower than what nvidia-smi showed.
- Download and install it.
3. cuDNN (CUDA Deep Neural Network library)
- 📥 Go to: https://developer.nvidia.com/cudnn
- ⚠️ You need a free NVIDIA Developer Program account.
- Download the cuDNN version that matches your installed CUDA Toolkit (e.g., “cuDNN v9.x.x for CUDA 12.x”).
- Choose the “Windows (x86_64) Zip”.
Extract and Copy (Crucial!)
- Extract the cuDNN .zip file.
- You'll find folders: bin, include, lib
- Copy all contents from cuDNN's bin folder into your CUDA Toolkit's bin folder: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin
- Do the same for the include and lib folders.
📌 Restart your PC after copying cuDNN files.
🐍 Python 3.9 - 3.12
Download:
- 📥 https://www.python.org/downloads/
Install:
- During installation, CHECK THE BOX:
✅ “Add Python.exe to PATH” (on the first screen).
🧬 Git for Windows
Download:
- 📥 https://git-scm.com/download/win
Install:
- The default options are usually fine.
🎞 FFmpeg (For Video & Audio Processing)
Download:
- 📥 https://www.gyan.dev/ffmpeg/builds
- Download ffmpeg-git-full.7z or .zip.
Extract & Move:
- Extract the archive.
- Rename the inner folder to ffmpeg.
- Move it to: C:\ffmpeg
Add to PATH (Crucial!):
1. Open Command Prompt as Administrator.
2. Paste and run: setx /M PATH "C:\ffmpeg\bin;%PATH%"
3. Close and reopen all Command Prompt / PowerShell windows.
Verify:
- Open a new regular Command Prompt and type: ffmpeg -version
- You should see version info if installed correctly.