Merge pull request #3616 from alibaba/feature/sherpa-mnn

Apps: Feature: Add sherpa-mnn
jxt1234 2025-06-11 19:16:20 +08:00 committed by GitHub
commit 1a3ed2bc14
773 changed files with 94869 additions and 0 deletions

apps/frameworks/sherpa-mnn/.gitignore vendored Normal file

@@ -0,0 +1,5 @@
SourcePackages
build-*
*.xcworkspace
!build-*.sh
*.lock

apps/frameworks/sherpa-mnn/CHANGELOG.md Normal file

@@ -0,0 +1,475 @@
## 1.10.46
* Fix kokoro lexicon. (#1886)
* Fix missing sense_voice support in speaker-identification-with-vad-non-streaming-asr.py (#1884)
* Fix generating Chinese lexicon for Kokoro TTS 1.0 (#1888)
* Reduce vad-whisper-c-api example code. (#1891)
* JNI Exception Handling (#1452)
* Fix #1901: UnicodeEncodeError running export_bpe_vocab.py (#1902)
* Fix publishing pre-built windows libraries (#1905)
* Fix Whisper model token normalization (#1904)
* feat: add mic example for better compatibility (#1909)
* Add onnxruntime 1.18.1 for Linux aarch64 GPU (#1914)
* Add C++ API for streaming zipformer ASR on RK NPU (#1908)
* Change [1<<28] to [1<<10] to fix build issues on GOARCH=386, where [1<<28] is too large (#1916)
* Flutter Config toJson/fromJson (#1893)
* Fix publishing linux pre-built artifacts (#1919)
* Set go.mod to use go 1.17 and use unsafe.Slice to optimize the code (#1920)
* fix: AddPunct panic for Go (#1921)
* Fix publishing macos pre-built artifacts (#1922)
* Minor fixes for rknn (#1925)
* Build wheels for rknn linux aarch64 (#1928)
## 1.10.45
* [update] Fix a bug where creating the golang instance succeeded while creating the C struct failed (#1860)
* fixed typo in RTF calculations (#1861)
* Export FireRedASR to sherpa-onnx. (#1865)
* Add C++ and Python API for FireRedASR AED models (#1867)
* Add Kotlin and Java API for FireRedAsr AED model (#1870)
* Add C API for FireRedAsr AED model. (#1871)
* Add CXX API for FireRedAsr (#1872)
* Add JavaScript API (node-addon) for FireRedAsr (#1873)
* Add JavaScript API (WebAssembly) for FireRedAsr model. (#1874)
* Add C# API for FireRedAsr Model (#1875)
* Add Swift API for FireRedAsr AED Model (#1876)
* Add Dart API for FireRedAsr AED Model (#1877)
* Add Go API for FireRedAsr AED Model (#1879)
* Add Pascal API for FireRedAsr AED Model (#1880)
## 1.10.44
* Export MatchaTTS fa-en model to sherpa-onnx (#1832)
* Add C++ support for MatchaTTS models not from icefall. (#1834)
* OfflineRecognizer supports create stream with hotwords (#1833)
* Add PengChengStarling models to sherpa-onnx (#1835)
* Support specifying voice in espeak-ng for kokoro tts models. (#1836)
* Fix: make sherpa_onnx_loge print only in debug mode (#1838)
* Add Go API for audio tagging (#1840)
* Fix CI (#1841)
* Update readme to contain links for pre-built Apps (#1853)
* Modify the model used (#1855)
* Flutter OnlinePunctuation (#1854)
* Fix splitting text by languages for kokoro tts. (#1849)
## 1.10.43
* Add MFC example for Kokoro TTS 1.0 (#1815)
* Update sherpa-onnx-tts.js VitsModelConfig.model can be none (#1817)
* Fix passing gb2312 encoded strings to tts on Windows (#1819)
* Support scaling the duration of a pause in TTS. (#1820)
* Fix building wheels for linux aarch64. (#1821)
* Fix CI for Linux aarch64. (#1822)
## 1.10.42
* Fix publishing wheels (#1746)
* Update README to include https://github.com/xinhecuican/QSmartAssistant (#1755)
* Add Kokoro TTS to MFC examples (#1760)
* Refactor node-addon C++ code. (#1768)
* Add keyword spotter C API for HarmonyOS (#1769)
* Add ArkTS API for Keyword spotting. (#1775)
* Add Flutter example for Kokoro TTS (#1776)
* Initialize the audio session for iOS ASR example (#1786)
* Fix: Prepend 0 to tokenization to prevent word skipping for Kokoro. (#1787)
* Export Kokoro 1.0 to sherpa-onnx (#1788)
* Add C++ and Python API for Kokoro 1.0 multilingual TTS model (#1795)
* Add Java and Kotlin API for Kokoro TTS 1.0 (#1798)
* Add Android demo for Kokoro TTS 1.0 (#1799)
* Add C API for Kokoro TTS 1.0 (#1801)
* Add CXX API for Kokoro TTS 1.0 (#1802)
* Add Swift API for Kokoro TTS 1.0 (#1803)
* Add Go API for Kokoro TTS 1.0 (#1804)
* Add C# API for Kokoro TTS 1.0 (#1805)
* Add Dart API for Kokoro TTS 1.0 (#1806)
* Add Pascal API for Kokoro TTS 1.0 (#1807)
* Add JavaScript API (node-addon) for Kokoro TTS 1.0 (#1808)
* Add JavaScript API (WebAssembly) for Kokoro TTS 1.0 (#1809)
* Add Flutter example for Kokoro TTS 1.0 (#1810)
* Add iOS demo for Kokoro TTS 1.0 (#1812)
* Add HarmonyOS demo for Kokoro TTS 1.0 (#1813)
## 1.10.41
* Fix UI for Android TTS Engine. (#1735)
* Add iOS TTS example for MatchaTTS (#1736)
* Add iOS example for Kokoro TTS (#1737)
* Fix dither binding in Pybind11 to ensure independence from high_freq in FeatureExtractorConfig (#1739)
* Fix keyword spotting. (#1689)
* Update readme to include https://github.com/hfyydd/sherpa-onnx-server (#1741)
* Reduce vad-moonshine-c-api example code. (#1742)
* Support Kokoro TTS for HarmonyOS. (#1743)
## 1.10.40
* Fix building wheels (#1703)
* Export kokoro to sherpa-onnx (#1713)
* Add C++ and Python API for Kokoro TTS models. (#1715)
* Add C API for Kokoro TTS models (#1717)
* Fix style issues (#1718)
* Add C# API for Kokoro TTS models (#1720)
* Add Swift API for Kokoro TTS models (#1721)
* Add Go API for Kokoro TTS models (#1722)
* Add Dart API for Kokoro TTS models (#1723)
* Add Pascal API for Kokoro TTS models (#1724)
* Add JavaScript API (node-addon) for Kokoro TTS models (#1725)
* Add JavaScript (WebAssembly) API for Kokoro TTS models. (#1726)
* Add Kotlin and Java API for Kokoro TTS models (#1728)
* Update README.md for KWS to not use git lfs. (#1729)
## 1.10.39
* Fix building without TTS (#1691)
* Add README for android libs. (#1693)
* Fix: export-onnx.py(expected all tensors to be on the same device) (#1699)
* Fix passing strings from C# to C. (#1701)
## 1.10.38
* Fix initializing TTS in Python. (#1664)
* Remove spaces after punctuations for TTS (#1666)
* Add constructor fromPtr() for all flutter class with factory ctor. (#1667)
* Add Kotlin API for Matcha-TTS models. (#1668)
* Support Matcha-TTS models using espeak-ng (#1672)
* Add Java API for Matcha-TTS models. (#1673)
* Avoid adding tail padding for VAD in generate-subtitles.py (#1674)
* Add C API for MatchaTTS models (#1675)
* Add CXX API for MatchaTTS models (#1676)
* Add JavaScript API (node-addon-api) for MatchaTTS models. (#1677)
* Add HarmonyOS examples for MatchaTTS. (#1678)
* Upgraded to .NET 8 and made code style a little more internally consistent. (#1680)
* Update workflows to use .NET 8.0 also. (#1681)
* Add C# and JavaScript (wasm) API for MatchaTTS models (#1682)
* Add Android demo for MatchaTTS models. (#1683)
* Add Swift API for MatchaTTS models. (#1684)
* Add Go API for MatchaTTS models (#1685)
* Add Pascal API for MatchaTTS models. (#1686)
* Add Dart API for MatchaTTS models (#1687)
## 1.10.37
* Add new tts models for Latvia and Persian+English (#1644)
* Add a byte-level BPE Chinese+English non-streaming zipformer model (#1645)
* Support removing invalid utf-8 sequences. (#1648)
* Add TeleSpeech CTC to non_streaming_server.py (#1649)
* Fix building macOS libs (#1656)
* Add Go API for Keyword spotting (#1662)
* Add Swift online punctuation (#1661)
* Add C++ runtime for Matcha-TTS (#1627)
## 1.10.36
* Update AAR version in Android Java demo (#1618)
* Support linking onnxruntime statically for Android (#1619)
* Update readme to include Open-LLM-VTuber (#1622)
* Rename maxNumStences to maxNumSentences (#1625)
* Support using onnxruntime 1.16.0 with CUDA 11.4 on Jetson Orin NX (Linux arm64 GPU). (#1630)
* Update readme to include jetson orin nx and nano b01 (#1631)
* feat: add checksum action (#1632)
* Support decoding with byte-level BPE (bbpe) models. (#1633)
* feat: enable c api for android ci (#1635)
* Update README.md (#1640)
* SherpaOnnxVadAsr: Offload runSecondPass to background thread for improved real-time audio processing (#1638)
* Fix GitHub actions. (#1642)
## 1.10.35
* Add missing changes about the speaker identification demo for HarmonyOS (#1612)
* Provide sherpa-onnx.aar for Android (#1615)
* Use aar in Android Java demo. (#1616)
## 1.10.34
* Fix building node-addon package (#1598)
* Update doc links for HarmonyOS (#1601)
* Add on-device real-time ASR demo for HarmonyOS (#1606)
* Add speaker identification APIs for HarmonyOS (#1607)
* Add speaker identification demo for HarmonyOS (#1608)
* Add speaker diarization API for HarmonyOS. (#1609)
* Add speaker diarization demo for HarmonyOS (#1610)
## 1.10.33
* Add non-streaming ASR support for HarmonyOS. (#1564)
* Add streaming ASR support for HarmonyOS. (#1565)
* Fix building for Android (#1568)
* Publish `sherpa_onnx.har` for HarmonyOS (#1572)
* Add VAD+ASR demo for HarmonyOS (#1573)
* Fix publishing har packages for HarmonyOS (#1576)
* Add CI to build HAPs for HarmonyOS (#1578)
* Add microphone demo about VAD+ASR for HarmonyOS (#1581)
* Fix getting microphone permission for HarmonyOS VAD+ASR example (#1582)
* Add HarmonyOS support for text-to-speech. (#1584)
* Fix: support both old and new websockets request headers format (#1588)
* Add on-device text-to-speech (TTS) demo for HarmonyOS (#1590)
## 1.10.32
* Support cross-compiling for HarmonyOS (#1553)
* HarmonyOS support for VAD. (#1561)
* Fix publishing flutter iOS app to appstore (#1563).
## 1.10.31
* Publish pre-built wheels for Python 3.13 (#1485)
* Publish pre-built macos xcframework (#1490)
* Fix reading tokens.txt on Windows. (#1497)
* Add two-pass ASR Android APKs for Moonshine models. (#1499)
* Support building GPU-capable sherpa-onnx on Linux aarch64. (#1500)
* Publish pre-built wheels with CUDA support for Linux aarch64. (#1507)
* Export the English TTS model from MeloTTS (#1509)
* Add Lazarus example for Moonshine models. (#1532)
* Add isolate_tts demo (#1529)
* Add WebAssembly example for VAD + Moonshine models. (#1535)
* Add Android APK for streaming Paraformer ASR (#1538)
* Support static build for windows arm64. (#1539)
* Use xcframework for Flutter iOS plugin to support iOS simulators.
## 1.10.30
* Fix building node-addon for Windows x86. (#1469)
* Begin to support https://github.com/usefulsensors/moonshine (#1470)
* Publish pre-built JNI libs for Linux aarch64 (#1472)
* Add C++ runtime and Python APIs for Moonshine models (#1473)
* Add Kotlin and Java API for Moonshine models (#1474)
* Add C and C++ API for Moonshine models (#1476)
* Add Swift API for Moonshine models. (#1477)
* Add Go API examples for adding punctuations to text. (#1478)
* Add Go API for Moonshine models (#1479)
* Add JavaScript API for Moonshine models (#1480)
* Add Dart API for Moonshine models. (#1481)
* Add Pascal API for Moonshine models (#1482)
* Add C# API for Moonshine models. (#1483)
## 1.10.29
* Add Go API for offline punctuation models (#1434)
* Support https://huggingface.co/Revai/reverb-diarization-v1 (#1437)
* Add more models for speaker diarization (#1440)
* Add Java API example for hotwords. (#1442)
* Add java android demo (#1454)
* Add C++ API for streaming ASR. (#1455)
* Add C++ API for non-streaming ASR (#1456)
* Handle NaN embeddings in speaker diarization. (#1461)
* Add speaker identification with VAD and non-streaming ASR using ALSA (#1463)
* Support GigaAM CTC models for Russian ASR (#1464)
* Add GigaAM NeMo transducer model for Russian ASR (#1467)
## 1.10.28
* Fix swift example for generating subtitles. (#1362)
* Allow more online models to load tokens file from the memory (#1352)
* Fix CI errors introduced by supporting loading keywords from buffers (#1366)
* Fix running MeloTTS models on GPU. (#1379)
* Support Parakeet models from NeMo (#1381)
* Export Pyannote speaker segmentation models to onnx (#1382)
* Support Agglomerative clustering. (#1384)
* Add Python API for clustering (#1385)
* support whisper turbo (#1390)
* context_state is not set correctly when previous context is passed after reset (#1393)
* Speaker diarization example with onnxruntime Python API (#1395)
* C++ API for speaker diarization (#1396)
* Python API for speaker diarization. (#1400)
* C API for speaker diarization (#1402)
* docs(nodejs-addon-examples): add guide for pnpm user (#1401)
* Go API for speaker diarization (#1403)
* Swift API for speaker diarization (#1404)
* Update readme to include more external projects using sherpa-onnx (#1405)
* C# API for speaker diarization (#1407)
* JavaScript API (node-addon) for speaker diarization (#1408)
* WebAssembly example for speaker diarization (#1411)
* Handle audio files less than 10s long for speaker diarization. (#1412)
* JavaScript API with WebAssembly for speaker diarization (#1414)
* Kotlin API for speaker diarization (#1415)
* Java API for speaker diarization (#1416)
* Dart API for speaker diarization (#1418)
* Pascal API for speaker diarization (#1420)
* Android JNI support for speaker diarization (#1421)
* Android demo for speaker diarization (#1423)
## 1.10.27
* Add non-streaming ONNX models for Russian ASR (#1358)
* Fix building Flutter TTS examples for Linux (#1356)
* Support passing utf-8 strings from JavaScript to C++. (#1355)
* Fix sherpa_onnx.go to support returning empty recognition results (#1353)
## 1.10.26
* Add links to projects using sherpa-onnx. (#1345)
* Support lang/emotion/event results from SenseVoice in Swift API. (#1346)
* Support specifying max speech duration for VAD. (#1348)
* Add APIs about max speech duration in VAD for various programming languages (#1349)
## 1.10.25
* Allow tokens and hotwords to be loaded directly from a buffered string (#1339)
* Fix computing features for CED audio tagging models. (#1341)
* Preserve previous result as context for next segment (#1335)
* Add Python binding for online punctuation models (#1312)
* Fix vad.Flush(). (#1329)
* Fix wasm app for streaming paraformer (#1328)
* Build websocket related binaries for embedded systems. (#1327)
* Fixed the C api calls and created the TTS project file (#1324)
* Re-implement LM rescore for online transducer (#1231)
## 1.10.24
* Add VAD and keyword spotting for the Node package with WebAssembly (#1286)
* Fix releasing npm package and fix building Android VAD+ASR example (#1288)
* add Tokens []string, Timestamps []float32, Lang string, Emotion string, Event string (#1277)
* add vad+sense voice example for C API (#1291)
* ADD VAD+ASR example for dart with CircularBuffer. (#1293)
* Fix VAD+ASR example for Dart API. (#1294)
* Avoid SherpaOnnxSpeakerEmbeddingManagerFreeBestMatches freeing null. (#1296)
* Fix releasing wasm app for vad+asr (#1300)
* remove extra files from linux/macos/windows jni libs (#1301)
* two-pass Android APK for SenseVoice (#1302)
* Downgrade flutter sdk versions. (#1305)
* Reduce onnxruntime log output. (#1306)
* Provide prebuilt .jar files for different java versions. (#1307)
## 1.10.23
* flutter: add lang, emotion, event to OfflineRecognizerResult (#1268)
* Use a separate thread to initialize models for lazarus examples. (#1270)
* Object pascal examples for recording and playing audio with portaudio. (#1271)
* Text to speech API for Object Pascal. (#1273)
* update kotlin api for better release native object and add user-friendly apis. (#1275)
* Update wave-reader.cc to support 8/16/32-bit waves (#1278)
* Add WebAssembly for VAD (#1281)
* WebAssembly example for VAD + Non-streaming ASR (#1284)
## 1.10.22
* Add Pascal API for reading wave files (#1243)
* Pascal API for streaming ASR (#1246)
* Pascal API for non-streaming ASR (#1247)
* Pascal API for VAD (#1249)
* Add more C API examples (#1255)
* Add emotion, event of SenseVoice. (#1257)
* Support reading multi-channel wave files with 8/16/32-bit encoded samples (#1258)
* Enable IPO only for Release build. (#1261)
* Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR (#1251)
* Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266)
## 1.10.21
* Fix ffmpeg c api example (#1185)
* Fix splitting sentences for MeloTTS (#1186)
* Non-streaming WebSocket client for Java. (#1190)
* Fix copying asset files for flutter examples. (#1191)
* Add Chinese+English tts example for flutter (#1192)
* Add speaker identification and verification example for Dart API (#1194)
* Fix reading non-standard wav files. (#1199)
* Add ReazonSpeech Japanese pre-trained model (#1203)
* Describe how to add new words for MeloTTS models (#1209)
* Remove libonnxruntime_providers_cuda.so as a dependency. (#1210)
* Fix setting SenseVoice language. (#1214)
* Support passing TTS callback in Swift API (#1218)
* Add MeloTTS example for ios (#1223)
* Add online punctuation and casing prediction model for English language (#1224)
* Fix python two pass ASR examples (#1230)
* Add blank penalty for various language bindings
## 1.10.20
* Add Dart API for audio tagging
* Add Dart API for adding punctuations to text
## 1.10.19
* Prefix all C API functions with SherpaOnnx
## 1.10.18
* Fix the case when recognition results contain the symbol `"`. It caused
issues when converting results to a json string.
## 1.10.17
* Support SenseVoice CTC models.
* Add Dart API for keyword spotter.
## 1.10.16
* Support zh-en TTS model from MeloTTS.
## 1.10.15
* Downgrade onnxruntime from v1.18.1 to v1.17.1
## 1.10.14
* Support whisper large v3
* Update onnxruntime from v1.18.0 to v1.18.1
* Fix invalid utf8 sequence from Whisper for Dart API.
## 1.10.13
* Update onnxruntime from 1.17.1 to 1.18.0
* Add C# API for Keyword spotting
## 1.10.12
* Add Flush to VAD so that the last speech segment can be detected. See also
https://github.com/k2-fsa/sherpa-onnx/discussions/1077#discussioncomment-9979740
## 1.10.11
* Support the iOS platform for Flutter.
## 1.10.10
* Build sherpa-onnx into a single shared library.
## 1.10.9
* Fix released packages. piper-phonemize was not included in v1.10.8.
## 1.10.8
* Fix released packages. There should be a lib directory.
## 1.10.7
* Support Android for Flutter.
## 1.10.2
* Fix passing C# string to C++
## 1.10.1
* Enable to stop TTS generation
## 1.10.0
* Add inverse text normalization
## 1.9.30
* Add TTS
## 1.9.29
* Publish with CI
## 0.0.3
* Fix path separator on Windows.
## 0.0.2
* Support specifying lib path.
## 0.0.1
* Initial release.

apps/frameworks/sherpa-mnn/CMakeLists.txt Normal file

@@ -0,0 +1,452 @@
cmake_minimum_required(VERSION 3.13 FATAL_ERROR)
set(CMAKE_OSX_DEPLOYMENT_TARGET "10.14" CACHE STRING "Minimum OS X deployment version. Used only for macOS")
set(CMAKE_POLICY_DEFAULT_CMP0063 NEW)
set(CMAKE_POLICY_DEFAULT_CMP0069 NEW)
project(sherpa-mnn)
message(STATUS "MNN's dir: ${MNN_LIB_DIR}")
include_directories(${MNN_LIB_DIR}/include)
link_directories(${MNN_LIB_DIR}/lib)
# Remember to update
# ./CHANGELOG.md
# ./new-release.sh
set(SHERPA_MNN_VERSION "1.10.46")
# Disable warning about
#
# "The DOWNLOAD_EXTRACT_TIMESTAMP option was not given and policy CMP0135 is
# not set.
if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
cmake_policy(SET CMP0135 NEW)
endif()
option(SHERPA_MNN_ENABLE_PYTHON "Whether to build Python" OFF)
option(SHERPA_MNN_ENABLE_TESTS "Whether to build tests" OFF)
option(SHERPA_MNN_ENABLE_CHECK "Whether to build with assert" OFF)
option(BUILD_SHARED_LIBS "Whether to build shared libraries" OFF)
option(SHERPA_MNN_ENABLE_PORTAUDIO "Whether to build with portaudio" ON)
option(SHERPA_MNN_ENABLE_JNI "Whether to build JNI interface" OFF)
option(SHERPA_MNN_ENABLE_C_API "Whether to build C API" ON)
option(SHERPA_MNN_ENABLE_WEBSOCKET "Whether to build websocket server/client" ON)
option(SHERPA_MNN_ENABLE_GPU "Enable ONNX Runtime GPU support" OFF)
option(SHERPA_MNN_ENABLE_DIRECTML "Enable ONNX Runtime DirectML support" OFF)
option(SHERPA_MNN_ENABLE_WASM "Whether to enable WASM" OFF)
option(SHERPA_MNN_ENABLE_WASM_SPEAKER_DIARIZATION "Whether to enable WASM for speaker diarization" OFF)
option(SHERPA_MNN_ENABLE_WASM_TTS "Whether to enable WASM for TTS" OFF)
option(SHERPA_MNN_ENABLE_WASM_ASR "Whether to enable WASM for ASR" OFF)
option(SHERPA_MNN_ENABLE_WASM_KWS "Whether to enable WASM for KWS" OFF)
option(SHERPA_MNN_ENABLE_WASM_VAD "Whether to enable WASM for VAD" OFF)
option(SHERPA_MNN_ENABLE_WASM_VAD_ASR "Whether to enable WASM for VAD+ASR" OFF)
option(SHERPA_MNN_ENABLE_WASM_NODEJS "Whether to enable WASM for NodeJS" OFF)
option(SHERPA_MNN_ENABLE_BINARY "Whether to build binaries" ON)
option(SHERPA_MNN_ENABLE_TTS "Whether to build TTS related code" ON)
option(SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION "Whether to build speaker diarization related code" ON)
option(SHERPA_MNN_LINK_LIBSTDCPP_STATICALLY "True to link libstdc++ statically. Used only when BUILD_SHARED_LIBS is OFF on Linux" ON)
option(SHERPA_MNN_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE "True to use pre-installed onnxruntime if available" ON)
option(SHERPA_MNN_ENABLE_SANITIZER "Whether to enable ubsan and asan" OFF)
option(SHERPA_MNN_BUILD_C_API_EXAMPLES "Whether to enable C API examples" ON)
option(SHERPA_MNN_ENABLE_RKNN "Whether to build for RKNN NPU" OFF)
set(SHERPA_MNN_LINUX_ARM64_GPU_ONNXRUNTIME_VERSION "1.11.0" CACHE STRING "Used only for Linux ARM64 GPU. If you use Jetson Nano B01, set it to 1.11.0. If you use Jetson Orin NX, set it to 1.16.0. If you use the NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super (JetPack 6.2 [L4T 36.4.3]), set it to 1.18.1")
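# Illustrative configure invocation (a sketch, not from this repo's docs; the
# MNN_LIB_DIR value is a placeholder, and every -D flag below is one of the
# options defined above):
#   cmake -B build \
#     -DMNN_LIB_DIR=/path/to/MNN/build \
#     -DBUILD_SHARED_LIBS=ON \
#     -DSHERPA_MNN_ENABLE_TTS=ON \
#     -DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
#     -DCMAKE_BUILD_TYPE=Release
#   cmake --build build -j8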
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin")
if(NOT WIN32)
set(CMAKE_SKIP_BUILD_RPATH FALSE)
set(BUILD_RPATH_USE_ORIGIN TRUE)
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
endif()
if(NOT APPLE)
set(SHERPA_MNN_RPATH_ORIGIN "$ORIGIN")
else()
set(SHERPA_MNN_RPATH_ORIGIN "@loader_path")
endif()
if(NOT WIN32)
set(CMAKE_INSTALL_RPATH ${SHERPA_MNN_RPATH_ORIGIN})
set(CMAKE_BUILD_RPATH ${SHERPA_MNN_RPATH_ORIGIN})
endif()
if(NOT CMAKE_BUILD_TYPE)
message(STATUS "No CMAKE_BUILD_TYPE given, default to Release")
set(CMAKE_BUILD_TYPE Release)
endif()
if(DEFINED ANDROID_ABI AND NOT SHERPA_MNN_ENABLE_JNI AND NOT SHERPA_MNN_ENABLE_C_API)
message(STATUS "Set SHERPA_MNN_ENABLE_JNI to ON for Android")
set(SHERPA_MNN_ENABLE_JNI ON CACHE BOOL "" FORCE)
endif()
if(SHERPA_MNN_ENABLE_PYTHON AND NOT BUILD_SHARED_LIBS)
message(STATUS "Set BUILD_SHARED_LIBS to ON since SHERPA_MNN_ENABLE_PYTHON is ON")
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
endif()
if(SHERPA_MNN_ENABLE_GPU)
message(WARNING "\
Compiling for NVIDIA GPU is enabled. Please make sure cudatoolkit
is installed on your system. Otherwise, you will get errors at runtime.
Hint: You don't need sudo permission to install CUDA toolkit. Please refer to
https://k2-fsa.github.io/k2/installation/cuda-cudnn.html
to install CUDA toolkit if you have not installed it.")
if(NOT BUILD_SHARED_LIBS)
message(STATUS "Set BUILD_SHARED_LIBS to ON since SHERPA_MNN_ENABLE_GPU is ON")
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
endif()
endif()
if(SHERPA_MNN_ENABLE_DIRECTML)
message(WARNING "\
Compiling with DirectML enabled. Please make sure Windows 10 SDK
is installed on your system. Otherwise, you will get errors at runtime.
Please refer to
https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html#requirements
to install Windows 10 SDK if you have not installed it.")
if(NOT BUILD_SHARED_LIBS)
message(STATUS "Set BUILD_SHARED_LIBS to ON since SHERPA_MNN_ENABLE_DIRECTML is ON")
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
endif()
endif()
# see https://cmake.org/cmake/help/latest/prop_tgt/MSVC_RUNTIME_LIBRARY.html
# https://stackoverflow.com/questions/14172856/compile-with-mt-instead-of-md-using-cmake
if(MSVC)
add_compile_options(
$<$<CONFIG:>:/MT> #---------|
$<$<CONFIG:Debug>:/MTd> #---|-- Statically link the runtime libraries
$<$<CONFIG:Release>:/MT> #--|
$<$<CONFIG:RelWithDebInfo>:/MT>
$<$<CONFIG:MinSizeRel>:/MT>
)
endif()
if(CMAKE_SYSTEM_NAME STREQUAL OHOS)
set(CMAKE_CXX_FLAGS "-Wno-unused-command-line-argument ${CMAKE_CXX_FLAGS}")
set(CMAKE_C_FLAGS "-Wno-unused-command-line-argument ${CMAKE_C_FLAGS}")
endif()
message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS "CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
message(STATUS "BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS}")
message(STATUS "SHERPA_MNN_ENABLE_PYTHON ${SHERPA_MNN_ENABLE_PYTHON}")
message(STATUS "SHERPA_MNN_ENABLE_TESTS ${SHERPA_MNN_ENABLE_TESTS}")
message(STATUS "SHERPA_MNN_ENABLE_CHECK ${SHERPA_MNN_ENABLE_CHECK}")
message(STATUS "SHERPA_MNN_ENABLE_PORTAUDIO ${SHERPA_MNN_ENABLE_PORTAUDIO}")
message(STATUS "SHERPA_MNN_ENABLE_JNI ${SHERPA_MNN_ENABLE_JNI}")
message(STATUS "SHERPA_MNN_ENABLE_C_API ${SHERPA_MNN_ENABLE_C_API}")
message(STATUS "SHERPA_MNN_ENABLE_WEBSOCKET ${SHERPA_MNN_ENABLE_WEBSOCKET}")
message(STATUS "SHERPA_MNN_ENABLE_GPU ${SHERPA_MNN_ENABLE_GPU}")
message(STATUS "SHERPA_MNN_ENABLE_WASM ${SHERPA_MNN_ENABLE_WASM}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_SPEAKER_DIARIZATION ${SHERPA_MNN_ENABLE_WASM_SPEAKER_DIARIZATION}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_TTS ${SHERPA_MNN_ENABLE_WASM_TTS}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_ASR ${SHERPA_MNN_ENABLE_WASM_ASR}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_KWS ${SHERPA_MNN_ENABLE_WASM_KWS}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_VAD ${SHERPA_MNN_ENABLE_WASM_VAD}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_VAD_ASR ${SHERPA_MNN_ENABLE_WASM_VAD_ASR}")
message(STATUS "SHERPA_MNN_ENABLE_WASM_NODEJS ${SHERPA_MNN_ENABLE_WASM_NODEJS}")
message(STATUS "SHERPA_MNN_ENABLE_BINARY ${SHERPA_MNN_ENABLE_BINARY}")
message(STATUS "SHERPA_MNN_ENABLE_TTS ${SHERPA_MNN_ENABLE_TTS}")
message(STATUS "SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION ${SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION}")
message(STATUS "SHERPA_MNN_LINK_LIBSTDCPP_STATICALLY ${SHERPA_MNN_LINK_LIBSTDCPP_STATICALLY}")
message(STATUS "SHERPA_MNN_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE ${SHERPA_MNN_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE}")
message(STATUS "SHERPA_MNN_ENABLE_SANITIZER: ${SHERPA_MNN_ENABLE_SANITIZER}")
message(STATUS "SHERPA_MNN_BUILD_C_API_EXAMPLES: ${SHERPA_MNN_BUILD_C_API_EXAMPLES}")
message(STATUS "SHERPA_MNN_ENABLE_RKNN: ${SHERPA_MNN_ENABLE_RKNN}")
if(BUILD_SHARED_LIBS OR SHERPA_MNN_ENABLE_JNI)
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
set(CMAKE_VISIBILITY_INLINES_HIDDEN 1)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif()
if(BUILD_SHARED_LIBS AND NOT CMAKE_SYSTEM_NAME STREQUAL iOS AND CMAKE_BUILD_TYPE STREQUAL Release)
# Don't use LTO for iOS since it causes the following error
# error: unable to find any architecture information in the binary
# at '/Users/fangjun/open-source/sherpa-onnx/build-ios/build/os64/sherpa-onnx.a':
# Unknown header: 0xb17c0de
# See also https://forums.developer.apple.com/forums/thread/714324
include(CheckIPOSupported)
check_ipo_supported(RESULT ipo)
if(ipo)
message(STATUS "IPO is enabled")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
else()
message(STATUS "IPO is not available")
endif()
endif()
if(SHERPA_MNN_ENABLE_TTS)
message(STATUS "TTS is enabled")
add_definitions(-DSHERPA_MNN_ENABLE_TTS=1)
else()
message(WARNING "TTS is disabled")
add_definitions(-DSHERPA_MNN_ENABLE_TTS=0)
endif()
if(SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION)
message(STATUS "speaker diarization is enabled")
add_definitions(-DSHERPA_MNN_ENABLE_SPEAKER_DIARIZATION=1)
else()
message(WARNING "speaker diarization is disabled")
add_definitions(-DSHERPA_MNN_ENABLE_SPEAKER_DIARIZATION=0)
endif()
if(SHERPA_MNN_ENABLE_DIRECTML)
message(STATUS "DirectML is enabled")
add_definitions(-DSHERPA_MNN_ENABLE_DIRECTML=1)
else()
message(STATUS "DirectML is disabled")
add_definitions(-DSHERPA_MNN_ENABLE_DIRECTML=0)
endif()
if(SHERPA_MNN_ENABLE_WASM_SPEAKER_DIARIZATION)
if(NOT SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION to ON if you want to build WASM for speaker diarization")
endif()
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for speaker diarization")
endif()
endif()
if(SHERPA_MNN_ENABLE_WASM_TTS)
if(NOT SHERPA_MNN_ENABLE_TTS)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_TTS to ON if you want to build WASM for TTS")
endif()
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for TTS")
endif()
endif()
if(SHERPA_MNN_ENABLE_WASM_ASR)
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for ASR")
endif()
endif()
if(SHERPA_MNN_ENABLE_WASM_NODEJS)
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for NodeJS")
endif()
add_definitions(-DSHERPA_MNN_ENABLE_WASM_KWS=1)
endif()
if(SHERPA_MNN_ENABLE_WASM)
add_definitions(-DSHERPA_MNN_ENABLE_WASM=1)
endif()
if(SHERPA_MNN_ENABLE_WASM_KWS)
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for KWS")
endif()
add_definitions(-DSHERPA_MNN_ENABLE_WASM_KWS=1)
endif()
if(SHERPA_MNN_ENABLE_WASM_VAD)
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for VAD")
endif()
endif()
if(SHERPA_MNN_ENABLE_WASM_VAD_ASR)
if(NOT SHERPA_MNN_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_MNN_ENABLE_WASM to ON if you enable WASM for VAD+ASR")
endif()
endif()
if(NOT CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD 17 CACHE STRING "The C++ version to be used.")
endif()
set(CMAKE_CXX_EXTENSIONS OFF)
message(STATUS "C++ Standard version: ${CMAKE_CXX_STANDARD}")
include(CheckIncludeFileCXX)
if(SHERPA_MNN_ENABLE_RKNN)
add_definitions(-DSHERPA_MNN_ENABLE_RKNN=1)
endif()
if(UNIX AND NOT APPLE AND NOT SHERPA_MNN_ENABLE_WASM AND NOT CMAKE_SYSTEM_NAME STREQUAL Android AND NOT CMAKE_SYSTEM_NAME STREQUAL OHOS)
check_include_file_cxx(alsa/asoundlib.h SHERPA_MNN_HAS_ALSA)
if(SHERPA_MNN_HAS_ALSA)
message(STATUS "With Alsa")
add_definitions(-DSHERPA_MNN_ENABLE_ALSA=1)
else()
message(WARNING "\
Could not find alsa/asoundlib.h !
We won't build sherpa-onnx-alsa
To fix that, please do:
(1) sudo apt-get install alsa-utils libasound2-dev
(2) rm -rf build
(3) re-try
")
endif()
endif()
check_include_file_cxx(cxxabi.h SHERPA_MNN_HAVE_CXXABI_H)
check_include_file_cxx(execinfo.h SHERPA_MNN_HAVE_EXECINFO_H)
if(WIN32)
add_definitions(-DNOMINMAX) # Otherwise, std::max() and std::min() won't work
endif()
if(WIN32 AND MSVC)
# disable various warnings for MSVC
# 4244: 'return': conversion from 'unsigned __int64' to 'int', possible loss of data
# 4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
# 4305: 'argument': truncation from 'double' to 'const float'
# 4334: '<<': result of 32-bit shift implicitly converted to 64 bits
# 4800: 'int': forcing value to bool 'true' or 'false'
# 4996: 'fopen': This function or variable may be unsafe
set(disabled_warnings
/wd4244
/wd4267
/wd4305
/wd4334
/wd4800
/wd4996
)
message(STATUS "Disabled warnings: ${disabled_warnings}")
foreach(w IN LISTS disabled_warnings)
string(APPEND CMAKE_CXX_FLAGS " ${w} ")
endforeach()
add_compile_options("$<$<C_COMPILER_ID:MSVC>:/utf-8>")
add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")
endif()
list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules)
list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
if(SHERPA_MNN_ENABLE_WASM)
# Enable it for debugging in case there is something wrong.
# string(APPEND CMAKE_CXX_FLAGS " -g4 -s ASSERTIONS=2 -s SAFE_HEAP=1 -s STACK_OVERFLOW_CHECK=1 ")
endif()
if(NOT BUILD_SHARED_LIBS AND CMAKE_SYSTEM_NAME STREQUAL Linux)
if(SHERPA_MNN_LINK_LIBSTDCPP_STATICALLY)
message(STATUS "Link libstdc++ statically")
set(CMAKE_CXX_FLAGS " ${CMAKE_CXX_FLAGS} -static-libstdc++ -static-libgcc ")
else()
message(STATUS "Link libstdc++ dynamically")
endif()
endif()
include(kaldi-native-fbank)
include(kaldi-decoder)
include(simple-sentencepiece)
set(ONNXRUNTIME_DIR ${onnxruntime_SOURCE_DIR})
message(STATUS "ONNXRUNTIME_DIR: ${ONNXRUNTIME_DIR}")
if(SHERPA_MNN_ENABLE_PORTAUDIO AND SHERPA_MNN_ENABLE_BINARY)
# portaudio is used only in building demo binaries and the sherpa-onnx-core
# library does not depend on it.
include(portaudio)
endif()
if(SHERPA_MNN_ENABLE_PYTHON)
include(pybind11)
endif()
if(SHERPA_MNN_ENABLE_TESTS)
enable_testing()
include(googletest)
endif()
if(SHERPA_MNN_ENABLE_WEBSOCKET)
include(websocketpp)
include(asio)
endif()
if(SHERPA_MNN_ENABLE_TTS)
include(espeak-ng-for-piper)
set(ESPEAK_NG_DIR ${espeak_ng_SOURCE_DIR})
message(STATUS "ESPEAK_NG_DIR: ${ESPEAK_NG_DIR}")
include(piper-phonemize)
include(cppjieba) # For Chinese TTS. It is a header-only C++ library
endif()
if(SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION)
include(hclust-cpp)
endif()
# if(NOT MSVC AND CMAKE_BUILD_TYPE STREQUAL Debug AND (CMAKE_CXX_COMPILER_ID STREQUAL "Clang" OR CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang"))
if(SHERPA_MNN_ENABLE_SANITIZER)
message(WARNING "enable ubsan and asan")
set(CMAKE_REQUIRED_LIBRARIES -lubsan -lasan)
include(CheckCCompilerFlag)
set(flags -fsanitize=undefined )
string(APPEND flags " -fno-sanitize-recover=undefined ")
string(APPEND flags " -fsanitize=integer ")
string(APPEND flags " -fsanitize=nullability ")
string(APPEND flags " -fsanitize=implicit-conversion ")
string(APPEND flags " -fsanitize=bounds ")
string(APPEND flags " -fsanitize=address ")
if(OFF)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${flags} -Wall -Wextra")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${flags} -Wall -Wextra")
else()
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${flags}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${flags}")
endif()
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${flags}")
add_compile_options(-fno-omit-frame-pointer)
endif()
add_subdirectory(sherpa-mnn)
if(SHERPA_MNN_ENABLE_C_API AND SHERPA_MNN_ENABLE_BINARY AND SHERPA_MNN_BUILD_C_API_EXAMPLES)
set(SHERPA_MNN_PKG_WITH_CARGS "-lcargs")
add_subdirectory(c-api-examples)
add_subdirectory(cxx-api-examples)
endif()
if(SHERPA_MNN_ENABLE_WASM)
add_subdirectory(wasm)
endif()
message(STATUS "CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
if(NOT BUILD_SHARED_LIBS)
if(APPLE)
set(SHERPA_MNN_PKG_CONFIG_EXTRA_LIBS "-lc++ -framework Foundation")
endif()
if(UNIX AND NOT APPLE)
set(SHERPA_MNN_PKG_CONFIG_EXTRA_LIBS "-lstdc++ -lm -pthread -ldl")
endif()
endif()
if(NOT BUILD_SHARED_LIBS)
# See https://people.freedesktop.org/~dbn/pkg-config-guide.html
if(SHERPA_MNN_ENABLE_TTS)
configure_file(cmake/sherpa-onnx-static.pc.in ${PROJECT_BINARY_DIR}/sherpa-onnx.pc @ONLY)
else()
configure_file(cmake/sherpa-onnx-static-no-tts.pc.in ${PROJECT_BINARY_DIR}/sherpa-onnx.pc @ONLY)
endif()
else()
configure_file(cmake/sherpa-onnx-shared.pc.in ${PROJECT_BINARY_DIR}/sherpa-onnx.pc @ONLY)
endif()
install(
FILES
${PROJECT_BINARY_DIR}/sherpa-onnx.pc
DESTINATION
./
)
message(STATUS "CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
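# Usage note (a sketch; assumes the chosen install prefix is on PKG_CONFIG_PATH
# and a hypothetical main.c that calls the C API): downstream programs can pull
# compile/link flags from the installed sherpa-onnx.pc, e.g.
#   export PKG_CONFIG_PATH=/path/to/install:$PKG_CONFIG_PATH
#   gcc main.c -o main $(pkg-config --cflags --libs sherpa-onnx)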


@@ -0,0 +1 @@
filter=-./mfc-examples

apps/frameworks/sherpa-mnn/LICENSE Normal file

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

apps/frameworks/sherpa-mnn/MANIFEST.in Normal file

@@ -0,0 +1,12 @@
include LICENSE
include README.md
include CMakeLists.txt
recursive-include c-api-examples *.*
recursive-include sherpa-onnx *.*
recursive-include cmake *.*
prune */__pycache__
prune android
prune sherpa-onnx/java-api
prune ios-swift
prune ios-swiftui

apps/frameworks/sherpa-mnn/NOTICE Normal file

@@ -0,0 +1,19 @@
# NOTICE
## Project Info
- **Name**: sherpa-mnn
- **License**: Apache 2.0
## Dependencies
- [MNN](https://github.com/alibaba/MNN/)
## Modifications
This project is derived from sherpa-onnx (https://github.com/k2-fsa/sherpa-onnx).
Key changes include:
- Use MNN instead of onnxruntime for deep learning model inference
- Rename sherpa-onnx to sherpa-mnn
## Copyright
Copyright (c) 2022-2023 Xiaomi Corporation. All rights reserved. Copyright (c) 2025, MNN Team.


@@ -0,0 +1,93 @@
# sherpa-mnn
This project is adapted from sherpa-onnx, replacing all onnxruntime calls with MNN.
## Preparing the MNN environment and models
### Building MNN
Download MNN: https://github.com/alibaba/MNN/
When building MNN, additionally pass `-DMNN_SEP_BUILD=OFF` and `-DCMAKE_INSTALL_PREFIX=.`:
```
mkdir build
cd build
cmake .. -DMNN_LOW_MEMORY=ON -DMNN_SEP_BUILD=OFF -DCMAKE_INSTALL_PREFIX=. -DMNN_BUILD_CONVERTER=ON
make -j4
make install
```
### Model conversion
In the directory where MNNConvert was built (the build directory above), run the following commands to convert each downloaded FP32 onnx model to mnn. Quantizing during conversion is recommended: it reduces the model size and, when the MNN library is built with `MNN_LOW_MEMORY` enabled, also lowers runtime memory usage and improves performance. Do not convert int8 onnx models directly.
```
mkdir sherpa-mnn-models
./MNNConvert -f ONNX --modelFile sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx --MNNModel sherpa-mnn-models/encode.mnn --weightQuantBits=8 --weightQuantBlock=64
./MNNConvert -f ONNX --modelFile sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx --MNNModel sherpa-mnn-models/decode.mnn --weightQuantBits=8 --weightQuantBlock=64
./MNNConvert -f ONNX --modelFile sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx --MNNModel sherpa-mnn-models/joiner.mnn --weightQuantBits=8 --weightQuantBlock=64
```
## Building and testing locally
### Build
Return to the sherpa-mnn root directory and run the following, setting `MNN_LIB_DIR` to your own MNN build directory:
```
mkdir build
cd build
cmake .. -DMNN_LIB_DIR=/Users/xtjiang/alicnn/AliNNPrivate/build
make -j16
```
### Test
From the sherpa-mnn root directory, using the sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 model as an example:
```
./build/bin/sherpa-mnn --tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt --encoder=./sherpa-mnn-models/encode.mnn --decoder=./sherpa-mnn-models/decode.mnn --joiner=./sherpa-mnn-models/joiner.mnn ./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/1.wav
```
If everything works, it prints output like the following:
```
Number of threads: 1, Elapsed seconds: 0.27, Audio duration (s): 5.1, Real time factor (RTF) = 0.27/5.1 = 0.053
这是第一种第二种叫与 ALWAYS ALWAYS什么意思
{ "text": "这是第一种第二种叫与 ALWAYS ALWAYS什么意思", "tokens": ["这", "是", "第", "一", "种", "第", "二", "种", "叫", "与", " ALWAYS", " ALWAYS", "什", "么", "意", "思"], "timestamps": [0.96, 1.04, 1.28, 1.40, 1.48, 1.72, 1.84, 2.04, 2.44, 3.64, 3.84, 4.36, 4.72, 4.76, 4.92, 5.04], "ys_probs": [-0.884769, -0.858386, -1.106216, -0.626572, -1.101773, -0.359962, -0.745972, -0.267809, -0.826859, -1.076653, -0.683002, -0.869667, -0.593140, -0.469688, -0.256882, -0.442532], "lm_probs": [], "context_scores": [], "segment": 0, "words": [], "start_time": 0.00, "is_final": false}
```
## Building for Android
### Building MNN for Android
From the MNN directory, run:
```
cd project/android
mkdir build_64
../build_64.sh -DMNN_LOW_MEMORY=ON -DMNN_SEP_BUILD=OFF -DCMAKE_INSTALL_PREFIX=.
make install
```
### Building sherpa-mnn for Android
Edit the build-android-arm64-v8a.sh script,
setting `MNN_LIB_DIR` to the MNN Android build directory above,
then run build-android-arm64-v8a.sh.
If the resulting .so is large, you can strip it with the Android NDK strip tool, as sketched below.
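A minimal sketch of both steps (the `MNN_LIB_DIR` value, the NDK host tag, and the library name are placeholders; adjust them to your setup):
```
# point MNN_LIB_DIR in build-android-arm64-v8a.sh at the MNN Android build, e.g.
# MNN_LIB_DIR=/path/to/MNN/project/android/build_64
./build-android-arm64-v8a.sh

# optional: shrink the resulting shared library with the NDK's llvm-strip
$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip libsherpa-mnn-jni.so
```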
## Building for iOS
Edit the build-ios.sh script,
setting `MNN_LIB_DIR` to the MNN root directory (it only needs to be able to find the MNN headers),
then run the build-ios.sh script:
```
export MNN_LIB_DIR=/path/to/MNN
sh build-ios.sh
```
This produces build-ios/sherpa-mnn.xcframework.
## Building the macOS framework
The process is similar to the iOS build: edit build-swift-macos.sh,
setting `MNN_LIB_DIR` to the MNN root directory (it only needs to be able to find the MNN headers),
then run build-swift-macos.sh.
This produces build-swift-macos/sherpa-mnn.xcframework/.
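The invocation mirrors the iOS build above; a minimal sketch, assuming the MNN checkout is at /path/to/MNN:
```
export MNN_LIB_DIR=/path/to/MNN
sh build-swift-macos.sh
# output: build-swift-macos/sherpa-mnn.xcframework/
```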


@@ -0,0 +1,446 @@
### Supported functions
|Speech recognition| Speech synthesis |
|------------------|------------------|
| ✔️ | ✔️ |
|Speaker identification| Speaker diarization | Speaker verification |
|----------------------|-------------------- |------------------------|
| ✔️ | ✔️ | ✔️ |
| Spoken Language identification | Audio tagging | Voice activity detection |
|--------------------------------|---------------|--------------------------|
| ✔️ | ✔️ | ✔️ |
| Keyword spotting | Add punctuation | Speech enhancement |
|------------------|-----------------|--------------------|
| ✔️ | ✔️ | ✔️ |
### Supported platforms
|Architecture| Android | iOS | Windows | macOS | linux | HarmonyOS |
|------------|---------|---------|------------|-------|-------|-----------|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ | ✔️ |
| riscv64 | | | | | ✔️ | |
### Supported programming languages
| 1. C++ | 2. C | 3. Python | 4. JavaScript |
|--------|-------|-----------|---------------|
| ✔️ | ✔️ | ✔️ | ✔️ |
|5. Java | 6. C# | 7. Kotlin | 8. Swift |
|--------|-------|-----------|----------|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 9. Go | 10. Dart | 11. Rust | 12. Pascal |
|-------|----------|----------|------------|
| ✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see [sherpa-rs][sherpa-rs]
It also supports WebAssembly.
## Introduction
This repository supports running the following functions **locally**
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., [silero-vad][silero-vad])
- Keyword spotting
on the following platforms and operating systems:
- x86, ``x86_64``, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- HarmonyOS
- NodeJS
- WebAssembly
- [NVIDIA Jetson Orin NX][NVIDIA Jetson Orin NX] (Support running on both CPU and GPU)
- [NVIDIA Jetson Nano B01][NVIDIA Jetson Nano B01] (Support running on both CPU and GPU)
- [Raspberry Pi][Raspberry Pi]
- [RV1126][RV1126]
- [LicheePi4A][LicheePi4A]
- [VisionFive 2][VisionFive 2]
- [旭日X3派][旭日X3派]
- [爱芯派][爱芯派]
- etc
with the following APIs
- C++, C, Python, Go, ``C#``
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
### Links for Huggingface Spaces
<details>
<summary>You can visit the following Huggingface spaces to try sherpa-onnx without
installing anything. All you need is a browser.</summary>
| Description | URL |
|-------------------------------------------------------|-----------------------------------------|
| Speaker diarization | [Click me][hf-space-speaker-diarization]|
| Speech recognition | [Click me][hf-space-asr] |
| Speech recognition with [Whisper][Whisper] | [Click me][hf-space-asr-whisper] |
| Speech synthesis | [Click me][hf-space-tts] |
| Generate subtitles | [Click me][hf-space-subtitle] |
| Audio tagging | [Click me][hf-space-audio-tagging] |
| Spoken language identification with [Whisper][Whisper]| [Click me][hf-space-slid-whisper] |
We also have spaces built using WebAssembly. They are listed below:
| Description | Huggingface space| ModelScope space|
|------------------------------------------------------------------------------------------|------------------|-----------------|
|Voice activity detection with [silero-vad][silero-vad] | [Click me][wasm-hf-vad]|[Address][wasm-ms-vad]|
|Real-time speech recognition (Chinese + English) with Zipformer | [Click me][wasm-hf-streaming-asr-zh-en-zipformer]|[Address][wasm-ms-streaming-asr-zh-en-zipformer]|
|Real-time speech recognition (Chinese + English) with Paraformer |[Click me][wasm-hf-streaming-asr-zh-en-paraformer]| [Address][wasm-ms-streaming-asr-zh-en-paraformer]|
|Real-time speech recognition (Chinese + English + Cantonese) with [Paraformer-large][Paraformer-large]|[Click me][wasm-hf-streaming-asr-zh-en-yue-paraformer]| [Address][wasm-ms-streaming-asr-zh-en-yue-paraformer]|
|Real-time speech recognition (English) |[Click me][wasm-hf-streaming-asr-en-zipformer] |[Address][wasm-ms-streaming-asr-en-zipformer]|
|VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with [SenseVoice][SenseVoice]|[Click me][wasm-hf-vad-asr-zh-en-ko-ja-yue-sense-voice]| [Address][wasm-ms-vad-asr-zh-en-ko-ja-yue-sense-voice]|
|VAD + speech recognition (English) with [Whisper][Whisper] tiny.en|[Click me][wasm-hf-vad-asr-en-whisper-tiny-en]| [Address][wasm-ms-vad-asr-en-whisper-tiny-en]|
|VAD + speech recognition (English) with [Moonshine tiny][Moonshine tiny]|[Click me][wasm-hf-vad-asr-en-moonshine-tiny-en]| [Address][wasm-ms-vad-asr-en-moonshine-tiny-en]|
|VAD + speech recognition (English) with Zipformer trained with [GigaSpeech][GigaSpeech] |[Click me][wasm-hf-vad-asr-en-zipformer-gigaspeech]| [Address][wasm-ms-vad-asr-en-zipformer-gigaspeech]|
|VAD + speech recognition (Chinese) with Zipformer trained with [WenetSpeech][WenetSpeech] |[Click me][wasm-hf-vad-asr-zh-zipformer-wenetspeech]| [Address][wasm-ms-vad-asr-zh-zipformer-wenetspeech]|
|VAD + speech recognition (Japanese) with Zipformer trained with [ReazonSpeech][ReazonSpeech]|[Click me][wasm-hf-vad-asr-ja-zipformer-reazonspeech]| [Address][wasm-ms-vad-asr-ja-zipformer-reazonspeech]|
|VAD + speech recognition (Thai) with Zipformer trained with [GigaSpeech2][GigaSpeech2] |[Click me][wasm-hf-vad-asr-th-zipformer-gigaspeech2]| [Address][wasm-ms-vad-asr-th-zipformer-gigaspeech2]|
|VAD + speech recognition (Chinese, various dialects) with a [TeleSpeech-ASR][TeleSpeech-ASR] CTC model|[Click me][wasm-hf-vad-asr-zh-telespeech]| [Address][wasm-ms-vad-asr-zh-telespeech]|
|VAD + speech recognition (English + Chinese, plus various Chinese dialects) with Paraformer-large |[Click me][wasm-hf-vad-asr-zh-en-paraformer-large]| [Address][wasm-ms-vad-asr-zh-en-paraformer-large]|
|VAD + speech recognition (English + Chinese, plus various Chinese dialects) with Paraformer-small |[Click me][wasm-hf-vad-asr-zh-en-paraformer-small]| [Address][wasm-ms-vad-asr-zh-en-paraformer-small]|
|Speech synthesis (English) |[Click me][wasm-hf-tts-piper-en]| [Address][wasm-ms-tts-piper-en]|
|Speech synthesis (German) |[Click me][wasm-hf-tts-piper-de]| [Address][wasm-ms-tts-piper-de]|
|Speaker diarization |[Click me][wasm-hf-speaker-diarization]|[Address][wasm-ms-speaker-diarization]|
</details>
### Links for pre-built Android APKs
<details>
<summary>You can find pre-built Android APKs for this repository in the following table</summary>
| Description | URL | For users in China |
|----------------------------------------|------------------------------------|------------------------------------------|
| Speaker diarization | [Address][apk-speaker-diarization] | [Click here][apk-speaker-diarization-cn] |
| Streaming speech recognition | [Address][apk-streaming-asr] | [Click here][apk-streaming-asr-cn] |
| Text-to-speech | [Address][apk-tts] | [Click here][apk-tts-cn] |
| Voice activity detection (VAD) | [Address][apk-vad] | [Click here][apk-vad-cn] |
| VAD + non-streaming speech recognition | [Address][apk-vad-asr] | [Click here][apk-vad-asr-cn] |
| Two-pass speech recognition | [Address][apk-2pass] | [Click here][apk-2pass-cn] |
| Audio tagging | [Address][apk-at] | [Click here][apk-at-cn] |
| Audio tagging (WearOS) | [Address][apk-at-wearos] | [Click here][apk-at-wearos-cn] |
| Speaker identification | [Address][apk-sid] | [Click here][apk-sid-cn] |
| Spoken language identification | [Address][apk-slid] | [Click here][apk-slid-cn] |
| Keyword spotting | [Address][apk-kws] | [Click here][apk-kws-cn] |
</details>
### Links for pre-built Flutter APPs
<details>
#### Real-time speech recognition
| Description | URL | For users in China |
|--------------------------------|-------------------------------------|---------------------------------------------|
| Streaming speech recognition | [Address][apk-flutter-streaming-asr]| [Click here][apk-flutter-streaming-asr-cn] |
#### Text-to-speech
| Description | URL | 中国用户 |
|------------------------------------------|------------------------------------|------------------------------------|
| Android (arm64-v8a, armeabi-v7a, x86_64) | [Address][flutter-tts-android] | [点此][flutter-tts-android-cn] |
| Linux (x64) | [Address][flutter-tts-linux] | [点此][flutter-tts-linux-cn] |
| macOS (x64)                               | [Address][flutter-tts-macos-x64]   | [点此][flutter-tts-macos-x64-cn]   |
| macOS (arm64)                             | [Address][flutter-tts-macos-arm64] | [点此][flutter-tts-macos-arm64-cn] |
| Windows (x64) | [Address][flutter-tts-win-x64] | [点此][flutter-tts-win-x64-cn] |
> Note: You need to build from source for iOS.
</details>
### Links for pre-built Lazarus APPs
<details>
#### Generating subtitles
| Description | URL | 中国用户 |
|--------------------------------|----------------------------|----------------------------|
| Generate subtitles (生成字幕) | [Address][lazarus-subtitle]| [点此][lazarus-subtitle-cn]|
</details>
### Links for pre-trained models
<details>
| Description | URL |
|---------------------------------------------|---------------------------------------------------------------------------------------|
| Speech recognition (speech to text, ASR) | [Address][asr-models] |
| Text-to-speech (TTS) | [Address][tts-models] |
| VAD | [Address][vad-models] |
| Keyword spotting | [Address][kws-models] |
| Audio tagging | [Address][at-models] |
| Speaker identification (Speaker ID) | [Address][sid-models] |
| Spoken language identification (Language ID)| See multi-lingual [Whisper][Whisper] ASR models from [Speech recognition][asr-models]|
| Punctuation | [Address][punct-models] |
| Speaker segmentation | [Address][speaker-segmentation-models] |
| Speech enhancement | [Address][speech-enhancement-models] |
</details>
#### Some pre-trained ASR models (Streaming)
<details>
Please see
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-ctc/index.html>
for more models. The following table lists only **SOME** of them.
|Name | Supported Languages| Description|
|-----|-----|----|
|[sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20][sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20]| Chinese, English| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#csukuangfj-sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-bilingual-chinese-english)|
|[sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16][sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16]| Chinese, English| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16-bilingual-chinese-english)|
|[sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23][sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23]|Chinese| Suitable for Cortex A7 CPU. See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-zh-14m-2023-02-23)|
|[sherpa-onnx-streaming-zipformer-en-20M-2023-02-17][sherpa-onnx-streaming-zipformer-en-20M-2023-02-17]|English|Suitable for Cortex A7 CPU. See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-en-20m-2023-02-17)|
|[sherpa-onnx-streaming-zipformer-korean-2024-06-16][sherpa-onnx-streaming-zipformer-korean-2024-06-16]|Korean| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-korean-2024-06-16-korean)|
|[sherpa-onnx-streaming-zipformer-fr-2023-04-14][sherpa-onnx-streaming-zipformer-fr-2023-04-14]|French| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#shaojieli-sherpa-onnx-streaming-zipformer-fr-2023-04-14-french)|
</details>
#### Some pre-trained ASR models (Non-Streaming)
<details>
Please see
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/index.html>
- <https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/index.html>
for more models. The following table lists only **SOME** of them.
|Name | Supported Languages| Description|
|-----|-----|----|
|[Whisper tiny.en](https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2)|English| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/tiny.en.html)|
|[Moonshine tiny][Moonshine tiny]|English|See [also](https://github.com/usefulsensors/moonshine)|
|[sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17][sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17]|Chinese, Cantonese, English, Korean, Japanese| Supports many Chinese dialects. See [also](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html)|
|[sherpa-onnx-paraformer-zh-2024-03-09][sherpa-onnx-paraformer-zh-2024-03-09]|Chinese, English| Also supports many Chinese dialects. See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/paraformer-models.html#csukuangfj-sherpa-onnx-paraformer-zh-2024-03-09-chinese-english)|
|[sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01][sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01]|Japanese|See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01-japanese)|
|[sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24][sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24]|Russian|See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/nemo-transducer-models.html#sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24-russian)|
|[sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24][sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24]|Russian| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/nemo/russian.html#sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24)|
|[sherpa-onnx-zipformer-ru-2024-09-18][sherpa-onnx-zipformer-ru-2024-09-18]|Russian|See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-ru-2024-09-18-russian)|
|[sherpa-onnx-zipformer-korean-2024-06-24][sherpa-onnx-zipformer-korean-2024-06-24]|Korean|See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-korean-2024-06-24-korean)|
|[sherpa-onnx-zipformer-thai-2024-06-20][sherpa-onnx-zipformer-thai-2024-06-20]|Thai| See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-thai-2024-06-20-thai)|
|[sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04][sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04]|Chinese| Supports many dialects. See [also](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/models.html#sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04)|
</details>
### Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Demo videos on Bilibili: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
### How to reach us
Please see
https://k2-fsa.github.io/sherpa/social-groups.html
for the Next-gen Kaldi (新一代 Kaldi) **WeChat group** and **QQ group**.
## Projects using sherpa-onnx
### [Open-LLM-VTuber](https://github.com/t41372/Open-LLM-VTuber)
Talk to any LLM with hands-free voice interaction, voice interruption, and a
Live2D talking face, running locally across platforms.
See also <https://github.com/t41372/Open-LLM-VTuber/pull/50>
### [voiceapi](https://github.com/ruzhila/voiceapi)
<details>
<summary>Streaming ASR and TTS based on FastAPI</summary>
It shows how to use the ASR and TTS Python APIs with FastAPI.
</details>
### [TMSpeech, a live-captioning tool for Tencent Meeting (腾讯会议摸鱼工具)](https://github.com/jxlpzqc/TMSpeech)
Uses streaming ASR in C# with a graphical user interface.
Video demo in Chinese: [【开源】Windows实时字幕软件（网课/开会必备）](https://www.bilibili.com/video/BV1rX4y1p7Nx)
### [lol互动助手 (an interaction assistant for League of Legends)](https://github.com/l1veIn/lol-wom-electron)
It uses the JavaScript API of sherpa-onnx along with [Electron](https://electronjs.org/).
Video demo in Chinese: [爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通!](https://www.bilibili.com/video/BV142tje9E74)
### [Sherpa-ONNX 语音识别服务器 (speech recognition server)](https://github.com/hfyydd/sherpa-onnx-server)
A Node.js-based server providing a RESTful API for speech recognition.
### [QSmartAssistant](https://github.com/xinhecuican/QSmartAssistant)
A modular chatbot/smart speaker that runs fully offline with a low resource footprint.
It uses Qt. Both [ASR](https://github.com/xinhecuican/QSmartAssistant/blob/master/doc/%E5%AE%89%E8%A3%85.md#asr)
and [TTS](https://github.com/xinhecuican/QSmartAssistant/blob/master/doc/%E5%AE%89%E8%A3%85.md#tts)
are used.
### [Flutter-EasySpeechRecognition](https://github.com/Jason-chen-coder/Flutter-EasySpeechRecognition)
It extends [./flutter-examples/streaming_asr](./flutter-examples/streaming_asr) by
downloading models inside the app to reduce the size of the app.
### [sherpa-onnx-unity](https://github.com/xue-fei/sherpa-onnx-unity)
sherpa-onnx in Unity. See also [#1695](https://github.com/k2-fsa/sherpa-onnx/issues/1695),
[#1892](https://github.com/k2-fsa/sherpa-onnx/issues/1892), and [#1859](https://github.com/k2-fsa/sherpa-onnx/issues/1859)
[sherpa-rs]: https://github.com/thewh1teagle/sherpa-rs
[silero-vad]: https://github.com/snakers4/silero-vad
[Raspberry Pi]: https://www.raspberrypi.com/
[RV1126]: https://www.rock-chips.com/uploads/pdf/2022.8.26/191/RV1126%20Brief%20Datasheet.pdf
[LicheePi4A]: https://sipeed.com/licheepi4a
[VisionFive 2]: https://www.starfivetech.com/en/site/boards
[旭日X3派]: https://developer.horizon.ai/api/v1/fileData/documents_pi/index.html
[爱芯派]: https://wiki.sipeed.com/hardware/zh/maixIII/ax-pi/axpi.html
[hf-space-speaker-diarization]: https://huggingface.co/spaces/k2-fsa/speaker-diarization
[hf-space-asr]: https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
[Whisper]: https://github.com/openai/whisper
[hf-space-asr-whisper]: https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition-with-whisper
[hf-space-tts]: https://huggingface.co/spaces/k2-fsa/text-to-speech
[hf-space-subtitle]: https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos
[hf-space-audio-tagging]: https://huggingface.co/spaces/k2-fsa/audio-tagging
[hf-space-slid-whisper]: https://huggingface.co/spaces/k2-fsa/spoken-language-identification
[wasm-hf-vad]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-sherpa-onnx
[wasm-ms-vad]: https://modelscope.cn/studios/csukuangfj/web-assembly-vad-sherpa-onnx
[wasm-hf-streaming-asr-zh-en-zipformer]: https://huggingface.co/spaces/k2-fsa/web-assembly-asr-sherpa-onnx-zh-en
[wasm-ms-streaming-asr-zh-en-zipformer]: https://modelscope.cn/studios/k2-fsa/web-assembly-asr-sherpa-onnx-zh-en
[wasm-hf-streaming-asr-zh-en-paraformer]: https://huggingface.co/spaces/k2-fsa/web-assembly-asr-sherpa-onnx-zh-en-paraformer
[wasm-ms-streaming-asr-zh-en-paraformer]: https://modelscope.cn/studios/k2-fsa/web-assembly-asr-sherpa-onnx-zh-en-paraformer
[Paraformer-large]: https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary
[wasm-hf-streaming-asr-zh-en-yue-paraformer]: https://huggingface.co/spaces/k2-fsa/web-assembly-asr-sherpa-onnx-zh-cantonese-en-paraformer
[wasm-ms-streaming-asr-zh-en-yue-paraformer]: https://modelscope.cn/studios/k2-fsa/web-assembly-asr-sherpa-onnx-zh-cantonese-en-paraformer
[wasm-hf-streaming-asr-en-zipformer]: https://huggingface.co/spaces/k2-fsa/web-assembly-asr-sherpa-onnx-en
[wasm-ms-streaming-asr-en-zipformer]: https://modelscope.cn/studios/k2-fsa/web-assembly-asr-sherpa-onnx-en
[SenseVoice]: https://github.com/FunAudioLLM/SenseVoice
[wasm-hf-vad-asr-zh-en-ko-ja-yue-sense-voice]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-en-ja-ko-cantonese-sense-voice
[wasm-ms-vad-asr-zh-en-ko-ja-yue-sense-voice]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-zh-en-jp-ko-cantonese-sense-voice
[wasm-hf-vad-asr-en-whisper-tiny-en]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-en-whisper-tiny
[wasm-ms-vad-asr-en-whisper-tiny-en]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-en-whisper-tiny
[wasm-hf-vad-asr-en-moonshine-tiny-en]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-en-moonshine-tiny
[wasm-ms-vad-asr-en-moonshine-tiny-en]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-en-moonshine-tiny
[wasm-hf-vad-asr-en-zipformer-gigaspeech]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-en-zipformer-gigaspeech
[wasm-ms-vad-asr-en-zipformer-gigaspeech]: https://www.modelscope.cn/studios/k2-fsa/web-assembly-vad-asr-sherpa-onnx-en-zipformer-gigaspeech
[wasm-hf-vad-asr-zh-zipformer-wenetspeech]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-zipformer-wenetspeech
[wasm-ms-vad-asr-zh-zipformer-wenetspeech]: https://www.modelscope.cn/studios/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-zipformer-wenetspeech
[ReazonSpeech]: https://research.reazon.jp/_static/reazonspeech_nlp2023.pdf
[wasm-hf-vad-asr-ja-zipformer-reazonspeech]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-ja-zipformer
[wasm-ms-vad-asr-ja-zipformer-reazonspeech]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-ja-zipformer
[GigaSpeech2]: https://github.com/SpeechColab/GigaSpeech2
[wasm-hf-vad-asr-th-zipformer-gigaspeech2]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-th-zipformer
[wasm-ms-vad-asr-th-zipformer-gigaspeech2]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-vad-asr-sherpa-onnx-th-zipformer
[TeleSpeech-ASR]: https://github.com/Tele-AI/TeleSpeech-ASR
[wasm-hf-vad-asr-zh-telespeech]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-telespeech
[wasm-ms-vad-asr-zh-telespeech]: https://www.modelscope.cn/studios/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-telespeech
[wasm-hf-vad-asr-zh-en-paraformer-large]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-en-paraformer
[wasm-ms-vad-asr-zh-en-paraformer-large]: https://www.modelscope.cn/studios/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-en-paraformer
[wasm-hf-vad-asr-zh-en-paraformer-small]: https://huggingface.co/spaces/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-en-paraformer-small
[wasm-ms-vad-asr-zh-en-paraformer-small]: https://www.modelscope.cn/studios/k2-fsa/web-assembly-vad-asr-sherpa-onnx-zh-en-paraformer-small
[wasm-hf-tts-piper-en]: https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-en
[wasm-ms-tts-piper-en]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-en
[wasm-hf-tts-piper-de]: https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-ms-tts-piper-de]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-hf-speaker-diarization]: https://huggingface.co/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx
[wasm-ms-speaker-diarization]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx
[apk-speaker-diarization]: https://k2-fsa.github.io/sherpa/onnx/speaker-diarization/apk.html
[apk-speaker-diarization-cn]: https://k2-fsa.github.io/sherpa/onnx/speaker-diarization/apk-cn.html
[apk-streaming-asr]: https://k2-fsa.github.io/sherpa/onnx/android/apk.html
[apk-streaming-asr-cn]: https://k2-fsa.github.io/sherpa/onnx/android/apk-cn.html
[apk-tts]: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
[apk-tts-cn]: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine-cn.html
[apk-vad]: https://k2-fsa.github.io/sherpa/onnx/vad/apk.html
[apk-vad-cn]: https://k2-fsa.github.io/sherpa/onnx/vad/apk-cn.html
[apk-vad-asr]: https://k2-fsa.github.io/sherpa/onnx/vad/apk-asr.html
[apk-vad-asr-cn]: https://k2-fsa.github.io/sherpa/onnx/vad/apk-asr-cn.html
[apk-2pass]: https://k2-fsa.github.io/sherpa/onnx/android/apk-2pass.html
[apk-2pass-cn]: https://k2-fsa.github.io/sherpa/onnx/android/apk-2pass-cn.html
[apk-at]: https://k2-fsa.github.io/sherpa/onnx/audio-tagging/apk.html
[apk-at-cn]: https://k2-fsa.github.io/sherpa/onnx/audio-tagging/apk-cn.html
[apk-at-wearos]: https://k2-fsa.github.io/sherpa/onnx/audio-tagging/apk-wearos.html
[apk-at-wearos-cn]: https://k2-fsa.github.io/sherpa/onnx/audio-tagging/apk-wearos-cn.html
[apk-sid]: https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
[apk-sid-cn]: https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk-cn.html
[apk-slid]: https://k2-fsa.github.io/sherpa/onnx/spoken-language-identification/apk.html
[apk-slid-cn]: https://k2-fsa.github.io/sherpa/onnx/spoken-language-identification/apk-cn.html
[apk-kws]: https://k2-fsa.github.io/sherpa/onnx/kws/apk.html
[apk-kws-cn]: https://k2-fsa.github.io/sherpa/onnx/kws/apk-cn.html
[apk-flutter-streaming-asr]: https://k2-fsa.github.io/sherpa/onnx/flutter/asr/app.html
[apk-flutter-streaming-asr-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/asr/app-cn.html
[flutter-tts-android]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-android.html
[flutter-tts-android-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-android-cn.html
[flutter-tts-linux]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-linux.html
[flutter-tts-linux-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-linux-cn.html
[flutter-tts-macos-x64]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-macos-x64.html
[flutter-tts-macos-x64-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-macos-x64-cn.html
[flutter-tts-macos-arm64]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-macos-arm64.html
[flutter-tts-macos-arm64-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-macos-arm64-cn.html
[flutter-tts-win-x64]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-win.html
[flutter-tts-win-x64-cn]: https://k2-fsa.github.io/sherpa/onnx/flutter/tts-win-cn.html
[lazarus-subtitle]: https://k2-fsa.github.io/sherpa/onnx/lazarus/download-generated-subtitles.html
[lazarus-subtitle-cn]: https://k2-fsa.github.io/sherpa/onnx/lazarus/download-generated-subtitles-cn.html
[asr-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
[tts-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
[vad-models]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
[kws-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/kws-models
[at-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/audio-tagging-models
[sid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[slid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[punct-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/punctuation-models
[speaker-segmentation-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
[GigaSpeech]: https://github.com/SpeechColab/GigaSpeech
[WenetSpeech]: https://github.com/wenet-e2e/WenetSpeech
[sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
[sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
[sherpa-onnx-streaming-zipformer-korean-2024-06-16]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-korean-2024-06-16.tar.bz2
[sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
[sherpa-onnx-streaming-zipformer-en-20M-2023-02-17]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
[sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01.tar.bz2
[sherpa-onnx-zipformer-ru-2024-09-18]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-ru-2024-09-18.tar.bz2
[sherpa-onnx-zipformer-korean-2024-06-24]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-korean-2024-06-24.tar.bz2
[sherpa-onnx-zipformer-thai-2024-06-20]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-thai-2024-06-20.tar.bz2
[sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
[sherpa-onnx-paraformer-zh-2024-03-09]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2
[sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24.tar.bz2
[sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2
[sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
[sherpa-onnx-streaming-zipformer-fr-2023-04-14]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-fr-2023-04-14.tar.bz2
[Moonshine tiny]: https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
[NVIDIA Jetson Orin NX]: https://developer.download.nvidia.com/assets/embedded/secure/jetson/orin_nx/docs/Jetson_Orin_NX_DS-10712-001_v0.5.pdf?RCPGu9Q6OVAOv7a7vgtwc9-BLScXRIWq6cSLuditMALECJ_dOj27DgnqAPGVnT2VpiNpQan9SyFy-9zRykR58CokzbXwjSA7Gj819e91AXPrWkGZR3oS1VLxiDEpJa_Y0lr7UT-N4GnXtb8NlUkP4GkCkkF_FQivGPrAucCUywL481GH_WpP_p7ziHU1Wg==&t=eyJscyI6ImdzZW8iLCJsc2QiOiJodHRwczovL3d3dy5nb29nbGUuY29tLmhrLyJ9
[NVIDIA Jetson Nano B01]: https://www.seeedstudio.com/blog/2020/01/16/new-revision-of-jetson-nano-dev-kit-now-supports-new-jetson-nano-module/
[speech-enhancement-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speech-enhancement-models

View File

@ -0,0 +1,158 @@
#!/usr/bin/env bash
set -ex
# If BUILD_SHARED_LIBS is ON, we use libonnxruntime.so
# If BUILD_SHARED_LIBS is OFF, we use libonnxruntime.a
#
# In any case, we will have libsherpa-onnx-jni.so
#
# If BUILD_SHARED_LIBS is OFF, then libonnxruntime.a is linked into libsherpa-onnx-jni.so
# and you only need to copy libsherpa-onnx-jni.so to your Android projects.
#
# If BUILD_SHARED_LIBS is ON, then you need to copy both libsherpa-onnx-jni.so
# and libonnxruntime.so to your Android projects
#
BUILD_SHARED_LIBS=ON
if [ $BUILD_SHARED_LIBS == ON ]; then
dir=$PWD/build-android-arm64-v8a
else
dir=$PWD/build-android-arm64-v8a-static
fi
mkdir -p $dir
cd $dir
# Note from https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-android
# (optional) remove the hardcoded debug flag in Android NDK android-ndk
# issue: https://github.com/android/ndk/issues/243
#
# open $ANDROID_NDK/build/cmake/android.toolchain.cmake for ndk < r23
# or $ANDROID_NDK/build/cmake/android-legacy.toolchain.cmake for ndk >= r23
#
# delete "-g" line
#
# list(APPEND ANDROID_COMPILER_FLAGS
# -g
# -DANDROID
if [ -z $ANDROID_NDK ]; then
ANDROID_NDK=/star-fj/fangjun/software/android-sdk/ndk/22.1.7171670
if [ $BUILD_SHARED_LIBS == OFF ]; then
ANDROID_NDK=/star-fj/fangjun/software/android-sdk/ndk/27.0.11718014
fi
# or use
# ANDROID_NDK=/star-fj/fangjun/software/android-ndk
#
# Inside the $ANDROID_NDK directory, you can find a binary ndk-build
# and some other files like the file "build/cmake/android.toolchain.cmake"
if [ ! -d $ANDROID_NDK ]; then
# For macOS, I have installed Android Studio, select the menu
# Tools -> SDK manager -> Android SDK
# and set "Android SDK location" to /Users/fangjun/software/my-android
ANDROID_NDK=/Users/fangjun/software/my-android/ndk/22.1.7171670
if [ $BUILD_SHARED_LIBS == OFF ]; then
ANDROID_NDK=/Users/fangjun/software/my-android/ndk/27.0.11718014
fi
fi
fi
if [ ! -d $ANDROID_NDK ]; then
echo Please set the environment variable ANDROID_NDK before you run this script
exit 1
fi
echo "ANDROID_NDK: $ANDROID_NDK"
sleep 1
if [ -z $SHERPA_MNN_ENABLE_TTS ]; then
SHERPA_MNN_ENABLE_TTS=ON
fi
if [ -z $SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION ]; then
SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION=ON
fi
if [ -z $SHERPA_MNN_ENABLE_BINARY ]; then
SHERPA_MNN_ENABLE_BINARY=OFF
fi
if [ -z $SHERPA_MNN_ENABLE_C_API ]; then
SHERPA_MNN_ENABLE_C_API=OFF
fi
if [ -z $SHERPA_MNN_ENABLE_JNI ]; then
SHERPA_MNN_ENABLE_JNI=ON
fi
cmake -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
-DSHERPA_MNN_ENABLE_TTS=$SHERPA_MNN_ENABLE_TTS \
-DSHERPA_MNN_ENABLE_SPEAKER_DIARIZATION=$SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION \
-DSHERPA_MNN_ENABLE_BINARY=$SHERPA_MNN_ENABLE_BINARY \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DMNN_LIB_DIR=/Users/xtjiang/alicnn/AliNNPrivate/project/android/build_64 \
-DBUILD_SHARED_LIBS=$BUILD_SHARED_LIBS \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=$SHERPA_MNN_ENABLE_JNI \
-DSHERPA_MNN_LINK_LIBSTDCPP_STATICALLY=OFF \
-DSHERPA_MNN_ENABLE_C_API=$SHERPA_MNN_ENABLE_C_API \
-DCMAKE_INSTALL_PREFIX=./install \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_PLATFORM=android-21 ..
# By default, it links to libc++_static.a
# -DANDROID_STL=c++_shared \
# Please use -DANDROID_PLATFORM=android-27 if you want to use Android NNAPI
# make VERBOSE=1 -j4
make -j4
make install/strip
rm -rf install/share
rm -rf install/lib/pkgconfig
rm -rf install/lib/lib*.a
if [ -f install/lib/libsherpa-onnx-c-api.so ]; then
cat >install/lib/README.md <<EOF
# Introduction
Note that if you use Android Studio, then you only need to
copy libonnxruntime.so and libsherpa-onnx-jni.so
to your jniLibs, and you don't need libsherpa-onnx-c-api.so or
libsherpa-onnx-cxx-api.so.
libsherpa-onnx-c-api.so and libsherpa-onnx-cxx-api.so are for users
who don't use JNI. In that case, libsherpa-onnx-jni.so is not needed.
In any case, libonnxruntime.so is always needed.
EOF
ls -lh install/lib/README.md
fi
# To run the generated binaries on Android, please use the following steps.
#
#
# 1. Copy sherpa-onnx and its dependencies to Android
#
# cd build-android-arm64-v8a/install/lib
# adb push ./lib*.so /data/local/tmp
# cd ../bin
# adb push ./sherpa-onnx /data/local/tmp
#
# 2. Login into Android
#
# adb shell
# cd /data/local/tmp
# ./sherpa-onnx
#
# It should show the help message of sherpa-onnx.
#
# Please use the above approach to copy model files to your phone.

View File

@ -0,0 +1,173 @@
#!/usr/bin/env bash
set -e
dir=build-ios
mkdir -p $dir
cd $dir
if [ -z ${MNN_LIB_DIR} ]; then
echo "Please export MNN_LIB_DIR=/path/to/MNN"
exit 1
fi
# First, for simulator
echo "Building for simulator (x86_64)"
# Note: We use -DENABLE_ARC=1 here to fix the linking error:
#
# The symbol _NSLog is not defined
#
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=${MNN_LIB_DIR} \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-S .. \
-DCMAKE_TOOLCHAIN_FILE=./toolchains/ios.toolchain.cmake \
-DPLATFORM=SIMULATOR64 \
-DENABLE_BITCODE=0 \
-DENABLE_ARC=1 \
-DENABLE_VISIBILITY=0 \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
-DDEPLOYMENT_TARGET=13.0 \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-B build/simulator_x86_64
# Use sysctl rather than nproc: nproc is not available on stock macOS
cmake --build build/simulator_x86_64 -j $(sysctl -n hw.ncpu) --verbose
echo "Building for simulator (arm64)"
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=${MNN_LIB_DIR} \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-S .. \
-DCMAKE_TOOLCHAIN_FILE=./toolchains/ios.toolchain.cmake \
-DPLATFORM=SIMULATORARM64 \
-DENABLE_BITCODE=0 \
-DENABLE_ARC=1 \
-DENABLE_VISIBILITY=0 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=./install \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
-DDEPLOYMENT_TARGET=13.0 \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-B build/simulator_arm64
cmake --build build/simulator_arm64 -j $(sysctl -n hw.ncpu) --verbose
echo "Building for arm64"
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=${MNN_LIB_DIR} \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-S .. \
-DCMAKE_TOOLCHAIN_FILE=./toolchains/ios.toolchain.cmake \
-DPLATFORM=OS64 \
-DENABLE_BITCODE=0 \
-DENABLE_ARC=1 \
-DENABLE_VISIBILITY=0 \
-DCMAKE_INSTALL_PREFIX=./install \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
-DDEPLOYMENT_TARGET=13.0 \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-B build/os64
cmake --build build/os64 -j $(sysctl -n hw.ncpu)
# Generate headers for sherpa-mnn.xcframework
cmake --build build/os64 --target install
echo "Generate xcframework"
mkdir -p "build/simulator/lib"
for f in libkaldi-native-fbank-core.a libsherpa-mnn-c-api.a libsherpa-mnn-core.a \
libsherpa-mnn-fstfar.a libssentencepiece_core.a \
libsherpa-mnn-fst.a libsherpa-mnn-kaldifst-core.a libkaldi-decoder-core.a \
libucd.a libpiper_phonemize.a libespeak-ng.a; do
lipo -create build/simulator_arm64/lib/${f} \
build/simulator_x86_64/lib/${f} \
-output build/simulator/lib/${f}
done
# Merge the archives first, because the xcodebuild -create-xcframework step
# below cannot accept multiple archives with the same architecture.
libtool -static -o build/simulator/sherpa-mnn.a \
build/simulator/lib/libkaldi-native-fbank-core.a \
build/simulator/lib/libsherpa-mnn-c-api.a \
build/simulator/lib/libsherpa-mnn-core.a \
build/simulator/lib/libsherpa-mnn-fstfar.a \
build/simulator/lib/libsherpa-mnn-fst.a \
build/simulator/lib/libsherpa-mnn-kaldifst-core.a \
build/simulator/lib/libkaldi-decoder-core.a \
build/simulator/lib/libucd.a \
build/simulator/lib/libpiper_phonemize.a \
build/simulator/lib/libespeak-ng.a \
build/simulator/lib/libssentencepiece_core.a
libtool -static -o build/os64/sherpa-mnn.a \
build/os64/lib/libkaldi-native-fbank-core.a \
build/os64/lib/libsherpa-mnn-c-api.a \
build/os64/lib/libsherpa-mnn-core.a \
build/os64/lib/libsherpa-mnn-fstfar.a \
build/os64/lib/libsherpa-mnn-fst.a \
build/os64/lib/libsherpa-mnn-kaldifst-core.a \
build/os64/lib/libkaldi-decoder-core.a \
build/os64/lib/libucd.a \
build/os64/lib/libpiper_phonemize.a \
build/os64/lib/libespeak-ng.a \
build/os64/lib/libssentencepiece_core.a
rm -rf sherpa-mnn.xcframework
xcodebuild -create-xcframework \
-library "build/os64/sherpa-mnn.a" \
-library "build/simulator/sherpa-mnn.a" \
-output sherpa-mnn.xcframework
# Copy Headers
mkdir -p sherpa-mnn.xcframework/Headers
cp -av install/include/* sherpa-mnn.xcframework/Headers
pushd sherpa-mnn.xcframework/ios-arm64_x86_64-simulator
ln -s sherpa-mnn.a libsherpa-mnn.a
popd
pushd sherpa-mnn.xcframework/ios-arm64
ln -s sherpa-mnn.a libsherpa-mnn.a

View File

@ -0,0 +1,46 @@
#!/usr/bin/env bash
set -ex
dir=build-swift-macos
mkdir -p $dir
cd $dir
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=/Users/xtjiang/alicnn/AliNNPrivate \
-DSHERPA_MNN_BUILD_C_API_EXAMPLES=OFF \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DCMAKE_INSTALL_PREFIX=./install \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
../
make VERBOSE=1 -j4
make install
rm -fv ./install/include/cargs.h
libtool -static -o ./install/lib/libsherpa-mnn.a \
./install/lib/libsherpa-mnn-c-api.a \
./install/lib/libsherpa-mnn-core.a \
./install/lib/libkaldi-native-fbank-core.a \
./install/lib/libsherpa-mnn-fstfar.a \
./install/lib/libsherpa-mnn-fst.a \
./install/lib/libsherpa-mnn-kaldifst-core.a \
./install/lib/libkaldi-decoder-core.a \
./install/lib/libucd.a \
./install/lib/libpiper_phonemize.a \
./install/lib/libespeak-ng.a \
./install/lib/libssentencepiece_core.a
xcodebuild -create-xcframework \
-library install/lib/libsherpa-mnn.a \
-headers install/include \
-output sherpa-mnn.xcframework

View File

@ -0,0 +1,142 @@
#!/usr/bin/env bash
set -e
dir=build-visionos
mkdir -p $dir
cd $dir
if [ -z ${MNN_LIB_DIR} ]; then
echo "Please export MNN_LIB_DIR=/path/to/MNN"
exit 1
fi
# Note: We use -DENABLE_ARC=1 here to fix the linking error:
#
# The symbol _NSLog is not defined
echo "Building for simulator (arm64)"
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=${MNN_LIB_DIR} \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-S .. \
-DCMAKE_TOOLCHAIN_FILE=./toolchains/ios.toolchain.cmake \
-DPLATFORM=XRSIMULATOR \
-DENABLE_BITCODE=0 \
-DENABLE_ARC=1 \
-DENABLE_VISIBILITY=0 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=./install \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
-DDEPLOYMENT_TARGET=1.0 \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-B build/simulator
# Use sysctl rather than nproc: nproc is not available on stock macOS
cmake --build build/simulator -j $(sysctl -n hw.ncpu) --verbose
cmake --build build/simulator --target install
echo "Building for arm64"
cmake \
-DSHERPA_MNN_ENABLE_BINARY=OFF \
-DMNN_LIB_DIR=${MNN_LIB_DIR} \
-DBUILD_PIPER_PHONMIZE_EXE=OFF \
-DBUILD_PIPER_PHONMIZE_TESTS=OFF \
-DBUILD_ESPEAK_NG_EXE=OFF \
-DBUILD_ESPEAK_NG_TESTS=OFF \
-S .. \
-DCMAKE_TOOLCHAIN_FILE=./toolchains/ios.toolchain.cmake \
-DPLATFORM=XROS \
-DENABLE_BITCODE=0 \
-DENABLE_ARC=1 \
-DENABLE_VISIBILITY=0 \
-DCMAKE_INSTALL_PREFIX=./install \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DSHERPA_MNN_ENABLE_PYTHON=OFF \
-DSHERPA_MNN_ENABLE_TESTS=OFF \
-DSHERPA_MNN_ENABLE_CHECK=OFF \
-DSHERPA_MNN_ENABLE_PORTAUDIO=OFF \
-DSHERPA_MNN_ENABLE_JNI=OFF \
-DSHERPA_MNN_ENABLE_C_API=ON \
-DSHERPA_MNN_ENABLE_WEBSOCKET=OFF \
-DDEPLOYMENT_TARGET=1.0 \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-B build/os64
cmake --build build/os64 -j $(sysctl -n hw.ncpu)
# Generate headers for sherpa-mnn.xcframework
cmake --build build/os64 --target install
echo "Generate xcframework"
# mkdir -p "build/simulator/lib"
# for f in libkaldi-native-fbank-core.a libsherpa-mnn-c-api.a libsherpa-mnn-core.a \
# libsherpa-mnn-fstfar.a libssentencepiece_core.a \
# libsherpa-mnn-fst.a libsherpa-mnn-kaldifst-core.a libkaldi-decoder-core.a \
# libucd.a libpiper_phonemize.a libespeak-ng.a; do
# lipo -create build/simulator_arm64/lib/${f} \
# build/simulator_x86_64/lib/${f} \
# -output build/simulator/lib/${f}
# done
# Merge the archives first, because the xcodebuild -create-xcframework step
# below cannot accept multiple archives with the same architecture.
libtool -static -o build/simulator/sherpa-mnn.a \
build/simulator/lib/libkaldi-native-fbank-core.a \
build/simulator/lib/libsherpa-mnn-c-api.a \
build/simulator/lib/libsherpa-mnn-core.a \
build/simulator/lib/libsherpa-mnn-fstfar.a \
build/simulator/lib/libsherpa-mnn-fst.a \
build/simulator/lib/libsherpa-mnn-kaldifst-core.a \
build/simulator/lib/libkaldi-decoder-core.a \
build/simulator/lib/libucd.a \
build/simulator/lib/libpiper_phonemize.a \
build/simulator/lib/libespeak-ng.a \
build/simulator/lib/libssentencepiece_core.a
libtool -static -o build/os64/sherpa-mnn.a \
build/os64/lib/libkaldi-native-fbank-core.a \
build/os64/lib/libsherpa-mnn-c-api.a \
build/os64/lib/libsherpa-mnn-core.a \
build/os64/lib/libsherpa-mnn-fstfar.a \
build/os64/lib/libsherpa-mnn-fst.a \
build/os64/lib/libsherpa-mnn-kaldifst-core.a \
build/os64/lib/libkaldi-decoder-core.a \
build/os64/lib/libucd.a \
build/os64/lib/libpiper_phonemize.a \
build/os64/lib/libespeak-ng.a \
build/os64/lib/libssentencepiece_core.a
rm -rf sherpa-mnn.xcframework
xcodebuild -create-xcframework \
-library "build/os64/sherpa-mnn.a" \
-library "build/simulator/sherpa-mnn.a" \
-output sherpa-mnn.xcframework
# Copy Headers
mkdir -p sherpa-mnn.xcframework/Headers
cp -av install/include/* sherpa-mnn.xcframework/Headers
pushd sherpa-mnn.xcframework/xros-arm64-simulator
ln -s sherpa-mnn.a libsherpa-mnn.a
popd
pushd sherpa-mnn.xcframework/xros-arm64
ln -s sherpa-mnn.a libsherpa-mnn.a

View File

@ -0,0 +1,106 @@
include(cargs)
include_directories(${CMAKE_SOURCE_DIR})
add_executable(decode-file-c-api decode-file-c-api.c)
target_link_libraries(decode-file-c-api sherpa-mnn-c-api cargs)
add_executable(kws-c-api kws-c-api.c)
target_link_libraries(kws-c-api sherpa-mnn-c-api)
add_executable(speech-enhancement-gtcrn-c-api speech-enhancement-gtcrn-c-api.c)
target_link_libraries(speech-enhancement-gtcrn-c-api sherpa-mnn-c-api)
if(SHERPA_MNN_ENABLE_TTS)
add_executable(offline-tts-c-api offline-tts-c-api.c)
target_link_libraries(offline-tts-c-api sherpa-mnn-c-api cargs)
add_executable(matcha-tts-zh-c-api matcha-tts-zh-c-api.c)
target_link_libraries(matcha-tts-zh-c-api sherpa-mnn-c-api)
add_executable(matcha-tts-en-c-api matcha-tts-en-c-api.c)
target_link_libraries(matcha-tts-en-c-api sherpa-mnn-c-api)
add_executable(kokoro-tts-en-c-api kokoro-tts-en-c-api.c)
target_link_libraries(kokoro-tts-en-c-api sherpa-mnn-c-api)
add_executable(kokoro-tts-zh-en-c-api kokoro-tts-zh-en-c-api.c)
target_link_libraries(kokoro-tts-zh-en-c-api sherpa-mnn-c-api)
endif()
if(SHERPA_MNN_ENABLE_SPEAKER_DIARIZATION)
add_executable(offline-speaker-diarization-c-api offline-speaker-diarization-c-api.c)
target_link_libraries(offline-speaker-diarization-c-api sherpa-mnn-c-api)
endif()
add_executable(spoken-language-identification-c-api spoken-language-identification-c-api.c)
target_link_libraries(spoken-language-identification-c-api sherpa-mnn-c-api)
add_executable(speaker-identification-c-api speaker-identification-c-api.c)
target_link_libraries(speaker-identification-c-api sherpa-mnn-c-api)
add_executable(streaming-hlg-decode-file-c-api streaming-hlg-decode-file-c-api.c)
target_link_libraries(streaming-hlg-decode-file-c-api sherpa-mnn-c-api)
add_executable(audio-tagging-c-api audio-tagging-c-api.c)
target_link_libraries(audio-tagging-c-api sherpa-mnn-c-api)
add_executable(add-punctuation-c-api add-punctuation-c-api.c)
target_link_libraries(add-punctuation-c-api sherpa-mnn-c-api)
add_executable(whisper-c-api whisper-c-api.c)
target_link_libraries(whisper-c-api sherpa-mnn-c-api)
add_executable(fire-red-asr-c-api fire-red-asr-c-api.c)
target_link_libraries(fire-red-asr-c-api sherpa-mnn-c-api)
add_executable(sense-voice-c-api sense-voice-c-api.c)
target_link_libraries(sense-voice-c-api sherpa-mnn-c-api)
add_executable(moonshine-c-api moonshine-c-api.c)
target_link_libraries(moonshine-c-api sherpa-mnn-c-api)
add_executable(zipformer-c-api zipformer-c-api.c)
target_link_libraries(zipformer-c-api sherpa-mnn-c-api)
add_executable(streaming-zipformer-c-api streaming-zipformer-c-api.c)
target_link_libraries(streaming-zipformer-c-api sherpa-mnn-c-api)
add_executable(paraformer-c-api paraformer-c-api.c)
target_link_libraries(paraformer-c-api sherpa-mnn-c-api)
add_executable(streaming-paraformer-c-api streaming-paraformer-c-api.c)
target_link_libraries(streaming-paraformer-c-api sherpa-mnn-c-api)
add_executable(telespeech-c-api telespeech-c-api.c)
target_link_libraries(telespeech-c-api sherpa-mnn-c-api)
add_executable(vad-sense-voice-c-api vad-sense-voice-c-api.c)
target_link_libraries(vad-sense-voice-c-api sherpa-mnn-c-api)
add_executable(vad-whisper-c-api vad-whisper-c-api.c)
target_link_libraries(vad-whisper-c-api sherpa-mnn-c-api)
add_executable(vad-moonshine-c-api vad-moonshine-c-api.c)
target_link_libraries(vad-moonshine-c-api sherpa-mnn-c-api)
add_executable(streaming-zipformer-buffered-tokens-hotwords-c-api
streaming-zipformer-buffered-tokens-hotwords-c-api.c)
target_link_libraries(streaming-zipformer-buffered-tokens-hotwords-c-api sherpa-mnn-c-api)
add_executable(streaming-paraformer-buffered-tokens-c-api
streaming-paraformer-buffered-tokens-c-api.c)
target_link_libraries(streaming-paraformer-buffered-tokens-c-api sherpa-mnn-c-api)
add_executable(streaming-ctc-buffered-tokens-c-api
streaming-ctc-buffered-tokens-c-api.c)
target_link_libraries(streaming-ctc-buffered-tokens-c-api sherpa-mnn-c-api)
add_executable(keywords-spotter-buffered-tokens-keywords-c-api
keywords-spotter-buffered-tokens-keywords-c-api.c)
target_link_libraries(keywords-spotter-buffered-tokens-keywords-c-api sherpa-mnn-c-api)
if(SHERPA_MNN_HAS_ALSA)
add_subdirectory(./asr-microphone-example)
elseif((UNIX AND NOT APPLE) OR LINUX)
message(WARNING "Not include ./asr-microphone-example since alsa is not available")
endif()

View File

@ -0,0 +1,18 @@
# Introduction
This folder contains C API examples for [sherpa-onnx][sherpa-onnx].
Please refer to the documentation
https://k2-fsa.github.io/sherpa/onnx/c-api/index.html
for details.
## File descriptions
- [decode-file-c-api.c](./decode-file-c-api.c) This file shows how to use the C API
for speech recognition with a streaming model.
- [offline-tts-c-api.c](./offline-tts-c-api.c) This file shows how to use the C API
to convert text to speech with a non-streaming model.
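For a bird's-eye view before diving into the full examples, the sketch below
shows the call pattern shared by the streaming examples in this folder. It is
a minimal, illustrative sketch rather than a drop-in program: the token, model,
and wave paths are placeholders that you must point at a real pre-trained
model, and error handling is reduced to the essentials. The API names follow
those used in [decode-file-c-api.c](./decode-file-c-api.c).

```c
#include <stdio.h>
#include <string.h>

#include "sherpa-mnn/c-api/c-api.h"

int main() {
  SherpaMnnOnlineRecognizerConfig config;
  memset(&config, 0, sizeof(config));
  // All paths below are placeholders; use files from a real model directory.
  config.model_config.tokens = "./tokens.txt";
  config.model_config.transducer.encoder = "./encoder.onnx";
  config.model_config.transducer.decoder = "./decoder.onnx";
  config.model_config.transducer.joiner = "./joiner.onnx";
  config.model_config.num_threads = 1;
  config.model_config.provider = "cpu";
  config.decoding_method = "greedy_search";
  config.feat_config.sample_rate = 16000;
  config.feat_config.feature_dim = 80;

  const SherpaMnnOnlineRecognizer *recognizer =
      SherpaMnnCreateOnlineRecognizer(&config);
  if (recognizer == NULL) {
    fprintf(stderr, "Failed to create the recognizer. Please check your config\n");
    return -1;
  }
  const SherpaMnnOnlineStream *stream =
      SherpaMnnCreateOnlineStream(recognizer);

  const SherpaMnnWave *wave = SherpaMnnReadWave("./foo.wav");  // placeholder
  if (wave == NULL) {
    fprintf(stderr, "Failed to read the wave file\n");
    return -1;
  }

  // Feed all samples at once. The real examples feed small chunks in a loop
  // to simulate streaming; the API accepts arbitrary-sized pieces.
  SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
                                      wave->num_samples);
  SherpaMnnOnlineStreamInputFinished(stream);
  while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
    SherpaMnnDecodeOnlineStream(recognizer, stream);
  }

  const SherpaMnnOnlineRecognizerResult *r =
      SherpaMnnGetOnlineStreamResult(recognizer, stream);
  fprintf(stderr, "Recognized text: %s\n", r->text);

  SherpaMnnDestroyOnlineRecognizerResult(r);
  SherpaMnnFreeWave(wave);
  SherpaMnnDestroyOnlineStream(stream);
  SherpaMnnDestroyOnlineRecognizer(recognizer);
  return 0;
}
```

See [decode-file-c-api.c](./decode-file-c-api.c) for the complete version with
command-line arguments, chunked decoding, endpoint detection, and tail padding.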
[sherpa-onnx]: https://github.com/k2-fsa/sherpa-onnx

View File

@ -0,0 +1,67 @@
// c-api-examples/add-punctuation-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// We assume you have pre-downloaded the model files for testing
// from https://github.com/k2-fsa/sherpa-onnx/releases/tag/punctuation-models
//
// An example is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/punctuation-models/sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
// tar xvf sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
// rm sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
SherpaMnnOfflinePunctuationConfig config;
memset(&config, 0, sizeof(config));
// clang-format off
config.model.ct_transformer = "./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx";
// clang-format on
config.model.num_threads = 1;
config.model.debug = 1;
config.model.provider = "cpu";
const SherpaMnnOfflinePunctuation *punct =
SherpaMnnCreateOfflinePunctuation(&config);
if (!punct) {
fprintf(stderr,
"Failed to create OfflinePunctuation. Please check your config");
return -1;
}
const char *texts[] = {
"这是一个测试你好吗How are you我很好thank you are you ok谢谢你",
"我们都是木头人不会说话不会动",
("The African blogosphere is rapidly expanding bringing more voices "
"online in the form of commentaries opinions analyses rants and poetry"),
};
int32_t n = sizeof(texts) / sizeof(const char *);
fprintf(stderr, "n: %d\n", n);
fprintf(stderr, "--------------------\n");
for (int32_t i = 0; i != n; ++i) {
const char *text_with_punct =
SherpaOfflinePunctuationAddPunct(punct, texts[i]);
fprintf(stderr, "Input text: %s\n", texts[i]);
fprintf(stderr, "Output text: %s\n", text_with_punct);
SherpaOfflinePunctuationFreeText(text_with_punct);
fprintf(stderr, "--------------------\n");
}
SherpaMnnDestroyOfflinePunctuation(punct);
return 0;
}

View File

@ -0,0 +1,9 @@
add_executable(c-api-alsa c-api-alsa.cc alsa.cc)
target_link_libraries(c-api-alsa sherpa-mnn-c-api cargs)
if(DEFINED ENV{SHERPA_MNN_ALSA_LIB_DIR})
target_link_libraries(c-api-alsa -L$ENV{SHERPA_MNN_ALSA_LIB_DIR} -lasound)
else()
target_link_libraries(c-api-alsa asound)
endif()

View File

@ -0,0 +1 @@
exclude_files=alsa.cc|alsa.h

View File

@ -0,0 +1,12 @@
# Introduction
This folder contains examples for real-time speech recognition from a microphone
using the sherpa-onnx C API.
**Note**: You can call the C API from C++ files.
## ./c-api-alsa.cc
This file uses ALSA to read from a microphone. It runs only on Linux; it does
not support macOS or Windows.

View File

@ -0,0 +1 @@
../../sherpa-onnx/csrc/alsa.cc

View File

@ -0,0 +1 @@
../../sherpa-onnx/csrc/alsa.h

View File

@ -0,0 +1,259 @@
// c-api-examples/asr-microphone-example/c-api-alsa.cc
// Copyright (c) 2022-2024 Xiaomi Corporation
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <algorithm>
#include <cctype> // std::tolower
#include <cstdint>
#include <string>
#include "c-api-examples/asr-microphone-example/alsa.h"
// NOTE: You don't need to use cargs.h in your own project.
// We use it in this file to parse command-line arguments.
#include "cargs.h" // NOLINT
#include "sherpa-mnn/c-api/c-api.h"
static struct cag_option options[] = {
{/*.identifier =*/'h',
/*.access_letters =*/"h",
/*.access_name =*/"help",
/*.value_name =*/"help",
/*.description =*/"Show help"},
{/*.identifier =*/'t',
/*.access_letters =*/NULL,
/*.access_name =*/"tokens",
/*.value_name =*/"tokens",
/*.description =*/"Tokens file"},
{/*.identifier =*/'e',
/*.access_letters =*/NULL,
/*.access_name =*/"encoder",
/*.value_name =*/"encoder",
/*.description =*/"Encoder ONNX file"},
{/*.identifier =*/'d',
/*.access_letters =*/NULL,
/*.access_name =*/"decoder",
/*.value_name =*/"decoder",
/*.description =*/"Decoder ONNX file"},
{/*.identifier =*/'j',
/*.access_letters =*/NULL,
/*.access_name =*/"joiner",
/*.value_name =*/"joiner",
/*.description =*/"Joiner ONNX file"},
{/*.identifier =*/'n',
/*.access_letters =*/NULL,
/*.access_name =*/"num-threads",
/*.value_name =*/"num-threads",
/*.description =*/"Number of threads"},
{/*.identifier =*/'p',
/*.access_letters =*/NULL,
/*.access_name =*/"provider",
/*.value_name =*/"provider",
/*.description =*/"Provider: cpu (default), cuda, coreml"},
{/*.identifier =*/'m',
/*.access_letters =*/NULL,
/*.access_name =*/"decoding-method",
/*.value_name =*/"decoding-method",
/*.description =*/
"Decoding method: greedy_search (default), modified_beam_search"},
{/*.identifier =*/'f',
/*.access_letters =*/NULL,
/*.access_name =*/"hotwords-file",
/*.value_name =*/"hotwords-file",
/*.description =*/
"The file containing hotwords, one words/phrases per line, and for each "
"phrase the bpe/cjkchar are separated by a space. For example: ▁HE LL O "
"▁WORLD, 你 好 世 界"},
{/*.identifier =*/'s',
/*.access_letters =*/NULL,
/*.access_name =*/"hotwords-score",
/*.value_name =*/"hotwords-score",
/*.description =*/
"The bonus score for each token in hotwords. Used only when "
"decoding_method is modified_beam_search"},
};
const char *kUsage =
R"(
Usage:
./bin/c-api-alsa \
--tokens=/path/to/tokens.txt \
--encoder=/path/to/encoder.onnx \
--decoder=/path/to/decoder.onnx \
  --joiner=/path/to/joiner.onnx \
device_name
The device name specifies which microphone to use in case there are several
on your system. You can use
arecord -l
to find all available microphones on your computer. For instance, if it outputs
**** List of CAPTURE Hardware Devices ****
card 3: UACDemoV10 [UACDemoV1.0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
and if you want to select card 3 and device 0 on that card, please use:
plughw:3,0
as the device_name.
)";
// volatile: set from the signal handler and polled in the decoding loop below
static volatile bool stop = false;
static void Handler(int sig) {
stop = true;
fprintf(stderr, "\nCaught Ctrl + C. Exiting...\n");
}
int32_t main(int32_t argc, char *argv[]) {
if (argc < 6) {
fprintf(stderr, "%s\n", kUsage);
exit(0);
}
signal(SIGINT, Handler);
SherpaMnnOnlineRecognizerConfig config;
memset(&config, 0, sizeof(config));
config.model_config.debug = 0;
config.model_config.num_threads = 1;
config.model_config.provider = "cpu";
config.decoding_method = "greedy_search";
config.max_active_paths = 4;
config.feat_config.sample_rate = 16000;
config.feat_config.feature_dim = 80;
config.enable_endpoint = 1;
config.rule1_min_trailing_silence = 2.4;
config.rule2_min_trailing_silence = 1.2;
config.rule3_min_utterance_length = 300;
cag_option_context context;
char identifier;
const char *value;
cag_option_prepare(&context, options, CAG_ARRAY_SIZE(options), argc, argv);
while (cag_option_fetch(&context)) {
identifier = cag_option_get(&context);
value = cag_option_get_value(&context);
switch (identifier) {
case 't':
config.model_config.tokens = value;
break;
case 'e':
config.model_config.transducer.encoder = value;
break;
case 'd':
config.model_config.transducer.decoder = value;
break;
case 'j':
config.model_config.transducer.joiner = value;
break;
case 'n':
config.model_config.num_threads = atoi(value);
break;
case 'p':
config.model_config.provider = value;
break;
case 'm':
config.decoding_method = value;
break;
case 'f':
config.hotwords_file = value;
break;
case 's':
config.hotwords_score = atof(value);
break;
case 'h': {
fprintf(stderr, "%s\n", kUsage);
exit(0);
break;
}
default:
// do nothing as config already has valid default values
break;
}
}
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&config);
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
const char *device_name = argv[context.index];
sherpa_mnn::Alsa alsa(device_name);
fprintf(stderr, "Use recording device: %s\n", device_name);
fprintf(stderr,
"Please \033[32m\033[1mspeak\033[0m! Press \033[31m\033[1mCtrl + "
"C\033[0m to exit\n");
int32_t expected_sample_rate = 16000;
if (alsa.GetExpectedSampleRate() != expected_sample_rate) {
fprintf(stderr, "sample rate: %d != %d\n", alsa.GetExpectedSampleRate(),
expected_sample_rate);
exit(-1);
}
int32_t chunk = 0.1 * alsa.GetActualSampleRate();
std::string last_text;
int32_t segment_index = 0;
while (!stop) {
const std::vector<float> &samples = alsa.Read(chunk);
SherpaMnnOnlineStreamAcceptWaveform(stream, expected_sample_rate,
samples.data(), samples.size());
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
std::string text = r->text;
SherpaMnnDestroyOnlineRecognizerResult(r);
if (!text.empty() && last_text != text) {
last_text = text;
std::transform(text.begin(), text.end(), text.begin(),
[](auto c) { return std::tolower(c); });
SherpaMnnPrint(display, segment_index, text.c_str());
fflush(stderr);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (!text.empty()) {
++segment_index;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
}
// free allocated resources
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,79 @@
// c-api-examples/audio-tagging-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// We assume you have pre-downloaded the model files for testing
// from https://github.com/k2-fsa/sherpa-onnx/releases/tag/audio-tagging-models
//
// An example is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
// tar xvf sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
// rm sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
SherpaMnnAudioTaggingConfig config;
memset(&config, 0, sizeof(config));
config.model.zipformer.model =
"./sherpa-onnx-zipformer-audio-tagging-2024-04-09/model.int8.onnx";
config.model.num_threads = 1;
config.model.debug = 1;
config.model.provider = "cpu";
// clang-format off
config.labels = "./sherpa-onnx-zipformer-audio-tagging-2024-04-09/class_labels_indices.csv";
// clang-format on
const SherpaMnnAudioTagging *tagger = SherpaMnnCreateAudioTagging(&config);
if (!tagger) {
fprintf(stderr, "Failed to create audio tagger. Please check your config");
return -1;
}
// You can find more test waves from
// https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-audio-tagging-2024-04-09.tar.bz2
const char *wav_filename =
"./sherpa-onnx-zipformer-audio-tagging-2024-04-09/test_wavs/1.wav";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnAudioTaggingCreateOfflineStream(tagger);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
int32_t top_k = 5;
const SherpaMnnAudioEvent *const *results =
SherpaMnnAudioTaggingCompute(tagger, stream, top_k);
fprintf(stderr, "--------------------------------------------------\n");
fprintf(stderr, "Index\t\tProbability\t\tEvent name\n");
fprintf(stderr, "--------------------------------------------------\n");
for (int32_t i = 0; i != top_k; ++i) {
fprintf(stderr, "%d\t\t%.3f\t\t\t%s\n", i, results[i]->prob,
results[i]->name);
}
fprintf(stderr, "--------------------------------------------------\n");
SherpaMnnAudioTaggingFreeResults(results);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnFreeWave(wave);
SherpaMnnDestroyAudioTagging(tagger);
return 0;
}

View File

@ -0,0 +1,244 @@
// c-api-examples/decode-file-c-api.c
//
// Copyright (c) 2023 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// to decode a file.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cargs.h"
#include "sherpa-mnn/c-api/c-api.h"
static struct cag_option options[] = {
{.identifier = 'h',
.access_letters = "h",
.access_name = "help",
.description = "Show help"},
{.identifier = 't',
.access_letters = NULL,
.access_name = "tokens",
.value_name = "tokens",
.description = "Tokens file"},
{.identifier = 'e',
.access_letters = NULL,
.access_name = "encoder",
.value_name = "encoder",
.description = "Encoder ONNX file"},
{.identifier = 'd',
.access_letters = NULL,
.access_name = "decoder",
.value_name = "decoder",
.description = "Decoder ONNX file"},
{.identifier = 'j',
.access_letters = NULL,
.access_name = "joiner",
.value_name = "joiner",
.description = "Joiner ONNX file"},
{.identifier = 'n',
.access_letters = NULL,
.access_name = "num-threads",
.value_name = "num-threads",
.description = "Number of threads"},
{.identifier = 'p',
.access_letters = NULL,
.access_name = "provider",
.value_name = "provider",
.description = "Provider: cpu (default), cuda, coreml"},
{.identifier = 'm',
.access_letters = NULL,
.access_name = "decoding-method",
.value_name = "decoding-method",
.description =
"Decoding method: greedy_search (default), modified_beam_search"},
{.identifier = 'f',
.access_letters = NULL,
.access_name = "hotwords-file",
.value_name = "hotwords-file",
.description = "The file containing hotwords, one words/phrases per line, "
"and for each phrase the bpe/cjkchar are separated by a "
"space. For example: ▁HE LL O ▁WORLD, 你 好 世 界"},
{.identifier = 's',
.access_letters = NULL,
.access_name = "hotwords-score",
.value_name = "hotwords-score",
.description = "The bonus score for each token in hotwords. Used only "
"when decoding_method is modified_beam_search"},
};
const char *kUsage =
"\n"
"Usage:\n "
" ./bin/decode-file-c-api \\\n"
" --tokens=/path/to/tokens.txt \\\n"
" --encoder=/path/to/encoder.onnx \\\n"
" --decoder=/path/to/decoder.onnx \\\n"
" --joiner=/path/to/joiner.onnx \\\n"
" --provider=cpu \\\n"
" /path/to/foo.wav\n"
"\n\n"
"Default num_threads is 1.\n"
"Valid decoding_method: greedy_search (default), modified_beam_search\n\n"
"Valid provider: cpu (default), cuda, coreml\n\n"
"Please refer to \n"
"https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/"
"index.html\n"
"for a list of pre-trained models to download.\n"
"\n"
"Note that this file supports only streaming transducer models.\n";
int32_t main(int32_t argc, char *argv[]) {
if (argc < 6) {
fprintf(stderr, "%s\n", kUsage);
exit(0);
}
SherpaMnnOnlineRecognizerConfig config;
memset(&config, 0, sizeof(config));
config.model_config.debug = 0;
config.model_config.num_threads = 1;
config.model_config.provider = "cpu";
config.decoding_method = "greedy_search";
config.max_active_paths = 4;
config.feat_config.sample_rate = 16000;
config.feat_config.feature_dim = 80;
config.enable_endpoint = 1;
config.rule1_min_trailing_silence = 2.4;
config.rule2_min_trailing_silence = 1.2;
config.rule3_min_utterance_length = 300;
cag_option_context context;
char identifier;
const char *value;
cag_option_prepare(&context, options, CAG_ARRAY_SIZE(options), argc, argv);
while (cag_option_fetch(&context)) {
identifier = cag_option_get(&context);
value = cag_option_get_value(&context);
switch (identifier) {
case 't':
config.model_config.tokens = value;
break;
case 'e':
config.model_config.transducer.encoder = value;
break;
case 'd':
config.model_config.transducer.decoder = value;
break;
case 'j':
config.model_config.transducer.joiner = value;
break;
case 'n':
config.model_config.num_threads = atoi(value);
break;
case 'p':
config.model_config.provider = value;
break;
case 'm':
config.decoding_method = value;
break;
case 'f':
config.hotwords_file = value;
break;
case 's':
config.hotwords_score = atof(value);
break;
case 'h': {
fprintf(stderr, "%s\n", kUsage);
exit(0);
break;
}
default:
// do nothing as config already has valid default values
break;
}
}
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&config);
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
const char *wav_filename = argv[context.index];
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// simulate streaming
#define N 3200 // 0.2 s. Sample rate is fixed to 16 kHz
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}
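
The while loop above is the heart of the streaming pattern: feed a fixed-size chunk, then decode while the stream has enough buffered data. A minimal sketch that isolates it into a reusable function follows; the name FeedInChunks is hypothetical and the includes are the same as in the example.

// Hypothetical helper: feeds a wave to an online stream in fixed-size
// chunks, decoding whenever enough data is buffered.
static void FeedInChunks(const SherpaMnnOnlineRecognizer *recognizer,
                         const SherpaMnnOnlineStream *stream,
                         const SherpaMnnWave *wave, int32_t chunk) {
  for (int32_t k = 0; k < wave->num_samples; k += chunk) {
    int32_t n =
        (k + chunk > wave->num_samples) ? (wave->num_samples - k) : chunk;
    SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
                                        wave->samples + k, n);
    while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
      SherpaMnnDecodeOnlineStream(recognizer, stream);
    }
  }
}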

View File

@ -0,0 +1,84 @@
// c-api-examples/fire-red-asr-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
// We assume you have pre-downloaded the FireRedAsr model
// from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
// An example is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
// tar xvf sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
// rm sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx";
const char *decoder_filename =
"sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.fire_red_asr.encoder = encoder_filename;
offline_model_config.fire_red_asr.decoder = decoder_filename;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}
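
The create-stream / accept-waveform / decode / print sequence above recurs verbatim in the Moonshine, Paraformer, and SenseVoice examples later in this directory. A minimal sketch that factors it into one helper (hypothetical name; same includes as the example):

// Hypothetical helper: decodes one wave with a non-streaming recognizer
// and prints the result, exactly as the example above does.
static void DecodeAndPrint(const SherpaMnnOfflineRecognizer *recognizer,
                           const SherpaMnnWave *wave) {
  const SherpaMnnOfflineStream *stream =
      SherpaMnnCreateOfflineStream(recognizer);
  SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
                                 wave->num_samples);
  SherpaMnnDecodeOfflineStream(recognizer, stream);
  const SherpaMnnOfflineRecognizerResult *result =
      SherpaMnnGetOfflineStreamResult(stream);
  fprintf(stderr, "Decoded text: %s\n", result->text);
  SherpaMnnDestroyOfflineRecognizerResult(result);
  SherpaMnnDestroyOfflineStream(stream);
}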

View File

@ -0,0 +1,196 @@
// c-api-examples/keywords-spotter-buffered-tokens-keywords-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// Copyright (c) 2024 Luo Xiao
//
// This file demonstrates how to use the keyword spotter with sherpa-onnx's
// C API, with tokens and keywords loaded from buffered strings instead of
// from external files.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static size_t ReadFile(const char *filename, const char **buffer_out) {
  FILE *file = fopen(filename, "r");
  if (file == NULL) {
    fprintf(stderr, "Failed to open %s\n", filename);
    return 0;  // ReadFile returns size_t, so signal errors with 0, not -1
  }
  fseek(file, 0L, SEEK_END);
  long size = ftell(file);
  rewind(file);
  *buffer_out = malloc(size);
  if (*buffer_out == NULL) {
    fclose(file);
    fprintf(stderr, "Memory error\n");
    return 0;
  }
  size_t read_bytes = fread((void *)*buffer_out, 1, size, file);
  if (read_bytes != (size_t)size) {
    fprintf(stderr, "Errors occurred in reading the file %s\n", filename);
    free((void *)*buffer_out);
    *buffer_out = NULL;
    fclose(file);
    return 0;
  }
  fclose(file);
  return read_bytes;
}
int32_t main() {
const char *wav_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/test_wavs/"
"6.wav";
const char *encoder_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
const char *decoder_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"decoder-epoch-12-avg-2-chunk-16-left-64.onnx";
const char *joiner_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
const char *provider = "cpu";
const char *tokens_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/tokens.txt";
const char *keywords_filename =
"sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/test_wavs/"
"test_keywords.txt";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// reading tokens and keywords to buffers
const char *tokens_buf = NULL;
size_t token_buf_size = ReadFile(tokens_filename, &tokens_buf);
if (token_buf_size < 1) {
fprintf(stderr, "Please check your tokens.txt!\n");
free((void *)tokens_buf);
return -1;
}
const char *keywords_buf = NULL;
size_t keywords_buf_size = ReadFile(keywords_filename, &keywords_buf);
if (keywords_buf_size < 1) {
fprintf(stderr, "Please check your keywords.txt!\n");
free((void *)keywords_buf);
return -1;
}
// Zipformer config
SherpaMnnOnlineTransducerModelConfig zipformer_config;
memset(&zipformer_config, 0, sizeof(zipformer_config));
zipformer_config.encoder = encoder_filename;
zipformer_config.decoder = decoder_filename;
zipformer_config.joiner = joiner_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens_buf = tokens_buf;
online_model_config.tokens_buf_size = token_buf_size;
online_model_config.transducer = zipformer_config;
// Keywords-spotter config
SherpaMnnKeywordSpotterConfig keywords_spotter_config;
memset(&keywords_spotter_config, 0, sizeof(keywords_spotter_config));
keywords_spotter_config.max_active_paths = 4;
keywords_spotter_config.keywords_threshold = 0.1;
keywords_spotter_config.keywords_score = 3.0;
keywords_spotter_config.model_config = online_model_config;
keywords_spotter_config.keywords_buf = keywords_buf;
keywords_spotter_config.keywords_buf_size = keywords_buf_size;
const SherpaMnnKeywordSpotter *keywords_spotter =
SherpaMnnCreateKeywordSpotter(&keywords_spotter_config);
free((void *)tokens_buf);
tokens_buf = NULL;
free((void *)keywords_buf);
keywords_buf = NULL;
if (keywords_spotter == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateKeywordStream(keywords_spotter);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsKeywordStreamReady(keywords_spotter, stream)) {
SherpaMnnDecodeKeywordStream(keywords_spotter, stream);
}
const SherpaMnnKeywordResult *r =
SherpaMnnGetKeywordResult(keywords_spotter, stream);
if (strlen(r->keyword)) {
SherpaMnnPrint(display, segment_id, r->keyword);
}
SherpaMnnDestroyKeywordResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsKeywordStreamReady(keywords_spotter, stream)) {
SherpaMnnDecodeKeywordStream(keywords_spotter, stream);
}
const SherpaMnnKeywordResult *r =
SherpaMnnGetKeywordResult(keywords_spotter, stream);
if (strlen(r->keyword)) {
SherpaMnnPrint(display, segment_id, r->keyword);
}
SherpaMnnDestroyKeywordResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyKeywordSpotter(keywords_spotter);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,84 @@
// c-api-examples/kokoro-tts-en-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// for English TTS with Kokoro.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-en-v0_19.tar.bz2
tar xf kokoro-en-v0_19.tar.bz2
rm kokoro-en-v0_19.tar.bz2
./kokoro-tts-en-c-api
*/
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
SherpaMnnOfflineTtsConfig config;
memset(&config, 0, sizeof(config));
config.model.kokoro.model = "./kokoro-en-v0_19/model.onnx";
config.model.kokoro.voices = "./kokoro-en-v0_19/voices.bin";
config.model.kokoro.tokens = "./kokoro-en-v0_19/tokens.txt";
config.model.kokoro.data_dir = "./kokoro-en-v0_19/espeak-ng-data";
config.model.num_threads = 2;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
const char *filename = "./generated-kokoro-en.wav";
const char *text =
"Today as always, men fall into two groups: slaves and free men. Whoever "
"does not have two-thirds of his day for himself, is a slave, whatever "
"he may be: a statesman, a businessman, an official, or a scholar. "
"Friends fell out often because life was changing so fast. The easiest "
"thing in the world was to lose touch with someone.";
const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&config);
// mapping of sid to voice name
// 0->af, 1->af_bella, 2->af_nicole, 3->af_sarah, 4->af_sky, 5->am_adam
// 6->am_michael, 7->bf_emma, 8->bf_isabella, 9->bm_george, 10->bm_lewis
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerate(tts, text, sid, speed);
#else
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerateWithProgressCallback(tts, text, sid, speed,
ProgressCallback);
#endif
SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate, filename);
SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
SherpaMnnDestroyOfflineTts(tts);
fprintf(stderr, "Input text is: %s\n", text);
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename);
return 0;
}
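
ProgressCallback above always returns 1. Because, per its comments, returning 0 stops generation, a cancelling variant needs only a different return value; the sketch below (hypothetical name) stops once half of the audio has been produced.

// Hypothetical variant: cancel generation once progress reaches 50%.
static int32_t StopHalfwayCallback(const float *samples, int32_t num_samples,
                                   float progress) {
  (void)samples;      // unused in this sketch
  (void)num_samples;  // unused in this sketch
  return progress < 0.5f ? 1 : 0;
}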

View File

@ -0,0 +1,82 @@
// c-api-examples/kokoro-tts-zh-en-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// for English + Chinese TTS with Kokoro.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-multi-lang-v1_0.tar.bz2
tar xf kokoro-multi-lang-v1_0.tar.bz2
rm kokoro-multi-lang-v1_0.tar.bz2
./kokoro-tts-zh-en-c-api
*/
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
SherpaMnnOfflineTtsConfig config;
memset(&config, 0, sizeof(config));
config.model.kokoro.model = "./kokoro-multi-lang-v1_0/model.onnx";
config.model.kokoro.voices = "./kokoro-multi-lang-v1_0/voices.bin";
config.model.kokoro.tokens = "./kokoro-multi-lang-v1_0/tokens.txt";
config.model.kokoro.data_dir = "./kokoro-multi-lang-v1_0/espeak-ng-data";
config.model.kokoro.dict_dir = "./kokoro-multi-lang-v1_0/dict";
config.model.kokoro.lexicon =
"./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/"
"lexicon-zh.txt";
config.model.num_threads = 2;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
const char *filename = "./generated-kokoro-zh-en.wav";
const char *text =
"中英文语音合成测试。This is generated by next generation Kaldi using "
"Kokoro without Misaki. 你觉得中英文说的如何呢?";
const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&config);
int32_t sid = 0; // there are 53 speakers
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerate(tts, text, sid, speed);
#else
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerateWithProgressCallback(tts, text, sid, speed,
ProgressCallback);
#endif
SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate, filename);
SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
SherpaMnnDestroyOfflineTts(tts);
fprintf(stderr, "Input text is: %s\n", text);
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename);
return 0;
}

View File

@ -0,0 +1,152 @@
// c-api-examples/kws-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
//
// This file demonstrates how to use the keyword spotter with sherpa-onnx's C API.
// clang-format off
//
// Usage
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
//
// ./kws-c-api
//
// clang-format on
#include <stdio.h>
#include <stdlib.h> // exit
#include <string.h> // memset
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
SherpaMnnKeywordSpotterConfig config;
memset(&config, 0, sizeof(config));
config.model_config.transducer.encoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
config.model_config.transducer.decoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"decoder-epoch-12-avg-2-chunk-16-left-64.onnx";
config.model_config.transducer.joiner =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"tokens.txt";
config.model_config.provider = "cpu";
config.model_config.num_threads = 1;
config.model_config.debug = 1;
config.keywords_file =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"test_wavs/test_keywords.txt";
const SherpaMnnKeywordSpotter *kws = SherpaMnnCreateKeywordSpotter(&config);
if (!kws) {
fprintf(stderr, "Please check your config");
exit(-1);
}
fprintf(stderr,
"--Test pre-defined keywords from test_wavs/test_keywords.txt--\n");
const char *wav_filename =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"test_wavs/3.wav";
float tail_paddings[8000] = {0}; // 0.5 seconds
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
exit(-1);
}
const SherpaMnnOnlineStream *stream = SherpaMnnCreateKeywordStream(kws);
if (!stream) {
fprintf(stderr, "Failed to create stream\n");
exit(-1);
}
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsKeywordStreamReady(kws, stream)) {
SherpaMnnDecodeKeywordStream(kws, stream);
const SherpaMnnKeywordResult *r = SherpaMnnGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);
// Remember to reset the keyword stream right after a keyword is detected
SherpaMnnResetKeywordStream(kws, stream);
}
SherpaMnnDestroyKeywordResult(r);
}
SherpaMnnDestroyOnlineStream(stream);
// --------------------------------------------------------------------------
fprintf(stderr, "--Use pre-defined keywords + add a new keyword--\n");
stream = SherpaMnnCreateKeywordStreamWithKeywords(kws, "y ǎn y uán @演员");
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsKeywordStreamReady(kws, stream)) {
SherpaMnnDecodeKeywordStream(kws, stream);
const SherpaMnnKeywordResult *r = SherpaMnnGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);
// Remember to reset the keyword stream
SherpaMnnResetKeywordStream(kws, stream);
}
SherpaMnnDestroyKeywordResult(r);
}
SherpaMnnDestroyOnlineStream(stream);
// --------------------------------------------------------------------------
fprintf(stderr, "--Use pre-defined keywords + add two new keywords--\n");
stream = SherpaMnnCreateKeywordStreamWithKeywords(
kws, "y ǎn y uán @演员/zh ī m íng @知名");
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
sizeof(tail_paddings) / sizeof(float));
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsKeywordStreamReady(kws, stream)) {
SherpaMnnDecodeKeywordStream(kws, stream);
const SherpaMnnKeywordResult *r = SherpaMnnGetKeywordResult(kws, stream);
if (r && r->json && strlen(r->keyword)) {
fprintf(stderr, "Detected keyword: %s\n", r->json);
// Remember to reset the keyword stream
SherpaMnnResetKeywordStream(kws, stream);
}
SherpaMnnDestroyKeywordResult(r);
}
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnFreeWave(wave);
SherpaMnnDestroyKeywordSpotter(kws);
return 0;
}
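
As the two calls above show, SherpaMnnCreateKeywordStreamWithKeywords takes the extra keywords as a single string: space-separated tokens, an optional @display-form suffix, and "/" between keywords. A minimal sketch of joining several entries at runtime follows; the helper name is hypothetical and it relies on <string.h>, which this example already includes.

// Hypothetical helper: joins keyword entries with "/" into `out`,
// truncating rather than overflowing the buffer.
static void JoinKeywords(const char *const *entries, int32_t n, char *out,
                         size_t out_size) {
  out[0] = '\0';
  for (int32_t i = 0; i != n; ++i) {
    if (i) strncat(out, "/", out_size - strlen(out) - 1);
    strncat(out, entries[i], out_size - strlen(out) - 1);
  }
}

For example, joining the two entries "y ǎn y uán @演员" and "zh ī m íng @知名" reproduces the combined string passed above.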

View File

@ -0,0 +1,87 @@
// c-api-examples/matcha-tts-en-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// for English TTS with MatchaTTS.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2
tar xvf matcha-icefall-en_US-ljspeech.tar.bz2
rm matcha-icefall-en_US-ljspeech.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
./matcha-tts-en-c-api
*/
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
SherpaMnnOfflineTtsConfig config;
memset(&config, 0, sizeof(config));
config.model.matcha.acoustic_model =
"./matcha-icefall-en_US-ljspeech/model-steps-3.onnx";
config.model.matcha.vocoder = "./hifigan_v2.onnx";
config.model.matcha.tokens = "./matcha-icefall-en_US-ljspeech/tokens.txt";
config.model.matcha.data_dir =
"./matcha-icefall-en_US-ljspeech/espeak-ng-data";
config.model.num_threads = 1;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
const char *filename = "./generated-matcha-en.wav";
const char *text =
"Today as always, men fall into two groups: slaves and free men. Whoever "
"does not have two-thirds of his day for himself, is a slave, whatever "
"he may be: a statesman, a businessman, an official, or a scholar. "
"Friends fell out often because life was changing so fast. The easiest "
"thing in the world was to lose touch with someone.";
const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&config);
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerate(tts, text, sid, speed);
#else
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerateWithProgressCallback(tts, text, sid, speed,
ProgressCallback);
#endif
SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate, filename);
SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
SherpaMnnDestroyOfflineTts(tts);
fprintf(stderr, "Input text is: %s\n", text);
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename);
return 0;
}

View File

@ -0,0 +1,87 @@
// c-api-examples/matcha-tts-zh-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// for Chinese TTS with MatchaTTS.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
rm matcha-icefall-zh-baker.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
./matcha-tts-zh-c-api
*/
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
SherpaMnnOfflineTtsConfig config;
memset(&config, 0, sizeof(config));
config.model.matcha.acoustic_model =
"./matcha-icefall-zh-baker/model-steps-3.onnx";
config.model.matcha.vocoder = "./hifigan_v2.onnx";
config.model.matcha.lexicon = "./matcha-icefall-zh-baker/lexicon.txt";
config.model.matcha.tokens = "./matcha-icefall-zh-baker/tokens.txt";
config.model.matcha.dict_dir = "./matcha-icefall-zh-baker/dict";
config.model.num_threads = 1;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
// clang-format off
config.rule_fsts = "./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst";
// clang-format on
const char *filename = "./generated-matcha-zh.wav";
const char *text =
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如"
"涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感"
"受着生命的奇迹与温柔."
"某某银行的副行长和一些行政领导表示,他们去过长江和长白山; "
"经济不断增长。2024年12月31号拨打110或者18920240511。123456块钱。";
const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&config);
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerate(tts, text, sid, speed);
#else
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerateWithProgressCallback(tts, text, sid, speed,
ProgressCallback);
#endif
SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate, filename);
SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
SherpaMnnDestroyOfflineTts(tts);
fprintf(stderr, "Input text is: %s\n", text);
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename);
return 0;
}

View File

@ -0,0 +1,83 @@
// c-api-examples/moonshine-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use Moonshine tiny with sherpa-onnx's C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// tar xvf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// rm sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav";
const char *preprocessor =
"./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx";
const char *encoder = "./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx";
const char *uncached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx";
const char *cached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx";
const char *tokens = "./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = "cpu";
offline_model_config.tokens = tokens;
offline_model_config.moonshine.preprocessor = preprocessor;
offline_model_config.moonshine.encoder = encoder;
offline_model_config.moonshine.uncached_decoder = uncached_decoder;
offline_model_config.moonshine.cached_decoder = cached_decoder;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,131 @@
// c-api-examples/offline-speaker-diarization-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to implement speaker diarization with
// sherpa-onnx's C API.
// clang-format off
/*
Usage:
Step 1: Download a speaker segmentation model
Please visit https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
for a list of available models. The following is an example
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
tar xvf sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
rm sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
Step 2: Download a speaker embedding extractor model
Please visit https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
for a list of available models. The following is an example
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx
Step 3. Download test wave files
Please visit https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
for a list of available test wave files. The following is an example
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/0-four-speakers-zh.wav
Step 4. Run it
*/
// clang-format on
#include <stdio.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static int32_t ProgressCallback(int32_t num_processed_chunks,
int32_t num_total_chunks, void *arg) {
float progress = 100.0 * num_processed_chunks / num_total_chunks;
fprintf(stderr, "progress %.2f%%\n", progress);
// the return value is currently ignored
return 0;
}
int main() {
// Please see the comments at the start of this file for how to download
// the .onnx file and .wav files below
const char *segmentation_model =
"./sherpa-onnx-pyannote-segmentation-3-0/model.onnx";
const char *embedding_extractor_model =
"./3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx";
const char *wav_filename = "./0-four-speakers-zh.wav";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
SherpaMnnOfflineSpeakerDiarizationConfig config;
memset(&config, 0, sizeof(config));
config.segmentation.pyannote.model = segmentation_model;
config.embedding.model = embedding_extractor_model;
// the test wave ./0-four-speakers-zh.wav has 4 speakers, so
// we set num_clusters to 4
//
config.clustering.num_clusters = 4;
// If you don't know the number of speakers in the test wave file, please
// use
// config.clustering.threshold = 0.5; // You need to tune this threshold
const SherpaMnnOfflineSpeakerDiarization *sd =
SherpaMnnCreateOfflineSpeakerDiarization(&config);
if (!sd) {
fprintf(stderr, "Failed to initialize offline speaker diarization\n");
return -1;
}
if (SherpaMnnOfflineSpeakerDiarizationGetSampleRate(sd) !=
    wave->sample_rate) {
  fprintf(
      stderr,
      "Expected sample rate: %d. Actual sample rate from the wave file: %d\n",
      SherpaMnnOfflineSpeakerDiarizationGetSampleRate(sd),
      wave->sample_rate);
  SherpaMnnDestroyOfflineSpeakerDiarization(sd);
  SherpaMnnFreeWave(wave);
  return -1;
}
const SherpaMnnOfflineSpeakerDiarizationResult *result =
    SherpaMnnOfflineSpeakerDiarizationProcessWithCallback(
        sd, wave->samples, wave->num_samples, ProgressCallback, NULL);
if (!result) {
  fprintf(stderr, "Failed to do speaker diarization\n");
  SherpaMnnDestroyOfflineSpeakerDiarization(sd);
  SherpaMnnFreeWave(wave);
  return -1;
}
int32_t num_segments =
    SherpaMnnOfflineSpeakerDiarizationResultGetNumSegments(result);
const SherpaMnnOfflineSpeakerDiarizationSegment *segments =
    SherpaMnnOfflineSpeakerDiarizationResultSortByStartTime(result);
for (int32_t i = 0; i != num_segments; ++i) {
  fprintf(stderr, "%.3f -- %.3f speaker_%02d\n", segments[i].start,
          segments[i].end, segments[i].speaker);
}
SherpaMnnOfflineSpeakerDiarizationDestroySegment(segments);
SherpaMnnOfflineSpeakerDiarizationDestroyResult(result);
SherpaMnnDestroyOfflineSpeakerDiarization(sd);
SherpaMnnFreeWave(wave);
return 0;
}
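
When the speaker count is unknown, the comments above suggest clustering by a distance threshold rather than a fixed num_clusters. A minimal configuration sketch follows; the helper name is hypothetical and the 0.5 starting value has to be tuned.

// Hypothetical helper: configure diarization for an unknown number of
// speakers by clustering with a distance threshold.
static void ConfigureByThreshold(SherpaMnnOfflineSpeakerDiarizationConfig *c,
                                 const char *seg_model,
                                 const char *emb_model) {
  memset(c, 0, sizeof(*c));
  c->segmentation.pyannote.model = seg_model;
  c->embedding.model = emb_model;
  // clustering.num_clusters stays 0 (unknown); the threshold decides
  // how many clusters emerge. Tune it on your own data.
  c->clustering.threshold = 0.5f;
}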

View File

@ -0,0 +1,249 @@
// c-api-examples/offline-tts-c-api.c
//
// Copyright (c) 2023 Xiaomi Corporation
// This file shows how to use sherpa-onnx C API
// to convert text to speech using an offline model.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cargs.h"
#include "sherpa-mnn/c-api/c-api.h"
static struct cag_option options[] = {
{.identifier = 'h',
.access_letters = "h",
.access_name = "help",
.description = "Show help"},
{.access_name = "vits-model",
.value_name = "/path/to/xxx.onnx",
.identifier = '0',
.description = "Path to VITS model"},
{.access_name = "vits-lexicon",
.value_name = "/path/to/lexicon.txt",
.identifier = '1',
.description = "Path to lexicon.txt for VITS models"},
{.access_name = "vits-tokens",
.value_name = "/path/to/tokens.txt",
.identifier = '2',
.description = "Path to tokens.txt for VITS models"},
{.access_name = "vits-noise-scale",
.value_name = "0.667",
.identifier = '3',
.description = "noise_scale for VITS models"},
{.access_name = "vits-noise-scale-w",
.value_name = "0.8",
.identifier = '4',
.description = "noise_scale_w for VITS models"},
{.access_name = "vits-length-scale",
.value_name = "1.0",
.identifier = '5',
.description =
"length_scale for VITS models. Default to 1. You can tune it "
"to change the speech speed. small -> faster; large -> slower. "},
{.access_name = "num-threads",
.value_name = "1",
.identifier = '6',
.description = "Number of threads"},
{.access_name = "provider",
.value_name = "cpu",
.identifier = '7',
.description = "Provider: cpu (default), cuda, coreml"},
{.access_name = "debug",
.value_name = "0",
.identifier = '8',
.description = "1 to show debug messages while loading the model"},
{.access_name = "sid",
.value_name = "0",
.identifier = '9',
.description = "Speaker ID. Default to 0. Note it is not used for "
"single-speaker models."},
{.access_name = "output-filename",
.value_name = "./generated.wav",
.identifier = 'a',
.description =
"Filename to save the generated audio. Default to ./generated.wav"},
{.access_name = "tts-rule-fsts",
.value_name = "/path/to/rule.fst",
.identifier = 'b',
.description = "It not empty, it contains a list of rule FST filenames."
"Multiple filenames are separated by a comma and they are "
"applied from left to right. An example value: "
"rule1.fst,rule2,fst,rule3.fst"},
{.access_name = "max-num-sentences",
.value_name = "2",
.identifier = 'c',
.description = "Maximum number of sentences that we process at a time. "
"This is to avoid OOM for very long input text. "
"If you set it to -1, then we process all sentences in a "
"single batch."},
{.access_name = "vits-data-dir",
.value_name = "/path/to/espeak-ng-data",
.identifier = 'd',
.description =
"Path to espeak-ng-data. If it is given, --vits-lexicon is ignored"},
};
static void ShowUsage() {
const char *kUsageMessage =
"Offline text-to-speech with sherpa-onnx C API"
"\n"
"./offline-tts-c-api \\\n"
" --vits-model=/path/to/model.onnx \\\n"
" --vits-lexicon=/path/to/lexicon.txt \\\n"
" --vits-tokens=/path/to/tokens.txt \\\n"
" --sid=0 \\\n"
" --output-filename=./generated.wav \\\n"
" 'some text within single quotes on linux/macos or use double quotes on "
"windows'\n"
"\n"
"It will generate a file ./generated.wav as specified by "
"--output-filename.\n"
"\n"
"You can download a test model from\n"
"https://huggingface.co/csukuangfj/vits-ljs\n"
"\n"
"For instance, you can use:\n"
"wget "
"https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx\n"
"wget "
"https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt\n"
"wget "
"https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt\n"
"\n"
"./offline-tts-c-api \\\n"
" --vits-model=./vits-ljs.onnx \\\n"
" --vits-lexicon=./lexicon.txt \\\n"
" --vits-tokens=./tokens.txt \\\n"
" --sid=0 \\\n"
" --output-filename=./generated.wav \\\n"
" 'liliana, the most beautiful and lovely assistant of our team!'\n"
"\n"
"Please see\n"
"https://k2-fsa.github.io/sherpa/onnx/tts/index.html\n"
"or details.\n\n";
fprintf(stderr, "%s", kUsageMessage);
cag_option_print(options, CAG_ARRAY_SIZE(options), stderr);
exit(0);
}
int32_t main(int32_t argc, char *argv[]) {
cag_option_context context;
char identifier;
const char *value;
cag_option_prepare(&context, options, CAG_ARRAY_SIZE(options), argc, argv);
SherpaMnnOfflineTtsConfig config;
memset(&config, 0, sizeof(config));
int32_t sid = 0;
const char *filename = strdup("./generated.wav");
const char *text;
while (cag_option_fetch(&context)) {
identifier = cag_option_get(&context);
value = cag_option_get_value(&context);
switch (identifier) {
case '0':
config.model.vits.model = value;
break;
case '1':
config.model.vits.lexicon = value;
break;
case '2':
config.model.vits.tokens = value;
break;
case '3':
config.model.vits.noise_scale = atof(value);
break;
case '4':
config.model.vits.noise_scale_w = atof(value);
break;
case '5':
config.model.vits.length_scale = atof(value);
break;
case '6':
config.model.num_threads = atoi(value);
break;
case '7':
config.model.provider = value;
break;
case '8':
config.model.debug = atoi(value);
break;
case '9':
sid = atoi(value);
break;
case 'a':
free((void *)filename);
filename = strdup(value);
break;
case 'b':
config.rule_fsts = value;
break;
case 'c':
config.max_num_sentences = atoi(value);
break;
case 'd':
config.model.vits.data_dir = value;
break;
case '?':
fprintf(stderr, "Unknown option\n");
// fall through
case 'h':
// fall through
default:
ShowUsage();
}
}
fprintf(stderr, "here\n");
if (!config.model.vits.model) {
fprintf(stderr, "Please provide --vits-model\n");
ShowUsage();
}
if (!config.model.vits.tokens) {
fprintf(stderr, "Please provide --vits-tokens\n");
ShowUsage();
}
if (!config.model.vits.data_dir && !config.model.vits.lexicon) {
fprintf(stderr, "Please provide --vits-data-dir or --vits-lexicon\n");
ShowUsage();
}
// the last arg is the text
text = argv[argc - 1];
if (text[0] == '-') {
fprintf(stderr, "\n***Please input your text!***\n\n");
fprintf(stderr, "\n---------------Usage---------------\n\n");
ShowUsage();
}
const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&config);
const SherpaMnnGeneratedAudio *audio =
SherpaMnnOfflineTtsGenerate(tts, text, sid, 1.0);
SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate, filename);
SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
SherpaMnnDestroyOfflineTts(tts);
fprintf(stderr, "Input text is: %s\n", text);
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename);
free((void *)filename);
return 0;
}
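
For reference, the same VITS pipeline that this argument parser drives can be written directly. A minimal non-CLI sketch using the model files from the usage text above (hypothetical function name, no error checking):

// Hypothetical helper: one-shot TTS with the vits-ljs files from the
// usage text above.
static void TtsOnce(void) {
  SherpaMnnOfflineTtsConfig c;
  memset(&c, 0, sizeof(c));
  c.model.vits.model = "./vits-ljs.onnx";
  c.model.vits.lexicon = "./lexicon.txt";
  c.model.vits.tokens = "./tokens.txt";
  const SherpaMnnOfflineTts *tts = SherpaMnnCreateOfflineTts(&c);
  const SherpaMnnGeneratedAudio *audio = SherpaMnnOfflineTtsGenerate(
      tts, "liliana, the most beautiful and lovely assistant of our team!",
      /*sid=*/0, /*speed=*/1.0);
  SherpaMnnWriteWave(audio->samples, audio->n, audio->sample_rate,
                     "./generated.wav");
  SherpaMnnDestroyOfflineTtsGeneratedAudio(audio);
  SherpaMnnDestroyOfflineTts(tts);
}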

View File

@ -0,0 +1,83 @@
// c-api-examples/paraformer-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use non-streaming Paraformer with sherpa-onnx's
// C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2
// tar xvf sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2
// rm sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav";
const char *model_filename =
"sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Paraformer config
SherpaMnnOfflineParaformerModelConfig paraformer_config;
memset(&paraformer_config, 0, sizeof(paraformer_config));
paraformer_config.model = model_filename;
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.paraformer = paraformer_config;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -ex
if [ ! -d ./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 ]; then
echo "Please download the pre-trained model for testing."
echo "You can refer to"
echo ""
echo "https://k2-fsa.github.io/sherpa/onnx/pretrained_models/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-bilingual-chinese-english"
echo "for help"
exit 1
fi
if [[ ! -f ../build/lib/libsherpa-onnx-c-api.a && ! -f ../build/lib/libsherpa-onnx-c-api.dylib && ! -f ../build/lib/libsherpa-onnx-c-api.so ]]; then
echo "Please build sherpa-onnx first. You can use"
echo ""
echo " cd /path/to/sherpa-onnx"
echo " mkdir build"
echo " cd build"
echo " cmake .."
echo " make -j4"
exit 1
fi
if [ ! -f ./decode-file-c-api ]; then
make
fi
./decode-file-c-api \
--tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
--encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx \
--decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx \
--joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx \
./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/0.wav
# Run with hotwords
echo "礼 拜 二" > hotwords.txt
./decode-file-c-api \
--tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
--encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx \
--decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx \
--joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx \
--hotwords-file=hotwords.txt \
--hotwords-score=1.5 \
--decoding-method=modified_beam_search \
./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/0.wav

View File

@ -0,0 +1,85 @@
// c-api-examples/sense-voice-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use SenseVoice with sherpa-onnx's C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// tar xvf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/en.wav";
const char *model_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx";
const char *tokens_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt";
const char *language = "auto";
const char *provider = "cpu";
int32_t use_inverse_text_normalization = 1;
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
SherpaMnnOfflineSenseVoiceModelConfig sense_voice_config;
memset(&sense_voice_config, 0, sizeof(sense_voice_config));
sense_voice_config.model = model_filename;
sense_voice_config.language = language;
sense_voice_config.use_itn = use_inverse_text_normalization;
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.sense_voice = sense_voice_config;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,257 @@
// c-api-examples/speaker-identification-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// We assume you have pre-downloaded the speaker embedding extractor model
// from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
//
// An example command to download
// "3dspeaker_speech_campplus_sv_zh-cn_16k-common.onnx"
// is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_campplus_sv_zh-cn_16k-common.onnx
//
// clang-format on
//
// Also, please download the test wave files from
//
// https://github.com/csukuangfj/sr-data
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
static const float *ComputeEmbedding(
const SherpaMnnSpeakerEmbeddingExtractor *ex, const char *wav_filename) {
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
exit(-1);
}
const SherpaMnnOnlineStream *stream =
SherpaMnnSpeakerEmbeddingExtractorCreateStream(ex);
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnOnlineStreamInputFinished(stream);
if (!SherpaMnnSpeakerEmbeddingExtractorIsReady(ex, stream)) {
fprintf(stderr, "The input wave file %s is too short!\n", wav_filename);
exit(-1);
}
// we will free `v` outside of this function
const float *v =
SherpaMnnSpeakerEmbeddingExtractorComputeEmbedding(ex, stream);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnFreeWave(wave);
// Remember to free v to avoid a memory leak
return v;
}
int32_t main() {
SherpaMnnSpeakerEmbeddingExtractorConfig config;
memset(&config, 0, sizeof(config));
// please download the model from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
config.model = "./3dspeaker_speech_campplus_sv_zh-cn_16k-common.onnx";
config.num_threads = 1;
config.debug = 0;
config.provider = "cpu";
const SherpaMnnSpeakerEmbeddingExtractor *ex =
SherpaMnnCreateSpeakerEmbeddingExtractor(&config);
if (!ex) {
fprintf(stderr, "Failed to create speaker embedding extractor");
return -1;
}
int32_t dim = SherpaMnnSpeakerEmbeddingExtractorDim(ex);
const SherpaMnnSpeakerEmbeddingManager *manager =
SherpaMnnCreateSpeakerEmbeddingManager(dim);
// Please download the test data from
// https://github.com/csukuangfj/sr-data
const char *spk1_1 = "./sr-data/enroll/fangjun-sr-1.wav";
const char *spk1_2 = "./sr-data/enroll/fangjun-sr-2.wav";
const char *spk1_3 = "./sr-data/enroll/fangjun-sr-3.wav";
const char *spk2_1 = "./sr-data/enroll/leijun-sr-1.wav";
const char *spk2_2 = "./sr-data/enroll/leijun-sr-2.wav";
const float *spk1_vec[4] = {NULL};
spk1_vec[0] = ComputeEmbedding(ex, spk1_1);
spk1_vec[1] = ComputeEmbedding(ex, spk1_2);
spk1_vec[2] = ComputeEmbedding(ex, spk1_3);
const float *spk2_vec[3] = {NULL};
spk2_vec[0] = ComputeEmbedding(ex, spk2_1);
spk2_vec[1] = ComputeEmbedding(ex, spk2_2);
if (!SherpaMnnSpeakerEmbeddingManagerAddList(manager, "fangjun", spk1_vec)) {
fprintf(stderr, "Failed to register fangjun\n");
exit(-1);
}
if (!SherpaMnnSpeakerEmbeddingManagerContains(manager, "fangjun")) {
fprintf(stderr, "Failed to find fangjun\n");
exit(-1);
}
if (!SherpaMnnSpeakerEmbeddingManagerAddList(manager, "leijun", spk2_vec)) {
fprintf(stderr, "Failed to register leijun\n");
exit(-1);
}
if (!SherpaMnnSpeakerEmbeddingManagerContains(manager, "leijun")) {
fprintf(stderr, "Failed to find leijun\n");
exit(-1);
}
if (SherpaMnnSpeakerEmbeddingManagerNumSpeakers(manager) != 2) {
fprintf(stderr, "There should be two speakers: fangjun and leijun\n");
exit(-1);
}
const char *const *all_speakers =
SherpaMnnSpeakerEmbeddingManagerGetAllSpeakers(manager);
const char *const *p = all_speakers;
fprintf(stderr, "list of registered speakers\n-----\n");
while (p[0]) {
fprintf(stderr, "speaker: %s\n", p[0]);
++p;
}
fprintf(stderr, "----\n");
SherpaMnnSpeakerEmbeddingManagerFreeAllSpeakers(all_speakers);
const char *test1 = "./sr-data/test/fangjun-test-sr-1.wav";
const char *test2 = "./sr-data/test/leijun-test-sr-1.wav";
const char *test3 = "./sr-data/test/liudehua-test-sr-1.wav";
const float *v1 = ComputeEmbedding(ex, test1);
const float *v2 = ComputeEmbedding(ex, test2);
const float *v3 = ComputeEmbedding(ex, test3);
float threshold = 0.6;
const char *name1 =
SherpaMnnSpeakerEmbeddingManagerSearch(manager, v1, threshold);
if (name1) {
fprintf(stderr, "%s: Found %s\n", test1, name1);
SherpaMnnSpeakerEmbeddingManagerFreeSearch(name1);
} else {
fprintf(stderr, "%s: Not found\n", test1);
}
const char *name2 =
SherpaMnnSpeakerEmbeddingManagerSearch(manager, v2, threshold);
if (name2) {
fprintf(stderr, "%s: Found %s\n", test2, name2);
SherpaMnnSpeakerEmbeddingManagerFreeSearch(name2);
} else {
fprintf(stderr, "%s: Not found\n", test2);
}
const char *name3 =
SherpaMnnSpeakerEmbeddingManagerSearch(manager, v3, threshold);
if (name3) {
fprintf(stderr, "%s: Found %s\n", test3, name3);
SherpaMnnSpeakerEmbeddingManagerFreeSearch(name3);
} else {
fprintf(stderr, "%s: Not found\n", test3);
}
int32_t ok = SherpaMnnSpeakerEmbeddingManagerVerify(manager, "fangjun", v1,
threshold);
if (ok) {
fprintf(stderr, "%s matches fangjun\n", test1);
} else {
fprintf(stderr, "%s does NOT match fangjun\n", test1);
}
ok = SherpaMnnSpeakerEmbeddingManagerVerify(manager, "fangjun", v2,
threshold);
if (ok) {
fprintf(stderr, "%s matches fangjun\n", test2);
} else {
fprintf(stderr, "%s does NOT match fangjun\n", test2);
}
fprintf(stderr, "Removing fangjun\n");
if (!SherpaMnnSpeakerEmbeddingManagerRemove(manager, "fangjun")) {
fprintf(stderr, "Failed to remove fangjun\n");
exit(-1);
}
if (SherpaMnnSpeakerEmbeddingManagerNumSpeakers(manager) != 1) {
fprintf(stderr, "There should be only 1 speaker left\n");
exit(-1);
}
name1 = SherpaMnnSpeakerEmbeddingManagerSearch(manager, v1, threshold);
if (name1) {
fprintf(stderr, "%s: Found %s\n", test1, name1);
SherpaMnnSpeakerEmbeddingManagerFreeSearch(name1);
} else {
fprintf(stderr, "%s: Not found\n", test1);
}
fprintf(stderr, "Removing leijun\n");
if (!SherpaMnnSpeakerEmbeddingManagerRemove(manager, "leijun")) {
fprintf(stderr, "Failed to remove leijun\n");
exit(-1);
}
if (SherpaMnnSpeakerEmbeddingManagerNumSpeakers(manager) != 0) {
fprintf(stderr, "There should be only 1 speaker left\n");
exit(-1);
}
name2 = SherpaMnnSpeakerEmbeddingManagerSearch(manager, v2, threshold);
if (name2) {
fprintf(stderr, "%s: Found %s\n", test2, name2);
SherpaMnnSpeakerEmbeddingManagerFreeSearch(name2);
} else {
fprintf(stderr, "%s: Not found\n", test2);
}
all_speakers = SherpaMnnSpeakerEmbeddingManagerGetAllSpeakers(manager);
p = all_speakers;
fprintf(stderr, "list of registered speakers\n-----\n");
while (p[0]) {
fprintf(stderr, "speaker: %s\n", p[0]);
++p;
}
fprintf(stderr, "----\n");
SherpaMnnSpeakerEmbeddingManagerFreeAllSpeakers(all_speakers);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(v1);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(v2);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(v3);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(spk1_vec[0]);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(spk1_vec[1]);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(spk1_vec[2]);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(spk2_vec[0]);
SherpaMnnSpeakerEmbeddingExtractorDestroyEmbedding(spk2_vec[1]);
SherpaMnnDestroySpeakerEmbeddingManager(manager);
SherpaMnnDestroySpeakerEmbeddingExtractor(ex);
return 0;
}
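
The enrollment pattern above (compute several embeddings, NULL-terminate the array, register via AddList) can be wrapped in one helper. The sketch below is hypothetical: it reuses ComputeEmbedding from this file, assumes AddList expects a NULL-terminated list (as the array sizing above suggests), and leaves freeing the embeddings to the caller, exactly like the example.

// Hypothetical helper: fills the caller-provided array `vec` (capacity
// n + 1) with embeddings for `wavs` and registers them under `name`.
// The caller destroys the embeddings afterwards.
static int32_t Enroll(const SherpaMnnSpeakerEmbeddingExtractor *ex,
                      const SherpaMnnSpeakerEmbeddingManager *manager,
                      const char *name, const char *const *wavs, int32_t n,
                      const float **vec) {
  for (int32_t i = 0; i != n; ++i) {
    vec[i] = ComputeEmbedding(ex, wavs[i]);
  }
  vec[n] = NULL;
  return SherpaMnnSpeakerEmbeddingManagerAddList(manager, name, vec);
}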

View File

@ -0,0 +1,55 @@
// c-api-examples/speech-enhancement-gtcrn-c-api.c
//
// Copyright (c) 2025 Xiaomi Corporation
//
// We assume you have pre-downloaded the model
// from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/speech-enhancement-models
//
//
// An example command to download
// clang-format off
/*
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/gtcrn_simple.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/inp_16k.wav
*/
// clang-format on
#include <stdio.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
SherpaMnnOfflineSpeechDenoiserConfig config;
const char *wav_filename = "./inp_16k.wav";
const char *out_wave_filename = "./enhanced_16k.wav";
memset(&config, 0, sizeof(config));
config.model.gtcrn.model = "./gtcrn_simple.onnx";
const SherpaMnnOfflineSpeechDenoiser *sd =
SherpaMnnCreateOfflineSpeechDenoiser(&config);
if (!sd) {
fprintf(stderr, "Please check your config");
return -1;
}
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
SherpaMnnDestroyOfflineSpeechDenoiser(sd);
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
const SherpaMnnDenoisedAudio *denoised = SherpaMnnOfflineSpeechDenoiserRun(
sd, wave->samples, wave->num_samples, wave->sample_rate);
SherpaMnnWriteWave(denoised->samples, denoised->n, denoised->sample_rate,
out_wave_filename);
SherpaMnnDestroyDenoisedAudio(denoised);
SherpaMnnFreeWave(wave);
SherpaMnnDestroyOfflineSpeechDenoiser(sd);
fprintf(stdout, "Saved to %s\n", out_wave_filename);
return 0;
}

View File

@ -0,0 +1,68 @@
// c-api-examples/spoken-language-identification-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// We assume you have pre-downloaded the whisper multi-lingual models
// from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
// An example command to download the "tiny" whisper model is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
// tar xvf sherpa-onnx-whisper-tiny.tar.bz2
// rm sherpa-onnx-whisper-tiny.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
SherpaMnnSpokenLanguageIdentificationConfig config;
memset(&config, 0, sizeof(config));
config.whisper.encoder = "./sherpa-onnx-whisper-tiny/tiny-encoder.int8.onnx";
config.whisper.decoder = "./sherpa-onnx-whisper-tiny/tiny-decoder.int8.onnx";
config.num_threads = 1;
config.debug = 1;
config.provider = "cpu";
const SherpaMnnSpokenLanguageIdentification *slid =
SherpaMnnCreateSpokenLanguageIdentification(&config);
if (!slid) {
fprintf(stderr, "Failed to create spoken language identifier");
return -1;
}
// You can find more test waves from
// https://hf-mirror.com/spaces/k2-fsa/spoken-language-identification/tree/main/test_wavs
const char *wav_filename = "./sherpa-onnx-whisper-tiny/test_wavs/0.wav";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
SherpaMnnDestroySpokenLanguageIdentification(slid);
return -1;
}
SherpaMnnOfflineStream *stream =
SherpaMnnSpokenLanguageIdentificationCreateOfflineStream(slid);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
const SherpaMnnSpokenLanguageIdentificationResult *result =
SherpaMnnSpokenLanguageIdentificationCompute(slid, stream);
fprintf(stderr, "wav_filename: %s\n", wav_filename);
fprintf(stderr, "Detected language: %s\n", result->lang);
SherpaMnnDestroySpokenLanguageIdentificationResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnFreeWave(wave);
SherpaMnnDestroySpokenLanguageIdentification(slid);
return 0;
}
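The identifier created above can be reused across many files; only the offline stream is per-utterance. A minimal sketch of a batch helper under that assumption, using only the C API calls shown above (ClassifyFiles, wav_files, and num_files are hypothetical names introduced for this sketch):

// Assumes the same includes as the example above.
static void ClassifyFiles(const SherpaMnnSpokenLanguageIdentification *slid,
                          const char **wav_files, int32_t num_files) {
  for (int32_t i = 0; i != num_files; ++i) {
    const SherpaMnnWave *w = SherpaMnnReadWave(wav_files[i]);
    if (w == NULL) {
      fprintf(stderr, "Failed to read %s\n", wav_files[i]);
      continue;
    }
    // Each utterance gets its own stream; the identifier is shared.
    SherpaMnnOfflineStream *s =
        SherpaMnnSpokenLanguageIdentificationCreateOfflineStream(slid);
    SherpaMnnAcceptWaveformOffline(s, w->sample_rate, w->samples,
                                   w->num_samples);
    const SherpaMnnSpokenLanguageIdentificationResult *r =
        SherpaMnnSpokenLanguageIdentificationCompute(slid, s);
    fprintf(stderr, "%s -> %s\n", wav_files[i], r->lang);
    SherpaMnnDestroySpokenLanguageIdentificationResult(r);
    SherpaMnnDestroyOfflineStream(s);
    SherpaMnnFreeWave(w);
  }
}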

View File

@ -0,0 +1,180 @@
// c-api-examples/streaming-ctc-buffered-tokens-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// Copyright (c) 2024 Luo Xiao
//
// This file demonstrates how to use streaming Zipformer2 CTC with
// sherpa-onnx's C API, with tokens loaded from buffered strings instead of
// from external files.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
// tar xvf sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
// rm sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
// Returns the number of bytes read, or 0 on error. On error, *buffer_out is
// set to NULL, so the caller's free() on the error path is a safe no-op.
static size_t ReadFile(const char *filename, const char **buffer_out) {
*buffer_out = NULL;
FILE *file = fopen(filename, "rb");  // binary mode so ftell matches fread
if (file == NULL) {
fprintf(stderr, "Failed to open %s\n", filename);
return 0;
}
fseek(file, 0L, SEEK_END);
long size = ftell(file);
rewind(file);
*buffer_out = malloc(size);
if (*buffer_out == NULL) {
fclose(file);
fprintf(stderr, "Memory error\n");
return 0;
}
size_t read_bytes = fread((void *)*buffer_out, 1, size, file);
if (read_bytes != (size_t)size) {
fprintf(stderr, "Errors occurred in reading the file %s\n", filename);
free((void *)*buffer_out);
*buffer_out = NULL;
fclose(file);
return 0;
}
fclose(file);
return read_bytes;
}
int32_t main() {
const char *wav_filename =
"sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/"
"DEV_T0000000000.wav";
const char *model_filename =
"sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/"
"ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// reading tokens to buffers
const char *tokens_buf;
size_t token_buf_size = ReadFile(tokens_filename, &tokens_buf);
if (token_buf_size < 1) {
fprintf(stderr, "Please check your tokens.txt!\n");
free((void *)tokens_buf);
return -1;
}
// Zipformer2Ctc config
SherpaMnnOnlineZipformer2CtcModelConfig zipformer2_ctc_config;
memset(&zipformer2_ctc_config, 0, sizeof(zipformer2_ctc_config));
zipformer2_ctc_config.model = model_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens_buf = tokens_buf;
online_model_config.tokens_buf_size = token_buf_size;
online_model_config.zipformer2_ctc = zipformer2_ctc_config;
// Recognizer config
SherpaMnnOnlineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = online_model_config;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&recognizer_config);
free((void *)tokens_buf);
tokens_buf = NULL;
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}
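The accept/decode/endpoint loop above reappears almost verbatim in each streaming example that follows; note that N = 3200 samples is 0.2 s of audio at 16 kHz. A minimal sketch of the same loop factored into a helper, using only the C API calls shown above (DecodeWave is a name introduced here, not part of the C API; tail padding is omitted for brevity):

static void DecodeWave(const SherpaMnnOnlineRecognizer *recognizer,
                       const SherpaMnnWave *wave, int32_t chunk_size) {
  const SherpaMnnOnlineStream *stream =
      SherpaMnnCreateOnlineStream(recognizer);
  const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
  int32_t segment_id = 0;
  for (int32_t k = 0; k < wave->num_samples; k += chunk_size) {
    int32_t end = (k + chunk_size > wave->num_samples) ? wave->num_samples
                                                       : (k + chunk_size);
    SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
                                        wave->samples + k, end - k);
    while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
      SherpaMnnDecodeOnlineStream(recognizer, stream);
    }
    const SherpaMnnOnlineRecognizerResult *r =
        SherpaMnnGetOnlineStreamResult(recognizer, stream);
    if (strlen(r->text)) {
      SherpaMnnPrint(display, segment_id, r->text);
    }
    // On an endpoint, start a new segment and reset the stream.
    if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
      if (strlen(r->text)) {
        ++segment_id;
      }
      SherpaMnnOnlineStreamReset(recognizer, stream);
    }
    SherpaMnnDestroyOnlineRecognizerResult(r);
  }
  // Signal end of input and drain the remaining frames.
  SherpaMnnOnlineStreamInputFinished(stream);
  while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
    SherpaMnnDecodeOnlineStream(recognizer, stream);
  }
  const SherpaMnnOnlineRecognizerResult *r =
      SherpaMnnGetOnlineStreamResult(recognizer, stream);
  if (strlen(r->text)) {
    SherpaMnnPrint(display, segment_id, r->text);
  }
  SherpaMnnDestroyOnlineRecognizerResult(r);
  SherpaMnnDestroyDisplay(display);
  SherpaMnnDestroyOnlineStream(stream);
}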

View File

@ -0,0 +1,130 @@
// c-api-examples/streaming-hlg-decode-file-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
/*
We use the following model as an example
// clang-format off
Download the model from
https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
rm sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
build/bin/streaming-hlg-decode-file-c-api
(The above model is from https://github.com/k2-fsa/icefall/pull/1557)
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
// clang-format off
//
// Please download the model from
// https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
const char *model = "./sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18/ctc-epoch-30-avg-3-chunk-16-left-128.int8.onnx";
const char *tokens = "./sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18/tokens.txt";
const char *graph = "./sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18/HLG.fst";
const char *wav_filename = "./sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18/test_wavs/8k.wav";
// clang-format on
SherpaMnnOnlineRecognizerConfig config;
memset(&config, 0, sizeof(config));
config.feat_config.sample_rate = 16000;
config.feat_config.feature_dim = 80;
config.model_config.zipformer2_ctc.model = model;
config.model_config.tokens = tokens;
config.model_config.num_threads = 1;
config.model_config.provider = "cpu";
config.model_config.debug = 0;
config.ctc_fst_decoder_config.graph = graph;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&config);
if (!recognizer) {
fprintf(stderr, "Failed to create recognizer");
exit(-1);
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
exit(-1);
}
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,181 @@
// c-api-examples/streaming-paraformer-buffered-tokens-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// Copyright (c) 2024 Luo Xiao
//
// This file demonstrates how to use streaming Paraformer with sherpa-onnx's
// C API, with tokens loaded from buffered strings instead of from
// external files.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
// tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
// rm sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
// Returns the number of bytes read, or 0 on error. On error, *buffer_out is
// set to NULL, so the caller's free() on the error path is a safe no-op.
static size_t ReadFile(const char *filename, const char **buffer_out) {
*buffer_out = NULL;
FILE *file = fopen(filename, "rb");  // binary mode so ftell matches fread
if (file == NULL) {
fprintf(stderr, "Failed to open %s\n", filename);
return 0;
}
fseek(file, 0L, SEEK_END);
long size = ftell(file);
rewind(file);
*buffer_out = malloc(size);
if (*buffer_out == NULL) {
fclose(file);
fprintf(stderr, "Memory error\n");
return 0;
}
size_t read_bytes = fread((void *)*buffer_out, 1, size, file);
if (read_bytes != (size_t)size) {
fprintf(stderr, "Errors occurred in reading the file %s\n", filename);
free((void *)*buffer_out);
*buffer_out = NULL;
fclose(file);
return 0;
}
fclose(file);
return read_bytes;
}
int32_t main() {
const char *wav_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx";
const char *decoder_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// reading tokens to buffers
const char *tokens_buf;
size_t token_buf_size = ReadFile(tokens_filename, &tokens_buf);
if (token_buf_size < 1) {
fprintf(stderr, "Please check your tokens.txt!\n");
free((void *)tokens_buf);
return -1;
}
// Paraformer config
SherpaMnnOnlineParaformerModelConfig paraformer_config;
memset(&paraformer_config, 0, sizeof(paraformer_config));
paraformer_config.encoder = encoder_filename;
paraformer_config.decoder = decoder_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens_buf = tokens_buf;
online_model_config.tokens_buf_size = token_buf_size;
online_model_config.paraformer = paraformer_config;
// Recognizer config
SherpaMnnOnlineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = online_model_config;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&recognizer_config);
free((void *)tokens_buf);
tokens_buf = NULL;
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,139 @@
// c-api-examples/streaming-paraformer-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use streaming Paraformer with sherpa-onnx's C
// API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
// tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
// rm sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx";
const char *decoder_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Paraformer config
SherpaMnnOnlineParaformerModelConfig paraformer_config;
memset(&paraformer_config, 0, sizeof(paraformer_config));
paraformer_config.encoder = encoder_filename;
paraformer_config.decoder = decoder_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens = tokens_filename;
online_model_config.paraformer = paraformer_config;
// Recognizer config
SherpaMnnOnlineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = online_model_config;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,203 @@
// c-api-examples/streaming-zipformer-buffered-tokens-hotwords-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// Copyright (c) 2024 Luo Xiao
//
// This file demonstrates how to use streaming Zipformer with sherpa-onnx's
// C API, with tokens and hotwords loaded from buffered strings instead of
// from external files.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
// tar xvf sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
// rm sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
// Returns the number of bytes read, or 0 on error. On error, *buffer_out is
// set to NULL, so the caller's free() on the error path is a safe no-op.
static size_t ReadFile(const char *filename, const char **buffer_out) {
*buffer_out = NULL;
FILE *file = fopen(filename, "rb");  // binary mode so ftell matches fread
if (file == NULL) {
fprintf(stderr, "Failed to open %s\n", filename);
return 0;
}
fseek(file, 0L, SEEK_END);
long size = ftell(file);
rewind(file);
*buffer_out = malloc(size);
if (*buffer_out == NULL) {
fclose(file);
fprintf(stderr, "Memory error\n");
return 0;
}
size_t read_bytes = fread((void *)*buffer_out, 1, size, file);
if (read_bytes != (size_t)size) {
fprintf(stderr, "Errors occurred in reading the file %s\n", filename);
free((void *)*buffer_out);
*buffer_out = NULL;
fclose(file);
return 0;
}
fclose(file);
return read_bytes;
}
int32_t main() {
const char *wav_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"encoder-epoch-99-avg-1.onnx";
const char *decoder_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"decoder-epoch-99-avg-1.onnx";
const char *joiner_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"joiner-epoch-99-avg-1.onnx";
const char *provider = "cpu";
const char *modeling_unit = "bpe";
const char *tokens_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/tokens.txt";
const char *hotwords_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/hotwords.txt";
const char *bpe_vocab =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"bpe.vocab";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// reading tokens and hotwords to buffers
const char *tokens_buf;
size_t token_buf_size = ReadFile(tokens_filename, &tokens_buf);
if (token_buf_size < 1) {
fprintf(stderr, "Please check your tokens.txt!\n");
free((void *)tokens_buf);
return -1;
}
const char *hotwords_buf;
size_t hotwords_buf_size = ReadFile(hotwords_filename, &hotwords_buf);
if (hotwords_buf_size < 1) {
fprintf(stderr, "Please check your hotwords.txt!\n");
free((void *)hotwords_buf);
return -1;
}
// Zipformer config
SherpaMnnOnlineTransducerModelConfig zipformer_config;
memset(&zipformer_config, 0, sizeof(zipformer_config));
zipformer_config.encoder = encoder_filename;
zipformer_config.decoder = decoder_filename;
zipformer_config.joiner = joiner_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens_buf = tokens_buf;
online_model_config.tokens_buf_size = token_buf_size;
online_model_config.transducer = zipformer_config;
// Recognizer config
SherpaMnnOnlineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "modified_beam_search";
recognizer_config.model_config = online_model_config;
recognizer_config.hotwords_buf = hotwords_buf;
recognizer_config.hotwords_buf_size = hotwords_buf_size;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&recognizer_config);
free((void *)tokens_buf);
tokens_buf = NULL;
free((void *)hotwords_buf);
hotwords_buf = NULL;
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,145 @@
// c-api-examples/streaming-zipformer-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use streaming Zipformer with sherpa-onnx's C
// API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
// tar xvf sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
// rm sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"encoder-epoch-99-avg-1.onnx";
const char *decoder_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"decoder-epoch-99-avg-1.onnx";
const char *joiner_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/"
"joiner-epoch-99-avg-1.onnx";
const char *tokens_filename =
"sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Zipformer config
SherpaMnnOnlineTransducerModelConfig zipformer_config;
memset(&zipformer_config, 0, sizeof(zipformer_config));
zipformer_config.encoder = encoder_filename;
zipformer_config.decoder = decoder_filename;
zipformer_config.joiner = joiner_filename;
// Online model config
SherpaMnnOnlineModelConfig online_model_config;
memset(&online_model_config, 0, sizeof(online_model_config));
online_model_config.debug = 1;
online_model_config.num_threads = 1;
online_model_config.provider = provider;
online_model_config.tokens = tokens_filename;
online_model_config.transducer = zipformer_config;
// Recognizer config
SherpaMnnOnlineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = online_model_config;
const SherpaMnnOnlineRecognizer *recognizer =
SherpaMnnCreateOnlineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOnlineStream *stream =
SherpaMnnCreateOnlineStream(recognizer);
const SherpaMnnDisplay *display = SherpaMnnCreateDisplay(50);
int32_t segment_id = 0;
// simulate streaming. You can choose an arbitrary N
#define N 3200
fprintf(stderr, "sample rate: %d, num samples: %d, duration: %.2f s\n",
wave->sample_rate, wave->num_samples,
(float)wave->num_samples / wave->sample_rate);
int32_t k = 0;
while (k < wave->num_samples) {
int32_t start = k;
int32_t end =
(start + N > wave->num_samples) ? wave->num_samples : (start + N);
k += N;
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate,
wave->samples + start, end - start);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
if (SherpaMnnOnlineStreamIsEndpoint(recognizer, stream)) {
if (strlen(r->text)) {
++segment_id;
}
SherpaMnnOnlineStreamReset(recognizer, stream);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
}
// add some tail padding
float tail_paddings[4800] = {0}; // 0.3 seconds at 16 kHz sample rate
SherpaMnnOnlineStreamAcceptWaveform(stream, wave->sample_rate, tail_paddings,
4800);
SherpaMnnFreeWave(wave);
SherpaMnnOnlineStreamInputFinished(stream);
while (SherpaMnnIsOnlineStreamReady(recognizer, stream)) {
SherpaMnnDecodeOnlineStream(recognizer, stream);
}
const SherpaMnnOnlineRecognizerResult *r =
SherpaMnnGetOnlineStreamResult(recognizer, stream);
if (strlen(r->text)) {
SherpaMnnPrint(display, segment_id, r->text);
}
SherpaMnnDestroyOnlineRecognizerResult(r);
SherpaMnnDestroyDisplay(display);
SherpaMnnDestroyOnlineStream(stream);
SherpaMnnDestroyOnlineRecognizer(recognizer);
fprintf(stderr, "\n");
return 0;
}

View File

@ -0,0 +1,78 @@
// c-api-examples/telespeech-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use TeleSpeech-ASR CTC model with sherpa-onnx's
// C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2
// tar xvf sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2
// rm sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/test_wavs/3-sichuan.wav";
const char *model_filename =
"sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/model.int8.onnx";
const char *tokens_filename =
"sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.telespeech_ctc = model_filename;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}
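The create-stream, accept-waveform, decode, fetch-result sequence above is the same for every non-streaming recognizer in these examples. A minimal sketch of it factored into a helper, using only the C API calls shown above (DecodeFile is a name introduced here, not part of the C API):

static void DecodeFile(const SherpaMnnOfflineRecognizer *recognizer,
                       const char *wav_filename) {
  const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
  if (wave == NULL) {
    fprintf(stderr, "Failed to read %s\n", wav_filename);
    return;
  }
  // One stream per utterance; the recognizer can be reused.
  const SherpaMnnOfflineStream *stream =
      SherpaMnnCreateOfflineStream(recognizer);
  SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
                                 wave->num_samples);
  SherpaMnnDecodeOfflineStream(recognizer, stream);
  const SherpaMnnOfflineRecognizerResult *result =
      SherpaMnnGetOfflineStreamResult(stream);
  fprintf(stderr, "%s: %s\n", wav_filename, result->text);
  SherpaMnnDestroyOfflineRecognizerResult(result);
  SherpaMnnDestroyOfflineStream(stream);
  SherpaMnnFreeWave(wave);
}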

View File

@ -0,0 +1,146 @@
// c-api-examples/vad-moonshine-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use VAD + Moonshine with sherpa-onnx's C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/Obama.wav
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// tar xvf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// rm sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename = "./Obama.wav";
const char *vad_filename = "./silero_vad.onnx";
const char *preprocessor =
"./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx";
const char *encoder = "./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx";
const char *uncached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx";
const char *cached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx";
const char *tokens = "./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
if (wave->sample_rate != 16000) {
fprintf(stderr, "Expect the sample rate to be 16000. Given: %d\n",
wave->sample_rate);
SherpaMnnFreeWave(wave);
return -1;
}
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 0;
offline_model_config.num_threads = 1;
offline_model_config.provider = "cpu";
offline_model_config.tokens = tokens;
offline_model_config.moonshine.preprocessor = preprocessor;
offline_model_config.moonshine.encoder = encoder;
offline_model_config.moonshine.uncached_decoder = uncached_decoder;
offline_model_config.moonshine.cached_decoder = cached_decoder;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
SherpaMnnVadModelConfig vadConfig;
memset(&vadConfig, 0, sizeof(vadConfig));
vadConfig.silero_vad.model = vad_filename;
vadConfig.silero_vad.threshold = 0.5;
vadConfig.silero_vad.min_silence_duration = 0.5;
vadConfig.silero_vad.min_speech_duration = 0.5;
vadConfig.silero_vad.max_speech_duration = 10;
vadConfig.silero_vad.window_size = 512;
vadConfig.sample_rate = 16000;
vadConfig.num_threads = 1;
vadConfig.debug = 1;
SherpaMnnVoiceActivityDetector *vad =
SherpaMnnCreateVoiceActivityDetector(&vadConfig, 30);
if (vad == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
SherpaMnnDestroyOfflineRecognizer(recognizer);
return -1;
}
int32_t window_size = vadConfig.silero_vad.window_size;
int32_t i = 0;
int is_eof = 0;
while (!is_eof) {
if (i + window_size < wave->num_samples) {
SherpaMnnVoiceActivityDetectorAcceptWaveform(vad, wave->samples + i,
window_size);
} else {
SherpaMnnVoiceActivityDetectorFlush(vad);
is_eof = 1;
}
while (!SherpaMnnVoiceActivityDetectorEmpty(vad)) {
const SherpaMnnSpeechSegment *segment =
SherpaMnnVoiceActivityDetectorFront(vad);
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate,
segment->samples, segment->n);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
float start = segment->start / 16000.0f;
float duration = segment->n / 16000.0f;
float stop = start + duration;
fprintf(stderr, "%.3f -- %.3f: %s\n", start, stop, result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroySpeechSegment(segment);
SherpaMnnVoiceActivityDetectorPop(vad);
}
i += window_size;
}
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnDestroyVoiceActivityDetector(vad);
SherpaMnnFreeWave(wave);
return 0;
}
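The windowed VAD loop above is repeated by the VAD + SenseVoice and VAD + Whisper examples that follow. A minimal sketch of the same logic factored into a helper, using only the C API calls shown above (RunVadAsr is a name introduced here, not part of the C API); deriving the timestamps from wave->sample_rate avoids hard-coding 16000:

static void RunVadAsr(SherpaMnnVoiceActivityDetector *vad,
                      const SherpaMnnOfflineRecognizer *recognizer,
                      const SherpaMnnWave *wave, int32_t window_size) {
  int32_t i = 0;
  int is_eof = 0;
  while (!is_eof) {
    if (i + window_size < wave->num_samples) {
      SherpaMnnVoiceActivityDetectorAcceptWaveform(vad, wave->samples + i,
                                                   window_size);
    } else {
      SherpaMnnVoiceActivityDetectorFlush(vad);  // drain the tail
      is_eof = 1;
    }
    // Decode every speech segment the VAD has finalized so far.
    while (!SherpaMnnVoiceActivityDetectorEmpty(vad)) {
      const SherpaMnnSpeechSegment *segment =
          SherpaMnnVoiceActivityDetectorFront(vad);
      const SherpaMnnOfflineStream *stream =
          SherpaMnnCreateOfflineStream(recognizer);
      SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate,
                                     segment->samples, segment->n);
      SherpaMnnDecodeOfflineStream(recognizer, stream);
      const SherpaMnnOfflineRecognizerResult *result =
          SherpaMnnGetOfflineStreamResult(stream);
      float start = segment->start / (float)wave->sample_rate;
      float stop = start + segment->n / (float)wave->sample_rate;
      fprintf(stderr, "%.3f -- %.3f: %s\n", start, stop, result->text);
      SherpaMnnDestroyOfflineRecognizerResult(result);
      SherpaMnnDestroyOfflineStream(stream);
      SherpaMnnDestroySpeechSegment(segment);
      SherpaMnnVoiceActivityDetectorPop(vad);
    }
    i += window_size;
  }
}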

View File

@ -0,0 +1,148 @@
// c-api-examples/vad-sense-voice-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use VAD + SenseVoice with sherpa-onnx's C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/lei-jun-test.wav
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// tar xvf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename = "./lei-jun-test.wav";
const char *vad_filename = "./silero_vad.onnx";
const char *model_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx";
const char *tokens_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt";
const char *language = "auto";
const char *provider = "cpu";
int32_t use_inverse_text_normalization = 1;
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
if (wave->sample_rate != 16000) {
fprintf(stderr, "Expect the sample rate to be 16000. Given: %d\n",
wave->sample_rate);
SherpaMnnFreeWave(wave);
return -1;
}
SherpaMnnOfflineSenseVoiceModelConfig sense_voice_config;
memset(&sense_voice_config, 0, sizeof(sense_voice_config));
sense_voice_config.model = model_filename;
sense_voice_config.language = language;
sense_voice_config.use_itn = use_inverse_text_normalization;
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 0;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.sense_voice = sense_voice_config;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
SherpaMnnVadModelConfig vadConfig;
memset(&vadConfig, 0, sizeof(vadConfig));
vadConfig.silero_vad.model = vad_filename;
vadConfig.silero_vad.threshold = 0.5;
vadConfig.silero_vad.min_silence_duration = 0.5;
vadConfig.silero_vad.min_speech_duration = 0.5;
vadConfig.silero_vad.max_speech_duration = 5;
vadConfig.silero_vad.window_size = 512;
vadConfig.sample_rate = 16000;
vadConfig.num_threads = 1;
vadConfig.debug = 1;
SherpaMnnVoiceActivityDetector *vad =
SherpaMnnCreateVoiceActivityDetector(&vadConfig, 30);
if (vad == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
SherpaMnnDestroyOfflineRecognizer(recognizer);
return -1;
}
int32_t window_size = vadConfig.silero_vad.window_size;
int32_t i = 0;
int is_eof = 0;
while (!is_eof) {
if (i + window_size < wave->num_samples) {
SherpaMnnVoiceActivityDetectorAcceptWaveform(vad, wave->samples + i,
window_size);
} else {
SherpaMnnVoiceActivityDetectorFlush(vad);
is_eof = 1;
}
while (!SherpaMnnVoiceActivityDetectorEmpty(vad)) {
const SherpaMnnSpeechSegment *segment =
SherpaMnnVoiceActivityDetectorFront(vad);
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate,
segment->samples, segment->n);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
float start = segment->start / 16000.0f;
float duration = segment->n / 16000.0f;
float stop = start + duration;
fprintf(stderr, "%.3f -- %.3f: %s\n", start, stop, result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroySpeechSegment(segment);
SherpaMnnVoiceActivityDetectorPop(vad);
}
i += window_size;
}
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnDestroyVoiceActivityDetector(vad);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,145 @@
// c-api-examples/vad-whisper-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use VAD + Whisper tiny.en with
// sherpa-onnx's C API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/Obama.wav
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
// tar xvf sherpa-onnx-whisper-tiny.en.tar.bz2
// rm sherpa-onnx-whisper-tiny.en.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename = "./Obama.wav";
const char *vad_filename = "./silero_vad.onnx";
const char *encoder = "sherpa-onnx-whisper-tiny.en/tiny.en-encoder.int8.onnx";
const char *decoder = "sherpa-onnx-whisper-tiny.en/tiny.en-decoder.int8.onnx";
const char *tokens = "sherpa-onnx-whisper-tiny.en/tiny.en-tokens.txt";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
if (wave->sample_rate != 16000) {
fprintf(stderr, "Expect the sample rate to be 16000. Given: %d\n",
wave->sample_rate);
SherpaMnnFreeWave(wave);
return -1;
}
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 0;
offline_model_config.num_threads = 1;
offline_model_config.provider = "cpu";
offline_model_config.tokens = tokens;
offline_model_config.whisper.encoder = encoder;
offline_model_config.whisper.decoder = decoder;
offline_model_config.whisper.language = "en";
offline_model_config.whisper.tail_paddings = 0;
offline_model_config.whisper.task = "transcribe";
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
SherpaMnnVadModelConfig vadConfig;
memset(&vadConfig, 0, sizeof(vadConfig));
vadConfig.silero_vad.model = vad_filename;
vadConfig.silero_vad.threshold = 0.5;
vadConfig.silero_vad.min_silence_duration = 0.5;
vadConfig.silero_vad.min_speech_duration = 0.5;
vadConfig.silero_vad.max_speech_duration = 10;
vadConfig.silero_vad.window_size = 512;
vadConfig.sample_rate = 16000;
vadConfig.num_threads = 1;
vadConfig.debug = 1;
SherpaMnnVoiceActivityDetector *vad =
SherpaMnnCreateVoiceActivityDetector(&vadConfig, 30);
if (vad == NULL) {
fprintf(stderr, "Please check your recognizer config!\n");
SherpaMnnFreeWave(wave);
SherpaMnnDestroyOfflineRecognizer(recognizer);
return -1;
}
int32_t window_size = vadConfig.silero_vad.window_size;
int32_t i = 0;
int is_eof = 0;
while (!is_eof) {
if (i + window_size < wave->num_samples) {
SherpaMnnVoiceActivityDetectorAcceptWaveform(vad, wave->samples + i,
window_size);
} else {
SherpaMnnVoiceActivityDetectorFlush(vad);
is_eof = 1;
}
while (!SherpaMnnVoiceActivityDetectorEmpty(vad)) {
const SherpaMnnSpeechSegment *segment =
SherpaMnnVoiceActivityDetectorFront(vad);
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate,
segment->samples, segment->n);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
float start = segment->start / 16000.0f;
float duration = segment->n / 16000.0f;
float stop = start + duration;
fprintf(stderr, "%.3f -- %.3f: %s\n", start, stop, result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroySpeechSegment(segment);
SherpaMnnVoiceActivityDetectorPop(vad);
}
i += window_size;
}
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnDestroyVoiceActivityDetector(vad);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,89 @@
// c-api-examples/whisper-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
// We assume you have pre-downloaded the whisper multi-lingual models
// from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
// An example command to download the "tiny" whisper model is given below:
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
// tar xvf sherpa-onnx-whisper-tiny.tar.bz2
// rm sherpa-onnx-whisper-tiny.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename = "./sherpa-onnx-whisper-tiny/test_wavs/0.wav";
const char *encoder_filename = "sherpa-onnx-whisper-tiny/tiny-encoder.onnx";
const char *decoder_filename = "sherpa-onnx-whisper-tiny/tiny-decoder.onnx";
const char *tokens_filename = "sherpa-onnx-whisper-tiny/tiny-tokens.txt";
const char *language = "en";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Whisper config
SherpaMnnOfflineWhisperModelConfig whisper_config;
memset(&whisper_config, 0, sizeof(whisper_config));
whisper_config.decoder = decoder_filename;
whisper_config.encoder = encoder_filename;
whisper_config.language = language;
whisper_config.tail_paddings = 0;
whisper_config.task = "transcribe";
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.whisper = whisper_config;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1,89 @@
// c-api-examples/zipformer-c-api.c
//
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use non-streaming Zipformer with sherpa-onnx's
// C API.
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-small-en-2023-06-26.tar.bz2
// tar xvf sherpa-onnx-zipformer-small-en-2023-06-26.tar.bz2
// rm sherpa-onnx-zipformer-small-en-2023-06-26.tar.bz2
//
// clang-format on
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sherpa-mnn/c-api/c-api.h"
int32_t main() {
const char *wav_filename =
"sherpa-onnx-zipformer-small-en-2023-06-26/test_wavs/0.wav";
const char *encoder_filename =
"sherpa-onnx-zipformer-small-en-2023-06-26/encoder-epoch-99-avg-1.onnx";
const char *decoder_filename =
"sherpa-onnx-zipformer-small-en-2023-06-26/decoder-epoch-99-avg-1.onnx";
const char *joiner_filename =
"sherpa-onnx-zipformer-small-en-2023-06-26/joiner-epoch-99-avg-1.onnx";
const char *tokens_filename =
"sherpa-onnx-zipformer-small-en-2023-06-26/tokens.txt";
const char *provider = "cpu";
const SherpaMnnWave *wave = SherpaMnnReadWave(wav_filename);
if (wave == NULL) {
fprintf(stderr, "Failed to read %s\n", wav_filename);
return -1;
}
// Zipformer config
SherpaMnnOfflineTransducerModelConfig zipformer_config;
memset(&zipformer_config, 0, sizeof(zipformer_config));
zipformer_config.encoder = encoder_filename;
zipformer_config.decoder = decoder_filename;
zipformer_config.joiner = joiner_filename;
// Offline model config
SherpaMnnOfflineModelConfig offline_model_config;
memset(&offline_model_config, 0, sizeof(offline_model_config));
offline_model_config.debug = 1;
offline_model_config.num_threads = 1;
offline_model_config.provider = provider;
offline_model_config.tokens = tokens_filename;
offline_model_config.transducer = zipformer_config;
// Recognizer config
SherpaMnnOfflineRecognizerConfig recognizer_config;
memset(&recognizer_config, 0, sizeof(recognizer_config));
recognizer_config.decoding_method = "greedy_search";
recognizer_config.model_config = offline_model_config;
const SherpaMnnOfflineRecognizer *recognizer =
SherpaMnnCreateOfflineRecognizer(&recognizer_config);
if (recognizer == NULL) {
fprintf(stderr, "Please check your config!\n");
SherpaMnnFreeWave(wave);
return -1;
}
const SherpaMnnOfflineStream *stream =
SherpaMnnCreateOfflineStream(recognizer);
SherpaMnnAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
wave->num_samples);
SherpaMnnDecodeOfflineStream(recognizer, stream);
const SherpaMnnOfflineRecognizerResult *result =
SherpaMnnGetOfflineStreamResult(stream);
fprintf(stderr, "Decoded text: %s\n", result->text);
SherpaMnnDestroyOfflineRecognizerResult(result);
SherpaMnnDestroyOfflineStream(stream);
SherpaMnnDestroyOfflineRecognizer(recognizer);
SherpaMnnFreeWave(wave);
return 0;
}

View File

@ -0,0 +1 @@
!*.cmake

View File

@ -0,0 +1,45 @@
function(download_asio)
include(FetchContent)
set(asio_URL "https://github.com/chriskohlhoff/asio/archive/refs/tags/asio-1-24-0.tar.gz")
set(asio_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/asio-asio-1-24-0.tar.gz")
set(asio_HASH "SHA256=cbcaaba0f66722787b1a7c33afe1befb3a012b5af3ad7da7ff0f6b8c9b7a8a5b")
# If you don't have access to the Internet,
# please pre-download asio
set(possible_file_locations
$ENV{HOME}/Downloads/asio-asio-1-24-0.tar.gz
${CMAKE_SOURCE_DIR}/asio-asio-1-24-0.tar.gz
${CMAKE_BINARY_DIR}/asio-asio-1-24-0.tar.gz
/tmp/asio-asio-1-24-0.tar.gz
/star-fj/fangjun/download/github/asio-asio-1-24-0.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(asio_URL "${f}")
file(TO_CMAKE_PATH "${asio_URL}" asio_URL)
message(STATUS "Found local downloaded asio: ${asio_URL}")
set(asio_URL2)
break()
endif()
endforeach()
FetchContent_Declare(asio
URL
${asio_URL}
${asio_URL2}
URL_HASH ${asio_HASH}
)
FetchContent_GetProperties(asio)
if(NOT asio_POPULATED)
message(STATUS "Downloading asio ${asio_URL}")
FetchContent_Populate(asio)
endif()
message(STATUS "asio is downloaded to ${asio_SOURCE_DIR}")
# add_subdirectory(${asio_SOURCE_DIR} ${asio_BINARY_DIR} EXCLUDE_FROM_ALL)
include_directories(${asio_SOURCE_DIR}/asio/include)
endfunction()
download_asio()

View File

@ -0,0 +1,50 @@
function(download_cargs)
include(FetchContent)
set(cargs_URL "https://github.com/likle/cargs/archive/refs/tags/v1.0.3.tar.gz")
set(cargs_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/cargs-1.0.3.tar.gz")
set(cargs_HASH "SHA256=ddba25bd35e9c6c75bc706c126001b8ce8e084d40ef37050e6aa6963e836eb8b")
# If you don't have access to the Internet,
# please pre-download cargs
set(possible_file_locations
$ENV{HOME}/Downloads/cargs-1.0.3.tar.gz
${CMAKE_SOURCE_DIR}/cargs-1.0.3.tar.gz
${CMAKE_BINARY_DIR}/cargs-1.0.3.tar.gz
/tmp/cargs-1.0.3.tar.gz
/star-fj/fangjun/download/github/cargs-1.0.3.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(cargs_URL "${f}")
file(TO_CMAKE_PATH "${cargs_URL}" cargs_URL)
message(STATUS "Found local downloaded cargs: ${cargs_URL}")
set(cargs_URL2)
break()
endif()
endforeach()
FetchContent_Declare(cargs
URL
${cargs_URL}
${cargs_URL2}
URL_HASH
${cargs_HASH}
)
FetchContent_GetProperties(cargs)
if(NOT cargs_POPULATED)
message(STATUS "Downloading cargs ${cargs_URL}")
FetchContent_Populate(cargs)
endif()
message(STATUS "cargs is downloaded to ${cargs_SOURCE_DIR}")
add_subdirectory(${cargs_SOURCE_DIR} ${cargs_BINARY_DIR} EXCLUDE_FROM_ALL)
install(TARGETS cargs DESTINATION lib)
install(FILES ${cargs_SOURCE_DIR}/include/cargs.h
DESTINATION include
)
endfunction()
download_cargs()

View File

@ -0,0 +1,227 @@
# cmake/cmake_extension.py
# Copyright (c) 2023 Xiaomi Corporation
#
# flake8: noqa
import os
import platform
import shutil
import sys
from pathlib import Path
import setuptools
from setuptools.command.build_ext import build_ext
def is_for_pypi():
ans = os.environ.get("SHERPA_ONNX_IS_FOR_PYPI", None)
return ans is not None
def is_macos():
return platform.system() == "Darwin"
def is_windows():
return platform.system() == "Windows"
def is_linux():
return platform.system() == "Linux"
def is_arm64():
return platform.machine() in ["arm64", "aarch64"]
def is_x86():
return platform.machine() in ["i386", "i686", "x86_64"]
def enable_alsa():
build_alsa = os.environ.get("SHERPA_ONNX_ENABLE_ALSA", None)
return build_alsa and is_linux() and (is_arm64() or is_x86())
def get_binaries():
binaries = [
"sherpa-onnx",
"sherpa-onnx-keyword-spotter",
"sherpa-onnx-microphone",
"sherpa-onnx-microphone-offline",
"sherpa-onnx-microphone-offline-audio-tagging",
"sherpa-onnx-microphone-offline-speaker-identification",
"sherpa-onnx-offline",
"sherpa-onnx-offline-audio-tagging",
"sherpa-onnx-offline-language-identification",
"sherpa-onnx-offline-punctuation",
"sherpa-onnx-offline-speaker-diarization",
"sherpa-onnx-offline-tts",
"sherpa-onnx-offline-tts-play",
"sherpa-onnx-offline-websocket-server",
"sherpa-onnx-online-punctuation",
"sherpa-onnx-online-websocket-client",
"sherpa-onnx-online-websocket-server",
"sherpa-onnx-vad-microphone",
"sherpa-onnx-vad-microphone-offline-asr",
"sherpa-onnx-vad-with-offline-asr",
]
if enable_alsa():
binaries += [
"sherpa-onnx-alsa",
"sherpa-onnx-alsa-offline",
"sherpa-onnx-alsa-offline-speaker-identification",
"sherpa-onnx-offline-tts-play-alsa",
"sherpa-onnx-vad-alsa",
"sherpa-onnx-alsa-offline-audio-tagging",
]
if is_windows():
binaries += [
"onnxruntime.dll",
"sherpa-onnx-c-api.dll",
"sherpa-onnx-cxx-api.dll",
]
return binaries
try:
from wheel.bdist_wheel import bdist_wheel as _bdist_wheel
class bdist_wheel(_bdist_wheel):
def finalize_options(self):
_bdist_wheel.finalize_options(self)
# In this case, the generated wheel has a name in the form
# sherpa-xxx-pyxx-none-any.whl
if is_for_pypi() and not is_macos():
self.root_is_pure = True
else:
# The generated wheel has a name ending with
# -linux_x86_64.whl
self.root_is_pure = False
except ImportError:
bdist_wheel = None
def cmake_extension(name, *args, **kwargs) -> setuptools.Extension:
kwargs["language"] = "c++"
sources = []
return setuptools.Extension(name, sources, *args, **kwargs)
class BuildExtension(build_ext):
def build_extension(self, ext: setuptools.extension.Extension):
# build/temp.linux-x86_64-3.8
os.makedirs(self.build_temp, exist_ok=True)
# build/lib.linux-x86_64-3.8
os.makedirs(self.build_lib, exist_ok=True)
out_bin_dir = Path(self.build_lib).parent / "sherpa_onnx" / "bin"
install_dir = Path(self.build_lib).resolve() / "sherpa_onnx"
sherpa_onnx_dir = Path(__file__).parent.parent.resolve()
cmake_args = os.environ.get("SHERPA_ONNX_CMAKE_ARGS", "")
make_args = os.environ.get("SHERPA_ONNX_MAKE_ARGS", "")
system_make_args = os.environ.get("MAKEFLAGS", "")
if cmake_args == "":
cmake_args = "-DCMAKE_BUILD_TYPE=Release"
extra_cmake_args = f" -DCMAKE_INSTALL_PREFIX={install_dir} "
extra_cmake_args += " -DBUILD_SHARED_LIBS=ON "
extra_cmake_args += " -DBUILD_PIPER_PHONMIZE_EXE=OFF "
extra_cmake_args += " -DBUILD_PIPER_PHONMIZE_TESTS=OFF "
extra_cmake_args += " -DBUILD_ESPEAK_NG_EXE=OFF "
extra_cmake_args += " -DBUILD_ESPEAK_NG_TESTS=OFF "
extra_cmake_args += " -DSHERPA_ONNX_ENABLE_C_API=ON "
extra_cmake_args += " -DSHERPA_ONNX_BUILD_C_API_EXAMPLES=OFF "
extra_cmake_args += " -DSHERPA_ONNX_ENABLE_CHECK=OFF "
extra_cmake_args += " -DSHERPA_ONNX_ENABLE_PYTHON=ON "
extra_cmake_args += " -DSHERPA_ONNX_ENABLE_PORTAUDIO=ON "
extra_cmake_args += " -DSHERPA_ONNX_ENABLE_WEBSOCKET=ON "
if "PYTHON_EXECUTABLE" not in cmake_args:
print(f"Setting PYTHON_EXECUTABLE to {sys.executable}")
cmake_args += f" -DPYTHON_EXECUTABLE={sys.executable}"
cmake_args += extra_cmake_args
if is_windows():
build_cmd = f"""
cmake {cmake_args} -B {self.build_temp} -S {sherpa_onnx_dir}
cmake --build {self.build_temp} --target install --config Release -- -m:2
"""
print(f"build command is:\n{build_cmd}")
ret = os.system(
f"cmake {cmake_args} -B {self.build_temp} -S {sherpa_onnx_dir}"
)
if ret != 0:
raise Exception("Failed to configure sherpa")
ret = os.system(
f"cmake --build {self.build_temp} --target install --config Release -- -m:2" # noqa
)
if ret != 0:
raise Exception("Failed to build and install sherpa")
else:
if make_args == "" and system_make_args == "":
print("for fast compilation, run:")
print('export SHERPA_ONNX_MAKE_ARGS="-j"; python setup.py install')
print('Setting make_args to "-j4"')
make_args = "-j4"
if "-G Ninja" in cmake_args:
build_cmd = f"""
cd {self.build_temp}
cmake {cmake_args} {sherpa_onnx_dir}
ninja {make_args} install
"""
else:
build_cmd = f"""
cd {self.build_temp}
cmake {cmake_args} {sherpa_onnx_dir}
make {make_args} install/strip
"""
print(f"build command is:\n{build_cmd}")
ret = os.system(build_cmd)
if ret != 0:
raise Exception(
"\nBuild sherpa-onnx failed. Please check the error message.\n"
"You can ask for help by creating an issue on GitHub.\n"
"\nClick:\n\thttps://github.com/k2-fsa/sherpa-onnx/issues/new\n" # noqa
)
suffix = ".exe" if is_windows() else ""
# Remember to also change setup.py
binaries = get_binaries()
for f in binaries:
suffix = "" if ".dll" in f else suffix
src_file = install_dir / "bin" / (f + suffix)
if not src_file.is_file():
src_file = install_dir / "lib" / (f + suffix)
if not src_file.is_file():
src_file = install_dir / ".." / (f + suffix)
print(f"Copying {src_file} to {out_bin_dir}/")
shutil.copy(f"{src_file}", f"{out_bin_dir}/")
shutil.rmtree(f"{install_dir}/bin")
shutil.rmtree(f"{install_dir}/share")
shutil.rmtree(f"{install_dir}/lib/pkgconfig")
if is_macos():
os.remove(f"{install_dir}/lib/libonnxruntime.dylib")
if is_windows():
shutil.rmtree(f"{install_dir}/lib")


@ -0,0 +1,45 @@
function(download_cppjieba)
include(FetchContent)
set(cppjieba_URL "https://github.com/csukuangfj/cppjieba/archive/refs/tags/sherpa-onnx-2024-04-19.tar.gz")
set(cppjieba_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/cppjieba-sherpa-onnx-2024-04-19.tar.gz")
set(cppjieba_HASH "SHA256=03e5264687f0efaef05487a07d49c3f4c0f743347bfbf825df4b30cc75ac5288")
# If you don't have access to the Internet,
# please pre-download cppjieba
set(possible_file_locations
$ENV{HOME}/Downloads/cppjieba-sherpa-onnx-2024-04-19.tar.gz
${CMAKE_SOURCE_DIR}/cppjieba-sherpa-onnx-2024-04-19.tar.gz
${CMAKE_BINARY_DIR}/cppjieba-sherpa-onnx-2024-04-19.tar.gz
/tmp/cppjieba-sherpa-onnx-2024-04-19.tar.gz
/star-fj/fangjun/download/github/cppjieba-sherpa-onnx-2024-04-19.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(cppjieba_URL "${f}")
file(TO_CMAKE_PATH "${cppjieba_URL}" cppjieba_URL)
message(STATUS "Found local downloaded cppjieba: ${cppjieba_URL}")
set(cppjieba_URL2)
break()
endif()
endforeach()
FetchContent_Declare(cppjieba
URL
${cppjieba_URL}
${cppjieba_URL2}
URL_HASH
${cppjieba_HASH}
)
FetchContent_GetProperties(cppjieba)
if(NOT cppjieba_POPULATED)
message(STATUS "Downloading cppjieba ${cppjieba_URL}")
FetchContent_Populate(cppjieba)
endif()
message(STATUS "cppjieba is downloaded to ${cppjieba_SOURCE_DIR}")
add_subdirectory(${cppjieba_SOURCE_DIR} ${cppjieba_BINARY_DIR} EXCLUDE_FROM_ALL)
endfunction()
download_cppjieba()


@ -0,0 +1,48 @@
function(download_eigen)
include(FetchContent)
set(eigen_URL "https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz")
set(eigen_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/eigen-3.4.0.tar.gz")
set(eigen_HASH "SHA256=8586084f71f9bde545ee7fa6d00288b264a2b7ac3607b974e54d13e7162c1c72")
# If you don't have access to the Internet,
# please pre-download eigen
set(possible_file_locations
$ENV{HOME}/Downloads/eigen-3.4.0.tar.gz
${CMAKE_SOURCE_DIR}/eigen-3.4.0.tar.gz
${CMAKE_BINARY_DIR}/eigen-3.4.0.tar.gz
/tmp/eigen-3.4.0.tar.gz
/star-fj/fangjun/download/github/eigen-3.4.0.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(eigen_URL "${f}")
file(TO_CMAKE_PATH "${eigen_URL}" eigen_URL)
message(STATUS "Found local downloaded eigen: ${eigen_URL}")
set(eigen_URL2)
break()
endif()
endforeach()
set(BUILD_TESTING OFF CACHE BOOL "" FORCE)
set(EIGEN_BUILD_DOC OFF CACHE BOOL "" FORCE)
FetchContent_Declare(eigen
URL ${eigen_URL}
URL_HASH ${eigen_HASH}
)
FetchContent_GetProperties(eigen)
if(NOT eigen_POPULATED)
message(STATUS "Downloading eigen from ${eigen_URL}")
FetchContent_Populate(eigen)
endif()
message(STATUS "eigen is downloaded to ${eigen_SOURCE_DIR}")
message(STATUS "eigen's binary dir is ${eigen_BINARY_DIR}")
add_subdirectory(${eigen_SOURCE_DIR} ${eigen_BINARY_DIR} EXCLUDE_FROM_ALL)
endfunction()
download_eigen()


@ -0,0 +1,134 @@
function(download_espeak_ng_for_piper)
include(FetchContent)
set(espeak_ng_URL "https://github.com/csukuangfj/espeak-ng/archive/f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip")
set(espeak_ng_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip")
set(espeak_ng_HASH "SHA256=70cbf4050e7a014aae19140b05e57249da4720f56128459fbe3a93beaf971ae6")
set(BUILD_ESPEAK_NG_TESTS OFF CACHE BOOL "" FORCE)
set(USE_ASYNC OFF CACHE BOOL "" FORCE)
set(USE_MBROLA OFF CACHE BOOL "" FORCE)
set(USE_LIBSONIC OFF CACHE BOOL "" FORCE)
set(USE_LIBPCAUDIO OFF CACHE BOOL "" FORCE)
set(USE_KLATT OFF CACHE BOOL "" FORCE)
set(USE_SPEECHPLAYER OFF CACHE BOOL "" FORCE)
set(EXTRA_cmn ON CACHE BOOL "" FORCE)
set(EXTRA_ru ON CACHE BOOL "" FORCE)
if (NOT SHERPA_ONNX_ENABLE_EPSEAK_NG_EXE)
set(BUILD_ESPEAK_NG_EXE OFF CACHE BOOL "" FORCE)
endif()
# If you don't have access to the Internet,
# please pre-download espeak-ng
set(possible_file_locations
$ENV{HOME}/Downloads/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip
${CMAKE_SOURCE_DIR}/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip
${CMAKE_BINARY_DIR}/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip
/tmp/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip
/star-fj/fangjun/download/github/espeak-ng-f6fed6c58b5e0998b8e68c6610125e2d07d595a7.zip
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(espeak_ng_URL "${f}")
file(TO_CMAKE_PATH "${espeak_ng_URL}" espeak_ng_URL)
message(STATUS "Found local downloaded espeak-ng: ${espeak_ng_URL}")
set(espeak_ng_URL2 )
break()
endif()
endforeach()
FetchContent_Declare(espeak_ng
URL
${espeak_ng_URL}
${espeak_ng_URL2}
URL_HASH ${espeak_ng_HASH}
)
FetchContent_GetProperties(espeak_ng)
if(NOT espeak_ng_POPULATED)
message(STATUS "Downloading espeak-ng from ${espeak_ng_URL}")
FetchContent_Populate(espeak_ng)
endif()
message(STATUS "espeak-ng is downloaded to ${espeak_ng_SOURCE_DIR}")
message(STATUS "espeak-ng binary dir is ${espeak_ng_BINARY_DIR}")
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${espeak_ng_SOURCE_DIR} ${espeak_ng_BINARY_DIR})
if(_build_shared_libs_bak)
set_target_properties(espeak-ng
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
set(espeak_ng_SOURCE_DIR ${espeak_ng_SOURCE_DIR} PARENT_SCOPE)
if(WIN32 AND MSVC)
target_compile_options(ucd PUBLIC
/wd4309
)
target_compile_options(espeak-ng PUBLIC
/wd4005
/wd4018
/wd4067
/wd4068
/wd4090
/wd4101
/wd4244
/wd4267
/wd4996
)
if(TARGET espeak-ng-bin)
target_compile_options(espeak-ng-bin PRIVATE
/wd4244
/wd4024
/wd4047
/wd4067
/wd4267
/wd4996
)
endif()
endif()
if(UNIX AND NOT APPLE)
target_compile_options(espeak-ng PRIVATE
-Wno-unused-result
-Wno-format-overflow
-Wno-format-truncation
-Wno-uninitialized
-Wno-format
)
if(TARGET espeak-ng-bin)
target_compile_options(espeak-ng-bin PRIVATE
-Wno-unused-result
)
endif()
endif()
target_include_directories(espeak-ng
INTERFACE
${espeak_ng_SOURCE_DIR}/src/include
${espeak_ng_SOURCE_DIR}/src/ucd-tools/src/include
)
if(NOT BUILD_SHARED_LIBS)
install(TARGETS
espeak-ng
ucd
DESTINATION lib)
endif()
endfunction()
download_espeak_ng_for_piper()


@ -0,0 +1,76 @@
function(download_googletest)
include(FetchContent)
set(googletest_URL "https://github.com/google/googletest/archive/refs/tags/v1.13.0.tar.gz")
set(googletest_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/googletest-1.13.0.tar.gz")
set(googletest_HASH "SHA256=ad7fdba11ea011c1d925b3289cf4af2c66a352e18d4c7264392fead75e919363")
# If you don't have access to the Internet,
# please pre-download googletest
set(possible_file_locations
$ENV{HOME}/Downloads/googletest-1.13.0.tar.gz
${CMAKE_SOURCE_DIR}/googletest-1.13.0.tar.gz
${CMAKE_BINARY_DIR}/googletest-1.13.0.tar.gz
/tmp/googletest-1.13.0.tar.gz
/star-fj/fangjun/download/github/googletest-1.13.0.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(googletest_URL "${f}")
file(TO_CMAKE_PATH "${googletest_URL}" googletest_URL)
message(STATUS "Found local downloaded googletest: ${googletest_URL}")
set(googletest_URL2)
break()
endif()
endforeach()
set(BUILD_GMOCK ON CACHE BOOL "" FORCE)
set(INSTALL_GTEST OFF CACHE BOOL "" FORCE)
set(gtest_disable_pthreads ON CACHE BOOL "" FORCE)
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
FetchContent_Declare(googletest
URL
${googletest_URL}
${googletest_URL2}
URL_HASH ${googletest_HASH}
)
FetchContent_GetProperties(googletest)
if(NOT googletest_POPULATED)
message(STATUS "Downloading googletest from ${googletest_URL}")
FetchContent_Populate(googletest)
endif()
message(STATUS "googletest is downloaded to ${googletest_SOURCE_DIR}")
message(STATUS "googletest's binary dir is ${googletest_BINARY_DIR}")
if(APPLE)
set(CMAKE_MACOSX_RPATH ON) # to solve the following warning on macOS
endif()
#[==[
-- Generating done
Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake
--help-policy CMP0042" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
MACOSX_RPATH is not specified for the following targets:
gmock
gmock_main
gtest
gtest_main
This warning is for project developers. Use -Wno-dev to suppress it.
]==]
add_subdirectory(${googletest_SOURCE_DIR} ${googletest_BINARY_DIR} EXCLUDE_FROM_ALL)
target_include_directories(gtest
INTERFACE
${googletest_SOURCE_DIR}/googletest/include
${googletest_SOURCE_DIR}/googlemock/include
)
endfunction()
download_googletest()


@ -0,0 +1,47 @@
function(download_hclust_cpp)
include(FetchContent)
# The latest commit as of 2024.09.29
set(hclust_cpp_URL "https://github.com/csukuangfj/hclust-cpp/archive/refs/tags/2024-09-29.tar.gz")
set(hclust_cpp_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/hclust-cpp-2024-09-29.tar.gz")
set(hclust_cpp_HASH "SHA256=abab51448a3cb54272aae07522970306e0b2cc6479d59d7b19e7aee4d6cedd33")
# If you don't have access to the Internet,
# please pre-download hclust-cpp
set(possible_file_locations
$ENV{HOME}/Downloads/hclust-cpp-2024-09-29.tar.gz
${CMAKE_SOURCE_DIR}/hclust-cpp-2024-09-29.tar.gz
${CMAKE_BINARY_DIR}/hclust-cpp-2024-09-29.tar.gz
/tmp/hclust-cpp-2024-09-29.tar.gz
/star-fj/fangjun/download/github/hclust-cpp-2024-09-29.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(hclust_cpp_URL "${f}")
file(TO_CMAKE_PATH "${hclust_cpp_URL}" hclust_cpp_URL)
message(STATUS "Found local downloaded hclust_cpp: ${hclust_cpp_URL}")
set(hclust_cpp_URL2)
break()
endif()
endforeach()
FetchContent_Declare(hclust_cpp
URL
${hclust_cpp_URL}
${hclust_cpp_URL2}
URL_HASH ${hclust_cpp_HASH}
)
FetchContent_GetProperties(hclust_cpp)
if(NOT hclust_cpp_POPULATED)
message(STATUS "Downloading hclust_cpp from ${hclust_cpp_URL}")
FetchContent_Populate(hclust_cpp)
endif()
message(STATUS "hclust_cpp is downloaded to ${hclust_cpp_SOURCE_DIR}")
message(STATUS "hclust_cpp's binary dir is ${hclust_cpp_BINARY_DIR}")
include_directories(${hclust_cpp_SOURCE_DIR})
endfunction()
download_hclust_cpp()


@ -0,0 +1,89 @@
function(download_kaldi_decoder)
include(FetchContent)
set(kaldi_decoder_URL "https://github.com/k2-fsa/kaldi-decoder/archive/refs/tags/v0.2.6.tar.gz")
set(kaldi_decoder_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldi-decoder-0.2.6.tar.gz")
set(kaldi_decoder_HASH "SHA256=b13c78b37495cafc6ef3f8a7b661b349c55a51abbd7f7f42f389408dcf86a463")
set(KALDI_DECODER_BUILD_PYTHON OFF CACHE BOOL "" FORCE)
set(KALDI_DECODER_ENABLE_TESTS OFF CACHE BOOL "" FORCE)
set(KALDIFST_BUILD_PYTHON OFF CACHE BOOL "" FORCE)
# If you don't have access to the Internet,
# please pre-download kaldi-decoder
set(possible_file_locations
$ENV{HOME}/Downloads/kaldi-decoder-0.2.6.tar.gz
${CMAKE_SOURCE_DIR}/kaldi-decoder-0.2.6.tar.gz
${CMAKE_BINARY_DIR}/kaldi-decoder-0.2.6.tar.gz
/tmp/kaldi-decoder-0.2.6.tar.gz
/star-fj/fangjun/download/github/kaldi-decoder-0.2.6.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(kaldi_decoder_URL "${f}")
file(TO_CMAKE_PATH "${kaldi_decoder_URL}" kaldi_decoder_URL)
message(STATUS "Found local downloaded kaldi-decoder: ${kaldi_decoder_URL}")
set(kaldi_decoder_URL2 )
break()
endif()
endforeach()
FetchContent_Declare(kaldi_decoder
URL
${kaldi_decoder_URL}
${kaldi_decoder_URL2}
URL_HASH ${kaldi_decoder_HASH}
)
FetchContent_GetProperties(kaldi_decoder)
if(NOT kaldi_decoder_POPULATED)
message(STATUS "Downloading kaldi-decoder from ${kaldi_decoder_URL}")
FetchContent_Populate(kaldi_decoder)
endif()
message(STATUS "kaldi-decoder is downloaded to ${kaldi_decoder_SOURCE_DIR}")
message(STATUS "kaldi-decoder's binary dir is ${kaldi_decoder_BINARY_DIR}")
include_directories(${kaldi_decoder_SOURCE_DIR})
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${kaldi_decoder_SOURCE_DIR} ${kaldi_decoder_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(
kaldi-decoder-core
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
if(WIN32 AND MSVC)
target_compile_options(kaldi-decoder-core PUBLIC
/wd4018
/wd4291
)
endif()
target_include_directories(kaldi-decoder-core
INTERFACE
${kaldi_decoder_SOURCE_DIR}/
)
if(NOT BUILD_SHARED_LIBS)
install(TARGETS
kaldi-decoder-core
kaldifst_core
fst
fstfar
DESTINATION lib)
endif()
endfunction()
download_kaldi_decoder()


@ -0,0 +1,74 @@
function(download_kaldi_native_fbank)
include(FetchContent)
set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.21.1.tar.gz")
set(kaldi_native_fbank_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldi-native-fbank-1.21.1.tar.gz")
set(kaldi_native_fbank_HASH "SHA256=37c1aa230b00fe062791d800d8fc50aa3de215918d3dce6440699e67275d859e")
set(KALDI_NATIVE_FBANK_BUILD_TESTS OFF CACHE BOOL "" FORCE)
set(KALDI_NATIVE_FBANK_BUILD_PYTHON OFF CACHE BOOL "" FORCE)
set(KALDI_NATIVE_FBANK_ENABLE_CHECK OFF CACHE BOOL "" FORCE)
# If you don't have access to the Internet,
# please pre-download kaldi-native-fbank
set(possible_file_locations
$ENV{HOME}/Downloads/kaldi-native-fbank-1.21.1.tar.gz
${CMAKE_SOURCE_DIR}/kaldi-native-fbank-1.21.1.tar.gz
${CMAKE_BINARY_DIR}/kaldi-native-fbank-1.21.1.tar.gz
/tmp/kaldi-native-fbank-1.21.1.tar.gz
/star-fj/fangjun/download/github/kaldi-native-fbank-1.21.1.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(kaldi_native_fbank_URL "${f}")
file(TO_CMAKE_PATH "${kaldi_native_fbank_URL}" kaldi_native_fbank_URL)
message(STATUS "Found local downloaded kaldi-native-fbank: ${kaldi_native_fbank_URL}")
set(kaldi_native_fbank_URL2 )
break()
endif()
endforeach()
FetchContent_Declare(kaldi_native_fbank
URL
${kaldi_native_fbank_URL}
${kaldi_native_fbank_URL2}
URL_HASH ${kaldi_native_fbank_HASH}
)
FetchContent_GetProperties(kaldi_native_fbank)
if(NOT kaldi_native_fbank_POPULATED)
message(STATUS "Downloading kaldi-native-fbank from ${kaldi_native_fbank_URL}")
FetchContent_Populate(kaldi_native_fbank)
endif()
message(STATUS "kaldi-native-fbank is downloaded to ${kaldi_native_fbank_SOURCE_DIR}")
message(STATUS "kaldi-native-fbank's binary dir is ${kaldi_native_fbank_BINARY_DIR}")
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${kaldi_native_fbank_SOURCE_DIR} ${kaldi_native_fbank_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(kaldi-native-fbank-core
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
target_include_directories(kaldi-native-fbank-core
INTERFACE
${kaldi_native_fbank_SOURCE_DIR}/
)
if(NOT BUILD_SHARED_LIBS)
install(TARGETS kaldi-native-fbank-core DESTINATION lib)
endif()
endfunction()
download_kaldi_native_fbank()


@ -0,0 +1,72 @@
function(download_kaldifst)
include(FetchContent)
set(kaldifst_URL "https://github.com/k2-fsa/kaldifst/archive/refs/tags/v1.7.11.tar.gz")
set(kaldifst_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldifst-1.7.11.tar.gz")
set(kaldifst_HASH "SHA256=b43b3332faa2961edc730e47995a58cd4e22ead21905d55b0c4a41375b4a525f")
# If you don't have access to the Internet,
# please pre-download kaldifst
set(possible_file_locations
$ENV{HOME}/Downloads/kaldifst-1.7.11.tar.gz
${CMAKE_SOURCE_DIR}/kaldifst-1.7.11.tar.gz
${CMAKE_BINARY_DIR}/kaldifst-1.7.11.tar.gz
/tmp/kaldifst-1.7.11.tar.gz
/star-fj/fangjun/download/github/kaldifst-1.7.11.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(kaldifst_URL "${f}")
file(TO_CMAKE_PATH "${kaldifst_URL}" kaldifst_URL)
message(STATUS "Found local downloaded kaldifst: ${kaldifst_URL}")
set(kaldifst_URL2)
break()
endif()
endforeach()
set(KALDIFST_BUILD_TESTS OFF CACHE BOOL "" FORCE)
set(KALDIFST_BUILD_PYTHON OFF CACHE BOOL "" FORCE)
FetchContent_Declare(kaldifst
URL ${kaldifst_URL}
URL_HASH ${kaldifst_HASH}
)
FetchContent_GetProperties(kaldifst)
if(NOT kaldifst_POPULATED)
message(STATUS "Downloading kaldifst from ${kaldifst_URL}")
FetchContent_Populate(kaldifst)
endif()
message(STATUS "kaldifst is downloaded to ${kaldifst_SOURCE_DIR}")
message(STATUS "kaldifst's binary dir is ${kaldifst_BINARY_DIR}")
list(APPEND CMAKE_MODULE_PATH ${kaldifst_SOURCE_DIR}/cmake)
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${kaldifst_SOURCE_DIR} ${kaldifst_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(kaldifst_core
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
target_include_directories(kaldifst_core
PUBLIC
${kaldifst_SOURCE_DIR}/
)
set_target_properties(kaldifst_core PROPERTIES OUTPUT_NAME "sherpa-mnn-kaldifst-core")
# installed in ./kaldi-decoder.cmake
endfunction()
download_kaldifst()


@ -0,0 +1,109 @@
# Copyright (c) 2020 Xiaomi Corporation (author: Fangjun Kuang)
function(download_openfst)
include(FetchContent)
set(openfst_URL "https://github.com/csukuangfj/openfst/archive/refs/tags/sherpa-onnx-2024-06-19.tar.gz")
set(openfst_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/openfst-sherpa-onnx-2024-06-19.tar.gz")
set(openfst_HASH "SHA256=5c98e82cc509c5618502dde4860b8ea04d843850ed57e6d6b590b644b268853d")
# If you don't have access to the Internet,
# please pre-download it
set(possible_file_locations
$ENV{HOME}/Downloads/openfst-sherpa-onnx-2024-06-19.tar.gz
${CMAKE_SOURCE_DIR}/openfst-sherpa-onnx-2024-06-19.tar.gz
${CMAKE_BINARY_DIR}/openfst-sherpa-onnx-2024-06-19.tar.gz
/tmp/openfst-sherpa-onnx-2024-06-19.tar.gz
/star-fj/fangjun/download/github/openfst-sherpa-onnx-2024-06-19.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(openfst_URL "${f}")
file(TO_CMAKE_PATH "${openfst_URL}" openfst_URL)
set(openfst_URL2)
break()
endif()
endforeach()
set(HAVE_BIN OFF CACHE BOOL "" FORCE)
set(HAVE_SCRIPT OFF CACHE BOOL "" FORCE)
set(HAVE_COMPACT OFF CACHE BOOL "" FORCE)
set(HAVE_COMPRESS OFF CACHE BOOL "" FORCE)
set(HAVE_CONST OFF CACHE BOOL "" FORCE)
set(HAVE_FAR ON CACHE BOOL "" FORCE)
set(HAVE_GRM OFF CACHE BOOL "" FORCE)
set(HAVE_PDT OFF CACHE BOOL "" FORCE)
set(HAVE_MPDT OFF CACHE BOOL "" FORCE)
set(HAVE_LINEAR OFF CACHE BOOL "" FORCE)
set(HAVE_LOOKAHEAD OFF CACHE BOOL "" FORCE)
set(HAVE_NGRAM OFF CACHE BOOL "" FORCE)
set(HAVE_PYTHON OFF CACHE BOOL "" FORCE)
set(HAVE_SPECIAL OFF CACHE BOOL "" FORCE)
if(NOT WIN32)
FetchContent_Declare(openfst
URL
${openfst_URL}
${openfst_URL2}
URL_HASH ${openfst_HASH}
PATCH_COMMAND
sed -i.bak s/enable_testing\(\)//g "src/CMakeLists.txt" &&
sed -i.bak s/add_subdirectory\(test\)//g "src/CMakeLists.txt" &&
sed -i.bak /message/d "src/script/CMakeLists.txt"
# sed -i.bak s/add_subdirectory\(script\)//g "src/CMakeLists.txt" &&
# sed -i.bak s/add_subdirectory\(extensions\)//g "src/CMakeLists.txt"
)
else()
FetchContent_Declare(openfst
URL ${openfst_URL}
URL_HASH ${openfst_HASH}
)
endif()
FetchContent_GetProperties(openfst)
if(NOT openfst_POPULATED)
message(STATUS "Downloading openfst from ${openfst_URL}")
FetchContent_Populate(openfst)
endif()
message(STATUS "openfst is downloaded to ${openfst_SOURCE_DIR}")
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${openfst_SOURCE_DIR} ${openfst_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(fst fstfar
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
set(openfst_SOURCE_DIR ${openfst_SOURCE_DIR} PARENT_SCOPE)
set_target_properties(fst PROPERTIES OUTPUT_NAME "sherpa-mnn-fst")
set_target_properties(fstfar PROPERTIES OUTPUT_NAME "sherpa-mnn-fstfar")
if(LINUX)
target_compile_options(fst PUBLIC -Wno-missing-template-keyword)
endif()
target_include_directories(fst
PUBLIC
${openfst_SOURCE_DIR}/src/include
)
target_include_directories(fstfar
PUBLIC
${openfst_SOURCE_DIR}/src/include
)
# installed in ./kaldi-decoder.cmake
endfunction()
download_openfst()


@ -0,0 +1,78 @@
function(download_piper_phonemize)
include(FetchContent)
set(piper_phonemize_URL "https://github.com/csukuangfj/piper-phonemize/archive/78a788e0b719013401572d70fef372e77bff8e43.zip")
set(piper_phonemize_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip")
set(piper_phonemize_HASH "SHA256=89641a46489a4898754643ce57bda9c9b54b4ca46485fdc02bf0dc84b866645d")
# If you don't have access to the Internet,
# please pre-download piper-phonemize
set(possible_file_locations
$ENV{HOME}/Downloads/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip
${CMAKE_SOURCE_DIR}/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip
${CMAKE_BINARY_DIR}/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip
/tmp/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip
/star-fj/fangjun/download/github/piper-phonemize-78a788e0b719013401572d70fef372e77bff8e43.zip
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(piper_phonemize_URL "${f}")
file(TO_CMAKE_PATH "${piper_phonemize_URL}" piper_phonemize_URL)
message(STATUS "Found local downloaded espeak-ng: ${piper_phonemize_URL}")
set(piper_phonemize_URL2 )
break()
endif()
endforeach()
FetchContent_Declare(piper_phonemize
URL
${piper_phonemize_URL}
${piper_phonemize_URL2}
URL_HASH ${piper_phonemize_HASH}
)
FetchContent_GetProperties(piper_phonemize)
if(NOT piper_phonemize_POPULATED)
message(STATUS "Downloading piper-phonemize from ${piper_phonemize_URL}")
FetchContent_Populate(piper_phonemize)
endif()
message(STATUS "piper-phonemize is downloaded to ${piper_phonemize_SOURCE_DIR}")
message(STATUS "piper-phonemize binary dir is ${piper_phonemize_BINARY_DIR}")
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${piper_phonemize_SOURCE_DIR} ${piper_phonemize_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(piper_phonemize
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
if(WIN32 AND MSVC)
target_compile_options(piper_phonemize PUBLIC
/wd4309
)
endif()
target_include_directories(piper_phonemize
INTERFACE
${piper_phonemize_SOURCE_DIR}/src/include
)
if(NOT BUILD_SHARED_LIBS)
install(TARGETS
piper_phonemize
DESTINATION lib)
endif()
endfunction()
download_piper_phonemize()


@ -0,0 +1,71 @@
function(download_portaudio)
include(FetchContent)
set(portaudio_URL "http://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz")
set(portaudio_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/pa_stable_v190700_20210406.tgz")
set(portaudio_HASH "SHA256=47efbf42c77c19a05d22e627d42873e991ec0c1357219c0d74ce6a2948cb2def")
# If you don't have access to the Internet, please download it to your
# local drive and modify the following line according to your needs.
set(possible_file_locations
$ENV{HOME}/Downloads/pa_stable_v190700_20210406.tgz
$ENV{HOME}/asr/pa_stable_v190700_20210406.tgz
${CMAKE_SOURCE_DIR}/pa_stable_v190700_20210406.tgz
${CMAKE_BINARY_DIR}/pa_stable_v190700_20210406.tgz
/tmp/pa_stable_v190700_20210406.tgz
/star-fj/fangjun/download/github/pa_stable_v190700_20210406.tgz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(portaudio_URL "${f}")
file(TO_CMAKE_PATH "${portaudio_URL}" portaudio_URL)
message(STATUS "Found local downloaded portaudio: ${portaudio_URL}")
set(portaudio_URL2)
break()
endif()
endforeach()
# Always use static build
set(PA_BUILD_SHARED OFF CACHE BOOL "" FORCE)
set(PA_BUILD_STATIC ON CACHE BOOL "" FORCE)
FetchContent_Declare(portaudio
URL
${portaudio_URL}
${portaudio_URL2}
URL_HASH ${portaudio_HASH}
)
FetchContent_GetProperties(portaudio)
if(NOT portaudio_POPULATED)
message(STATUS "Downloading portaudio from ${portaudio_URL}")
FetchContent_Populate(portaudio)
endif()
message(STATUS "portaudio is downloaded to ${portaudio_SOURCE_DIR}")
message(STATUS "portaudio's binary dir is ${portaudio_BINARY_DIR}")
if(APPLE)
set(CMAKE_MACOSX_RPATH ON) # to avoid a CMP0042 (MACOSX_RPATH) warning on macOS
endif()
add_subdirectory(${portaudio_SOURCE_DIR} ${portaudio_BINARY_DIR} EXCLUDE_FROM_ALL)
set_target_properties(portaudio_static PROPERTIES OUTPUT_NAME "sherpa-onnx-portaudio_static")
if(NOT WIN32)
target_compile_options(portaudio_static PRIVATE "-Wno-deprecated-declarations")
endif()
if(NOT BUILD_SHARED_LIBS AND SHERPA_ONNX_ENABLE_BINARY)
install(TARGETS
portaudio_static
DESTINATION lib)
endif()
endfunction()
download_portaudio()
# Note
# See http://portaudio.com/docs/v19-doxydocs/tutorial_start.html
# for how to use portaudio


@ -0,0 +1,44 @@
function(download_pybind11)
include(FetchContent)
set(pybind11_URL "https://github.com/pybind/pybind11/archive/refs/tags/v2.12.0.tar.gz")
set(pybind11_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/pybind11-2.12.0.tar.gz")
set(pybind11_HASH "SHA256=bf8f242abd1abcd375d516a7067490fb71abd79519a282d22b6e4d19282185a7")
# If you don't have access to the Internet,
# please pre-download pybind11
set(possible_file_locations
$ENV{HOME}/Downloads/pybind11-2.12.0.tar.gz
${CMAKE_SOURCE_DIR}/pybind11-2.12.0.tar.gz
${CMAKE_BINARY_DIR}/pybind11-2.12.0.tar.gz
/tmp/pybind11-2.12.0.tar.gz
/star-fj/fangjun/download/github/pybind11-2.12.0.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(pybind11_URL "${f}")
file(TO_CMAKE_PATH "${pybind11_URL}" pybind11_URL)
message(STATUS "Found local downloaded pybind11: ${pybind11_URL}")
set(pybind11_URL2)
break()
endif()
endforeach()
FetchContent_Declare(pybind11
URL
${pybind11_URL}
${pybind11_URL2}
URL_HASH ${pybind11_HASH}
)
FetchContent_GetProperties(pybind11)
if(NOT pybind11_POPULATED)
message(STATUS "Downloading pybind11 from ${pybind11_URL}")
FetchContent_Populate(pybind11)
endif()
message(STATUS "pybind11 is downloaded to ${pybind11_SOURCE_DIR}")
add_subdirectory(${pybind11_SOURCE_DIR} ${pybind11_BINARY_DIR} EXCLUDE_FROM_ALL)
endfunction()
download_pybind11()


@ -0,0 +1,25 @@
# Note: If you use Python, then the prefix might not be correct.
#
# You need to either manually modify this file so that the prefix below points
# to the location where this sherpa-onnx.pc file actually resides,
# or use
#
# pkg-config --define-variable=prefix=/path/to/the/dir/containing/this/file --cflags sherpa-onnx
prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
includedir="${prefix}/include"
libdir="${exec_prefix}/lib"
Name: sherpa-onnx
Description: pkg-config for sherpa-onnx
URL: https://github.com/k2-fsa/sherpa-onnx
Version: @SHERPA_ONNX_VERSION@
Cflags: -I"${includedir}"
# Note: -lcargs is required only for the following file
# https://github.com/k2-fsa/sherpa-onnx/blob/master/c-api-examples/decode-file-c-api.c
# We add it here so that users don't need to specify -lcargs when compiling decode-file-c-api.c
Libs: -L"${libdir}" -lsherpa-onnx-c-api -lonnxruntime -Wl,-rpath,${libdir} @SHERPA_ONNX_PKG_WITH_CARGS@ @SHERPA_ONNX_PKG_CONFIG_EXTRA_LIBS@
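# Example (illustrative, assuming this file is on PKG_CONFIG_PATH):
#   cc decode-file-c-api.c $(pkg-config --cflags --libs sherpa-onnx) -o decode-file-c-api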


@ -0,0 +1,25 @@
# Note: If you use Python, then the prefix might not be correct.
#
# You need to either manually modify this file so that the prefix below points
# to the location where this sherpa-onnx.pc file actually resides,
# or use
#
# pkg-config --define-variable=prefix=/path/to/the/dir/containing/this/file --cflags sherpa-onnx
prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
includedir="${prefix}/include"
libdir="${exec_prefix}/lib"
Name: sherpa-onnx
Description: pkg-config for sherpa-onnx (without TTS support)
URL: https://github.com/k2-fsa/sherpa-onnx
Version: @SHERPA_ONNX_VERSION@
Cflags: -I"${includedir}"
# Note: -lcargs is required only for the following file
# https://github.com/k2-fsa/sherpa-onnx/blob/master/c-api-examples/decode-file-c-api.c
# We add it here so that users don't need to specify -lcargs when compiling decode-file-c-api.c
Libs: -L"${libdir}" -lsherpa-onnx-c-api -lsherpa-onnx-core -lkaldi-decoder-core -lsherpa-onnx-kaldifst-core -lsherpa-onnx-fst -lkaldi-native-fbank-core -lonnxruntime -lssentencepiece_core -Wl,-rpath,${libdir} @SHERPA_ONNX_PKG_WITH_CARGS@ @SHERPA_ONNX_PKG_CONFIG_EXTRA_LIBS@


@ -0,0 +1,25 @@
# Note: If you use Python, then the prefix might not be correct.
#
# You need to either manually modify this file so that the prefix below points
# to the location where this sherpa-onnx.pc file actually resides,
# or use
#
# pkg-config --define-variable=prefix=/path/to/the/dir/containing/this/file --cflags sherpa-onnx
prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
includedir="${prefix}/include"
libdir="${exec_prefix}/lib"
Name: sherpa-onnx
Description: pkg-config for sherpa-onnx with TTS support
URL: https://github.com/k2-fsa/sherpa-onnx
Version: @SHERPA_ONNX_VERSION@
Cflags: -I"${includedir}"
# Note: -lcargs is required only for the following file
# https://github.com/k2-fsa/sherpa-onnx/blob/master/c-api-examples/decode-file-c-api.c
# We add it here so that users don't need to specify -lcargs when compiling decode-file-c-api.c
Libs: -L"${libdir}" -lsherpa-onnx-c-api -lsherpa-onnx-core -lkaldi-decoder-core -lsherpa-onnx-kaldifst-core -lsherpa-onnx-fstfar -lsherpa-onnx-fst -lkaldi-native-fbank-core -lpiper_phonemize -lespeak-ng -lucd -lonnxruntime -lssentencepiece_core -Wl,-rpath,${libdir} @SHERPA_ONNX_PKG_WITH_CARGS@ @SHERPA_ONNX_PKG_CONFIG_EXTRA_LIBS@


@ -0,0 +1,73 @@
function(download_simple_sentencepiece)
include(FetchContent)
set(simple-sentencepiece_URL "https://github.com/pkufool/simple-sentencepiece/archive/refs/tags/v0.7.tar.gz")
set(simple-sentencepiece_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/simple-sentencepiece-0.7.tar.gz")
set(simple-sentencepiece_HASH "SHA256=1748a822060a35baa9f6609f84efc8eb54dc0e74b9ece3d82367b7119fdc75af")
# If you don't have access to the Internet,
# please pre-download simple-sentencepiece
set(possible_file_locations
$ENV{HOME}/Downloads/simple-sentencepiece-0.7.tar.gz
${CMAKE_SOURCE_DIR}/simple-sentencepiece-0.7.tar.gz
${CMAKE_BINARY_DIR}/simple-sentencepiece-0.7.tar.gz
/tmp/simple-sentencepiece-0.7.tar.gz
/star-fj/fangjun/download/github/simple-sentencepiece-0.7.tar.gz
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(simple-sentencepiece_URL "${f}")
file(TO_CMAKE_PATH "${simple-sentencepiece_URL}" simple-sentencepiece_URL)
message(STATUS "Found local downloaded simple-sentencepiece: ${simple-sentencepiece_URL}")
set(simple-sentencepiece_URL2)
break()
endif()
endforeach()
set(SBPE_ENABLE_TESTS OFF CACHE BOOL "" FORCE)
set(SBPE_BUILD_PYTHON OFF CACHE BOOL "" FORCE)
FetchContent_Declare(simple-sentencepiece
URL
${simple-sentencepiece_URL}
${simple-sentencepiece_URL2}
URL_HASH
${simple-sentencepiece_HASH}
)
FetchContent_GetProperties(simple-sentencepiece)
if(NOT simple-sentencepiece_POPULATED)
message(STATUS "Downloading simple-sentencepiece ${simple-sentencepiece_URL}")
FetchContent_Populate(simple-sentencepiece)
endif()
message(STATUS "simple-sentencepiece is downloaded to ${simple-sentencepiece_SOURCE_DIR}")
if(BUILD_SHARED_LIBS)
set(_build_shared_libs_bak ${BUILD_SHARED_LIBS})
set(BUILD_SHARED_LIBS OFF)
endif()
add_subdirectory(${simple-sentencepiece_SOURCE_DIR} ${simple-sentencepiece_BINARY_DIR} EXCLUDE_FROM_ALL)
if(_build_shared_libs_bak)
set_target_properties(ssentencepiece_core
PROPERTIES
POSITION_INDEPENDENT_CODE ON
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
set(BUILD_SHARED_LIBS ON)
endif()
target_include_directories(ssentencepiece_core
PUBLIC
${simple-sentencepiece_SOURCE_DIR}/
)
if(NOT BUILD_SHARED_LIBS)
install(TARGETS ssentencepiece_core DESTINATION lib)
endif()
endfunction()
download_simple_sentencepiece()


@ -0,0 +1,46 @@
function(download_websocketpp)
include(FetchContent)
# The latest commit on the develop branch as of 2022-10-22
set(websocketpp_URL "https://github.com/zaphoyd/websocketpp/archive/b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip")
set(websocketpp_URL2 "https://hf-mirror.com/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip")
set(websocketpp_HASH "SHA256=1385135ede8191a7fbef9ec8099e3c5a673d48df0c143958216cd1690567f583")
# If you don't have access to the Internet,
# please pre-download websocketpp
set(possible_file_locations
$ENV{HOME}/Downloads/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip
${CMAKE_SOURCE_DIR}/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip
${CMAKE_BINARY_DIR}/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip
/tmp/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip
/star-fj/fangjun/download/github/websocketpp-b9aeec6eaf3d5610503439b4fae3581d9aff08e8.zip
)
foreach(f IN LISTS possible_file_locations)
if(EXISTS ${f})
set(websocketpp_URL "${f}")
file(TO_CMAKE_PATH "${websocketpp_URL}" websocketpp_URL)
message(STATUS "Found local downloaded websocketpp: ${websocketpp_URL}")
set(websocketpp_URL2)
break()
endif()
endforeach()
FetchContent_Declare(websocketpp
URL
${websocketpp_URL}
${websocketpp_URL2}
URL_HASH ${websocketpp_HASH}
)
FetchContent_GetProperties(websocketpp)
if(NOT websocketpp_POPULATED)
message(STATUS "Downloading websocketpp from ${websocketpp_URL}")
FetchContent_Populate(websocketpp)
endif()
message(STATUS "websocketpp is downloaded to ${websocketpp_SOURCE_DIR}")
# add_subdirectory(${websocketpp_SOURCE_DIR} ${websocketpp_BINARY_DIR} EXCLUDE_FROM_ALL)
include_directories(${websocketpp_SOURCE_DIR})
endfunction()
download_websocketpp()


@ -0,0 +1,39 @@
include_directories(${CMAKE_SOURCE_DIR})
add_executable(streaming-zipformer-cxx-api ./streaming-zipformer-cxx-api.cc)
target_link_libraries(streaming-zipformer-cxx-api sherpa-mnn-cxx-api)
add_executable(speech-enhancement-gtcrn-cxx-api ./speech-enhancement-gtcrn-cxx-api.cc)
target_link_libraries(speech-enhancement-gtcrn-cxx-api sherpa-mnn-cxx-api)
add_executable(kws-cxx-api ./kws-cxx-api.cc)
target_link_libraries(kws-cxx-api sherpa-mnn-cxx-api)
add_executable(streaming-zipformer-rtf-cxx-api ./streaming-zipformer-rtf-cxx-api.cc)
target_link_libraries(streaming-zipformer-rtf-cxx-api sherpa-mnn-cxx-api)
add_executable(whisper-cxx-api ./whisper-cxx-api.cc)
target_link_libraries(whisper-cxx-api sherpa-mnn-cxx-api)
add_executable(fire-red-asr-cxx-api ./fire-red-asr-cxx-api.cc)
target_link_libraries(fire-red-asr-cxx-api sherpa-mnn-cxx-api)
add_executable(moonshine-cxx-api ./moonshine-cxx-api.cc)
target_link_libraries(moonshine-cxx-api sherpa-mnn-cxx-api)
add_executable(sense-voice-cxx-api ./sense-voice-cxx-api.cc)
target_link_libraries(sense-voice-cxx-api sherpa-mnn-cxx-api)
if(SHERPA_MNN_ENABLE_TTS)
add_executable(matcha-tts-zh-cxx-api ./matcha-tts-zh-cxx-api.cc)
target_link_libraries(matcha-tts-zh-cxx-api sherpa-mnn-cxx-api)
add_executable(matcha-tts-en-cxx-api ./matcha-tts-en-cxx-api.cc)
target_link_libraries(matcha-tts-en-cxx-api sherpa-mnn-cxx-api)
add_executable(kokoro-tts-en-cxx-api ./kokoro-tts-en-cxx-api.cc)
target_link_libraries(kokoro-tts-en-cxx-api sherpa-mnn-cxx-api)
add_executable(kokoro-tts-zh-en-cxx-api ./kokoro-tts-zh-en-cxx-api.cc)
target_link_libraries(kokoro-tts-zh-en-cxx-api sherpa-mnn-cxx-api)
endif()


@ -0,0 +1,77 @@
// cxx-api-examples/fire-red-asr-cxx-api.cc
// Copyright (c) 2025 Xiaomi Corporation
//
// This file demonstrates how to use FireRedAsr AED with sherpa-onnx's C++ API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
// tar xvf sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
// rm sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16.tar.bz2
//
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineRecognizerConfig config;
config.model_config.fire_red_asr.encoder =
"./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx";
config.model_config.fire_red_asr.decoder =
"./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt";
config.model_config.num_threads = 1;
std::cout << "Loading model\n";
OfflineRecognizer recognizer = OfflineRecognizer::Create(config);
if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename =
"./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
const auto begin = std::chrono::steady_clock::now();
OfflineStream stream = recognizer.CreateStream();
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
recognizer.Decode(&stream);
OfflineRecognizerResult result = recognizer.GetResult(&stream);
const auto end = std::chrono::steady_clock::now();
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
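// RTF (real-time factor) = processing time / audio duration. With
// illustrative numbers, decoding 6.625 s of audio in 1.100 s gives
// 1.100 / 6.625 ≈ 0.166; values below 1 mean faster than real time.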
std::cout << "text: " << result.text << "\n";
printf("Number of threads: %d\n", config.model_config.num_threads);
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
return 0;
}


@ -0,0 +1,73 @@
// cxx-api-examples/kokoro-tts-en-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx CXX API
// for English TTS with Kokoro.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-en-v0_19.tar.bz2
tar xf kokoro-en-v0_19.tar.bz2
rm kokoro-en-v0_19.tar.bz2
./kokoro-tts-en-cxx-api
*/
// clang-format on
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress, void *arg) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
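// An alternative sketch of a callback that stops generation early. It relies
// only on the callback signature shown above; the function name, the 24 kHz
// Kokoro output rate, and the single-call static counter are illustrative
// assumptions.
static int32_t StopAfterTenSeconds(const float *samples, int32_t num_samples,
                                   float progress, void *arg) {
  static int32_t total_samples = 0;  // accumulated across invocations
  total_samples += num_samples;
  // Assuming 24 kHz output, 240000 samples ≈ 10 seconds; per the convention
  // above, returning 0 stops generation and returning 1 continues.
  return total_samples < 240000 ? 1 : 0;
}
// To try it, pass it in place of ProgressCallback below, e.g.
//   GeneratedAudio audio = tts.Generate(text, sid, speed, StopAfterTenSeconds);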
int32_t main(int32_t argc, char *argv[]) {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineTtsConfig config;
config.model.kokoro.model = "./kokoro-en-v0_19/model.onnx";
config.model.kokoro.voices = "./kokoro-en-v0_19/voices.bin";
config.model.kokoro.tokens = "./kokoro-en-v0_19/tokens.txt";
config.model.kokoro.data_dir = "./kokoro-en-v0_19/espeak-ng-data";
config.model.num_threads = 2;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
std::string filename = "./generated-kokoro-en-cxx.wav";
std::string text =
"Today as always, men fall into two groups: slaves and free men. Whoever "
"does not have two-thirds of his day for himself, is a slave, whatever "
"he may be: a statesman, a businessman, an official, or a scholar. "
"Friends fell out often because life was changing so fast. The easiest "
"thing in the world was to lose touch with someone.";
auto tts = OfflineTts::Create(config);
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif
WriteWave(filename, {audio.samples, audio.sample_rate});
fprintf(stderr, "Input text is: %s\n", text.c_str());
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename.c_str());
return 0;
}


@ -0,0 +1,74 @@
// cxx-api-examples/kokoro-tts-zh-en-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx CXX API
// for Chinese + English TTS with Kokoro.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-multi-lang-v1_0.tar.bz2
tar xf kokoro-multi-lang-v1_0.tar.bz2
rm kokoro-multi-lang-v1_0.tar.bz2
./kokoro-tts-zh-en-cxx-api
*/
// clang-format on
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress, void *arg) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineTtsConfig config;
config.model.kokoro.model = "./kokoro-multi-lang-v1_0/model.onnx";
config.model.kokoro.voices = "./kokoro-multi-lang-v1_0/voices.bin";
config.model.kokoro.tokens = "./kokoro-multi-lang-v1_0/tokens.txt";
config.model.kokoro.data_dir = "./kokoro-multi-lang-v1_0/espeak-ng-data";
config.model.kokoro.dict_dir = "./kokoro-multi-lang-v1_0/dict";
config.model.kokoro.lexicon =
"./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/"
"lexicon-zh.txt";
config.model.num_threads = 2;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
std::string filename = "./generated-kokoro-zh-en-cxx.wav";
std::string text =
"中英文语音合成测试。This is generated by next generation Kaldi using "
"Kokoro without Misaki. 你觉得中英文说的如何呢?";
auto tts = OfflineTts::Create(config);
int32_t sid = 50;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif
WriteWave(filename, {audio.samples, audio.sample_rate});
fprintf(stderr, "Input text is: %s\n", text.c_str());
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename.c_str());
return 0;
}


@ -0,0 +1,143 @@
// cxx-api-examples/kws-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
//
// This file demonstrates how to use the keyword spotter with sherpa-onnx's C++ API.
// clang-format off
//
// Usage
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
// rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile.tar.bz2
//
// ./kws-cxx-api
//
// clang-format on
#include <array>
#include <iostream>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
KeywordSpotterConfig config;
config.model_config.transducer.encoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
config.model_config.transducer.decoder =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"decoder-epoch-12-avg-2-chunk-16-left-64.onnx";
config.model_config.transducer.joiner =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"tokens.txt";
config.model_config.provider = "cpu";
config.model_config.num_threads = 1;
config.model_config.debug = 1;
config.keywords_file =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"test_wavs/test_keywords.txt";
KeywordSpotter kws = KeywordSpotter::Create(config);
if (!kws.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout
<< "--Test pre-defined keywords from test_wavs/test_keywords.txt--\n";
std::string wave_filename =
"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01-mobile/"
"test_wavs/3.wav";
std::array<float, 8000> tail_paddings = {0}; // 0.5 seconds
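// (8000 zero samples at the model's 16 kHz input rate; the tail padding is
// fed after the real audio so the streaming model has enough right context
// to detect a keyword occurring near the end of the file.)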
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
OnlineStream stream = kws.CreateStream();
if (!stream.Get()) {
std::cerr << "Failed to create stream\n";
return -1;
}
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
stream.AcceptWaveform(wave.sample_rate, tail_paddings.data(),
tail_paddings.size());
stream.InputFinished();
while (kws.IsReady(&stream)) {
kws.Decode(&stream);
auto r = kws.GetResult(&stream);
if (!r.keyword.empty()) {
std::cout << "Detected keyword: " << r.json << "\n";
// Remember to reset the keyword stream right after a keyword is detected
kws.Reset(&stream);
}
}
// --------------------------------------------------------------------------
std::cout << "--Use pre-defined keywords + add a new keyword--\n";
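// Keyword syntax, as used below: a space-separated token sequence (here,
// phones with tones), optionally followed by @<display text>; '/' separates
// multiple keywords (see the next test).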
stream = kws.CreateStream("y ǎn y uán @演员");
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
stream.AcceptWaveform(wave.sample_rate, tail_paddings.data(),
tail_paddings.size());
stream.InputFinished();
while (kws.IsReady(&stream)) {
kws.Decode(&stream);
auto r = kws.GetResult(&stream);
if (!r.keyword.empty()) {
std::cout << "Detected keyword: " << r.json << "\n";
// Remember to reset the keyword stream right after a keyword is detected
kws.Reset(&stream);
}
}
// --------------------------------------------------------------------------
std::cout << "--Use pre-defined keywords + add two new keywords--\n";
stream = kws.CreateStream("y ǎn y uán @演员/zh ī m íng @知名");
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
stream.AcceptWaveform(wave.sample_rate, tail_paddings.data(),
tail_paddings.size());
stream.InputFinished();
while (kws.IsReady(&stream)) {
kws.Decode(&stream);
auto r = kws.GetResult(&stream);
if (!r.keyword.empty()) {
std::cout << "Detected keyword: " << r.json << "\n";
// Remember to reset the keyword stream right after a keyword is detected
kws.Reset(&stream);
}
}
return 0;
}


@ -0,0 +1,80 @@
// cxx-api-examples/matcha-tts-en-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx CXX API
// for English TTS with MatchaTTS.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2
tar xvf matcha-icefall-en_US-ljspeech.tar.bz2
rm matcha-icefall-en_US-ljspeech.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
./matcha-tts-en-cxx-api
*/
// clang-format on
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress, void *arg) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineTtsConfig config;
config.model.matcha.acoustic_model =
"./matcha-icefall-en_US-ljspeech/model-steps-3.onnx";
config.model.matcha.vocoder = "./hifigan_v2.onnx";
config.model.matcha.tokens = "./matcha-icefall-en_US-ljspeech/tokens.txt";
config.model.matcha.data_dir =
"./matcha-icefall-en_US-ljspeech/espeak-ng-data";
config.model.num_threads = 1;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
std::string filename = "./generated-matcha-en-cxx.wav";
std::string text =
"Today as always, men fall into two groups: slaves and free men. Whoever "
"does not have two-thirds of his day for himself, is a slave, whatever "
"he may be: a statesman, a businessman, an official, or a scholar. "
"Friends fell out often because life was changing so fast. The easiest "
"thing in the world was to lose touch with someone.";
auto tts = OfflineTts::Create(config);
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif
WriteWave(filename, {audio.samples, audio.sample_rate});
fprintf(stderr, "Input text is: %s\n", text.c_str());
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename.c_str());
return 0;
}


@ -0,0 +1,79 @@
// cxx-api-examples/matcha-tts-zh-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
// This file shows how to use sherpa-onnx CXX API
// for Chinese TTS with MatchaTTS.
//
// clang-format off
/*
Usage
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2
tar xvf matcha-icefall-zh-baker.tar.bz2
rm matcha-icefall-zh-baker.tar.bz2
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx
./matcha-tts-zh-cxx-api
*/
// clang-format on
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
static int32_t ProgressCallback(const float *samples, int32_t num_samples,
float progress, void *arg) {
fprintf(stderr, "Progress: %.3f%%\n", progress * 100);
// return 1 to continue generating
// return 0 to stop generating
return 1;
}
int32_t main(int32_t argc, char *argv[]) {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineTtsConfig config;
config.model.matcha.acoustic_model =
"./matcha-icefall-zh-baker/model-steps-3.onnx";
config.model.matcha.vocoder = "./hifigan_v2.onnx";
config.model.matcha.lexicon = "./matcha-icefall-zh-baker/lexicon.txt";
config.model.matcha.tokens = "./matcha-icefall-zh-baker/tokens.txt";
config.model.matcha.dict_dir = "./matcha-icefall-zh-baker/dict";
config.model.num_threads = 1;
// If you don't want to see debug messages, please set it to 0
config.model.debug = 1;
// clang-format off
config.rule_fsts = "./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst"; // NOLINT
// clang-format on
std::string filename = "./generated-matcha-zh-cxx.wav";
std::string text =
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如"
"涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感"
"受着生命的奇迹与温柔."
"某某银行的副行长和一些行政领导表示,他们去过长江和长白山; "
"经济不断增长。2024年12月31号拨打110或者18920240511。123456块钱。";
auto tts = OfflineTts::Create(config);
int32_t sid = 0;
float speed = 1.0; // larger -> faster in speech speed
#if 0
// If you don't want to use a callback, then please enable this branch
GeneratedAudio audio = tts.Generate(text, sid, speed);
#else
GeneratedAudio audio = tts.Generate(text, sid, speed, ProgressCallback);
#endif
WriteWave(filename, {audio.samples, audio.sample_rate});
fprintf(stderr, "Input text is: %s\n", text.c_str());
fprintf(stderr, "Speaker ID is is: %d\n", sid);
fprintf(stderr, "Saved to: %s\n", filename.c_str());
return 0;
}


@ -0,0 +1,81 @@
// cxx-api-examples/moonshine-cxx-api.cc
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use Moonshine with sherpa-onnx's C++ API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// tar xvf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
// rm sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
//
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineRecognizerConfig config;
config.model_config.moonshine.preprocessor =
"./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx";
config.model_config.moonshine.encoder =
"./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx";
config.model_config.moonshine.uncached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx";
config.model_config.moonshine.cached_decoder =
"./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt";
config.model_config.num_threads = 1;
std::cout << "Loading model\n";
OfflineRecognizer recognizer = OfflineRecognizer::Create(config);
if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename =
"./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
const auto begin = std::chrono::steady_clock::now();
OfflineStream stream = recognizer.CreateStream();
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
recongizer.Decode(&stream);
OfflineRecognizerResult result = recongizer.GetResult(&stream);
const auto end = std::chrono::steady_clock::now();
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
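  // RTF = elapsed / duration; values below 1 mean faster than real time.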
std::cout << "text: " << result.text << "\n";
printf("Number of threads: %d\n", config.model_config.num_threads);
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
return 0;
}

View File

@ -0,0 +1,78 @@
// cxx-api-examples/sense-voice-cxx-api.cc
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use sense voice with sherpa-onnx's C++ API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// tar xvf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
// rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
//
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineRecognizerConfig config;
config.model_config.sense_voice.model =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx";
config.model_config.sense_voice.use_itn = true;
config.model_config.sense_voice.language = "auto";
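  // use_itn enables inverse text normalization (e.g. spoken numbers become
  // digits); "auto" lets the model detect the spoken language.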
config.model_config.tokens =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt";
config.model_config.num_threads = 1;
std::cout << "Loading model\n";
  OfflineRecognizer recognizer = OfflineRecognizer::Create(config);
  if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename =
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/en.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
const auto begin = std::chrono::steady_clock::now();
  OfflineStream stream = recognizer.CreateStream();
  stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
                        wave.samples.size());
  recognizer.Decode(&stream);
  OfflineRecognizerResult result = recognizer.GetResult(&stream);
const auto end = std::chrono::steady_clock::now();
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
std::cout << "text: " << result.text << "\n";
printf("Number of threads: %d\n", config.model_config.num_threads);
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
return 0;
}

View File

@ -0,0 +1,65 @@
// cxx-api-examples/speech-enhancement-gtcrn-cxx-api.cc
//
// Copyright (c) 2025 Xiaomi Corporation
//
// We assume you have pre-downloaded model
// from
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/speech-enhancement-models
//
//
// An example command to download
// clang-format off
/*
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/gtcrn_simple.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speech-enhancement-models/inp_16k.wav
*/
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineSpeechDenoiserConfig config;
std::string wav_filename = "./inp_16k.wav";
std::string out_wave_filename = "./enhanced_16k.wav";
config.model.gtcrn.model = "./gtcrn_simple.onnx";
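  // gtcrn_simple.onnx is a small GTCRN speech enhancement (denoising) model;
  // the bundled test wave inp_16k.wav is 16 kHz audio.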
auto sd = OfflineSpeechDenoiser::Create(config);
if (!sd.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
Wave wave = ReadWave(wav_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wav_filename << "'\n";
return -1;
}
std::cout << "Started\n";
const auto begin = std::chrono::steady_clock::now();
auto denoised =
sd.Run(wave.samples.data(), wave.samples.size(), wave.sample_rate);
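  // Run() returns the denoised samples together with their sample rate, which
  // is used when writing the output wave below.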
const auto end = std::chrono::steady_clock::now();
std::cout << "Done\n";
WriteWave(out_wave_filename, {denoised.samples, denoised.sample_rate});
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
std::cout << "Saved to " << out_wave_filename << "\n";
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
  return 0;
}

View File

@ -0,0 +1,93 @@
// cxx-api-examples/streaming-zipformer-cxx-api.cc
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use streaming Zipformer
// with sherpa-onnx's C++ API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
// tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
// rm sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
//
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OnlineRecognizerConfig config;
// please see
// https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#csukuangfj-sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-bilingual-chinese-english
config.model_config.transducer.encoder =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"encoder-epoch-99-avg-1.int8.onnx";
  // Note: We recommend not using the int8 model for the decoder: the decoder
  // is small, so quantizing it saves little time but can hurt accuracy.
config.model_config.transducer.decoder =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"decoder-epoch-99-avg-1.onnx";
config.model_config.transducer.joiner =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"joiner-epoch-99-avg-1.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt";
config.model_config.num_threads = 1;
std::cout << "Loading model\n";
  OnlineRecognizer recognizer = OnlineRecognizer::Create(config);
  if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/"
"0.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
const auto begin = std::chrono::steady_clock::now();
  OnlineStream stream = recognizer.CreateStream();
stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
wave.samples.size());
stream.InputFinished();
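  // InputFinished() signals that no more audio will arrive; keep decoding
  // while the recognizer still has frames that are ready.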
  while (recognizer.IsReady(&stream)) {
    recognizer.Decode(&stream);
  }
  OnlineRecognizerResult result = recognizer.GetResult(&stream);
const auto end = std::chrono::steady_clock::now();
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
std::cout << "text: " << result.text << "\n";
printf("Number of threads: %d\n", config.model_config.num_threads);
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
return 0;
}

View File

@ -0,0 +1,132 @@
// cxx-api-examples/streaming-zipformer-rtf-cxx-api.cc
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use streaming Zipformer
// with sherpa-onnx's C++ API.
//
// clang-format off
//
// cd /path/sherpa-onnx/
// mkdir build
// cd build
// cmake ..
// make
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
// tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
// rm sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
//
// # 1. Test on CPU, run once
//
// ./bin/streaming-zipformer-rtf-cxx-api
//
// # 2. Test on CPU, run 10 times
//
// ./bin/streaming-zipformer-rtf-cxx-api 10
//
// # 3. Test on GPU, run 10 times
//
// ./bin/streaming-zipformer-rtf-cxx-api 10 cuda
//
// clang-format on
#include <chrono> // NOLINT
#include <cstdlib>
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main(int argc, char *argv[]) {
int32_t num_runs = 1;
if (argc >= 2) {
num_runs = atoi(argv[1]);
    if (num_runs <= 0) {
      num_runs = 1;
    }
}
  bool use_gpu = (argc == 3 && std::string(argv[2]) == "cuda");
using namespace sherpa_mnn::cxx; // NOLINT
OnlineRecognizerConfig config;
// please see
// https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#csukuangfj-sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-bilingual-chinese-english
config.model_config.transducer.encoder =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"encoder-epoch-99-avg-1.int8.onnx";
// Note: We recommend not using int8.onnx for the decoder.
config.model_config.transducer.decoder =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"decoder-epoch-99-avg-1.onnx";
config.model_config.transducer.joiner =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/"
"joiner-epoch-99-avg-1.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt";
config.model_config.num_threads = 1;
config.model_config.provider = use_gpu ? "cuda" : "cpu";
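  // provider selects the inference backend; "cuda" requires a build of the
  // runtime with GPU support.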
std::cout << "Loading model\n";
  OnlineRecognizer recognizer = OnlineRecognizer::Create(config);
  if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename =
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/"
"0.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
float total_elapsed_seconds = 0;
OnlineRecognizerResult result;
for (int32_t i = 0; i < num_runs; ++i) {
const auto begin = std::chrono::steady_clock::now();
    OnlineStream stream = recognizer.CreateStream();
    stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
                          wave.samples.size());
    stream.InputFinished();
    while (recognizer.IsReady(&stream)) {
      recognizer.Decode(&stream);
    }
    result = recognizer.GetResult(&stream);
auto end = std::chrono::steady_clock::now();
float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
printf("Run %d/%d, elapsed seconds: %.3f\n", i, num_runs, elapsed_seconds);
total_elapsed_seconds += elapsed_seconds;
}
  float average_elapsed_seconds = total_elapsed_seconds / num_runs;
  float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
  float rtf = average_elapsed_seconds / duration;
  std::cout << "text: " << result.text << "\n";
  printf("Number of threads: %d\n", config.model_config.num_threads);
  printf("Duration: %.3fs\n", duration);
  printf("Total elapsed seconds: %.3fs\n", total_elapsed_seconds);
  printf("Num runs: %d\n", num_runs);
  printf("Elapsed seconds per run: %.3f/%d=%.3f\n", total_elapsed_seconds,
         num_runs, average_elapsed_seconds);
  printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n",
         average_elapsed_seconds, duration, rtf);
return 0;
}

View File

@ -0,0 +1,76 @@
// cxx-api-examples/whisper-cxx-api.cc
// Copyright (c) 2024 Xiaomi Corporation
//
// This file demonstrates how to use whisper with sherpa-onnx's C++ API.
//
// clang-format off
//
// wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
// tar xvf sherpa-onnx-whisper-tiny.en.tar.bz2
// rm sherpa-onnx-whisper-tiny.en.tar.bz2
//
// clang-format on
#include <chrono> // NOLINT
#include <iostream>
#include <string>
#include "sherpa-mnn/c-api/cxx-api.h"
int32_t main() {
using namespace sherpa_mnn::cxx; // NOLINT
OfflineRecognizerConfig config;
config.model_config.whisper.encoder =
"./sherpa-onnx-whisper-tiny.en/tiny.en-encoder.int8.onnx";
config.model_config.whisper.decoder =
"./sherpa-onnx-whisper-tiny.en/tiny.en-decoder.int8.onnx";
config.model_config.tokens =
"./sherpa-onnx-whisper-tiny.en/tiny.en-tokens.txt";
config.model_config.num_threads = 1;
std::cout << "Loading model\n";
  OfflineRecognizer recognizer = OfflineRecognizer::Create(config);
  if (!recognizer.Get()) {
std::cerr << "Please check your config\n";
return -1;
}
std::cout << "Loading model done\n";
std::string wave_filename = "./sherpa-onnx-whisper-tiny.en/test_wavs/0.wav";
Wave wave = ReadWave(wave_filename);
if (wave.samples.empty()) {
std::cerr << "Failed to read: '" << wave_filename << "'\n";
return -1;
}
std::cout << "Start recognition\n";
const auto begin = std::chrono::steady_clock::now();
  OfflineStream stream = recognizer.CreateStream();
  stream.AcceptWaveform(wave.sample_rate, wave.samples.data(),
                        wave.samples.size());
  recognizer.Decode(&stream);
  OfflineRecognizerResult result = recognizer.GetResult(&stream);
const auto end = std::chrono::steady_clock::now();
const float elapsed_seconds =
std::chrono::duration_cast<std::chrono::milliseconds>(end - begin)
.count() /
1000.;
float duration = wave.samples.size() / static_cast<float>(wave.sample_rate);
float rtf = elapsed_seconds / duration;
std::cout << "text: " << result.text << "\n";
printf("Number of threads: %d\n", config.model_config.num_threads);
printf("Duration: %.3fs\n", duration);
printf("Elapsed seconds: %.3fs\n", elapsed_seconds);
printf("(Real time factor) RTF = %.3f / %.3f = %.3f\n", elapsed_seconds,
duration, rtf);
return 0;
}

View File

@ -0,0 +1,3 @@
hs_err*
vits-zh-aishell3
*.jar

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/AudioTagging.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/FeatureConfig.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OfflinePunctuation.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OfflineRecognizer.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OfflineSpeakerDiarization.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OfflineStream.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OnlinePunctuation.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OnlineRecognizer.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/OnlineStream.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/Speaker.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/SpeakerEmbeddingExtractorConfig.kt

View File

@ -0,0 +1 @@
../sherpa-mnn/kotlin-api/SpokenLanguageIdentification.kt

Some files were not shown because too many files have changed in this diff.