AI Explorer

Speech

Speech recognition, text-to-speech, and audio AI projects.

Data source

Category lists combine GitHub search queries, repository topics, descriptions, and sync snapshots.

Ranking logic

Projects are filtered for category relevance, then ordered by stars and quality signals.
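
As a rough illustration of the filter-then-order step described above, here is a minimal sketch. It is not the site's actual implementation: the `Project` fields, the keyword-matching rule, the `quality` score, and its use as a tie-breaker after stars are all assumptions made for the example, and the `example/image-tool` entry is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    stars: int
    description: str = ""
    topics: list[str] = field(default_factory=list)
    quality: int = 0  # hypothetical quality signal; the real signals are not documented


def is_relevant(project: Project, keywords: tuple[str, ...]) -> bool:
    """Assumed relevance rule: any keyword appears in the topics or description."""
    text = " ".join(project.topics + [project.description]).lower()
    return any(keyword in text for keyword in keywords)


def rank(projects: list[Project], keywords: tuple[str, ...]) -> list[Project]:
    """Keep relevant projects, then sort by stars (descending).

    Using `quality` as a tie-breaker is an assumption, not the documented rule.
    """
    relevant = [p for p in projects if is_relevant(p, keywords)]
    return sorted(relevant, key=lambda p: (-p.stars, -p.quality))


# Star counts and descriptions are taken from the listing on this page;
# the keywords and the non-speech entry are illustrative only.
sample = [
    Project("rhasspy/piper", 10_969, "A fast, local neural text to speech system"),
    Project("example/image-tool", 50_000, "Image generation toolkit"),
    Project("openai/whisper", 99_785,
            "Robust Speech Recognition via Large-Scale Weak Supervision"),
    Project("rany2/edge-tts", 10_975,
            "Use Microsoft Edge's online text-to-speech service from Python"),
]
ranked = rank(sample, keywords=("speech", "tts", "whisper"))
```

Sorting on a tuple keeps the ordering deterministic when two projects have the same star count.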

Best for

Use category pages when you already know the AI workflow or tool type you want to evaluate.

39 projects

#1
whisper · openai/whisper · 45

Robust Speech Recognition via Large-Scale Weak Supervision

Speech
Stars
99,785
Growth
-
Language
Python
Created
2022-09-16
#2
unsloth · unslothai/unsloth · 45

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

Speech
Stars
64,731
Growth
-
Language
Python
Created
2023-11-29
#3
GPT-SoVITS · RVC-Boss/GPT-SoVITS · 43

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Speech
Stars
57,598
Growth
-
Language
Python
Created
2024-01-14
#4
whisper.cpp · ggml-org/whisper.cpp · 44

Port of OpenAI's Whisper model in C/C++

Speech, Infra
Stars
49,893
Growth
-
Language
C++
Created
2022-09-25
#5
LocalAI · mudler/LocalAI · 45

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Image, Speech
Stars
46,362
Growth
-
Language
Go
Created
2023-03-18
#6
TTS · coqui-ai/TTS · 38

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Speech
Stars
45,334
Growth
-
Language
Python
Created
2020-05-20
#7
ChatTTS · 2noise/ChatTTS · 37

A generative speech model for daily dialogue.

Speech, Infra
Stars
39,289
Growth
-
Language
Python
Created
2024-05-27
#8
MockingBird · babysor/MockingBird · 33

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

Speech
Stars
36,902
Growth
-
Language
Python
Created
2021-08-07
#9
OpenVoice · myshell-ai/OpenVoice · 38

Instant voice cloning by MIT and MyShell. Audio foundation model.

Speech
Stars
36,535
Growth
-
Language
Python
Created
2023-11-29
#10
khoj · khoj-ai/khoj · 34

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

Image, RAG, Search
Stars
34,618
Growth
-
Language
Python
Created
2021-08-16
#11
voicebox · jamiepine/voicebox · 40

The open-source AI voice studio. Clone, dictate, create.

Speech
Stars
26,978
Growth
-
Language
TypeScript
Created
2026-01-25
#12
free-claude-code · Alishahryar1/free-claude-code · 81

Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported)

Coding, Speech
Stars
26,387
Growth
-
Language
Python
Created
2026-01-28
#13
faster-whisper · SYSTRAN/faster-whisper · 35

Faster Whisper transcription with CTranslate2

Speech, Infra
Stars
23,001
Growth
-
Language
Python
Created
2023-02-11
#14
whisperX · m-bain/whisperX · 37

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Speech
Stars
21,979
Growth
-
Language
Python
Created
2022-12-09
#15
CosyVoice · FunAudioLLM/CosyVoice · 38

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Speech
Stars
21,126
Growth
-
Language
Python
Created
2024-07-03
#16
index-tts · index-tts/index-tts · 30

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Speech
Stars
20,623
Growth
-
Language
Python
Created
2025-02-06
#17
buzz · chidiwilliams/buzz · 42

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Speech
Stars
19,305
Growth
-
Language
Python
Created
2022-09-24
#18
dia · nari-labs/dia · 36

A TTS model capable of generating ultra-realistic dialogue in one pass.

Speech
Stars
19,293
Growth
-
Language
Python
Created
2025-04-19
#19
VoxCPM · OpenBMB/VoxCPM · 41

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Speech
Stars
19,222
Growth
-
Language
Python
Created
2025-09-16
#20
Pixelle-Video · AIDC-AI/Pixelle-Video · 73

🚀 AI Fully Automated Short Video Engine

Image, Video, Speech
Stars
18,427
Growth
-
Language
Python
Created
2025-11-07
#21
pyvideotrans · jianchang512/pyvideotrans · 38

Translate the video from one language to another and embed dubbing & subtitles.

Speech
Stars
17,455
Growth
-
Language
Python
Created
2023-10-02
#22
leon · leon-ai/leon · 42

🧠 Leon is your open-source personal assistant.

Agents, Speech, Automation
Stars
17,243
Growth
-
Language
TypeScript
Created
2019-02-10
#23
NeMo · NVIDIA-NeMo/NeMo · 43

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Speech
Stars
17,237
Growth
-
Language
Python
Created
2019-08-05
#24
FunASR · modelscope/FunASR · 42

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Speech
Stars
16,136
Growth
-
Language
Python
Created
2022-11-24
#25
vosk-api · alphacep/vosk-api · 33

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Speech
Stars
14,739
Growth
-
Language
Jupyter Notebook
Created
2019-09-03
#26
PaddleSpeech · PaddlePaddle/PaddleSpeech · 39

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Speech
Stars
12,601
Growth
-
Language
Python
Created
2017-11-14
#27
sherpa-onnx · k2-fsa/sherpa-onnx · 38

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages

Speech
Stars
12,341
Growth
-
Language
C++
Created
2022-09-01
#28
meetily · Zackriya-Solutions/meetily · 40

Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows.

Speech
Stars
12,161
Growth
-
Language
Rust
Created
2024-12-26
#29
edge-tts · rany2/edge-tts · 30

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Speech
Stars
10,975
Growth
-
Language
Python
Created
2021-05-10
#30
piper · rhasspy/piper · 32

A fast, local neural text to speech system

Speech
Stars
10,969
Growth
-
Language
C++
Created
2023-01-10