AI Explorer

Speech

Speech recognition, text-to-speech, and audio AI projects.

Data source

Category lists combine GitHub search queries, repository topics, descriptions, and sync snapshots.

Ranking logic

Projects are filtered for category relevance, then ordered by stars and quality signals.
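
A minimal sketch of that filter-then-sort step, under stated assumptions: the `Project` fields, the tag-based relevance check, and the numeric `quality` value are all hypothetical, since the page does not say how relevance or quality signals are computed. The star counts are taken from the list below; the third entry is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    tags: tuple[str, ...]
    stars: int
    quality: float  # hypothetical quality signal; not defined by the page


def rank(projects, category):
    """Keep projects tagged with `category`, then sort by stars (quality breaks ties)."""
    relevant = [p for p in projects if category in p.tags]
    return sorted(relevant, key=lambda p: (p.stars, p.quality), reverse=True)


sample = [
    Project("whisper.cpp", ("Speech", "Infra"), 51_118, 0.9),
    Project("whisper", ("Speech",), 103_797, 0.8),
    Project("example-image-tool", ("Image",), 60_000, 0.7),  # made-up entry
]

ranked = rank(sample, "Speech")
print([p.name for p in ranked])
```

The real ranking may weight stars and quality differently; this sketch treats quality only as a tie-breaker.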

Best for

Use category pages when you already know the AI workflow or tool type you want to evaluate.

48 projects

#1
whisper · openai/whisper · 43

Robust Speech Recognition via Large-Scale Weak Supervision

Speech
Stars
103,797
Growth
-
Language
Python
Created
2022-09-16
#2
unsloth · unslothai/unsloth · 45

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

Speech
Stars
67,544
Growth
-
Language
Python
Created
2023-11-29
#3
GPT-SoVITS · RVC-Boss/GPT-SoVITS · 45

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Speech
Stars
59,130
Growth
-
Language
Python
Created
2024-01-14
#4
whisper.cpp · ggml-org/whisper.cpp · 44

Port of OpenAI's Whisper model in C/C++

Speech, Infra
Stars
51,118
Growth
-
Language
C++
Created
2022-09-25
#5
VibeVoice · microsoft/VibeVoice · 72

Open-Source Frontier Voice AI

Speech
Stars
48,612
Growth
-
Language
Python
Created
2025-08-25
#6
LocalAI · mudler/LocalAI · 44

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Image, Speech
Stars
47,211
Growth
-
Language
Go
Created
2023-03-18
#7
TTS · coqui-ai/TTS · 38

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Speech
Stars
45,641
Growth
-
Language
Python
Created
2020-05-20
#8
airi · moeru-ai/airi · 65

💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.

Speech
Stars
40,427
Growth
-
Language
TypeScript
Created
2024-12-01
#9
ChatTTS · 2noise/ChatTTS · 36

A generative speech model for daily dialogue.

Speech, Infra
Stars
39,520
Growth
-
Language
Python
Created
2024-05-27
#10
MockingBird · babysor/MockingBird · 33

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

Speech
Stars
36,908
Growth
-
Language
Python
Created
2021-08-07
#11
OpenVoice · myshell-ai/OpenVoice · 38

Instant voice cloning by MIT and MyShell. Audio foundation model.

Speech
Stars
36,804
Growth
-
Language
Python
Created
2023-11-29
#12
khoj · khoj-ai/khoj · 39

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

Image, RAG, Search
Stars
35,380
Growth
-
Language
Python
Created
2021-08-16
#13
VoxCPM · OpenBMB/VoxCPM · 42

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Speech
Stars
32,024
Growth
-
Language
Python
Created
2025-09-16
#14
free-claude-code · Alishahryar1/free-claude-code · 82

Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported)

Coding, Speech
Stars
30,078
Growth
-
Language
Python
Created
2026-01-28
#15
faster-whisper · SYSTRAN/faster-whisper · 35

Faster Whisper transcription with CTranslate2

Speech, Infra
Stars
23,905
Growth
-
Language
Python
Created
2023-02-11
#16
Pixelle-Video · AIDC-AI/Pixelle-Video · 42

🚀 AI Fully Automated Short Video Engine

Image, Video, Speech
Stars
23,760
Growth
-
Language
Python
Created
2025-11-07
#17
whisperX · m-bain/whisperX · 42

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Speech
Stars
22,761
Growth
-
Language
Python
Created
2022-12-09
#18
CosyVoice · FunAudioLLM/CosyVoice · 36

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Speech
Stars
21,876
Growth
-
Language
Python
Created
2024-07-03
#19
index-tts · index-tts/index-tts · 35

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Speech
Stars
21,480
Growth
-
Language
Python
Created
2025-02-06
#20
buzz · chidiwilliams/buzz · 42

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Speech
Stars
19,871
Growth
-
Language
Python
Created
2022-09-24
#21
dia · nari-labs/dia · 36

A TTS model capable of generating ultra-realistic dialogue in one pass.

Speech
Stars
19,326
Growth
-
Language
Python
Created
2025-04-19
#22
FunASR · modelscope/FunASR · 43

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Speech, MCP
Stars
18,673
Growth
-
Language
Python
Created
2022-11-24
#23
pyvideotrans · jianchang512/pyvideotrans · 40

Translate the video from one language to another and embed dubbing & subtitles.

Speech
Stars
18,131
Growth
-
Language
Python
Created
2023-10-02
#24
Speech · NVIDIA-NeMo/Speech · 43

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Speech
Stars
17,634
Growth
-
Language
Python
Created
2019-08-05
#25
NeMo · NVIDIA-NeMo/NeMo · 43

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Speech
Stars
17,560
Growth
-
Language
Python
Created
2019-08-05
#26
leon · leon-ai/leon · 41

🧠 Leon is your open-source personal assistant.

Agents, Speech, Automation
Stars
17,344
Growth
-
Language
TypeScript
Created
2019-02-10
#27
vosk-api · alphacep/vosk-api · 36

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Speech
Stars
14,887
Growth
-
Language
Jupyter Notebook
Created
2019-09-03
#28
sherpa-onnx · k2-fsa/sherpa-onnx · 38

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages

Speech
Stars
13,237
Growth
-
Language
C++
Created
2022-09-01
#29
meetily · Zackriya-Solutions/meetily · 38

Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows.

Speech
Stars
12,943
Growth
-
Language
Rust
Created
2024-12-26
#30
supertonic · supertone-inc/supertonic · 37

Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.

Speech
Stars
12,789
Growth
-
Language
Swift
Created
2025-11-18