AI Explorer
Speech
Speech recognition, text-to-speech, and audio AI projects.
Data source
Category lists combine GitHub search queries, repository topics, descriptions, and sync snapshots.
Ranking logic
Projects are filtered for category relevance, then ordered by stars and quality signals.
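The filter-then-order step described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the `Project` record, the `relevant` flag, the `quality` score, and the tie-breaking rule (stars first, then quality) are all assumptions made for the example, and the project names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str             # hypothetical project label
    stars: int            # GitHub star count
    relevant: bool        # assumed: outcome of the category-relevance filter
    quality: float = 0.0  # assumed: combined quality signal, higher is better


def rank(projects):
    """Drop projects that are not relevant, then sort by stars (descending).

    Quality is used only to break ties between equal star counts; the page
    does not say how the two signals are actually weighted.
    """
    kept = [p for p in projects if p.relevant]
    return sorted(kept, key=lambda p: (p.stars, p.quality), reverse=True)


projects = [
    Project("example-asr-port", 51_118, True),
    Project("example-unrelated", 99_999, False),
    Project("example-asr", 103_797, True),
]

ranked_names = [p.name for p in rank(projects)]
print(ranked_names)
```

In this sketch the unrelated project is removed before sorting, so a high star count alone does not place it in the category.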
Best for
Use category pages when you already know the AI workflow or tool type you want to evaluate.
48 projects
Robust Speech Recognition via Large-Scale Weak Supervision
- Stars
- 103,797
- Growth
- -
- Language
- Python
- Created
- 2022-09-16
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
- Stars
- 67,544
- Growth
- -
- Language
- Python
- Created
- 2023-11-29
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
- Stars
- 59,130
- Growth
- -
- Language
- Python
- Created
- 2024-01-14
Port of OpenAI's Whisper model in C/C++
- Stars
- 51,118
- Growth
- -
- Language
- C++
- Created
- 2022-09-25
Open-Source Frontier Voice AI
- Stars
- 48,612
- Growth
- -
- Language
- Python
- Created
- 2025-08-25
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
- Stars
- 47,211
- Growth
- -
- Language
- Go
- Created
- 2023-03-18
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
- Stars
- 45,641
- Growth
- -
- Language
- Python
- Created
- 2020-05-20
💖🧸 Self-hosted, user-owned Grok Companion: a container for the souls of waifus and cyber beings, bringing them into our world and aspiring to reach Neuro-sama's heights. Capable of real-time voice chat and of playing Minecraft and Factorio. Web, macOS, and Windows supported.
- Stars
- 40,427
- Growth
- -
- Language
- TypeScript
- Created
- 2024-12-01
A generative speech model for daily dialogue.
- Stars
- 39,520
- Growth
- -
- Language
- Python
- Created
- 2024-05-27
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
- Stars
- 36,908
- Growth
- -
- Language
- Python
- Created
- 2021-08-07
Instant voice cloning by MIT and MyShell. Audio foundation model.
- Stars
- 36,804
- Growth
- -
- Language
- Python
- Created
- 2023-11-29
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
- Stars
- 35,380
- Growth
- -
- Language
- Python
- Created
- 2021-08-16
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
- Stars
- 32,024
- Growth
- -
- Language
- Python
- Created
- 2025-09-16
Use claude-code for free in the terminal, the VS Code extension, or Discord, like OpenClaw (voice supported)
- Stars
- 30,078
- Growth
- -
- Language
- Python
- Created
- 2026-01-28
Faster Whisper transcription with CTranslate2
- Stars
- 23,905
- Growth
- -
- Language
- Python
- Created
- 2023-02-11
🚀 AI Fully Automated Short Video Engine
- Stars
- 23,760
- Growth
- -
- Language
- Python
- Created
- 2025-11-07
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- Stars
- 22,761
- Growth
- -
- Language
- Python
- Created
- 2022-12-09
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
- Stars
- 21,876
- Growth
- -
- Language
- Python
- Created
- 2024-07-03
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
- Stars
- 21,480
- Growth
- -
- Language
- Python
- Created
- 2025-02-06
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
- Stars
- 19,871
- Growth
- -
- Language
- Python
- Created
- 2022-09-24
A TTS model capable of generating ultra-realistic dialogue in one pass.
- Stars
- 19,326
- Growth
- -
- Language
- Python
- Created
- 2025-04-19
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
- Stars
- 18,673
- Growth
- -
- Language
- Python
- Created
- 2022-11-24
Translate the video from one language to another and embed dubbing & subtitles.
- Stars
- 18,131
- Growth
- -
- Language
- Python
- Created
- 2023-10-02
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- Stars
- 17,634
- Growth
- -
- Language
- Python
- Created
- 2019-08-05
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- Stars
- 17,560
- Growth
- -
- Language
- Python
- Created
- 2019-08-05
🧠 Leon is your open-source personal assistant.
- Stars
- 17,344
- Growth
- -
- Language
- TypeScript
- Created
- 2019-02-10
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- Stars
- 14,887
- Growth
- -
- Language
- Jupyter Notebook
- Created
- 2019-09-03
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, with no Internet connection required. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, a WebSocket server/client, and 12 programming languages.
- Stars
- 13,237
- Growth
- -
- Language
- C++
- Created
- 2022-09-01
Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization, built on Rust. 100% local processing; no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 self-hosted, open-source AI meeting note taker for macOS & Windows.
- Stars
- 12,943
- Growth
- -
- Language
- Rust
- Created
- 2024-12-26
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
- Stars
- 12,789
- Growth
- -
- Language
- Swift
- Created
- 2025-11-18