Audio & Speech

Speech-Resources: Labs, companies, resources, internships, and more in the speech field; recommendations and self-nominations are welcome
metame-ai/awesome-audio-plaza: Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
SpeechTasks: This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.
ai-audio-startups: Community list of startups working with AI in audio and music technology
speech_rankings: A CSRankings-like index for speech researchers
INTERSPEECH-2023-Papers: INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference.

SSL

Awesome-Speech-Pretraining: Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech.
facebookresearch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

kaldi: Kaldi Speech Recognition Toolkit
Next-gen Kaldi
- k2-fsa/icefall: The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse.
- lhotse-speech/lhotse: Tools for handling speech data in machine learning projects.
openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
awesome-whisper: Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI

open-mmlab/Amphion: Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation.
facebookresearch/audiocraft: Audiocraft is a library for audio processing and generation with deep learning.
NVIDIA/NeMo: NeMo: a framework for generative AI

QwenLM/Qwen-Audio: The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
awesome-large-audio-models: Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Large-Audio-Models: Keep track of big models in audio domain, including speech, singing, music etc.

speech-datasets-collection: A curated list of speech datasets (110+ datasets, 75+ easy to download)
ai-audio-datasets: This is a list of datasets consisting of speech, music, and sound effects
ULCA-asr-dataset-corpus: ASR dataset corpus collection
coqui-ai/open-speech-corpora: A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
voice_datasets: A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
audio-datasets: Open-source audio datasets
speech_dataset: Datasets for speech recognition
k2-fsa/libriheavy: Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context
facebookresearch/libri-light