Audio & Speech

  • Speech-Resources: Labs, companies, resources, internships, etc. in the speech field; recommendations and self-nominations are welcome
  • metame-ai/awesome-audio-plaza: Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
  • SpeechTasks: This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.
  • ai-audio-startups: Community list of startups working with AI in audio and music technology
  • speech_rankings: A CSRankings-like index for speech researchers
  • INTERSPEECH-2023-Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference.

SSL

ASR

  • kaldi: Kaldi Speech Recognition Toolkit
  • Next-gen Kaldi
    • k2-fsa/icefall: The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse.
    • lhotse-speech/lhotse: Tools for handling speech data in machine learning projects.
  • openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  • awesome-whisper: Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI

Generation

Audio/Speech LLM

  • QwenLM/Qwen-Audio: The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
  • awesome-large-audio-models: Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
  • Large-Audio-Models: Keep track of big models in audio domain, including speech, singing, music etc.

Dataset