OPEN-SCIENCE
AI LAB
// WELCOME TO KYUTAI

OUR MISSION IS TO
BUILD AND DEMOCRATIZE
ARTIFICIAL GENERAL INTELLIGENCE
THROUGH OPEN SCIENCE

AI RESEARCH LAB BASED IN PARIS

Cascaded Voice AI

Unmute allows any LLM to listen and speak. At its core are our low-latency streaming text-to-speech and speech-to-text models, optimized for real-time usage. Everything is open-source and freely available to use in your own projects.
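To make the cascaded architecture concrete, here is a minimal sketch of the data flow: streaming speech-to-text feeds a text LLM, whose reply is streamed back out through text-to-speech. The three stage functions are stubs standing in for the real models, not the actual Unmute API.

```python
# Sketch of a cascaded voice pipeline (STT -> LLM -> TTS).
# The stage functions are stubs: placeholders for the real streaming
# models, shown only to illustrate how the pieces connect.

def speech_to_text(audio_chunk: bytes) -> str:
    """Stub: a streaming STT model would transcribe audio incrementally."""
    return "hello there"

def generate_reply(transcript: str) -> str:
    """Stub: any text-only LLM can be dropped in here, unchanged."""
    return f"You said: {transcript}"

def text_to_speech(text: str) -> bytes:
    """Stub: a streaming TTS model can start speaking before the
    full reply is generated, which keeps end-to-end latency low."""
    return text.encode("utf-8")

def cascaded_turn(audio_chunk: bytes) -> bytes:
    """One conversational turn: audio in, audio out."""
    transcript = speech_to_text(audio_chunk)
    reply = generate_reply(transcript)
    return text_to_speech(reply)

audio_out = cascaded_turn(b"\x00\x01")
```

Because the LLM stage only ever sees and produces text, any model can be swapped in without retraining, which is the point of the cascaded design.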

Speech-Native Models


Moshi is the first speech-native dialogue system, unveiled during our first keynote. Moshi processes speech directly rather than converting to text and back, which gives it minimal latency and lets it understand emotions and other non-verbal aspects of communication.

Moshi extends seamlessly to multimodal inputs: we showcase this with MoshiVis, a version of Moshi that you can talk to about images.

Moshi's multi-stream paradigm also enabled us to create Hibiki, an end-to-end real-time streaming translation system, lightweight enough to run on a phone.

Neural Audio Codecs


Encoding and decoding signals in a compressed yet accurate manner is a cornerstone of modern AI systems. Our streaming neural audio codec Mimi can efficiently model both semantic and acoustic information while achieving real-time latency. Originally developed for Moshi, Mimi is now a key component of all our audio projects.
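The interface of such a codec can be illustrated with a toy example: continuous audio samples go in, compact discrete codes come out, and an approximate waveform is reconstructed from the codes. This is not Mimi, which is a learned neural codec; the sketch below just snaps each sample to a tiny hand-written codebook to show the encode/decode round trip.

```python
# Toy codec illustrating the encode/decode interface of an audio codec.
# Real neural codecs like Mimi use learned encoders, quantizers, and
# decoders; here a fixed 5-entry codebook stands in for all of that.

CODEBOOK = [-1.0, -0.5, 0.0, 0.5, 1.0]  # 5 levels, so ~2.3 bits per sample

def encode(samples: list[float]) -> list[int]:
    """Map each sample to the index of the nearest codebook entry."""
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - s))
            for s in samples]

def decode(codes: list[int]) -> list[float]:
    """Reconstruct a lossy approximation from the discrete codes."""
    return [CODEBOOK[c] for c in codes]

audio = [0.1, -0.6, 0.9, 0.0]
codes = encode(audio)           # compact discrete tokens
reconstruction = decode(codes)  # lossy approximation of the input
```

The discrete codes are what make audio usable by language-model-style architectures: a model like Moshi predicts sequences of codec tokens rather than raw waveforms.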

If you want to dive deeper, check out our tutorial on neural audio codecs. It builds from the basics all the way to modern codecs like Mimi, with plenty of examples and animations along the way.

Compact Language Models


We are working on turning language models from monoliths into modular systems. Using the same model for everything is wasteful. What if you could select the knowledge, abilities, and languages that you want your LLM to have, and get a specialized model 10x smaller than an equally smart generic LLM?

The first step is Helium 1, our modular and multilingual 2B-parameter LLM. In the spirit of open science, we are also releasing dactory, the codebase and tools required to reproduce our training dataset.

We also release ARC-Encoder, a method to compress large contexts for LLMs, and neutral residues, an improvement over LoRA for adapting LLMs to new domains.

Learn more about what we do at Kyutai