Speech-to-speech AI, a primer.
What changed in 2024, what the words mean, and why a new class of models treats speech as a first-class language rather than a pipeline of text conversions.
One dispatch per week while the STS series runs. No fluff.
A weekly dispatch mapping speech-to-speech, full-duplex, and audio foundation models. Ten articles, honest status.
Long-form profiles of the labs, companies, and institutions that shape the open speech-to-speech landscape. One player at a time — what they shipped, what they bet on, and what the rest of the stack assumes about them.
STS, full-duplex, and audio-foundation-model evaluations — what each one measures and when to trust it.
From Moshi to Gemini Live: production and research systems that can hold a real-time voice conversation.
Speech corpora for training conversational AI, including 14 frontier 2024-26 releases.
GitHub discussions + Discord. Report errors, propose additions, share papers worth tracking.