What happened this week
A quiet but focused week. Every entry is a paper, and every paper tackles the same narrow problem: knowing when the user has stopped talking and when the agent should start. Three different angles: a model, a benchmark, and a community challenge.
Method — joint acoustic and linguistic cues
JAL-Turn proposes a turn-taking head that fuses streaming acoustic features with semantic cues from a running LLM, rather than choosing between VAD-style and end-to-end approaches. The framing is explicitly production-oriented: the authors argue that fully native full-duplex LMs are too expensive to train and deploy for commercial voice agents, and that a lightweight fused head is the pragmatic middle path. Treat the reported accuracy as a baseline claim until external reproductions land.
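The paper's exact architecture isn't reproduced here, but a minimal sketch makes the fusion idea concrete. Everything below is an assumption: the class name, the dimensions, and the choice of a small GRU over acoustic frames with the LLM state broadcast across time.

```python
import torch
import torch.nn as nn

class FusedTurnHead(nn.Module):
    """Hypothetical fused turn-taking head: combine streaming acoustic
    frame embeddings with the LLM's running semantic state and emit a
    per-frame probability of end-of-turn. Not JAL-Turn's published design."""

    def __init__(self, acoustic_dim=256, llm_dim=1024, hidden=128):
        super().__init__()
        self.acoustic_proj = nn.Linear(acoustic_dim, hidden)
        self.semantic_proj = nn.Linear(llm_dim, hidden)
        # A small GRU keeps running state over acoustic frames, so the
        # head stays cheap enough to fire every hop (~10-40 ms).
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 1)

    def forward(self, acoustic_frames, llm_state, h0=None):
        # acoustic_frames: (batch, time, acoustic_dim), streamed features
        # llm_state:       (batch, llm_dim), latest LLM hidden state
        a, h = self.rnn(torch.tanh(self.acoustic_proj(acoustic_frames)), h0)
        s = torch.tanh(self.semantic_proj(llm_state))
        s = s.unsqueeze(1).expand(-1, a.size(1), -1)  # broadcast over time
        logits = self.classifier(torch.cat([a, s], dim=-1))
        return torch.sigmoid(logits).squeeze(-1), h   # P(end-of-turn), carry state

# One streamed hop: a single acoustic frame plus the current LLM state.
head = FusedTurnHead()
p_eot, h = head(torch.randn(1, 1, 256), torch.randn(1, 1024))
```

The GRU state `h` is what makes this streamable: each hop feeds back the previous state instead of re-encoding the whole utterance.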
Benchmarks — two different angles on interruption
Two benchmark papers, both from Chinese research groups, both targeting the interruption-detection failure mode that cascaded systems keep hitting:
- SID-Bench (ICME 2026, code released) focuses on semantic interruption detection: backchannels should not stop the agent; topic pivots should. It proposes an Average Penalty Time metric that assigns temporal costs to both false alarms and late stops, a more useful single-number score than the usual precision/recall pair (see the sketch after this list).
- The Interspeech 2026 Audio Encoder Capability Challenge is a shared-task paper that treats audio-encoder quality as a prerequisite for Large Audio Language Models. Not a paper to cite, but one to watch for the leaderboard in late summer.
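SID-Bench's exact Average Penalty Time definition isn't given here, so the following is a hedged sketch of the idea: every false alarm and every late stop is converted into seconds of penalty, and the scores are averaged. The cap, the false-alarm cost, and the event encoding are all assumptions.

```python
# Hypothetical Average-Penalty-Time-style scorer, not SID-Bench's definition.
# Assumptions: a late stop costs its latency in seconds (capped), a false
# alarm on a backchannel costs a fixed amount, a missed interruption costs
# the cap.
def average_penalty_time(events, false_alarm_cost=2.0, max_penalty=5.0):
    """events: list of (true_interrupt_s, agent_stop_s) pairs.
    true_interrupt_s is None for backchannels (the agent should keep
    talking); agent_stop_s is None when the agent never stopped."""
    penalties = []
    for true_interrupt_s, agent_stop_s in events:
        if true_interrupt_s is None:
            # Backchannel: stopping at all is a false alarm.
            penalties.append(false_alarm_cost if agent_stop_s is not None else 0.0)
        elif agent_stop_s is None:
            # Real interruption the agent never honored: worst case.
            penalties.append(max_penalty)
        else:
            # Late (or early) stop: penalize the timing error, capped.
            penalties.append(min(abs(agent_stop_s - true_interrupt_s), max_penalty))
    return sum(penalties) / len(penalties)

# One honored interruption 0.4 s late, one false alarm on a backchannel.
print(average_penalty_time([(3.0, 3.4), (None, 1.2)]))  # -> 1.2
```

The appeal of a single time-denominated number is that it makes the trade-off explicit: a system that never stops and a system that stops on every backchannel both accumulate seconds of penalty on the same scale.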
Corrections to hello@fullduplex.ai. Next issue: 2026-W15.