In context: Some of the implications of today’s AI models are startling enough without adding a hyperrealistic human voice to them. We have seen several impressive examples over the last 10 years, but each tends to fade from view until a new one emerges. Enter Miles and Maya from Sesame AI, a company co-founded by Brendan Iribe, the former CEO and co-founder of Oculus.
Researchers at Sesame AI have launched a new conversational speech model (CSM). This advanced voice AI has phenomenal human-like qualities that we have seen before from companies like Google (Duplex) and OpenAI (Omni). The demo showcases two AI voices named “Miles” (male) and “Maya” (female), and its realism has captivated some users. However, good luck trying the tech yourself. We tried and could only get to a message saying Sesame is trying to scale to capacity. For now, we’ll have to settle for a nice 30-minute demo by the YouTube channel Creator Magic (below).
Sesame’s technology uses a multimodal approach that processes text and audio in a single model, enabling more natural speech synthesis. The method is similar to the one behind OpenAI’s voice models, and the resemblance shows. Despite its near-human quality in isolated tests, the system still struggles with conversational context, pacing, and flow – areas Sesame acknowledges as limitations. Company co-founder Brendan Iribe admits the tech is “firmly in the valley,” a reference to the uncanny valley, but he remains optimistic that improvements will close the gap.
While groundbreaking, the technology has raised significant questions about its societal impact. Reactions have ranged from amazement and excitement to unease and concern. The CSM creates dynamic, natural conversations by incorporating subtle imperfections, like breath sounds, chuckles, and occasional self-corrections. These subtleties add to the realism and could help the tech bridge the uncanny valley in future iterations.
Users have praised the system for its expressiveness, often saying it feels like talking to a real person. Some even mentioned forming emotional connections. However, not everyone has reacted positively to the demo. PCWorld’s Mark Hachman noted that the female version reminded him of an ex-girlfriend. The chatbot asked him questions as if trying to establish “intimacy,” which made him extremely uncomfortable.
“That’s not what I wanted, at all. Maya already had Kim’s mannerisms down scarily well: the hesitations, lowering ‘her’ voice when she confided in me, that sort of thing,” Hachman related. “It wasn’t exactly like [my ex], but close enough. I was so freaked out by talking to this AI that I had to leave.”