“Like many genAI tools, this podcast generated by GoogleLM really impresses on first listen. It’s a huge step up from text-to-audio tools like ElevenLabs or ChatGPT’s voice mode. There are two people interacting with each other very naturally and realistically. I immediately had my kids listen to it and asked them whether they could tell if the voices belonged to real humans or not. “Well, if you ask me like that, it’s definitely generated by computers,” they replied, laughing. But after listening to this podcast for a while, it invariably starts to feel very “robotic”. That’s mostly because the interactions are fixed and repetitive in form. In real human conversation, people don’t always go back and forth so perfectly on each and every turn. One lets the other person talk for minutes sometimes, or even longer. Especially in a long-form podcast, such frequent interruptions are distracting. But of course, this “robotic” feel can be “trained” away. It probably won’t take long. Soon, I will have my kids listen to two podcasts (I’m also iterating), one generated by machines and the other by humans, and they won’t be able to tell which is which. What does that mean for them? I don’t know.”
ninaxiangji via Apple Podcasts ·
United States of America ·
10/05/24