DVICE has an excellent video showing the difference between text-to-speech and what we humans call “neurons-to-speech.” As this brief scene from Blade Runner, acted out by an iPod Shuffle and a Kindle 2, makes clear, the Authors Guild is as crazy as a sack of beetles in a windstorm.
TTS hasn’t improved for one good reason: the human voice is just fine for reading out text, and through the use of simple synthesis you can say almost anything you want. Garmin, for example, uses a nice Australian woman to synthesize everything its GPS devices have to say. Although you’ll get a few clinkers in there where the software can’t quite translate a phoneme, it’s mostly correct.
What DVICE has quite cleverly shown is that TTS devices are complex but failed systems. The human ear loves the human voice just as the human eye loves the human face – ask any five-month-old. They’ll smile and coo at anything that has two eyes and doesn’t sound like a broken Speak ‘n’ Spell. But they love real human voices and faces. There is nothing better. To essentially outlaw TTS on the Kindle 2 is analogous to outlawing voices in toys because they take the place of a loving, caring parent. If you find TTS – or a Fisher-Price stuffed animal that talks – a strong analog to the human voice, you have other issues.

I don’t know if the assessment of the Authors Guild is necessarily correct, but please don’t assume that I agree with them at any rate.
I don’t think anyone would tell you that Kindle 2 is as good as a human reader. But to not have an eye on the future about this, and assume that it’s not going to improve in the next few years, is silly. They can’t necessarily let this precedent stand as the gap closes.
The thing that I don’t agree with them about is the idea that TTS means a lot of lost sales in the long run. They do get royalties on the audio book version, but I don’t think a lot of people are going to pay for a text and then pay again for a spoken version.
I don’t think there is any question the technology currently lacks the range of true human emotion. It’s arguable that many humans lack the range – that’s why actors make money.
The consensus at this point (among people I’ve talked to) seems to be that the Authors Guild drew a preemptive line in the sand with the expectation that the technology will improve to the point where it becomes a threat. It’s not there yet, but they want to fight this while people don’t care. Maybe it will never get there. But it has matured rapidly in the past couple of years, and looking ahead isn’t unreasonable.
To most people right now, it’s just an irrelevant if somewhat bizarre fight over obscure payments. In the future, assuming TTS matures to a widely useful level, it would appear as greedy authors restricting the advances of technology at the expense of consumers. It would be a tough fight to win, in terms of PR if nothing else.
The best analogy I can come up with (by no means perfect) is this: if the RIAA had gone after IRC users the way they went after Limewire users, they would have avoided a lot of nastiness by stopping a fringe movement before it became mainstream.
But yeah, original point stands. No one who isn’t trying to sell product is arguing that the current tech is a valid human substitute.
From the Engadget interview with Paul Aiken (executive director of the Authors Guild):
“it’s made a generational leap — it’s much better than it was. What we’re looking at is the trend here, where it’s headed, how good will it be three, four, five years out from now and the threat that might pose to the audiobook industry.”
“Amazon can upgrade the software anytime they like; for another, whether or not this poses an immediate threat to the current audiobook industry, text-to-speech still is and should be a legitimate market for authors and publishers.”
My point is only that the Authors Guild’s concerns aren’t about current tech. At all. Even slightly.
Which doesn’t mean I agree with them. It just means articles and videos like this aren’t really addressing the issue.
1 more quote from the interview, because it’s relevant and I’m bored…
Q: So your fundamental complaint with the Kindle 2 right now is that it is creating audiobooks. It’s replacing the market for audiobooks.
A: No, let me be clear: it has the _potential_ to replace audiobooks. I don’t know if it’s good enough at the moment to do that. It certainly could in the future with one or two generations of software, I don’t know how quick that’ll happen. It could have a significant impact on the audiobook market.
nice articles
thanks
Oh man, that video is hilarious.
TTS has improved significantly over the past few years – just not to the point where you can’t tell the difference between humans and computers.
I agree that TTS isn’t good enough to replace a talented human reading a full novel.
But it *is* good enough for certain “on-the-go” applications where the benefit of being able to “multitask” is strong (e.g. GPS navigation systems and talking mobile news applications ;)
Where can I find an English-language TTS engine?
What’s the best one?
Please give me URLs.
Thanks for your answers, Andrew
http://www.nuance.com
Hello everybody,
Does anybody know how the Kindle’s “Text To Speech” is implemented? I’ve read that in the end they don’t include this functionality, but I’m wondering whether they’ve done it in hardware (with a TTS SoC like Sensory’s) or in software.
Thank you!
DGS