1. Though you're seeing this on Windows and Mac, it is more a case of the so-called newer TTS voices not giving you many params to manipulate (read: almost none). DECtalk and Outloud are formant engines and are an entirely different beast. There has been research in academia around hybrid engines; see Prof. Sue Hertz's work from Cornell (she is also the one who created Eloquent, AKA Outloud, in the 1990s).
2. The example you posted sounds fine, but that is also because you're "cheating" in a way.
3. It does not sound jarring because you have relatively long stretches of speech in the same voice.
4. The following will likely sound worse, since the voice would change every token or two:
   int x, y, z = 0.0; char *x = "abc";
5. All that said, switching among voice families might well end up being what we can do for the newer engines; we made a similar compromise with math readings in ChromeVox in 2012, with the result being orders of magnitude poorer than the readings produced by AsTeR using the DECtalk in 1994.
6. Another param that could be usefully applied, but that will need work with the newer voices, is spatialization; read about "SOFA" to understand where the audio world is heading.
7. Hopefully the newer engines will eventually expose some params for influencing emotion etc. We even worked on an Emotional Markup spec about 20 years ago at the W3C, but that went nowhere.
8. And no surprise that a different voice for notifications works well; that should never be jarring --
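
To make points 3 and 4 concrete, here is a rough Python sketch that counts how often the voice would change when each token class gets its own voice. Everything in it is an assumption made for illustration: the token classes, the tiny keyword set, and the voice names are hypothetical and do not correspond to any real TTS engine or to the mapping used in the example under discussion.

```python
import re

# Hypothetical token-class -> voice mapping; the voice names are placeholders,
# not identifiers from any real speech engine.
VOICE_FOR = {
    "keyword": "voice-a",
    "string": "voice-b",
    "number": "voice-c",
    "other": "default",
}

# Deliberately tiny keyword set, just enough for the snippet in point 4.
KEYWORDS = {"int", "char", "float"}

# Order matters: string literals, then numbers, then words, then punctuation.
TOKEN_RE = re.compile(r'"[^"]*"|\d+(?:\.\d+)?|\w+|[^\w\s]')


def classify(token):
    """Return the (assumed) token class used to pick a voice."""
    if token in KEYWORDS:
        return "keyword"
    if token.startswith('"'):
        return "string"
    if token[0].isdigit():
        return "number"
    return "other"


def voice_runs(text):
    """Group consecutive tokens that would be spoken in the same voice."""
    runs = []
    for token in TOKEN_RE.findall(text):
        voice = VOICE_FOR[classify(token)]
        if runs and runs[-1][0] == voice:
            runs[-1][1].append(token)
        else:
            runs.append((voice, [token]))
    return runs


def switch_count(text):
    """Number of voice changes a listener would hear for this text."""
    return max(len(voice_runs(text)) - 1, 0)


if __name__ == "__main__":
    snippet = 'int x, y, z = 0.0; char *x = "abc";'
    for voice, tokens in voice_runs(snippet):
        print(f"{voice:8} {' '.join(tokens)}")
    print("switches:", switch_count(snippet))
```

Under these assumptions the snippet from point 4 yields 7 voice switches across 15 tokens, with most runs only one token long, whereas a plain sentence with no keywords, strings, or numbers stays in one voice throughout. That is the difference between long same-voice stretches and the rapid alternation that is likely to sound jarring.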