1. Though you're seeing this on Windows and Mac, it's more a case of
"so-called newer TTS voices don't give you many params to
manipulate" (read: almost none). DECtalk and Outloud are formant
engines and are an entirely different beast. There has been research
in academia around hybrid engines; see Prof. Sue Hertz's work from
Cornell (she is also the one who created Eloquent, AKA Outloud, in the 1990s).
2. The example you posted sounds fine, but that is also because you're
"cheating" in a way.
3. It does not sound jarring because you have relatively significant
stretches of speech in the same voice.
4. The following will likely sound worse:
    int x, y, z = 0.0;
    char *x = "abc";
5. All that said, switching among voice families might well end up being
what we can do for the newer engines; we made a similar compromise
with math readings in ChromeVox in 2012, with the result being orders
of magnitude poorer than the readings produced by AsTeR using the
DECtalk in 1994.
6. Another param that could be usefully applied, but that will need work
with the newer voices, is spatialization; read about "SOFA" to
understand where the audio world is heading.
7. Hopefully the newer engines will eventually expose some params for
influencing emotion etc.; we even worked on an Emotional Markup
spec about 20 years ago at the W3C, but that went nowhere.
8. And no surprise that a different voice for notifications works well;
that should never be jarring.
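
To make points 3 and 4 concrete, here is a rough sketch that counts how
often the voice would change if every token class in a line were given its
own voice. The token classes, the voice names, and the helper functions are
all made up for illustration; they are not Emacspeak settings or any
engine's API.

```python
import re

# Hypothetical token classes, listed in match-priority order.
TOKEN_SPEC = [
    ("comment", r"/\*.*?\*/"),
    ("string", r'"[^"\n]*"'),
    ("number", r"\d+(?:\.\d+)?"),
    ("type", r"\b(?:int|char|float|double|void)\b"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("punctuation", r"[^\sA-Za-z_0-9]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

# Hypothetical voice names: one voice per token class (an assumption).
VOICE_FOR = {
    "comment": "voice-comment",
    "string": "voice-string",
    "number": "voice-number",
    "type": "voice-type",
    "identifier": "voice-identifier",
    "punctuation": "voice-punctuation",
}


def voice_runs(text):
    """Return (voice, token_count) runs in the order they would be spoken."""
    runs = []
    for match in TOKEN_RE.finditer(text):
        voice = VOICE_FOR[match.lastgroup]
        if runs and runs[-1][0] == voice:
            # Same voice as the previous token: extend the current run.
            runs[-1] = (voice, runs[-1][1] + 1)
        else:
            runs.append((voice, 1))
    return runs


def voice_switches(text):
    """Number of voice changes a listener would hear within text."""
    return max(len(voice_runs(text)) - 1, 0)


if __name__ == "__main__":
    for line in ('int x, y, z = 0.0;',
                 'char *x ="abc";',
                 '/* set up the counters */'):
        print(voice_switches(line), line)
```

Under this made-up mapping, the first declaration changes voice 8 times
across 9 tokens and the second 5 times across 6, while a whole comment is
spoken in one voice with no change at all, which is roughly why long
stretches in a single voice do not sound jarring.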