emacspeak - Re: [Emacspeak] TTS Server Implementation Questions

Subject: Emacspeak discussion list

List archive

Re: [Emacspeak] TTS Server Implementation Questions

From: John Covici <covici AT ccs.covici.com>
To: Victor Tsaran <vtsaran AT gmail.com>
Cc: Parham Doustdar <parham90 AT gmail.com>, Tim Cross <theophilusx AT gmail.com>, Robert Melton <lists AT robertmelton.com>, Emacspeaks <emacspeak AT emacspeak.net>, "T.V Raman" <raman AT google.com>
Subject: Re: [Emacspeak] TTS Server Implementation Questions
Date: Tue, 09 Apr 2024 15:05:19 -0400
Organization: Covici Computer Systems

The problem I was looking at, is when reading a long buffer, if I stop
reading the cursor is still where I started.

On Tue, 09 Apr 2024 14:42:53 -0400,
Victor Tsaran wrote:
>
> [1 <text/plain; UTF-8 (quoted-printable)>]
> I guess, the question stands: what user-facing problem are we trying to
> solve?
>
>
> On Tue, Apr 9, 2024 at 3:14 AM Parham Doustdar <emacspeak AT emacspeak.net>
> wrote:
>
> > That's true, Emacspeak doesn't currently "read" from the speech server
> > process as far as I've seen, it only "writes" to it. Fixing that isn't
> > impossible, but definitely time consuming.
> > The other concrete issue is that last time I checked, console screen
> > readers read all the text in one chunk. They don't use the audio CSS
> > (forgive me if I don't use the correct name here) that Emacspeak has,
> > which
> > requires you to play audio icons, speak text with different pitch, and
> > pauses. All of this means that you have to do extra heavy-lifting to
> > really
> > track the index, because the index you get back from the TTS engine isn't
> > simply a position in the buffer -- it is just the position in the current
> > chunk of text it has recently received.
> > So that's why I'm curious if we really think it's worth it. It could be,
> > or not, I'm not opinionated, but I'm also realizing that in our community,
> > we don't really have a good mechanism to discuss and decide on things like
> > this.
> >
> > On Tue, Apr 9, 2024 at 8:35 AM Tim Cross <theophilusx AT gmail.com> wrote:
> >
> >>
> >> You are overlooking one critical component which explains why adding
> >> indxing support is a non-trivial exercise which would require a complete
> >> redesign of the existing TTS interface model.
> >>
> >> For indexing information to be of any use, it has to be fed back into the
> >> client and used by the client. For example, tell the client to
> >> update/move the cursor to the last position spoken.
> >>
> >> There is absolutely no support for this data to be fed back into the
> >> current system. The current TTS interface has data flowing in only one
> >> direction, from emacs to emacpseak and from emacspeak to the TTS server
> >> and form the tts server to the tts synthesizer. There is no existing
> >> mechanism to feed information (i.e. index positions) back from the TTS
> >> engine to emacs. While getting this information from the TTS engine into
> >> the TTS server is probably reasonably easy, there is no existing channel
> >> to feed that information up into Emacspeak.
> >>
> >> Not only would it be necessary to define and implement a whole new model
> >> to incorporate this feedback, in addition to also working with TTS
> >> engines which do not provide indexing information, you would also likely
> >> need to implement some sort of multi speech cursor tracking so that the
> >> system can track cursor positions in different buffers.
> >>
> >> The reason this sort of functionality seems easy in systems like speakup
> >> or speech-dispatcher is because those systems were designed with this
> >> functionality. It is incprporated into the base design and part of the
> >> various communication protocols the design implement. Adding this
> >> functionality is not something which can just be 'tacked on'.
> >>
> >> The good news of course is that being open source, anyone can go ahead
> >> and define a new interface model and add indexing capability. However,
> >> it may be worth considering that it has taken 30 years of development to
> >> get the current model to where it is at, so I think you can expect a
> >> pretty steep climb initially!
> >>
> >> John Covici <covici AT ccs.covici.com> writes:
> >>
> >> > Its a lot simpler -- indexing is supposed to simply arrange things so
> >> > that when reading a buffer, and you stop reading, the cursor will be
> >> > at or near the point where you stopped. Speakup has had this for a
> >> > long time and that is why I use it on Linux. But its only good for
> >> > the virtual console. Now speech dispatcher has indexinng built in, so
> >> > if you connect to that and use one of the supported synthesizers,
> >> > indexing works correctly and I don't see any performance hit. I think
> >> > all the client has to do is connect to speech dispatcher, but check me
> >> > on this.
> >> >
> >> > On Mon, 08 Apr 2024 08:25:27 -0400,
> >> > Robert Melton wrote:
> >> >>
> >> >> Is indexing supposed to be like per reading block, or like one
> >> global? Is the idea
> >> >> that you can be reading a buffer, go to another buffer, read some of
> >> it, then come
> >> >> back and continue? IE: Index per "reading block"?
> >> >>
> >> >> Assuming it is global for simplicity, it is still a heavy lift for
> >> implementation on
> >> >> Mac and Windows.
> >> >>
> >> >> As they do not natively report back as words are spoken, now
> >> >> you can get this behavior at an "Utterance" level, by installing hooks
> >> and callbacks
> >> >> and tracking those. With that you would need to additionally keep
> >> copies of the future
> >> >> utterances, even if they already where queued with the TTS.
> >> >>
> >> >> Considered from the POV of index per reading block, then you need to
> >> find ways to ident
> >> >> each one and its position and index them and continue reading.
> >> >>
> >> >> Sounds neat, but at least for my servers, right now, the juice isn't
> >> worth the sqeeze, I
> >> >> am still trying to get basic stuff like pitch multipliers working on
> >> windows via wave
> >> >> mangling and other basic features, hehe.
> >> >>
> >> >> > On Apr 8, 2024, at 05:20, Parham Doustdar <parham90 AT gmail.com>
> >> wrote:
> >> >> >
> >> >> > I understand. My question isn't whether it's possible though, or how
> >> difficult it
> >> >> > would be, or the steps we'd have to take to implement it.
> >> >> > My question is more about whether the use cases we have today make
> >> it worth it to
> >> >> > reconsider. All other questions we can apply the wisdom of the
> >> community to solve, if
> >> >> > we were convinced that the effort would be worth it.
> >> >> > For me, the way I've got around this is to use the next/previous
> >> paragraph
> >> >> > commands. The chunks are good small enough that I can "zoom in" if I
> >> want, and yet
> >> >> > large enough that I don't have to constantly hit next-line.
> >> >> > Sent from my iPhone
> >> >> >
> >> >> >> On 8 Apr 2024, at 11:13, Tim Cross <theophilusx AT gmail.com> wrote:
> >> >> >>
> >> >> >> 
> >> >> >> This is extremely unlikely to be implemented. It is non-trivial and
> >> >> >> would require a significant re-design of the whole interface and
> >> model
> >> >> >> of operation. It isn't as simple as just getting index information
> >> from
> >> >> >> the TTS servers which support it. That information has to then be
> >> fed
> >> >> >> backwards to Emacs through some mechanism which currently does not
> >> >> >> exist and would result in a far more complicated interface/model.
> >> >> >>
> >> >> >> As Raman said, the decision not to have this was not simply an
> >> oversight
> >> >> >> or due to lack of time. It was a conscious design decision. What
> >> your
> >> >> >> asking for isn't simply an enhancement, it is a complete redesign
> >> of the
> >> >> >> TTS interface model.
> >> >> >>
> >> >> >> "Parham Doustdar" (via emacspeak Mailing List) <
> >> emacspeak AT emacspeak.net> writes:
> >> >> >>
> >> >> >>> I agree. I'm not sure which TTS engines support it. Maybe, just
> >> like notification streams
> >> >> >>> are supported in some servers, we can implement this feature for
> >> engines that support it?
> >> >> >>> Sent from my iPhone
> >> >> >>>
> >> >> >>>>> On 8 Apr 2024, at 10:24, John Covici <emacspeak AT emacspeak.net>
> >> wrote:
> >> >> >>>>
> >> >> >>>> I know this might be contraversial, but, indexing would be very
> >> useful
> >> >> >>>> to me, sometimes I read long buffers and when I stop the
> >> reading, the
> >> >> >>>> cursor is still where I started, so no real way to do this
> >> adequately
> >> >> >>>> -- I would not mind if it were just down to the line, rather than
> >> >> >>>> individual words, but it would make emacspeak lots nicer for me.
> >> >> >>>>
> >> >> >>>>> On Fri, 05 Apr 2024 15:39:15 -0400,
> >> >> >>>>> "T.V Raman" (via emacspeak Mailing List) wrote:
> >> >> >>>>>
> >> >> >>>>> [1 <text/plain; us-ascii (7bit)>]
> >> >> >>>>> as a single call is that it ensures atomicity i.e. all of the
> >> state
> >> >> >>>>> gets set at one shot from the perspective of the elisp layer, so
> >> you
> >> >> >>>>> hopefully never get TTS that has its state partially set.
> >> >> >>>>> note that the other primary benefit of tts_sync_state
> >> >> >>>>>
> >> >> >>>>> Robert Melton writes:
> >> >> >>>>>> On threading. It is all concurrent, lots of fun protecting of
> >> the state.
> >> >> >>>>>>
> >> >> >>>>>> On language and voice, I was thinking of them as a tree,
> >> language/voice,
> >> >> >>>>>> as this is how Windows and MacOS seem to provide them.
> >> >> >>>>>>
> >> >> >>>>>> ----
> >> >> >>>>>>
> >> >> >>>>>> Oh, one last thing. Should TTS Server implementations be
> >> returning a \n
> >> >> >>>>>> after command is complete, or is just returning nothing
> >> acceptable?
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>> On Apr 5, 2024, at 14:01, T.V Raman <raman AT google.com> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> And do spend some time thinking of atomicity and multithreaded
> >> systems,
> >> >> >>>>>>> e.g. ask yourself the question "how many threads of execution
> >> are active
> >> >> >>>>>>> at any given time"; Hint: the answer isn't as simple as "just
> >> one
> >> >> >>>>>>> because my server doesn't use threads". > Raman--
> >> >> >>>>>>>>
> >> >> >>>>>>>> Thanks so much, that clarifies a bunch. A few questions on
> >> >> >>>>>>>> the
> >> >> >>>>>>>> language / voice support.
> >> >> >>>>>>>>
> >> >> >>>>>>>> Does the TTS server maintain an internal list and switch
> >> through
> >> >> >>>>>>>> it or does it send the list the lisp in a way I have missed?
> >> >> >>>>>>>>
> >> >> >>>>>>>> Would it be useful to have a similar feature for voices,
> >> >> >>>>>>>> being
> >> >> >>>>>>>> first you pick right language, then you pick preferred voice
> >> >> >>>>>>>> then maybe it is stored in a defcustom and sent next time as
> >> >> >>>>>>>> (set_lang lang:voice t)
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>> On Apr 5, 2024, at 13:10, T.V Raman <raman AT google.com>
> >> wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> If your TTS supports more than one language, the TTS API
> >> exposes these
> >> >> >>>>>>>>> as a list; these calls loop through the list
> >> (dectalk,espeak, outloud)
> >> >> >>>>>>>>
> >> >> >>>>>>>> --
> >> >> >>>>>>>> Robert "robertmeta" Melton
> >> >> >>>>>>>> lists AT robertmelton.com
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>> --
> >> >> >>>>>> Robert "robertmeta" Melton
> >> >> >>>>>> lists AT robertmelton.com
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> [2 <text/plain; UTF-8 (8bit)>]
> >> >> >>>>> Emacspeak discussion list -- emacspeak AT emacspeak.net
> >> >> >>>>> To unsubscribe send email to:
> >> >> >>>>> emacspeak-request AT emacspeak.net with a subject of: unsubscribe
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>> Your life is like a penny. You're going to lose it. The
> >> question is:
> >> >> >>>> How do
> >> >> >>>> you spend it?
> >> >> >>>>
> >> >> >>>> John Covici wb2una
> >> >> >>>> covici AT ccs.covici.com
> >> >> >>>> Emacspeak discussion list -- emacspeak AT emacspeak.net
> >> >> >>>> To unsubscribe send email to:
> >> >> >>>> emacspeak-request AT emacspeak.net with a subject of: unsubscribe
> >> >> >>>
> >> >> >>> Emacspeak discussion list -- emacspeak AT emacspeak.net
> >> >> >>> To unsubscribe send email to:
> >> >> >>> emacspeak-request AT emacspeak.net with a subject of: unsubscribe
> >> >>
> >> >> --
> >> >> Robert "robertmeta" Melton
> >> >> lists AT robertmelton.com
> >> >>
> >> >>
> >>
> > Emacspeak discussion list -- emacspeak AT emacspeak.net
> > To unsubscribe send email to:
> > emacspeak-request AT emacspeak.net with a subject of: unsubscribe
> >
>
>
> --
>
> --- --- --- ---
> Find my music on
> Youtube: http://www.youtube.com/c/victortsaran
> <http://www.youtube.com/vtsaran>
> Spotify: https://open.spotify.com/artist/605ZF2JPei9KqgbXBqYA16
> Band Camp: http://victortsaran.bandcamp.com
> [2 <text/html; UTF-8 (quoted-printable)>]

--
Your life is like a penny. You're going to lose it. The question is:
How do
you spend it?

John Covici wb2una
covici AT ccs.covici.com

Re: [Emacspeak] TTS Server Implementation Questions, (continued)

List archive

Re: [Emacspeak] TTS Server Implementation Questions