Subject: Emacspeak discussion list
List archive
- From: Tim Cross <theophilusx AT gmail.com>
- To: John Covici <covici AT ccs.covici.com>
- Cc: Victor Tsaran <vtsaran AT gmail.com>, Parham Doustdar <parham90 AT gmail.com>, Robert Melton <lists AT robertmelton.com>, Emacspeaks <emacspeak AT emacspeak.net>, "T.V Raman" <raman AT google.com>
- Subject: Re: [Emacspeak] TTS Server Implementation Questions
- Date: Wed, 10 Apr 2024 07:19:25 +1000
Look at the key bindings for the various movement commands. This is one of
the great powers of emacs, and emacspeak works with those commands so
that the text is spoken appropriately based on the movement unit. This
also results in the cursor being moved by that unit.
For example, emacs has commands for moving by paragraph, sentence, sexp
(in programming modes), page (where pages are delineated by control-l
characters) and window/screen.
In addition, emacspeak also implements a number of convenience commands
for 'browsing' a buffer of text in various ways, allowing you to move
forward in convenient chunks of text by hitting just one key.
I suspect one reason many experienced emacs and emacspeak users don't
find the problem you have identified much of an issue is that they
have got used to using the various movement commands and other
facilities, such as isearch, to move around and get spoken feedback
as necessary. It is a little like learning to use keyboard navigation
rather than the arrow keys and mouse. There is a bit of a learning
curve, but once you get past it, you really get the benefits and don't
look back.
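The movement-unit idea above can be sketched in miniature. This is a hypothetical illustration in Python (the real implementation advises emacs movement commands in elisp); the names and buffer text here are made up:

```python
# Toy sketch of emacspeak's pattern: a movement command moves point by its
# unit, then speaks exactly that unit. Hypothetical names; the real code
# lives in elisp advice around the emacs movement commands.

BUFFER = "First sentence here. Second sentence follows. Third one ends it."

def next_sentence(pos):
    """Move point past the current sentence; return (new_pos, spoken_text)."""
    end = BUFFER.find(". ", pos)
    if end == -1:
        end = len(BUFFER) - 1
    spoken = BUFFER[pos:end + 1].strip()
    return end + 2, spoken  # point lands at the start of the next sentence

pos = 0
pos, spoken = next_sentence(pos)
print(spoken)  # the unit just crossed is what gets spoken
print(pos)     # and point has moved by that same unit
```

The same shape applies to paragraph, sexp, and page movement: speech and cursor always advance together, which is why a separate index feedback channel is rarely missed.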
John Covici <covici AT ccs.covici.com> writes:
> The problem I was looking at is that when reading a long buffer, if I
> stop reading, the cursor is still where I started.
>
> On Tue, 09 Apr 2024 14:42:53 -0400,
> Victor Tsaran wrote:
>>
>> I guess, the question stands: what user-facing problem are we trying to
>> solve?
>>
>>
>> On Tue, Apr 9, 2024 at 3:14 AM Parham Doustdar <emacspeak AT emacspeak.net>
>> wrote:
>>
>> > That's true, Emacspeak doesn't currently "read" from the speech server
>> > process as far as I've seen; it only "writes" to it. Fixing that isn't
>> > impossible, but it is definitely time-consuming.
>> > The other concrete issue is that, last time I checked, console screen
>> > readers read all the text in one chunk. They don't use the aural CSS
>> > (forgive me if I don't use the correct name here) that Emacspeak has,
>> > which requires you to play auditory icons, speak text with different
>> > pitch, and insert pauses. All of this means that you have to do extra
>> > heavy lifting to really track the index, because the index you get
>> > back from the TTS engine isn't simply a position in the buffer -- it
>> > is just the position in the current chunk of text it most recently
>> > received.
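The bookkeeping Parham describes -- mapping an engine-reported, chunk-relative index back to a buffer position -- might look roughly like this. A hypothetical sketch with made-up names, not emacspeak code:

```python
# Hypothetical sketch: the TTS engine reports indices relative to the chunk
# it was last handed, so the client must remember where each queued chunk
# started in the buffer and add the offsets back together.

class IndexTracker:
    def __init__(self):
        self.chunks = []  # (buffer_start, text) for each chunk sent to TTS

    def queue(self, buffer_start, text):
        self.chunks.append((buffer_start, text))
        return len(self.chunks) - 1  # chunk id the engine would echo back

    def buffer_position(self, chunk_id, index_in_chunk):
        """Translate an engine-reported (chunk, index) into a buffer position."""
        start, _text = self.chunks[chunk_id]
        return start + index_in_chunk

tracker = IndexTracker()
cid = tracker.queue(120, "A sentence sent to the engine.")
# Engine later reports it stopped at index 10 of that chunk:
print(tracker.buffer_position(cid, 10))
```

Audio icons, pitch changes, and pauses all add non-text items to the queue, which is what makes the real accounting heavier than this sketch suggests.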
>> > So that's why I'm curious whether we really think it's worth it. It
>> > could be, or not -- I don't have a strong opinion -- but I'm also
>> > realizing that in our community we don't really have a good mechanism
>> > to discuss and decide on things like this.
>> >
>> > On Tue, Apr 9, 2024 at 8:35 AM Tim Cross <theophilusx AT gmail.com> wrote:
>> >
>> >>
>> >> You are overlooking one critical component which explains why adding
>> >> indexing support is a non-trivial exercise that would require a
>> >> complete redesign of the existing TTS interface model.
>> >>
>> >> For indexing information to be of any use, it has to be fed back into
>> >> the client and used by the client -- for example, to tell the client
>> >> to update/move the cursor to the last position spoken.
>> >>
>> >> There is absolutely no support for this data to be fed back into the
>> >> current system. The current TTS interface has data flowing in only one
>> >> direction: from emacs to emacspeak, from emacspeak to the TTS server,
>> >> and from the TTS server to the TTS synthesizer. There is no existing
>> >> mechanism to feed information (i.e. index positions) back from the TTS
>> >> engine to emacs. While getting this information from the TTS engine
>> >> into the TTS server is probably reasonably easy, there is no existing
>> >> channel to feed that information up into Emacspeak.
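To make the missing piece concrete: a feedback channel would mean the client reads from the server process as well as writing to it. A hypothetical sketch, simulated here with in-process queues rather than a real TTS process:

```python
# Hypothetical sketch of a bidirectional TTS link. Today's emacspeak model
# has only the downstream direction; the upstream queue and its reader are
# exactly the parts that do not exist.
import queue
import threading

downstream = queue.Queue()  # client -> server (exists today)
upstream = queue.Queue()    # server -> client (the missing channel)

def fake_server():
    """Pretend TTS server: consumes each chunk, reports an index mark back."""
    while True:
        msg = downstream.get()
        if msg is None:
            break
        chunk_id, text = msg
        # ... synthesis would happen here ...
        upstream.put(("mark", chunk_id, len(text)))  # index reached

server = threading.Thread(target=fake_server)
server.start()
downstream.put((0, "hello world"))
downstream.put(None)
server.join()
event = upstream.get()
print(event)
```

In the real system the client side is elisp talking to a subprocess over stdin, so the missing piece is a process filter that parses such events and moves point, plus a protocol for the servers to emit them.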
>> >>
>> >> Not only would it be necessary to define and implement a whole new
>> >> model to incorporate this feedback, in addition to working with TTS
>> >> engines which do not provide indexing information; you would also
>> >> likely need to implement some sort of multiple speech cursor tracking
>> >> so that the system can track cursor positions in different buffers.
>> >>
>> >> The reason this sort of functionality seems easy in systems like
>> >> speakup or speech-dispatcher is because those systems were designed
>> >> with this functionality in mind. It is incorporated into the base
>> >> design and is part of the various communication protocols the design
>> >> implements. Adding this functionality is not something which can just
>> >> be 'tacked on'.
>> >>
>> >> The good news of course is that being open source, anyone can go ahead
>> >> and define a new interface model and add indexing capability. However,
>> >> it may be worth considering that it has taken 30 years of development to
>> >> get the current model to where it is at, so I think you can expect a
>> >> pretty steep climb initially!
>> >>
>> >> John Covici <covici AT ccs.covici.com> writes:
>> >>
>> >> > It's a lot simpler -- indexing is supposed to simply arrange things
>> >> > so that when you are reading a buffer and you stop reading, the
>> >> > cursor will be at or near the point where you stopped. Speakup has
>> >> > had this for a long time, and that is why I use it on Linux, but
>> >> > it's only good for the virtual console. Now, speech dispatcher has
>> >> > indexing built in, so if you connect to that and use one of the
>> >> > supported synthesizers, indexing works correctly, and I don't see
>> >> > any performance hit. I think all the client has to do is connect to
>> >> > speech dispatcher, but check me on this.
>> >> >
>> >> > On Mon, 08 Apr 2024 08:25:27 -0400,
>> >> > Robert Melton wrote:
>> >> >>
>> >> >> Is indexing supposed to be per reading block, or global? Is the
>> >> >> idea that you can be reading a buffer, go to another buffer, read
>> >> >> some of it, then come back and continue? I.e., an index per
>> >> >> "reading block"?
>> >> >>
>> >> >> Assuming it is global for simplicity, it is still a heavy lift to
>> >> >> implement on Mac and Windows.
>> >> >>
>> >> >> As they do not natively report back as words are spoken, you can
>> >> >> only get this behavior at an "utterance" level, by installing
>> >> >> hooks and callbacks and tracking those. With that, you would
>> >> >> additionally need to keep copies of the future utterances, even if
>> >> >> they were already queued with the TTS.
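The utterance-level bookkeeping Robert describes could be sketched like this. A hypothetical illustration; the actual platform callback APIs on Mac and Windows differ:

```python
# Hypothetical sketch: with only utterance-level "finished" callbacks, the
# client must keep its own copies of queued utterances so it knows which
# buffer region each completed utterance covered.

class UtteranceQueue:
    def __init__(self):
        self.pending = []       # copies of (buffer_start, text) still queued
        self.last_position = 0  # best-known speech cursor

    def enqueue(self, buffer_start, text):
        self.pending.append((buffer_start, text))
        # real code would also hand `text` to the platform TTS here

    def on_utterance_done(self):
        """Platform callback: the oldest queued utterance finished speaking."""
        start, text = self.pending.pop(0)
        self.last_position = start + len(text)
        return self.last_position

q = UtteranceQueue()
q.enqueue(0, "First utterance. ")
q.enqueue(17, "Second utterance.")
print(q.on_utterance_done())  # speech cursor after the first utterance
```

Note the granularity: the cursor only advances per utterance, never per word, which is exactly the limitation being discussed.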
>> >> >>
>> >> >> Considered from the POV of an index per reading block, you then
>> >> >> need to find ways to identify each block and its position, index
>> >> >> them, and continue reading.
>> >> >>
>> >> >> Sounds neat, but at least for my servers, right now, the juice
>> >> >> isn't worth the squeeze; I am still trying to get basic stuff like
>> >> >> pitch multipliers working on Windows via wave mangling, and other
>> >> >> basic features, hehe.
>> >> >>
>> >> >> > On Apr 8, 2024, at 05:20, Parham Doustdar <parham90 AT gmail.com> wrote:
>> >> >> >
>> >> >> > I understand. My question isn't whether it's possible though, or
>> >> >> > how difficult it would be, or the steps we'd have to take to
>> >> >> > implement it. My question is more about whether the use cases we
>> >> >> > have today make it worth it to reconsider. All other questions
>> >> >> > we can apply the wisdom of the community to solve, if we were
>> >> >> > convinced that the effort would be worth it.
>> >> >> > For me, the way I've got around this is to use the next/previous
>> >> >> > paragraph commands. The chunks are small enough that I can "zoom
>> >> >> > in" if I want, and yet large enough that I don't have to
>> >> >> > constantly hit next-line.
>> >> >> > Sent from my iPhone
>> >> >> >
>> >> >> >> On 8 Apr 2024, at 11:13, Tim Cross <theophilusx AT gmail.com> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> This is extremely unlikely to be implemented. It is non-trivial
>> >> >> >> and would require a significant re-design of the whole
>> >> >> >> interface and model of operation. It isn't as simple as just
>> >> >> >> getting index information from the TTS servers which support
>> >> >> >> it. That information then has to be fed backwards to Emacs
>> >> >> >> through some mechanism which currently does not exist, and
>> >> >> >> would result in a far more complicated interface/model.
>> >> >> >>
>> >> >> >> As Raman said, the decision not to have this was not simply an
>> >> >> >> oversight or due to lack of time. It was a conscious design
>> >> >> >> decision. What you're asking for isn't simply an enhancement;
>> >> >> >> it is a complete redesign of the TTS interface model.
>> >> >> >>
>> >> >> >> "Parham Doustdar" (via emacspeak Mailing List) <emacspeak AT emacspeak.net> writes:
>> >> >> >>
>> >> >> >>> I agree. I'm not sure which TTS engines support it. Maybe,
>> >> >> >>> just like notification streams are supported in some servers,
>> >> >> >>> we can implement this feature for engines that support it?
>> >> >> >>> Sent from my iPhone
>> >> >> >>>
>> >> >> >>>>> On 8 Apr 2024, at 10:24, John Covici <emacspeak AT emacspeak.net> wrote:
>> >> >> >>>>
>> >> >> >>>> I know this might be controversial, but indexing would be
>> >> >> >>>> very useful to me. Sometimes I read long buffers, and when I
>> >> >> >>>> stop the reading, the cursor is still where I started, so
>> >> >> >>>> there is no real way to do this adequately -- I would not
>> >> >> >>>> mind if it were just down to the line, rather than individual
>> >> >> >>>> words, but it would make emacspeak lots nicer for me.
>> >> >> >>>>
>> >> >> >>>>> On Fri, 05 Apr 2024 15:39:15 -0400,
>> >> >> >>>>> "T.V Raman" (via emacspeak Mailing List) wrote:
>> >> >> >>>>>
>> >> >> >>>>> note that the other primary benefit of tts_sync_state as a
>> >> >> >>>>> single call is that it ensures atomicity, i.e. all of the
>> >> >> >>>>> state gets set at one shot from the perspective of the elisp
>> >> >> >>>>> layer, so you hopefully never get TTS that has its state
>> >> >> >>>>> partially set.
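The atomicity Raman attributes to tts_sync_state -- all settings applied in one shot rather than piecemeal -- can be sketched as follows. The field names here are hypothetical; only the single-call-under-one-lock shape is the point:

```python
# Hypothetical sketch of why a single sync-state call matters: every
# setting is applied under one lock, so a concurrent reader never observes
# rate updated while punctuation mode is still the old value.
import threading

class TTSState:
    def __init__(self):
        self._lock = threading.Lock()
        self.rate = 175
        self.punctuation = "some"
        self.split_caps = False

    def sync_state(self, rate, punctuation, split_caps):
        """Set the whole state atomically, mirroring a tts_sync_state call."""
        with self._lock:
            self.rate = rate
            self.punctuation = punctuation
            self.split_caps = split_caps

    def snapshot(self):
        """Read the whole state atomically."""
        with self._lock:
            return (self.rate, self.punctuation, self.split_caps)

state = TTSState()
state.sync_state(300, "all", True)
print(state.snapshot())
```

Setting each field via a separate call would open a window in which a speaking thread sees a half-updated configuration; the single call closes that window.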
>> >> >> >>>>>
>> >> >> >>>>> Robert Melton writes:
>> >> >> >>>>>> On threading: it is all concurrent, with lots of fun
>> >> >> >>>>>> protecting the state.
>> >> >> >>>>>>
>> >> >> >>>>>> On language and voice, I was thinking of them as a tree,
>> >> >> >>>>>> language/voice, as this is how Windows and macOS seem to
>> >> >> >>>>>> provide them.
>> >> >> >>>>>>
>> >> >> >>>>>> ----
>> >> >> >>>>>>
>> >> >> >>>>>> Oh, one last thing: should TTS server implementations
>> >> >> >>>>>> return a \n after a command is complete, or is returning
>> >> >> >>>>>> nothing acceptable?
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>> On Apr 5, 2024, at 14:01, T.V Raman <raman AT google.com> wrote:
>> >> >> >>>>>>>
>> >> >> >>>>>>> And do spend some time thinking of atomicity and
>> >> >> >>>>>>> multithreaded systems, e.g. ask yourself the question "how
>> >> >> >>>>>>> many threads of execution are active at any given time".
>> >> >> >>>>>>> Hint: the answer isn't as simple as "just one, because my
>> >> >> >>>>>>> server doesn't use threads".
>> >> >> >>>>>>>> Raman --
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Thanks so much, that clarifies a bunch. A few questions
>> >> >> >>>>>>>> on the language / voice support.
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Does the TTS server maintain an internal list and switch
>> >> >> >>>>>>>> through it, or does it send the list to the lisp layer in
>> >> >> >>>>>>>> a way I have missed?
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Would it be useful to have a similar feature for voices,
>> >> >> >>>>>>>> where first you pick the right language, then you pick
>> >> >> >>>>>>>> your preferred voice, and then maybe it is stored in a
>> >> >> >>>>>>>> defcustom and sent next time as (set_lang lang:voice t)?
>> >> >> >>>>>>>>
>> >> >> >>>>>>>>
>> >> >> >>>>>>>>> On Apr 5, 2024, at 13:10, T.V Raman <raman AT google.com> wrote:
>> >> >> >>>>>>>>>
>> >> >> >>>>>>>>> If your TTS supports more than one language, the TTS API
>> >> >> >>>>>>>>> exposes these as a list; these calls loop through the
>> >> >> >>>>>>>>> list (dectalk, espeak, outloud)
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> --
>> >> >> >>>>>>>> Robert "robertmeta" Melton
>> >> >> >>>>>>>> lists AT robertmelton.com
>> >> >> >>>>>>>>
>> >> >> >>>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>> --
>> >> >> >>>>>> Robert "robertmeta" Melton
>> >> >> >>>>>> lists AT robertmelton.com
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Emacspeak discussion list -- emacspeak AT emacspeak.net
>> >> >> >>>>> To unsubscribe send email to:
>> >> >> >>>>> emacspeak-request AT emacspeak.net with a subject of: unsubscribe
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>> Your life is like a penny. You're going to lose it. The
>> >> >> >>>> question is: How do you spend it?
>> >> >> >>>>
>> >> >> >>>> John Covici wb2una
>> >> >> >>>> covici AT ccs.covici.com
>> >> >> >>>
>> >> >>
>> >> >> --
>> >> >> Robert "robertmeta" Melton
>> >> >> lists AT robertmelton.com
>> >> >>
>> >> >>
>> >>
>> >
>>
>>
>> --
>>
>> --- --- --- ---
>> Find my music on
>> Youtube: http://www.youtube.com/c/victortsaran
>> <http://www.youtube.com/vtsaran>
>> Spotify: https://open.spotify.com/artist/605ZF2JPei9KqgbXBqYA16
>> Band Camp: http://victortsaran.bandcamp.com