Speaking as someone who has relied on emacspeak for many and varied tasks, both work and pleasure, though I'm not from the programming world, I have never missed indexing. The ability to easily navigate by chunks does the job, and also prevents me from falling asleep if reading late at night. It's just a different way of doing the job.

Rob

"Tim Cross" (via emacspeak Mailing List) writes:

> I could be missing something, but as I see it, what voice indexing would provide is the ability to have a 'voice cursor' (which may or may not be the same as your Emacs cursor) tracking where the TTS engine is up to when generating speech from the submitted text.
>
> This would, for example, enable pausing and then subsequently resuming speech, whereby the resumed speech would start from where it was previously paused. In some systems this is very important because the system only sends large chunks of speech at a time. For example, I've seen a simple TTS interface for reading files where it will just start reading the file. You don't have the ability to ask for just a page, paragraph, sentence, line, or word; you just ask it to start speaking, and then you can pause and resume speech. The other thing you may get is cursor tracking of speech. A cursor might move through the text as it is spoken, so that when you pause speech, the cursor is at that point in your text. This can be useful for people who want to read along with the speech, i.e. the speech is an aid to visual reading.
>
> While I can see the potential benefits in having the ability to get and use speech index information, I've not found it very high on my wishlist for emacspeak. This is primarily because emacspeak provides very fine-grained control over the size of the chunks of speech I send at a time. Depending on what I'm doing, I'll read/browse the data using a movement/chunk size which suits my needs. For example, if I have a large buffer of text I want to read, I'm unlikely to ask emacspeak to just read the whole buffer. Instead, I'm more likely to ask it to read by page, paragraph or perhaps sentence.
>
> With emacspeak, I find it is very much about moving around using the unit (letter, word, sentence, paragraph, page, buffer) best suited to what I'm doing. I find this provides an adequate balance between my use case and complexity/consistency across speech servers. This has also enabled me to experiment with different TTS engines. For example, many years ago I wrote speech servers for the Cepstral TTS engines. This was a commercial TTS engine that, at the time, had high-quality voices. The additional complexity and overhead involved in a TTS interface model which supported voice indexing would likely have made this much harder to implement and discouraged the type of experimentation which is at the heart of emacspeak. Likewise, I wonder if we would have had the other TTS engines, some of which have come and gone, like the flite and festival servers or the server written in C, or the existing mac and swiftmac servers, or the experimental Windows, speech-dispatcher and JS servers that are out there currently in various stages of development.
>
> I personally don't see the amount of required effort justifying the benefits, given we already have the capability to work with varying chunks of speech. Yes, it would provide some convenience, but at a high cost which I feel is hard to justify.
>
> However, provided someone can implement something which does not require changes to the existing servers or their design, I would say go for it. A lot can be learnt from implementing a TTS server. In fact, I've learnt a lot from failed attempts to implement TTS servers, as there are a considerable number of subtle and non-obvious aspects to a TTS server which only become clear when you try implementing one, making it a great learning experience. At least it was for me.
>
> Victor Tsaran <vtsaran@xxxxxxxxx> writes:
>
> > I guess the question stands: what user-facing problem are we trying to solve?
> >
> > On Tue, Apr 9, 2024 at 3:14 AM Parham Doustdar <emacspeak@xxxxxxxxxxxxx> wrote:
> >
> > That's true, Emacspeak doesn't currently "read" from the speech server process as far as I've seen, it only "writes" to it. Fixing that isn't impossible, but definitely time consuming.
> >
> > The other concrete issue is that, last time I checked, console screen readers read all the text in one chunk. They don't use the audio CSS (forgive me if I don't use the correct name here) that Emacspeak has, which requires you to play audio icons, speak text with different pitches, and insert pauses. All of this means that you have to do extra heavy lifting to really track the index, because the index you get back from the TTS engine isn't simply a position in the buffer -- it is just the position in the current chunk of text it has recently received.
> >
> > So that's why I'm curious whether we really think it's worth it. It could be, or not; I'm not strongly opinionated either way, but I'm also realizing that in our community we don't really have a good mechanism to discuss and decide on things like this.
> >
> > On Tue, Apr 9, 2024 at 8:35 AM Tim Cross <theophilusx@xxxxxxxxx> wrote:
> >
> > You are overlooking one critical component, which explains why adding indexing support is a non-trivial exercise which would require a complete redesign of the existing TTS interface model.
> >
> > For indexing information to be of any use, it has to be fed back into the client and used by the client -- for example, to tell the client to update/move the cursor to the last position spoken.
> >
> > There is absolutely no support for this data to be fed back into the current system. The current TTS interface has data flowing in only one direction: from Emacs to emacspeak, from emacspeak to the TTS server, and from the TTS server to the TTS synthesizer. There is no existing mechanism to feed information (i.e. index positions) back from the TTS engine to Emacs. While getting this information from the TTS engine into the TTS server is probably reasonably easy, there is no existing channel to feed that information up into Emacspeak.
> >
> > Not only would it be necessary to define and implement a whole new model to incorporate this feedback, in addition to also working with TTS engines which do not provide indexing information; you would also likely need to implement some sort of multiple-speech-cursor tracking so that the system can track cursor positions in different buffers.
> >
> > The reason this sort of functionality seems easy in systems like speakup or speech-dispatcher is because those systems were designed with this functionality. It is incorporated into the base design and is part of the various communication protocols the design implements. Adding this functionality is not something which can just be 'tacked on'.
> >
> > The good news of course is that, being open source, anyone can go ahead and define a new interface model and add indexing capability. However, it may be worth considering that it has taken 30 years of development to get the current model to where it is, so I think you can expect a pretty steep climb initially!
> >
> > John Covici <covici@xxxxxxxxxxxxxx> writes:
> >
> > > It's a lot simpler -- indexing is supposed to simply arrange things so that when you are reading a buffer and you stop reading, the cursor will be at or near the point where you stopped. Speakup has had this for a long time and that is why I use it on Linux, but it's only good for the virtual console. Now speech dispatcher has indexing built in, so if you connect to that and use one of the supported synthesizers, indexing works correctly and I don't see any performance hit. I think all the client has to do is connect to speech dispatcher, but check me on this.
> > >
> > > On Mon, 08 Apr 2024 08:25:27 -0400, Robert Melton wrote:
> > >>
> > >> Is indexing supposed to be per reading block, or one global index? Is the idea that you can be reading a buffer, go to another buffer, read some of it, then come back and continue? I.e., an index per "reading block"?
> > >>
> > >> Assuming it is global for simplicity, it is still a heavy lift to implement on Mac and Windows.
> > >>
> > >> They do not natively report back as words are spoken; you can get this behavior at an "utterance" level by installing hooks and callbacks and tracking those. With that, you would additionally need to keep copies of the future utterances, even if they were already queued with the TTS.
> > >>
> > >> Considered from the POV of an index per reading block, you then need to find ways to identify each block and its position, index them, and continue reading.
> > >>
> > >> Sounds neat, but at least for my servers, right now, the juice isn't worth the squeeze; I am still trying to get basic stuff like pitch multipliers working on Windows via wave mangling, and other basic features, hehe.
> > >>
> > >> > On Apr 8, 2024, at 05:20, Parham Doustdar <parham90@xxxxxxxxx> wrote:
> > >> >
> > >> > I understand. My question isn't whether it's possible, though, or how difficult it would be, or the steps we'd have to take to implement it. My question is more about whether the use cases we have today make it worth reconsidering. All the other questions we can apply the wisdom of the community to solve, if we were convinced that the effort would be worth it.
> > >> >
> > >> > For me, the way I've got around this is to use the next/previous paragraph commands. The chunks are small enough that I can "zoom in" if I want, and yet large enough that I don't have to constantly hit next-line.
> > >> >
> > >> > Sent from my iPhone
> > >> >
> > >> >> On 8 Apr 2024, at 11:13, Tim Cross <theophilusx@xxxxxxxxx> wrote:
> > >> >>
> > >> >> This is extremely unlikely to be implemented. It is non-trivial and would require a significant redesign of the whole interface and model of operation. It isn't as simple as just getting index information from the TTS servers which support it. That information has to then be fed backwards to Emacs through some mechanism which currently does not exist, and would result in a far more complicated interface/model.
> > >> >>
> > >> >> As Raman said, the decision not to have this was not simply an oversight or due to lack of time. It was a conscious design decision. What you're asking for isn't simply an enhancement; it is a complete redesign of the TTS interface model.
> > >> >>
> > >> >> "Parham Doustdar" (via emacspeak Mailing List) <emacspeak@xxxxxxxxxxxxx> writes:
> > >> >>
> > >> >>> I agree. I'm not sure which TTS engines support it. Maybe, just like notification streams are supported in some servers, we can implement this feature for engines that support it?
> > >> >>>
> > >> >>> Sent from my iPhone
> > >> >>>
> > >> >>>> On 8 Apr 2024, at 10:24, John Covici <emacspeak@xxxxxxxxxxxxx> wrote:
> > >> >>>>
> > >> >>>> I know this might be controversial, but indexing would be very useful to me. Sometimes I read long buffers, and when I stop the reading, the cursor is still where I started, so there is no real way to do this adequately. I would not mind if it were just down to the line, rather than individual words, but it would make emacspeak a lot nicer for me.
> > >> >>>>
> > >> >>>>> On Fri, 05 Apr 2024 15:39:15 -0400, "T.V Raman" (via emacspeak Mailing List) wrote:
> > >> >>>>>
> > >> >>>>> Note that the other primary benefit of tts_sync_state as a single call is that it ensures atomicity, i.e. all of the state gets set in one shot from the perspective of the elisp layer, so you hopefully never get TTS that has its state partially set.
> > >> >>>>>
> > >> >>>>> Robert Melton writes:
> > >> >>>>>>
> > >> >>>>>> On threading: it is all concurrent, lots of fun protecting the state.
> > >> >>>>>>
> > >> >>>>>> On language and voice, I was thinking of them as a tree, language/voice, as this is how Windows and macOS seem to provide them.
> > >> >>>>>>
> > >> >>>>>> ----
> > >> >>>>>>
> > >> >>>>>> Oh, one last thing. Should TTS server implementations return a \n after a command is complete, or is just returning nothing acceptable?
> > >> >>>>>>
> > >> >>>>>>> On Apr 5, 2024, at 14:01, T.V Raman <raman@xxxxxxxxxx> wrote:
> > >> >>>>>>>
> > >> >>>>>>> And do spend some time thinking about atomicity and multithreaded systems, e.g. ask yourself the question "how many threads of execution are active at any given time"; hint: the answer isn't as simple as "just one because my server doesn't use threads".
> > >> >>>>>>>
> > >> >>>>>>>> Raman--
> > >> >>>>>>>>
> > >> >>>>>>>> Thanks so much, that clarifies a bunch. A few questions on the language / voice support.
> > >> >>>>>>>>
> > >> >>>>>>>> Does the TTS server maintain an internal list and switch through it, or does it send the list to the lisp in a way I have missed?
> > >> >>>>>>>>
> > >> >>>>>>>> Would it be useful to have a similar feature for voices, where first you pick the right language, then you pick the preferred voice, and then maybe it is stored in a defcustom and sent next time as (set_lang lang:voice t)?
> > >> >>>>>>>>
> > >> >>>>>>>>> On Apr 5, 2024, at 13:10, T.V Raman <raman@xxxxxxxxxx> wrote:
> > >> >>>>>>>>>
> > >> >>>>>>>>> If your TTS supports more than one language, the TTS API exposes these as a list; these calls loop through the list (dectalk, espeak, outloud).
> > >> >>>>>>>>
> > >> >>>>>>>> --
> > >> >>>>>>>> Robert "robertmeta" Melton
> > >> >>>>>>>> lists@xxxxxxxxxxxxxxxx
> > >> >>>>>>
> > >> >>>>>> --
> > >> >>>>>> Robert "robertmeta" Melton
> > >> >>>>>> lists@xxxxxxxxxxxxxxxx
> > >> >>>>
> > >> >>>> --
> > >> >>>> Your life is like a penny. You're going to lose it. The question is: How do you spend it?
> > >> >>>>
> > >> >>>> John Covici wb2una
> > >> >>>> covici@xxxxxxxxxxxxxx
> > >>
> > >> --
> > >> Robert "robertmeta" Melton
> > >> lists@xxxxxxxxxxxxxxxx
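To make the two problems discussed in the thread concrete (the one-way data flow with no return channel from the speech server back to Emacs, and the fact that any index an engine can report is relative to the chunk it was handed rather than to the Emacs buffer), here is a minimal, hypothetical Python sketch. None of these names exist in the real Emacspeak servers; it only shows the kind of bookkeeping a client would need to map a chunk-relative index event back to a buffer position.

# Hypothetical sketch only -- none of these names exist in the real Emacspeak
# servers or in speech-dispatcher.  It illustrates (1) a return channel on
# which a server could report how far it has spoken, and (2) the client-side
# bookkeeping needed to turn a chunk-relative offset back into a buffer
# position, since the engine only ever sees the chunk it was handed.

import sys
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int      # id the client assigns when it queues the chunk
    buffer_start: int  # where this chunk began in the Emacs buffer
    text: str


class IndexingServerSketch:
    """Toy 'server' that reports word-level progress as lines on stdout."""

    def __init__(self, out=sys.stdout):
        self.out = out

    def speak(self, chunk):
        offset = 0
        for word in chunk.text.split():
            offset = chunk.text.index(word, offset)
            # The hypothetical return channel: one "index" event per word.
            # A real design would have to define, version and parse this
            # protocol on the Emacs side, which is the missing piece today.
            self.out.write(f"index {chunk.chunk_id} {offset}\n")
            offset += len(word)


class ClientIndexMap:
    """Client bookkeeping: chunk-relative offsets -> buffer positions."""

    def __init__(self):
        self.chunks = {}

    def queue(self, chunk):
        self.chunks[chunk.chunk_id] = chunk

    def buffer_position(self, chunk_id, offset):
        return self.chunks[chunk_id].buffer_start + offset


if __name__ == "__main__":
    client = ClientIndexMap()
    chunk = Chunk(chunk_id=1, buffer_start=1200, text="Hello indexed world")
    client.queue(chunk)
    IndexingServerSketch().speak(chunk)
    # An "index 1 6" event would map back to buffer position 1206.
    print(client.buffer_position(1, 6))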
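Similarly, here is a minimal sketch of the utterance-level tracking described for the Mac and Windows servers, under the assumption that the platform only signals completion per utterance; the class and method names are illustrative and are not taken from the swiftmac or Windows servers.

# Hypothetical sketch, not taken from any existing Emacspeak server.  It
# assumes the platform engine only signals when a whole utterance has
# finished (roughly the level of feedback described above for macOS and
# Windows), so the server keeps its own copies of everything still queued
# and advances a coarse "voice cursor" on each completion callback.

from collections import deque


class UtteranceTracker:
    def __init__(self):
        self.pending = deque()   # (buffer_start, text) for utterances still queued
        self.last_spoken = None  # buffer_start of the most recently finished utterance

    def queue(self, buffer_start, text):
        self.pending.append((buffer_start, text))

    def on_utterance_finished(self):
        # Wired up to whatever completion hook the platform provides.
        if self.pending:
            self.last_spoken, _ = self.pending.popleft()

    def resume_position(self):
        # Start of the next unspoken utterance, or of the last one finished
        # if nothing is left; word-level precision is simply not available.
        if self.pending:
            return self.pending[0][0]
        return self.last_spoken


if __name__ == "__main__":
    tracker = UtteranceTracker()
    tracker.queue(0, "First sentence.")
    tracker.queue(16, "Second sentence.")
    tracker.on_utterance_finished()   # first utterance done
    print(tracker.resume_position())  # -> 16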