Raman--

Thanks for keeping me focused on this feature. I now have a solution I am
happy with. It will take a few days to finish because it adds a lot of extra
steps, but it meets all of my requirements for using the feature:

1. Doesn't add latency
2. Doesn't require post-processing
3. Implemented with maintained libraries

So, for the curious, the way I ended up doing it is a bit of a Rube Goldberg
machine, but it seems to work. Apple has the notion of "spatial audio", which
lets you place sounds in 3D space; it works with everything from simple stereo
up to complex 7.1 setups. Sadly, there is no way to hook this directly up to
the speech synthesizer. The solution is to use a buffer callback via the write
function and route those buffers to an audio player that is already bound
inside a mapped spatial environment. The synthesizer delivers the speech in
small chunks, which keeps latency low.

I have a proof of concept working: very unpolished and a little broken, but it
proves the theory. The main thing I will have to change is how I process the
queue, because from the queue's point of view two minutes of audio is now
"spoken" near-instantly.

This means Swiftmac 2.1 will be able to run two or more copies, each copy in
its own "spatial environment", which effectively gives left/right channel
support. From playing around with it, the spatialization emulates positions
well enough that you could even add a "behind you" Swiftmac if you wanted.

Anyway, the implementation will take a bit to get right, but I think all the
pieces are in place now. I also think I found a way to make Beepcaps work
really well.
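To make the plumbing concrete, here is a rough sketch of the buffer-callback
into spatial-environment idea described above. It is a minimal, hypothetical
illustration only (the SpatialSpeaker name is made up, error handling and
threading are glossed over, and the real Swiftmac code will differ):

import AVFoundation

// Sketch: capture synthesized speech through the buffer callback of
// AVSpeechSynthesizer.write(_:toBufferCallback:) and schedule each PCM chunk
// on a player node attached inside an AVAudioEnvironmentNode, so the speech
// can be positioned in 3D space.
final class SpatialSpeaker {
    private let engine = AVAudioEngine()
    private let environment = AVAudioEnvironmentNode()
    private let player = AVAudioPlayerNode()
    private let synthesizer = AVSpeechSynthesizer()
    private var connected = false

    init(panX: Float) {
        engine.attach(environment)
        engine.attach(player)
        // Place the virtual source to the listener's left (x < 0) or right (x > 0).
        player.position = AVAudio3DPoint(x: panX, y: 0, z: -1)
        // The environment mixes its spatialized sources down to stereo output.
        let sampleRate = engine.outputNode.outputFormat(forBus: 0).sampleRate
        let stereo = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 2)
        engine.connect(environment, to: engine.mainMixerNode, format: stereo)
    }

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        synthesizer.write(utterance) { [weak self] buffer in
            guard let self = self,
                  let pcm = buffer as? AVAudioPCMBuffer,
                  pcm.frameLength > 0 else { return }  // an empty buffer marks the end
            if !self.connected {
                // Spatialization wants a mono source; speech buffers are normally
                // mono, so connect lazily using the first buffer's own format.
                self.engine.connect(self.player, to: self.environment, format: pcm.format)
                try? self.engine.start()
                self.player.play()
                self.connected = true
            }
            // The synthesizer hands over small chunks, which keeps latency low.
            self.player.scheduleBuffer(pcm)
        }
    }
}

Two copies, each with its own environment, would then give the left/right
split, e.g. SpatialSpeaker(panX: -1) for the left voice and
SpatialSpeaker(panX: 1) for the right one.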
> On Mar 27, 2024, at 09:36, T.V Raman <raman@xxxxxxxxxx> wrote:
>
> Also,
>
> 1. Always keep the twain -- what you want to implement, and how the
>    "average" user sets it up -- separate.
> 2. Barring that, i.e. if you only implement what "works out of the box"
>    on every machine, then you limit yourself to what the vendor decided
>    you should have.
> 3. Worse, it kills all innovation.
> 4. For the user who wishes to do no extra work, there is what they got
>    when they wrote that nice large check.
> 5. Said differently, luser environments already exist for lusers; focus
>    on innovation first.
> 6. If that innovation works and is useful, then one can figure out how
>    to bring it to the unwashed masses!
>
> T.V Raman writes:
>> Yes, virtual devices are how this works on Linux; see the pipewire
>> setup in the Emacspeak codebase, where I have a 7.1 device which is
>> what I use for playing music etc.
>>
>> The asoundrc in the Emacspeak codebase has many examples.
>>
>> I suspect zeroing out one channel is failing because you might actually
>> be ruining the audio format headers. On Linux you'd say it was "raw
>> data" and explicitly specify all the details of how to interpret the
>> data; see servers/piper/pipspeak -- which may be interesting to you in
>> its own right -- and lisp/pip.el.
>>
>> Robert Melton writes:
>>> Raman--
>>>
>>> Yeah, I chased my tail for a bit today trying to get the PCM buffer
>>> to zero out half of it; it seems like a legit OS bug in macOS. I am
>>> reaching out for more help, but I have it reduced to a minimal
>>> example and still no luck. To double my annoyance, it works fine in
>>> the old version (the NSS variant), which will be removed in macOS 15.
>>> macOS is a second-class citizen in AVFoundation support.
>>>
>>> That said, I might have found a solution. It would require a little
>>> setup by the user, but you can create a new composite device on macOS
>>> and reduce the volume to 0 on the right or left channel. Trying to
>>> see if I can do this setup programmatically and confirm I can target
>>> that device for output, but it is at least a path (got the idea from
>>> the Emacspeak code).
>>>
>>> But enough zero progress for tonight, more zero progress tomorrow!
>>>
>>>> On Mar 26, 2024, at 22:00, T.V Raman <raman@xxxxxxxxxx> wrote:
>>>>
>>>> "Robert Melton" (via emacspeak Mailing List) <emacspeak@xxxxxxxxxxxxx>
>>>> writes:
>>>>
>>>> One possible thing to try:
>>>>
>>>> if you can get your hands on the wave buffer from TTS, then it might
>>>> be something as simple as zeroing out the buffers for one channel,
>>>> i.e. alternate frames in the audio data
>>>>
>>>>> Raman--
>>>>>
>>>>> Correct; sadly I have been unable to find a solution for the channel
>>>>> targeting that doesn't follow the path TTS -> wav file -> process
>>>>> channels -> play wav.
>>>>>
>>>>> Frustrating, because on iOS and even watchOS there are solutions to
>>>>> do exactly this. I am still digging around for a way to do this that
>>>>> isn't completely gross.
>>>>>
>>>>>> On Mar 26, 2024, at 10:19, T.V Raman <raman@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> I see you didn't mention multiple TTS streams; Mac users will
>>>>>> continue to miss functionality that you get through async
>>>>>> notifications
>>>>>
>>>>> --
>>>>> Robert "robertmeta" Melton
>>>>> lists@xxxxxxxxxxxxxxxx

--
Robert "robertmeta" Melton
lists@xxxxxxxxxxxxxxxx
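For reference, the channel-zeroing idea discussed in the thread above
(grabbing the TTS wave buffer and silencing one channel) would look roughly
like the sketch below for a non-interleaved float stereo AVAudioPCMBuffer.
This is a hypothetical helper, not Swiftmac or AVFoundation code, and as
noted in the thread the approach did not behave correctly on macOS in
practice:

import AVFoundation

// Silence one channel plane of a non-interleaved stereo PCM buffer so that
// speech is only audible on the other side. For interleaved data you would
// instead zero every other sample (alternate frames).
func silenceChannel(_ channel: Int, in buffer: AVAudioPCMBuffer) {
    guard let planes = buffer.floatChannelData,           // nil for non-float formats
          channel >= 0,
          channel < Int(buffer.format.channelCount) else { return }
    let frameCount = Int(buffer.frameLength)
    // Overwrite every sample in the chosen channel with silence.
    for frame in 0..<frameCount {
        planes[channel][frame] = 0
    }
}

For example, silenceChannel(1, in: pcmBuffer) would mute the right channel of
a buffer before it is scheduled for playback.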