
Re: [Emacspeak] Swiftmac 2: pre-PR request for testers


  • From: Robert Melton <lists AT robertmelton.com>
  • To: "T.V Raman" <raman AT google.com>
  • Cc: Emacspeaks <emacspeak AT emacspeak.net>
  • Subject: Re: [Emacspeak] Swiftmac 2: pre-PR request for testers
  • Date: Wed, 27 Mar 2024 10:00:16 -0400

Raman--

Thanks for keeping me focused on this feature.

I actually got a solution that I am happy with. It will take
a few days to finish, as it adds a lot of extra steps, but it
meets all my checkboxes for me to use the feature:

1. Doesn't mess with latency
2. Doesn't require post-processing
3. Implemented with maintained libraries

So, for the curious, the way I ended up doing it is a bit of a
Rube Goldberg machine, but it seems to work.

Apple has the idea of "spatial audio", which lets you place
sounds in 3D space, so it works with everything from simple
stereo all the way up to complex 7.1 setups. Sadly, there is
no way to directly hook this up to the speech synthesizer.

So, the solution is to use a buffer callback via the write
function and route the buffers to an audio player that is
already bound inside a mapped spatial environment. The write
callback automatically segments the speech into small chunks,
which keeps latency low.
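Roughly, the shape of it looks like the sketch below. This is
illustrative, not the actual Swiftmac code: the class name, sample
rate, and position values are placeholders, and real code has to
match (or convert) the buffer format the synthesizer hands back.

import AVFoundation

// Route AVSpeechSynthesizer output into an AVAudioEnvironmentNode so
// the speech can be placed at an arbitrary point in 3D space.
final class SpatialSpeaker {
    private let engine = AVAudioEngine()
    private let environment = AVAudioEnvironmentNode()
    private let player = AVAudioPlayerNode()
    private let synth = AVSpeechSynthesizer()

    init(position: AVAudio3DPoint) throws {
        engine.attach(environment)
        engine.attach(player)
        // Spatialization applies to mono sources feeding the
        // environment node; 22050 Hz mono matches many system voices
        // but is an assumption here.
        let mono = AVAudioFormat(standardFormatWithSampleRate: 22050,
                                 channels: 1)!
        engine.connect(player, to: environment, format: mono)
        engine.connect(environment, to: engine.mainMixerNode, format: nil)
        player.renderingAlgorithm = .HRTFHQ // binaural rendering
        player.position = position          // x = -1 is hard left
        try engine.start()
        player.play()
    }

    func speak(_ text: String) {
        // write(_:toBufferCallback:) delivers the synthesized audio as
        // a stream of small PCM chunks instead of playing it, which is
        // what keeps the latency low.
        synth.write(AVSpeechUtterance(string: text)) { [weak self] buffer in
            guard let pcm = buffer as? AVAudioPCMBuffer,
                  pcm.frameLength > 0 else { return }
            self?.player.scheduleBuffer(pcm, completionHandler: nil)
        }
    }
}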

Got a POC of it working; it is very unpolished and a little
broken, but it proves the theory. The main thing I will have
to do is change how I process the queue, since from the POV
of the queue, 2 minutes of audio is now spoken near-instantly.
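Presumably the fix hangs off the player's completion callbacks, so
queue progress tracks audio actually leaving the speakers rather than
scheduling. A hypothetical fragment, building on the sketch above:

// Inside the write callback: advance the speech queue only when the
// chunk has actually been played back, not when it was scheduled.
self?.player.scheduleBuffer(pcm, at: nil, options: [],
                            completionCallbackType: .dataPlayedBack) { _ in
    // Safe to mark progress / dequeue the next utterance here.
}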

This means Swiftmac 2.1 will have the ability to run two or
more copies, each copy in its own "spatial environment",
effectively allowing left/right channel support. Heck, from
playing around with it, the emulation is good enough that you
could add a "behind you" Swiftmac if you wanted.

Anyway, implementation will take a bit to get right, but I
think all the pieces are in place now.

I also think I found a way to make Beepcaps work really well.


> On Mar 27, 2024, at 09:36, T.V Raman <raman AT google.com> wrote:
>
>
> Also,
>
> 1. Always keep the twain -- what you want to implement, and how the
> "average" user sets it up -- separate.
> 2. Barring that, i.e. if you only implement what "works out of the box"
> on every machine, then you limit yourself to what the vendor
> decided you should have.
> 3. Worse, it kills all innovation.
> 4. For the user who wishes to do no extra work, there is what they got
> when they wrote that nice large check.
> 5. Said differently, luser environments already exist for lusers,
> focus on innovation first.
> 6. If that innovation works and is useful, then one can figure out how
> to bring that to the unwashed masses!
>
> T.V Raman writes:
>> Yes, virtual devices are how this works on Linux; see the pipewire
>> setup in the Emacspeak codebase, where I have a 7.1 device which is
>> what I use for playing music etc.
>>
>> The asoundrc in the Emacspeak codebase has many examples.
>>
>> I suspect zeroing out one channel is failing because you might actually
>> be ruining the audio format headers. On Linux you'd say it was
>> "raw data" and explicitly specify all the details of how to interpret
>> the data; see servers/piper/pipspeak -- which may be interesting to
>> you in its own right -- and lisp/pip.el.
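(For what it's worth, the AVFoundation-side version of "touch only the
sample data, never the header" would look roughly like this sketch,
assuming a deinterleaved float PCM buffer:)

import AVFoundation

// Silence one channel by zeroing its samples; the format metadata is
// never touched. For interleaved data you would instead zero every
// other sample -- the "alternate frames" mentioned below.
func zeroChannel(_ buffer: AVAudioPCMBuffer, channel: Int) {
    guard let data = buffer.floatChannelData,
          channel < Int(buffer.format.channelCount) else { return }
    for frame in 0..<Int(buffer.frameLength) {
        data[channel][frame] = 0
    }
}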
>>
>>
>>
>> Robert Melton writes:
>>> Raman--
>>>
>>> Yeah, I chased my tail for a bit today trying to zero out half of the
>>> PCM buffer; it seems like a legit OS bug in macOS. I am reaching out
>>> for more help, but I have it reduced to a minimal example and still no
>>> luck. To double my annoyance, it works fine in the old version (the
>>> NSS variant), which will be removed in macOS 15. macOS is a
>>> second-class citizen in AVFoundation support.
>>>
>>> That said, I might have found a solution. It would require a little
>>> setup by the user, but you can create a new composite device on macOS
>>> and reduce the volume to 0 on the right or left channel. I am trying
>>> to see if I can do this setup programmatically and confirm I can
>>> target that device for output, but it is at least a path (I got the
>>> idea from the emacspeak code).
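(A sketch of the programmatic half, using CoreAudio's
AudioHardwareCreateAggregateDevice; the device name, UID, and
sub-device UID below are placeholders, and the per-channel volume
step is a separate property call, omitted here:)

import CoreAudio

// Create an aggregate ("composite") output device programmatically.
// The sub-device UID must be discovered at runtime.
var deviceID = AudioObjectID(0)
let description: [String: Any] = [
    kAudioAggregateDeviceNameKey: "Swiftmac Left",
    kAudioAggregateDeviceUIDKey: "com.example.swiftmac.left",
    kAudioAggregateDeviceSubDeviceListKey: [
        [kAudioSubDeviceUIDKey: "PLACEHOLDER-OUTPUT-UID"]
    ]
]
let status = AudioHardwareCreateAggregateDevice(description as CFDictionary,
                                                &deviceID)
precondition(status == noErr, "aggregate device creation failed")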
>>>
>>> But enough zero progress for tonight, more zero progress tomorrow!
>>>
>>>> On Mar 26, 2024, at 22:00, T.V Raman <raman AT google.com> wrote:
>>>>
>>>> "Robert Melton" (via emacspeak Mailing List) <emacspeak AT emacspeak.net>
>>>> writes:
>>>>
>>>>
>>>> One possible thing to try:
>>>>
>>>> if you can get your hands on the wave buffer from TTS, then it might be
>>>> something as simple as zeroing out the buffers for one channel, i.e.
>>>> alternate frames in the audio data
>>>>
>>>>> Raman--
>>>>>
>>>>> Correct; sadly, I have been unable to find a solution for the channel
>>>>> targeting that doesn't follow the path TTS -> wav file -> process
>>>>> channels -> play wav.
>>>>>
>>>>> Frustratingly, on iOS and even watchOS there are solutions to do
>>>>> exactly this. I am still digging around for a way to do it that isn't
>>>>> completely gross.
>>>>>
>>>>>> On Mar 26, 2024, at 10:19, T.V Raman <raman AT google.com> wrote:
>>>>>>
>>>>>> I see you didn't mention multiple TTS streams; Mac users will continue
>>>>>> to miss functionality that you get through async notifications
>>>>>> --
>>>>>
>>>>> --
>>>>> Robert "robertmeta" Melton
>>>>> lists AT robertmelton.com
>>>>>
>>>>> Emacspeak discussion list -- emacspeak AT emacspeak.net
>>>>> To unsubscribe send email to:
>>>>> emacspeak-request AT emacspeak.net with a subject of: unsubscribe
>>>>>
>>>>
>>>> --
>>>
>>> --
>>> Robert "robertmeta" Melton
>>> lists AT robertmelton.com
>>
>> --
>
> --

--
Robert "robertmeta" Melton
lists AT robertmelton.com



