I have seen some posts to this list and others, such as the speech-dispatcher list, about problems getting things to work well with pulseAudio. I have also found lots of other posts, blogs and web pages where users have had issues with pulseAudio. However, I have found that with some effort, you can get pulseAudio to work well and take advantage of some of its advanced features. Many of these features are particularly useful to blind and VI users, as they make the use of your sound hardware much more flexible.

The following is an outline of my experiences and what I did to get things working. I'm posting it because I thought it might make a useful addition to the archives and, hopefully, help others get things working well. It probably contains some errors, and others with more knowledge or experience may feel I've got some things completely wrong. Corrections and suggestions for improvement are both welcomed and encouraged.

Tim

* Background

I've recently been trying to get a new x86_64-based Ubuntu 9.10 system working with emacspeak. This required a switch from the IBM ViaVoice Outloud TTS engine to the eSpeak TTS engine, as I wasn't keen to go through the hassle of installing a 32-bit version of Tcl. Getting things working was challenging, to say the least. However, the good news is that I now appear to have things working well. In fact, there are only two issues I'd still like to resolve: very slow character echo, and word echo speaking words twice. Character echo is so slow I've had to turn it off, but I can live with that. The double word echo seems to come and go, and I suspect I can track down the cause given some time. I have also noticed that at times, voice locking seems to get out of sync. I suspect this is due to mismatched SSML tags. It is quickly resolved by restarting the TTS or resetting things to factory defaults.
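If you want to test the mismatched-SSML suspicion, one low-effort check is to capture the text the speech server sends and look at its tag nesting. The Python sketch below is a hypothetical diagnostic helper, not part of emacspeak or eSpeak. It assumes the markup is simple XML-style tags and only checks that they are properly nested; it does not validate SSML element names or attributes.

```python
import re

# Matches opening, closing and self-closing tags such as <voice gender="male">,
# </voice> and <break/>. Attribute values containing '>' are not handled.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>")


def ssml_tag_problems(text):
    """Return a list of tag-nesting problems found in `text`.

    An empty list means every opening tag is closed, in the right order.
    """
    open_tags = []
    problems = []
    for match in TAG_RE.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # e.g. <break/> needs no partner
        if not closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected </{name}> at offset {match.start()}")
    # Anything still open at the end was never closed (innermost first).
    problems.extend(f"unclosed <{name}>" for name in reversed(open_tags))
    return problems
```

Feeding it each chunk logged from the server should show whether a voice change is ever left open, which would fit the out-of-sync voice locking described above. For example, `ssml_tag_problems("<speak><voice>hi</speak>")` reports the stray `</speak>` plus the unclosed `<voice>` and `<speak>`.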
As there have been a number of posts in the past about problems getting espeak working and with things like pulseaudio, I thought I'd document some of what I've found in case it is useful to others.

* Hardware and platform

The hardware is an Intel i7 CPU with 8 cores and 4GB of memory. The system has an nvidia graphics card and two sound cards, an on-board Intel HDA and a SoundBlaster Audigy SE. I'm running Ubuntu Karmic (9.10). The install is pretty standard, except that I've added the Ubuntu sound developers' PPA to get the latest packaged versions of pulseaudio. I also downloaded espeak from its SourceForge homepage and built it from source. This turned out to be a critically important step.

* Initial Situation

After a fresh install of Ubuntu Karmic, I logged in and installed emacspeak. I installed the necessary dev libraries to build the tclespeak.so library and then tried to use it. While I managed to get some speech out, it was of poor quality, with lots of distortion and crackling. When the system was under load, latency became an issue and speech would just stop working at random intervals. Possibly the worst problem was the truncation of text sent to the TTS. Sometimes this would be multiple words; other times, just the last few letters. I also observed a problem with the voice lock settings: some text just wouldn't get spoken, or at least you wouldn't hear it. Turning off font-lock mode would resolve this, but that just isn't good enough. I want voice locking!

I then experimented with playing other audio sources, such as mp3, wav and ogg files. I also tried playing them all at once and kicked off a few different programs to put my system under a bit of load. The sound was not good. At times it would break up, sometimes it would just stop, and the quality was pretty poor. I was fairly sure it was an issue with pulseaudio and decided to dig a little deeper.
* pulseAudio

Things seem to be very clearly divided when it comes to pulseAudio: people either love it or hate it. In searching for solutions, I found many pages on the web with titles like "How to fix pulseAudio on Ubuntu", which turned out to be instructions on how to remove pulseaudio from your system. I also found numerous posts on lists such as the speech-dispatcher list where people just couldn't get pulseAudio to work. I began to seriously consider removing it from my system. However, before doing that, I wanted to investigate further.

In reading about pulseAudio, I soon began to realise why many people were so keen on it. If the hype is to be believed, it will solve many of the problems that have often frustrated me on Linux. Some of the things which got my attention included:

- The ability to play multiple sound sources at once and control their individual volume. How good would it be to play some music in the background without drowning out my speech!
- Sending different sound sources to different speakers, including ones connected to other computers on my local network. I could stream speech to the system in my lounge room and listen to it sitting in a comfortable chair. I could also send my music to my stereo.
- The ability to play multiple sound sources on one cheap sound card, or route sounds to different sound cards on a system with multiple cards.

Plus many other potentially useful features for anyone who relies on sound on a day-to-day basis. I decided to jump in with both boots and see where I ended up.

* Configuring Pulse

The first thing I did was make sure I restored everything to the vendor-installed configuration. I ensured the pulse default config files in /etc/pulse were the package versions and removed any personal .asoundrc file from my home directory. I also deleted the .pulse directory in my home directory. I wanted to start with a vanilla configuration.
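Getting the per-user files out of the way can be done without actually deleting anything, which makes it easy to go back. The Python sketch below is a hypothetical helper (the `.bak-STAMP` naming is my own choice) that renames `~/.asoundrc` and `~/.pulse` aside and can undo the move. It deliberately leaves /etc/pulse alone, and it is probably safest to run it while no per-user pulseaudio daemon is running.

```python
import time
from pathlib import Path

# The per-user files mentioned above. Files under /etc/pulse are not touched;
# restore those from the package instead.
PERSONAL_FILES = (".asoundrc", ".pulse")


def move_aside(home, names=PERSONAL_FILES, stamp=None):
    """Rename each existing NAME in `home` to NAME.bak-STAMP.

    Nothing is deleted. Returns (original, backup) pairs for undo().
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in names:
        original = Path(home) / name
        if original.exists() or original.is_symlink():
            backup = original.with_name(f"{name}.bak-{stamp}")
            original.rename(backup)
            moved.append((original, backup))
    return moved


def undo(moved):
    """Put everything back where move_aside() found it."""
    for original, backup in reversed(moved):
        backup.rename(original)
```

Typical use would be `moved = move_aside(Path.home())`, keeping `moved` (or just the timestamp) around in case you want the old setup back.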
* Hardware Synthesizer

It was obvious that getting things working was going to take a bit of effort. I also had the old catch-22 that tends to come up for blind users on Linux all too often: I want to fix sound so that I get speech feedback and a usable interface, but I need speech feedback and a usable interface to do that. My solution was to drag out my trusty old hardware DECtalk Express. I think it is very useful to keep a good old hardware synth about for exactly this type of situation.

Luckily, I had insisted on a serial port in this new computer. These days, with the growth in popularity of USB, serial ports are no longer standard on most new systems. They have gone the way of the dodo and the floppy drive. You will only get one if you ask for it, and even then the local salesperson is likely to be like mine and consider you mad for asking. Don't let yourself be talked out of it. If you have serial devices, you want a serial port. There are serial-to-USB converters out there, but results can vary greatly. A serial port is also very useful for many other tasks. For one thing, it's easier to hack with than USB because you have easy access to all the pins. I connected my DECtalk and soon had emacs and emacspeak running with the dtk-exp driver.

* Logging

The first thing I did was increase the logging level for pulseaudio by editing the /etc/pulse/daemon.conf file and changing the log level from notice to info. This creates a lot of useful data in /var/log/messages. I highly recommend making copies of all config files before editing them. It's very easy to spiral down into a confused mess when playing with this stuff because you are dealing with so many different levels of interaction. I had a number of points where things just seemed to fall apart and I'd confused myself beyond recovery. At that point, I would copy the original files back to wipe out my changes and get back to a known 'vanilla' state.
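The "copy first, then edit" habit can be wrapped up in a small script. This is a minimal Python sketch, assuming daemon.conf's simple `key = value` layout with defaults commented out by `;`; the function names and the `.orig` suffix are my own choices, and writing to /etc/pulse needs root.

```python
import re
import shutil
from pathlib import Path


def set_option(conf_text, key, value):
    """Return conf_text with every "key = ..." line set to "key = value".

    daemon.conf ships its defaults commented out ("; log-level = notice"),
    so commented lines for the key are rewritten too. If the key does not
    appear at all, the setting is appended at the end.
    """
    pattern = re.compile(
        rf"^[ \t]*[;#]?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    line = f"{key} = {value}"
    new_text, count = pattern.subn(lambda _match: line, conf_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += line + "\n"
    return new_text


def edit_with_backup(path, key, value):
    """Keep a one-time PATH.orig copy, then write the edited file."""
    path = Path(path)
    backup = path.with_name(path.name + ".orig")
    if not backup.exists():
        # The vendor version, for getting back to a 'vanilla' state later.
        shutil.copy2(path, backup)
    path.write_text(set_option(path.read_text(), key, value))
    return backup
```

For the logging change this would be `edit_with_backup("/etc/pulse/daemon.conf", "log-level", "info")`, and the same helper could handle the sample-rate and resampler settings discussed later. The daemon has to be restarted before any change takes effect.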
I also use this technique to confirm I've fixed a problem by intention and not by accident. Once I believe I know what the solution is and have documented it, I copy the original files back into place, reboot the system and follow my documented changes to verify they really do fix things. This may take more time, but at least I end up with greater confidence that I really do know how to fix the problem. Usually, this saves time in the future when things get screwed up by a distro update.

I then started going through the log messages, looking for anything that might indicate an issue. Changing logging from notice to info creates a fair amount of data. To reduce this and extract only the most recent entries, I used emacs grep mode to search for pulseaudio log messages with the same process ID as the current pulseaudio process. You can find the process ID of the current pulseAudio process using the ps command or by looking for the pulseAudio PID file in /var/run/<user>/pulseaudio.pid.

* System Mode and User Mode

PulseAudio can run in two different modes, system mode and user mode. In system mode, a single pulseAudio daemon is started at boot time and users connect to that daemon to play audio. While this can be an easy setup in some cases, it can lead to conflicts, especially on a multi-user system. Access control, configuration and module loading also become more complex when pulseAudio runs in system mode.

The other mode of operation is user mode. Under this approach, a pulseAudio daemon is started as part of your login process and is owned by you. This eliminates some of the access control issues and gives you more control over how things are configured without needing super user privileges. User mode tends to be the default for most distributions and is probably the best way to go.
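The PID lookup and log filtering can also be scripted. This Python sketch (Linux-only, function names hypothetical) finds pulseaudio processes by scanning /proc rather than relying on a PID file, reports which user owns each one and whether it was started with `--system`, and keeps only the syslog lines tagged with a given PID. It assumes the usual syslog layout `... pulseaudio[PID]: message`.

```python
import os
import re
from pathlib import Path

TRUTHY = {"1", "y", "t", "yes", "true", "on"}


def pulseaudio_processes(proc="/proc"):
    """Return (pid, uid, argv) for each running pulseaudio process."""
    found = []
    proc_dir = Path(proc)
    if not proc_dir.is_dir():
        return found  # not Linux, or /proc not mounted
    for entry in proc_dir.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
            uid = entry.stat().st_uid  # /proc/PID is owned by the process's user
        except OSError:
            continue  # the process exited, or we may not look at it
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        if argv and os.path.basename(argv[0]) == "pulseaudio":
            found.append((int(entry.name), uid, argv))
    return found


def mode_of(argv):
    """'system' if started with --system (or --system=<true value>), else 'user'.

    System mode can also be switched on in daemon.conf (system-instance),
    which this command-line check cannot see.
    """
    for arg in argv[1:]:
        if arg == "--system":
            return "system"
        if arg.startswith("--system="):
            return "system" if arg.split("=", 1)[1].lower() in TRUTHY else "user"
    return "user"


def lines_for_pid(log_lines, pid):
    """Keep only syslog lines tagged pulseaudio[PID]:."""
    tag = re.compile(rf"\bpulseaudio\[{pid}\]:")
    return [line for line in log_lines if tag.search(line)]
```

Looping over `pulseaudio_processes()` and printing `pid, uid, mode_of(argv)` gives a quick picture of which mode you are in; `lines_for_pid(Path("/var/log/messages").read_text().splitlines(), pid)` then extracts that daemon's messages, assuming you are allowed to read the log.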
* High Priority and Realtime Scheduling

One of the problems with pulseaudio is that it is still under heavy development and the documentation is a bit behind. This has been made somewhat worse because Ubuntu and other distros have been working on improving how pulseAudio works, and the latest stable Linux kernels have introduced alternative, more secure ways to change the scheduling privileges of non-root processes. The end result is that much of the documentation you will find is not only out of date, it is misleading. To make matters worse, documentation on how this stuff should now be managed simply doesn't seem to exist. Things are hinted at here and there, but there is no single comprehensive explanation or howto.

Much of what I've done has been based on reading between the lines and scanning change logs for hints. I've made some wild guesses and adopted a very scientific 'suck and see' approach: I made a change, restarted things and looked to see if it made a difference. If it didn't, I changed things back and tried the next option. I continued this process until things were working. I don't pretend to understand exactly why it worked, only that it does appear to work for me. Of course, your mileage may vary!

The first thing I noticed in the pulse logs was that pulseAudio was failing to obtain high priority and realtime scheduling privileges. This was the first of many confusing points. The pulseAudio developers recommend running the daemon with high priority and realtime scheduling. However, this can raise concerns for some system administrators and distribution designers. There are two core issues with setting up processes to have higher than normal priority and realtime scheduling. Firstly, you traditionally needed to run as root to alter these settings.
The standard way to achieve this used to be to make the binary owned by root and set its permissions so that it became a setuid program, executing with root permissions regardless of who ran it. While this does resolve the immediate problem, it raises significant security issues. Although most of these are only relevant on multi-user systems, most Linux distributions have been working very hard to eliminate setuid programs wherever possible.

The other problem is a general one associated with running any process with high priority realtime scheduling: runaway processes and how to kill them. If a high priority realtime scheduled process goes into an infinite loop and starts consuming resources, it can be nearly impossible to kill. Basically, your user processes simply don't have high enough priority to 'get in front' and kill it. Suddenly, your load goes through the roof, memory starts vanishing and your system becomes unusable. Often, the only fix is a hard reset.

While both of these concerns are very legitimate, they are not as serious on a single user workstation as on a multi-user system. If your workstation is behind a firewalled router, as is common with many DSL setups these days, and you are the only user on the system, you don't need to be overly concerned.

Another approach to allowing user processes to obtain high priority and realtime scheduling is to configure the user account so that it can set the nice and realtime priority privileges for processes it runs. The problem with this approach is that it enables the user to obtain these privileges for any process they run, not just a specific one. This is also a potential issue for system administrators on multi-user systems.
They don't want their users to be able to modify priorities and scheduling of resources, as this would make it far too easy to cripple the system, either accidentally or intentionally.

The more modern approach to this problem is a framework that lets users run specific processes with high priority and realtime scheduling, where those processes have been approved by the system administrator (or, more commonly these days, by the distribution designers), without the end user needing super user privileges.

The 'standard' pulseAudio documentation still recommends making the pulseAudio binary setuid and using a special group called pulse-rt to control which users can run pulse with realtime scheduling privileges. However, according to comments in the pulseaudio README.Debian file, you no longer need to do this on Ubuntu, as it uses the rtkit package. Unfortunately, there is no concise documentation that clearly explains how this all now works. While I've resolved the issue, I'm not 100% happy with how I achieved it.

Part of the problem here is that after 17 years of running Linux, I'm well and truly over operating system and kernel tweaking. These days, I find such things boring and frustrating. I just want my computer to work; I view it as a tool that enables my other projects, which I'm far more interested in. As a consequence, I've not really kept up with developments relating to things like ConsoleKit, PolicyKit or D-Bus. From my reading, some or all of these play a role in assigning special privileges, such as being able to set processes to use realtime scheduling.

It seems there are two ways I could achieve this. The first is through PolicyKit and a package called rtkit. The second is through the PAM limits configuration file. The more correct and modern solution is the PolicyKit and rtkit route. I chose the easier PAM solution.
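For the PAM route, each entry in /etc/security/limits.conf has four columns: domain (a user, `@group` or `*`), type (`soft`, `hard`, or `-` for both), item and value. Lines such as `youruser - rtprio 9` and `youruser - nice -11` are the kind of entry involved; those particular numbers are illustrative assumptions, not a recommendation. The Python sketch below is a hypothetical helper that lists which rtprio and nice entries would apply to a given user, which is handy when hunting for a typo or a conflicting line. It only understands the simple domain forms.

```python
# Items in limits.conf that matter for pulseaudio's scheduling.
WATCHED_ITEMS = {"rtprio", "nice"}


def matching_entries(lines, user, groups=()):
    """Return (domain, type, item, value) entries for rtprio/nice that name
    `user`, one of their groups (written @group), or the '*' wildcard.

    Entries are collected in file order; pam_limits' full precedence rules
    (and forms such as %group or uid ranges) are not reproduced.
    """
    domains = {user, "*"} | {"@" + group for group in groups}
    found = []
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        if len(fields) != 4:
            continue  # blank, comment-only or malformed line
        domain, kind, item, value = fields
        if item in WATCHED_ITEMS and domain in domains:
            found.append((domain, kind, item, value))
    return found
```

Calling it as `matching_entries(Path("/etc/security/limits.conf").read_text().splitlines(), "youruser", ["audio"])` (with `from pathlib import Path`) would show every candidate line at a glance.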
Essentially, I edited the /etc/security/limits.conf file and added the appropriate rtprio and nice entries. I then logged out and back in so that the new settings would take effect and checked the pulseaudio logs. I was now successfully gaining the high priority and realtime scheduling recommended for pulseaudio.

The major issue with this solution is that my user account now has the ability to set high priority realtime privileges on any process I create, not just pulseAudio. However, as I am the only user of the system and already have su privileges, this is not a big issue. If I had the time and the interest, the proper solution would be to read up on D-Bus, PolicyKit and its tools, like pklocalauthority, and learn how to grant the necessary authorisations to my user account. I do need to read up on D-Bus, as it is rapidly becoming the default message passing mechanism on Linux, and I really do need to learn about PolicyKit. However, as things are still evolving and the documentation is still somewhat scant, I'll wait a bit. Besides, I still have to work on becoming more tolerant of the overly verbose and frequently poorly designed use of XML that seems to plague modern setups. I'm obviously an old dinosaur who misses the concise s-expressions and key-value configurations of yesterday! However, I'm confident that sanity will prevail in the end. Java will die the death it deserves, XML won't be seen as the answer to every problem and we may even see some sane standardisation in how system configurations are managed. Until then ...

Now that my pulseAudio process had high priority realtime scheduling, I found sound was more reliable and less affected by system load. Sound didn't break up every time I started downloading large amounts of data over my network link or building the latest version of emacs. However, I still wasn't happy with the quality of the sound, and there were still log entries that needed investigation.
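Besides reading the logs, the effect of a limits change can be checked directly after logging back in. In this Python sketch (Linux-only, function names hypothetical), `session_limits()` reports the realtime-priority and nice limits your login session inherited, and `thread_scheduling()` reports the scheduling policy and nice value of each thread of a process, such as the pulseaudio PID reported by ps. Looking at every thread is a deliberate assumption: a daemon may give realtime scheduling to worker threads rather than to its main thread.

```python
import os
import resource
from pathlib import Path

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def session_limits():
    """Soft RLIMIT_RTPRIO and RLIMIT_NICE of this process (inherited from login).

    The kernel stores the nice ceiling as 20 - nice, so a limits.conf nice
    value of -11 should show up here as 31.
    """
    return {
        "rtprio": resource.getrlimit(resource.RLIMIT_RTPRIO)[0],
        "nice_rlimit": resource.getrlimit(resource.RLIMIT_NICE)[0],
    }


def thread_scheduling(pid, proc="/proc"):
    """Return {tid: (policy name, nice)} for every thread of `pid`."""
    result = {}
    for task in Path(proc, str(pid), "task").iterdir():
        tid = int(task.name)
        try:
            policy = os.sched_getscheduler(tid)
            nice = os.getpriority(os.PRIO_PROCESS, tid)
        except (ProcessLookupError, PermissionError):
            continue  # thread exited, or not ours to inspect
        result[tid] = (POLICY_NAMES.get(policy, str(policy)), nice)
    return result
```

A thread showing SCHED_RR or SCHED_FIFO suggests realtime scheduling was granted; if every thread shows SCHED_OTHER and `session_limits()` reports an rtprio of 0, the limits.conf change probably did not take effect for that session.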
There was more work to be done.

* Sample rates and quality

In addition to looking at the log entries from pulseAudio, I used the pacmd program to query the system and find additional information about the state of pulse. The pacmd program is extremely useful and, unlike other programs for manipulating pulseAudio, it is text based and runs fine within emacs. Using it, you can find out details about the system, set various options and configure modules, sound sinks and sources.

I noticed a mismatch between the sample rate pulse was using and the 'native' sample rate of my sound card. My Audigy card likes a sample rate of 48000 Hz, but pulse was using 44100 Hz, so I changed the pulse configuration to match my sound card. I also noticed that the native sample format for my Audigy card is s32le, so I changed the pulse configuration from its default of s16le to match that as well.

pulseaudio also has a resample-method setting. This can be set to various methods with differing performance and quality, and by default it is set to a fairly low-quality one. I experimented with different settings and found that they affected both the quality of the output and the load on the system. After a bit of experimentation, I selected speex-float-5, which appears to give good quality output without excessive load. The right setting will depend a lot on your hardware and what type of work and sound activity you have going on.

On restarting pulse, sound quality was better. However, it is worth noting that I only heard the improvement when my card's output was connected to my external amplifier and good quality stereo speakers. I observed no real difference with the cheap built-in speakers attached to my HP monitor.

By this point, I found pulseaudio was performing a lot better.
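Spotting a rate or format mismatch can be scripted from pacmd's text output. The Python sketch below parses `pacmd list-sinks` output into index, default flag, name, sample format, channel count and rate. The line layout it expects (`* index: N`, `name: <...>`, `sample spec: s16le 2ch 44100Hz`) is an assumption based on typical output, so adjust the patterns if your version prints something different. The helper names are hypothetical.

```python
import re
from fractions import Fraction

# Patterns for the parts of `pacmd list-sinks` output used here.
INDEX_RE = re.compile(r"^\s*(\*?)\s*index:\s*(\d+)")
NAME_RE = re.compile(r"^\s*name:\s*<([^>]*)>")
SPEC_RE = re.compile(r"^\s*sample spec:\s*(\S+)\s+(\d+)ch\s+(\d+)Hz")


def parse_sinks(text):
    """Turn `pacmd list-sinks` output into a list of dicts."""
    sinks = []
    for line in text.splitlines():
        match = INDEX_RE.match(line)
        if match:
            # A leading '*' marks the default sink.
            sinks.append({"index": int(match.group(2)),
                          "default": match.group(1) == "*"})
            continue
        if not sinks:
            continue  # header line before the first sink
        current = sinks[-1]
        match = NAME_RE.match(line)
        if match and "name" not in current:
            current["name"] = match.group(1)
            continue
        match = SPEC_RE.match(line)
        if match and "rate" not in current:
            current.update(format=match.group(1),
                           channels=int(match.group(2)),
                           rate=int(match.group(3)))
    return sinks


def mismatched(sinks, rate, fmt):
    """Names of sinks not running at the card's native rate and format."""
    return [s.get("name") for s in sinks
            if s.get("rate") != rate or s.get("format") != fmt]


def resample_ratio(stream_rate, card_rate):
    """Output frames per input frame when converting between two rates."""
    return Fraction(card_rate, stream_rate)
```

The text can be captured with `subprocess.run(["pacmd", "list-sinks"], capture_output=True, text=True).stdout`. As a worked number, `resample_ratio(44100, 48000)` is `Fraction(160, 147)`: every 147 frames at 44100 Hz correspond to 160 frames at 48000 Hz, so a mismatch means something in the chain has to resample.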
I could now play multiple sound sources and, even under quite heavy load, I did not encounter dropouts, distortion or high latency. I was quite happy with my pulseaudio setup.

I then proceeded to configure both my sound cards to work together with pulse. I won't go through the issues I ran into there, but can provide some useful pointers if anyone else runs into trouble. In the end, I had a configuration where I could control which sound card was used via the pacmd program and send some clients to one sound card and some to another.

* .asoundrc

According to the pulseaudio website, you should create a .asoundrc file with entries for the pulse plugin. This allows you to route any sound played via ALSA through pulse. They also suggest setting up your .asoundrc so that by default, all ALSA output goes through pulse. I did this with the following .asoundrc file and it appears to work very well:

    pcm.pulse {
        type pulse
    }
    ctl.pulse {
        type pulse
    }
    pcm.!default {
        type pulse
    }
    ctl.!default {
        type pulse
    }

It is worth noting that there are some warnings about this setup if you are not using the udev-based auto-configuration modules to set up pulse. If you are loading the pulse modules manually or statically in the config file, you need to ensure they don't also try to bind to the default ALSA device, or you will get a loop. With the udev and hal modules, pulse binds to the sound card at a lower level and avoids this problem. By default, Ubuntu uses the udev and hal configuration, so unless you have modified the default.pa or system.pa files in the /etc/pulse directory, you can probably use a .asoundrc file such as the one above.

* Upgrading PulseAudio

While I was now happy with my pulseAudio configuration, there were still a couple of minor issues, such as random changes in speech volume. I therefore decided to add the Ubuntu sound developers' PPA to my APT sources list and upgrade to the latest version they were working with.
While I think this has made my pulse setup more stable, I'm not sure it has made a huge difference. However, as pulseAudio is under heavy development, it probably makes sense to be at the leading edge. I still have my old hardware DECtalk Express connected, so at least I'm not stranded if a pulse upgrade breaks things. It is important to recognise that the pulse packages in the sound developers' PPA may not be stable and using them does carry risks. If you must have a very stable setup, I would advise sticking with the standard Ubuntu packages. So far, the dev packages have worked well for me.

* Espeak

Despite getting pulse to work well, I still had problems with espeak. Text was frequently truncated, and some text just didn't appear to get spoken at all. Sometimes the speech would start in the middle of a sentence and then stop just before the last word or halfway through it. Often, words were pronounced badly and were difficult to understand. I even had a couple of instances where speech rates varied from very slow to extremely fast.

At first I thought this was a problem with the tclespeak library, because I wasn't able to reproduce the problems with the stand-alone espeak program that comes with the distribution. However, after a few cut and pastes and some playing around, putting debug statements in tclespeak.cpp and modifying some regsub expressions in the tclsh espeak script, I began to realise that the server was sending text correctly and with valid SSML markup. The problem had to be in the espeak library.

On visiting the espeak homepage, I noticed a new version had recently been released and decided to grab it and give it a go. I downloaded the sources and checked the ReadMe file. There wasn't much to it and building the system seemed pretty straightforward. On checking the Makefile, I noticed that it had three different audio output options. The default was to use the portaudio library.
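Both the library check and the Makefile change in this step can be scripted. The Python sketch below, with hypothetical helper names, reads `ldd` output for an installed libespeak to see whether it links against libportaudio or libpulse, and edits the Makefile to pick a backend. It assumes the backend is chosen by an `AUDIO = ...` line with the alternatives commented out, and that the pulse option is spelled `pulseaudio`; check your copy of the Makefile, since I have not confirmed these names for every espeak release. The library path to pass to `ldd` also depends on your system.

```python
import re

# An "AUDIO = <backend>" line, optionally commented out with '#'.
# That the espeak Makefile selects its backend this way is an assumption.
AUDIO_LINE = re.compile(r"^#?[ \t]*AUDIO[ \t]*=[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def linked_audio_libraries(ldd_output):
    """Return which audio libraries the `ldd` output for libespeak mentions."""
    found = set()
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        library = parts[0]
        if library.startswith("libportaudio"):
            found.add("portaudio")
        elif library.startswith("libpulse"):
            found.add("pulseaudio")
    return found


def select_audio(makefile_text, backend):
    """Enable `AUDIO = backend` and comment out every other AUDIO line."""
    available = AUDIO_LINE.findall(makefile_text)
    if backend not in available:
        raise ValueError(f"no AUDIO = {backend} line; found {available}")

    def rewrite(match):
        value = match.group(1)
        return f"AUDIO = {value}" if value == backend else f"#AUDIO = {value}"

    return AUDIO_LINE.sub(rewrite, makefile_text)
```

Running `ldd` on both the Ubuntu-supplied library and your own build and passing the output to `linked_audio_libraries()` shows at a glance which backend each uses. After `select_audio(text, "pulseaudio")`, write the Makefile back and rebuild and reinstall as usual.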
I checked which libraries the Ubuntu-supplied version was built against and saw it was portaudio, so I decided to go with the default. On building and installing the libraries, I found no noticeable improvement. I still had problems with truncated text and missing words.

Looking into things further, I began to think that the correct thing to do was to build the espeak library with pulse rather than portaudio. I rebuilt the library after commenting out the portaudio option and enabling the pulse option. I then installed the newly built library and fired up emacs and emacspeak. Success! The problems with truncated speech, missing words and bad pronunciations are all gone. The sound quality is good and the server appears to be very stable. I now have a system that is working well enough for day-to-day use.

* tclespeak.so

During my debugging sessions, I modified tclespeak.cpp to log additional debug information. I found a few things which didn't match the espeak docs or how libespeak is used by the espeak program that comes with the libespeak distribution. I also found some potential inefficiencies in the text being sent for synthesis, which may or may not affect performance. These included things like multiple whitespace characters and newlines. There also appears to be some redundant SSML tagging going on. I'm now experimenting with some of this to see if I can improve the situation further. If any of the changes I experiment with improve things, I will provide patches.

* Conclusions

- Getting pulseaudio working correctly is a non-trivial task. It is unlikely that distributions will get it working well 'out of the box' for some time, as there are so many dependent variables to consider. Things vary considerably depending on sound card hardware, CPU, memory etc. Finding a good general default configuration that will work well for everybody is going to be very difficult.
- Many of the promises made by the pulseAudio developers are realisable, and once you get it working, it works well. I think pulseAudio is here to stay and we need to make it work. More importantly, the effort is worthwhile, as it offers a lot of benefits.
- With respect to emacspeak and the tclespeak interface, things are complex because you have four different layers to consider. However, a careful and methodical approach seems to work and pay off. I suggest the following approach:
  1. Get sound working with ALSA.
  2. Get pulseAudio working. Make sure it is using high priority RT scheduling.
  3. Get multiple sound sources working with pulseAudio.
  4. Set up a .asoundrc file so that all ALSA sound goes via pulseAudio. This avoids contention between ALSA and pulseAudio for access to the sound hardware.
  5. Make sure that the libespeak library has been built to use pulseAudio rather than portAudio for its output.
- The espeak interface for emacspeak does not provide the same level of quality in either speech or responsiveness as ViaVoice Outloud. However, in my opinion it is as good as the DECtalk Express. In fact, getting used to espeak is very similar to getting used to the hardware DECtalk after having used Outloud for years. There do not seem to be any plans to update Outloud to work with modern libraries or to provide a 64-bit version. Like it or not, ViaVoice Outloud is likely to vanish from the scene eventually. At this time, espeak appears to be the most viable alternative.
- I am quite certain some of my assumptions are either misguided or completely wrong. In particular, I'd love more information on the correct way to grant authorisation to use high priority and realtime scheduling on a modern Linux system, especially Ubuntu. If you have information or pointers, please let me know.
- Given that Ubuntu now ships with pulseAudio as the default configuration, I find it very strange that a program like espeak and its library, libespeak, is not built to use native pulse access. I wonder whether there is some issue with doing this that I'm not aware of, whether the espeak package maintainers simply haven't updated things, or whether they stick with the default so that the package continues to work on both pulse and non-pulse based configurations. Given what I have encountered, it may be time to release two versions: a libespeak-pulse and a libespeak-portaudio version.

-- 
Tim Cross
tcross@xxxxxxxxxxx

There are two types of people in IT - those who do not manage what they understand and those who do not understand what they manage.

-----------------------------------------------------------------------------
To unsubscribe from the emacspeak list or change your address on the emacspeak list send mail to "emacspeak-request@xxxxxxxxxxx" with a subject of "unsubscribe" or "help".