Try the following:

    pdftotext -layout file.pdf

If the PDF document is formatted in multiple columns, specify the -raw option instead:

    pdftotext -raw file.pdf

which usually works.

Thanks to T.V. Raman, the PDF format was improved a number of years ago to allow the entire logical structure of a document to be represented independently of its presentation. Unfortunately, XPDF doesn't support this feature (when it does, it should be easy to write a PDF to XML/XHTML conversion tool).

I don't know whether there are standard conventions for representing the structure of mathematical expressions in PDF, but a solution based on MathML should be possible. The problem here is that software which generates PDF files would need to be adapted to include the necessary structures in the output document.

If you happen to know anyone who is looking for an interesting accessibility-related computer science project, then collaborating with the author of XPDF to add support for "tagged PDF", as specified in the latest edition of the PDF Reference, would be a good one to suggest. A background in C++ would be required, and I expect that substantial expertise in computer science would also be a prerequisite.
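If you want to script the two pdftotext invocations above, for example to convert several files in one go, something along these lines might do. This is only a rough sketch: the function names build_command and pdf_to_text and the multi_column parameter are my own invention, it assumes pdftotext is on your PATH, and it assumes your build accepts "-enc UTF-8" and "-" (standard output) as the output file, which the XPDF manual page describes.

```python
import subprocess


def build_command(pdf_path, multi_column=False):
    """Return the pdftotext argument list for one PDF file.

    build_command and multi_column are hypothetical names used only
    for this illustration.
    """
    # -raw keeps the text in content-stream order, which is the option
    # suggested above for multi-column documents; -layout tries to
    # preserve the physical layout of the page.
    mode = "-raw" if multi_column else "-layout"
    # "-" as the output file asks pdftotext to write to standard output.
    # "-enc UTF-8" is an assumption so the output can be decoded reliably.
    return ["pdftotext", mode, "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path, multi_column=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        build_command(pdf_path, multi_column),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text("file.pdf") corresponds to the first command, and pdf_to_text("file.pdf", multi_column=True) to the second. Whether -raw really gives a better reading order still depends on the document, so it is worth trying both.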