On Wed, Apr 18, 2007 at 10:15:02AM +0200, Lukas Loehrer wrote:
> At least for some pdf files, google does an excellent job at
> preserving math formulas in their "View as HTML" view.

Interesting. There are also PDF files that contain only scanned images of
text. To read these, you need OCR software, and it now appears that
quality, free as in freedom, OCR solutions are coming down the pipeline:

http://code.google.com/p/ocropus/

It shouldn't be difficult for the Emacs Lisp enthusiasts on the mailing
list to write a function that will run OCRopus on a set of image files, or
even scan a page, and then read the output into an Emacs buffer. Ideally
this would be an Emacs mode that lets you set scanning parameters.

The OCR software itself isn't expected to be ready for release until late
next year, but I'm sure members of this list will be helping with the beta
testing along the way.

XPDF can extract image files from PDF documents, which could then be
converted to whatever format the OCR software accepts.
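
The extract-then-recognize idea above can be sketched roughly as follows. This is only an illustration, written in Python rather than Emacs Lisp, and it rests on assumptions: xpdf's pdfimages tool is installed, and the OCR step is a hypothetical program (named by the caller) that takes one image path and prints recognized text on standard output. It does not describe the actual OCRopus command line, which the message above says is not yet released. The function names are made up for this sketch.

```python
import glob
import os
import subprocess


def pdfimages_command(pdf_path, image_root):
    """Build the xpdf command line: pdfimages PDF-file image-root."""
    return ["pdfimages", pdf_path, image_root]


def ocr_command(ocr_program, image_path):
    """Build a command line for a HYPOTHETICAL OCR program.

    Assumption: the program takes a single image path and writes the
    recognized text to stdout. This is not OCRopus's real interface.
    """
    return [ocr_program, image_path]


def extract_and_ocr(pdf_path, work_dir, ocr_program):
    """Extract images from a PDF, OCR each one, and return the joined text."""
    image_root = os.path.join(work_dir, "page")
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)

    texts = []
    # pdfimages writes files named image_root-NNN.ppm or .pbm with a
    # zero-padded counter, so a sorted glob follows extraction order.
    for image_path in sorted(glob.glob(image_root + "-*")):
        result = subprocess.run(
            ocr_command(ocr_program, image_path),
            check=True,
            capture_output=True,
            text=True,
        )
        texts.append(result.stdout)
    return "\n".join(texts)
```

The returned string is what an Emacs command would insert into a buffer. If the OCR program needs a different image format, a conversion step would go between the two commands.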