
Re: Reading math formulas in PDF files

To: Kalyan Mukherjea <kalyan.infinity@xxxxxxxxxxx>
Subject: Re: Reading math formulas in PDF files
From: Lukas Loehrer <listaddr1@xxxxxxxxxxx>
Date: Wed, 18 Apr 2007 10:15:02 +0200
In-reply-to: <17957.45251.82896.787821@xxxxxxxxxxx>

At least for some PDF files, Google does an excellent job of
preserving math formulas in its "View as HTML" view. Basically, what
it appears to do is replace the math symbols with their corresponding Unicode
characters, so ideally most of the information is preserved. Emacspeak
currently has some trouble reading such characters, but Emacs 22 has
promising features to remedy this problem some time in the (hopefully
near) future. In the meantime, I use the following command to read the
name of the character at point:

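(require 'descr-text)

(defun unicode-name-at (pos)
  "Display the Unicode name of the character at POS (point when interactive).
Emacs 22 only; relies on `describe-char-unicodedata-file' being set."
  (interactive "d")
  (let* ((char (char-after pos))
         ;; Prefer the original code point if decoding left it untranslated.
         (unicode (or (get-char-property pos 'untranslated-utf-8)
                      ;; char is nil at the end of the buffer.
                      (and char (encode-char char 'ucs)))))
    (message "%s" (downcase (or
                             (and unicode
                                  (cadr
                                   (assoc "Name"
                                          (describe-char-unicode-data unicode))))
                             "Unknown character")))))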

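This works in Emacs 22 only. Make sure you look at the documentation of
describe-char-unicodedata-file, which has to point to a copy of the
Unicode data file (UnicodeData.txt); without it, every character is
reported as unknown. Naturally, this can only work in multibyte mode.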
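
A minimal setup sketch for the command above, assuming you have already
obtained UnicodeData.txt from the Unicode Consortium. The file location
and the key binding are placeholders chosen for illustration, not
anything Emacs or Emacspeak prescribes:

```elisp
;; Hypothetical path: change it to wherever your copy of UnicodeData.txt lives.
(setq describe-char-unicodedata-file
      (expand-file-name "~/unicode/UnicodeData.txt"))

;; Arbitrary key choice: C-c u speaks the name of the character at point.
(global-set-key "\C-cu" 'unicode-name-at)
```

With something like this in your init file, moving point onto a symbol
and pressing C-c u should show its name in the echo area.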

Of course, the above only helps you with PDF files that were indexed by
Google. It would be interesting to know how exactly a PDF must be constructed for
this conversion to work and what kind of PDF-to-HTML converter they use.

Best regards, Lukas

Kalyan Mukherjea writes ("Re: This is off-topic? perhaps."):
> 
> The only formula in Mannin.txt (the text file produced by pdftotext)
> caught my attention when it was read out:
> 
> I heard:
> 	32 + 42 = 52!!!
> 
> Naturally I "woke up", paid attention, and realized that this was the
> rendition of the Pythagorean identity:
> 
> $3^2+ 4^2= 5^2$. 
