One of my most important (and now much beloved) families of scientific journals has just added full texts in XML alongside PDF as its public formats. A recent example can be found at <http://www.geosci-model-dev.net/7/2867/2014/gmd-7-2867-2014.xml>. This is great, but it gets better: they're also using MathML for all the inline and displayed mathematics. At this point I became slightly lightheaded :-)

So, what's the smoothest way to access such content in emacspeak? Running

    (shr-insert-document (libxml-parse-xml-region (point-min) (point-max)))

does a half-decent job on the inline mathematics, I suspect largely by ignoring all the formatting. It's ignoring other things too, probably because it didn't find the DTD. Still, quite usable after 5 minutes' work.

Now the hard bit. I would like to serialize all the mml constructs and include them in the resulting parse tree as text. The serialization seems doable; the Python module mathDOM looks like it will do the job. I'd rather not replicate all the functionality of libxml-parse-xml-region, so is there a way I can intervene in the process to handle the parsing of certain elements externally? Am I going about this all the wrong way?

You'll have to forgive me some excitement: after 30 years in research this is the first time I've gone to a public site and been guaranteed I can download material with the mathematical content intact. Now I just need to extract it.

--
Peter Rayner
room 343, School of Earth Sciences, University of Melbourne, 3010, Vic, Australia
tel: work: +61 (0)3 8344 9708; fax: +61 (0)3 8344 7761
mobile: +61 402 752 379, skype: petermorag
mail-to: prayner@xxxxxxxxxxx
google scholar profile <http://scholar.google.com.au/citations?user=H3up71wAAAAJ&hl=en>
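One possible way to handle the math elements externally, offered only as a rough sketch: preprocess the XML file before Emacs parses it, replacing each MathML subtree with a plain-text linearization, and then run the shr call on the result. The sketch below uses only Python's standard `xml.etree.ElementTree` rather than mathDOM. The function names `math_to_text` and `flatten_math` and the replacement tag `mathtext` are hypothetical, and it assumes the journal's XML uses the standard MathML namespace URI. The linearization covers only a handful of elements and would need extending for real articles.

```python
import xml.etree.ElementTree as ET

# Assumption: the math uses the standard W3C MathML namespace.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def math_to_text(elem):
    """Crude, speakable linearization of a MathML subtree (hypothetical helper)."""
    name = local_name(elem.tag)
    kids = [math_to_text(child) for child in elem]
    if name == "mfrac" and len(kids) == 2:
        return f"({kids[0]})/({kids[1]})"
    if name == "msup" and len(kids) == 2:
        return f"{kids[0]}^{kids[1]}"
    if name == "msub" and len(kids) == 2:
        return f"{kids[0]}_{kids[1]}"
    if name == "msqrt":
        return f"sqrt({' '.join(kids)})"
    # Token elements (mi, mn, mo) carry text; containers (math, mrow) just
    # join their children in document order.
    text = (elem.text or "").strip()
    return " ".join(([text] if text else []) + kids)


def flatten_math(root):
    """Replace every <math> element under root with a text-only element."""
    math_tag = f"{{{MATHML_NS}}}math"
    # Collect first so the tree is not modified while it is being walked.
    for math in list(root.iter(math_tag)):
        text = math_to_text(math)
        tail = math.tail          # clear() would discard the trailing text
        math.clear()
        math.tag = "mathtext"     # hypothetical tag name for the flattened math
        math.text = text
        math.tail = tail
    return root


if __name__ == "__main__":
    sample = (
        '<p xmlns:mml="http://www.w3.org/1998/Math/MathML">Flux '
        "<mml:math><mml:mfrac><mml:mi>a</mml:mi><mml:mi>b</mml:mi>"
        "</mml:mfrac></mml:math> here</p>"
    )
    root = flatten_math(ET.fromstring(sample))
    print(ET.tostring(root, encoding="unicode"))
```

The flattened file could then be opened in Emacs and rendered with the same shr call as before. On the Emacs side, shr also has a variable `shr-external-rendering-functions`, an alist of tag/function pairs, which may be another place to intervene for specific tags without reimplementing the parser.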