Learning from our mistakes!
Hello everyone!
I wanted to share one of my experiences while working on the Hoccleve Archive project. This project involves the recovery of computer-generated lexicographical files created in the early 1980s on a now-lost piece of custom-built software. We discovered how to open the files and wrote a custom script that transformed them into both plain .txt and XML files. The XML versions were extensively tagged with grammatical information useful to literary scholars of Middle English.
While extracting information from the XML and displaying it in the browser, we got stuck at one point because of a silly technical error. The cause was XML escape sequences.
The XML specification has five predefined entities representing special characters. Any word that contains one of these characters and has to be displayed as it is must be preprocessed so that the character is replaced by its entity.
Character    XML Entity Replacement    XML Numeric Replacement
"            &quot;                    &#34;
<            &lt;                      &#60;
>            &gt;                      &#62;
&            &amp;                     &#38;
'            &apos;                    &#39;
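The replacement described in the table can be sketched in Python. This is only a minimal illustration, not the script we used on the project: the helper name escape_xml_text and the QUOTE_ENTITIES mapping are made up for this example. It relies on escape() from the standard library's xml.sax.saxutils module, which handles &, < and > by default and accepts an extra mapping for the two quote characters.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping for the two quote characters;
# escape() already covers &, < and > on its own.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_text(text):
    """Replace the five special characters with their predefined entities."""
    # escape() replaces & first, so the entities it adds
    # afterwards are not escaped a second time.
    return escape(text, QUOTE_ENTITIES)


print(escape_xml_text("v&a"))
```

Escaping the quotes is mainly needed inside attribute values, so the extra mapping can be left out when only element text is being written.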
For example, to display “v&a” in the browser we have to preprocess it: it should be written as “v&amp;a” in the XML document. I hope this post is informative for people working with XML documents.
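To see the effect from the reading side, here is a small sketch using Python's standard xml.etree.ElementTree module. The element name w and the helper read_word are assumptions made for this example and are not taken from the project's files. It shows that the escaped form is turned back into the original word by the parser, while the unescaped form is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET


def read_word(xml_fragment):
    """Parse a one-element fragment and return its text content."""
    return ET.fromstring(xml_fragment).text


# The escaped form parses, and the parser gives back the original word.
print(read_word("<w>v&amp;a</w>"))

# The unescaped form is not well-formed XML, so parsing fails.
try:
    read_word("<w>v&a</w>")
except ET.ParseError as err:
    print("Could not parse:", err)
```

This kind of parse error is usually the first visible symptom when a bare & has slipped into a document.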
References:
- http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
- http://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents