| ESPX - an ECMAScript Parser for (almost) XML, with namespaces | 
| TinyXSL - XML transform in-Script mini-Language | 
Here's the download.
Copyright (c) 2000, 2001, 2002 Cyril Jandia ( http://www.cjandia.com/ )
See the file copying.txt for copying permission.
| Abstract | 
As its unimaginative name suggests, "ESPX" is an
  ECMAScript-coded parser for a subset of
  XML 1.0,
  that is, without DTD support yet (neither external nor internal subset).
However, since version 20010206 it
  comes with full support for the
  XML namespaces syntax and
  (name-scoping) semantics additions to XML. Also, as a
  main implementation goal, ESPX was written with strict
  ECMAScript
  compliance in mind; see "Tested user agents"
  below.
As far as performance is concerned, please see "The performance issue" below.
Anyway, this should be considered a beta release.
For the impatient, here's the source
  code as well as a simple demo. See also
  the FAQ, which is a TinyXSL demo
  (as you know, Small is beautiful
 ;^)
| Contents | 
| Basic Testing | 
[xmltests].
For convenience, there is also an all-in-one ZIP file.
Also, for comparison, here are the results of running the following parsers against these tests:
- MSXML: msxml-test.txt
- XML for <SCRIPT> (in its version 1.1): xml4script-test.txt

| Frequently Asked Questions |
Here.
| Changes from previous releases | 
Note that, as ESPX now needs at least some user feedback, the pace of revisions should slow down (or even stop for a while). However, see "From here ..." below.
This version fixes/adds support for the following (bug fixes and/or design changes first):
- XMLParser._lookForInvalidCharacters() (internal) utility function; TinyXSLProcessor.getVersion() returns -0.73; XMLParser.getVersion() now returns 20020313.
- XMLDocument.getElementsByTagName() (thanks to Gaurav Pal ;o); XMLNode.uniqueID(), XMLDocument.createWhatever()); TinyXSLProcessor.getVersion() returns -0.75; XMLParser.getVersion() now returns 20020212.
- ]]> no longer allowed in element content (per http://www.w3.org/TR/REC-xml#NT-CharData); <!DOCTYPE ...>; TinyXSLProcessor.getVersion() returns -0.76; XMLParser.getVersion() now returns 20020112.
- <!DOCTYPE ...> is now silently ignored (no more "unsupported document type declaration" error); TinyXSLProcessor.getVersion() returns -0.77; XMLParser.getVersion() now returns 20020110.
- TinyXSLProcessor.getVersion() returns -0.78; XMLParser.getVersion() now returns 20020109.
- xml:space, lang, etc.) :^( TinyXSLProcessor.getVersion() returns -0.79; XMLParser.getVersion() now returns 20011228.
- xml:base; XMLParser.xmlBase property, which you can use in a fashion similar to XMLParser.xmlLang; TinyXSLProcessor.getVersion() returns -0.80; XMLParser.getVersion() now returns 20011205.
- TinyXSLProcessor.getVersion() returns -0.81; XMLParser.getVersion() now returns 20011116.
- TinyXSLProcessor.getVersion() returns -0.82; XMLParser.getVersion() now returns 20010411.
- Initial release.
| ESPX / TinyXSL files supported | 
| Description | 
ESPX is not a validating parser. It does
  not read any form of internal DTD subset either. At a minimum, it
  checks the document for basic well-formedness: proper
  element nesting, attribute assignments, and proper use of character and predefined
  entity references (e.g., &#160;, &amp;, ...).
  Finally, it builds an unoptimized tree data
  structure in memory to represent the parsed document.
Since there is no support for DTD declarations of any form, ESPX has no choice
  but to treat attribute values as CDATA (all whitespace is
  kept).
Also, CR LF sequences and lone CR
  characters are normalized to LF once and for all on input,
  just before parsing.
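The end-of-line normalization described above is the one XML 1.0 requires (section 2.11, "End-of-Line Handling"). As a minimal sketch, assuming the whole document is available as a single string, it can be done with one regular expression; the function name normalizeLineEnds is hypothetical and is not taken from the ESPX sources.

```javascript
// Hypothetical helper (not part of ESPX): XML 1.0 end-of-line normalization.
// Each CR LF pair, and each CR not followed by LF, becomes a single LF.
function normalizeLineEnds(text) {
  // /\r\n?/ tries "\r\n" first (the "?" is greedy), otherwise matches a lone "\r".
  return text.replace(/\r\n?/g, "\n");
}

var src = "<a>\r\n  <b/>\r  <c/>\n</a>";
var normalized = normalizeLineEnds(src);
// normalized is "<a>\n  <b/>\n  <c/>\n</a>"
```

Doing this once, before parsing, means the tokenizer only ever has to recognize LF as a line end.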
As it parses the document, ESPX's XMLParser object builds
  some kind of DOM-like tree data structure. Note that the latter is
  not compliant with the official DOM
  (see DOM Level 1). At
  most, you can rely on the semantics of more or less universal features like
  nodeName, nodeValue, nodeType,
  parentNode and so on. But you won't find any equivalent for
  insertBefore() and the like.
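To illustrate the kind of navigation such a tree supports, here is a hedged sketch of a recursive walk that collects elements by name, in the spirit of XMLDocument.getElementsByTagName(). It assumes, as described under "Things to know for a proper use" below, that a node's children are reached as node[i] for i from 0 to node.childCount - 1. The nodeType value 1 for elements is a DOM-style assumption, and the el() and collectByName() helpers are hypothetical: the mock objects only stand in for real ESPX nodes.

```javascript
// Assumption: ESPX-like nodes expose nodeType, nodeName, childCount and
// numeric child indices (node[0] .. node[childCount - 1]).
var ELEMENT_NODE = 1; // assumed DOM-style value; check the ESPX source

// Hypothetical helper building a mock element node shaped as assumed above.
function el(name, children) {
  var node = { nodeType: ELEMENT_NODE, nodeName: name, childCount: 0 };
  for (var i = 0; i < children.length; i++) {
    node[i] = children[i];
    children[i].parentNode = node;
  }
  node.childCount = children.length;
  return node;
}

// Depth-first walk collecting, in document order, the elements named `name`.
function collectByName(node, name, result) {
  result = result || [];
  if (node.nodeType == ELEMENT_NODE && node.nodeName == name) {
    // Indexed append instead of push(), which older JScript versions lack.
    result[result.length] = node;
  }
  for (var i = 0; i < node.childCount; i++) {
    collectByName(node[i], name, result);
  }
  return result;
}

var root = el("book", [el("title", []), el("chapter", [el("title", [])])]);
var titles = collectByName(root, "title"); // titles.length is 2
```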
The parse result tree is given by the XMLParser.document
  property. The same parser object may be reused multiple times to parse
  different documents. See the <script> tag at the top of
  simple.htm to see how all this is put to
  work.
| The performance issue | 
On a 3-year-old Pentium II (350 MHz, 64 MB RAM) running NS 4.7 over Win98, ESPX parses a 12 KB document and builds the DOM-like tree in less than 0.6 seconds, and in less than 1.5 seconds for a document twice as big (perf1.xml and perf2.xml, where markup represents roughly 33% of the document size).
However, for documents above 36 KB, be aware that the parsing and tree-building times currently experienced are simply not acceptable (more than 2 seconds). So there is plenty of room for improvement in this area. FYI: as an order of magnitude, and for the same small documents (< 50 KB), ESPX appears on average to be between fifty and one hundred times slower than Microsoft's C++-coded MSXML.
Note that if markup is sparse, representing, say, less than 5% of the document size, ESPX performs better (for example, 1.3 seconds under IE for the 90 KB spars90k.xml).
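Since timings depend so much on how much of a document is markup, it may help to estimate that ratio for your own documents before relying on the figures above. The following is a rough, hypothetical helper (not part of ESPX) that counts the characters from each < up to the next > and divides by the total length. It deliberately ignores subtleties such as a > inside an attribute value or a CDATA section, so treat its result as an order of magnitude only.

```javascript
// Hypothetical helper: rough fraction of a document's characters that are markup.
// Counts every character from a "<" up to and including the next ">".
function markupRatio(text) {
  if (text.length == 0) return 0;
  var markup = 0;
  var pos = 0;
  while (true) {
    var open = text.indexOf("<", pos);
    if (open < 0) break;               // no more tags
    var close = text.indexOf(">", open);
    if (close < 0) {                   // unterminated tag: count the rest as markup
      markup += text.length - open;
      break;
    }
    markup += close - open + 1;
    pos = close + 1;
  }
  return markup / text.length;
}

var sample = "<p>Hello, world</p>"; // 7 markup characters out of 19
var ratio = markupRatio(sample);    // about 0.368
```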
| Things to know for a proper use | 
- ECMAScript uses Unicode);
- standalone="yes");
- &nbsp;) has to be written &#160; or &#xA0; (however, see "Supported HTML 4.0 entities" below);
- &amp;, &apos;, &gt;, &lt; and &quot; (however, see "Supported HTML 4.0 entities" below);
- NodeList: to access a node's children, you have to do it the preferred ECMAScript way, that is, theNode[theIndex] (of course, a theNode.childCount property gives the size of the family);
- NamedNodeMap for elements' attributes either; instead use theElementNode.attributes[nameOfAttribute] or, preferably, theElementNode.get/setAttribute(nameOfAttribute); since attributes is an ECMAScript array, you can also enumerate all attributes the usual ECMAScript way:

  var attr;
  for (attr in theElementNode.attributes) {
    // do something with theElementNode.attributes[attr]
  }

- xmlText() is an attempt to provide something similar to Microsoft's MSXML DOMDocument's xml property (implemented in ESPX as a simple recursive function returning the XML source text recomposed from the tree data structure); it helps with debugging;
- xml:space and xml:lang pre-declared attributes are properly honored on a per-element basis; also, in ESPX the meaning of xml:space's "default" value is controlled by the XMLParser.preserveWhiteSpace boolean property (false strips insignificant white space, while true keeps it all); as for xml:lang, its default is given by XMLParser.xmlLang (a string property);
- XMLDocumentFactory plays a role similar to that of the DOM's DOMImplementation.
- ECMAScript user agents which are still around, ESPX does not take advantage of throw/try ... catch-based error detection/handling; it seems even something like NS 4.7 doesn't know what to do with a try ... catch statement;
- private or protected in more serious (read: strongly-typed) OO languages; please do not use them unless you cannot get your work done any other way;
- XMLParser() constructor function for currently recognized error cases; test this part carefully;

| Examples |
For now, there are only simple.htm, databind.htm and the FAQ (as a TinyXSL sample). Others may follow.
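As a small additional example, the per-element handling of xml:lang mentioned under "Things to know for a proper use" amounts to inheritance: an element's language is the xml:lang value of its nearest ancestor-or-self that carries one, and otherwise a parser-wide default (in ESPX, presumably the value of XMLParser.xmlLang). The sketch below assumes nodes expose parentNode and a getAttribute() that returns null or undefined for an absent attribute; the mockElement() and effectiveLang() names are hypothetical, not ESPX API.

```javascript
// Hypothetical mock element: only parentNode and getAttribute() are modelled,
// which is all the lookup below needs. Real ESPX nodes may differ.
function mockElement(attrs, parent) {
  return {
    parentNode: parent || null,
    getAttribute: function (name) {
      return attrs[name]; // undefined when absent, as with a plain object lookup
    }
  };
}

// Effective xml:lang: nearest ancestor-or-self value, else the given default.
// An explicit xml:lang="" is returned as "" (XML 1.0: no language information).
function effectiveLang(node, defaultLang) {
  for (var n = node; n != null; n = n.parentNode) {
    // Skip nodes without getAttribute (e.g. a document node, if it has none).
    if (typeof n.getAttribute != "function") continue;
    var value = n.getAttribute("xml:lang");
    if (value != null) return value; // null and undefined both mean "absent"
  }
  return defaultLang;
}

var doc = mockElement({ "xml:lang": "fr" });
var para = mockElement({}, doc);
var quote = mockElement({ "xml:lang": "en" }, para);
// effectiveLang(para, "en-US") is "fr"; effectiveLang(quote, "en-US") is "en"
```

The same upward walk would apply to xml:space, with XMLParser.preserveWhiteSpace playing the role of the default.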
| From here ... | 
As far as future work is concerned, the most urgent directions are likely to include, in no particular order:
- XMLParser, if only for use by TinyXSL;
- ECMAScript's property of being a reflexive
      language and thus would compile expressions written in such a subset
      into ECMAScript functions.

Of course, if you find yourself able to devise an interesting use of ESPX, and better yet, to implement any of the preceding, all I can do is invite you to join in the effort.
| Pending ... | 
| Wish list | 
| Supported HTML 4.0 entities | 
Most of them, including:
From Latin-1 Entities:
| Character | Entity | Decimal | Hex | Rendering in the browser (from entity) | Rendering in the browser (from decimal) |
|---|---|---|---|---|---|
| no-break space = non-breaking space |   |   |   | ||
| inverted exclamation mark | ¡ | ¡ | ¡ | ¡ | ¡ | 
| cent sign | ¢ | ¢ | ¢ | ¢ | ¢ | 
| pound sign | £ | £ | £ | £ | £ | 
| currency sign | ¤ | ¤ | ¤ | ¤ | ¤ | 
| yen sign = yuan sign | ¥ | ¥ | ¥ | ¥ | ¥ | 
| broken bar = broken vertical bar | ¦ | ¦ | ¦ | ¦ | ¦ | 
| section sign | § | § | § | § | § | 
| diaeresis = spacing diaeresis | ¨ | ¨ | ¨ | ¨ | ¨ | 
| copyright sign | © | © | © | © | © | 
| feminine ordinal indicator | ª | ª | ª | ª | ª | 
| left-pointing double angle quotation mark = left pointing guillemet | « | « | « | « | « | 
| not sign | ¬ | ¬ | ¬ | ¬ | ¬ | 
| soft hyphen = discretionary hyphen | ­ | ­ | ­ |  |  | 
| registered sign = registered trade mark sign | ® | ® | ® | ® | ® | 
| macron = spacing macron = overline = APL overbar | ¯ | ¯ | ¯ | ¯ | ¯ | 
| degree sign | ° | ° | ° | ° | ° | 
| plus-minus sign = plus-or-minus sign | ± | ± | ± | ± | ± | 
| superscript two = superscript digit two = squared | ² | ² | ² | ² | ² | 
| superscript three = superscript digit three = cubed | ³ | ³ | ³ | ³ | ³ | 
| acute accent = spacing acute | ´ | ´ | ´ | ´ | ´ | 
| micro sign | µ | µ | µ | µ | µ | 
| pilcrow sign = paragraph sign | ¶ | ¶ | ¶ | ¶ | ¶ | 
| middle dot = Georgian comma = Greek middle dot | · | · | · | · | · | 
| cedilla = spacing cedilla | ¸ | ¸ | ¸ | ¸ | ¸ | 
| superscript one = superscript digit one | ¹ | ¹ | ¹ | ¹ | ¹ | 
| masculine ordinal indicator | º | º | º | º | º | 
| right-pointing double angle quotation mark = right pointing guillemet | » | » | » | » | » | 
| vulgar fraction one quarter = fraction one quarter | ¼ | ¼ | ¼ | ¼ | ¼ | 
| vulgar fraction one half = fraction one half | ½ | ½ | ½ | ½ | ½ | 
| vulgar fraction three quarters = fraction three quarters | ¾ | ¾ | ¾ | ¾ | ¾ | 
| inverted question mark = turned question mark | ¿ | ¿ | ¿ | ¿ | ¿ | 
| Latin capital letter A with grave = Latin capital letter A grave | À | À | À | À | À | 
| Latin capital letter A with acute | Á | Á | Á | Á | Á | 
| Latin capital letter A with circumflex | Â | Â | Â | Â | Â | 
| Latin capital letter A with tilde | Ã | Ã | Ã | Ã | Ã | 
| Latin capital letter A with diaeresis | Ä | Ä | Ä | Ä | Ä | 
| Latin capital letter A with ring above = Latin capital letter A ring | Å | Å | Å | Å | Å | 
| Latin capital letter AE = Latin capital ligature AE | Æ | Æ | Æ | Æ | Æ | 
| Latin capital letter C with cedilla | Ç | Ç | Ç | Ç | Ç | 
| Latin capital letter E with grave | È | È | È | È | È | 
| Latin capital letter E with acute | É | É | É | É | É | 
| Latin capital letter E with circumflex | Ê | Ê | Ê | Ê | Ê | 
| Latin capital letter E with diaeresis | Ë | Ë | Ë | Ë | Ë | 
| Latin capital letter I with grave | Ì | Ì | Ì | Ì | Ì | 
| Latin capital letter I with acute | Í | Í | Í | Í | Í | 
| Latin capital letter I with circumflex | Î | Î | Î | Î | Î | 
| Latin capital letter I with diaeresis | Ï | Ï | Ï | Ï | Ï | 
| Latin capital letter ETH | Ð | Ð | Ð | Ð | Ð | 
| Latin capital letter N with tilde | Ñ | Ñ | Ñ | Ñ | Ñ | 
| Latin capital letter O with grave | Ò | Ò | Ò | Ò | Ò | 
| Latin capital letter O with acute | Ó | Ó | Ó | Ó | Ó | 
| Latin capital letter O with circumflex | Ô | Ô | Ô | Ô | Ô | 
| Latin capital letter O with tilde | Õ | Õ | Õ | Õ | Õ | 
| Latin capital letter O with diaeresis | Ö | Ö | Ö | Ö | Ö | 
| multiplication sign | × | × | × | × | × | 
| Latin capital letter O with stroke = Latin capital letter O slash | Ø | Ø | Ø | Ø | Ø | 
| Latin capital letter U with grave | Ù | Ù | Ù | Ù | Ù | 
| Latin capital letter U with acute | Ú | Ú | Ú | Ú | Ú | 
| Latin capital letter U with circumflex | Û | Û | Û | Û | Û | 
| Latin capital letter U with diaeresis | Ü | Ü | Ü | Ü | Ü | 
| Latin capital letter Y with acute | Ý | Ý | Ý | Ý | Ý | 
| Latin capital letter THORN | Þ | Þ | Þ | Þ | Þ | 
| Latin small letter sharp s = ess-zed | ß | ß | ß | ß | ß | 
| Latin small letter a with grave = Latin small letter a grave | à | à | à | à | à | 
| Latin small letter a with acute | á | á | á | á | á | 
| Latin small letter a with circumflex | â | â | â | â | â | 
| Latin small letter a with tilde | ã | ã | ã | ã | ã | 
| Latin small letter a with diaeresis | ä | ä | ä | ä | ä | 
| Latin small letter a with ring above = Latin small letter a ring | å | å | å | å | å | 
| Latin small letter ae = Latin small ligature ae | æ | æ | æ | æ | æ | 
| Latin small letter c with cedilla | ç | ç | ç | ç | ç | 
| Latin small letter e with grave | è | è | è | è | è | 
| Latin small letter e with acute | é | é | é | é | é | 
| Latin small letter e with circumflex | ê | ê | ê | ê | ê | 
| Latin small letter e with diaeresis | ë | ë | ë | ë | ë | 
| Latin small letter i with grave | ì | ì | ì | ì | ì | 
| Latin small letter i with acute | í | í | í | í | í | 
| Latin small letter i with circumflex | î | î | î | î | î | 
| Latin small letter i with diaeresis | ï | ï | ï | ï | ï | 
| Latin small letter eth | ð | ð | ð | ð | ð | 
| Latin small letter n with tilde | ñ | ñ | ñ | ñ | ñ | 
| Latin small letter o with grave | ò | ò | ò | ò | ò | 
| Latin small letter o with acute | ó | ó | ó | ó | ó | 
| Latin small letter o with circumflex | ô | ô | ô | ô | ô | 
| Latin small letter o with tilde | õ | õ | õ | õ | õ | 
| Latin small letter o with diaeresis | ö | ö | ö | ö | ö | 
| division sign | ÷ | ÷ | ÷ | ÷ | ÷ | 
| Latin small letter o with stroke = Latin small letter o slash | ø | ø | ø | ø | ø | 
| Latin small letter u with grave | ù | ù | ù | ù | ù | 
| Latin small letter u with acute | ú | ú | ú | ú | ú | 
| Latin small letter u with circumflex | û | û | û | û | û | 
| Latin small letter u with diaeresis | ü | ü | ü | ü | ü | 
| Latin small letter y with acute | ý | ý | ý | ý | ý | 
| Latin small letter thorn | þ | þ | þ | þ | þ | 
| Latin small letter y with diaeresis | ÿ | ÿ | ÿ | ÿ | ÿ | 
From Entities for Symbols and Greek Letters:
| Character | Entity | Decimal | Hex | Rendering in the browser (from entity) | Rendering in the browser (from decimal) |
|---|---|---|---|---|---|
| Latin small f with hook = function = florin | ƒ | ƒ | ƒ | ƒ | ƒ | 
| bullet = black small circle | • | • | • | • | • | 
| horizontal ellipsis = three dot leader | … | … | … | … | … | 
| trade mark sign | ™ | ™ | ™ | ™ | ™ | 
From Special Entities:
| Character | Entity | Decimal | Hex | Rendering in the browser (from entity) | Rendering in the browser (from decimal) |
|---|---|---|---|---|---|
| quotation mark = APL quote | " | " | " | " | " | 
| ampersand | & | & | & | & | & | 
| less-than sign | < | < | < | < | < | 
| greater-than sign | > | > | > | > | > | 
| Latin capital ligature OE | Œ | Œ | Œ | Œ | Œ | 
| Latin small ligature oe | œ | œ | œ | œ | œ | 
| Latin capital letter S with caron | Š | Š | Š | Š | Š | 
| Latin small letter s with caron | š | š | š | š | š | 
| Latin capital letter Y with diaeresis | Ÿ | Ÿ | Ÿ | Ÿ | Ÿ | 
| modifier letter circumflex accent | ˆ | ˆ | ˆ | ˆ | ˆ | 
| small tilde | ˜ | ˜ | ˜ | ˜ | ˜ | 
| en space |   |   |   | ||
| em space |   |   |   | ||
| thin space |   |   |   | ||
| zero width non-joiner | ‌ | ‌ | ‌ |  |  | 
| zero width joiner | ‍ | ‍ | ‍ |  |  | 
| left-to-right mark | ‎ | ‎ | ‎ |  |  | 
| right-to-left mark | ‏ | ‏ | ‏ |  |  | 
| en dash | – | – | – | – | – | 
| em dash | — | — | — | — | — | 
| left single quotation mark | ‘ | ‘ | ‘ | ‘ | ‘ | 
| right single quotation mark | ’ | ’ | ’ | ’ | ’ | 
| single low-9 quotation mark | ‚ | ‚ | ‚ | ‚ | ‚ | 
| left double quotation mark | “ | “ | “ | “ | “ | 
| right double quotation mark | ” | ” | ” | ” | ” | 
| double low-9 quotation mark | „ | „ | „ | „ | „ | 
| dagger | † | † | † | † | † | 
| double dagger | ‡ | ‡ | ‡ | ‡ | ‡ | 
| per mille sign | ‰ | ‰ | ‰ | ‰ | ‰ | 
| single left-pointing angle quotation mark | ‹ | ‹ | ‹ | ‹ | ‹ | 
| single right-pointing angle quotation mark | › | › | › | › | › | 
| euro sign | € | € | € | € | € | 
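
In terms of processing, supporting these names comes down to a lookup table consulted whenever a named entity reference is met in text, alongside the handling of the five entities predefined by XML and of numeric character references. The sketch below shows the idea with only a handful of the entries above; HTML_ENTITIES and expandReferences() are hypothetical names, and this is not a description of how ESPX is organized internally.

```javascript
// Hypothetical table: entity name -> Unicode code point. A small excerpt of the
// tables above, plus the five entities predefined by XML 1.0 (including apos).
var HTML_ENTITIES = {
  amp: 38, lt: 60, gt: 62, quot: 34, apos: 39,
  nbsp: 160, copy: 169, eacute: 233, euro: 8364, hellip: 8230
};

// Replace &name;, &#ddd; and &#xhhh; references with the characters they denote.
// Unknown names are left untouched here; a real parser would report an error.
function expandReferences(text) {
  return text.replace(/&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    function (whole, ref) {
      var code;
      if (ref.charAt(0) == "#") {
        code = ref.charAt(1) == "x" ? parseInt(ref.substring(2), 16)
                                    : parseInt(ref.substring(1), 10);
      } else if (HTML_ENTITIES.hasOwnProperty(ref)) {
        code = HTML_ENTITIES[ref];
      } else {
        return whole;
      }
      // fromCharCode covers the Basic Multilingual Plane, which includes
      // every entity listed above.
      return String.fromCharCode(code);
    });
}

var s = expandReferences("caf&eacute; &amp; cr&#232;me &#x20AC;5 &bogus;");
// s is "café & crème €5 &bogus;"
```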
| Limitations | 
There is no documentation except this document and comments in the source code.
Also, and apart from bugs to discover, the implementation is in need of improvement in several areas, including:
throw/try ... catch at all, the code is too
    tricky and/or boring at times.

| Tested user agents (to be updated regularly) |
These have been successfully tested with ESPX / TinyXSL:
| Platform | Product name | Version(s) | Versions with built-in XML support | ECMAScript implementation level used for ESPX | 
|---|---|---|---|---|
| Mac | Microsoft Internet Explorer | 5.0 | 5.x and above | ??? | 
| Windows | Microsoft Internet Explorer | 4.x, 5.x | 5.x and above | JScript 3 in 4.x browsers (latest is JScript 5.5 (?)) | 
| Windows, Linux | Netscape Navigator | 4.x, 6.0 | 6.0 and above | JavaScript1.2 in 4.x browsers (latest is JavaScript1.5 (?)) | 
| Windows | Opera | 5.0 | ??? | JavaScript1.2 (?) | 
| Reporting bugs | 
Please report bugs to me. When reporting bugs, please be sure to
  include easy-to-reproduce test cases for either IE 4.x or 5.x, or NS 4.x
  or 6.0. I'm also interested in feedback from testing on the Linux platform and
  with WMLScript, if applicable. Create a zip file
  containing all the necessary files, and attach it to your
  email.
Ideas, comments, suggestions for improvements, especially bug fixes, are always welcome, as usual. Thanks in advance.
March 13, 2002
Cyril Jandia