These days I'm studying techniques and technologies for XML processing with Java. In a previous post I gave an overview of them. In particular, I mentioned the DOM approach, which is based on the Document Object Model (DOM), a standard object model for XML maintained by the W3C. Here I first give some simple details about DOM itself.
Take the simple XML document shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<song genre="rock">
<name>My December</name>
<singer>Linkin Park</singer>
</song>
The DOM tree structure looks like this (E indicates an element node and T indicates a text node):
E:song
|--T:characters(whitespace)
|--E:name---T:characters(My December)
|--T:characters(whitespace)
|--E:singer---T:characters(Linkin Park)
|--T:characters(whitespace)
As depicted above, the root node song has five child nodes, two of which have child nodes of their own. I want to emphasize the text nodes here. Before I started digging into XML processing recently, I didn't even know that so-called text nodes existed, because in Delphi they are simply ignored. There, the DOM tree structure looks like this:
E:song---E:name
|--E:singer
Here only elements count as nodes. I think this is quite intuitive, though the official DOM structure is definitely more complete in theory. But whitespace and other text nodes complicate XML parsing. An example is worth a thousand words, so let's see how this simple XML document is parsed differently in Java and Delphi:
In Java (exceptions are left unhandled):
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(<xml file name>));
Element root = doc.getDocumentElement();
NodeList list = root.getChildNodes();
// A simple helper method
printStr("name: " + list.item(1).getFirstChild().getNodeValue());
printStr("singer: " + list.item(3).getFirstChild().getNodeValue());
In Delphi:
var
XMLDoc: IXMLDocument;
XMLNode, CtlNode: IXMLNode;
i, index: integer;
str: string;
begin
str := '';
XMLDoc := TXMLDocument.Create(nil);
// (loading the file, e.g. with XMLDoc.LoadFromFile, is omitted here)
XMLNode := XMLDoc.ChildNodes.Nodes['song'];
for i := 0 to XMLNode.ChildNodes.Count - 1 do
begin
str := str + XMLNode.ChildNodes.Nodes[i].NodeValue;
end;
end;
Clearly, the Java version is more awkward, and it gets more complicated as the XML document grows longer. This is because the element nodes can't be accessed at consecutive indexes: whitespace text nodes sit between them. In contrast, with text nodes ignored, the Delphi version is quite clear and adapts to a document of any size. As far as I know, many DOM implementations besides Java's (JavaScript's at least) are aware of text nodes, including whitespace text nodes.
So developers use various kinds of helper methods to work around this awkward situation.
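To see those whitespace text nodes concretely, here is a minimal sketch that parses the sample document from a string and labels each child of the root. The class name ChildNodeDump and the method describeChildren are illustrative names of my own, not part of any library, and I'm assuming the default JAXP parser with no DTD validation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: lists the kinds of nodes directly under the root.
class ChildNodeDump {
    // The sample document from this post, with the same line breaks.
    static final String SONG_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<song genre=\"rock\">\n"
        + "<name>My December</name>\n"
        + "<singer>Linkin Park</singer>\n"
        + "</song>\n";

    // Returns one label per child of the root:
    // "E:<tag>" for elements, "T" for text nodes, "other" for anything else.
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    labels.add("E:" + child.getNodeName());
                } else if (child.getNodeType() == Node.TEXT_NODE) {
                    labels.add("T");
                } else {
                    labels.add("other");
                }
            }
            return labels;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeChildren(SONG_XML));
    }
}
```

With the line breaks as in the sample, this should print [T, E:name, T, E:singer, T], which is why the elements sit at indexes 1 and 3 rather than 0 and 1.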
Method 1:
private Node getNodeByName(final NodeList list, final String name) {
    for (int i = 0; i < list.getLength(); i++) {
        final Node node = list.item(i);
        // whitespace text nodes never match the name, so they are skipped
        if (name.equals(node.getNodeName())) {
            return node;
        }
    }
    return null; // not found
}
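As a rough sketch of how Method 1 might be used, the class below looks up the children of song by name instead of by index. The class name SongLookup and the wrapper childText are hypothetical names introduced only for this illustration, and the helper is made static so that it can be called without an instance.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class wrapping the Method 1 helper.
class SongLookup {
    // The sample document from this post (XML declaration left out for brevity).
    static final String SONG_XML =
        "<song genre=\"rock\">\n"
        + "<name>My December</name>\n"
        + "<singer>Linkin Park</singer>\n"
        + "</song>";

    // Same logic as Method 1: the first node whose name matches, or null.
    static Node getNodeByName(NodeList list, String name) {
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            // Text nodes are named "#text", so they never match an element name.
            if (name.equals(node.getNodeName())) {
                return node;
            }
        }
        return null; // not found
    }

    // Hypothetical wrapper: text of the named direct child of the root, or null.
    static String childText(String xml, String childName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node child = getNodeByName(doc.getDocumentElement().getChildNodes(), childName);
            if (child == null || child.getFirstChild() == null) {
                return null;
            }
            return child.getFirstChild().getNodeValue();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("name: " + childText(SONG_XML, "name"));
        System.out.println("singer: " + childText(SONG_XML, "singer"));
    }
}
```

The lookup no longer depends on where the whitespace falls, at the cost of a linear scan for every name.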
Method 2:
...
NodeList list = e.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
    Node n = list.item(i);
    // skip whitespace text nodes (and any other non-element node)
    if (!(n instanceof Element)) { continue; }
    nsFixup((Element) n, map, false);
}
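The same instanceof filter can be pulled out into a small reusable helper, so that element children can be accessed by consecutive index much as in the Delphi version. This is only a sketch: ChildElements, of and parseRoot are hypothetical names, and nsFixup from the snippet above is not reproduced because its definition is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: collects only the element children of a node.
class ChildElements {
    // Returns the direct children of parent that are elements, in document order.
    static List<Element> of(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            // Text nodes, comments and so on are left out.
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(
            "<song genre=\"rock\">\n<name>My December</name>\n<singer>Linkin Park</singer>\n</song>");
        // Elements are now at consecutive indexes 0 and 1.
        for (Element child : of(root)) {
            System.out.println(child.getTagName() + ": " + child.getTextContent());
        }
    }
}
```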
And I believe there must be more.
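One more option comes from the standard DOM API itself: getElementsByTagName returns only the matching elements, and getTextContent returns the text inside a node. Below is a minimal sketch of that approach; TagNameLookup and firstText are hypothetical names used only for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: looks elements up by tag name instead of by child index.
class TagNameLookup {
    // Returns the text of the first element with the given tag name, or null if none exists.
    static String firstText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<song genre=\"rock\">\n<name>My December</name>\n<singer>Linkin Park</singer>\n</song>";
        System.out.println("name: " + firstText(xml, "name"));
        System.out.println("singer: " + firstText(xml, "singer"));
    }
}
```

Note that getElementsByTagName searches all descendants, not only direct children, so in a deeper document it can match elements that a child-by-child loop would not.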
So far I really don't see any benefit in keeping text nodes visible. But if you know of one, please tell me.
