I am a simple programmer who learned Perl out of the necessity to parse Unix-formatted files and crunch some data. Just like everyone else.
Typically a data-crunching script is born by visually inspecting the file to be parsed, guessing how to parse it, writing a Perl loop, and using the appropriate split or regular expression. Then you move on and do something useful with it.
Of course, a quick glance at a file fails to reveal the underlying representation, or the rules used to handle corner cases. For example, parsing a comma-separated line is fine as long as there are no escape characters or quote grouping, which is not always obvious during visual inspection. These special rules will surface unexpectedly in a production system, most likely because you only looked at a sample of the file, and not at all the possible combinations. God forbid you actually read the documentation for the file (and in an open source system, the challenge is to find documentation that actually matches the file format, but that is a separate story).
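To make that concrete, here is a minimal sketch of the failure (the input line is invented, and I am using C# to match the examples later in this post): a naive comma split quietly produces the wrong fields the moment a quoted field contains a comma.

    using System;

    class SplitDemo {
            static void Main ()
            {
                    // A quoted field containing a comma defeats a naive split
                    string line = "Smith, \"Portland, OR\", 97201";

                    foreach (string field in line.Split (','))
                            Console.WriteLine ("[" + field.Trim () + "]");

                    // Prints [Smith] ["Portland] [OR"] [97201]:
                    // four fields instead of three, with the quotes torn apart
            }
    }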
Recently I have been using XPath in .NET to parse, pull and extract information out of XML files. Before XPath I used to be one of those `find the node, now call a function to search for the matching child node, repeat until found'-kind of person. XPath has made me a happier man. I realized that part of the pain in dealing with simple text files can be easily addressed with XPath and XML: there is a single format you can use (and a set of tools that produce and consume valid XML) and a simple way of fetching nicely structured data (as opposed to files like /etc/inittab, /etc/fstab, /etc/termcap or the terminfo database).
Don't get me wrong. Termcap is a great file format if you have a single implementation of the beast, the only API call you know about is strtok(3) and you just learned how to test for the end-of-string marker in C.
I know the above sounds completely obvious to everyone. But I liked my little realization this week. I think I might be on the path to XML Zen.
I used this tutorial to learn XPath. First match in Google.
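The payoff looks something like this (a minimal sketch; the file name, element names and query are all invented): one XPath query replaces the whole find-node, search-children, repeat dance.

    using System;
    using System.Xml;

    class XPathDemo {
            static void Main ()
            {
                    XmlDocument doc = new XmlDocument ();
                    doc.Load ("users.xml");   // invented input file

                    // One declarative query instead of a hand-written tree walk
                    foreach (XmlNode name in doc.SelectNodes ("//user[@active='true']/name"))
                            Console.WriteLine (name.InnerText);
            }
    }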
Duncan saw a talk at Stanford by Todd Proebsting. In his talk he mentioned that it would be nice to have extensions to handle XML from your favorite programming language, as XML was becoming ubiquitous. XPath gets close to this, but it's typically implemented as a library routine. Then Don Box went to a conference and pushed for the same idea. Wild speculation about what Microsoft could be doing began.
The above gave Duncan an idea: he wanted to be able to use XPath-like expressions within C# to address nodes.
It occurred to me that we could hack our C# compiler to implement Duncan's idea with relative ease. The idea would be to flag an XmlNode with a special attribute (say, [Dynamic]) and then have the compiler resolve "Member Access" expressions with dynamic code instead of static code.
At compile time the compiler figures out what "This.String.Method" means. One interpretation could well be `in namespace "This", pick class "String" and look up the member "Method"'. This in turn becomes "fetch from class-id XXXX the field YYYY". We could use the [Dynamic] flag to let the compiler know that after resolving the meaning of a particular element in a member access expression, it should not try to resolve the remaining elements statically, but do it dynamically and generate code accordingly.
So given "XmlNode n = GetNode ();", the expression "j = n.Types [5].Dingus" would become:
    XmlNodeList temp1 = n.SelectNodes ("Types");      // SelectNodes returns a node list
    XmlNode temp2 = temp1 [5];                        // the node at index 5
    XmlNodeList temp3 = temp2.SelectNodes ("Dingus");
    j = temp3;
If you can annotate the node with the XML Schema, the compiler could do strong type checking as well:
[Schema ("Dingus")] XmlNode n = GetNode ();
There are issues to be addressed here, like how C# would cope with identifiers such as "my-element-name".
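For what it is worth, the library approach sidesteps that particular problem, since the path is just a string (the element name here is invented):

    using System.Xml;

    class HyphenDemo {
            static XmlNodeList Fetch (XmlNode n)
            {
                    // "my-element-name" is a legal XPath step,
                    // even though it could never be a C# identifier
                    return n.SelectNodes ("my-element-name");
            }
    }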
Posted on 13 Feb 2003