Another pet peeve: writing data validation rules

This isn’t innovative, not in the slightest, but it’s an area that seriously needs some innovation poured into it.

I sometimes have to work with datasets (documents that structure data in a meaningful way), and occasionally I have to contend with documents whose structure is unlike anything I’ve seen before, even when they claim to be in a given format.

The source of my frustration is, of all things, Wedge. Over the last month, I gave Wedge a management tool for organising and activating plugins, which extend the software’s core functionality.

And, because I’m a bit of a fool, I wrote the manifest file – the file that directs what has to happen – in XML. Honestly, I don’t know of any other language, short of a full programming language, that lets me describe the operations that have to occur in a way where the document itself implies what has to happen.

There’s the plugin block, which contains blocks for the plugin’s name, description, author information and so on, before going on to describe new database tables, hooks to register, scheduled tasks to set up, settings to initialise, and even readme files for the plugin. (And more that’s still unfinished.)
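
The following minimal Python sketch shows what a manifest shaped like that might look like and reads its top-level blocks with the standard library’s xml.etree.ElementTree. The element and attribute names (plugin, hooks, scheduledtasks, readmes and so on) are assumptions chosen for illustration. They are not Wedge’s actual schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical plugin manifest. Element and attribute names are
# illustrative assumptions, not Wedge's real format.
MANIFEST = """\
<plugin id="example:hello">
  <name>Hello World</name>
  <author>A. N. Author</author>
  <description>Says hello on every page.</description>
  <version>1.0</version>
  <database>
    <tables>
      <table name="hello_log"/>
    </tables>
  </database>
  <hooks>
    <function point="display_main" function="hello_display"/>
  </hooks>
  <scheduledtasks>
    <task name="hello_cleanup" runevery="1" runfreq="day"/>
  </scheduledtasks>
  <settings>
    <setting name="hello_enabled" default="1"/>
  </settings>
  <readmes>
    <readme lang="english">readme.txt</readme>
  </readmes>
</plugin>
"""

def top_level_blocks(xml_text):
    """Return the tag names of the root element's direct children, in document order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]

print(top_level_blocks(MANIFEST))
# → ['name', 'author', 'description', 'version', 'database', 'hooks', 'scheduledtasks', 'settings', 'readmes']
```

Each block is just a child of the root element, so describing the format as a whole mostly comes down to saying which children may appear and what each may contain.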

Now, I’m a great fan of having tools to make my life easier. When it comes round to offering plugins for Wedge, whether my own or through a centrally organised site, I want something that will review the file for me and validate that the XML provided is correct, meaningful and contains everything it needs to.

Except that writing something to really validate it automatically is a drag. Sure, I could rework the code I already have that processes the file, but that means writing and maintaining a separate tool that has to be downloaded and run to make sense of it.

Now, XML files are (generally) supposed to be validated against a Document Type Definition, or DTD, but after a day or so of head-scratching I finally gave up trying to write one for my format. Partly that’s because my format isn’t that typical, and partly because I don’t feel entirely comfortable being harsh about its validity: if a block can hold up to 16 items, I don’t want to insist that they appear in one exact order. In this day and age of computing power, I should be able to put something together that does it for me, and DTDs aren’t it.
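
The ordering problem is easy to state in code, even though it is awkward in a DTD. A content model like (a, b, c) fixes the order, while (a | b | c)* allows any order but can neither require an element nor stop it repeating. This sketch treats a block’s children as an unordered set instead. It checks only which blocks are required, which are allowed, and that none appears twice. The REQUIRED and OPTIONAL sets and the check_children function are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical rules for the direct children of <plugin>.
REQUIRED = {"name", "author", "description", "version"}
OPTIONAL = {"database", "hooks", "scheduledtasks", "settings", "readmes"}

def check_children(element):
    """Check children as an unordered set: required present, nothing unknown, no repeats."""
    counts = Counter(child.tag for child in element)
    seen = set(counts)
    problems = [f"missing <{tag}>" for tag in sorted(REQUIRED - seen)]
    problems += [f"unexpected <{tag}>" for tag in sorted(seen - REQUIRED - OPTIONAL)]
    problems += [f"<{tag}> appears {n} times" for tag, n in sorted(counts.items()) if n > 1]
    return problems

# Same kind of blocks, shuffled order: still valid.
shuffled = ET.fromstring(
    "<plugin><version>1.0</version><hooks/><author>A</author>"
    "<name>Hello</name><description>Hi</description></plugin>"
)
print(check_children(shuffled))  # → []

# Missing blocks, an unknown block and a repeated one are all reported.
broken = ET.fromstring("<plugin><name>A</name><name>B</name><colour/></plugin>")
print(check_children(broken))
```

RELAX NG’s interleave pattern expresses the same “any order” idea declaratively, without hand-written code.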

So, I went searching and found the RELAX NG specification, which is great for handling the sort of things a DTD can’t, but it’s so verbose it isn’t even funny. Fortunately I have copy/paste keys at the ready.

What I’d love to see is something meaningful that lets me describe the structure of XML (or indeed any file format I care to think about) and be descriptive, even thorough if I so choose, without making me write lots and lots and lots about something fairly straightforward.

It should also let me state the type of data I’m expecting, so that validation isn’t just structural or semantic but also checks the actual values I’m working with.
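
Here is a rough sketch of a terse, one-line-per-rule description that also checks types. Each rule maps a path, optionally ending in @attribute, to a regular-expression check on the value found there. The path lookup uses ElementTree’s findall, which supports a limited subset of XPath. The path syntax, the SCHEMA rules and the validate_types function are all assumptions made up for this example. The sketch only checks values, not structure.

```python
import re
import xml.etree.ElementTree as ET

def matches(pattern):
    """Build a check that accepts a value only if the whole value matches the regex."""
    regex = re.compile(pattern)
    return lambda value: value is not None and regex.fullmatch(value) is not None

# Hypothetical rules: "path" checks element text, "path@attr" checks an attribute.
SCHEMA = {
    "version": matches(r"\d+(\.\d+)*"),                    # e.g. 1.0 or 2.1.3
    "settings/setting@default": matches(r"-?\d+"),         # integer defaults only
    "scheduledtasks/task@runevery": matches(r"[1-9]\d*"),  # positive integer
}

def validate_types(root, schema):
    """Return one message per matched value that fails its rule's check."""
    problems = []
    for rule, check in schema.items():
        path, _, attr = rule.partition("@")
        for element in root.findall(path):
            value = element.get(attr) if attr else element.text
            if not check(value):
                problems.append(f"{rule}: bad value {value!r}")
    return problems

doc = ET.fromstring(
    '<plugin><version>1.0.2</version>'
    '<settings><setting name="a" default="5"/><setting name="b" default="five"/></settings>'
    '<scheduledtasks><task runevery="0"/></scheduledtasks></plugin>'
)
print(validate_types(doc, SCHEMA))
```

A missing attribute or an empty element yields None, which the checks reject. Plain regular expressions only go so far, though: anything like “this hook must name a function that exists” still needs real code.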

I don’t think this is unreasonable, and if we had such descriptions handy, we’d generally be able to validate incoming files more readily, which has security benefits. If you have a system that accepts file uploads, you can run them through such a validator to ensure they are valid, which can help trap malicious files at source rather than hoping to catch them at their destination…

Still, I as a programmer can dream…
