Friday, February 4, 2011

Python Documentation

I had an interesting time understanding how to use Python's json library, which is very sparsely documented and completely lacks examples.  I found a page on using json in Python which was very helpful, but I did not read the code examples quite deeply enough, and the annotations were limited.

I wanted to be able to take a "bug" object and turn it into JSON data, and then take that JSON data and reconstruct the bug.  The example is for the generic case of serializing a class, and therefore stores metadata about the class, like its name, module, and package, along with the other data.  I wanted to avoid that if possible, and so stripped out the dynamic class construction and the like, and effectively wrote a pair of functions that took a dictionary and turned it into an object, and took an object and turned it into a dictionary.
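As a rough sketch of what that pair of functions might look like: the Bug class, its title and status fields, and the names bug_to_dict and dict_to_bug are all hypothetical stand-ins, not the code from the original project.

```python
import json


class Bug:
    """Hypothetical stand-in for the bug object; the fields are assumptions."""

    def __init__(self, title, status):
        self.title = title
        self.status = status


def bug_to_dict(bug):
    # Object -> plain dict that json.dumps knows how to serialize.
    return {"title": bug.title, "status": bug.status}


def dict_to_bug(data):
    # Plain dict -> object. Note the built-in assumption: every dict
    # handed to this function describes a bug.
    return Bug(data["title"], data["status"])


original = Bug("Crash on save", "open")
text = json.dumps(bug_to_dict(original))
restored = json.loads(text, object_hook=dict_to_bug)
```

With only primitive fields this round trip works, because the JSON text contains exactly one object literal.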

However, when I made one of the bug's instance variables a dictionary as well, the program crashed while converting from JSON back to a Python object.  It seemed to have dropped the surrounding dictionary and to be considering only the inner one.  This looked like completely bizarre behavior, and it took a fair bit of poking around, including multiple Stack Overflow questions and time spent on #python (a truly painful process), to understand completely.

The specific problem was the object_hook parameter of json's load function.  The description of this parameter in the documentation is:
"object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders (e.g. JSON-RPC class hinting)."
If you read this closely, you may see what I was missing.  Remember that I was coming from the perspective of needing to serialize exactly one object, whose contents would all be data structures (lists or dicts) or primitives.  It hadn't really crossed my mind that an object could contain other objects that would need to be serialized.  Silly in retrospect, of course, but not immediately intuitive if you aren't thinking about it.
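One way to see what "any object literal decoded" means in practice is to pass a hook that only records what it is given. This is a small illustration; the JSON text and the name tracing_hook are made up for the example.

```python
import json

calls = []


def tracing_hook(data):
    # Record every dict the decoder hands over, then return it unchanged.
    calls.append(data)
    return data


text = '{"title": "Crash on save", "details": {"os": "linux"}}'
result = json.loads(text, object_hook=tracing_hook)

for seen in calls:
    print(seen)
```

The hook runs twice here: first with the inner {"os": "linux"} dict, and then with the outer dict that contains it.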

So when I saw only the internal dict, that was exactly what was supposed to happen - json was applying the object_hook function to each nested dict in turn, inner dicts before the dicts that contain them.  This means that you cannot (nor would you want to) use object_hook to handle only one type of object; it needs to at least potentially handle any dict that could be thrown at it.  I ended up implementing a much cleaner way of (un)serializing bugs once I understood exactly what object_hook is doing - which I'll claim isn't completely clear from the language used in the Python docs.
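A minimal sketch of a hook that copes with any dict might look like the following. It is not the implementation I ended up with; the Bug class, its fields, and the key-matching rule are assumptions chosen to keep the example short.

```python
import json


class Bug:
    """Hypothetical bug object with one nested dict field."""

    def __init__(self, title, details):
        self.title = title
        self.details = details


# Assumption: a dict is a bug exactly when it has these keys.
BUG_KEYS = {"title", "details"}


def bug_hook(data):
    # Called for every decoded dict, inner ones first.
    if set(data) == BUG_KEYS:
        return Bug(data["title"], data["details"])
    # Anything else is not a bug, so hand it back untouched.
    return data


text = '{"title": "Crash on save", "details": {"os": "linux"}}'
bug = json.loads(text, object_hook=bug_hook)
```

The inner details dict passes through unchanged, and only the outer dict is turned into a Bug. Matching on key names is fragile if two kinds of object share the same keys, which is presumably why the generic example I started from stores class metadata alongside the data.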
