Showing posts with label python. Show all posts

Thursday, February 17, 2011

Mutable Types as Default Parameters in Python

I just came across a little feature/bug in Python I thought worth sharing.

In Python, you can set default values for parameters, like so:

def func(param="hello"):
    return param

The result is a function you can call either with or without an argument. This seems simple enough, and works very nicely. However, what if you want to use something more complex as the default value? Say, a list?

def func(param=[]):
    return param

This works just fine, exactly like the previous example. But one thing that may not be immediately obvious is that every call to this function without an explicit argument will share the same list. Python evaluates default values only once, when the def statement is executed, not every time the function is called, so only one empty list is ever created no matter how many times func() is used. Usually this has no effect on the programmer: if your default values are immutable types (numbers, strings, tuples, None), which the vast majority of default values likely are, nothing is lost by sharing one value between calls, because it can't be changed. And even when mutable types are used, there is rarely any benefit to changing a data structure the rest of the program normally can't see, so most mutable default parameters are only read, never mutated.
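A quick way to see the sharing is a function that mutates its default. The name append_item is made up for illustration; __defaults__ is the standard function attribute that holds the evaluated default values.

```python
def append_item(item, items=[]):
    # The default list is created once, when "def" runs, and reused by every call
    items.append(item)
    return items

first = append_item("a")
second = append_item("b")
print(first)                     # ['a', 'b']
print(second)                    # ['a', 'b']
print(first is second)           # True: both names refer to the single default list
print(append_item.__defaults__)  # (['a', 'b'],)
```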

But one example where that is not the case is in a constructor.  Consider:

class BlogPost:
    def __init__(self, comments=[]):
        self.comments = comments

When constructing a BlogPost object, I don't want to take the time to explicitly tell it there are no comments, so I would usually not even bother passing that argument. On its own, this isn't enough to get me into trouble. Neither is creating multiple BlogPosts, nor is accessing, changing, or otherwise working with a single BlogPost dangerous. And so you could go quite some time developing before you find yourself in a situation where you have constructed multiple BlogPost objects and then modify the comments list of one of them in place, for example by appending a comment. As soon as you do that, however, every other BlogPost created without an explicit comments argument will also have the same comment in its comments list, because it's actually the same list!
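Here is a minimal reproduction, followed by the common None-sentinel idiom for avoiding it. The idiom is not from the original post, and SafeBlogPost is a hypothetical name used only for comparison.

```python
class BlogPost:
    def __init__(self, comments=[]):
        # Every BlogPost built without an explicit list shares this one default list
        self.comments = comments

first_post = BlogPost()
second_post = BlogPost()
first_post.comments.append("Nice post!")
print(second_post.comments)                         # ['Nice post!']
print(first_post.comments is second_post.comments)  # True


class SafeBlogPost:
    def __init__(self, comments=None):
        # None is immutable, so sharing it is harmless; build a fresh list per call
        self.comments = [] if comments is None else comments

a = SafeBlogPost()
b = SafeBlogPost()
a.comments.append("Nice post!")
print(b.comments)  # []
```

Because the new list is created inside the body, it is built each time the constructor runs rather than once when the class is defined.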

Once you think about it, this behavior is not terribly shocking; you might even go so far as to say it's intuitive. However, if you aren't thinking about it, it's a very bizarre problem to debug when it happens to you.

Friday, February 4, 2011

Python Documentation

I had an interesting time understanding how to use Python's json library, which is very sparsely documented and completely lacks examples.  I found a page on using json in Python which was very helpful, but I did not read the code examples quite deeply enough, and the annotations were limited.

I wanted to be able to take a "bug" object and turn it into JSON data, and then take that JSON data and reconstruct the bug. The example covers the generic case of serializing an instance of any class, and therefore stores metadata about the class, like its name, module, and package, along with the other data. I wanted to avoid that if possible, so I stripped out the dynamic class construction and the like, and effectively wrote a pair of functions: one that took a dictionary and turned it into an object, and one that took an object and turned it into a dictionary.
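A pair of functions along those lines might look roughly like this. The Bug class, its fields, and the helper names are hypothetical; json.dumps(default=...) is the standard hook for turning an otherwise unserializable object into something JSON can encode.

```python
import json

class Bug:
    # Hypothetical bug class; field names are illustrative, not from the original post
    def __init__(self, bug_id, title, tags):
        self.bug_id = bug_id
        self.title = title
        self.tags = tags

def bug_to_dict(bug):
    # Object -> dict: only the instance data, no class/module metadata
    return dict(vars(bug))

def dict_to_bug(data):
    # Dict -> object: assumes the keys match Bug's constructor parameters
    return Bug(**data)

bug = Bug(42, "Crash on save", ["io", "urgent"])
text = json.dumps(bug, default=bug_to_dict)
restored = dict_to_bug(json.loads(text))
print(text)            # {"bug_id": 42, "title": "Crash on save", "tags": ["io", "urgent"]}
print(restored.title)  # Crash on save
```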

However, when I tried making one of the bug's instance variables a dictionary as well, the program crashed when converting from JSON to a Python object, seemingly having dropped the surrounding dictionary and considering only the inner dictionary. This seemed to me like completely bizarre behavior, and it took a fair bit of poking around, including multiple Stack Overflow questions and time spent on #python (a truly painful process), before I understood it completely.
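The symptom can be reproduced with a hook that assumes it only ever receives the top-level bug. The Bug class, its "environment" field, and naive_hook are hypothetical stand-ins, not the original code.

```python
import json

class Bug:
    # Hypothetical bug class with a dict-valued field ("environment")
    def __init__(self, bug_id, environment):
        self.bug_id = bug_id
        self.environment = environment

def naive_hook(data):
    # Written assuming it only ever sees the top-level bug dict
    return Bug(**data)

text = '{"bug_id": 7, "environment": {"os": "linux"}}'
error = None
try:
    json.loads(text, object_hook=naive_hook)
except TypeError as exc:
    error = str(exc)
    # The complaint is about the inner key 'os', not the outer dict's keys
    print(error)
```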

The specific problem was the object_hook parameter of json's load function.  The description of this parameter in the documentation is:
object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders (e.g. JSON-RPC class hinting).
If you read this closely, you may see what I was missing. Remember that I was coming from the perspective of needing to serialize exactly one object, whose contents would all be data structures (lists or dicts) or simple values like strings and numbers. It hadn't really crossed my mind that an object could contain other objects that would need to be serialized. Silly in retrospect, of course, but not immediately intuitive if you aren't thinking about it.

So when I saw only the inner dict, json was doing exactly what it was supposed to do: it applies the object_hook function to every decoded JSON object, including nested ones. Because an object's values are decoded before the object itself, the innermost dict reaches the hook first, so a hook written for the top-level bug gets handed the nested dict before the outer one is ever processed. This means you cannot (nor would you want to) use object_hook for only one type of object; it needs to at least potentially handle any object that could be thrown at it. Once I understood exactly what object_hook is doing, which I'll claim isn't completely clear from the language used in the Python docs, I ended up implementing a much cleaner way of (de)serializing bugs.
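One way to write a hook that copes with any object is to convert only the dicts that look like a bug and pass everything else through unchanged. This is a sketch, not the post's actual solution; the Bug class and the choice of a "bug_id" key as the marker are assumptions.

```python
import json

class Bug:
    # Hypothetical bug class with a dict-valued field ("environment")
    def __init__(self, bug_id, environment):
        self.bug_id = bug_id
        self.environment = environment

def bug_hook(data):
    # Called for EVERY decoded JSON object, innermost first.
    # Assumed marker: only dicts with a "bug_id" key become Bug objects;
    # every other dict (such as the nested environment) is returned unchanged.
    if "bug_id" in data:
        return Bug(**data)
    return data

text = '{"bug_id": 7, "environment": {"os": "linux"}}'
bug = json.loads(text, object_hook=bug_hook)
print(type(bug).__name__)  # Bug
print(bug.environment)     # {'os': 'linux'}
```

The same hook also handles a JSON list of bugs, since each bug object is converted independently as it is decoded.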