Recipe 7.4. Using the cPickle Module on Classes and Instances

Credit: Luther Blissett

Problem

You want to save and restore class and instance objects using the cPickle module.

Solution

You often need no special precautions to use cPickle on your classes and their instances. For example, the following works fine:

import cPickle

class ForExample(object):
    def __init__(self, *stuff):
        self.stuff = stuff

anInstance = ForExample('one', 2, 3)
saved = cPickle.dumps(anInstance)
reloaded = cPickle.loads(saved)
assert anInstance.stuff == reloaded.stuff

However, sometimes there are problems:

anotherInstance = ForExample(1, 2, open('three', 'w'))
wontWork = cPickle.dumps(anotherInstance)

This snippet causes a TypeError: "can't pickle file objects" exception, because the state of anotherInstance includes a file object, and file objects cannot be pickled. You would get exactly the same exception if you tried to pickle any other container that includes a file object among its items.
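To see that the failure comes from the file object wherever it sits, here is a minimal Python 3 sketch. The helper name can_pickle and the scratch file name are illustrative assumptions. The exact wording of the TypeError message differs between Python versions, but the exception type is the same.

```python
import os
import pickle
import tempfile

def can_pickle(obj):
    """Return True if obj pickles cleanly, False if pickle raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True

# Scratch file standing in for the recipe's open('three', 'w').
path = os.path.join(tempfile.mkdtemp(), 'three')
with open(path, 'w') as f:
    results = {
        'bare file': can_pickle(f),
        'tuple holding a file': can_pickle((1, 2, f)),
        'dict holding a file': can_pickle({'log': f}),
        'tuple without a file': can_pickle((1, 2, 3)),
    }

# Clean up the scratch file and its temporary directory.
os.remove(path)
os.rmdir(os.path.dirname(path))
```

Only the container without a file pickles successfully; every container holding the open file fails.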

However, in some cases, you may be able to do something about it:

class PrettyClever(object):
    def __init__(self, *stuff):
        self.stuff = stuff
    def __getstate__(self):
        def normalize(x):
            # Tag 1: an open file, saved as (name, mode, current offset)
            if isinstance(x, file):
                return 1, (x.name, x.mode, x.tell())
            # Tag 0: any other item, saved verbatim
            return 0, x
        return [normalize(x) for x in self.stuff]
    def __setstate__(self, stuff):
        def reconstruct(x):
            if x[0] == 0:
                return x[1]
            name, mode, offs = x[1]
            # Caution: reopening with a writing mode such as 'w' truncates the file
            openfile = open(name, mode)
            openfile.seek(offs)
            return openfile
        self.stuff = tuple([reconstruct(x) for x in stuff])

By defining the __getstate__ and __setstate__ special methods in your class, you gain fine-grained control over what, exactly, your class's instances consider to be their state. As long as you can define such state in picklable terms, and reconstruct your instances from the unpickled state in some way that is sufficient for your application, you can in this way make your instances themselves safe to pickle and unpickle.
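Python 3 has no built-in file type, so the Solution's isinstance(x, file) test does not carry over directly. The following sketch is a possible Python 3 adaptation, assuming open files can be recognized as instances of io.IOBase. It round-trips an instance holding a file opened for reading (mode 'r', so reopening does not truncate anything). The scratch file name data.txt is purely illustrative.

```python
import io
import os
import pickle
import tempfile

class PrettyClever(object):
    def __init__(self, *stuff):
        self.stuff = stuff

    def __getstate__(self):
        def normalize(x):
            if isinstance(x, io.IOBase):     # assumption: open files are IOBase
                return 1, (x.name, x.mode, x.tell())
            return 0, x                      # any other item: keep verbatim
        return [normalize(x) for x in self.stuff]

    def __setstate__(self, stuff):
        def reconstruct(pair):
            tag, payload = pair
            if tag == 0:
                return payload
            name, mode, offs = payload
            openfile = open(name, mode)      # reopen by name and mode
            openfile.seek(offs)              # restore the saved position
            return openfile
        self.stuff = tuple(reconstruct(pair) for pair in stuff)

path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as out:
    out.write('hello world\n')

src = open(path, 'r')
src.read(6)                                  # move the position past 'hello '
clone = pickle.loads(pickle.dumps(PrettyClever('one', 2, src)))
src.close()

reopened = clone.stuff[2]
rest = reopened.read()                       # continues from the saved offset
reopened.close()
```

After the round trip, the clone holds a freshly opened file positioned where the original one was, so reading from it yields 'world\n'.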

Discussion

cPickle dumps class and function objects by name (i.e., through their module's name and their name within the module). Thus, you can dump only classes defined at module level (not inside other classes and functions). Reloading such objects requires the respective modules to be available for import. Instances can be saved and reloaded only if they belong to such classes. In addition, the instance's state must also be picklable.
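A short Python 3 sketch of pickling "by name": a module-level class round-trips to the very same class object, while a class created inside a function cannot be found by name and fails to pickle. The helper make_local_class is hypothetical, and the exception type raised for the local class has varied across Python versions, so the sketch catches both candidates.

```python
import pickle

class ForExample(object):
    def __init__(self, *stuff):
        self.stuff = stuff

def make_local_class():
    class Local(object):          # defined inside a function, not at module level
        pass
    return Local

# The pickle stores a reference (module name plus class name), not the class body.
data = pickle.dumps(ForExample)
same_class = pickle.loads(data) is ForExample
name_in_pickle = b'ForExample' in data

# The nested class has no importable name, so pickling it fails.
LocalClass = make_local_class()
try:
    pickle.dumps(LocalClass)
    local_ok = True
except (pickle.PicklingError, AttributeError):
    local_ok = False
```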

By default, an instance's state is the contents of the instance's __dict__, plus whatever state the instance may get from the built-in type the instance's class inherits from, if any. For example, an instance of a new-style class that subclasses list includes the list items as part of the instance's state. cPickle also handles instances of new-style classes that define or inherit a class attribute named __slots__ (and therefore hold some or all per-instance state in those predefined slots, rather than in a per-instance __dict__), provided you pickle them with protocol 2 or higher; with protocols 0 and 1, such a class cannot be pickled unless it defines __getstate__. Overall, cPickle's default approach is often quite sufficient and satisfactory.
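The following Python 3 sketch illustrates the default notion of state: a list subclass keeps both its items and its __dict__ attributes, and a __slots__ class round-trips without any custom methods when protocol 2 is requested explicitly. The class names TaggedList and Point are illustrative.

```python
import pickle

class TaggedList(list):
    """List items and the per-instance __dict__ together form the state."""
    def __init__(self, items=(), tag=None):
        list.__init__(self, items)
        self.tag = tag

class Point(object):
    """All per-instance state lives in slots; there is no __dict__."""
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Protocol 2 is the oldest protocol that pickles __slots__ classes
# without a custom __getstate__.
tl = pickle.loads(pickle.dumps(TaggedList([1, 2, 3], tag='abc'), protocol=2))
pt = pickle.loads(pickle.dumps(Point(3, 4), protocol=2))
```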

Sometimes, however, your instance's state (as cPickle defines it by default, as explained in the previous paragraph) includes nonpicklable attributes or items. In this recipe, for example, I show a class whose instances hold arbitrary stuff, which may include open file objects. To handle this case, your class can define the special method __getstate__. If your object's class defines or inherits that method, cPickle calls it instead of going directly for the object's __dict__ (or possibly __slots__ and/or built-in type bases).
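Open files are not the only unpicklable attributes; thread locks are another common case. This Python 3 sketch, with a hypothetical Counter class, uses __getstate__ to leave the lock out of the pickled state and __setstate__ to create a fresh lock when the instance is reloaded.

```python
import pickle
import threading

class Counter(object):
    """Holds an unpicklable lock alongside ordinary picklable data."""
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:
            self.count += 1

    def __getstate__(self):
        # Called by pickle instead of reading self.__dict__ directly:
        # return a copy of the attributes without the lock.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a new, unlocked lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

c = Counter()
c.increment()
c.increment()
clone = pickle.loads(pickle.dumps(c))
clone.increment()
```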

Normally, when you define the __getstate__ method, you define the __setstate__ method as well, as shown in this recipe's Solution. __getstate__ can return any picklable object; that object gets pickled and later, at unpickling time, passed as __setstate__'s argument. In this recipe's Solution, __getstate__ returns a list that's similar to the instance's default state (attribute self.stuff), except that each item is turned into a tuple of two items. The first item in the pair can be set to 0 to indicate that the second one will be taken verbatim, or 1 to indicate that the second item will be used to reconstruct an open file. (Of course, the reconstruction may fail or be unsatisfactory in several ways: for example, the file may have been moved or deleted in the meantime, and reopening it in a writing mode such as 'w' truncates it. There is no general way to save an open file's state, which is why cPickle itself doesn't even try. But in the context of our application, we can assume that the given approach will work.) When reloading the instance from pickled form, cPickle calls __setstate__ with the list of pairs, and __setstate__ can reconstruct self.stuff by processing each pair appropriately in its nested reconstruct function. This scheme clearly generalizes to saving and restoring state that may contain various kinds of normally unpicklable objects; just be sure to use a different number to tag each kind of "nonverbatim" pair you need to support.
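One way to organize that generalization, sketched here for Python 3, is a table mapping each tag number to the type it detects plus a save function and a load function. The names HANDLERS, Stuffed, save_file, and load_file, and the choice of tag 2 for thread locks, are all hypothetical; tag 0 stays reserved for verbatim items, as in the Solution.

```python
import io
import pickle
import threading

LockType = type(threading.Lock())

def save_file(f):
    return f.name, f.mode, f.tell()

def load_file(payload):
    name, mode, offs = payload
    f = open(name, mode)
    f.seek(offs)
    return f

# Hypothetical registry: tag -> (type to detect, saver, loader).
# Tag 0 is reserved for items stored verbatim.
HANDLERS = {
    1: (io.IOBase, save_file, load_file),
    # Locks: nothing worth saving; reload as a fresh, unlocked lock.
    2: (LockType, lambda lock: None, lambda payload: threading.Lock()),
}

class Stuffed(object):
    def __init__(self, *stuff):
        self.stuff = stuff

    def __getstate__(self):
        def normalize(x):
            for tag, (kind, save, _) in HANDLERS.items():
                if isinstance(x, kind):
                    return tag, save(x)
            return 0, x
        return [normalize(x) for x in self.stuff]

    def __setstate__(self, state):
        def reconstruct(pair):
            tag, payload = pair
            if tag == 0:
                return payload
            return HANDLERS[tag][2](payload)
        self.stuff = tuple(reconstruct(pair) for pair in state)

original = Stuffed('one', 2, threading.Lock())
clone = pickle.loads(pickle.dumps(original))
```

Supporting a new kind of object then means adding one table entry rather than editing both nested functions.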

In one particular case, you can define __getstate__ without defining __setstate__: __getstate__ must then return a dictionary, and reloading the instance from pickled form uses that dictionary just as the instance's __dict__ would normally be used. Not running your own code at reloading time is a serious hindrance, but it may come in handy when you want to use __getstate__ not to save otherwise unpicklable state, but rather as an optimization. Typically, this optimization opportunity occurs when your instance caches results that it can recompute if they're absent, and you decide it's best not to store the cache as a part of the instance's state. In this case, you should define __getstate__ to return a dictionary that's the indispensable subset of the instance's __dict__. (See Recipe 4.13 for a simple and handy way to "subset a dictionary".)
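A minimal Python 3 sketch of the cache case, with a hypothetical Squares class: __getstate__ returns only the indispensable attribute, there is no __setstate__, and the class is written so that a missing cache is simply recomputed on demand.

```python
import pickle

class Squares(object):
    def __init__(self, n):
        self.n = n
        self._cache = None

    def table(self):
        # Rebuild the cache if it is missing or empty.
        cache = getattr(self, '_cache', None)
        if cache is None:
            cache = self._cache = [i * i for i in range(self.n)]
        return cache

    def __getstate__(self):
        # Only the indispensable subset of __dict__; with no __setstate__,
        # unpickling uses this dict to update the new instance's __dict__.
        return {'n': self.n}

s = Squares(5)
s.table()                                  # fill the cache before pickling
clone = pickle.loads(pickle.dumps(s))
had_cache = hasattr(clone, '_cache')       # the cache was not stored
```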

Defining __getstate__ (and then, normally, also __setstate__) also gives you a further important bonus besides pickling support: if a class offers these methods but doesn't offer the special methods __copy__ or __deepcopy__, then the methods are also used for copying, both shallowly and deeply, as well as for serializing. The state data returned by __getstate__ is deep-copied if and only if the object is being deep-copied; other than this distinction, shallow and deep copies work very similarly when they are implemented through __getstate__. See Recipe 4.1 for more information about how a class can control the way its instances are copied, shallowly or deeply.
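The following Python 3 sketch shows the copy module using the same pair of methods. The Tracked class and its class-level log list are illustrative only: the log records each __setstate__ call, and the identity checks show that only copy.deepcopy deep-copies the state.

```python
import copy

class Tracked(object):
    log = []                      # records __setstate__ calls, for illustration

    def __init__(self, items):
        self.items = items

    def __getstate__(self):
        return {'items': self.items}

    def __setstate__(self, state):
        Tracked.log.append('setstate')
        self.__dict__.update(state)

original = Tracked([[1, 2], [3]])
shallow = copy.copy(original)     # state passed to __setstate__ as-is
deep = copy.deepcopy(original)    # state deep-copied before __setstate__
```

The shallow copy shares the original's items list, while the deep copy gets independent copies of the list and of the inner lists.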

With either the default pickling/unpickling approach, or your own __getstate__ and __setstate__, the instance's special method __init__ is not called when the instance is getting unpickled. If the most convenient way for you to reconstruct an instance is to call the __init__ method with appropriate parameters, then you may want to define the special method __getinitargs__ instead of __getstate__. In this case, cPickle calls this method without arguments: the method must return a picklable tuple, and at unpickling time, cPickle calls __init__ with that tuple's items as arguments. __getinitargs__, like __getstate__ and __setstate__, can also be used for copying. Be aware, however, that cPickle and copy honor __getinitargs__ only for old-style (classic) classes; new-style classes, such as those shown in this recipe, ignore it.
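Since __getinitargs__ is ignored by new-style classes (and by Python 3 altogether), this sketch swaps in a different technique to get __init__ called at unpickling time: a __reduce__ method that returns the class plus an argument tuple. It also confirms that, by default, unpickling bypasses __init__. The classes Plain and Rebuilt and their init_calls counters are illustrative.

```python
import pickle

class Plain(object):
    init_calls = 0
    def __init__(self, value):
        Plain.init_calls += 1
        self.value = value

class Rebuilt(object):
    init_calls = 0
    def __init__(self, value):
        Rebuilt.init_calls += 1
        self.value = value

    def __reduce__(self):
        # Ask pickle to rebuild by calling Rebuilt(value), which runs __init__.
        return (self.__class__, (self.value,))

p = pickle.loads(pickle.dumps(Plain(1)))     # __init__ runs only at creation
r = pickle.loads(pickle.dumps(Rebuilt(2)))   # __init__ runs again on loading
```

Plain.init_calls stays at 1 after the round trip, while Rebuilt.init_calls reaches 2.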

The Library Reference for the pickle and copy_reg modules details even subtler things you can do when pickling and unpickling, as well as the thorny security issues that are likely to arise if you ever stoop to unpickling data from untrusted sources. (Executive summary: don't do that; there is no way Python can protect you if you do.) However, the techniques I've discussed here should suffice in almost all practical cases, as long as the security aspects of unpickling are not a problem (and if they are, the only practical suggestion is: forget pickling!).
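As a taste of what copy_reg offers, this Python 3 sketch (where the module is named copyreg) registers a reduction function for a class whose source you supposedly cannot change, so pickle uses that function instead of the class's own default behavior. Opaque, reduce_opaque, and the calls list are hypothetical names for illustration.

```python
import copyreg                    # named copy_reg in Python 2
import pickle

class Opaque(object):
    """Stand-in for a class you cannot modify (hypothetical)."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

calls = []

def reduce_opaque(obj):
    calls.append('reduce_opaque')
    # Rebuild by calling Opaque(a, b) at unpickling time.
    return Opaque, (obj.a, obj.b)

copyreg.pickle(Opaque, reduce_opaque)       # consulted before __reduce_ex__
clone = pickle.loads(pickle.dumps(Opaque(1, 'x')))
```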

See Also

Recipe 7.2; documentation for the standard library module cPickle in the Library Reference and Python in a Nutshell.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420