Saturday, 17 October 2015

Python Weekly # 1 : Notes on developing an import hook

Notes on developing an import hook

Before we start I should give a warning, this article covers import hooks in python 2.7. The hook mechanism has changed from Python 3.5, although the mechanism in Python 2.7 is still supported.

I am a big fan of using JSON to allow configuration of my application, especially where the configuration is more towards the back end functionality. Many of of my applications have variations of either Class or object factories based on JSON data - using JSON dictionaries as templates for the formation of individual classes or instances, and I soon realised that a more "standard" way of approaching JSON would be ideal.

Using the Python import hook mechanism (See PEP 0302), I set out to develop a way of importing JSON file, and automatically generating classes etc from the data within the JSON - an implementation of data as code.

The result is the importjson library - (available as source on GitHub, and an installable package from PyPi - Python Package Index)

This article documents some of what learned during the development process.

Python exposes two types of import hooks - sys.path_hooks and sys.meta_path :
sys.path_hooks
These hooks are best used when there will be a specific entry on sys.path which only needs the special importer - for instance the zipimport module uses this mechanism.
sys.meta_path
These hooks are appropriate when a special sys.path entry will not exist - i.e. where the things to be imported could exist in anywhere.
It was clear that sys.meta_path is ideal for my purposes, since in my applications often co-locate JSON files with my python files - and I am sure this is the case with many other developers. Also using sys.meta_path mechanism would still work if the JSON files are kept separate from the python files.

The Importer Protocol is well documented in PEP 0302, so I will not cover the details here, but I will cover a few important notes for anyone trying something similar - things I tripped over.

Installation

Your importer not only needs to implement the Importer Protocol, but it will also need to install it self onto either sys.path_hooks, or sys.meta_path.

An entry in sys.meta_path should be an object instance which implements at least the find_module method from the Importer Protocol, while an entry in sys.path_hooks will need to be a callable which when passed a path (from sys.path), returns an object which implements at least the find_module method from the the Importer Protocol (the callable can clearly be a class which accepts the sys.path entry as an argument to the constructor).

Reuse and caching

Regardless of whether the Importer is installed on sys.path_hooks or sys.meta_path the instance will be reused, so care should be taken when deciding which data should be stored on a per instance basis. 

Due to the way that Python caches sys.path_hook entries (sys.path_importer_cache), once an Importer successfully finds one module within an sys.path entry the same Importer will be used to load all other modules in the same sys.path entry. This caching make a sys.path_hook type importer not suitable for situations where many types of importable file might exist within a given directory.

Translation of module names

The Importer Protocol consists of two methods which have a very simple signatures :
  • finder.find_module(fullname, path=None) 
  • finder.load_module(fullname )
The detail of what these methods need to do are documented in PEP 0302, so I wont cover it again here, but I will note :


Both methods are passed the fullname of the module - i.e. the fully dotted name of the module being loaded, so both methods need to be able to translate the fullname of the module into a file name which can be read and converted to a module, and it makes sense for that translation to be consistent (I cannot think of a situation where you want inconsistency here).

In my module I solved this by the find_module method storing the translation of fullname to file name into a class wide dictionary, and the load_module method uses the same dictionary to identify which file to open.

How you do the translation depends in part whether you have a sys.path_hook or sys.meta_path installed Importer :
  • With a sys.path_hook importer, the importer instance is initialized with an entry from sys.path (i.e. a file path), and the find_module method is then passed the full qualified name and the path of the most immediate parent module which applies (or None is the module being imported is a top-level module). Using either the path from the initializer, or the path given as the find_module argument, you can identify the path to the specific module.
  • With a sys.meta_path importer, the find_module method is passed a fully qualified module name, and the path the most immediate parent module (or None if the module being imported is a top-level module). The importer should use the the path to the parent if provided, but the importer will need to determine how and where to look for top level modules. My JSON importer does it's own traversal of sys.path to find where to search, but there are other options.

And finally :

During the development cycle beware when reloading your importer module, as this could well result in multiple entries in sys.path_hooks or sys.meta_paths, which could lead to inconsistent test results - take care to remove old entries before reloading.

If during development your importer does go completely wrong, and breaks your ability to import other modules, you have a few different options :
  • Remove the importer entry from sys.meta_path
  • Remove the importer entry from sys.path_hooks and from sys.path_importer_cache
  •  If all else fails, exit the interpreter and start again.
Writing your own import hook may seem scary at first, but in fact it is relatively simple - the key thing is coming up with the idea in the first place.

No comments:

Post a Comment