Notes on developing an import hook
Before we start I should give a warning, this article covers import hooks in python 2.7. The hook mechanism has changed from Python 3.5, although the mechanism in Python 2.7 is still supported.I am a big fan of using JSON to allow configuration of my application, especially where the configuration is more towards the back end functionality. Many of of my applications have variations of either Class or object factories based on JSON data - using JSON dictionaries as templates for the formation of individual classes or instances, and I soon realised that a more "standard" way of approaching JSON would be ideal.
Using the Python import hook mechanism (See PEP 0302), I set out to develop a way of importing JSON file, and automatically generating classes etc from the data within the JSON - an implementation of data as code.
The result is the importjson library - (available as source on GitHub, and an installable package from PyPi - Python Package Index)
This article documents some of what learned during the development process.
Python exposes two types of import hooks - sys.path_hooks and sys.meta_path :
- sys.path_hooks
- These hooks are best used when there will be a specific entry on sys.path which only needs the special importer - for instance the zipimport module uses this mechanism.
- sys.meta_path
- These hooks are appropriate when a special sys.path entry will not exist - i.e. where the things to be imported could exist in anywhere.
The Importer Protocol is well documented in PEP 0302, so I will not cover the details here, but I will cover a few important notes for anyone trying something similar - things I tripped over.
Installation
An entry in sys.meta_path should be an object instance which implements at least the find_module method from the Importer Protocol, while an entry in sys.path_hooks will need to be a callable which when passed a path (from sys.path), returns an object which implements at least the find_module method from the the Importer Protocol (the callable can clearly be a class which accepts the sys.path entry as an argument to the constructor).
Reuse and caching
Regardless of whether the Importer is installed on sys.path_hooks or sys.meta_path the instance will be reused, so care should be taken when deciding which data should be stored on a per instance basis.Due to the way that Python caches sys.path_hook entries (sys.path_importer_cache), once an Importer successfully finds one module within an sys.path entry the same Importer will be used to load all other modules in the same sys.path entry. This caching make a sys.path_hook type importer not suitable for situations where many types of importable file might exist within a given directory.
Translation of module names
The Importer Protocol consists of two methods which have a very simple signatures :- finder.find_module(fullname, path=None)
- finder.load_module(fullname )
Both methods are passed the fullname of the module - i.e. the fully dotted name of the module being loaded, so both methods need to be able to translate the fullname of the module into a file name which can be read and converted to a module, and it makes sense for that translation to be consistent (I cannot think of a situation where you want inconsistency here).
In my module I solved this by the find_module method storing the translation of fullname to file name into a class wide dictionary, and the load_module method uses the same dictionary to identify which file to open.
How you do the translation depends in part whether you have a sys.path_hook or sys.meta_path installed Importer :
- With a sys.path_hook importer, the importer instance is initialized with an entry from sys.path (i.e. a file path), and the find_module method is then passed the full qualified name and the path of the most immediate parent module which applies (or None is the module being imported is a top-level module). Using either the path from the initializer, or the path given as the find_module argument, you can identify the path to the specific module.
- With a sys.meta_path importer, the find_module method is passed a fully qualified module name, and the path the most immediate parent module (or None if the module being imported is a top-level module). The importer should use the the path to the parent if provided, but the importer will need to determine how and where to look for top level modules. My JSON importer does it's own traversal of sys.path to find where to search, but there are other options.
And finally :
During the development cycle beware when reloading your importer module, as this could well result in multiple entries in sys.path_hooks or sys.meta_paths, which could lead to inconsistent test results - take care to remove old entries before reloading.If during development your importer does go completely wrong, and breaks your ability to import other modules, you have a few different options :
- Remove the importer entry from sys.meta_path
- Remove the importer entry from sys.path_hooks and from sys.path_importer_cache
- If all else fails, exit the interpreter and start again.
No comments:
Post a Comment