In the last post, I introduced the concepts of the module object, module, and package, concrete objects that exist within the Python runtime, as well as some basic ideas about packaging, finding, and loading.
In this post, I’ll go over the process of finding, what it means to find something, and what happens next.
A Clarifying point
I’ve been very careful to talk about finding vs. loading vs. listing in this series of posts. There’s a reason for that: in Python 2, the terms "Finder" and "Importer" were used interchangeably, leading to (at least on my part) massive confusion. In actual fact, finders, hooks, loaders, and listers are all individual objects, each with a single, unique method with a specific signature. The method name is different for each stage, so it is theoretically possible to define a single class that does all three for a given category of module object, and only in that case, I believe, should we talk about an "Importer."
In Python 2.6 and 2.7, the definitive Finder class is called
pkgutil.ImpImporter, and the Loader is called
pkgutil.ImpLoader; this was a source of much of my confusion. In Python 3, the term "Importer" is deprecated and "Finder" is used throughout
importlib. I will be using "Finder" from now on.
import <fullname> command is called, a procedure is triggered. That procedure then:
- attempts to find a corresponding python module
- attempts to load that corresponding module into bytecode
- Associates the bytecode with the name via sys.modules[fullname]
- Exposes the bytecode to the calling scope.
- Optionally: writes the bytecode to the filesystem for future use
Finding is the act of identifying a resource that corresponds to the import string and that can be compiled into a meaningful Python module. The import string is typically called the fullname.
Finding typically involves scanning a collection of resources against a collection of finders. Finding ends when finder
A, given fullname
B, reports that a corresponding module can be found in resource
C, and that the resource can be loaded with loader
Finders come first, and MetaFinders come before all other kinds of finders.
Most finding is done in the context of
sys.path; that is, Python’s primary means of organizing Python modules is to have them somewhere on the local filesystem. This makes sense. Sometimes, however, you want to get in front of that scan and impose your own logic: you want the root of an import string to mean something else. Maybe instead of
directory.file, you want it to mean
table.row.cell, or you want it to mean
website.path.object, to take one terrifying example.
That’s what you do with a MetaFinder: A MetaFinder may choose to ignore the entire sys.path mechanism and do something that has nothing to do with the filesystem, or it may have its own filesystem notion completely separate from
sys.path, or it may have its own take on what to do with
A Finder (both MetaFinder and FileFinder) is any object with the following method:
[Loader|None] find_module([self|cls], fullname:string, path:[string|None])
The find_module method returns None if it cannot find a loader resource for the provided fullname. The path is optional; in the standard Python implementation, when the path is None it means “use `sys.path`”; when it’s set, it’s the path in which to look.
A MetaFinder is placed into the list
sys.meta_path by whatever code needs the MetaFinder, and it persists for the duration of the runtime, unless it is later removed or replaced. Being a list, the search is ordered; first match wins. MetaFinders may be instantiated in any way the developer desires before being added into
PathHooks and PathFinders
PathHooks are how
sys.path is scanned to determine the which Finder should be associated with a given directory path.
A PathHook is a function (or callable):
[Finder|None] <anonymous function>(path:string)
A PathHook takes a given directory path and, if the PathHook can identify a corresponding FileFinder for the modules in that directory path and return a constructed instance of that FileFinder, otherwise it returns None.
sys.meta_path finder returns a Loader, the full array of
sys.paths ⨯ sys.path_hooks is compared until a PathHook says it can handle the path and the corresponding finder says it can handle the fullname. If no match happens, Python’s default FileFinder class is instantiated with the path.
This means that for each path in
sys.paths, the list of
sys.path_hooks is scanned; the first function to return an importer is handed responsibility for that path; if no function returns, the default FileFinder is returned; the default FileFinder returns only the default SourceFileLoader which (if you read to the end of part one) blocks our path toward heterogeneous packages.
PathHooks are placed into the list
sys.meta_path, the list is ordered and first one wins.
There’s some confusion over the difference between the two objects, so let’s clarify one last time.
☞ Use a meta_finder (A Finder in
sys.meta_path) when you want to redefine the meaning of the import string so it can search alternative paths that may have no reference to a filesystem path found in
sys.path; an import string could be redefined as a location in an archive, an RDF triple of document/tag/content, or table/row_id/cell, or be interpreted as a URL to a remote resource.
☞ Use a path_hook (A function in
sys.path_hooks that returns a FileFinder) when you want to re-interpret the meaning of an import string that refers to a module object on or accessible by
sys.path; PathHooks are important when you want to add directories to sys.path that contain something other than
.so modules conforming to the Python ABI.
☝ A MetaFinder is typically constructed when it is added to
sys.meta_path; a PathHook instantiates a FileFinder when the PathHook function lays claim to the path. The developer instantiates a MetaFinder before adding it to
sys.meta_path; it’s the PathHook function that instantiates a FileFinder.
Note that PathHooks are for paths containing something other than the traditional (and hard-coded) source file extensions. The purpose of a heterogeneous source file finder and loader is to enable finding in directories within
sys.path that contain other source files syntaxes alongside those traditional sources. I need to eclipse (that is, get in front of) the default FileFinder with one that understands more suffixes than those listed in either
imp.get_suffixes() (Python 2) or
importlib._bootstrap.SOURCE_SUFFIXES (Python 3). I need one that will return the Python default loader if it encounters the Python default suffixes, but will invoke our own source file loader when encountering one of our suffixes.
We’ll talk about loading next.