21Aug

Menders, Makers, Mentors

Posted by Elf Sternberg as chat

Andrea Goulet is giving me an existential crisis. The CEO of a software development consultation shop, she recently wrote an article called Menders vs. Makers, and something happened this week that makes me think, maybe I’m in the wrong line of work. I’m starting to suspect I’m a mender in a business that only values makers.

This week, I was working on a code base that provided a hierarchical tag editor for an inventory system. I had recently added a new feature that made it possible to see individual elements of the tag system on the Collection page; you no longer had to visit a single object to see if it had, for example, a location tag; you could just say on the Collection page, “Show me all the objects that have a location tag, and add a new column, location.”

Now that we were able to see the tags, a new problem was found: it wasn’t possible to delete tags. Odd nobody had noticed that before. Since I was the last person in that code base, it was my duty to fix it. Down into the legacy code I went.

The tagging code was, well, intermingled. Validating the tags, determining the changes between the version on the client and the version on the server, writing those changes back, were all in a single gigantic Backbone sync method involving empty arrays, for loops, and concat methods. I spent about four hours, during which I:

  • Replaced all for loops with map / reduce / filter
  • Separated the model validation into its own method
  • Used Underscore’s intersection / union / difference functions to create instruction sets for deleting and adding to the tag system
  • Used Backbone’s set([_], (void 0), {unset: true}) method to delete the tags, rather than hammer the event bus with a series of change events in an each loop.

I struggled a lot to make sure I was using names that explained what each thing did.

In short, I did with my code what I did with my writing: try to make every line a pleasure to read, something that told a story about what was happening and what was going to happen next. I hope when someone sees overlappingTags = _.intersection(newTags, restrictedTagNames), it’s obvious what’s happening, and it should create anticipation that soon there will be a line that checks to see if overlappingTags has anything in it and, if it does, reports an error with the offending tags.
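The instruction-set idea can be sketched with plain set operations. This is a minimal Python sketch rather than the original Underscore/Backbone JavaScript; the function name `plan_tag_changes`, its parameters, and the sample tags are hypothetical, chosen only to mirror the `overlappingTags` line above.

```python
def plan_tag_changes(client_tags, server_tags, restricted_tag_names):
    """Return (to_add, to_delete), or raise if a restricted tag is used."""
    new_tags = set(client_tags)

    # Validation first: report any new tag that collides with a restricted name.
    overlapping_tags = new_tags & set(restricted_tag_names)
    if overlapping_tags:
        raise ValueError("restricted tags: " + ", ".join(sorted(overlapping_tags)))

    # The "instruction sets": what the server lacks, and what it should lose.
    current_tags = set(server_tags)
    to_add = new_tags - current_tags
    to_delete = current_tags - new_tags
    return to_add, to_delete


to_add, to_delete = plan_tag_changes(
    ["location", "color"], ["color", "size"], ["owner"]
)
```

Keeping validation, diffing, and writing as separate steps is what lets each name say what it holds.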

I’ve always had fun doing stuff like that, turning unreadable mash into clarity. Even my recent bragging project, Polyloader, is actually a fix for the “All Python on the filesystem ends in .py” bug that sorta firewalls Python syntax from the rest of the language universe.

I’ve found this industry doesn’t really like menders. Code editors, people who go in after the fact and apply measures both aesthetic and qualitative to the code they see, are often seen as nothing but agency overhead by managers.

On the other hand, I’ve yet to meet another developer who resented menders. They like menders; they want to learn from menders how to make code better. Menders tend to be older, tend to know more, tend to be broadly learned and strongly opinionated. Nothing “just gets thrown there.” It has to be fixed, it has to work, it has to be right. And I’ve yet to meet a software developer who didn’t want to get it right. Often, they just don’t know how, or nobody’s ever told them how.

Let’s show them how.

13Aug

Programmers need a class in aesthetics.

Posted by Elf Sternberg as chat


Sometimes it’s a little hilarious to read the back-and-forth of academics. My favorite is this exchange from Roman R. Redziejowski and Brian Ford over packrat parsing. Redziejowski writes

PEG is not good as a language specification tool. The most basic property of a specification is that one can clearly see what it specifies. And this is, unfortunately, not true for PEG.

To which Ford responds,

Such permissiveness can create unexpected syntactic subtleties, of course, and caution and good taste are in order: a powerful syntax description paradigm also means more rope for the careless language designer to hang himself with.

No points for complaining that Ford ends his sentence with a preposition.

This exchange highlights an issue in the programming language community that stands out for me. There’s a debate raging between two camps, with Google Go at one pole and Haskell at the other. Google Go is fundamentally an ugly language, one the designers admit up front is meant to make mediocre programmers productive, to constrain them from hurting themselves while making them capable of producing working code. And while it’s fine for that, consider the Microsoft “wizards” of the mid-1990s that pumped out huge blocks of C++ that nobody, not even the template designers, could understand; when it comes to Go, that’s where we’re headed. On the other hand, Haskell is fundamentally a beautiful language that’s really, really hard to understand; you have to immerse yourself in decisions where you, yourself describe the constraints with precision, with care, with taste.

Ira Glass has a speech, On Storytelling, in which he says, about being creative,

We get into it because we have good taste, but there’s like a gap.

The first couple years that you’re making stuff, what you’re making isn’t so good. It’s trying to be good, it has ambition to be good, but it’s not quite that good.

But your taste, the thing that got you into the game, your taste is still killer. And your taste is so good that you can tell that what you’re making is kind of a disappointment to you, you know what I mean?

The thing is, this is true of storytelling, of drawing, of any creative endeavor. A lot of programmers don’t get into programming because they view it as a creative endeavor. They view it as puzzle solving. They view it as engineering. They view it as a way to make money fast.

They have no taste.

Often, they don’t want to have taste. They want to get the job done and get paid. “Taste” slows them down and gets in the way. Aesthetic decisions about code layout and arrangement, they believe, are irrelevant to getting the job done.

This isn’t true, of course; tasteless Go is still as unmaintainable as tasteless C++. It’s possible to write aesthetically horrifying Haskell. Let’s not even talk about Perl.

I believe this is the fundamental dividing line between Go, C, and C++ on the one side, and Rust, Clojure, and Haskell on the other. The whole point of Go is to make programmers with no interest in taste or aesthetics write programs that work. Maintainability is secondary.

Which goes back to my tweet above. Java and Go programmers want to write the first kind. Haskell and Lisp programmers and their descendants love to write the second type. But my experience with reading and writing in a variety of languages convinces me we frequently end up at the third with no help for it.

The solution is to teach aesthetics. To teach people that readability and maintainability matter more than just getting the job done. That if it doesn’t make you feel good the day after you wrote it, re-write it.

After all, sometimes your code will live much longer than you expect.

This feels like something that deserves clarification. It’s not that I fear any and all of my projects becoming popular. I would love for some of them to become very popular. Polyloader would be awesome, as would Tumble. But I draw a distinction between tools, products, and examples. Catalogia is a product, and I don’t want to be tech support. There’s a huge difference between getting something right, and teaching the average user about the cupholder that came with his desktop machine. Tumble and Polyloader are tools: I want them to reach the widest possible audience and make that audience, my fellow developers, smarter and happier and more effective. The Backbone Store is an example, but examples are just examples. If they’re inadequate to the state of the art, it’s my duty to revise or remove them, or at least comment on their deprecation, but I’m not going to help individual users understand what’s going on.

11Aug

What comes next?

Posted by Elf Sternberg as chat, programming

After publishing The Semantics of Python Import and explicating on the history and internals of how Python turns source code into running operations, I thought I had a pretty clear idea of what to do next. I extended Hy such that it was now possible to write an entire Django application in a Lisp dialect, which was cool, and started on Catalogia, a program that would help me index, search, clean up and organize my music collection. The idea behind Catalogia was to demonstrate that writing an entire Django application in Hy was possible. I’ve done that, even going so far as to demonstrate that it’s possible to replace the boilerplate of generic views with a Lisp macro to generate the boilerplate automatically at compile-time.

The problem is that Catalogia isn’t done, but I’m already bored with it. This is a classic problem in side projects, I know, but I’m trying to figure out what to do with it.

What I’d really like to do, now that I’ve got a viable Lisp running on and auto-transpiling to Python, is write even more tools to extend Hy even further. I don’t like the existing suite of PEGs for Python; I’m completely spoiled by David Majda’s PEG.js, which marries PEG to Javascript with the absolute minimal amount of boilerplate possible; I’d like to port something that succinct to Python. Lexer/Parser technology is one of those spaces that’s assumed to be “solved,” but there are still places where it could be better, especially in the UX of the development process, and there’s also an entirely new Lexer/Parser theory called the Derivatives of Deterministic Finite Automata that has one implementation (in Racket, natch) that I’d like to see happening in a popular language like Python.

What I’d also like to do is strike out on my own and build on the experience Hy gave me to build a language research platform for Python. Something like GardenSnake, but complete, on top of which it would be easy, even trivial, to add new tiles that construct whole new operators in Python. I’d like to be able to pipe and compose Python instructions in a point-free syntax; how cool would that be without having to run through a transpiler? Just write in this “extended” Python, call Python, run Python, and have it work. (Psst: Polyloader is a key component of this idea.) Something that could be rolled back simply, providing plug-and-play additions to the Python grammar/compiler, just by adding a single call in your script to polyloader.install()?

Meanwhile, all the other desires are piling up. I want to run through this class, and this class. (I already have the textbooks.) I want to move this blog off effin’ WordPress onto something sane, and then Dockerize the sanity. I want to finish my basic editor for my stories, with all the front-end stuff that’s been missing for so long, and do a visual refresh, and all the other critical things that happen when the Web Guy’s Website Doesn’t Get Revamped.

I really should commit to Catalogia. I’m just afraid of it becoming popular.

Module Iterators, as defined in pkgutil.py, aren’t really part of the mess that has been imposed on us by PEP-302 and its follow-on attempts to rationalize the loading process, but they’re used by so many different libraries that when we talk about creating a new general class of importers, we have to talk about iterators.

Iterators, after all, are why I started down this project in the first place. It was Django’s inability to find heterogeneously defined modules that I set out to fix.

Iterators are defined in the pkgutil module; their entire purpose is, given some kind of reference to an archive, to be able to list the contents of that archive, and to recursively descend into that archive if it happens to be a tree-like structure.

When you call pkgutil.iter_modules(path, prefix), you get back a list of all the modules within that path or, if no path is supplied, within all the paths in sys.path. As I pointed out in my last post, the paths in sys.path aren’t necessarily paths on the filesystem or, if they are, they’re not necessarily directory paths. All that matters is that for each path, a path_hook exists that can return a Finder, and that Finder has a method for listing the contents of the path found.
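A small illustration of that call, as a sketch: the directory layout and the `demo.` prefix are made up for the example. In current Python 3 the call hands back an iterator of (finder, name, ispkg) entries rather than a materialized list, so the sketch collects it explicitly.

```python
import os
import pkgutil
import tempfile


def demo_iter_modules():
    """Build a throwaway directory and list what pkgutil finds inside it."""
    with tempfile.TemporaryDirectory() as root:
        # One plain module and one package (a directory holding __init__.py).
        open(os.path.join(root, "alpha.py"), "w").close()
        os.mkdir(os.path.join(root, "beta"))
        open(os.path.join(root, "beta", "__init__.py"), "w").close()

        # The path argument is a list of path entries; the prefix is
        # prepended to every name that comes back.
        return sorted(
            (name, ispkg)
            for _finder, name, ispkg in pkgutil.iter_modules([root], "demo.")
        )


listing = demo_iter_modules()
```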

In Python 2, pkgutil depends upon Finders (those things we said were attached to meta_path and path_hooks) to have a special function called iter_modules; if it does, that function is used to list the contents of the “path”.

In Python 3, the functools.singledispatch tool is used to differentiate between different Finders; once a Finder has been identified by path_hooks, singledispatch is used to find a corresponding resource iterator for that Finder. It doesn’t necessarily have to be a method on the Finder, although the default has a classmethod that is its finder.

An iterator is pretty straightforward; once you know the “path” (resource identifier) and the Finder for that path, you can call a function that checks for the presence of modules. In the case of FileFinder, that function is a combination of listdir and isfile, plus an isdir/isfile check for dir/__init__ pairs indicating a submodule.
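The listdir/isfile/isdir combination might look roughly like this. It is a hand-rolled sketch, not pkgutil's actual code; `list_modules` and its `suffixes` parameter are hypothetical names.

```python
import os
import tempfile


def list_modules(path, suffixes=(".py",)):
    """Yield (name, is_package) for modules found directly inside `path`."""
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # A directory only counts when it holds an __init__ file.
            has_init = any(
                os.path.isfile(os.path.join(full, "__init__" + suffix))
                for suffix in suffixes
            )
            if has_init:
                yield entry, True
        elif os.path.isfile(full):
            name, extension = os.path.splitext(entry)
            if extension in suffixes and name != "__init__":
                yield name, False


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "alpha.py"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()
    os.mkdir(os.path.join(root, "gamma"))  # no __init__, so not a package
    found = list(list_modules(root))
```

Passing a longer `suffixes` tuple is the whole trick for listing heterogeneous sources: nothing else in the walk cares what language the file holds.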

For our purposes, of course, we had to provide a path_hook that eclipses the existing path_hook, and we had to provide a Finder that was more precisely ours than the inherited base FileFinder, so that single dispatch would find ours before it found FileFinder’s and still work correctly.
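Why a more specific Finder class wins can be shown with a stand-in generic function. This is a sketch of the single-dispatch mechanism only: `list_contents` and `PolyFinder` are hypothetical, and the registered functions return labels instead of walking a directory.

```python
from functools import singledispatch
from importlib.machinery import SOURCE_SUFFIXES, FileFinder, SourceFileLoader


class PolyFinder(FileFinder):
    """Hypothetical subclass standing in for a finder with extra suffixes."""


@singledispatch
def list_contents(finder, prefix=""):
    # Fallback for finders nobody registered a lister for.
    return "no lister registered"


@list_contents.register(FileFinder)
def _(finder, prefix=""):
    return "FileFinder lister"


@list_contents.register(PolyFinder)
def _(finder, prefix=""):
    # Dispatch picks the most specific registered class, so this is
    # chosen over the FileFinder lister for PolyFinder instances.
    return "PolyFinder lister"


details = (SourceFileLoader, SOURCE_SUFFIXES)
plain = FileFinder(".", details)
ours = PolyFinder(".", details)
```

Here `list_contents(plain)` resolves to the FileFinder lister and `list_contents(ours)` to the PolyFinder one, even though `ours` is also a FileFinder.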


There is one other module I have to worry about: modulefinder. It’s not used often, it’s not used by Django or any of the other major tools that I usually use, and it’s never been covered by Python Module of the Week. That doesn’t mean that its hard-coding of the ‘.py’ suffix isn’t problematic. I’m just not sure what to do about it at this point.

It’s time to come around to a point that’s been bugging me for a long time: why is the Python import routine so, well, so darned convoluted? The answer is “history,” basically the history of Python and the attempt to turn import foo.bar.baz into a tool that’s incredibly easy to use and understand for the common programmer, yet flexible enough to give the advanced programmer the power to redefine it into whatever else it has to mean.

We’ve talked about how the system has two different loading systems: the sys.meta_path and the sys.path_hooks, and how the latter is just as arbitrary as the former: the last path_hook is for the filesystem, so it runs os.isdir() on every item in sys.path and only offers to handle the ones that return true, and it only runs after everything else has been run, so:

  • If a meta_path interpreted an import fullname with respect to a path that’s a directory, the default won’t get it,
  • If a path_hook said it could handle it, the default won’t get it,

… and so on. The whole point of using first-one-wins priority pathing is to leave the responsibility for not failing up to the developer. The default really is the fallback position, and it uses only a subset of sys.path. The formal type of a sys.path entry is… no type at all. It could be a string, a filesystem directory iterator, an object that interacts with a path_hook. It could be anything at all. The only consideration is that, if it can’t be coerced into a string that os.isdir() can reject, you had better handle it before it falls through to the default.

It’s really time to call it like it is: sys.path and sys.path_hooks are a special case for loading. They’re the original special case, but that’s what they are. They lead to weird results like one finder saying it can handle foo.bar.baz and another foo.bar.quux, turning the leading elements of the fullname into arbitrary and meaningless tokens.

I wish I could call for a more rational import system, one in which we talked only about resource managers which had the ability to access resource archives, iterate through the contents, identify corresponding resources, load the contents of that resource, and compilers that could identify the text that had just been accessed (via whatever metadata was available) and turn it into a Python module.

But we can’t. Python is too well-established to put up with such rationalizing shenanigans, and too many people are dependent upon the existing behavior to help make it so. Python was born when NFS was the thing, when there were no real open-source databases, no object stores. Python was released two years before the Mosaic web browser! It would be far too disruptive. So we’ll keep getting PEPs forever trying to rationalize the irrational.

That’s okay. It gives me something to get paid for.

But, it does point out one major flaw: because Finders and Loaders are so intimately linked, even if we manage to rationalize FileFinder and SourceFileLoader, that’s only with respect to the Filesystem. We’ll have to make equivalent loader/finders for any other sort of accessor, be it Zipfiles or any of the other wacky resource pools that people have come up with.

Unfortunately, I don’t have a good plan for those. Fortunately, filesystems are still the most common way of storing and loading libraries, so concentrating on those gets us 99% of the way there.

The Semantics of Python Import, Part 3: Loaders

In the last post we discussed Finders. The whole point of a Finder is to find a resource stored somewhere (usually a file on a filesystem, but it could be anything– a row in a database, a webpage, a range in a zip file) and supply the appropriate loader for it.

More accurately, there is a “FinderFinder” mechanism by which sys.meta_path and sys.path are searched to find the best Finder to run against a resource, and then the Finder is invoked to find the loader to load the resource. This lets Python differentiate between the archive (resource type– folder, database, zipfile, etc), the resource itself (file, row/column, zipfile index), and the type of that resource: source code (.py), compiled Python bytecode (.pyc or .pyo), or a compiled binary (.so or .dll) file that conforms to the Python ABI.

The point of the Loader is to take what the Finder has found and convert that resource into a stream of characters, which it then turns into Python executable code. Compared to the Finder, the Loader is pretty simple.

Typically, the Loader does whatever work is necessary to read in and convert (for example, to uncompress) the resource, compile it, attach the resulting compiled code as the executable to a new Module object, decorate the object with metadata, and then attach that new module object to the calling context, as well as caching a copy in sys.modules.

That’s more or less it.
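Those steps can be sketched by hand for the simplest case, a source file on disk. This is an illustration of the sequence rather than what importlib actually runs; `load_source_module` and the module name `demo_loader_sketch` are hypothetical.

```python
import os
import sys
import tempfile
import types


def load_source_module(fullname, filename):
    """Read, compile, and execute `filename` as a module named `fullname`."""
    with open(filename, "rb") as handle:
        source = handle.read()                 # read the resource
    code = compile(source, filename, "exec")   # compile it to a code object
    module = types.ModuleType(fullname)        # create the new module object
    module.__file__ = filename                 # decorate it with metadata
    sys.modules[fullname] = module             # cache before executing
    try:
        exec(code, module.__dict__)            # run the code in the module
    except BaseException:
        sys.modules.pop(fullname, None)        # do not cache a broken module
        raise
    return module                              # the caller binds the name


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "demo_mod.py")
    with open(path, "w") as handle:
        handle.write("answer = 6 * 7\n")
    demo = load_source_module("demo_loader_sketch", path)
```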

Python 3.4 introduces the idea of a ModuleSpec, which describes the relationship between a module and its loader, in much the same way that the ModuleType describes a relationship between a module and the modules that import it.
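A quick way to see what a ModuleSpec records, using the standard library's `json` package purely as a convenient subject:

```python
import importlib.util

# find_spec runs the finders without executing the module's code.
spec = importlib.util.find_spec("json")

summary = {
    "name": spec.name,                       # the fullname that was asked for
    "loader": type(spec.loader).__name__,    # which loader would handle it
    "origin": spec.origin,                   # where the finder located it
    "is_package": spec.submodule_search_locations is not None,
}
```

The loader and origin values depend on how Python was installed, so the sketch records them without assuming what they are.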

Unfortunately for my needs, ModuleSpec doesn’t address several critical issues that we care about for the Heterogeneous Python project. It doesn’t really address the disconnect between Finders, Loaders, and the navigation of archives; Finders and Loaders are still very much related to each other with respect to the way a resource is identified and incorporated into the Python running instance.

Typical import tutorials focus on one of two different issues: loading Python source out of alternative resource types (like databases or websites), or loading alternative source code that cannot ever be confused with or treated as Python source. An example of the latter would be to have a path hook early in sys.path_hooks that says, “That path there belongs to me, and it contains CSV files, and when you import from it, the end result is an array of processed CSV rows.” By putting it before all other path hooks, that prevents Python from Finding inside that path and rejecting its contents for not having any .py files.
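A minimal sketch of that CSV-style path hook, written against the modern `find_spec` protocol. Everything named here is hypothetical: `CLAIMED_DIRECTORIES`, `CsvFinder`, `CsvLoader`, and the file `csv_hook_demo_inventory.csv`. It also departs from the description in one respect: the rows are exposed as a `rows` attribute on a module rather than the import yielding a bare array.

```python
import csv
import importlib
import importlib.util
import os
import sys
import tempfile

CLAIMED_DIRECTORIES = set()  # hypothetical registry of directories we own


class CsvLoader:
    """Loads one CSV file and exposes its rows as `module.rows`."""

    def __init__(self, filename):
        self.filename = filename

    def create_module(self, spec):
        return None  # let Python create an ordinary module object

    def exec_module(self, module):
        with open(self.filename, newline="") as handle:
            module.rows = list(csv.reader(handle))


class CsvFinder:
    """Path entry finder for a single claimed directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, target=None):
        stem = fullname.rpartition(".")[2]
        filename = os.path.join(self.directory, stem + ".csv")
        if not os.path.isfile(filename):
            return None
        return importlib.util.spec_from_loader(
            fullname, CsvLoader(filename), origin=filename
        )


def csv_path_hook(path):
    """Claim only registered directories; decline everything else."""
    if path in CLAIMED_DIRECTORIES:
        return CsvFinder(path)
    raise ImportError("not a CSV directory")


root = tempfile.mkdtemp()
with open(os.path.join(root, "csv_hook_demo_inventory.csv"), "w", newline="") as handle:
    csv.writer(handle).writerows([["sku", "qty"], ["A1", "3"]])

CLAIMED_DIRECTORIES.add(root)
sys.path_hooks.insert(0, csv_path_hook)  # ahead of the default hooks
sys.path.append(root)
inventory = importlib.import_module("csv_hook_demo_inventory")
```

Because the hook owns the whole directory, a `.py` file dropped next to the CSV would be invisible to import, which is exactly the limitation the next paragraph is about.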

Our goals are different: A directory in sys.path should be able to hold mixed code: CSV files, Hy (lisp) files, regular Python files, and byte-compiled Python files, and the loader/finder pair should be able to understand and interpret all of them correctly.

To do that, the loader has to be able to find the right compiler at load time. But there’s a problem: Python 2 hard-codes what suffixes (filename extensions) it recognizes and compiles as Python in the imp builtin module; in Python 3 these suffixes are constants defined in a private section of importlib; in either case, they are unavailable for modification. This lack of access to the extensions list prevents the discovery of heterogeneous source code packages.
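In recent Python 3 releases the suffix lists can at least be inspected through `importlib.machinery`. The sketch below only reads them; it makes no claim that changing them afterwards would alter how already-installed finders behave.

```python
import importlib.machinery as machinery

# Copies of the suffix lists the default file-based import machinery uses.
suffixes = {
    "source": list(machinery.SOURCE_SUFFIXES),
    "bytecode": list(machinery.BYTECODE_SUFFIXES),
    "extension": list(machinery.EXTENSION_SUFFIXES),
}
```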

We have to get in front of Python’s native handlers and supply our own Finder, one that recognizes all our code-like suffixes and provides a source code loader that applies our compilers to our own suffixes and falls back on Python’s native loader behavior when we encounter native suffixes.
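One way to sketch the idea with stock importlib pieces is to build a `FileFinder` path hook that knows an extra suffix and place it ahead of the default one. This is not Polyloader's implementation; the `.kv` suffix, the `KeyValueLoader` class, and its toy "name: value" syntax are all invented for the illustration, and keys are assumed to be valid Python identifiers.

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import (
    BYTECODE_SUFFIXES, EXTENSION_SUFFIXES, SOURCE_SUFFIXES,
    ExtensionFileLoader, FileFinder, SourceFileLoader, SourcelessFileLoader,
)


class KeyValueLoader(SourceFileLoader):
    """Hypothetical loader: compiles 'name: value' lines into assignments."""

    def source_to_code(self, data, path, *args, **kwargs):
        lines = []
        for raw in data.decode("utf-8").splitlines():
            if raw.strip():
                key, _, value = raw.partition(":")
                lines.append("%s = %r" % (key.strip(), value.strip()))
        return compile("\n".join(lines) + "\n", path, "exec")


# Our suffix first, then the native loaders as the fallback.
hetero_hook = FileFinder.path_hook(
    (KeyValueLoader, [".kv"]),
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
)

root = tempfile.mkdtemp()
with open(os.path.join(root, "hetero_settings.kv"), "w") as handle:
    handle.write("colour: teal\nsize: large\n")
with open(os.path.join(root, "hetero_plain.py"), "w") as handle:
    handle.write("answer = 42\n")

sys.path_hooks.insert(0, hetero_hook)  # eclipse the default FileFinder hook
sys.path.append(root)                  # a path with no cached finder yet
settings = importlib.import_module("hetero_settings")
plain = importlib.import_module("hetero_plain")
```

Only directories that have no finder cached in `sys.path_importer_cache` pick up the new hook, which is why the sketch adds a fresh directory after installing it.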

I can now announce that Polyloader accomplishes this. After you import polyloader, you call polyloader.install(compiler, [extensions]) for files that compiler can handle, and it… works.

It works well with Hy. And it works performantly and without breakage on a modern Django application, allowing you to write Django models, views, urls, management commands, even manage.hy and settings.hy, in Hy.

There are three more posts in this series: Python Package Iterators, the resource-vs-compiler problem, and a really crazy idea that may break Python– or may finally get around all the other code that hard-codes “.py” problematically (I’m looking at you, django.db.migrations.loader, and you, modulefinder).

The Semantics of Python Import series has been important because the work I’m doing there supports work I’m doing elsewhere, namely modernizing the Hy import system to work with importlib, and then further shimming the import system to provide for heterogeneous source loaders.

My showcase for all this has been Catalogia, a music collection management program written in Hy and Django. Catalogia shows off all the inner workings of my Polyloader shim, and shows that even Django’s bizarre metaprogramming won’t break with Polyloader active. Download my version of Hy (that’s important, as I haven’t made a PR to the Hy people yet; I’m still testing Polyloader to make sure it’s not broken) into a (preferably Python 3) virtualenv, install Django, install Catalogia… and it might work. No promises.

But while I’m working on Python, I had another headache. One of Catalogia’s issues was that I wanted to find nested albums. My collection has over a thousand albums (I’m old, okay? When I was a teenager, collecting vinyl was the way to go. I still have my high school’s glee club albums on flippin’ vinyl!), and I’m sure somewhere along the line I messed up and mvd a file to the wrong place.

Catalogia’s primary key is the path to an MP3 file. The organizational scheme I’ve always used is “Artist – Album/Song.mp3”, so knowing if somehow one album folder wound up inside another was a good integrity check; that’s not supposed to happen.

PostgreSQL (which I’m using) doesn’t have POSIX basename and dirname functions. Why would it? But I really needed them, because I needed the dirname (folder path) to an MP3 file, so I know what folder an album represents, so I could check for nested albums.

So I wrote dirname and basename for PostgreSQL. The unit tests are taken from the Python unit tests for posixpath.py, and so ought to be competently congruent with Python, which is my whole point. They’re even fun to watch:

    name    | basetest | baseexpected | dirtest  | direxpected | base | dir  
------------+----------+--------------+----------+-------------+------+------
 /foo/bar   | bar      | bar          | /foo     | /foo        | PASS | PASS
 /          |          |              | /        | /           | PASS | PASS
 foo        | foo      | foo          |          |             | PASS | PASS
 ////foo    | foo      | foo          | ////     | ////        | PASS | PASS
 //foo//bar | bar      | bar          | //foo    | //foo       | PASS | PASS
 /foo/bar   | bar      | bar          | /foo     | /foo        | PASS | PASS
 /foo/bar/  |          |              | /foo/bar | /foo/bar    | PASS | PASS

Y’know, in case I’ve forgotten how to SQL.
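For comparison, the expected columns can be reproduced straight from Python's `posixpath`, which is what the SQL functions are meant to agree with. The names list simply repeats the distinct inputs from the table.

```python
import posixpath

names = ["/foo/bar", "/", "foo", "////foo", "//foo//bar", "/foo/bar/"]

# (name, basename, dirname) for each test input.
table = [
    (name, posixpath.basename(name), posixpath.dirname(name))
    for name in names
]
```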

But the best part came later, when I tried to test out whether or not it worked on my own database. The comparison was actually kinda… well…

    -- Self-join: every MP3 row is compared against every other MP3 row.
    SELECT DISTINCT dirname(a.path) AS parent,
                    dirname(b.path) AS child
    FROM catalog_mp3 AS a,
         catalog_mp3 AS b
    WHERE dirname(a.path) != dirname(b.path)
    AND   dirname(b.path) ~ ('^' || dirname(a.path));

On my laptop, that took 11 minutes and 52 seconds to run on a sample set of about 200 albums (folders). It was really disheartening. But then I realized that part of the reason it ran so slowly was because it was comparing title paths, not album paths, and it was performing the dirname comparison over and over.

The fix was obvious. Use a Common Table Expression to make a temporary table of the album paths, then run comparisons against the CTE:

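    -- Compute each distinct album folder once, then compare folders.
    WITH prepped_paths AS (
      SELECT DISTINCT dirname(path) AS dpath FROM catalog_mp3)
    SELECT a.dpath AS parent, b.dpath AS child
    FROM prepped_paths AS a, prepped_paths AS b
    WHERE a.dpath != b.dpath
    -- The child's folder path begins with the parent's.
    AND b.dpath ~ ('^' || a.dpath);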

That took 3 seconds, a speed-up factor of almost 240! Not too shabby at all.

But then I wondered… what if I preprocessed the comparison expression, and used LIKE instead of regexp?

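    -- Precompute the LIKE pattern alongside each distinct album folder.
    WITH prepped_paths AS (
      SELECT DISTINCT dirname(path) AS dpath,
                      (dirname(path) || '%') AS mpath
      FROM catalog_mp3)
    SELECT a.dpath AS parent, b.dpath AS child
    FROM prepped_paths AS a, prepped_paths AS b
    WHERE a.dpath != b.dpath
    -- The child's folder path matches the parent's prefix pattern.
    AND b.dpath LIKE a.mpath;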

88ms. A speed-up factor of 8000! Whee!
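The same dedupe-first idea, sketched in Python for anyone checking the logic outside the database. `nested_albums` and the sample paths are hypothetical, and the sketch adds one assumption the SQL does not make: it appends a "/" to the parent so that a sibling folder sharing a name prefix is not reported as nested.

```python
import posixpath


def nested_albums(mp3_paths):
    """Return (parent, child) pairs where one album folder contains another."""
    # Like the CTE: reduce thousands of song paths to distinct album folders.
    albums = sorted({posixpath.dirname(path) for path in mp3_paths})
    return [
        (parent, child)
        for parent in albums
        for child in albums
        if child != parent and child.startswith(parent + "/")
    ]


nested = nested_albums([
    "music/A - X/1.mp3",
    "music/A - X/2.mp3",
    "music/A - X/B - Y/1.mp3",
    "music/C - Z/1.mp3",
])
```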

This does leave me with a conundrum, however. I could lock Catalogia down into using only PostgreSQL, which wouldn’t leave me heartbroken as I’m a PostgreSQL snob. The alternative is to store album paths with the album title fields. Which I may end up doing anyway in order to support MySQL and SQLite… but with a real RDBMS it’s derivable information, so the PostgreSQL way makes me much happier.

In the last post, I introduced the concepts of the module object, module, and package, concrete objects that exist within the Python runtime, as well as some basic ideas about packaging, finding, and loading.

In this post, I’ll go over the process of finding, what it means to find something, and what happens next.

A Clarifying point

I’ve been very careful to talk about finding vs. loading vs. listing in this series of posts. There’s a reason for that: in Python 2, the terms “Finder” and “Importer” were used interchangeably, leading to (at least on my part) massive confusion. In actual fact, finders, hooks, loaders, and listers are all individual objects, each with a single, unique method with a specific signature. The method name is different for each stage, so it is theoretically possible to define a single class that does all three for a given category of module object, and only in that case, I believe, should we talk about an “Importer.”

In Python 2.6 and 2.7, the definitive Finder class is called pkgutil.ImpImporter, and the Loader is called pkgutil.ImpLoader; this was a source of much of my confusion. In Python 3, the term “Importer” is deprecated and “Finder” is used throughout importlib. I will be using “Finder” from now on.

Finding

When the import <fullname> command is called, a procedure is triggered. That procedure then:

  • attempts to find a corresponding python module
  • attempts to load that corresponding module into bytecode
  • Associates the bytecode with the name via sys.modules[fullname]
  • Exposes the bytecode to the calling scope.
  • Optionally: writes the bytecode to the filesystem for future use
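The steps above can be approximated with importlib's public helpers. This is a sketch for top-level modules only; `import_by_name` is a hypothetical name, and `colorsys` is used just because it is a small standard-library module.

```python
import importlib.util
import sys


def import_by_name(fullname):
    """Approximate `import fullname` for a top-level module."""
    if fullname in sys.modules:
        return sys.modules[fullname]               # already loaded

    spec = importlib.util.find_spec(fullname)      # finding
    if spec is None:
        raise ModuleNotFoundError("No module named %r" % fullname)

    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module                 # associate name and module
    try:
        spec.loader.exec_module(module)            # loading and executing
    except BaseException:
        del sys.modules[fullname]
        raise
    return module                                  # exposed to the caller


colorsys = import_by_name("colorsys")
```

Writing the bytecode cache is left to the loader; the sketch never touches it directly.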

Finding is the act of identifying a resource that corresponds to the import string and that can be compiled into a meaningful Python module. The import string is typically called the fullname.

Finding typically involves scanning a collection of resources against a collection of finders. Finding ends when finder A, given fullname B, reports that a corresponding module can be found in resource C, and that the resource can be loaded with loader D.

MetaFinders

Finders come first, and MetaFinders come before all other kinds of finders.

Most finding is done in the context of sys.path; that is, Python’s primary means of organizing Python modules is to have them somewhere on the local filesystem. This makes sense. Sometimes, however, you want to get in front of that scan and impose your own logic: you want the root of an import string to mean something else. Maybe instead of directory.file, you want it to mean table.row.cell, or you want it to mean website.path.object, to take one terrifying example.

That’s what you do with a MetaFinder: A MetaFinder may choose to ignore the entire sys.path mechanism and do something that has nothing to do with the filesystem, or it may have its own filesystem notion completely separate from sys.path, or it may have its own take on what to do with sys.path. That last is how zipimporter works; it looks for ZIP files on sys.path and attempts to interpret them as packages. (This doesn’t confuse the default finder as its default behavior is to ignore everything in sys.path that isn’t a directory. The formal type of sys.path is… debatable.)

A Finder (both MetaFinder and FileFinder) is any object with the following method:

[Loader|None] find_module([self|cls], fullname:string, path:[string|None])

The find_module method returns None if it cannot find a loader resource for the provided fullname. The path is optional; in the standard Python implementation, when the path is None it means “use `sys.path`”; when it’s set, it’s the path in which to look.

A MetaFinder is placed into the list sys.meta_path by whatever code needs the MetaFinder, and it persists for the duration of the runtime, unless it is later removed or replaced. Being a list, the search is ordered; first match wins. MetaFinders may be instantiated in any way the developer desires before being added into sys.meta_path.
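A minimal MetaFinder sketch, with one caveat: it uses `find_spec`, the protocol that replaced `find_module` from Python 3.4 onward, rather than the signature shown above. `DictFinder` and the module name `dict_finder_demo` are hypothetical; the class acts as both finder and loader, which makes it an "Importer" in the sense used earlier.

```python
import importlib
import importlib.util
import sys


class DictFinder:
    """Hypothetical meta path finder serving modules from an in-memory dict."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # decline; the next finder in the list gets a turn
        return importlib.util.spec_from_loader(fullname, self, origin="<dict>")

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


finder = DictFinder({"dict_finder_demo": "greeting = 'hello from a dict'\n"})
sys.meta_path.insert(0, finder)    # first match wins, so go to the front
demo = importlib.import_module("dict_finder_demo")
sys.meta_path.remove(finder)       # it persists only until removed
```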

PathHooks and PathFinders

PathHooks are how sys.path is scanned to determine which Finder should be associated with a given directory path.

A PathHook is a function (or callable):

[Finder|None] <anonymous function>(path:string)

A PathHook takes a given directory path and, if it can identify a corresponding FileFinder for the modules in that directory path, returns a constructed instance of that FileFinder; otherwise it returns None.

If no sys.meta_path finder returns a Loader, the full array of sys.path ⨯ sys.path_hooks is compared until a PathHook says it can handle the path and the corresponding finder says it can handle the fullname. If no match happens, Python’s default FileFinder class is instantiated with the path.

This means that for each path in sys.path, the list of sys.path_hooks is scanned; the first function to return an importer is handed responsibility for that path; if no function returns one, the default FileFinder is returned; the default FileFinder returns only the default SourceFileLoader which (if you read to the end of part one) blocks our path toward heterogeneous packages.

PathHooks are placed into the list sys.path_hooks; like sys.meta_path, the list is ordered and first one wins.
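The first-one-wins scan over hooks can be sketched in a few lines. `finder_for` is a hypothetical helper, not part of the standard library, and it follows the convention documented for current CPython, in which a hook declines a path by raising ImportError rather than by returning None.

```python
import sys
import tempfile


def finder_for(path, hooks=None):
    """Return the first finder a path hook offers for `path`, or None."""
    for hook in (sys.path_hooks if hooks is None else hooks):
        try:
            return hook(path)   # first hook to accept owns the path
        except ImportError:
            continue            # this hook declined; try the next one
    return None


with tempfile.TemporaryDirectory() as root:
    finder = finder_for(root)
    finder_name = type(finder).__name__
```

On a stock interpreter an ordinary directory usually ends up with the default file finder, but the sketch does not depend on that.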

The Takeaway

There’s some confusion over the difference between the two objects, so let’s clarify one last time.

Use a meta_finder (A Finder in sys.meta_path) when you want to redefine the meaning of the import string so it can search alternative paths that may have no reference to a filesystem path found in sys.path; an import string could be redefined as a location in an archive, an RDF triple of document/tag/content, or table/row_id/cell, or be interpreted as a URL to a remote resource.

Use a path_hook (A function in sys.path_hooks that returns a FileFinder) when you want to re-interpret the meaning of an import string that refers to a module object on or accessible by sys.path; PathHooks are important when you want to add directories to sys.path that contain something other than .py, .pyc/.pyo, and .so modules conforming to the Python ABI.

The two also differ in when they are instantiated. The developer instantiates a MetaFinder before adding it to sys.meta_path; a FileFinder is instantiated by the PathHook function at the moment it lays claim to a path, with the path passed as an argument to __init__.

Next

Note that PathHooks are for paths containing something other than the traditional (and hard-coded) source file extensions. The purpose of a heterogeneous source file finder and loader is to enable finding in directories within sys.path that contain other source file syntaxes alongside those traditional sources. I need to eclipse (that is, get in front of) the default FileFinder with one that understands more suffixes than those listed in either imp.get_suffixes() (Python 2) or importlib._bootstrap.SOURCE_SUFFIXES (Python 3). I need one that will return the Python default loader if it encounters the Python default suffixes, but will invoke our own source file loader when encountering one of our suffixes.
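To make the goal concrete, here is one possible sketch on Python 3, not necessarily the approach this series ends up with. It uses importlib.machinery.FileFinder.path_hook to build a hook that knows an extra suffix next to the standard ones. ToyLoader, the ".toy" suffix, and its "strip the word let" syntax are all invented for the demo.

```python
import importlib
import importlib.machinery as machinery
import os
import sys
import tempfile


class ToyLoader(machinery.SourceFileLoader):
    """Hypothetical loader: '.toy' files are Python with a decorative 'let '."""

    def source_to_code(self, data, path, *, _optimize=-1):
        source = data.decode("utf-8").replace("let ", "")
        return compile(source, path, "exec")


# Our suffix first, then the stock loaders for everything Python already handles.
hook = machinery.FileFinder.path_hook(
    (ToyLoader, [".toy"]),
    (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
    (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "plain_mod.py"), "w") as f:
        f.write("VALUE = 1\n")
    with open(os.path.join(root, "fancy_mod.toy"), "w") as f:
        f.write("let VALUE = 2\n")

    sys.path_hooks.insert(0, hook)    # get in front of the default hook
    sys.path_importer_cache.clear()   # forget finders cached by the old hooks
    sys.path.insert(0, root)
    try:
        plain_mod = importlib.import_module("plain_mod")
        fancy_mod = importlib.import_module("fancy_mod")
        values = (plain_mod.VALUE, fancy_mod.VALUE)
        loaders = (type(plain_mod.__loader__).__name__,
                   type(fancy_mod.__loader__).__name__)
    finally:
        sys.path.remove(root)
        sys.path_hooks.remove(hook)
        sys.path_importer_cache.clear()
```

Both files sit in the same directory, yet each is handed to a different loader, which is the heterogeneous behavior described above.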

We’ll talk about loading next.

A minor bug in the Hy programming language has led me down a rabbit hole of Python’s internals, and I seem to have absorbed an awful lot of Python’s import semantics. The main problem can best be described this way: In Python, you call the import function with a string; that string gets translated in some way into Python code. So: what are the exact semantics of the Python import command?

Over the next couple of posts, I’ll try to accurately describe what it means when you write:

import alpha.beta.gamma
from alpha import beta
from alpha.beta import gamma
from .delta import epsilon

In each case, Python is attempting to resolve the collection of dotted names into a module object.

module object: A resource that is or can be compiled into a meaningful Python module. This resource could be a file on a filesystem, a cell in a database, a remote web object, a stream of bytes in an object store, some content object in a compressed archive, or anything that can meaningfully be described as an array of bytes (Python 2) or characters (Python 3). It could even be dynamically generated!

module: The organizational unit of Python code. A namespace containing Python objects, including classes, functions, submodules, and immediately invoked code. Modules themselves may be collected into packages.

package: A Python module which contains submodules or even subpackages. The most common packaging scheme is a directory folder; in this case the folder is a module if it contains an __init__.py file, and it is a package if it contains other modules. The name of the package is the folder name; the name of a submodule would be foldername.submodule. This is called regular packaging. An alternative method (which I might cover later) is known as namespace packaging.
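The "dynamically generated" case from the module object definition is easy to demonstrate. This is a minimal sketch for Python 3; the module name generated_mod and its contents are hypothetical.

```python
import sys
import types

# The "resource" here is just a string built at runtime.
source = "def double(x):\n    return 2 * x\n"

module = types.ModuleType("generated_mod")
exec(compile(source, "<generated>", "exec"), module.__dict__)

# Registering it in sys.modules lets an ordinary import statement find it.
sys.modules["generated_mod"] = module

import generated_mod

result = generated_mod.double(21)
```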

Python has a baroque but generally flexible mechanism for defining how the dotted name is turned into a module object, which it calls module finding, and for how that module object is turned into a code object within the current Python session, called module loading.

Python also has a means for module listing. Listing is usually done on a list of paths, using an appropriate means for finding (identifying) the contents at the end of a path as Python modules.
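For a feel of what listing looks like in practice, here is a small sketch using the standard library's pkgutil.iter_modules on Python 3.6 or later. The json package is only a convenient example of a list of paths to scan.

```python
import json
import pkgutil

# iter_modules asks the finder for each path to enumerate what it can see there.
names = sorted(info.name for info in pkgutil.iter_modules(json.__path__))
has_decoder = "decoder" in names
```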

The technical definition of a package is a module with a __path__, a list of paths that contain submodules for the package. Subpackages get their own __path__. A package can therefore accommodate . and .. prefixes in submodules, indicating relative paths to sibling modules. A package can also add to its own __path__ collection to enable access to submodules elsewhere.
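A quick check of that definition on Python 3, again using the standard library's json package purely as an example: the package carries a __path__, its submodule does not, and a leading dot resolves relative to the package.

```python
import importlib
import json
import json.decoder

package_has_path = hasattr(json, "__path__")          # a package
module_has_path = hasattr(json.decoder, "__path__")   # a plain submodule

# ".decoder" is resolved relative to the named package.
sibling = importlib.import_module(".decoder", package="json")
same_module = sibling is json.decoder
```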

 


The problem I am trying to solve:

Python module listing depends upon a finder resolving a path to a container of modules, usually (but not necessarily) a package. The very last finder is the default one: after all alternatives provided by users have been exhausted, Python reverts to the default behavior of analyzing the filesystem, as one would expect. The default finder is hard-coded to use the Python builtin imp.get_suffixes() function, which in turn hard-codes the extensions recognized by the importer.
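On Python 3 the same fixed lists can be inspected through importlib.machinery, which is a reasonable way to see exactly what the default finder will recognize. This is a small sketch; the ".hy" check simply confirms that nothing registers that suffix out of the box.

```python
import importlib.machinery

source_suffixes = list(importlib.machinery.SOURCE_SUFFIXES)
every_suffix = importlib.machinery.all_suffixes()

knows_py = ".py" in source_suffixes
knows_hy = ".hy" in every_suffix
```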

If one wants to supply alternative syntaxes for Python and have heterogeneous packages (for example, packages that contain some modules ending in .hy, and others .py, side-by-side)… well, that’s just not possible.

Yet.

In the next post, I’ll discuss Python’s two different finder resolution mechanisms, the meta_path and the path_hook, and how they differ, and which one we’ll need to instantiate to solve the problem of heterogeneous Python syntaxes. The actual solution will eventually involve eclipsing Python’s default source file handler with one that enables us to define new source file extensions at run-time, recognize the source file extensions, and supply the appropriate compilers for those source files, while falling back on the default behavior correctly when the extension is one Python can handle by itself.

My hope is that, once solved, this will further enable the development of Python alternative syntaxes. Folks bemoan the explosion of JavaScript precompilers, but the truth is that it has in part led to a revival in industrial programming languages and a renaissance in programming language development in general. Python, with its AST available and exposed at runtime, is eminently suitable as an alternative language research platform, except for this one little problem.

I am trying to solve that problem.
