Module Iterators, as defined in pkgutil.py, aren’t really part of the mess that has been imposed on us by PEP-302 and its follow-on attempts to rationalize the loading process, but they’re used by so many different libraries that when we talk about creating a new general class of importers, we have to talk about iterators.

Iterators, after all, are why I started down this project in the first place. It was Django’s inability to find heterogeneously defined modules that I set out to fix.

Iterators are define in the pgkutil module; their entire purpose is, given some kind of reference to an archive, to be able to list the contents of that archive, and to recursively descend into that archive if it happens to be a tree-like structure.

When you call pkgutil.iter_modules(path, prefix), you get back a list of all the modules within that path or, if no path is supplied, all the paths in sys.path. As I pointed out in my last post, the paths is sys.path aren’t necessarily paths on the filesystem or, if they are, they’re not necessarily directory paths. All that matters is that for each path, a path_hook exists that can return a Finder, and that Finder has a method for listing the contents of the path found.

In Python 2, pkgutil depends upon Finders (those things we said were attached to meta_path and path_hooks) to have a special function called iter_modules; if it does, that function is used to list the contents of the “path”.

In Python 3, the functools.singledispatch tools is used to differentiate between different Finders; once a Finder has been identified by path_hooks, the singledispath us used to find a corresponding resource iterator for that Finder. It doesn’t necessarily have to be a method on the Finder, although the default has a classmethod that is its finder.

An iterator is pretty straightforward; once you know the “path” (resource identifier) and the Finder for that path, you can call a function that checks for the presence of modules. In the case of FileFinder, that function is a combination of listdir, isfile, and isdir/isfile to check fordir/__init__ pairs indicating a submodule.

For our purposes, of course, we had to provide a path_hook that eclipses the existing path_hook, and we had a provide a Finder that was more precisely ours than the inherited base FileFinder, so that single dispatch would find ours before it found FileFinder‘s and still work correctly.


There is one other module I have to worry about: modulefinder. It’s not used often, it’s not used by Django or any of the other major tools that I usually use, and it’s never been covered by Python Module of the Week. That doesn’t mean that it’s hard-coding of the ‘.py’ suffix isn’t problematic. I’m just not sure what to do about it at this point.

It’s time to come around to a point that’s been bugging me for a long time: why is the Python import routine so, well, so darned convoluted? The answer is “history,” basically the history of Python and the attempt to turn import foo.bar.baz into a tool that’s incredibly easy to use and understand for the common programmer, yet flexible enough to give the advanced programmer the power to redefine it into whatever else it has to mean.

We’ve talked about how the system has two different loading systems: the sys.meta_path and the sys.path_hooks, and how the latter is just as arbitrary as the former: the last path_hook is for the filesystem, so it runs os.isdir() on every item in sys.path and only offers to handle the ones that returns true, and it only runs after everything else has been run, so:

  • If a meta_path interpreted an import fullname with respect to a path that’s a directory, the default won’t get it,
  • If a path_hook said it could handle it, the default won’t get it,

… and so on.  The whole point of  using first-one-wins priority pathing is to leave the responsibility for not failing up to the developer. The default really is the fallback position, and it uses only a subset of sys.path.  The formal type of a sys.path entry is… no type at all. It could be a string, a filesystem directory iterator, an object that interacts with a path_hook. It could be anything at all. The only consideration is that, if it can’t be coerced into a string that os.isdir() can reject, you had better handle it before it falls through to the default.

It’s really time to call it like it is: sys.path and sys.path_hooks are a special case for loading. They’re the original special case, but that’s what they are. They lead to weird results like one finder saying it can handle foo.bar.baz and another foo.bar.quux, turning the leading elements of the fullname into arbitrary and meaningless tokens.

I wish I could call for a more rational import system, one in which we talked only about resource managers which had the ability to access resource archives, iterate through the contents, identify corresponding resources, load the contents of that resource, and compilers that could identify the text that had just been accessed (via whatever metadata was available) and turn it into a Python module.

But we can’t. Python is too well-established to put up with such rationalizing shenanigans, and too many people are dependent upon the existing behavior to help make it so. Python was born when NFS was the thing, when there were no real open-source databases, no object stores. Python was released two years before the Mosaic web browser! It would be far too disruptive. So we’ll keep getting PEPs forever trying to rationalize the irrational.

That’s okay. It gives me something to get paid for.

But, it does point out one major flaw: because Finders and Loaders are so intimately linked, even if we manage to rationalize FileFinder and SourceFileLoader, that’s only with respect to the Filesystem. We’ll have to make equivalent loader/finders for any other sort of accessor, be it Zipfiles or any of the other wacky resource pools that people have come up with.

Unfortunately, I don’t have a good plan for those. Fortunately, filesystems are still the most common way of storing and loading libraries, so concentrating on those gets us 99% of the way there.

The Semantics of Python Import, Part 3: Loaders

In the last post we discussed Finders. The whole point of a Finder is to find a resource stored somewhere (usually a file on a filesystem, but it could be anything– a row in a database, a webpage, a range in a zip file) and supply the appropriate loader for it.

More accurately, there is a “FinderFinder” mechanism by which sys.meta_path and sys.path are searched to find the best Finder to run against a resource, and then the Finder is invoked to find the loader to load the resource. This lets Python differentiates between the archive (resource type– folder, database, zipfile, etc), the resource itself (file, row/column, zipfile index), and the type of that resource: source code (.py), compiled Python bytecode (.pyc or .pyo), or a compiled binary (.so or .dll) file that conforms to the Python ABI.

The point of the Loader is to take what the Finder has found and convert that resource into a stream of characters, which it then turns into Python executable code. Compared to the Finder, the Loader is pretty simple.

Typically, the Loader does whatever work is necessary to read in and convert (for example, to uncompress) the resource, compile it, attach the resulting compiled code as the executable to a new Module object, decorate the object with metadata, and then attach that new module object to the calling context, as well as caching a copy in sys.modules.

That’s more or less it.

Python 3.4 introduces the idea of a ModuleSpec, which describes the relationship between a module and its loader, in much the same way that the ModuleType describes a relationship between a module and the modules that import it.

Unfortunately for my needs, ModuleSpec doesn’t address several critical issues that we care about for the Heterogeneous Python project. It doesn’t really address the disconnect between Finders, Loaders, and the navigation of archives; Finders and Loaders are still very much related to each other with respect to the way a resource is identified and incorporated into the Python running instance.

Typical import tutorials focus on one of two different issues: loading Python source out of alternative resource types (like databases or websites), or loading alternative source code that cannot ever be confused with or treated as Python source. An example of the latter would be to have a path hook early in sys.path_hooks that says, “That path there belongs to me, and it contains CSV files, and when you import from it, the end result is an array of processed CSV rows.” By putting it before all other path hooks, that prevents Python from Finding inside that path and rejecting its contents for not having any .py files.

Our goals are different: A directory in sys.paths should be able to have a mixed code: CSV files, Hy (lisp) files, regular Python files, and byte-compiled Python files, and the loader/finder pair should be able to understand and interpret all of them correctly.

To do that, the loader has to be able to find the right compiler at load time. But there’s a problem: Python 2 hard-codes what suffixes (filename extensions) it recognizes and compiles as Python in the imp builtin module; in Python 3 these suffixes are constants defined in a private section of importlib; in either case, they are unavailable for modification. This lack of access to the extensions list prevents the discovery of heterogenous source code packages.

We have to get in front of Python’s native handlers, supply our own Finder that recognizes all our code-like suffixes, provides a source code loader that provides our compilers for our own suffixes and falls back on Python’s native loader behavior when we encounter native suffixes.

I can now announce that Polyloader accomplishes this.  After you import polyloader, you call polyloader.install(compiler, [extensions]) for files that compiler can handle, and it… works.

It works well with Hy. And it works performantly and without breakage on a modern Django application, allowing you to write Django models, views, urls, management commands, even manage.hy and settings.hy, in Hy.

There are three more posts in this series: Python Package Iterators, the resource-vs-compiler problem, and a really crazy idea that may break Python– or may finally get around all the other code that hard-codes “.py” problematically (I’m looking at you, django.core.migrations.loader, and you, modulefinder).

The Semantics of Python Import series has been important because the work I’m doing there supports work I’m doing elsewhere, namely modernizing the Hy import system to work with importlib, and then further shimming the import system to provide for heterogeneous source loaders.

My showcase for all this has been Catalogia, a music collection management program written in Hy and Django. Catalogia shows off all the inner workings of my Polyloader shim, and shows that even Django’s bizarre metaprogramming won’t break with Polyloader active. Download my version of Hy (that’s important, as I haven’t made a PR to the Hy people yet; I’m still testing Polyloader to make sure it’s not broken) into a (preferably Python 3) virtualenv, install Django, install Catalogia… and it might work. No promises.

But while I’m working on Python, I had another headache. One of Catalogia’s issues was that I wanted to find nested albums. My collection has over a thousand albums (I’m old, okay? When I was a teenager, collecting vinyl was the way to go. I still have my high school’s glee club albums on flippin’ vinyl!), and I’m sure somewhere along the line I messed up and mvd a file to the wrong place.

Catalogia’s primary key is the path to an MP3 file. The organizational scheme I’ve always used is “Artist – Album/Song.mp3″, so knowing if somehow one album folder wound up inside another was a good integrity check; that’s not supposed to happen.

PostgreSQL (which I’m using) doesn’t have POSIX basename and dirname functions. Why would it? But I really needed them, because I needed the dirname (folder path) to an MP3 file, so I know what folder an album represents, so I could check for nested albums.

So I wrote dirname and basename for PostgreSQL. The unit tests are taken from the Python unit tests for posixpath.py, and so ought to be competently congruent with Python, which is my whole point. They’re even fun to watch:

    name    | basetest | baseexpected | dirtest  | direxpected | base | dir  
------------+----------+--------------+----------+-------------+------+------
 /foo/bar   | bar      | bar          | /foo     | /foo        | PASS | PASS
 /          |          |              | /        | /           | PASS | PASS
 foo        | foo      | foo          |          |             | PASS | PASS
 ////foo    | foo      | foo          | ////     | ////        | PASS | PASS
 //foo//bar | bar      | bar          | //foo    | //foo       | PASS | PASS
 /foo/bar   | bar      | bar          | /foo     | /foo        | PASS | PASS
 /foo/bar/  |          |              | /foo/bar | /foo/bar    | PASS | PASS

Y’know, in case I’ve forgotten how to SQL.

But the best part came later, when I tried to test out whether or not it worked on my own database. The comparison was actually kinda… well…

    SELECT DISTINCT dirname(a.path) AS parent,
                    dirname(b.path) AS child
    FROM catalog_mp3 as a,
         catalog_mp3 as b,
    WHERE dirname(a.path) != dirname(b.path)
    AND   dirname(b.path) ~ ('^' || dirname(a.path));

On my laptop, that took 11 minutes and 52 seconds to run on a sample set of about 200 albums (folders). It was really disheartening. But then I realized that part of the reason it ran so slowly was because it was comparing title paths, not album paths, and it was performing the dirname comparison over and over.

The fix was obvious. Use a Common Table Expression to make a temporary table of the album paths, then run comparisons against the CTE:

    WITH prepped_paths AS (
      SELECT DISTINCT dirname(path) AS dpath FROM catalog_mp3)
         SELECT a.dpath AS parent, b.dpath AS child
         FROM prepped_paths AS a, prepped_paths AS b
         WHERE a.dpath != b.dpath
         AND a.dpath ~ ('^' || b.dpath);

That took 3 seconds, a speed-up factor of almost 240! Not too shabby at all.

But then I wondered… what if I preprocessed the comparison expression, and used LIKE instead of regexp?

     WITH prepped_paths AS (
       SELECT DISTINCT dirname(path) AS dpath,
                       (dirname(path) || '%') AS mpath
       FROM catalog_mp3)
         SELECT a.dpath AS parent, b.dpath AS child
         FROM prepped_paths AS a, prepped_paths AS b
         WHERE a.dpath != b.dpath
         AND a.dpath LIKE b.mpath;

88ms. A speed-up factor of 8000! Whee!

This does leave me with a conundrum, however. I could lock Catalogia down into using only PostgreSQL, which wouldn’t leave me heartbroken as I’m a PostgreSQL snob. The alternative is to store album paths with the album title fields. Which I may end up doing anyway in order to support MySQL and SQLite… but with a real RDBMS it’s derivable information, so the PostgreSQL way makes me much happier.

In the last post, I introduced the concepts of the module object, module, and package, concrete objects that exist within the Python runtime, as well as some basic ideas about packaging, finding, and loading.

In this post, I’ll go over the process of finding, what it means to find something, and what happens next.

A Clarifying point

I’ve been very careful to talk about finding vs. loading vs. listing in this series of posts. There’s a reason for that: in Python 2, the terms “Finder” and “Importer” were used interchangeably, leading to (at least on my part) massive confusion. In actual fact, finders, hooks, loaders, and listers are all individual objects, each with a single, unique method with a specific signature. The method name is different for each stage, so it is theoretically possible to define a single class that does all three for a given category of module object, and only in that case, I believe, should we talk about an “Importer.”

In Python 2.6 and 2.7, the definitive Finder class is called pkgutil.ImpImporter, and the Loader is called pkgutil.ImpLoader; this was a source of much of my confusion. In Python 3, the term “Importer” is deprecated and “Finder” is used throughout importlib. I will be using “Finder” from now on.

Finding

When the import <fullname> command is called, a procedure is triggered. That procedure then:

  • attempts to find a corresponding python module
  • attempts to load that corresponding module into bytecode
  • Associates the bytecode with the name via sys.modules[fullname]
  • Exposes the bytecode to the calling scope.
  • Optionally: writes the bytecode to the filesystem for future use

Finding is the act of identifying a resource that corresponds to the import string and that can be compiled into a meaningful Python module. The import string is typically called the fullname.

Finding typically involves scanning a collection of resources against a collection of finders. Finding ends when finder A, given fullname B, reports that a corresponding module can be found in resource C, and that the resource can be loaded with loader D.”

MetaFinders

Finders come first, and MetaFinders come before all other kinds of finders.

Most finding is done in the context of sys.path; that is, Python’s primary means of organizing Python modules is to have them somewhere on the local filesystem. This makes sense. Sometimes, however, you want to get in front of that scan and impose your own logic: you want the root of an import string to mean something else. Maybe instead of directory.file, you want it to mean table.row.cell, or you want it to mean website.path.object, to take one terrifying example.

That’s what you do with a MetaFinder: A MetaFinder may choose to ignore the entire sys.path mechanism and do something that has nothing to do with the filesystem, or it may have its own filesystem notion completely separate from sys.path, or it may have its own take on what to do with sys.path. That last is how zipimporter works; it looks for ZIP files on sys.path and attempts to interpret them as packages. (This doesn’t confuse the default finder as its default behavior is to ignore everything in sys.path that isn’t a directory. The formal type of sys.path is… debatable.)

A Finder (both MetaFinder and FileFinder) is any object with the following method:

[Loader|None] find_module([self|cls], fullname:string, path:[string|None])

The find_module method returns None if it cannot find a loader resource for the provided fullname. The path is optional; in the standard Python implementation, when the path is None it means “use `sys.path`”; when it’s set, it’s the path in which to look.

A MetaFinder is placed into the list sys.meta_path by whatever code needs the MetaFinder, and it persists for the duration of the runtime, unless it is later removed or replaced. Being a list, the search is ordered; first match wins. MetaFinders may be instantiated in any way the developer desires before being added into sys.meta_path.

PathHooks and PathFinders

PathHooks are how sys.path is scanned to determine the which Finder should be associated with a given directory path.

A PathHook is a function (or callable):

[Finder|None] <anonymous function>(path:string)

A PathHook takes a given directory path and, if the PathHook can identify a corresponding FileFinder for the modules in that directory path and return a constructed instance of that FileFinder, otherwise it returns None.

If no sys.meta_path finder returns a Loader, the full array of sys.paths ⨯ sys.path_hooks is compared until a PathHook says it can handle the path and the corresponding finder says it can handle the fullname. If no match happens, Python’s default FileFinder class is instantiated with the path.

This means that for each path in sys.paths, the list of sys.path_hooks is scanned; the first function to return an importer is handed responsibility for that path; if no function returns, the default FileFinder is returned; the default FileFinder returns only the default SourceFileLoader which (if you read to the end of part one) blocks our path toward heterogeneous packages.

PathHooks are placed into the list sys.path_hooks; like sys.meta_path, the list is ordered and first one wins.

The Takeaway

There’s some confusion over the difference between the two objects, so let’s clarify one last time.

Use a meta_finder (A Finder in sys.meta_path) when you want to redefine the meaning of the import string so it can search alternative paths that may have no reference to a filesystem path found in sys.path; an import string could be redefined as a location in an archive, an RDF triple of document/tag/content, or table/row_id/cell, or be interpreted as a URL to a remote resource.

Use a path_hook (A function in sys.path_hooks that returns a FileFinder) when you want to re-interpret the meaning of an import string that refers to a module object on or accessible by sys.path; PathHooks are important when you want to add directories to sys.path that contain something other than .py, .pyc/.pyo, and .so modules conforming to the Python ABI.

A MetaFinder is typically constructed when it is added to sys.meta_path; a PathHook instantiates a FileFinder when the PathHook function lays claim to the path. The developer instantiates a MetaFinder before adding it to sys.meta_path; it’s the PathHook function that instantiates a FileFinder, passing it the path as an argument to __init__.

Next

Note that PathHooks are for paths containing something other than the traditional (and hard-coded) source file extensions. The purpose of a heterogeneous source file finder and loader is to enable finding in directories within sys.path that contain other source files syntaxes alongside those traditional sources. I need to eclipse (that is, get in front of) the default FileFinder with one that understands more suffixes than those listed in either imp.get_suffixes() (Python 2) or importlib._bootstrap.SOURCE_SUFFIXES (Python 3). I need one that will return the Python default loader if it encounters the Python default suffixes, but will invoke our own source file loader when encountering one of our suffixes.

We’ll talk about loading next.

A minor bug in the Hy programming language has led me down a rabbit hole of Python’s internals, and I seem to have absorbed an awful lot of Python’s import semantics. The main problem can best be described this way: In Python, you call the import function with a string; that string gets translated in some way into python code. So: what are the exact semantics of the python import command?

Over the next couple of posts, I’ll try to accurately describe what it means when you write:

import alpha.beta.gamma
from alpha import beta
from alpha.beta import gamma
from .delta import epsilon

In each case, python is attempting to resolve the collection of dotted names into a module object.

module object: A resource that is or can be compiled into a meaningful Python module. This resource could a file on a filesystem, a cell in a database, a remote web object, a stream of bytes in an object store, some content object in a compressed archive, or anything that can meaningfully be described as an array of bytes (Python 2) or characters (Python 3). It could even be dynamically generated!

module: The organizational unit of Python code. A namespace containing Python objects, including classes, functions, submodules, and immediately invoked code. Modules themselves may be collected into packages.

package: A python module which contains submodules or even subpackages. The most common packaging scheme is a directory folder; in this case the folder is a module if it contains an __init__.py file, and it is a package if it contains other modules. The name of the package is the folder name; the name of a submodule would be foldername.submodule. This is called regular packaging. An alternative method (which I might cover later) is known as namespace packaging.

Python has a baroque but generally flexible mechanism for defining how the dotted name is turned into a module object, which it calls module finding, and for how that module object is turned into a code object within the current Python session, called module loading.

Python also has a means for module listing. Listing is usually done on a list of paths, using an appropriate means for finding (identifying) the contents at the end of a path as Python modules.

The technical definition of a package is a module with a __path__, a list of paths that contain submodules for the package. Subpackages get their own__path__. A package can therefore accommodate . and .. prefixes in submodules, indicating relative paths to sibling modules. A package can also and to its own __path__ collection to enable access to submodules elsewhere.

 


The problem I am trying to solve:

Python module listing depends upon a finder resolving a path to a container of modules, usually (but not necessarily) a package. The very last finder is the default one: after all alternatives provided by users have been exhausted, Python reverts to the default behavior of analyzing the filesystem, as one would expect. The default finder is hard-coded to use the Python builtin imp.get_suffixes() function, which in turn hard-codes the extensions recognized by the importer.

If one wants to supply alternative syntaxes for Python and have heterogenous packages (for examples, packages that contain some modules ending in .hy, and others .py, side-by-side)… well, that’s just not possible.

Yet.

In the next post, I’ll discuss Python’s two different finder resolution mechanisms, the meta_path and the path_hook, and how they differ, and which one we’ll need to instantiate to solve the problem of heterogenous Python syntaxes. The actual solution will eventually involve eclipsing Python’s default source file handler with one that enables us to define new source file extensions at run-time, recognize the source file extensions, and supply the appropriate compilers for those source files, while falling back on the default behavior correctly when the extension is one Python can handle by itself.

My hope is that, once solved, this will further enable the development of Python alternative syntaxes. Folks bemoan the explosion of Javascript precompilers, but the truth is that it has in part led to a revival in industrial programming languages and a renaissance in programming language development in general.   Python, with its AST available and exposed at runtime, is eminently suitable as an alternative language research platform– except for this one little problem.

I am trying to solve that problem.

14Jun

The weirdest interview I ever had…

Posted by Elf Sternberg as chat

The weirdest interview I ever had was shortly after I got laid off from Isilon. One-fifth of Isilon had been laid off in one brutal morning, and I was among them. The economy was deep into the recession, and like everyone else who had been laid off recently I had put out my resume and my shingle, then settled into my basement to teach myself everything I hadn’t learned in the previous eight years.

A large hardware-based American computer company whose name I won’t repeat, but which was once headed by a former US presidential contender infamous for an ad featuring demon sheep, reached out through its recruitment efforts and asked if I would come in for an interview. They said my skills were a perfect fit for a new product they had in mind.

The interview was way the hell out in Bellevue, hours and miles from my house, and utterly inconvenient for me. When I got there, I learned that their “new product” was basically a direct competitor for Isilon’s; they were trying to get into the mass storage market, and were having a hard time of it.

I was scheduled to be there all day. But as the first hour wound down, I already knew the job wasn’t for me. I didn’t really have the skills they wanted; I wasn’t all that familiar with the C/C++ side of the business, after all. I knew more about the Isilon product from the configuration side than anyone outside of field support, but not much about the internals. More than that, the job didn’t sound interesting. Java and Swing weren’t technologies I was particularly interested in. The office environment seemed cold and unfriendly. The commute was murderous. And I’d just gotten cut from an infrastructure job; why would I want another one?

When the second interviewer came in, I said to him, “I’m sorry, but this job isn’t really for me. I think we should call it here.”

He looked at me, confused. “You mean you just want to… stop?”

I said, “I’m not going to take the job.”

“But…” He stopped, utterly still, for a few moments, then rose and left. A few minutes later, he came back and said, “The candidate isn’t allowed to end the interview process.”

I told him flatly, “I don’t think you can legally stop me from leaving the building.”

He stopped for a moment and looked very confused.  “No, that’s true,” he said. “Okay, I guess I’ll, um, escort you out.”

“Thank you.”

To this day, I have no idea what he meant by “The candidate isn’t allowed to end the interview process.”  I suspect that what he got from the higher ups he went to talk to was “There’s no process for the candidate ending the interview,” and went from there, but… weird.

28May

Making any language loadable in Python

Posted by Elf Sternberg as Lisp, python

I may have inadvertently contributed to the Coffeescriptification of Python. And in doing so, I have learned way too much about Python internals.

Coffeescript probably wasn’t the first programming language that transpiled to Javascript, but it was the first that caught on. Coffeescript and all its successors, like Clojurescript, Gorilla, and Earl Grey, exist because Javascript is a terrible language written on top of a truly universal runtime: the browser. Then some bright person took that runtime and put it on the server and called it Node, and then someone else took it and put it in the database and called it PLv8, and folks who’d been battle-hardened living in the browser could now be productive on the full stack. Transpilers exists because, rather than write in that terrible language, they allow people to write code in a language they like and it ends up all running in Javascript in the end.

Python doesn’t have a universe like this. Because Python is actually a fairly good dynamic language, and the syntax is pretty much the best of the pick of its competition from Perl, Ruby, Visual Basic, or obscurities like Pike or Io. But the Python runtime is not much loved; the global interpreter lock constrains Python’s performance in many ways.

There are a few, though, including:

  • Hy, a Lisp
  • Mochi, a Python derivative with Erlang actors
  • Dogelang, an ML-imitating… thing

With the coming Gilectomy, Python may finally move past some of its more egregious runtime limitations, and when it does, maybe alternative language developer will start to look at it more seriously, and develop languages that run on top of it the way Pure or Elm run on top of Javascript, yet manage to do so while honoring functional purity in a very real way.

But Python has one other limitation: the extension “.py” is hard coded into the import library that ships with python. Hy had a fairly good work-around that let it load its libraries and python side-by-side, and Doge seems to have a similar solution, but none of them were universal.

Then I learned that Hy doesn’t work with Django.

This all started out because I wanted to write a Django app. It had been awhile, and I had a specific desire to take what I’d learned writing mp_suggest and do something a little more complex with it, as a way of organizing the several thousands of CDs and their rips I have on my home NAS server.

I’m sure there are Django apps out there that already do this, but I wanted to write my own, and as you’ll see if you follow that like mp_suggest runs on Hy. Most of Django worked, but custom commands couldn’t be found by a Hy-based version of manage.hy, which annoyed me.

An itch that needed scratching! It took me two weeks; I probably put about 20 hours into untangling all of it. The real issue was that Hy was hooking into Python’s sys.meta_path, when what was needed was something even earlier in the import chain, a sys.path_hook. Path hooks are black magic. I’ll have a more technical entry up in a little bit.

In the meantime, here’s the patch. At first, it was just hooking up the path_hook, but as I was looking at the code, I realized that the hy-specific parts were actually fairly limited. It came down to two items: filename extensions, and a compiler. Everything else was just… boilerplate.

So I abstracted it out.

It’s now possible, by adding a few extra imports to your manage.py file, to build a Django app where:

  • Your views are written in Hy
  • Your models are written in Mochi
  • Your custom commands are written in Doge

And it all just works.

There have always been attitudes that I don’t understand. James Hague’s recent Death of a Language Dilettante is one of them. Now, most people know that I’m a language dilettante; that I love everything new and shiny coming out of the programming language space, even if it does seem that in some ways that space has been tapped out, with languages becoming ever more blunt on one side with Go and Python, and ever more esoteric on the other with Agda and Idris.

James issues a challenge: “Give a language with a poor reputation (JavaScript, Perl) to someone who knows it passably well and–this is the key–has a strong work ethic. Let the language dilettante use whatever he or she wants, something with the best type system, hygenic macros, you name it. Give them both a real-world task to accomplish. My money is on the first person by a wide margin.”

Well, sure. Tooling matters. Experience matters. In the ${DAY_JOB} I write a ton of Javascript, and I write it more than passably well. I also write a ton of Python, again more than passably well. Those are two languages at which I consider myself working at an expert level.

On the other hand, I also love esoteric languages and, more importantly, every time I’ve worked in one, it’s made me a better programmer in the languages I use professionally. Learning Haskell made me really appreciate your responsibility for separating TheWorld from the program, and taught me the power of higher-order functions; learning Lisp made me really appreciate the power of expressions qua expressions in a way Haskell tried hard to make so simple you didn’t notice them. When you work in a “real world” language, those matters are present in everyday details, and knowing how to work with them is a gift, not a disease.

Right now, my “esoteric” language is Hy. Hy is an implementation of Lisp that produces a Python AST and runs on the Python VM. Hy implements its Python importlib layer incorrectly, and in a way that means certain Django features are unavailable to Hy developers, and I intend to fix it.

Unfortunately, Python’s importlib is pretty esoteric in its own right, and implementing a new pather/finder/loader has been a beast. But I’m closing in on a solution, and I intend to have it working soon.

And then I can do what I set out to do, which is write a Django app. Only I can do it in Hy.

And along the way, maybe get a new Django command, newhyproject, out of it, as well as newhyapp and maybe newhymigrations. But those are stretch goals. Right now, all I want is for Hy to interoperate with Django correctly, and then I can think about future projects.

So I’ll take James’s challenge, but I’ll have to say: “Both is good.”

There’s a comment I get from my peers from time to time that boils down to: “Your code is tight and expressive, and the names are well-chosen, but it’s ugly and it makes reviewers uncomfortable.” And that’s when I realized that functional programming, by its very nature, causes us to write visually unattractive code.

The human eye likes rhythm. Aesthetically pleasing visual arrangements often involve creating a sense of flow, a sense of time, and a sense of transition from one scene to the next. Repetition is a key element of beauty. Artists and musicians come up with motifs and set up the expectation of repetition, and then maintain, modify, or diverge from the repetition into a new landscape where a new repetitious motif must be established and maintained in order to sustain the impression of beauty.

This is utterly in contrast to DRY: Don’t Repeat Yourself.

Well-constructed, highly functional source code written in a traditional enterprise language like Javascript, Python or C++ has very little repetition. It’s not supposed to. The comforting rhythm of a long if/else tree is broken out into a lookup table. The comforting rhythm of long transformations is itself transformed into a brutally compact map/reduce. As we try to make the syntax of the language itself more succinct (the Javascript keyword function has been brutally shortened to =>, for one very recent example), familiar “hooks” onto which the eye might rest and understand become harder to find.

It’s not all this bad. Well-written Haskell is actually very pretty to look at, but it’s also daunting to realize that every word on the page is dense with meaning.  It’s also significant that Haskell’s organization within a source file isn’t sequential; the compiler hooks up all the definitions in the correct order regardless of their location.  This gives the developer greater freedom to organize code into meaningful units, but it can also mean that understanding the file isn’t as simple as reading it in order from top to bottom.

That sense of intimidation is even greater when you apply it to more traditional languages that can support a functional style. A language like Javascript can support strong functional idioms, but in doing so, you end up writing the code equivalent of Shostakovitch and the “difficult listening hour.”

Calendar

July 2016
M T W T F S S
« Jun    
 123
45678910
11121314151617
18192021222324
25262728293031