The horrifying truth: most programs work by coincidence, not planning

Posted by Elf Sternberg as Uncategorized

In my ever-shrinking spare time, I write stories. When I’m writing a particularly long story, or dabbling in the 300-odd episodic space opera I’ve been working on for twenty-five years or so, I have to read and re-read the story to make sure that every plot thread of the story has been closed in a satisfying manner, every macguffin has been stowed away, and every character’s character has been fully revealed and shows consistency throughout. Even then, when I go back and re-read some of my work, I see lines that were supposed to lead somewhere, but didn’t, and sometimes a character says something about an event in the past, but the actual scene being referenced has been cut out for whatever reason.

Even in the best books, the main plot can be resolved while a sub-plot is never completely hooked up. The most famous of these is in Raymond Chandler’s The Big Sleep: we still have no idea who killed the chauffeur, Owen Taylor.

Human beings can forgive an omission like that if something else about the book is good. For Chandler, it’s all about style, and the style he invented, private detective noir, is breathtaking in its originality.

Computers, on the other hand, are spare and unforgiving. If a user forgets to hook something up, a crash is inevitable. It’s only a matter of when and where.

I recently praised MyPy, a new type hint system for Python that, I claimed, eliminates an entire class of errors while writing Python.

I want to emphasize how important I think MyPy is, because here’s the horrible truth: computer programmers have no idea what they’re doing.

Imagine, if you will, a newly installed private inventory management system (PIMS) for a large company. Different offices will send you spreadsheets of data: rows of inventory, and column headers to describe what the rows mean. Meaning is the most important thing here; computers are just glorified calculators, and it’s human beings who apply meaning to what they’re doing.

Without the column headers, though, a row of spreadsheet data is meaningless. It’s just numbers and names. A human might be able to guess that the column with entries like “New York” and “Boston” is cities, but what do all those numbers mean? And without specialized software, the computer doesn’t care about cities at all; they’re just strings of letters.

The PIMS system lets you upload the spreadsheet, and then you, the human, apply meaning. “That’s a computer.” “That’s a piece of software.” “That’s a chair.” Programmers tend to abstract things to their basics. “Computers and chairs need to be shipped; software can just be sent via email; real estate can’t be transferred at all.” That sort of thing. Inventory may have location, and ownership: “That is Bob’s chair in the Chicago office.”

So here’s the horrifying truth: most programs, internally, don’t apply any meaning to what it is they’re handling. This means the programmer didn’t apply meaning. The programmer had meaning in his head, but was so busy getting from input to output, applying that meaning and its rules along the way while trying to hit a deadline, that encoding the meaning in the code got lost. A spreadsheet starts out as rows of hand-labeled columns; it ends up in the database as entries in various hand-labeled tables of rows and columns. In the middle, it lives as blobs.

Blobs. That’s it. A Python list or a Javascript array is arbitrary; it can contain anything: numbers, strings, other lists, a mix of all of the above. The same is true of the dictionary. They’re just “objects”; they have no meaning. The programmer writes functions that handle ListsOfShippableThings, which in turn call functions that handle OneShippableThing, which in turn call a TransportCompany, and so on. But what he passes them is a List. He could accidentally call ShipShippableThings with a List of RealEstateThings. That might be found in testing; it might be found when someone tries to ship something; it might actually work, in that an InventoryItem is marked to be shipped even though it’s a square city block in downtown Manhattan!
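To make the blob problem concrete, here is a minimal Python sketch. The row contents, the `status` key, and the snake_case `ship_shippable_things` function are all invented for illustration; they stand in for whatever a real inventory system would use.

```python
# Hypothetical inventory rows, as they might look after a spreadsheet upload.
inventory = [
    {"name": "Bob's chair", "location": "Chicago"},
    {"name": "City block", "location": "Manhattan"},  # real estate!
]


def ship_shippable_things(items, destination):
    """Mark every item for shipping. Nothing here says what an 'item' is."""
    for item in items:
        item["status"] = f"shipping to {destination}"
    return items


shipped = ship_shippable_things(inventory, "Boston")
```

Nothing fails here. The city block is quietly marked for shipping, because to the function a dictionary is just a dictionary.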

Almost all web software is written this way. We call it “Duck Typing”: if it walks like a duck and quacks like a duck, it’s a duck. If a RealEstateThing is an inventory item with a location, the ShipShippableThings function might say, “It’s an inventory item at a place, yep, we can ship that.”

The beauty of MyPy and Typescript is that, with good taste in naming things, and proper training, you can’t write software where you try to ship a RealEstateThing; long before the code runs, your editor or repository checker will say, “You have code here where you’re passing a RealEstateThing to a TransportCompany through ShipShippableThings. That doesn’t make sense.”
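Here is a sketch of what that looks like in Python with type hints. The class names `Shippable` and `RealEstate` and their fields are assumptions made up for this example, and the call that a checker such as MyPy should reject is left commented out so the file still runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shippable:
    name: str
    location: str
    status: str = "in stock"


@dataclass
class RealEstate:
    name: str
    location: str


def ship_shippable_things(items: List[Shippable], destination: str) -> List[Shippable]:
    """Mark each shippable item for shipping to the destination."""
    for item in items:
        item.status = f"shipping to {destination}"
    return items


chairs = [Shippable("Bob's chair", "Chicago")]
ship_shippable_things(chairs, "Boston")

block = RealEstate("City block", "Manhattan")
# A static checker is expected to flag the next line, because a list of
# RealEstate is not a list of Shippable:
# ship_shippable_things([block], "Boston")
```

Python itself does not enforce the hints when the program runs; the protection comes from running the checker before the code ever ships.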

And it doesn’t. But when you’re a programmer keeping track of hundreds of things, it’s easy to forget. You might just run everything through the “CheckIfNeedsShipping” code, never realizing that your list contains things that can’t be shipped.
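One way to keep that mistake from slipping through is to declare the mixed list honestly and let the types say which part of it can be shipped. This is only a sketch; `Chair`, `CityBlock`, and `items_to_ship` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Chair:
    name: str
    location: str


@dataclass
class CityBlock:
    name: str
    location: str


# The inventory is declared as a mix of both kinds of thing.
InventoryItem = Union[Chair, CityBlock]


def items_to_ship(inventory: List[InventoryItem]) -> List[Chair]:
    """Keep only the items that can physically be shipped."""
    return [item for item in inventory if isinstance(item, Chair)]


inventory: List[InventoryItem] = [
    Chair("Bob's chair", "Chicago"),
    CityBlock("Downtown block", "Manhattan"),
]
to_ship = items_to_ship(inventory)
```

The return type promises a list of `Chair`, so any later code that receives `to_ship` never has to wonder whether a city block is hiding in it.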

There are all sorts of examples. Grocery stores have things that can’t be eaten and don’t spoil; pharmacies are full of things that can’t be injected. Constraint is one of the most powerful ideas in computer science, and when we wrote software that freed us from the constraints of having to tell the computer how much memory we needed for an object, we also lost the constraints we had on having to describe that object clearly.

I was very disappointed when I read Eric Elliott’s You might not need Typescript (or Static Types), because he claims that Typescript, and its constraint checking, slowed him down and didn’t reduce his bug count. He talks about how typing is “distracting”; his developers would rather just use blobs of text than know exactly what they’re passing around. He says unit tests are a great way to know if his code is working, but that’s true only if he tests the right things.

I’ve been writing in Javascript and Python for twenty years now. (Not kidding about that, either. Seriously. In 1996 I’d been working in Perl for four years professionally already.) Nothing, and I mean nothing, is more exciting, more useful, and more indicative that the software industry is finally growing up than the popularity of static typing for these languages.

Elliott describes duck typing as “checking that looks at the structure of a value rather than its name or class.” In older, rigid languages like C and C++, a similar thing is called Structural Typing; it doesn’t matter what I, the programmer, claim that thing is; all that matters is that it has the right layout in memory. If coincidentally the “PizzaDeliveryGuy” and “LaunchNukesOrders” have similar layouts in memory (maybe a launch code is the same number of bytes as a phone number!), well…
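Python rarely exposes memory layout, but the standard `struct` module can imitate that coincidence. In this sketch the two record layouts, the field sizes, and the sample values are all invented; the point is only that identical layouts make the bytes interchangeable.

```python
import struct

# Two hypothetical record layouts that happen to be identical:
# a 16-byte name followed by a 10-byte string of digits.
PIZZA_DELIVERY_GUY = struct.Struct("16s10s")  # name, phone number
LAUNCH_NUKES_ORDER = struct.Struct("16s10s")  # officer, launch code

# Pack a pizza contact into raw bytes...
raw = PIZZA_DELIVERY_GUY.pack(b"Luigi", b"2065550100")

# ...and read the very same bytes back as a launch order.
officer, launch_code = LAUNCH_NUKES_ORDER.unpack(raw)
officer = officer.rstrip(b"\x00")  # struct pads short strings with zero bytes
```

The bytes carry no record of which layout produced them, so the phone number comes back as a perfectly plausible launch code.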

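What MyPy does with classes is called Nominal Typing: a value is what its declared name says it is. (Typescript compares shapes by default, but well-named types with a distinguishing field get you much the same protection.) Computer functions are about intent. “I intend to order a pizza.” We teach how to name functions because we want to clearly communicate intent. The same thing should be true of the data we work on: it should describe what it encapsulates.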
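In Python, one lightweight way to get name-based checking is `typing.NewType`. The sketch below assumes two made-up types, `PhoneNumber` and `LaunchCode`, that are both plain strings at runtime but that a checker such as MyPy treats as distinct.

```python
from typing import NewType

# Both are ordinary strings at runtime; only the checker tells them apart.
PhoneNumber = NewType("PhoneNumber", str)
LaunchCode = NewType("LaunchCode", str)


def order_pizza(phone: PhoneNumber) -> str:
    """Intent is in the name: this function only wants phone numbers."""
    return f"Calling {phone} to order a pizza"


message = order_pizza(PhoneNumber("2065550100"))

code = LaunchCode("0000000000")
# A static checker is expected to reject the next line, even though the
# value is an ordinary ten-character string when the program runs:
# order_pizza(code)
```

The two values have exactly the same shape; the only thing separating them is the name the programmer gave each one, which is the whole point.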

I have no idea how big Elliott’s programs are, or how much time he spends trying to figure out why something crashed. Nominal typing has reduced the amount of time I spend on it by half. To me, that’s a strategic benefit no amount of “go fast and break things” will ever match.
