If you write Python, you should use MyPy.

Posted by Elf Sternberg as Uncategorized

I discovered MyPy at work, and it works so amazingly well with Python 2.7 that I’m here to say: if you haven’t added it to your workflow, you need to, immediately.

The last project I worked on at my day job, an automated inventory and services manager, was 2200 lines in a single class. When I was done, it was 1800 lines and 145 functions split over ten files, with separate classes for services and inventory, intake and output, local and network transformation, authorization, and utility. Its primary purpose was to take in CSV (comma-separated values, the text format for spreadsheets) and output JSON. We wanted to add other input formats, we wanted the output to handle different formats as well, and we wanted to move the network transformation phase to an asynchronous handler with recovery-on-failure.

Doing that took a lot of heavy lifting: figuring out what functions were for, what the dataflow looked like, what each step of the transformation required, and how to handle conflicts with objects already in the inventory. Like most shops, we have rules about how to use docstrings that look a lot like jDoc, but by complete luck I found PEP-484 and MyPy, the program that actually does Python type-hint checking, and immediately started using it aggressively.

The internals of the original, a one-off written for a single customer, had very strong assumptions about CSV inbound and JSON outbound. Functions were literally named with csv_ and json_ prefixes, even though internally everything was just Python dictionaries and lists!

I named things by their purpose: classes called InventoryItem and TrackingService. I named collections of things. I named containers. And I added MyPy type hints to everything, like this (the code is from a personal project, mp_suggest, an early MP3 inventory manager I wrote a while ago):

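from typing import Dict, Text  # names used by the type comment below

def make_album_deriver(opts, found, likely):
    # type: (Dict[Text, Text], Text, Text) -> Text
    # An explicit album option always wins.
    if u'album' in opts:
        return opts[u'album']

    # With 'usedir' set, use the directory-derived name as-is.
    if u'usedir' in opts:
        return found

    # Otherwise prefer the likely name, falling back to the found one.
    # (ascii_or_nothing and sfix are helpers defined elsewhere in mp_suggest.)
    return (ascii_or_nothing(likely) or sfix(found))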

That little one-line comment there tells MyPy exactly what anyone calling this function must send it, and what they get back. When you run MyPy over the code, before the program itself ever runs, it will track back and make sure that the variables passed by the caller are actually created as a dictionary, a string, and a string! It will track forward and make sure that the receiver always uses the result as a string! It will ensure that the dictionary is created with strings as keys and values: no lists, no sets, no nested dictionaries are legal values. And as long as ascii_or_nothing and sfix carry type hints of their own, it will make sure that they take strings and return strings.
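To make those checks concrete, here is a minimal sketch using the same comment-style hints. The function pick_album and the sample values are hypothetical, invented for illustration; they are not part of mp_suggest. The calls that MyPy would be expected to reject are left as comments so the file still runs.

```python
from typing import Dict, Text


def pick_album(opts, found):
    # type: (Dict[Text, Text], Text) -> Text
    """Return the album named in opts, falling back to the found name."""
    if u'album' in opts:
        return opts[u'album']
    return found


# Matches the hint: a dict mapping text to text, then a text value.
album = pick_album({u'album': u'Kind of Blue'}, u'Miles Davis')

# MyPy should flag each of these when it checks the file; plain Python
# would only fail on them, if at all, at run time:
#   pick_album({u'album': [u'a', u'b']}, u'x')   # dict value is a list
#   pick_album([u'album'], u'x')                 # first argument is a list
#   count = pick_album({}, u'x') + 1             # result used as a number
```

Uncommenting any of the three calls and running MyPy over the file is a quick way to see what the reports look like.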

Programmers in dynamic, duck-typed languages like Python (and JavaScript, Perl, Ruby, even Lisp) tend to assume we know what we’re doing. Given a problem, we come up with a solution, and then we start coding toward the solution, as if assembling a jigsaw puzzle along the way. On a very big project, we can miss things. Often, what we miss is error handling and corner cases. We can write a function and twenty minutes later call it, all the while assuming we correctly remember what the calling protocol was. Often, we are wrong, and learning we’re wrong is sometimes a painstaking effort in type management. I believe half my debugging time is spent being a human typechecker: “Wait, it’s supposed to be passing back class X, but I’m getting a list. Why?”
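The "class X, but I’m getting a list" case is easy to reproduce. The sketch below is hypothetical (Track and find_track are made-up names, not code from the project described here) and shows where a declared return type lets MyPy do that bookkeeping instead of a human.

```python
from typing import List, Text


class Track(object):
    def __init__(self, title):
        # type: (Text) -> None
        self.title = title


def find_track(tracks, title):
    # type: (List[Track], Text) -> Track
    matches = [t for t in tracks if t.title == title]
    # Writing `return matches` here would hand the caller a list; MyPy
    # should report that as an incompatible return value, because the
    # hint above promises a single Track.
    return matches[0]


library = [Track(u'So What'), Track(u'Blue in Green')]
found = find_track(library, u'So What')
print(found.title)
```

The hint only covers the type: an empty `matches` still raises IndexError at run time, which is the kind of corner case a type checker does not see.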

PEP-484 takes nearly all of that off your hands, and that makes PEP-484 the biggest leap forward in high-quality Python development since PEP-8. By catching the single most common class of runtime errors in Python before the code ever runs, I believe you can double your productivity. If you are not using MyPy, you are at a tactical disadvantage, and you will remain so until you adopt it.
