Python isn’t English and iterator “labels”

We Python fanboys like to think of Python as similar to English and thus more readable. Let’s examine a simple piece of code:

for item in big_list:
    if item.cost > 5:
        continue
    item.purchase()

For our discussion there are only 3 kinds of people:

  1. People who have never seen a line of code in their life.
  2. People who have programmed in other languages but have never seen Python.
  3. Python programmers.

We’ll switch between the first two groups and look at how each would parse the code above. Let’s try to forget what we know about Python or programming and read it as English:
  • “for item in big_list” – either we’re talking about doing something for a specific item in a big_list or we’re talking about every single item. Ambiguous but the first option doesn’t really make sense so that’s fine.
  • “if item.cost > 5” – non-programmers are going to say the period is in a strange place, but programmers will know exactly what’s up.
  • “continue” – “That’s fine, keep going” is how an English speaker will read it, and that is completely the wrong idea. In plain English “continue” means “carry on with what you were doing”, which is what Pythonistas call “pass” and assembly programmers call “nop”. In code it means the opposite: skip the rest of the loop body and move on to the next item. As programmers we’ve grown used to this convention, but we really should have called it “skip” or something.
  • “item.purchase()” – non-programmers are going to ask about the period and the parentheses, but everyone else will grok it easily.
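
To make the “continue” versus “pass” distinction concrete, here is a minimal sketch. The `Item` class and the two function names are made up for illustration and are not part of the original snippet; the sketch also records names in a list instead of calling `item.purchase()`.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for the items in big_list.
    name: str
    cost: float


big_list = [Item("pen", 2), Item("laptop", 900), Item("mug", 4)]


def purchase_with_continue(items):
    purchased = []
    for item in items:
        if item.cost > 5:
            continue  # skip the rest of the body, move on to the next item
        purchased.append(item.name)
    return purchased


def purchase_with_pass(items):
    purchased = []
    for item in items:
        if item.cost > 5:
            pass  # do nothing, then carry on with the rest of the body
        purchased.append(item.name)
    return purchased
```

With `continue` the laptop is skipped; with `pass` it is bought like everything else, which is what the plain-English reading of “continue” would suggest.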

So I’m pretty sure this isn’t English. But it’s fairly readable for a programmer. I believe programmers of any of the top 8 languages on the TIOBE index can understand simple Python. I definitely can’t say the same for Lisp and Haskell. Not that there’s anything wrong with Lisp or Haskell; these languages have specialized syntax for their own honorable reasons.

Continue is a silly word, what about iterator labels?

Let’s say I want to break out of an outer loop from inside a nested loop, e.g.:

for item in big_list:
    for review in item.reviews:
        if review < 3.0:
            # next item or next review?
            continue
        if review > 9.0:
            # stop reading reviews or stop looking for items?
            break
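
For reference, in today’s Python both statements always bind to the innermost enclosing loop. A small sketch with made-up data (the `reviews_by_item` dictionary and `seen` list are assumptions for illustration) shows what that means in practice:

```python
# Hypothetical data: item name -> review scores.
reviews_by_item = {
    "kettle": [2.0, 5.0, 9.5, 7.0],
    "toaster": [6.0, 8.0],
}

seen = []
for name, reviews in reviews_by_item.items():
    for review in reviews:
        if review < 3.0:
            continue  # next review, not next item
        if review > 9.0:
            break  # stops reading this item's reviews only
        seen.append((name, review))

# The break on 9.5 ended the kettle's reviews, but the outer loop
# still went on to the toaster.
print(seen)
```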

Java supports specific breaks and continues by adding labels to the for loops but I think we can do better. How about this:

items_gen = (i for i in big_list)
for item in items_gen:
    for review in item.reviews:
        if review < 3.0:
            items_gen.continue()
        if review > 9.0:
            items_gen.break()

But how can that even be possible, you may ask? Well, today it isn’t, but maybe one day, if python-ideas likes this idea, we can have nice things. Here’s how I thought it could work: a for-loop over a generator is conceptually equivalent to this:

while True:
    try:
        item = next(gen)
        # do stuff with item
    except StopIteration:
        break

But if it worked the way I propose below, we could support specific breaks and continues:

while True:
    try:
        item = next(gen)
        # do stuff with item
    except gen.ContinueIteration:
        pass
    except gen.StopIteration:
        break
    except StopIteration:
        break

So every generator could have methods that raise its own exceptions, and we could write specific breaks and continues. Or, if you prefer, a different spelling could be “break from mygen” or “continue from mygen”, since continue and break are keywords and can’t normally be used as method names.
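
The idea can be roughly emulated in current Python without new syntax. What follows is only a sketch under my own assumptions, not a real language feature: `LabeledLoop`, `run`, `continue_` and `break_` are hypothetical names, and the loop body has to be passed in as a function because an ordinary for statement will not catch these exceptions for us.

```python
class LabeledLoop:
    """Wrap an iterable and give it its own break/continue exceptions."""

    def __init__(self, iterable):
        self._iterable = iterable
        # Fresh exception classes per instance, so nested labeled loops
        # never catch each other's signals.
        self.Continue = type("Continue", (Exception,), {})
        self.Break = type("Break", (Exception,), {})

    def __iter__(self):
        return iter(self._iterable)

    def continue_(self):
        raise self.Continue()

    def break_(self):
        raise self.Break()


def run(loop, body):
    """Hand-written version of the proposed for-loop expansion."""
    iterator = iter(loop)
    while True:
        # next() sits outside the body's try block so that a stray
        # StopIteration raised by the body is not mistaken for exhaustion.
        try:
            item = next(iterator)
        except StopIteration:
            break
        try:
            body(item)
        except loop.Continue:
            pass
        except loop.Break:
            break


# Made-up data: (item name, review scores).
items = LabeledLoop([
    ("kettle", [5.0, 2.0, 7.0]),
    ("toaster", [6.0, 9.5, 4.0]),
    ("mixer", [8.0]),
])
log = []


def handle_item(item):
    name, reviews = item
    for review in reviews:
        if review < 3.0:
            items.continue_()  # give up on this item, go to the next one
        if review > 9.0:
            items.break_()  # stop looking at items altogether
        log.append((name, review))


run(items, handle_item)
```

Here the 2.0 review abandons the kettle, and the 9.5 review ends the whole search before the mixer is ever looked at. Using exceptions for control flow like this is arguably harder to read than the nested loop it replaces, which may be part of why the idea never made it into the language.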

I think this could be nice. Then again, many of the times I found myself using nested loops, I actually preferred to break the monster into 2 functions with one loop each. That way I could use the return value to do whatever I needed in the outer loop (break, continue, etc.). So perhaps it’s a good thing the language doesn’t help me build monstrosities and forces me to flatten my code. I wonder.
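
As an illustration of that two-function approach, here is one possible shape. The names `verdict` and `shortlist`, the string return values and the sample data are all assumptions of mine; the thresholds are the ones from the nested-loop example.

```python
def verdict(reviews):
    """Inner loop: decide what the outer loop should do with this item."""
    for review in reviews:
        if review < 3.0:
            return "skip"  # outer loop should continue
        if review > 9.0:
            return "stop"  # outer loop should break
    return "ok"


def shortlist(items):
    """Outer loop: act on the inner function's return value."""
    chosen = []
    for name, reviews in items:
        result = verdict(reviews)
        if result == "skip":
            continue
        if result == "stop":
            break
        chosen.append(name)
    return chosen


# Made-up data: (item name, review scores).
sample_items = [
    ("kettle", [5.0, 2.0]),
    ("lamp", [6.0, 7.0]),
    ("toaster", [9.5]),
    ("mixer", [8.0]),
]
result = shortlist(sample_items)
```

Each function now has a single loop, so every `continue`, `break` and `return` is unambiguous.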


10 thoughts on “Python isn’t English and iterator “labels”

  1. Regarding the labeled iterator idea:
    First of all, the implementation is flawed. Suppose I choose to re-implement item.reviews to be a @property that returns a generator? Now items_gen.continue(), which raises an exception, will cause the item.reviews iterator to continue instead of items_gen. So at a bare minimum you’re going to have to create singleton exception types that are unique to each instance of a generator in order to make this approach work (there are better ways to implement labels at the language level, but given these singletons your implementation is more or less complete).

    There’s a more fundamental reason that this concept has been _explicitly left out_ of Python despite a vocal group of supporters. In fact, it was proposed as a language feature in PEP 3136 (http://www.python.org/dev/peps/pep-3136/); your proposal is pretty much exactly Proposal E of this PEP. This PEP was rejected by Guido (the Benevolent Dictator for Life of the Python language) in this message on the development list: http://mail.python.org/pipermail/python-3000/2007-July/008663.html

    The general consensus of the core of the python community is that there are absolutely no situations where this type of structure is necessary, and that in the vast majority of cases which might tempt its use, refactoring or reorienting the code to use other language structures would make it more readable.

    • Yes, I did mean that every generator needs its own exception to be caught, and that’s why it was “except gen.ContinueIteration” where “gen” was the generator instance. Also, I do agree with Guido’s opinion on the subject. Thanks for this excellent, citation-filled reply.

  2. Pingback: Yuval Greenfield: Python isn’t English and... | Python | Syngu

  3. Python already has plenty of tools to handle this use case. No need to create new ones. In your example, you could do:

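    from itertools import takewhile
    # stop at the first review above 9.0, then skip reviews below 3.0
    less_than_9 = takewhile(lambda x: x < 9.0, item.reviews)
    between_9_and_3 = (x for x in less_than_9 if x > 3.0)
    for review in between_9_and_3:
        print(review)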

    This has many advantages over the ordinary for loop with breaks and continues:

    - you can treat it like Unix shell pipes: move lines around, or add new ones, to plug in and change the filters
    - you don’t have to follow the if/else logic to understand what your data is; at any point you know what the stream of data is: less_than_9, or between_9_and_3, or something else.
    - you only deal with iterables, which means you have the whole Python toolset for iterables at hand: uniqueness, sorting, enumerating, etc. are one line of code away.
    - the main for loop is now cleared of filtering logic, so you can concentrate on what it does without wondering what it does it to, and when.

    Among the cons:
    - it’s not beginner-friendly anymore, which slows down code ownership by newcomers to the project
    - it’s harder to debug in ipdb

