Posts Tagged ‘Python’


Why Python’s whitespace rule is right

In Language design,Python on October 18, 2011 by Matt Giuca Tagged: , , ,

Python is famous among programming languages for its fairly unique syntax: rather than being delimited by curly braces or “begin/end” keywords, blocks are delimited by indentation. Indenting a line is like adding an opening curly brace, and de-denting is like a closing curly brace. When people criticise Python, it is usually the first complaint: “why would I want to use a language which requires me to indent code?” Indeed, while programmers are very used to indenting their code, they are very un-used to being forced to do so, and I can understand why they may take it as an insult that a language tells them how to write code. I don’t usually like to get into syntax arguments, because I find them very superficial — it is much more important to discuss the semantics of a language than its syntax. But this is such a common argument among Python detractors, I wanted to address it. Python is right, and it’s just about the only language that is.

I think the rub is that programmers like to think of languages as a tool, and tools should be as flexible as possible. I think in general it is a good principle for programming languages not to enforce conventions. Languages that do tend to annoy people who don’t subscribe to the same conventions. For example, the Go programming language enforces the “One True Brace Style” — every opening curly brace must appear on the same line as the function header or control statement. This irritates me because that’s not my preferred convention. But the indentation convention is so universal that it is considered bad programming practice to not indent in all cases. (There is disagreement over tabs vs spaces, the number of spaces, etc, but we all agree that indentation is good.) There is not a single situation in any country, in any programming language, or at any skill level, in which is it acceptable to not indent your code the way Python requires it. Therefore, it is technically redundant to have a language that is not whitespace-sensitive. Any language that is not whitespace-sensitive requires (by universal convention) that programmers communicate the scoping of the code in two distinct manners for every single line of code: braces (or begin/end) and indentation. You are required to make sure that these two things match up, and if you don’t, then you have a program that doesn’t work the way it looks like it works, and the compiler isn’t going to tell you.

There are two solutions to this problem. 1: Make the compiler tell you. Force the programmer to indent and put in curly braces, and have the compiler check the indentation and give either a warning or error if they don’t match up. Now you’ve solved the problem of accidentally getting it wrong, but now what is the point of requiring curly braces at all? The programmer would just be doing extra work to please the compiler. We may as well go with 2: take out the curly braces and just have the compiler determine the blocks based on indentation.

When you really analyse it, Python’s whitespace sensitivity is actually the only logical choice for a programming language, because you only communicate your intent one way, and that intent is read the same way by humans and computers. The only reason to use a whitespace-insensitive language is that that’s the way we’ve always done things, and that’s never a good reason. That is why my programming language, Mars, has the same indentation rule as Python.

* * *

An interesting aside: there is a related syntax rule in Python which doesn’t seem quite so logical: you are required to place a colon at the end of any line preceding an indent. I haven’t fully tested this, but I’m pretty sure there is no technical reason for that (the parser could still work unambiguously without that colon), and it doesn’t seem to add much to the readability either. I slavishly followed this rule in Mars too, because as a Python programmer it “feels right” to me. But perhaps it would be better to drop it.


A Ruby porting challenge

In Language design on January 27, 2011 by Matt Giuca Tagged: ,

Having an in-agreement discussion with Tim about Ruby. I’ll write a full post later, but for now (to make sure I don’t make an ass of myself later, if anyone has a better answer), who can port this small Python function to Ruby?

def allprefixes(x):
    prefixes = []
    for i in range(len(x)+1):
    return prefixes

Expected output:

>>> allprefixes([1,2,3])
[[], [1], [1, 2], [1, 2, 3]]

See the problem? Any non-hacky solutions? (I have mine, but it is hacky, and I’ll save it for the follow-up post.)

Edit: Arg, OK I just found out about Ruby’s … operator. That makes things significantly easier (having previously known only about the .. operator). Never mind then! But it is still a nasty default.


Why nondeterminism is awesome

In Language design on June 19, 2010 by Matt Giuca Tagged: , ,

I’ve been using the Mercury programming language for several years for my PhD project. Now Mercury is a functional logic language (predominantly logic), which means instead of functions, it has predicates. Predicates are like subroutines (they have “output parameters” but don’t actually return values), but with an added power called “nondeterminism”. I’ll explain what that means shortly, but the point of this post is: I’ve never used it. I understand it fully but I’ve never used it in a real program because the opportunity never came up. I think a lot of people think it just adds complexity to the language with no real benefit. But the other day, I discovered how useful it can be.

Basically, nondeterminism means that a predicate can succeed multiple times, producing multiple results. The caller is largely unaware that the callee produced multiple results, as it isn’t returned as a “set” or “list”, it simply causes the caller to also produce multiple results. To give a quick example, say I have the nondeterministic predicate p:

:- pred p(int::out) is nondet.
p(X) :-
        X = 1
        X = 2
        X = 3

This predicate succeeds three times, with the solutions {X = 1, X = 2 and X = 3}. Now when you call p, you don’t have to do anything special — you just automatically become nondeterministic too. So take predicate q:

:- pred q(int::out) is nondet.
q(Y) :-
    p(X),         % Set X to the result of p
    Y = X + 1.    % Return X + 1

Now q just looks like regular code, but because p is nondeterministic, q also has three solutions: {Y = 2, Y = 3 and Y = 4}.

So what’s it all useful for?

Well in my compiler, I had a lot of code looking like this (only much more complex):

% A predicate that a tree, and returns a set of values
:- pred tree_processor(tree::in, set(val)::out) is det.
tree_processor(X, Vals) :-
    % Call another predicate to get a set of values
    get_values(X, Vals1),

    % Recurse on the children of X and union all results together
    XChildren = tree_children(X),   % Children as a set, XChildren, Vals2Power),
    Vals2 = set.power_union(Vals2Power),

    % Union both sets together into the final set of results
    Vals = set.union_list([Vals1, Vals2]).

This code is what I call “set-oriented”. Not only does this predicate deal with set objects all over the place, but the “get_values” and “tree_children” predicates must return a set, and any predicate which calls this code must accept a set.

There are a couple of problems with this code. Firstly, I need to number each of the sets “Vals1” and “Vals2” (the actual code had four sets). Secondly, note that I do a good deal of work in the recursion. The recursive call produces a set, but since I am recursing over a set, I need to map over each child, resulting in a set of sets of results. Then I have to power_union (flatten the set of sets into a set). This furious mapping and unioning is a hallmark of set-oriented code and it was steadily spreading throughout my code.

Then I realised: This is what nondeterminism is for. The properties of my code (here and in several other places) were:

  • A predicate which produces a set of values as a result,
  • The final set is a union of a number of separately derived sets (Vals1, Vals2),
  • The predicate deals with other sets of values,
  • When dealing with a set of values, it produces results for each value in the set, independently (it maps over the set, doesn’t fold).

What this means is, I could convert this and other predicates to nondeterminism to save a lot of complexity, making the code shorter and easier to read (assuming the reader understands nondeterminism):

% A predicate that takes a tree, and returns single value (with multiple solutions)
:- pred tree_processor(tree::in, val::out) is nondet.
tree_processor(X, Val) :-
        % Call another predicate to get a set of values
        get_values(X, Val)          % Many solutions
        % Recurse on the children of X
        tree_child(X, XChild),      % Many children
        tree_processor(XChild, Val) % Many solutions

Look what I’ve got now:

  • Just one name for all of the results (“Val”).
  • Cleanly separated (via the “;” disjunction operator) the distinct computations of the result set (which were previously bound to Vals1 and Vals2).
  • No calls to map or union. Once a value is bound nondeterministically (such as XChild), it automatically “maps” all subsequent code. And the resulting set is automatically a “union” of every binding of Val.
  • A nice declarative reading: “tree_processor(X, Val) succeeds if Val is a ‘get_values’ of X (whatever that means), or if XChild is any child of X, and tree_processor(XChild, Val) succeeds.”

I did have to convert all the functions to nondeterminism at once, but I can stop at any level. It is fairly easy to convert back and forth between “sets of values” and “nondeterminism” (use solutions.solutions_set to convert a nondeterministic predicate’s results into a set; use set.member to convert a set into nondeterministic results).

So hey, this turned out to be quite a useful pattern, and it’s only possible in a logic language such as Prolog or Mercury.

In other languages

Haskell allows a similar nondeterministic style, perhaps surprisingly, via the list data type (treating it as a monad), but that’s a completely separate topic. As ever, it’s really clean if you can wrap your head around monads.

Python’s generators offer something like nondeterminism. It sort of gives you half the features — it is easy to produce results in nondeterministic style, but not to consume them. To explain, using the example above:

# A function that takes a tree, and yields a sequence of values
def tree_processor(x):
    # Call another function to get a sequence of values
    for val in get_values(x):   # Sequence of solutions
        yield val

    # Recurse on the children of X
    for xchild in tree_child(x):    # Many children
        for val in tree_processor(xchild):
            yield val

This has the advantage of not having to produce sets (vals1 and vals2) and union them. However, unlike nondeterminism, calling a generator doesn’t automatically cause the rest of the caller to map over the generator’s results — you have to explicitly map or for-loop over the results. As a language designer, I wonder if there is a middle-ground — some generator-like mechanism which does automatically map over the results but which doesn’t take me all the way to a logic language.

Also note that while nondeterminism in Prolog/Mercury represents a set of results, in Haskell and Python it represents a sequence as the results come out in a definitive order.


Python 2: My new URI/Unicode crusade

In Python on March 14, 2010 by Matt Giuca Tagged: , , ,

You may recall in 2008 I filed a bug on unicode URIs in Python 3, had a massive argument with the Python community, and ended up successfully getting a patch (a complete rewrite of urllib.parse.quote and unquote) accepted in Python 3.

Well two years later, I finally had the stamina to check out the situation with unicode URIs in Python 2. It’s just as bad, if not worse, than it was in Python 3. So I’m doing it all over again!

I’ve just submitted three patches (1, 2, 3) on four separate bugs relating to urllib.quote and urllib.unquote, all of which I already fixed in Python 3. Hopefully this time, the existing Python 3 precedent will mean less arguing. Also the fact that I made three separate patches will mean they’ll be accepted or rejected individually, rather than what happened last time, which was me having to maintain a giant patch fixing a dozen bugs over two months.


Thou Shalt Not Modify A List During Iteration

In Python on February 12, 2009 by Matt Giuca Tagged: , , ,

This came up recently when discussing a friend’s code. It relates specifically to Python, but I think the rule is more general.

Firstly, let me be clear that in this article, when I say “modify”, I mean inserting or removing items from the list. Merely updating or mutating the list items is fine.

Now, consider the following Python code:

elems = ['a','b','c']
for e in elems:
    print e

This prints:


The author of this code snippet probably expected it to iterate over each element of the list, print it out, then remove it so that the list ends up empty (this may be desirable if, for example, you are removing items conditionally). Hence, he is probably surprised that 'b' was skipped, and furthermore, remains in the list:

>>> elems

We can see why only 'a' and 'c' were touched by imagining that Python translates the loop into the following equivalent lower-level code:

elems = ['a','b','c']
i = 0
while i < len(elems):
    e = elems[i]
    print e
    i += 1

This has the same result. Now it’s clear that what’s happening: when you remove the 'a' from elems, the remaining elements slide down the list. The list is now ['b', 'c']. Now 'b' is at index 0, and 'c' is at index 1. Since the next iteration is going to look at index 1 (which is the 'c' element), the 'b' gets skipped entirely.

This rule doesn’t seem to be documented officially anywhere. It’s just a generally-followed rule. I believe the behaviour is, well, unspecified behaviour. Here is a discussion of the issue.

Also note that the same rule applies to the set and dict types (and similar), only those types will actually raise a RuntimeError. (If you know how hash tables work, it should be clear why — there’s no safe way to let that slide).

This rule should be applied in all languages, not just Python. While other languages have different rules, the basic principle above still applies — you’ll just get shot in different ways. (It would even apply to C, if you wrote your own code to insert or remove elements from an array). Also, this sort of code is invariably confusing.

So how can you work around it?

  • You could translate your code into the while-loop form, as above. Then you have explicit access to the variable i, and you can hand-code a fix. (For example, I could choose not to increment i when I am removing an element).
  • You could compute the set of modifications to make to the list during iteration, then apply them afterwards.
  • You could construct a new list during iteration rather than mutating the existing one. (For example, rather than removing all the elements which satisfy a condition, insert into a new list all the elements which don’t).
  • A solution recommended in the discussion linked above is to duplicate the input list and iterate over the copy, mutating the original.

There is always a way around. You just have to be creative!


On Python’s Whitespace

In Language design,Python on January 13, 2009 by Matt Giuca Tagged: , ,

It seems like a certainty that any given Slashdot article involving Python will feature at least a handful of top-level comments where people complain about Python’s indentation/whitespace rule, regardless of the topic of the news item. There’s probably a race to be the first to complain about it in any given post.

I usually bite. This time, I decided to bite hard, and I’ve decided to re-publish my response here because I think it was rather blog-worthy.

AuMatar says:

On the other hand, I’ve spent at least a full work week of my life fixing problems due to whitespace. Guido made a major f*** up there- by removing braces but not strictly defining whitespace, he’s created a language where it’s possible to have two identical looking pieces of code do very different things. If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error, it would have been fine. As it is I’d say about 15-20% of the time I spent doing Python was spent fixing these kinds of bugs.

To which I replied:

Guido made a major f*** up there- by removing braces but not strictly defining whitespace

Stop. First, the whitespace rule in Python *is* strictly defined.

The formal, exact, unambiguous specification of how Python interprets whitespace is in the official language reference – Lexical analysis [].

It’s pretty wordy, but I’ve studied it and it’s quite precise. The relevant section is here:

“Firstly, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight”

This is exactly the same as the default behaviour of Unix `expand`.

[Guido has] created a language where it’s possible to have two identical looking pieces of code do very different things.

It depends what you mean by “looking”. To you, perhaps 1 tab looks the same as 4 spaces. To me, maybe it looks the same as 2 spaces. To Jeff, maybe it looks like a red dot in his specially-configured editor. To Python, it happens to look the same as 8 spaces.

DO NOT MIX TABS AND SPACES. Then, I guarantee you that any two pieces of code which look the same to you (whether they use tabs or spaces) will also look the same to Python. (You don’t have to enforce this across a whole file, just on a per-block basis, but it’s best if your whole project has an agreed indentation standard).

If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error.

That’s silly. Then you’d be at Guido’s whim; you’d have to indent the way he chose. This way, you can choose any indentation you like. Tabs, 2 spaces, 4 spaces, 3 tabs if you like. As long as you are internally-consistent, Python will be happy.

My second point to you: If you are pasting code from somewhere into your code, and you do not fix up indentation so it matches the surrounding code, you are worse than Hitler. Or at least very lazy. I don’t care if you are using Python or C or Brainf***.

If you carelessly paste 1-tab-indented code into a surrounding block which is 4-tab-indented, and don’t fix it up, then how do you think I will feel when I open it in my editor configured to expand tabs to 2 spaces instead. It will be totally unreadable — and this is why we indent in the first place (in any language, that is).

Python forces you to tidy this up, and that can only be a good thing. If your code is confusing Python, it’s probably confusing a bunch of other readers as well.


Mandelbrot Set – A Rorschach test on fire

In Maths on August 26, 2008 by Matt Giuca Tagged: , ,

I’ve been listening to the marvellous works of Jonathan Coulton (best known among gamers for Still Alive, the song at the end of Portal). These songs are nerdy and great, and really get in your head.

Now when a song gets in my head with the chorus lyrics:

Just take a point called c in the complex plane. Let z1 be z squared plus c. And z2 is z1 squared plus c. And z3 is z2 squared plus c, and so on. If the series of z‘s will always stay close to z and never trend away, that point is in the Mandelbrot set.

– From the song Mandelbrot Set by Jonathan Coulton – free download

My inner code monkey can’t help but want to implement it. So I did.

I resolved to implement a program which generates a view of the Mandelbrot set just using the lyrics of the song, plus this blog post I found by the song’s author with some minor addenda:

  • You take a point called c, not z, in the complex plane (I corrected the lyrics above; you can easily hear “Zee” as “C” in the song anyhow).
  • You need to initialize the first z to 0+0i.

With this in mind, I got out my trusty Python and started hacking away, writing the song lyrics in comments and filling in the code. I hit “go” and ended up with this:

Mandelbrot Set, computed just from the lyrics

Mandelbrot Set, computed just from the lyrics

(Note that it took a couple of goes to get the parameters right, but I didn’t have to tinker with the algorithm itself at all!)

Fantastic!! Instantly generated that famous “bulbous pointy form”.

Then I hit the wiki and got a few tips on producing a coloured version. With a little bit more hacking, I managed to produce this:

Coloured Mandelbrot Set

Coloured Mandelbrot Set

Note the colour bands: not an image compression artefact but a property of the method used to select the colour (based on the discrete number of iterations it took for the point to “trend away”). There’s a way the Wikipedia article suggests to solve this, which is how they generate the spectacular image at the top, but I haven’t get it working yet.

Also note that the coloured version is “just for fun”. This is a set of numbers – you’re either in or you’re out. The black pixels in the coloured version are in the set; the coloured ones basically show you by how much that pixel missed out on being in the set. But the first picture I generated is a mathematically correct (approximation of) the set itself.

I’d post the code, but I just figured out that WordPress doesn’t allow non-image attachments. Grr! I’ll figure out some way later on.

And by the way, because some of us were discussing it (from the same post),

I’ve always wanted for [Benoît Mandelbrot] to hear it, but I fear his corrections. Which is why I just emailed him the mp3. Hopefully he will not be offended by the line in which I joke that he is not dead (yet). Or by the above-described mathematical shortcomings. Or by the “too-much-rocking” that I put in there.

Well it looks like he at least read the lyrics, and was pretty impressed 🙂