Posts Tagged ‘syntax’


Why Python’s whitespace rule is right

In Language design, Python on October 18, 2011 by Matt Giuca

Python is famous among programming languages for its fairly unique syntax: rather than being delimited by curly braces or “begin/end” keywords, blocks are delimited by indentation. Indenting a line is like adding an opening curly brace, and de-denting is like a closing curly brace. When people criticise Python, it is usually the first complaint: “why would I want to use a language which requires me to indent code?” Indeed, while programmers are very used to indenting their code, they are very un-used to being forced to do so, and I can understand why they may take it as an insult that a language tells them how to write code. I don’t usually like to get into syntax arguments, because I find them very superficial — it is much more important to discuss the semantics of a language than its syntax. But this is such a common argument among Python detractors, I wanted to address it. Python is right, and it’s just about the only language that is.

I think the rub is that programmers like to think of languages as tools, and tools should be as flexible as possible. I think in general it is a good principle for programming languages not to enforce conventions. Languages that do tend to annoy people who don’t subscribe to the same conventions. For example, the Go programming language enforces the “One True Brace Style”: every opening curly brace must appear on the same line as the function header or control statement. This irritates me because that’s not my preferred convention. But the indentation convention is so universal that failing to indent is considered bad programming practice in every case. (There is disagreement over tabs vs spaces, the number of spaces, etc., but we all agree that indentation is good.) There is not a single situation in any country, in any programming language, or at any skill level, in which it is acceptable to not indent your code the way Python requires it. Therefore, it is technically redundant to have a language that is not whitespace-sensitive. Any language that is not whitespace-sensitive requires (by universal convention) that programmers communicate the scoping of the code in two distinct manners for every single line of code: braces (or begin/end) and indentation. You are required to make sure that these two things match up, and if you don’t, then you have a program that doesn’t work the way it looks like it works, and the compiler isn’t going to tell you.

There are two solutions to this problem. 1: Make the compiler tell you. Force the programmer to indent and put in curly braces, and have the compiler check the indentation and give either a warning or error if they don’t match up. Now you’ve solved the problem of accidentally getting it wrong, but now what is the point of requiring curly braces at all? The programmer would just be doing extra work to please the compiler. We may as well go with 2: take out the curly braces and just have the compiler determine the blocks based on indentation.
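To make option 2 concrete, here is a small sketch using Python's standard `tokenize` module, which exposes the tokens the lexer derives from indentation. The sample source and the helper name `block_tokens` are my own illustration, not anything from the language reference.

```python
import io
import tokenize

# A tiny program whose only block structure comes from indentation.
SOURCE = (
    "if ready:\n"
    "    start()\n"
    "    log()\n"
    "done()\n"
)


def block_tokens(source):
    """Return the names of the INDENT/DEDENT tokens found in source, in order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tokenize.tok_name[tok.type]
        for tok in tokens
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


print(block_tokens(SOURCE))  # ['INDENT', 'DEDENT']
```

The lexer emits one INDENT where a brace language would have an opening brace and one DEDENT where it would have the closing brace, so the parser sees explicit block delimiters even though the programmer never typed any.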

When you really analyse it, Python’s whitespace sensitivity is actually the only logical choice for a programming language, because you only communicate your intent one way, and that intent is read the same way by humans and computers. The only reason to use a whitespace-insensitive language is that that’s the way we’ve always done things, and that’s never a good reason. That is why my programming language, Mars, has the same indentation rule as Python.
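As a small illustration of intent being communicated one way only, the two functions below (hypothetical names, made up for this example) differ in nothing but the indentation of a single line, and Python reads that difference exactly the way a human does.

```python
def inside(n):
    total = 0
    for i in range(n):
        total += i
        total *= 2  # indented: part of the loop body, runs every iteration
    return total


def outside(n):
    total = 0
    for i in range(n):
        total += i
    total *= 2  # de-dented: runs once, after the loop has finished
    return total


print(inside(3))   # 8
print(outside(3))  # 6
```

There is no second channel (braces) that could disagree with what the layout shows.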

* * *

An interesting aside: there is a related syntax rule in Python which doesn’t seem quite so logical: you are required to place a colon at the end of any line preceding an indent. I haven’t fully tested this, but I’m pretty sure there is no technical reason for that (the parser could still work unambiguously without that colon), and it doesn’t seem to add much to the readability either. I slavishly followed this rule in Mars too, because as a Python programmer it “feels right” to me. But perhaps it would be better to drop it.


On Python’s Whitespace

In Language design, Python on January 13, 2009 by Matt Giuca

It seems like a certainty that any given Slashdot article involving Python will feature at least a handful of top-level comments where people complain about Python’s indentation/whitespace rule, regardless of the topic of the news item. There’s probably a race to be the first to complain about it in any given post.

I usually bite. This time, I decided to bite hard, and I’ve decided to re-publish my response here because I think it was rather blog-worthy.

AuMatar says:

On the other hand, I’ve spent at least a full work week of my life fixing problems due to whitespace. Guido made a major f*** up there- by removing braces but not strictly defining whitespace, he’s created a language where it’s possible to have two identical looking pieces of code do very different things. If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error, it would have been fine. As it is I’d say about 15-20% of the time I spent doing Python was spent fixing these kinds of bugs.

To which I replied:

“Guido made a major f*** up there- by removing braces but not strictly defining whitespace”

Stop. First, the whitespace rule in Python *is* strictly defined.

The formal, exact, unambiguous specification of how Python interprets whitespace is in the official language reference, under “Lexical analysis”.

It’s pretty wordy, but I’ve studied it and it’s quite precise. The relevant section is here:

“Firstly, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight”

This is exactly the same as the default behaviour of Unix `expand`.
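The quoted rule can be sketched in a few lines with `str.expandtabs`, which expands tabs to the next multiple of a tab size. The helper name `indent_width` is my own, and this only illustrates the column arithmetic, not the whole lexer.

```python
def indent_width(line, tabsize=8):
    """Width of a line's leading whitespace once tabs are expanded
    to the next multiple of tabsize (8, per the quoted rule)."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    return len(leading.expandtabs(tabsize))


print(indent_width("\tx = 1"))         # 8
print(indent_width("    \tx = 1"))     # 8  (four spaces, then a tab up to column 8)
print(indent_width("        x = 1"))   # 8
print(indent_width("\t    x = 1"))     # 12
```

Note how four spaces followed by a tab come out to the same width as a single tab: the tab fills up to the next multiple of eight rather than adding a fixed amount.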

“[Guido has] created a language where it’s possible to have two identical looking pieces of code do very different things.”

It depends what you mean by “looking”. To you, perhaps 1 tab looks the same as 4 spaces. To me, maybe it looks the same as 2 spaces. To Jeff, maybe it looks like a red dot in his specially-configured editor. To Python, it happens to look the same as 8 spaces.

DO NOT MIX TABS AND SPACES. Then, I guarantee you that any two pieces of code which look the same to you (whether they use tabs or spaces) will also look the same to Python. (You don’t have to enforce this across a whole file, just on a per-block basis, but it’s best if your whole project has an agreed indentation standard).
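For readers on Python 3, it is worth knowing that the interpreter now enforces this advice: indentation that mixes tabs and spaces in a way that depends on the width of a tab is rejected with a `TabError`. The sketch below assumes Python 3; the sample sources and the helper name `check` are made up for illustration.

```python
# One block indented with a tab on one line and eight spaces on the next.
MIXED_IN_BLOCK = "if True:\n\tx = 1\n        y = 2\n"

# Tabs in one block, spaces in another: each block is internally consistent.
MIXED_PER_BLOCK = "if True:\n\tx = 1\nif True:\n    y = 2\n"


def check(source):
    """Compile source and report 'ok' or the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # TabError is a subclass of SyntaxError
        return type(exc).__name__
    return "ok"


print(check(MIXED_IN_BLOCK))   # TabError
print(check(MIXED_PER_BLOCK))  # ok
```

The second case shows the per-block point: consistency is only required among lines that belong to the same nesting of blocks.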

“If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error.”

That’s silly. Then you’d be at Guido’s whim; you’d have to indent the way he chose. This way, you can choose any indentation you like. Tabs, 2 spaces, 4 spaces, 3 tabs if you like. As long as you are internally consistent, Python will be happy.
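A quick way to check this claim is to render the same program with several indentation styles and compare the syntax trees Python builds from them. This is a sketch using the standard `ast` module; the template and the helper names `render` and `shape` are hypothetical.

```python
import ast

# The same nested program, with {i} standing in for one level of indentation.
TEMPLATE = ["if x:", "{i}y = 1", "{i}if y:", "{i}{i}z = 2"]

STYLES = {"2 spaces": "  ", "4 spaces": "    ", "1 tab": "\t", "3 tabs": "\t\t\t"}


def render(indent):
    """Fill the template using the given string for each indentation level."""
    return "\n".join(line.format(i=indent) for line in TEMPLATE) + "\n"


def shape(source):
    """Dump the parsed syntax tree, without line/column position attributes."""
    return ast.dump(ast.parse(source))


shapes = {name: shape(render(indent)) for name, indent in STYLES.items()}
print(len(set(shapes.values())))  # 1
```

All four styles produce an identical tree, so the choice of indentation unit is purely a matter of taste, as long as each file sticks to it.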

My second point to you: If you are pasting code from somewhere into your code, and you do not fix up indentation so it matches the surrounding code, you are worse than Hitler. Or at least very lazy. I don’t care if you are using Python or C or Brainf***.

If you carelessly paste 1-tab-indented code into a surrounding block which is 4-tab-indented, and don’t fix it up, then how do you think I will feel when I open it in my editor, which is configured to expand tabs to 2 spaces? It will be totally unreadable, and readability is why we indent in the first place (in any language, that is).

Python forces you to tidy this up, and that can only be a good thing. If your code is confusing Python, it’s probably confusing a bunch of other readers as well.