Why Python’s whitespace rule is right

In Language design, Python on October 18, 2011 by Matt Giuca

Python is famous among programming languages for its fairly unique syntax: rather than being delimited by curly braces or “begin/end” keywords, blocks are delimited by indentation. Indenting a line is like adding an opening curly brace, and de-denting is like a closing curly brace. When people criticise Python, it is usually the first complaint: “why would I want to use a language which requires me to indent code?” Indeed, while programmers are very used to indenting their code, they are very un-used to being forced to do so, and I can understand why they may take it as an insult that a language tells them how to write code. I don’t usually like to get into syntax arguments, because I find them very superficial — it is much more important to discuss the semantics of a language than its syntax. But this is such a common argument among Python detractors, I wanted to address it. Python is right, and it’s just about the only language that is.
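
To make the brace analogy concrete, here is a minimal sketch using the standard library's `tokenize` module, which reports the tokenizer's INDENT and DEDENT tokens. The sample source and the helper name `block_tokens` are made up for illustration; the names `x`, `y`, `foo` and `bar` are never executed, only tokenized.

```python
import io
import tokenize

# Hypothetical sample: two nested blocks followed by a top-level statement.
SOURCE = (
    "if x:\n"
    "    if y:\n"
    "        foo()\n"
    "bar()\n"
)

def block_tokens(source):
    """Return the names of the INDENT/DEDENT tokens found in source, in order."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names

print(block_tokens(SOURCE))  # ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

Each INDENT plays the role of an opening brace and each DEDENT the role of a closing one. Both DEDENT tokens are emitted just before `bar()`, where a brace language would have two closing braces.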

I think the rub is that programmers like to think of languages as a tool, and tools should be as flexible as possible. I think in general it is a good principle for programming languages not to enforce conventions. Languages that do tend to annoy people who don’t subscribe to the same conventions. For example, the Go programming language enforces the “One True Brace Style”: every opening curly brace must appear on the same line as the function header or control statement. This irritates me because that’s not my preferred convention. But the indentation convention is so universal that it is considered bad programming practice to not indent in all cases. (There is disagreement over tabs vs spaces, the number of spaces, etc, but we all agree that indentation is good.) There is not a single situation in any country, in any programming language, or at any skill level, in which it is acceptable to not indent your code the way Python requires it. Therefore, it is technically redundant to have a language that is not whitespace-sensitive. Any language that is not whitespace-sensitive requires (by universal convention) that programmers communicate the scoping of the code in two distinct manners for every single line of code: braces (or begin/end) and indentation. You are required to make sure that these two things match up, and if you don’t, then you have a program that doesn’t work the way it looks like it works, and the compiler isn’t going to tell you.

There are two solutions to this problem. 1: Make the compiler tell you. Force the programmer to indent and put in curly braces, and have the compiler check the indentation and give either a warning or error if they don’t match up. Now you’ve solved the problem of accidentally getting it wrong, but now what is the point of requiring curly braces at all? The programmer would just be doing extra work to please the compiler. We may as well go with 2: take out the curly braces and just have the compiler determine the blocks based on indentation.
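
As a rough illustration of solution 1, here is a toy checker for brace-delimited source that flags lines whose indentation disagrees with the brace depth. This is only a sketch under loud assumptions: the function name `check_indentation` and the sample snippets are hypothetical, indentation is assumed to be spaces at a fixed width, every statement is assumed to fit on one line, and braces inside strings or comments are not handled. A real compiler would do this on its token stream instead.

```python
def check_indentation(source, indent_width=4):
    """Return 1-based line numbers whose indentation disagrees with brace depth.

    Toy checker: it counts every '{' and '}' it sees, so braces inside
    strings or comments would confuse it. Assumes space indentation.
    """
    mismatches = []
    depth = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no indentation information
        # A line that starts with '}' is aligned with the enclosing block.
        expected_depth = depth - 1 if stripped.startswith("}") else depth
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected_depth * indent_width:
            mismatches.append(lineno)
        depth += stripped.count("{") - stripped.count("}")
    return mismatches

# Hypothetical C-like snippet: the last line *looks* like part of the loop,
# but the braces say the loop has already ended.
MISLEADING = (
    "while (i < n) {\n"
    "    if (ok) {\n"
    "        step();\n"
    "    }\n"
    "}\n"
    "    i++;\n"
)

CONSISTENT = (
    "if (x) {\n"
    "    foo();\n"
    "} else {\n"
    "    bar();\n"
    "}\n"
)

print(check_indentation(MISLEADING))  # [6]
print(check_indentation(CONSISTENT))  # []
```

Once a check like this is mandatory, the braces carry no information that the indentation does not already carry, which is the point of solution 2.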

When you really analyse it, Python’s whitespace sensitivity is actually the only logical choice for a programming language, because you only communicate your intent one way, and that intent is read the same way by humans and computers. The only reason to use a whitespace-insensitive language is that that’s the way we’ve always done things, and that’s never a good reason. That is why my programming language, Mars, has the same indentation rule as Python.
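
A small sketch of “read the same way by humans and computers”, using the standard `ast` module: two snippets that differ only in the indentation of the `else`, and a hypothetical helper `else_owner` that reports which `if` the parser attached it to. The snippets are only parsed, never run, so `x`, `y`, `foo` and `bar` need not exist.

```python
import ast

# The else lines up with the inner `if`.
INNER_ELSE = (
    "if x:\n"
    "    if y:\n"
    "        foo()\n"
    "    else:\n"
    "        bar()\n"
)

# The else lines up with the outer `if`.
OUTER_ELSE = (
    "if x:\n"
    "    if y:\n"
    "        foo()\n"
    "else:\n"
    "    bar()\n"
)

def else_owner(source):
    """Report which `if` the else branch is attached to in the parsed tree."""
    outer = ast.parse(source).body[0]  # the top-level `if x:`
    inner = outer.body[0]              # the nested `if y:`
    if outer.orelse:
        return "outer"
    if inner.orelse:
        return "inner"
    return "none"

print(else_owner(INNER_ELSE))  # inner
print(else_owner(OUTER_ELSE))  # outer
```

In both cases the parser's answer is the one a reader would give from the column alignment alone.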

* * *

An interesting aside: there is a related syntax rule in Python which doesn’t seem quite so logical: you are required to place a colon at the end of any line preceding an indent. I haven’t fully tested this, but I’m pretty sure there is no technical reason for that (the parser could still work unambiguously without that colon), and it doesn’t seem to add much to the readability either. I slavishly followed this rule in Mars too, because as a Python programmer it “feels right” to me. But perhaps it would be better to drop it.

46 Responses to “Why Python’s whitespace rule is right”

  1. The colon rule probably helps editors – after a colon and a newline, they know you _must_ indent (and will do so for you typically), without having to pay special attention to keywords.

    Also more importantly, it means you have the exact same syntax for a single-line conditional:

    if foo:
    continue

    naturally condenses (if you wish) to

    if foo: continue

    • the comment field ate my indentation, but you know what I mean…

    • Good point. The editor isn’t so important since I’m sure an editor could detect “Line beginning with def/if/while/for/etc and not containing a colon”. But the consistency is a good point.

  2. (I agree with you wholeheartedly.) No, the colon isn’t necessary. For example, Haskell has significant whitespace and no colon.

  3. I cannot imagine how a language that breaks if you indent wrong, could be considered not only a logical choice, but “the only logical choice”. It’s no doubt perfectly logical for the computer itself, and perhaps it might be for another species, but not for this species.

    Sometimes, the road less traveled is less traveled for a reason.

    • Could you explain why it is not for this species? Or what the “reason” is that you think the road is less traveled? I thought I did a pretty good job of explaining why it is perfect for this species in the post: this very species has near universal coding standards that say you must indent your code. So since you are indenting your code anyway, why not make sure you are indenting correctly?

  4. @Matt, consider the difference between Python’s

    if x:
    if y:
    foo()
    else:
    bar()

    and C’s

    if (x) {
    if (y) {
    foo();
    }
    } else {
    bar();
    }

    In the C case, the meaning is unambiguous; in the Python case, only the indenta— I’m sorry, what’s that? WordPress ate my indentation? Well, crap. You know what I *meant* to write, though… don’t you? I mean, source code is meant for humans to read, so it would be *pretty dumb* if you couldn’t paste source code into a blog comment without breaking it…

    • It’s true, but I consider this to be a deficiency of WordPress, not Python. While it would be less common to do so, I could write a blogging tool that strips out curly braces, and then C code would break. I consider “readability” and “looking like it does what it actually does” to be more important design goals for a programming language than “will the code break if pasted into random website X?”

      • Have you ever tried to paste C++ code using templates (or Java using generics) in a comment field that strips HTML tags?

  5. Sorry for commenting on your post so late, but I couldn’t resist 🙂

    I first saw whitespace delimited code in Haskell – what a crazy language, and good one too 🙂

    I have been working with C# for almost 10 years now. I can only say that the more I work with C#, the more I hate the braces. It is such a waste of time. In the company where I work we had tools (FxCop, StyleCop) that forced us to comply with company standards. Which is OK, of course, but then I noticed something: there was a strange rule in some of these tools that forces you to begin every block on a new line and also to indent that block! So, in the end, there is a rule which forces you to do the same thing that Python enforces, but with additional braces, which you must maintain manually (or buy some 3rd party tool). You cannot check in (commit) your code unless you fix it. It is the same with companies which don’t have automated verification: in the end, if you all agreed on some rules, why is it OK to break them? There is only one reasonable answer to that: it is not OK, it must be automated. Whether it is done by a compiler, an interpreter or any other 3rd party tool doesn’t matter; it must be done by machines.

    And yes, whitespace is a piece of the program itself. We are already using it as input information for our eyes, so why is it so bad if we use the same information to feed the compiler/interpreter?

  6. Python’s indentation rule makes it harder to do code generation. In a language like this you need to provide an alternative mode for the sake of code generation (see what’s available in Haskell). In Python we do not have this, which really sucks a lot.

    • I’ll admit it’s a little harder: it means your code generator has to keep track of the indentation level at all times, and every time it inserts a newline, to insert that many spaces/tabs. But that’s a fairly trivial task compared to all the other things you have to be aware of when writing a code generator, for any language. I don’t think this is a big issue, especially since programming languages should be designed primarily for people to write, not computers.

      Haskell’s whitespace rule is much more complicated than Python’s, and yes, it would be much harder to write a code generator to whitespace-delimited-Haskell than Python, so it’s not really comparable (i.e., it’s good that Haskell provides a brace-delimited mode; Python doesn’t really need one).

  7. Having worked with braceless and braced programming in YAML, JSON, Haskell, Java, HTML, HAML, Ruby, Python, XML, etc…

    You’re dead wrong about this: “The only reason to use a whitespace-insensitive language is that that’s the way we’ve always done things”

    No, the reason I love braces in my code is that I can press one key combo and automatically reformat my code while being absolutely sure that I have not changed its meaning. This is a good thing. It means that I don’t have to count spaces. I don’t have to worry about mixed tabs and spaces. I can bang out a few expressions on one line, hit a key combo, and have perfectly readable code without constantly wasting mental resources on meeting lame formatting standards.

    • I don’t follow this argument. Let’s trace back to my original point: with braces, there are *two* separate representations of the code structure: one machine-readable and one human-readable. If you don’t make sure they are exactly in agreement, then humans will most likely misunderstand your code. Whereas with whitespace-sensitivity, there is only one representation of the code structure, which both the machine and human can read and agree upon.

      Your point seems to be that you have a tool that takes the machine-readable representation and ensures the human-readable one matches. That’s great (I, too, use this tool — although it didn’t exist when I wrote this blog post in 2011 and I don’t know of any that did). But if you were using a whitespace-sensitive language, you wouldn’t need that tool at all, because the human-readable representation *always* matches the machine-readable one! “X is better than Y because X can achieve the same benefits of Y with additional tooling” is not an argument in favour of X.

      You should not be “counting spaces” — your eyes do that for you (you only have to observe what is more indented than what). You should *never* be mixing tabs and spaces, in any language, so that is a non-issue (and you will find that, while crappy C code mixes tabs and spaces a lot, it almost never happens in even crappy Python code because the program would break, so Python fixes that issue too). My point is that you do not *need* to waste mental resources reformatting code in Python because any correct code *is* correctly formatted already!

  8. You are a stupid pencil neck idiot. You are like someone who accepts to be bent over a barrel and be rammed up the ass by a guy with a huge strap on and say, “it’s ok, he’s using a condom so I won’t catch anything from my other geek pals who think the same shit as me”.
    No, it’s not all right to be forced to use whitespace you moron. It takes away choice. Any language that takes away choice should be binned immediately. Same goes for breaking backwards compatibility in under 10yrs. Invisible characters for code marking… sheer stupidity…. and fools like you who accept this make it all possible for the great creators of these garbage to state they have a product.

    • Decided not to mark this hideously offensive comment as spam just so that I can argue against it.

      > No, it’s not all right to be forced to use whitespace you moron. It takes away choice.

      What a ridiculous argument. Anything that takes away choice should be binned? C doesn’t let me choose between using double quotes or single quotes for strings, so that’s out. Java prevents me from converting an int into a pointer and crashing the computer; what a nasty restriction! Ruby insists that my instructions are executed from top to bottom, and not bottom to top.

      Language design — or rather, any design of anything ever — is about carefully choosing restrictions for the user to try and increase the chance of a good outcome. My microwave won’t let me use it while the door is open, to avoid me radiating myself. Do you think that’s wrong because it’s “taking away choice”? As I carefully argued in this post, the designers of Python deliberately take away your choice to write a program that looks one way, and behaves another. If you don’t like it, well, you’re free to choose a different language. But that is, in my opinion, a helpful restriction.

      As for your attitude, grow the fuck up if you want to be taken seriously.

  9. Here’s why the golang creators apparently didn’t like it:
    http://betterlogic.com/roger/2015/05/go-creators-why-curly-braces-instead-of-significant-indentation/ (I’m not arguing either way, but I had heard a quote from them once)

  10. The problem is that the entire argument for Pythonic whitespace leans on the developers having to do manual indentation.
    That’s not the case with a state of the art editor. You set the braces and the code gets formatted for you. That’s zero time spent on indentation and perfect output.
    Python on the other hand frequently forces the programmer to manually deal with indentation.
    That’s because the right indentation level can’t be deduced from anything else.
    For example each and every time when I would add a closing bracket I have to manually deindent.
    So no time saved there. But things get worse from there.
    In particular for code refactoring where pieces get moved around the indentation has to be adjusted manually most of the time.
    And while that’s only mildly annoying for moving larger blocks which can be (manually) indented en bloc it’s a huge waste of time for all the little changes when adding an additional condition or loop.

    • While you’re certainly right that automatic formatting tools make brace-based languages bearable, my basic argument still holds. Note that I wrote this article in 2011 when (to my knowledge), there were no good automatic formatting engines for C/C++, but of course now there is ClangFormat and I use it all the time.

      But I don’t see how this places Python at a *disadvantage* compared to a C programmer with ClangFormat. Surely any good text editor has a block indent/outdent feature. So moving Python blocks into or out of a loop is very simple: select all the lines you wish to indent, hit Shift+> (in Vim, for example), and you’re done. It’s still simpler than manually inserting braces and then running ClangFormat.

      My original point, which I still stand by, is that in Python you shouldn’t think of indentation as “extra time spent prettying up the code”. Manually setting the indentation in Python *is exactly equivalent* to manually setting braces in other languages, except with the added bonus (admittedly less so now that we have ClangFormat) that your code is automatically readable.

      There’s still the problem of any C code that *hasn’t* been run through ClangFormat, such as old code, and code your colleagues may write. The advantage of Python is that correct indentation is fundamental to the language, not an optional tool you can run. This bug simply can’t happen in Python:

      if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
          goto fail;
          goto fail;

  11. This whitespace thing with python really makes me laugh. Except that I have to program in it. My favorite thing is when the pythonistas claim that it is okay to have the language not enforce object member privacy (such as is done in java and c++), because, and I quote, “We are all adults here” and you’re just not supposed to access private data outside the object. Okay, fair enough, except that apparently, we are “all adults here” only when it suits their fairly lame arguments, but are we “adult enough” to decide how and when to indent? Hell, no. Welcome to python, where everyone is assumed to be a novice programmer. The very first set of lines I have to mark as continued over several lines makes less readable crap than any set curly braces ever will.

    Python is the woodpecker of programming languages. Really, you may be around for a while, but guaranteed using your head as a hammer will not set you up on the path to higher evolution… And rarely has a language developed such an amazing number of fanboys. A fanboy is one who cannot be convinced, no matter the evidence or argument, that they may be overstating things…just…a…little…bit. “why-pythons-whitespace-rule-is-right” is a perfect example of such. [face palm]

  12. […] without most of the keywords, parentheses and brackets. It uses whitespace to determine scope, like Python. It also supports classes. These two lines define a method of the TextEditor class called […]

  13. It is true that in general we all agree indentation is important and great, but I don’t think we can always agree on the specifics.
    If you spare me the “all-code-should-be-beautiful”, which is in an ideal world, which we are not in,
    if I want to write a statement across multiple lines, python makes this ambiguous or cumbersome with ‘\’
    In C# statements like
    thing.Where(i => i func(j));
    Read nicely. All I have to do to make sure my spacing is fine is to do Ctrl+k+f (format)
    I can’t do that in Python.
    I can’t do
    int func (a) { return func(a, default) }
    If I collaborate in code, I’ll likely have to enforce spaces vs. tabs, and number of spaces and so on,
    so I don’t Really GAIN anything.

    While it is true that braces are in general redundant, in specific cases they are useful,
    1. They make whitespace not-significant, for those annoying cases (which are the ones that matter) where we need them to be insignificant
    2. They permit auto-formatting
    3. It is totally unambiguous.
    4. I have no idea how collaborating across different teams/refactoring, where the whitespace can go to hell easily, is executed by Python programmers, but if whitespace can be a pain in non-sensitive languages, I can only imagine in Python.

    It’s not the only logical choice. Just a pedantic one.
    I rather control my whitespace for maximal readability,
    The compiler doesn’t understand readability, I do.

    I seriously love most design decisions of python, but that one is just such a waste.
    Either way, we may as well argue about the existence of ghosts, you’ll never change your mind.

  14. “Note that I wrote this article in 2011 when (to my knowledge), there were no good automatic formatting engines for C/C++.” The Visual Studio team will be very upset…

  15. I agree with King Beauregard, any language where indenting is significant to the syntax is idiotic. I’m surprised at how many people are coding in this language. I would never think about using such a language.

  16. Requiring indentation is a great feature of Python. However it comes with a potential for annoying indentation errors that can be very difficult to spot. Take a look at this code in the image in the link below. It’s Python code written in Sublime Text 3. There’s an indentation error in there. Can you spot it? if not (and I don’t think you can) how will you easily correct it?

  17. Yes there’s a space in there between the tabs, but it’s not visible: the indentation is visually exactly the same. I don’t know why this is, because with an extra space in between the tabs I would expect a slight indentation to be noticed visually. But it isn’t. Anyway the editor can be made to show tabs and spaces by selecting all code and thus the culprit can be found, which does make it a whole lot easier.

    This is shown in this screenshot, it’s the dot in line 6. Perhaps this is how people usually find indentation errors but I just found this out now lol 😉

    http://imgur.com/4L9SA2s

    • Ah, right! Yeah, as I said, I think the problem here is mixing tabs and spaces (not the whole whitespace-as-syntax concept). And I think Python would be better if it didn’t allow you to mix.

  18. Just the fact that you rely upon *non-visible* (i.e. not printable) characters tells me whoever made this decision is a moron.

    • Space is absolutely visible and printable… in fact, it drastically affects the way humans read code, so it’s clearly visible. The fact that it’s so vital for reading code is why it makes so much sense for the compiler to use the same indicator for its own parsing.

  19. Except that there are a multitude of non-printable characters – which produce “whitespace”

    Here are the most common:

    SPACE (codepoint 32, U+0020)
    TAB (codepoint 9, U+0009)
    LINE FEED (codepoint 10, U+000A)
    LINE TABULATION (codepoint 11, U+000B)
    FORM FEED (codepoint 12, U+000C)
    CARRIAGE RETURN (codepoint 13, U+000D)

    Which of these would you like inserted in your source code ?

    • Ideally, just SPACE. But obviously there are people who prefer TAB for indentation, and they should be accommodated. No code ever needs to use LINE TABULATION or FORM FEED or any other control character (other than TAB, CR and LF), so those can just be syntax errors.

      Newlines are, obviously, recognisably different to SPACE/TAB so they form a separate part of the language syntax. There’s no confusion there. There *is* confusion between CR, LF and CR+LF, but that’s fine, we can just permit all of them and treat them the same.

      The only *real* issue with having SPACE and TAB allowed is that TAB indents different amounts on different systems, so mixing them is a problem. This has been discussed at length in the comments above; in summary it’s a general problem in any language (you SHOULDN’T be mixing them). My ideal language syntax would make it a syntax error to mix SPACE and TAB in the same block; then the source code is unambiguous.

      PS. The argument that multiple characters can be indistinguishable is more general than whitespace characters and isn’t generally accepted as a reason to not have those characters be meaningful in programming language syntax. For example, many languages allow non-ASCII letters in their identifiers which means you can have ‘a’ (U+0061 LATIN SMALL LETTER A) and ‘а’ (U+0430 CYRILLIC SMALL LETTER A) as distinct identifiers. That isn’t something that generally needs to be enforced at the language level, because programmers are generally going to avoid using both.

  20. And yet I notice you use punctuation at the end of your sentences. Why not just end in white space? Because it flat out is just not as readable without punctuation. It’s hard to figure out why you even bother trying to make this argument. (And, admittedly, why I continue to follow it, ha!)

    • > It’s hard to figure out why you even bother trying to make this argument.

      The fact that one of the most popular programming languages of all time uses whitespace means it’s at least worth talking about. That’s not to say it’s valid, but I’m a bit baffled by all the people who dismiss it as so obviously wrong it’s “not even worth arguing about”.

      The analogy of punctuation is flawed because I put spaces between all my words; if we ended sentences with spaces too there would be no way to tell the difference between word and sentence boundaries.

      Also prose text is totally different to programming code. Precisely the argument I made in this article is that programming code is already indented by all serious programmers (unlike prose text, which lacks much physical structure other than paragraphs), and therefore, punctuation isn’t necessary because the indentation is so readable and noticeable.

  21. PHP is popular as well. Does that make it readable? No. I will give you that a fair number of people have drunk the cool-aid, yes.

    But to the point. There is only _more_ readable and _less_ readable code. And python makes many egregious readability choices. Invisible block termination is just but one.

    If I have 3 blocks deep that drop back to 0 blocks deep, this is far more readable:

    lots of code, how deep is this at this point? (i’m a maintainer, not the author)
    statement
    }
    }
    }

    more code

    and this is quite a bit less readable

    lots of code, how deep is this at this point?
    statement

    more code

    hmmm… better scroll back up and try to count indentations up the page somewhere… :-p

    ——————————

    and then there is the immensely ugly I’m-not-done-yet! notation. (kinda, heh, “bash” to the future, i guess.)

    from nexus.cmd.resp.cli.network_config import NetworkConfig \
    as NetworkConfigCliResponse

    and the not ugly this-is-when-i’m-done notation:

    from nexus.cmd.resp.cli.network_config import NetworkConfig
    as NetworkConfigCliResponse;

    ————————————

    isinstance(…)

    seriously? freaking seriously?? what, did they run out of underscores in the beginning programmers lab downstairs? or at least train some damned camels! concatenation of words into one long hot mess is the opposite of readability. again. how is this even debatable?

    • these code samples should look something like so:

      …………lots of code, how deep is this at this point? (i’m a maintainer, not the author)
      …………statement
      ……..}
      ….}
      }

      more code

      vs

      …………lots of code, how deep?
      …………statement

      more code

      ———————————–

      from hank.cmd.resp.cli.network_config import NetworkConfig \
      …………………………………………………………………..as NetworkConfigCliResponse

      vs

      from hank.cmd.resp.cli.network_config import NetworkConfig
      …………………………………………………………………..as NetworkConfigCliResponse;

      • I consider it more important (when reading code) to know *what scope I’m in* (i.e., what column it starts at) than how many scopes ended at a particular point. I’m not even sure what question can be answered by counting the number of close braces. It’s much more important to know which lines of code share a start column (i.e, share a scope).

        In *all* languages, you read this by looking at the column alignment. But only Python makes sure the compiler agrees with your indentation.

        The backslash thing is a bit ugly I agree.

        > isinstance(…)

        Now you’re getting way off topic. I’m not defending all of Python. Just its indentation syntax.

  22. sigh. the formatter at this site appears to agree that whitespace is a trivial thing that should just be ignored outright. :-p

    • Yeah I know, I know. I addressed this … whoa four years ago: https://unspecified.wordpress.com/2011/10/18/why-pythons-whitespace-rule-is-right/#comment-691

      “It’s true, but I consider this to be a deficiency of WordPress, not Python. While it would be less common to do so, I could write a blogging tool that strips out curly braces, and then C code would break. I consider “readability” and “looking like it does what it actually does” to be more important design goals for a programming language than “will the code break if pasted into random website X?””

      • I was being facetious about the site editing out whitespace. Nothing to do with the issue at hand, and I’ll agree some was off-topic, but the bigger issue is readability. And to me the whitespace issue is part and parcel with a range of decisions. But I’m annoyed that you didn’t get drawn into the gratuitous provocation nonetheless. 🙂

        That said, I disagree that knowing how block scope is changing is not as important. As a senior engineer on my team, I spend an unfortunate amount of time reviewing code, and I quite often find myself backing up looking for clues to what the indentation level is. You are right that braces don’t entirely cure that issue, but they do give you twice as many cues as you get in python.

        • And more importantly, those cues are at the right place- every place such a change in scope is occurring.

  23. Reading back a bit, this is the most insightful comment made in this thread (and it wasn’t by me. go figure…)

    ZhadowOfLight said:

    “The compiler doesn’t understand readability, I do.”

    Amen.

    • That statement sounds like (in general) an argument against machine-automated formatting, with an attitude of “humans should format code in the most aesthetically pleasing way; it isn’t something we can leave up to an algorithm”.

      The more programming I do (on larger scale projects), the more I disagree with this attitude. When you work on a large project with code reviews, it’s amazing how much time you save by replacing two humans constantly arguing about minor formatting details with a script that just makes a decision. Not necessarily the most “pretty”, but applying a consistent rule that we can all understand. Python doesn’t really require an auto formatter, because the most significant formatting rules are enshrined in the language.

      “readability” is enhanced by consistency. I can open up a C/C++ codebase and find end-of-line opening braces, on-a-line-by-itself opening braces, a mandatory-braces-for-all-blocks rule, no-brace-on-a-one-line-block rule, optional-brace-on-a-one-line-block rule, etc, etc. I have to adjust to each author’s personal style. All Python code looks more or less the same. Compilers don’t have to understand readability, they just have to enforce a consistent rule and readability will happen.

      • But you can enforce such rules with all manner of auto formatters and with clear policies. There is no “argument” about the right way to do it. The lead engineer talks to others in the team and then makes the decisions. As such a lead on a very extensive library, I am open to debating, but after I make a decision, continued arguments about it are invitations to go work on something else.

        For the language itself to force a poor (imo) decision in regards to this matter is just bad.

        The problem here is that you feel that python syntax forces more readable code, and I think it forces less readable code. And this is the entire crux of the debate here. Yes, consistency is very important. But consistently poor isn’t really what we’re after here.

        I’ll stand by ZhadowOfLight’s statement on this one. Likely we are at a stalemate here…
