I just spent two hours trying to build a Hello World for the Java ME (Micro Edition) platform. A fate I don’t wish you, gentle reader, to suffer, I humbly present this step-by-step guide to starting Java ME development on Ubuntu 9.10. I specifically target Ubuntu (or other Linuxes) because all the guides I could find online assumed Windows, which appears to have some different requirements (what happened to Write Once, Run Anywhere™?)

Firstly, I hate IDEs. I couldn’t get the command-line compilers (javac and the like) working — sure I could have if I spent another few hours fiddling. But since most of the advice online refers to NetBeans IDE, I used that.

OK, here we go.

Step 1: Install Debian packages (I always use Debian packages where possible).

sudo apt-get install openjdk-6-jdk netbeans

This installs the full Java 6 development kit, as well as the NetBeans IDE.

Now you will need to install the NetBeans Mobility plugin, which provides support for J2ME development, and the Sun Wireless Toolkit (WTK), which provides the actual libraries to compile against.

Step 2: Install the NetBeans Mobility plugin.

If you choose File -> New Project, you will find no way to create a J2ME project! You need to install the Mobility plugin.

Close that, and choose Tools -> Plugins. Under Available Plugins, select Mobility (Category: Java ME). Install that. Now under File -> New Project, you will find Java ME. Choose to create a Mobile Application.

Step 3: Install the Sun Wireless Toolkit.

Having chosen Mobile Application, you will be presented with a dialogue with a red error at the bottom: “No J2ME compatible platform is installed in the IDE. You need to have at least one J2ME compatible platform installed in the IDE.” So you’re still missing the actual library files required to compile J2ME apps.

So after much searching, I found what I think is the correct download (but there are lots of similar ones with different versions): Sun Java Wireless Toolkit 2.5.1 for CLDC. You should pick the Linux version, and download it (it’s called sun_java_wireless_toolkit-2_5_1-linux.bin).

Download it to your desktop, then run:

$ chmod +x sun_java_wireless_toolkit-2_5_1-linux.bin
$ ./sun_java_wireless_toolkit-2_5_1-linux.bin

Now this is really annoying in itself. You will be asked lots of silly questions:

  1. Hold down SPACE to skip the license agreement (I think this is just the GPL, so you shouldn’t even have to agree to it). Type “yes”.
  2. It will likely say “No suitable Java interpreter was detected”. Type “0″ to “Specify a path to a Java interpreter directory.”
    • If you installed openjdk-6-jdk like I said above, it should be installed in /usr/lib/jvm/java-6-openjdk/bin/.
    • You might have many Java bin directories installed. To find one, type ‘which jar’. This might give you a symlink (such as /usr/bin/jar). Use ’stat /usr/bin/jar’ to follow the symlinks until you find the real location, such as the one above.
  3. You are asked to enter a directory to install WTK. Just pick somewhere out of the way, probably in your home directory unless you want to share it with other users.

Step 4: Add the WTK platform to NetBeans.

Now that WTK is installed, we can fix the above issue in NetBeans.

In the New Mobile Application dialogue (heading should be Install Platform), click Install SDK/Platform/Emulator. (You can access this from the Tools -> Java Platforms menu also).

Click the Add Platform button, choose Java ME MIDP Platform Emulator, and then you will be asked to “choose a directory to search for platforms”. Choose the directory where you installed WTK. It should have a special icon. Now it should do some detection magic, and install a platform with a checked checkbox in the Add Java Platform window.

Click Next a few times, and you will see the Java Platform Manager screen with a J2ME folder, and the “Sun Java(TM) Wireless Toolkit 2.5.1 for CLDC” under it.

Now you can complete the wizard to create a mobile application with MIDP! It even has a checkbox to create a little “Hello World” app. Now to get the damn thing running …

Update: Hmm .. can’t seem to figure out how to get the emulator running, but compiling and transferring the .jar file to my phone works. (I had to change the setting from MIDP 2.1 to MIDP 2.0 to get it to work, YMMV.)

A thing I did not know: When you call a function in C without a prototype or with unknown arguments (for example, printf), all arguments are “promoted” to a bigger size.

Anything smaller than an int (bool, char or short) gets promoted to an int. A float gets promoted to a double. This means on a standard 32-bit machine, all variadic arguments are either 32 or 64 bits.

This explains something I’ve always wondered, which is why in printf, %f formats doubles, not floats, and %lf is the same as %f — there is no way to actually pass a float to printf, only a double. It’s different with scanf, where %f writes to a float*, and %lf writes to a double*.

Source: C language standard (ISO/IEC 9899:1999), section 6.5.2.2: Function calls, item 6:

If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.

How’s this for a stupid piece of security news? Microsoft has banned memcpy from its “Security Development Lifecycle”, which presumably means Visual C++ will issue warnings now whenever you try to use it.

Microsoft have done this many times – “deprecating” pieces of the C standard library. Functions already banned are gets, sprintf, scanf, strcpy and strcat. But people are quite up-in-arms about memcpy because it’s a perfectly safe function if you use it properly — it copies exactly the number of bytes you tell it to. I want to make something clear which I haven’t seen quite right yet in this discussion.

The following functions are bad: gets, scanf(“%s”) and sprintf (among others). All modern C manuals warn you about using them, and Microsoft is right to deprecate them. gets, scanf and sprintf have unavoidable overflow issues, because they write to a pre-allocated string, and you have no way of knowing how many bytes will be written. They’re evil. Never use them, ever.

The following functions are okay, but require more caution than usual, I’d say: strcpy, strcat and sscanf(“%s”) (among others). It’s possible to use these safely, but you need to make sure the destination is large enough before you write, which usually means using strlen to allocate an appropriate size. There’s really nothing wrong with these functions, and I’ve always been annoyed that Microsoft took it upon themselves to “deprecate” them.

Now memcpy sits above both of these categories, with the likes of fgets and strncpy. Rather than going till the length of a (potentially unknown) string, it takes a length as input, and copies exactly that many bytes. It couldn’t be more straightforward. Do not copy more bytes than your source buffer, or your destination buffer. If you get that wrong, then you’ve made a serious programming error. Making errors on such a scale in C leads to unsafe programs. That’s just how it is. If you don’t like that, use a different language.

memcpy is not the problem. C is the problem. There’s nothing less safe about memcpy than anything else in the C language – even if you remove the entire library. If you can’t use memcpy correctly, then you can’t use C correctly. The replacement? memcpy_s! The s is for secure! This function takes separate arguments for the size of the source and the destination. It’s still unsafe, of course, it’s just that the new function is going to try and help you out in some cases. “It looks like you’re trying to cause a buffer overrun. Would you like help?” In my opinion, this memcpy_s offers nothing more than a false sense of security.

Oh, and of course the conspiracy theorist in me can’t help but point out that memcpy_s is a Microsoft function. It isn’t part of the C standard. Code which uses it will only work with Visual C++ (of course, you can define it yourself in another compiler, but that’s not ideal). So what’ve we got? A whole bunch of new functions with _s on the end of their names, which are no more secure than the functions we started with, but Visual C++ will now warn you if you write standard C code, and other compilers won’t work if you write Microsoft C code. Of course I think we can trust Microsoft not to write their own implementation of a standard, add their own extensions, then kill off the competition. That would just be ungentlemanlike.

Another blogger pointed out the official example of memmove_s, which has at least two problems I can spot (the most serious one is trying to write into a string literal, which as far as I know will segfault in most C implementations including Microsoft’s).

If you’ve written C++ code with GCC, you’ll know that you need to use the program g++, both for compilation and linking. For multi-module programs, this means every .cpp file gets compiled with g++, and then the entire program must be linked separately using g++. If you try to link the program using gcc, it will almost work, but you’ll get a lot of “undefined reference” errors, like this:

test.cpp:(.text+0x11): undefined reference to `std::cout'

The need to use g++ to link the entire program causes trouble when you have a very complicated build process you don’t have full control of. For instance, I’m trying to link C++ code with Mercury, and I have to use the Mercury linker, which in turn calls gcc.

So just a quick tip: If you are forced to use gcc to link the program, just add the library “stdc++”, as you would any other library, and it will work. That is, add the option “-lstdc++” to your GCC linker command line. For example:

g++ -c hello.cpp
gcc -lstdc++ -o hello hello.o

Please let me know if there are any cases in which this doesn’t work.

I just re-read the effusively-titled paper Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I by John McCarthy[1], which describes the original implementation of Lisp. What you may or may not know is that Lisp contained the first garbage collector in 1958. The original stop-the-world mark-and-sweep algorithm is described very succinctly in three paragraphs:

Nothing happens until the program runs out of free storage. When a free register is wanted, and there is none left on the free-storage list, a reclamation cycle starts.

First, the program finds all registers accessible from the base registers and makes their signs negative. This is accomplished by starting from each of the base registers and changing the sign of every register that can be reached from it by a carcdr chain. If the program encounters a register in this process which already has a negative sign, it assumes that this register has already been reached.

After all of the accessible registers have had their signs changed, the program goes through the area of memory reserved for the storage of list structures and puts all the registers whose signs were not changed in the previous step back on the free-storage list, and makes the signs of the accessible registers positive again.

Of course back in those days, “stop-the-world” was quite literal:

Its efficiency depends upon not coming close to exhausting the available memory with accessible lists. This is because the reclamation process requires several seconds to execute, and therefore must result in the addition of at least several thousand registers to the free-storage list if the program is not to spend most of its time in reclamation.

A footnote (added in 1995) reads:

We already called this process “garbage collection”, but I guess I chickened out of using it in the paper—or else the Research Laboratory of Electronics grammar ladies wouldn’t let me.

[1] “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.” Communications of the ACM 3:4, April 1960, pp. 184–195.

This came up recently when discussing a friend’s code. It relates specifically to Python, but I think the rule is more general.

Firstly, let me be clear that in this article, when I say “modify”, I mean inserting or removing items from the list. Merely updating or mutating the list items is fine.

Now, consider the following Python code:

elems = ['a','b','c']
for e in elems:
    print e
    elems.remove(e)

This prints:

a
c

The author of this code snippet probably expected it to iterate over each element of the list, print it out, then remove it so that the list ends up empty (this may be desirable if, for example, you are removing items conditionally). Hence, he is probably surprised that 'b' was skipped, and furthermore, remains in the list:

>>> elems
['b']

We can see why only 'a' and 'c' were touched by imagining that Python translates the loop into the following equivalent lower-level code:

elems = ['a','b','c']
i = 0
while i < len(elems):
    e = elems[i]
    print e
    elems.remove(e)
    i += 1

This has the same result. Now it’s clear that what’s happening: when you remove the 'a' from elems, the remaining elements slide down the list. The list is now ['b', 'c']. Now 'b' is at index 0, and 'c' is at index 1. Since the next iteration is going to look at index 1 (which is the 'c' element), the 'b' gets skipped entirely.

This rule doesn’t seem to be documented officially anywhere. It’s just a generally-followed rule. I believe the behaviour is, well, unspecified behaviour. Here is a discussion of the issue.

Also note that the same rule applies to the set and dict types (and similar), only those types will actually raise a RuntimeError. (If you know how hash tables work, it should be clear why — there’s no safe way to let that slide).

This rule should be applied in all languages, not just Python. While other languages have different rules, the basic principle above still applies — you’ll just get shot in different ways. (It would even apply to C, if you wrote your own code to insert or remove elements from an array). Also, this sort of code is invariably confusing.

So how can you work around it?

  • You could translate your code into the while-loop form, as above. Then you have explicit access to the variable i, and you can hand-code a fix. (For example, I could choose not to increment i when I am removing an element).
  • You could compute the set of modifications to make to the list during iteration, then apply them afterwards.
  • You could construct a new list during iteration rather than mutating the existing one. (For example, rather than removing all the elements which satisfy a condition, insert into a new list all the elements which don’t).
  • A solution recommended in the discussion linked above is to duplicate the input list and iterate over the copy, mutating the original.

There is always a way around. You just have to be creative!

It seems like a certainty that any given Slashdot article involving Python will feature at least a handful of top-level comments where people complain about Python’s indentation/whitespace rule, regardless of the topic of the news item. There’s probably a race to be the first to complain about it in any given post.

I usually bite. This time, I decided to bite hard, and I’ve decided to re-publish my response here because I think it was rather blog-worthy.

AuMatar says:

On the other hand, I’ve spent at least a full work week of my life fixing problems due to whitespace. Guido made a major f*** up there- by removing braces but not strictly defining whitespace, he’s created a language where it’s possible to have two identical looking pieces of code do very different things. If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error, it would have been fine. As it is I’d say about 15-20% of the time I spent doing Python was spent fixing these kinds of bugs.

To which I replied:

Guido made a major f*** up there- by removing braces but not strictly defining whitespace

Stop. First, the whitespace rule in Python *is* strictly defined.

The formal, exact, unambiguous specification of how Python interprets whitespace is in the official language reference – Lexical analysis [python.org].

It’s pretty wordy, but I’ve studied it and it’s quite precise. The relevant section is here:

“Firstly, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight”

This is exactly the same as the default behaviour of Unix `expand`.

[Guido has] created a language where it’s possible to have two identical looking pieces of code do very different things.

It depends what you mean by “looking”. To you, perhaps 1 tab looks the same as 4 spaces. To me, maybe it looks the same as 2 spaces. To Jeff, maybe it looks like a red dot in his specially-configured editor. To Python, it happens to look the same as 8 spaces.

DO NOT MIX TABS AND SPACES. Then, I guarantee you that any two pieces of code which look the same to you (whether they use tabs or spaces) will also look the same to Python. (You don’t have to enforce this across a whole file, just on a per-block basis, but it’s best if your whole project has an agreed indentation standard).

If he had said that it must be indented by exactly 1 tab or exactly 4 spaces or whatever other measure and everything else would throw a syntax error.

That’s silly. Then you’d be at Guido’s whim; you’d have to indent the way he chose. This way, you can choose any indentation you like. Tabs, 2 spaces, 4 spaces, 3 tabs if you like. As long as you are internally-consistent, Python will be happy.

My second point to you: If you are pasting code from somewhere into your code, and you do not fix up indentation so it matches the surrounding code, you are worse than Hitler. Or at least very lazy. I don’t care if you are using Python or C or Brainf***.

If you carelessly paste 1-tab-indented code into a surrounding block which is 4-tab-indented, and don’t fix it up, then how do you think I will feel when I open it in my editor configured to expand tabs to 2 spaces instead. It will be totally unreadable — and this is why we indent in the first place (in any language, that is).

Python forces you to tidy this up, and that can only be a good thing. If your code is confusing Python, it’s probably confusing a bunch of other readers as well.

I’ve been known to ramble a lot in my writing. Being more concise is something I know I have to work on. Apologies for some of the long posts on this blog.

I was working on the spec for my language, Mars, just now. Writing language specs, it turns out, is pretty hard. You have to precisely capture the semantics of the language in a way that’s clear and understandable. I got up to that awfully mundane bit where you explain the really basic things like if statements and while loops. Things for which everybody knows how they work, but they have to be there anyway, because it is, after all, a formal language spec. And here’s what I wrote under the heading “If statements”:

The if statement conditionally executes a sequence of statements:

if_stmt ::=  "if" expression ":" NEWLINE
                 INDENT statement+ DEDENT
             [ "else" ":" NEWLINE
                 INDENT statement+ DEDENT ]

The expression on the first line is called the “condition”. The condition’s type must be Int, or the compiler MUST reject the program.

The condition is evaluated, and any side-effects of evaluating the expression take place during the execution of the if statement, and before the execution of any sub-expressions.

The first nested block of statements is called the “then” clause. If the optional else part is supplied, then the second nested block of statements is called the “else” clause. One of these clauses, but not both, will be executed, by executing each of the sequence of statements in order.

If the condition’s value is not 0, the “then” clause is executed.

If the condition’s value is 0, then the “else” clause, if supplied, is executed. If it is not supplied, the if statement has no effect, besides the evaluation of the initial expression.

Quite a mouthful! Then I went to the Python language reference, which I absolutely idolise, and have been referring to periodically for guidance. And here’s what Python says on the matter:

The if statement is used for conditional execution:

if_stmt ::=  "if" expression ":" suite
             ( "elif" expression ":" suite )*
             ["else" ":" suite]

It selects exactly one of the suites by evaluating the expressions one by one until one is found to be true (see section Boolean operations for the definition of true and false); then that suite is executed (and no other part of the if statement is executed or evaluated). If all expressions are false, the suite of the else clause, if present, is executed.

“Right!” I said, “If they can describe it in a paragraph, then so can I!” I said, and rewrote mine to a much more condensed version, which I believe describes the same logic:

The expression on the first line is called the “condition”. The condition’s type must be Int, or the compiler MUST reject the program. The first nested block of statements is called the “then” clause. If the optional else part is supplied, then the second nested block of statements is called the “else” clause.

The condition is evaluated, and any side-effects take place immediately. If the value is not 0, the “then” clause is executed. If the value is 0, then the “else” clause, if supplied, is executed.

My point is that if you read my original version, it’s a lot more stiff and formal, and dare I say, condescending? It is clear but not concise. It tells you every little detail about how an if statement works!

The way the Python language reference is written is really assuming some intelligence on the part of the reader. It uses more conversational phrases such as “one by one until one is found to be true; then that suite is executed”. There’s no formal notion of “one by one”, nor do they formally describe what is meant by “that suite”. But of course, any reader with an ounce of sense will know what is meant by “one by one” and “that suite”.

It almost seems counter-intuitive, but assuming some intuition or thought on behalf of your readers can actually make documents easier to read, not harder.

If you compare the size of, say, the C++ language spec (~1300 pages), or even the C language spec (~500 pages) to Python (~100 pages), you can see the difference this style makes.

I’ve been wondering for awhile if commutativity implies associativity in mathematics. That is, for any given binary operation, if it is commutative, is it also associative?

I’ve never seen this implication stated, but I’ve also never thought of a counter-example. Until now! So I thought I’d quickly share it.

First, to state the problem – every commutative operation I’ve seen is also associative. This means that if a * b = b * a, for any given operation *, then it’s also true that a * (b * c) = (a * b) * c.

It somehow makes intuitive sense that commutativity be a “more powerful” property than associativity. After all, if an operation can have its terms switched, and survive intact, surely it can weather a little order-of-operations rearrangement!

In any event, I came up with this function which is commutative, but not associative. In Haskell:

f :: Ord a => [a] -> [a] -> [a]
f x y = if x < y then x ++ y else y ++ x

It takes two lists, and concatenates them in sorted order. Thus, swapping the operands has no effect on the outcome, but the operation (like regular concatenation) is inherently non-associative.

[2] `f` ([1] `f` [3]) == [1,3,2]
([1] `f` [3]) `f` [2] == [1,3,2] (commutes)
([2] `f` [1]) `f` [3] == [1,2,3] (does not associate)

I’ve been listening to the marvellous works of Jonathan Coulton (best known among gamers for Still Alive, the song at the end of Portal). These songs are nerdy and great, and really get in your head.

Now when a song gets in my head with the chorus lyrics:

Just take a point called c in the complex plane. Let z1 be z squared plus c. And z2 is z1 squared plus c. And z3 is z2 squared plus c, and so on. If the series of z’s will always stay close to z and never trend away, that point is in the Mandelbrot set.

- From the song Mandelbrot Set by Jonathan Coulton – free download

My inner code monkey can’t help but want to implement it. So I did.

I resolved to implement a program which generates a view of the Mandelbrot set just using the lyrics of the song, plus this blog post I found by the song’s author with some minor addenda:

  • You take a point called c, not z, in the complex plane (I corrected the lyrics above; you can easily hear “Zee” as “C” in the song anyhow).
  • You need to initialize the first z to 0+0i.

With this in mind, I got out my trusty Python and started hacking away, writing the song lyrics in comments and filling in the code. I hit “go” and ended up with this:

Mandelbrot Set, computed just from the lyrics

Mandelbrot Set, computed just from the lyrics

(Note that it took a couple of goes to get the parameters right, but I didn’t have to tinker with the algorithm itself at all!)

Fantastic!! Instantly generated that famous “bulbous pointy form”.

Then I hit the wiki and got a few tips on producing a coloured version. With a little bit more hacking, I managed to produce this:

Coloured Mandelbrot Set

Coloured Mandelbrot Set

Note the colour bands: not an image compression artefact but a property of the method used to select the colour (based on the discrete number of iterations it took for the point to “trend away”). There’s a way the Wikipedia article suggests to solve this, which is how they generate the spectacular image at the top, but I haven’t get it working yet.

Also note that the coloured version is “just for fun”. This is a set of numbers – you’re either in or you’re out. The black pixels in the coloured version are in the set; the coloured ones basically show you by how much that pixel missed out on being in the set. But the first picture I generated is a mathematically correct (approximation of) the set itself.

I’d post the code, but I just figured out that WordPress doesn’t allow non-image attachments. Grr! I’ll figure out some way later on.

And by the way, because some of us were discussing it (from the same post),

I’ve always wanted for [Benoît Mandelbrot] to hear it, but I fear his corrections. Which is why I just emailed him the mp3. Hopefully he will not be offended by the line in which I joke that he is not dead (yet). Or by the above-described mathematical shortcomings. Or by the “too-much-rocking” that I put in there.

Well it looks like he at least read the lyrics, and was pretty impressed :)