In light of a recent bug in crypt_blowfish highlighted by Steve Gibson on the Security Now podcast (episode #311), I read up on signed / unsigned conversions in C. Wow, is it complex. Here are some things you might not know about integer conversions in C (relating to converting between different bit widths and signedness). (Note: I, like the C specification, use the word “integer” to refer to any type that represents whole numbers, and the word “int” to refer to the C type of the same name.)
- The type “int” is equivalent to “signed int”. However, it is implementation-defined whether “char” refers to “signed char” or “unsigned char”. If you care, you have to be specific (this is probably what caused the crypt_blowfish bug, because a variable was declared as “char” which really needed to be an unsigned char).
- Converting between unsigned integers of different sizes is always well-defined (for example, converting from an unsigned int to unsigned char takes the value % 256, assuming an 8-bit char type).
- However, a narrowing conversion of signed integers is implementation-defined if the value cannot be represented in the new type (for example, converting the value 256 to a signed char, assuming an 8-bit char type): the standard says you get either an implementation-defined result or an implementation-defined signal. In almost all compilers, you can assume two’s complement format, which means the behaviour is nearly uniform in practice, but the language standard itself does not mandate two’s complement representation for signed integers, and therefore cannot specify a single result when a signed integer is narrowed. Signed arithmetic that overflows is even worse: that is undefined behaviour outright.
- Similarly, converting from signed to unsigned is always well-defined (the value is reduced modulo 2^N, where N is the width of the unsigned type), but converting from unsigned to signed is implementation-defined if the value is too large to fit. (Converting from unsigned to a signed type with more bits is always well-defined, because the value always fits.)
- Converting a signed value to an unsigned value with more bits (a widening conversion) is equivalent to first making the signed value wider (preserving the value) and then converting it to unsigned (taking the modulo). For example, converting a signed char -38 to an unsigned short results in 65536 – 38 = 65498, assuming a 16-bit short type. If you think about it, this isn’t obvious, because it could have been the other way around (first convert to unsigned, then widen, so in the above example you would get 256 – 38 = 218), but I think the way they chose makes more sense. In two’s complement notation, this is a sign-extension (and the alternative would be zero-extension).
- For an operation A * B (“*” represents any binary operator, not necessarily multiplication), if A is signed and B is unsigned, A gets converted to an unsigned number (strictly, this happens after the usual integer promotions, and only if the unsigned type’s rank is at least that of the signed type; a signed int plus an unsigned char is still a signed operation, because the unsigned char is promoted to int first). This is quite unintuitive to me. I would have thought that any operation involving a signed number would be performed as a signed operation, but C seems to favour unsigned numbers whenever the ranks allow it (I presume since the results of the conversion are well-defined). This means that if A is a signed char and B is an unsigned int, A is first promoted to int and then converted to an unsigned int, which in two’s complement amounts to a sign-extension.
Source: ISO/IEC 9899:TC3 (the C99 draft standard from 2007), §6.3.1
You can see the bug here (in the function BF_set_key). There are two variables involved: unsigned int tmp and char* ptr. Because the code doesn’t specify whether ptr points to unsigned or signed chars, the compiler is allowed to choose (and most modern compilers choose signed char). The killer line is “tmp |= *ptr;” which, according to the above rules and assuming a signed char, converts *ptr to an unsigned int by sign-extension. Whenever the password byte has its high bit set (any non-ASCII character), this is very bad: the bitwise OR then sets all of the bits of tmp above the low eight to 1 (when the programmer expected them to be left alone), causing a massive weakening of the hash. The bug fix explicitly casts *ptr to an unsigned char first, and funnily enough includes a switch “sign_extension_bug” to re-enable the old, buggy behaviour in case you want your old hashes to match!