Integer conversions in C

In C on August 8, 2011 by Matt Giuca

In light of a recent bug in crypt_blowfish highlighted by Steve Gibson on the Security Now podcast (episode #311), I read up on signed / unsigned conversions in C. Wow, it is complex. Here are some things you might not know about integer conversions in C (relating to converting between different bit widths and signedness). (Note: I, like the C specification, use the word “integer” to refer to any type that represents whole numbers, and the word “int” to refer to the C type of the same name.)

  • The type “int” is equivalent to “signed int”. However, it is implementation-defined whether “char” refers to “signed char” or “unsigned char”. If you care, you have to be specific (this is probably what caused the crypt_blowfish bug, because a variable was declared as “char” which really needed to be an unsigned char).
  • Converting between unsigned integers of different sizes is always well-defined (for example, converting from an unsigned int to unsigned char takes the value % 256, assuming an 8-bit char type).
  • However, a narrowing conversion of signed integers is not fully specified if the value cannot be represented in the new type: either the result is implementation-defined or an implementation-defined signal is raised (for example, when converting the value 256 to a signed char, assuming an 8-bit char type). In almost all compilers, you can assume two’s complement format, which means the behaviour is nearly uniform, but the language standard itself does not mandate two’s complement representation for signed integers, and therefore cannot specify the result when a signed integer is narrowed. Arithmetic operations that overflow a signed type are worse: their behaviour is undefined.
  • Similarly, converting from signed to unsigned is well-defined, but converting from unsigned to signed gives an implementation-defined result (or raises an implementation-defined signal) if the value is too large to fit. (Converting from unsigned to a signed value with more bits is always well-defined.)
  • Converting a signed value to an unsigned value with more bits (a widening conversion) is equivalent to first making the signed value wider (preserving the value) and then converting it to unsigned (taking the modulo). For example, converting a signed char -38 to an unsigned short results in 65536 – 38 = 65498, assuming a 16-bit short type. If you think about it, this isn’t obvious, because it could have been the other way around (first convert to unsigned, then widen, so in the above example you would get 256 – 38 = 218), but I think the way they chose makes more sense. In two’s complement notation, this is a sign-extension (and the alternative would be zero-extension).
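  • For an operation A * B (“*” represents any binary operator, not necessarily multiplication), if A is signed and B is unsigned, and B’s type has a rank at least as high as A’s (after any type narrower than int has been promoted to int), A gets converted to B’s unsigned type. This is quite unintuitive to me. I would have thought that any operation involving a signed number would be performed as a signed operation, but C favours unsigned numbers whenever the two types have the same rank (I presume since the results of conversion are well-defined). The signed type only wins if it has a higher rank and can represent every value of the unsigned type (for example, long long versus a 32-bit unsigned int). This means that if A is a signed char and B is an unsigned int, A is converted to an unsigned int via sign-extension.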
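The rules in the list above can be checked with a few one-line functions. This is only a sketch: the function names are made up for illustration, and the values quoted in the comments assume an 8-bit char, a 16-bit short and a 32-bit int, which the standard does not guarantee.

```c
#include <limits.h>

/* Unsigned narrowing is well-defined: the value is reduced modulo
 * UCHAR_MAX + 1 (256 when char has 8 bits). */
unsigned char narrow_unsigned(unsigned int value)
{
    return (unsigned char)value;
}

/* Signed to wider unsigned: the value is reduced modulo USHRT_MAX + 1,
 * which on two's complement hardware is a sign-extension.
 * With a 16-bit short, -38 becomes 65536 - 38 = 65498. */
unsigned short widen_signed_char(signed char value)
{
    return (unsigned short)value;
}

/* Mixed comparison at the same rank: a is converted to unsigned int
 * before the comparison, so a negative a becomes a very large number. */
int less_than_mixed(int a, unsigned int b)
{
    return a < b;
}

/* Mixed comparison where the signed type has the higher rank: assuming
 * long long can represent every unsigned int value, b is converted to
 * long long instead and the comparison is performed as signed. */
int less_than_wide(long long a, unsigned int b)
{
    return a < b;
}
```

Under those assumptions, narrow_unsigned(300) gives 44, widen_signed_char(-38) gives 65498, less_than_mixed(-1, 1) gives 0 (because -1 has become UINT_MAX), and less_than_wide(-1, 1) gives 1. Compilers will typically warn about the mixed comparisons if you ask them to (for example with -Wsign-compare in GCC).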

Source: ISO/IEC 9899:TC3 (the C99 draft standard from 2007), §6.3.1

You can see the bug here (in the function BF_set_key). There are two variables involved: unsigned int tmp and char* ptr. Because it doesn’t specify whether ptr points to an unsigned or signed char, the compiler is allowed to choose (and many compilers, including GCC on x86, choose signed char). The killer line is “tmp |= *ptr;” which, according to the above rules and assuming a signed char, converts a signed char to an unsigned int by sign-extension. This is very bad: whenever the character has its high bit set (so it is negative as a signed char), the bitwise OR with tmp sets all of the bits above the low 8 to 1 (when the programmer expected them to be left alone), causing a massive weakening of the hash. The bug fix explicitly casts *ptr to an unsigned char first, and funnily enough includes a switch “sign_extension_bug” to re-enable the old, buggy behaviour in case you want your old hashes to match!
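To see the effect in isolation, here is a minimal sketch of the pattern. It is not the crypt_blowfish source: the two function names are hypothetical, and the pointer is declared as signed char explicitly so that the behaviour does not depend on what the compiler picks for plain char.

```c
#include <stddef.h>

/* Hypothetical helper showing the buggy pattern: each byte is ORed into
 * tmp directly, so a negative byte is sign-extended to the full width
 * of unsigned int and overwrites the bytes already shifted into tmp. */
unsigned int pack_sign_extended(const signed char *ptr, size_t len)
{
    unsigned int tmp = 0;
    for (size_t i = 0; i < len; i++) {
        tmp <<= 8;
        tmp |= *ptr++;
    }
    return tmp;
}

/* Hypothetical helper showing the fixed pattern: casting to unsigned
 * char first means only the low 8 bits of tmp are ever touched. */
unsigned int pack_fixed(const signed char *ptr, size_t len)
{
    unsigned int tmp = 0;
    for (size_t i = 0; i < len; i++) {
        tmp <<= 8;
        tmp |= (unsigned char)*ptr++;
    }
    return tmp;
}
```

Given the two bytes 0x41 and 0x80 (the second being -128 as a signed char), pack_fixed returns 0x4180 as intended, while pack_sign_extended returns 0xFFFFFF80 with a 32-bit unsigned int: the 0x41 has been wiped out entirely.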

One Response to “Integer conversions in C”

  1. Thanks for the article – integer conversions are quite relevant and complex. Is narrowing to a signed integer really undefined in C99 when the value cannot be represented in it? In C11, the result is either implementation-defined or an implementation-defined signal is raised.
