Thursday, February 17, 2011

Why is a char and a bool the same size in c++?

I'm reading The C++ Programming Language. In it, Stroustrup states that sizeof(char) is always 1 and that 1 <= sizeof(bool); the specifics depend on the implementation. Why would such a simple value as a boolean take the same space as a char?
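The two guarantees quoted in the question can be checked at compile time. This is only a minimal sketch; the helper name `bool_size` is made up for illustration and is not from the book.

```cpp
#include <cstddef>

// sizeof(char) is 1 by definition; sizeof(bool) is implementation-defined,
// but sizeof counts whole bytes, so it can never be less than 1.
static_assert(sizeof(char) == 1, "guaranteed by the C++ standard");
static_assert(sizeof(bool) >= 1, "no complete object is smaller than one byte");

// Hypothetical helper: reports how many bytes a bool takes on this implementation.
constexpr std::size_t bool_size() { return sizeof(bool); }
```

On many common compilers `bool_size()` is 1, but that is a property of the implementation, not something the language promises.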

From stackoverflow
  • It takes the same space, because the smallest amount of space you can write in memory is a single byte. Both values are stored in a byte. Although you theoretically only need 1 bit to signify a boolean value, you still have to have a whole byte to store the value.

  • Theoretically you only need a single bit for a bool, but working with less than 1 byte's worth of data is messy. You need more instructions to achieve anything and you don't really benefit.

    If you want to pack multiple booleans into a single byte you can use a bit-field structure.

    Jonathan Leffler : But you would be better off not doing so.
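A rough sketch of the bit-field structure mentioned in this answer, assuming eight flags. The struct and member names are hypothetical, and the exact layout of bit-fields is implementation-defined.

```cpp
#include <cstddef>

// Hypothetical flag set: eight one-bit members that the compiler may pack
// into a single byte. Note that &flags.a is ill-formed: a bit-field has no address.
struct PackedFlags {
    unsigned char a : 1;
    unsigned char b : 1;
    unsigned char c : 1;
    unsigned char d : 1;
    unsigned char e : 1;
    unsigned char f : 1;
    unsigned char g : 1;
    unsigned char h : 1;
};

// The same flags as ordinary bools, each occupying at least one byte.
struct PlainFlags {
    bool a, b, c, d, e, f, g, h;
};

constexpr std::size_t packed_size() { return sizeof(PackedFlags); }
constexpr std::size_t plain_size() { return sizeof(PlainFlags); }
```

Reading or writing `flags.c` compiles to mask-and-shift instructions behind the scenes, which is the cost the comment above warns about.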
  • In modern computer architectures, a byte is the smallest addressable unit of memory. Packing multiple bits into a byte requires extra bit-shift and mask operations. At the compiler level, it's a trade-off between memory and speed (and in high-performance software, those extra operations can add up and slow down the application needlessly).

    Cybis : Wow!! One simple response boosted my reputation 2.5x (by almost 150 points). Cool :)
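To make the "extra bit-shift operations" concrete, here is a minimal sketch of packing flags into one byte by hand. The helper names `get_bit` and `set_bit` are invented for this example.

```cpp
#include <cstdint>

// Hypothetical helper: read flag number `index` (0..7) from a packed byte.
inline bool get_bit(std::uint8_t byte, unsigned index) {
    return ((byte >> index) & 1u) != 0;  // shift, then mask
}

// Hypothetical helper: return a copy of `byte` with flag `index` set or cleared.
inline std::uint8_t set_bit(std::uint8_t byte, unsigned index, bool value) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << index);
    return value ? static_cast<std::uint8_t>(byte | mask)
                 : static_cast<std::uint8_t>(byte & ~mask);
}
```

Every access needs a shift and a mask, whereas a plain bool stored in its own byte is a single load or store.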
  • Because in C++ you can take the address of a boolean and most machines cannot address individual bits.

    Evan Teran : true, but you can fake it by overloading the unary & operator and returning a proxy reference class :-P
    Steve Jessop : The resulting "fake pointer" would not be representable as a char*, which the C++ standard requires for pointers to built-in types (not including pointer-to-function and pointer-to-member). So while fun for user-defined types, the trick can't be used by the compiler for bool.
  • Actually, in most implementations that I know of, sizeof(bool) == sizeof(int). "int" is intended to be the data size that is most efficient for the CPU to work with. Hence things which do not have a specific size (like "char") are the same size as an int. If you had a large number of them per object, you may want to implement a means of packing them for storage, but during normal calculation they should be left at their native size.

    Mats Fredriksson : if you are going to use an array of booleans, use std::vector<bool>; it has a specialised implementation for bit vectors that uses only one bit per element.
    Steve Jessop : @mats: std::bitset is less crazy. vector<bool> tries and fails to be an STL container. There's also boost::dynamic_bitset, which unlike std::bitset can grow.
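For a fixed number of flags, the std::bitset suggested in the comment above might be used roughly like this. The function name and the choice of 64 flags are assumptions made for the example.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical example: set every even-numbered flag in a 64-bit set
// and report how many flags ended up set.
inline std::size_t count_even_flags() {
    std::bitset<64> flags;  // all bits start cleared
    for (std::size_t i = 0; i < flags.size(); i += 2) {
        flags.set(i);
    }
    return flags.count();
}
```

The size of a std::bitset is fixed at compile time, which is why boost::dynamic_bitset is mentioned as the growable alternative.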
  • There is this thing in C++ called vector<bool> that attempts to exploit the fact that you can theoretically store 8 bools in one char, but it's widely regarded as a mistake by the C++ standards committee. The book "Effective STL" actually says "don't use it". That should give you an idea of how tricky it is.

    BTW: Knuth has a book just dedicated to bitwise operations. Boost also has a library dedicated to handling large numbers of bits in a more memory efficient way.

  • A byte is the smallest addressable unit of memory.

    Consider the following code:

        bool b[9];
        bool *pb0 = &b[0];
        bool *pb1 = &b[1];

        // fill b with values
        for (int counter = 0; counter < 9; ++counter)
        {
            b[counter] = true;
        }
    

    If bool were stored as 1 bit, then pb0 would equal pb1, because both bits would sit in the same byte and therefore share an address. This is clearly not desirable!

    Additionally, the assignment in the loop would result in non-trivial assembly code. It would involve a different bit shift in each iteration of the loop. In high-performance software, those extra bit-shift operations can slow down the application needlessly.

    The STL provides a work-around for situations where space does matter: std::vector<bool> stores each bool as 1 bit. The paradox of the example above does not apply because

    • the overloading of operator[] hides the rigors of the bit-shift operation
    • the use of iterators instead of pointers gives additional flexibility to the implementation
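The two points above can be seen in a short sketch: operator[] on std::vector<bool> returns a proxy object rather than a bool&, so no pointer to an individual bit is ever needed. The function name `flip_third_flag` is hypothetical.

```cpp
#include <type_traits>
#include <vector>

// For ordinary element types operator[] yields a real reference...
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "vector<int> hands out int&");
// ...but vector<bool> hands out a proxy class instead of bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> hands out a proxy");

// Hypothetical example mirroring the nine-element array above.
inline bool flip_third_flag() {
    std::vector<bool> flags(9, false);
    flags[2] = true;  // the proxy does the mask-and-shift work
    // bool *p = &flags[2];  // would not compile: there is no bool object to point at
    const bool third = flags[2];
    const bool second = flags[1];
    return third && !second;
}
```

The proxy is also why generic code that expects `T&` from a container can break with vector<bool>, as the earlier comments note.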
