# Roslyn Code Gems - Counting Bits

Thursday, February 19, 2015

Back in the early days of computing, when machines were powered by coal-fired steam boilers the size of Liechtenstein, even the simplest operations were optimized by programmers toiling in the basements of university science buildings around the world. Simple operations like counting the number of “on” bits in a word.

Back then you couldn’t afford to use a naive approach...

```unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

for (c = 0; v; v >>= 1)
{
c += v & 1;
}
```

... because each CPU cycle required 4.09 x 10-5 kg of hard coal, and the total cost adds up quickly.

It’s not surprising then, to find papers on efficient bit counting algorithms that have been around for decades. Sean Eron Anderson has a selection of these algorithms, plus others, on the Bit Twiddling Hacks page.

The Roslyn compiler platform builds on this research with classes like BitArithmeticUtilities, which features code like the following.

```public static int CountBits(ulong v)
{
unchecked
{
return (int)v;
}
}
```

It’s not often you find code with visually stimulating patterns that is also environmentally friendly.

Coming up soon – how many type parameters is too many? 10? 20? 30?

jdan Thursday, February 19, 2015
Awesome
bdunagan Thursday, February 19, 2015
poetic
manuel Friday, February 20, 2015
very awesome :-) this saves my day
Phil Bolduc Monday, February 23, 2015
There are many more of these bit tricks found at https://graphics.stanford.edu/~seander/bithacks.html For example, to determine if a number is a power of 2, you can use: v != 0 && (v & (v - 1)) != 0 This will be faster than doing CountBits(v) == 1
scott Monday, February 23, 2015
@Phil - thanks. That was page I already linked to in the post :)
Christian Ullenboom Tuesday, February 24, 2015
The naive approach isn't that bad, the performance only depends on the input data. If the number of set bit are small, the loop variant is faster, in theory its O(n) but in the "folding" version you have bigger constant, so it's not O(1). The interesting question¬ is:¬ For how many bit set is the non-loop version faster, this is a question for the readers ;)
Wesner Moise Tuesday, February 24, 2015
I used to use this approach, but it's actually about 3 times slower than optimal. First of all, masking is not necessary for the left operand in the last 3 assignments. private const ulong M1 = 0x5555555555555555; //binary: 0101... private const ulong M2 = 0x3333333333333333; //binary: 00110011.. private const ulong M4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ... private const ulong M8 = 0x00ff00ff00ff00ff; //binary: 8 zeros, 8 ones ... private const ulong M16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ... private const ulong M32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones private const ulong Hff = 0xffffffffffffffff; //binary: all ones private const ulong H01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3... // taken from Hacker's Delight, chapter 5, pp66 public static uint BitsSetCountOptimised(uint i) { i = i - ((i >> 1) & 0x55555555); // adjacent bits grouped i = (i & 0x33333333) + ((i >> 2) & 0x33333333); // adjacent 2-bit groups paired i = i + ((i >> 4) & 0x0F0F0F0F); // adjacent 4-bit groups paired i = i + (i >> 8); // adjacent 8-bit groups paired i = i + (i >> 16); // adjacent 16-bit groups pair return (i & 0x0000003F); // a 32-bit unsigned int can have at most 32 set bits, // which in decimal needs on 6 bits to represent } public static uint BitsSetCountHackMem(uint i) { uint n = (i >> 1) & 0x77777777; i = i - n; n = (n >> 1) & 0x77777777; i = i - n; n = (n >> 1) & 0x77777777; i = i - n; i = (i + (i >> 4) & 0xF0F0F0F); i = i * 0x01010101; return i >> 24; }