Calculating entropy
I was reading something and it mentioned in passing that you can convert from base 10 to base 2 by simply dividing by the base to convert to, and then collecting the remainder digits. So for base 10 to base 2 conversion you simply keep dividing by two and collecting the remainders. I'd never heard of that before, so I had to try it out in code immediately, as it seemed like magic. I spent a few minutes coming up with some code to test it out and yes, it works! I then went off on a bit of a tangent...
It seems like the sort of thing that should be doable in 3 lines or so, but my version is a bit long-winded. It also seems like it should be amenable to recursion too. Anyway, here's some noddy code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// size hardcoded at 8 bits
void to_binary (int *bin_array, unsigned int num)
{
unsigned int base = 2;
unsigned int res, rem;
int i=7;
while (num > 0)
{
res = num / base;
rem = num % base;
bin_array[i] = rem;
i--;
num = res;
}
}
void clear_array (int *array)
{
for (int i = 0; i < 8; i++)
{
array[i] = 0;
}
}
void print_array (int *array)
{
for (int i = 0; i < 8; i++)
{
printf("%d", array[i]);
}
printf("\n");
}
int main (int argc, char **argv)
{
int bin_array[8];
clear_array(bin_array);
to_binary(bin_array, 19);
print_array(bin_array);
return 0;
}
It's then possible to take this silliness a step further and add a little tweak to tell us how many bits are required to code a number. This is the entropy of a number:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BITS 8
void to_binary (int *bin_array, unsigned int num)
{
unsigned int base = 2;
unsigned int res, rem;
int i=BITS-1;
int entropy_bits = 0;
while (num > 0)
{
res = num / base;
rem = num % base;
bin_array[i] = rem;
i--;
num = res;
entropy_bits++;
}
printf("Entropy Bits: %d\n", entropy_bits);
}
void clear_array (int *array)
{
for (int i = 0; i < BITS; i++)
{
array[i] = 0;
}
}
void print_array (int *array)
{
for (int i = 0; i < BITS; i++)
{
printf("%d", array[i]);
}
printf("\n");
}
int main (int argc, char **argv)
{
int bin_array[BITS];
if (argc == 1)
{
printf("Example usage: to_binary 123\n");
exit(-1);
}
clear_array(bin_array);
to_binary(bin_array, atoi(argv[1]));
print_array(bin_array);
return 0;
}
A few runs of the program are shown here:
bash-3.2$ ./a.out 63
Entropy Bits: 6
00111111
bash-3.2$ ./a.out 244
Entropy Bits: 8
11110100
bash-3.2$ ./a.out 19
Entropy Bits: 5
00010011
bash-3.2$
There is a mathematical formula for calculating entropy. The following code shows this formula in action:
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
int main (int argc, char **argv)
{
if (argc == 1)
{
printf("Example usage: entropy 123\n");
exit(-1);
}
double x = atof(argv[1]);;
double log2x = log10(x)/log10(2);
printf ("entropy: %f\n", log2x);
return 0;
}
You can then compare the output between the two programs:
bash-3.2$ ./to_binary 123
Entropy Bits: 7
bash-3.2$ ./entropy 123
entropy: 6.942515
bash-3.2$
You can see there is a slight mathematical 'glitch'. This can be fixed with the ceil()
function, which basically rounds up to the nearest integer:
double log2x = ceil(log10(x)/log10(2));
Ah, but hang on, it looks like the formula might be wrong:
bash-3.2$ ./to_binary 4
Entropy Bits: 3
00000100
bash-3.2$ ./entropy 4
entropy: 2.000000
bash-3.2$
We'll need to look into that a bit more ...
But it can be fixed easily enough like this:
double log2x = ceil(log10(x+1)/log10(2));
So, as per Shannon's law, the entropy of x is log2(x), and you use the formula to work that out.