The Joke

Surely you’ve seen a bunch of these kinds of memes floating around the intertubes. But they way predate the internet, and often actually mean something other than “just an incomprehensible computer language joke”.

One of the first clues you’ll get that it means something is if the first character in the binary string is a 0. If it’s a 1, it’s probably actually someone typing nonsense on the keyboard to “emulate” binary garbage. How do we know this?

As most programmers know, ASCII is the mapping of numbers to characters that is most frequently used today. That is, if a piece of memory has the number 65 in it, it’s really not clear just by inspection whether it “means” the number 65 or “means” the letter “A”. After all, it’s merely 01000001 at some level, just two transistors or flip-flops or wires in a bundle that happen to have 5V electric potential and the rest don’t. As usual in programming, “everything is an integer, actually that’s a lie, everything is a nothing until you decide how to interpret it”.
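To make the “65 or A” point concrete, here is a minimal sketch in C. The helper name show_both is made up for illustration; all it does is hand the same byte to two different format specifiers.

```c
#include <stdio.h>

/* Hypothetical helper (not from the article): render one byte both as a
   number and as a character into the caller's buffer. */
void show_both(unsigned char byte, char *out, size_t outsz)
{
    /* Same eight bits, two interpretations: %d reads them as an integer,
       %c reads them as a character code. */
    snprintf(out, outsz, "%d %c", byte, byte);
}
```

Calling show_both(0x41, buf, sizeof buf) should leave “65 A” in buf on an ASCII system: nothing about the byte changed, only the way we chose to read it.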

And the characters we usually see in daily life all fall between 32 and 126 on that table (127 is the non-printing DEL), and when you look at that range in binary, it’s 00100000 – 01111110, meaning the first bit is always zero. If you happen to notice that the first character of every block of 8 is a 0, then it’s really unlikely that it’s random gobbledegook.
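That “leading zero in every block of 8” check is easy to automate. A rough sketch, assuming the input is one unbroken string of ‘0’ and ‘1’ characters with no spaces; the function name looks_like_ascii_bits is my own invention:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if bits is a non-empty string of '0'/'1' characters whose
   length is a multiple of 8 and whose every 8-character group starts
   with '0'; returns 0 otherwise. */
int looks_like_ascii_bits(const char *bits)
{
    size_t len = strlen(bits);

    if (len == 0 || len % 8 != 0) return 0;
    for (size_t i = 0; i < len; i++) {
        if (bits[i] != '0' && bits[i] != '1') return 0; /* not binary at all */
        if (i % 8 == 0 && bits[i] != '0') return 0;     /* high bit is set */
    }
    return 1;
}
```

A pass here only says the string is plausibly ASCII; keyboard-mash can still get lucky, it just gets less likely with every extra block.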

So how can we get in on this joke? Let’s decode it. We need to take each block of 8 of those 0’s and 1’s and smash them into a character, and then print it out. I’ll do this in a few different languages to show some interesting methods. My favorite for any text parsing is always Perl, but when it comes to binary stuff, it’s a battle between whether it’s faster for me to do it in Perl with pack/unpack/oct/ord, or faster to do bit-shifting and masking in C.

On my first attempt playing with this, I did it with Perl, then C (and then found out that installing a new Mac OS X broke my C compiler and I had to wait to download Xcode). There are two major tools for doing binaryish things in Perl, that is, picking apart a string that looks binary or hexish and transforming it: oct() and pack(). First I put the binary from the joke above into a file (“x”) so that I wouldn’t lose it, and then catted it into some command-line perl. I found two methods on my own, and then learned of the oct() trick later.

```
#> cat <<EOF >x
0110010101111000011101000110010101110010011011010110100101101110011000010111010001100101
EOF
#> cat x | perl -ne 'chomp; print pack("B".length($_),$_); END{print "\n"}'
exterminate
#> cat x | perl -ne 'BEGIN{$/=\8} chomp; print pack("B8",$_); END{print "\n"}'
exterminate
#> cat x | perl -ne 'BEGIN{$/=\8} chomp; print chr(oct("0b".$_)); END{print "\n"}'
exterminate
```

A quick reminder that “perl -ne” means execute the code in quotes once per line. If you need to do something once only rather than each line, then you have to put it in a BEGIN{} or END{} block. And as usual, the last command in a block doesn’t need a semicolon because perl.

So to unpack (*titter*) that first one: first chomp() the line to get rid of the newline that would mess things up – remember that newlines “\n” are characters, too, and they’ll conflict with the “0” and “1” characters. Next, we use pack to interpret $_, the contents of the current line, as a binary string whose length is whatever the string’s length is; “B”.length($_) becomes “B88” here. That turns the string of binaryish things into a string of actual ASCII; those could then be unpack()ed or otherwise processed later, but we’re just printing them here.

In the second one, we first use a weird feature of Perl (what isn’t) by setting the $/ variable, the input record separator, which normally holds the line ending. If you set it to something other than “\n”, Perl will read lines according to that different line-ending sequence. Neat. You can also set it to undef to have no line terminator, meaning a single get-line operation will slurp up the whole file. However, a weird edge case is when you set it to a reference to an integer (hence the backslash before the 8). That turns it into a fixed-width reading mechanism, here 8 characters at a time. Neat! From there, we just translate blocks of 8 at a time with pack(“B8”).

And for the third one, which I learned from StackOverflow (we all have new things to learn!), we use a weird feature of the oct() function (Perl and weird features!): it can interpret octal strings like “070401”, but it can also interpret binary ones, as long as you prepend “0b”. So I did that; the result is an integer, not a character. So I also had to pump it through chr() to turn the integer into its ASCII equivalent.

Voila!

How about another language, like C? Sure. There are lots of ways to read files in C, so for reading I’m going to do the one that doesn’t use the usual stdin constructions, but rather the raw file descriptor integers, i.e. 0=stdin, 1=stdout, 2=stderr.

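```
#> cat b.c
#include <stdio.h>
#include <unistd.h>

int main( int argc, char** argv )
{
    /* 8 characters of '0'/'1' plus a terminating null */
    char buf[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    char a;
    int i;

    /* read from file descriptor 0 (stdin), 8 characters at a time */
    while( read( 0, buf, 8 ) >= 8 )
    {
        a = 0;
        /* shift in one bit per character, most significant bit first */
        for( i=0; i<8; i++ ) a = ( a << 1 ) + ( buf[i]=='0' ? 0 : 1 );
        fprintf( stdout, "%c", a );
    }
    fprintf( stdout, "\n" );
    return 0;
}
#> gcc b.c -o b
#> cat x | ./b
exterminate
```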

For this code, first I make a buffer big enough to grab 8 characters at a time, plus one for the null character ‘\0’ – don’t forget. I fill it with 0’s mostly just to make sure the 9th is a null. Then I read 8 at a time and only print stuff if I get all 8, hence the >= 8 part of the while loop’s condition. Then I build up the binary number from scratch rather than leaning on a library call the way the Perl versions do (strtol() with a base of 2 can parse such a string, but doing it by hand shows what’s actually happening). For each block of 8 we run a for-loop that makes “variable a” equal to “itself shifted left one binary digit” plus “0 if the character is ‘0’ and 1 otherwise”. This is key because 0 and ‘0’ are not the same thing – remember the ASCII table – 0 is 0 but ‘0’ is 48. After packing 8 bits into the character variable a, I print out one character using fprintf; notice the %c rather than the more common %s, which is for character-array style strings. And don’t forget the newline, of course.
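For comparison, the C standard library can parse a base-2 string itself: strtol() accepts 2 as its base argument, which is about as close as C gets to Perl’s oct(“0b…”). Below is a sketch of the same decoding done that way, working on an in-memory string rather than stdin. The function name decode_bits and its buffer-size convention are assumptions of mine, not anything from the programs above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode a string of '0'/'1' characters into out,
   8 characters per output byte. A trailing partial group is ignored.
   Returns the number of characters written, not counting the final null. */
size_t decode_bits(const char *bits, char *out, size_t outsz)
{
    char group[9];
    size_t n = 0;
    size_t len = strlen(bits);

    if (outsz == 0) return 0;
    for (size_t i = 0; i + 8 <= len && n + 1 < outsz; i += 8) {
        memcpy(group, bits + i, 8);
        group[8] = '\0';                          /* strtol needs a terminated string */
        out[n++] = (char)strtol(group, NULL, 2);  /* base 2: parse as binary */
    }
    out[n] = '\0';
    return n;
}
```

This is where that ninth null byte earns its keep. Note that strtol() quietly stops at the first character that isn’t a binary digit, so unlike the ‘0’-or-not test in b.c, a stray ‘2’ just truncates that group’s value.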

Let’s do it another way. You’re not likely to have seen a bunch of these terms before, so buckle up, cupcake.

```
#> cat b2.c
#include <stdio.h>
#include <unistd.h>
#pragma bit_field_size (char)
#pragma bit_field_align (char)
#pragma pack()

/* eight one-bit fields overlaid on a single unsigned char */
typedef union {
    struct {
        unsigned char b1:1;
        unsigned char b2:1;
        unsigned char b3:1;
        unsigned char b4:1;
        unsigned char b5:1;
        unsigned char b6:1;
        unsigned char b7:1;
        unsigned char b8:1;
    } packme;
    unsigned char unpackme;
} packit;

int main( int argc, char** argv )
{
    char buf[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    packit a;

    while( read( 0, buf, 8 ) >= 8 )
    {
        /* because ENDIANISM: */
        a.packme.b1 = buf[7]=='0'?0:1;
        a.packme.b2 = buf[6]=='0'?0:1;
        a.packme.b3 = buf[5]=='0'?0:1;
        a.packme.b4 = buf[4]=='0'?0:1;
        a.packme.b5 = buf[3]=='0'?0:1;
        a.packme.b6 = buf[2]=='0'?0:1;
        a.packme.b7 = buf[1]=='0'?0:1;
        a.packme.b8 = buf[0]=='0'?0:1;
        /* read the same byte back out as a character */
        fprintf( stdout, "%c", a.unpackme );
    }
    fprintf( stdout, "\n" );
    return 0;
}
#> gcc b2.c -fpack-struct -o b2
#> cat x | ./b2
exterminate
```

Let’s unpack this. *snicker* OK, first of all, what is the structure thingy? Well, first, we’re creating a typedef rather than a variable, so we’re just creating a template for making these later, which we actually instantiate with packit a. Second, we’re creating a union, not a structure, which means that all the elements in it overlap rather than lie sequentially. Third, the :1 means that we’re explicitly telling it to use only one bit for each field – so a bunch of one-bit-long characters.

Why are the inner struct and the unsigned char in the same location in memory? So we can shove bits in one at a time into the same spot in memory where the unsigned char lies, and then take the data out “as a character” that way. They need to be unsigned because in a signed type the most significant bit acts as the sign, even for characters, and a one-bit signed field has no room for anything but the sign; that can mess things up.

So, more or less, we’re putting each bit into a spot that overlaps a character, and then extracting the character. Neat.

What’s the rest of the stuff? Well, the C standard leaves the layout of bit-fields up to the implementation, and in general each item in a struct or union has to lie on some kind of boundary, usually a character or int. If a compiler relocated each bitfield to the beginning of a character in memory, they would not actually be next to each other. That would mean a bit, 7 bits of empty space, a bit, 7 bits of empty space, etc. That wouldn’t work.

Now, just to be clear: on this particular architecture that I’m on, the #pragma lines and the -fpack-struct command line argument happen to not be needed. But I’m including them here because your architecture might differ, and something closer to this might be necessary to work properly.
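If you’re not sure what your own compiler does, one cheap check is to compare sizeof for a struct of eight one-bit fields against a struct of eight ordinary unsigned chars. A small sketch; the struct and function names here are made up for the comparison, and the exact sizes are up to your compiler:

```c
#include <stddef.h>

/* Eight one-bit fields, declared the same way as in the union above. */
struct eight_bitfields {
    unsigned char b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1, b8:1;
};

/* Eight full characters, for comparison. */
struct eight_chars {
    unsigned char c1, c2, c3, c4, c5, c6, c7, c8;
};

/* If the bit-fields are packed next to each other, the first size is
   typically 1; if each field were pushed out to its own byte, it would
   be closer to the second. */
size_t bitfield_struct_size(void) { return sizeof(struct eight_bitfields); }
size_t char_struct_size(void)     { return sizeof(struct eight_chars); }
```

If bitfield_struct_size() comes back as 1, the fields are sharing a byte and the extra pragmas and flags aren’t buying you anything on that platform.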

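One last thing to note: the order of the bits is reversed! Yes, on my architecture, the first bit-field declared lands in the least significant bit of the byte, so the most significant bit has to go in last when packing by hand. The order is up to the compiler and platform, so yours may differ. Binary magic for the win.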
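A quick way to find out which end your compiler fills first is to set only the first-declared field and look at the byte that comes out. This is just a probe, under the assumption that the eight fields share a single byte; probe_t and first_field_mask are names I made up for it:

```c
/* Same shape as the packit union, under a different (hypothetical) name. */
typedef union {
    struct {
        unsigned char b1:1;
        unsigned char b2:1;
        unsigned char b3:1;
        unsigned char b4:1;
        unsigned char b5:1;
        unsigned char b6:1;
        unsigned char b7:1;
        unsigned char b8:1;
    } packme;
    unsigned char unpackme;
} probe_t;

/* Returns the byte value produced by setting only the first-declared
   field: 0x01 means b1 is the least significant bit, 0x80 means it is
   the most significant bit. */
unsigned char first_field_mask(void)
{
    probe_t p;

    p.unpackme = 0;    /* clear all eight bits */
    p.packme.b1 = 1;   /* set just the first field */
    return p.unpackme; /* read it back through the overlapping char */
}
```

If this returns 0x01, the buf[7]-into-b1 ordering used in b2.c is the right one; if it returns 0x80, the assignments would need to run the other way around.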

Let’s say that you want to make your own one of these jokes with your own words, like I did in the name of this article. Let’s look at using the same languages, but doing it all in reverse.

```
#> echo 'Binary' | perl -ne 'chomp; print unpack( "B*", $_ ), "\n";'
010000100110100101101110011000010111001001111001
```

Here, unpack will just generate a bunch (*) of binary (B) given the string $_. Easy.

So, that ruins my joke, but you probably could have guessed what it meant anyhow. The non-pack way in perl is:

```
#> echo 'Binary' | perl -ne 'chomp; print map {sprintf "%08b", ord($_)} (split //); print "\n";'
010000100110100101101110011000010111001001111001
```

Here, we’re using split // to split a string into individual characters; it automatically uses $_ as a default, so we could have said split //, $_ if we wanted. Also, split “” works; both split a string by “zero length thing”, which means “every individual character”. Then we pump that through map{} to run the block once per character: we first convert, say, ‘A’ to 65 using ord() and then format it in a binaryish way using sprintf. The %b will print binary strings from integers. %8b will always print at least 8 characters but may fill the extra with spaces. %08b will fill extras on the left with 0’s.

```
#> cat unb.c
#include <stdio.h>

int main( int argc, char** argv )
{
    int c;            /* int, not char, so EOF can be told apart from data */
    int i;
    unsigned char a;

    while( ( c = getc(stdin) ) != EOF )
    {
        a = c;
        if( a == '\n' ) continue;
        /* print bit 7 (most significant) first, down to bit 0 */
        for( i=7; i>=0; i-- )
            fprintf( stdout, "%c", ( (( a>>i ) & 0x00000001) ? '1' : '0' ) );
    }
    fprintf( stdout, "\n" );
    return 0;
}
#> gcc unb.c -o unb
#> echo Binary | ./unb
010000100110100101101110011000010111001001111001
```

Here, we’re getting one character at a time with getc(), and don’t forget that the thing we get back from getc is an integer! Yeah, we’re getting one character at a time, but it returns an integer because otherwise we’d never know when we’re done. EOF isn’t a character that fits in a single char because that would mean there’s a character out there in a file that’s unreadable or unprintable. So it gives back something that doesn’t fit in a char as a result.
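Here is a tiny sketch of why the int matters. It assumes nothing beyond stdio’s EOF macro, and the two helper names are invented for illustration: one mimics what getc() hands back for a data byte, the other mimics the classic bug of storing that result in a plain char before comparing.

```c
#include <stdio.h>

/* getc() reports a data byte as an unsigned char value widened to int,
   so every real byte comes back as a non-negative number. */
int getc_style_value(unsigned char byte)
{
    return (int)byte;
}

/* Returns 1 if the byte, once squeezed into a plain char, compares equal
   to EOF. This mimics writing "char c = getc(stdin)". */
int narrowed_byte_looks_like_eof(unsigned char byte)
{
    char narrowed = (char)byte;  /* the narrowing that loses information */
    return narrowed == EOF;
}
```

On platforms where plain char is signed and EOF is -1 (a common setup, but an assumption), narrowed_byte_looks_like_eof(0xFF) comes back true, so a perfectly legal byte would end the loop early. Where char is unsigned, the comparison can never match and the loop would never end. Keeping the result in an int avoids both.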

Next, if it’s a newline, skip it, but the money portion is the for-loop. We shift the character right by i bits and then use bitwise AND (&) to keep only the last bit; if that’s true (a 1), then we print a ‘1’, otherwise a ‘0’. Also note that the order is reversed, 7 down to 0, so the most significant bit is printed first.
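The shift-and-mask step can also be pulled out into a small helper that fills a string instead of printing straight away. A minimal sketch; byte_to_bits is a name I made up, and the caller is assumed to supply a buffer of at least 9 characters:

```c
/* Writes the 8 bits of byte into out as '0'/'1' characters, most
   significant bit first, followed by a terminating null. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 7; i >= 0; i--)
        out[7 - i] = ((byte >> i) & 1) ? '1' : '0';  /* isolate bit i */
    out[8] = '\0';
}
```

Feeding it ‘B’ should give back “01000010”, the first block of the output shown earlier.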

Final version, using the struct-packing:

```
#> cat unb2.c
#include <stdio.h>

/* eight one-bit fields overlaid on a single unsigned char */
typedef union {
    struct {
        unsigned char b1:1;
        unsigned char b2:1;
        unsigned char b3:1;
        unsigned char b4:1;
        unsigned char b5:1;
        unsigned char b6:1;
        unsigned char b7:1;
        unsigned char b8:1;
    } packme;
    unsigned char unpackme;
} packit;

int main( int argc, char** argv )
{
    packit a;
    int c;

    while( ( c = getc(stdin) ) != EOF )
    {
        /* store the character, then read it back one bit at a time */
        a.unpackme = c;
        if( a.unpackme == '\n' ) continue;
        fprintf( stdout, "%01d", a.packme.b8 );
        fprintf( stdout, "%01d", a.packme.b7 );
        fprintf( stdout, "%01d", a.packme.b6 );
        fprintf( stdout, "%01d", a.packme.b5 );
        fprintf( stdout, "%01d", a.packme.b4 );
        fprintf( stdout, "%01d", a.packme.b3 );
        fprintf( stdout, "%01d", a.packme.b2 );
        fprintf( stdout, "%01d", a.packme.b1 );
    }
    fprintf( stdout, "\n" );
    return 0;
}
#> gcc unb2.c -o unb2
#> echo Binary | ./unb2
010000100110100101101110011000010111001001111001
```

And with that, you can make any of these joke memes you want, and they don’t even have to channel Battlestar Galactica.

#EXTERMINATE

4 thoughts on - 01000010 01101001 01101110 01100001 01110010 01111001 Jokes

• gwr says:

buf[i]=='0' ? 0 : 1

faster to say buf[i] - '0', no??

• yeah though if the character is a ‘2’ it’ll do something screwy, so i decided to make it more boolean

• gwr says:

Oh, and C++ std::bitset ???

• yeah, i just wanted to highlight the bitfield and #pragma stuff cuz it’s cool and not widely known