the universal computer

is an excellent book by Martin Davis (isbn 978-1-4665-0519-3). As usual I'm going to quote from a few pages:
Leibniz's involvement with the Harz Mountain mining project ultimately proved to be a fiasco. In his optimism, he had not foreseen the natural hostility of the expert mining engineers towards a novice proposing to teach them their trade. Nor had he allowed for the inevitable break-in period a novel piece of machinery requires or for the unreliability of the winds.
Unlike the usual experience with a new untried gadget, Turing's Bombes, built from his design, worked correctly as soon as they were made.
"There are several theorems which say almost exactly that ... if a machine is expected to be infallible, it cannot also be intelligent... But these theorems say nothing about how much intelligence may be displayed if a machine makes no pretence at infallibility." [Turing]
There is nothing in Gödel's theorem to preclude the mathematical powers of a human mind being equivalent to an algorithmic process that produces false as well as true statements.
It is interesting to contrast von Neumann's view of computer programming as an activity with Turing's; von Neumann called it "coding" and made it clear that he thought of it as a clerical task requiring little intellect. A revealing anecdote tells of a practice at the Institute for Advanced Study computer facility of using students to translate by hand, computer instructions written using human-readable mnemonics into machine language. A young hot-shot programmer proposed to write an assembler that would do this conversion automatically. Von Neumann is said to have responded angrily that it would be wasteful to use a valuable scientific tool to do a mere clerical job. In his ACE report, Turing said that the process of computer programming "should be very fascinating. There need be no real danger of it ever becoming a drudge, for any processes that are quite mechanical may be turned over to the machine itself."
There is no reason to think that a full scale ACE-style computer would not have worked well if the organization and resources to build one had been there. The issue is best understood in the more general context of the question of which computer functions should be supplied by the hardware and which by software. Turing had proposed a relatively simple machine in which a lot was left to be supplied by software, but where, in compensation, the programmer had very substantial control of underlying machine operations.
"I expect that digital computing machines will eventually stimulate a considerable interest in symbolic logic... The language in which on communicates with these machines ... forms a sort of symbolic logic." [Turing]
Searle tells us that Deep Blue "has a bunch of meaningless symbols." Well, if you could look inside Deep Blue when it was in operation, you wouldn't see any symbols, meaningful or not. At the level of circuits, electrons are moving around. Just as, if you look inside Kasparov's skull while he is playing, you wouldn't see any chess pieces, you'd see neurons firing.
Our consciousness is a principal way in which each of us experiences his or her unique individuality. But we know it only from the inside. We experience our own consciousness but not that of anyone else.


print "squashed-circle" diamond

There's been a bit of a buzz about the Print-Diamond practice recently. I recall doing this a couple of years ago with Johannes Brodwall. In cyber-dojo naturally. We took a wrong turn and were making a thorough mess of it. I vividly recall Johannes saying:
This is too difficult. We're doing it wrong.
I love that. If it's difficult you're probably doing it wrong. We gave up and took a break. We got a really nice Indian take away. Then we went back to Print-Diamond. Very quickly we came up with a new idea. We imagined the diamond lying in the center of an x,y axis. The Print-Diamond of 'C' would therefore look like this:

       -2 -1  0 +1 +2

  -2    -  -  A  -  -    
  -1    -  B  -  B  -
   0    C  -  -  -  C
  +1    -  B  -  B  -
  +2    -  -  A  -  -


Viewed like this you can think of the Diamond as a sort of squashed circle with the A,B,C characters all lying on the circumference. From here it was a short step to this:
(-2..+2).collect{|row| 
  (-2..+2).collect{|col| 
    row.abs + col.abs == 2 ? 'X' : '-'
  }.join 
}.join("\n")

which gives:
--X--
-X-X-
X---X
-X-X-
--X--

And we knew we were on our way.
Let's hear it for Indian Food Driven Development!

yet another interesting TDD episode

In the previous episode I described how a custom assert function declared a local array which I did not initialize.
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    ...
}

The effect of not initializing the array was that state from one test leaked into another test and I got an unexpectedly passing test. I fixed the problem by initializing the array.
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16] = "";
    ...
}

The most recent episode (in cyber-dojo naturally) revolved around this same issue. This time I was redoing the roman numerals exercise (1 → "I", 2 → "II", etc) in C. My custom assert function started like this.
...
#define PRINT(s) print_string(#s, s)

static void print_string(const char * name, const char * s)
{
    printf("%10s: \"%s\"\n", name, s);
}

static void assert_to_roman(const char * expected, int n)
{
    char actual[32];
    to_roman(actual, sizeof actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("to_roman(%d) FAILED\n", n);
        PRINT(expected);
        PRINT(actual);
        assert(false);
    }
}

Once again I failed to initialize the array. I'm a slow learner. This time the test failed unexpectedly! And the diagnostic was even more unexpected:
to_roman(1) FAILED
  expected: "I"
    actual: "I"

This is a less than ideal diagnostic! It seemed that strcmp and printf had differing opinions on what a string is! I fixed it by adding initialization.
...
static void assert_to_roman(const char * expected, int n)
{
    char actual[32] = "";
    to_roman(actual, sizeof actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("to_roman(%d) FAILED\n", n);
        PRINT(expected);
        PRINT(actual);
        assert(false);
    }
}

After this the tests passed. This addressed the immediate problem but it did not address the root cause. So I removed the initialization and (with a hat tip to Mr Jonathon Wakely) I reworked print_string to display the length of the string as well as an indication of the (un)printability of each character:
static void print_string(const char * name, const char * s)
{
    printf("%10s: \"%s\" %d ", name, s, (int)strlen(s));
    for (size_t i = 0; i != strlen(s); i++)
    {
        putchar(isprint((unsigned char)s[i]) ? 'P' : 'U');
    }
    putchar('\n');
}

With this change the diagnostic became:
to_roman(1) FAILED
  expected: "I" 1 P
    actual: "I" 4 UUUP

Much better. Now the earlier mystery made sense: actual began with three unprintable garbage bytes before the 'I'; they left no visible trace in the printed output, but strcmp saw the difference at the very first byte. Then I thought about initializing the array a bit more. I realized that initializing the array to the empty string was a poor choice since it masked a fault in the implementation of to_roman, which did not start like this (but should have):
void to_roman(char buffer[], size_t size, int n)
{
    buffer[0] = '\0';
    ...
}

So I added that and reworked assert_to_roman as follows:
static void assert_to_roman(const char * expected, int n)
{
    char actual[32];
    memset(actual, '!', sizeof actual);
    to_roman(actual, sizeof actual, n);
    ...
}

And I was back to green :-)

commas in C - can you pass the puzzle?

In a recent Deep-C training course in Bangalore I was discussing sequence points with some C programmers. I explained that the comma operator is one of the very few operators that creates a sequence point. I like the comma. It's the name of a beautiful butterfly. K&R's famous Hello World program had a comma between hello and world. I should really put a comma into my blog picture!
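
As a quick aside, here is a tiny sketch of the guarantee that sequence point gives (this snippet wasn't part of the course):
#include <stdio.h>

int main(void)
{
    int i = 1;
    /* The comma operator fully evaluates its left operand, side-effects
       included, before its right operand. So this is well defined:
       the left i++ makes i 2, then the right i++ yields 2 and makes i 3. */
    int j = (i++, i++);
    printf("i=%d j=%d\n", i, j); /* prints i=3 j=2 */
    return 0;
}

Without that sequence point (think of something like i++ + i++) the two modifications would be unsequenced and the behaviour undefined.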

During several cyber-dojos it became clear to me that many of the programmers (despite having programmed in C for several years) did not understand that in C, not all commas are the same. I've created a small piece of C code to try to help C programmers understand the humble comma...

Have a look at the following 5 lines of code. Do you know what each line does?
int x = (1,2,3);
int x =  1,2,3;

   x =   1,2,3;
   x =  (1,2,3);
   x = f(1,2,3); 
.
.
.
.
.
.
Scroll down for my answers once you've decided...
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
int x = (1,2,3);

This declares an int called x and initializes it to the result of the expression (1,2,3). The commas inside this expression are operators. 1 is evaluated and its value (1) discarded, then a comma provides a sequence point, then 2 is evaluated and its value (2) discarded, then a comma provides a sequence point, then 3 is evaluated and its value is the value of the expression (1,2,3). So x is initialized to 3. You'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
int x =  1,2,3;

This is different. If this compiled it would declare an int called x and initialize it to 1 and then declare two more ints called 2 and 3. It has the same structure as int x = 1,y,z; which declares three ints called x, y, and z. The commas are not operators, they are punctuators/separators. You can't declare variables called 2 or 3. It does not compile.
   x =   1,2,3;

In this fragment x is assumed to have already been declared. It is not a declaration. The commas are operators again. Assignment has higher precedence than the comma operator so this binds as (x = 1),2,3;. So 1 is assigned to x, and the result of this assignment expression (1) is discarded, then there is a sequence point, then 2 is evaluated and its value (2) is discarded, then there is a sequence point, then 3 is evaluated and its value (3) is discarded. You'll probably get warnings saying there are no side-effects in the expressions 2 and 3.
   x =  (1,2,3);

Again, x is assumed to have already been declared. It is not a declaration. The commas are operators again. This is the same as the first fragment except it is not a declaration. x is assigned the value of the expression (1,2,3) which is 3. Again you'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
   x = f(1,2,3);

Again, x is assumed to have already been declared. As has a function called f which accepts three int arguments. These commas are not operators. They are punctuators/separators. They separate the three expressions forming the three arguments to f. But they do not introduce any sequence points.
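
Putting the compilable fragments together, here is a small sketch (not part of the original puzzle; f is a made-up function taking three int arguments) that prints the resulting values of x:
#include <stdio.h>

static int f(int a, int b, int c)
{
    return a + b + c;
}

int main(void)
{
    int x = (1,2,3); /* comma operators: x is 3 */
    printf("%d\n", x);

    x =   1,2,3;     /* binds as (x = 1),2,3; so x is 1 */
    printf("%d\n", x);

    x =  (1,2,3);    /* comma operators again: x is 3 */
    printf("%d\n", x);

    x = f(1,2,3);    /* argument separators: x is f(1,2,3), which here is 6 */
    printf("%d\n", x);

    return 0;
}

Compiled with warnings enabled you'll see the no-side-effect warnings mentioned above; the second fragment is absent because it doesn't compile at all.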

How did you do?

another interesting TDD episode

The classic TDD cycle says that you should start with a test for new functionality and see it fail.
There is real value in not skipping this step; not jumping straight to writing code to try to make it pass.
  • One reason is improving the diagnostic. Without care and attention diagnostics are unlikely to diagnose much.
  • A second reason is to be sure the test is actually running! Suppose, for example, you're using JUnit and you forget its @Test annotation? Or the public specifier?
  • A third reason is because sometimes, as we saw last time, you get an unexpected green! Here's another nice example of exactly this which happened to me during a cyber-dojo demo today.
I was doing the fizz-buzz practice in C.
I started by writing my first test, like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    fizz_buzz(actual, sizeof actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("fizz_buzz(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
}

I made this fail by writing the initial code as follows (the (void)n is to momentarily avoid the "n is unused" warning which my makefile promotes to an error using the -Werror option):
void fizz_buzz(char * result, size_t size, int n)
{
    (void)n;
    strncpy(result, "Hello", size);
}

which gave me the diagnostic:
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(3)
expected: "Fizz"
  actual: "Hello"

I made this pass with the following slime:
void fizz_buzz(char * result, size_t size, int n)
{
    if (n == 3)
        strncpy(result, "Fizz", size);
}

Next, I returned to the test and added a test for 6:
static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
    assert_fizz_buzz("Fizz", 6);
}

I ran the test, fully expecting it to fail, but it passed!
Can you see the problem?
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
The problem is in assert_fizz_buzz which starts like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    ...
}

Here's what's happening:
  • assert_fizz_buzz("Fizz", 3) is called
  • char actual[16] is defined
  • fizz_buzz(actual, sizeof actual, 3) is called
  • if (n == 3) is true
  • "Fizz" is strncpy'd into actual
  • fizz_buzz(actual, sizeof actual, 3) returns
  • strcmp says that expected equals actual
  • ...
  • assert_fizz_buzz("Fizz", 6) is called
  • char actual[16] is defined
  • actual exactly overlays its previous location so its first 5 bytes are still 'F','i','z','z','\0'
  • fizz_buzz(actual, sizeof actual, 6) is called
  • if (n == 3) is false
  • fizz_buzz(actual, sizeof actual, 6) returns
  • strcmp says that expected equals actual

My mistake was in the test; actual has automatic storage duration so does not get initialized. Its initial value is indeterminate. The first call to assert_fizz_buzz is accidentally interfering with the second call. Tests should be isolated from each other. I tweaked the test as follows:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16] = { '\0' };
    ...
}

I ran the test again and this time it failed :-)
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(6)
expected: "Fizz"
  actual: ""

I made the test pass:
void fizz_buzz(char * result, size_t size, int n)
{
    if (n % 3 == 0)
        strncpy(result, "Fizz", size);
}

Let's hear it for starting with a test for new functionality and seeing it fail.

an interesting TDD episode

I'm doing the roman-numerals kata in C.
I write a test as follows:
#include "to_roman.hpp"
#include <assert.h>
#include <string.h>

int main(void)
{
    char actual[32] = { '\0' };
    to_roman(actual, 111);
    assert(strcmp("CXI", actual) == 0);
}

I write a do-nothing implementation of to_roman.
I run the tests and I get (I kid you not) this:
...
test: to_roman.tests.c:26: main: Assertion `__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p ("CXI") && __builtin_constant_p (actual) && (__s1_len = __builtin_strlen ("CXI"), __s2_len = __builtin_strlen (actual), (!((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) || __s1_len >= 4) && (!((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) || __s2_len >= 4)) ? __builtin_strcmp ("CXI", actual) : (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) && (__s1_len = __builtin_strlen ("CXI"), __s1_len < 4) ? (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (actual); register int __result = (((const unsigned char *) (const char *) ("CXI"))[0] - __s2[0]); if (__s1_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[2] - __s2[2]); if (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) ("CXI"))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) && (__s2_len = __builtin_strlen (actual), __s2_len < 4) ? (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s1 = (const unsigned char *) (const char *) ("CXI"); register int __result = __s1[0] - ((const unsigned char *) (const char *) (actual))[0]; if (__s2_len > 0 && __result == 0) { __result = (__s1[1] - ((const unsigned char *) (const char *) (actual))[1]); if (__s2_len > 1 && __result == 0) { __result = (__s1[2] - ((const unsigned char *) (const char *) (actual))[2]); if (__s2_len > 2 && __result == 0) __result = (__s1[3] - ((const unsigned char *) (const char *) (actual))[3]); } } __result; }))) : __builtin_strcmp ("CXI", actual)))); }) == 0' failed.
...

So I work towards improving the diagnostic with a custom assert, as follows:
#include "to_roman.h"
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void assert_roman(const char * expected, int n)
{
    char actual[32] = { '\0' };
    to_roman(actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("to_roman(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

int main(void)
{
    assert_roman("CXI", 111);
}

I run this and my diagnostic is as follows:
test: to_roman.tests.c:16: assert_roman: Assertion `0' failed.
to_roman(111)
expected: "CXI"
  actual: ""
...

Much better :-)
Now I start to implement to_roman
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "CXI");
}

And I'm at green.
I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "C");
    strcat(roman, "X");
    strcat(roman, "I");
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "C" };
    const char * tens[]     = { "X" };
    const char * units[]    = { "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[0]);
    strcat(roman, tens[0]);
    strcat(roman, units[0]);
}

Remembering that in my test, n is one-hundred-and-eleven, I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[1]);
    strcat(roman, tens[1]);
    strcat(roman, units[1]);
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green. Now I add a new test:
int main(void)
{
    assert_roman("CXI", 111);
    assert_roman("CCXXII", 222);
}

I run it and am amazed to see it pass.
It takes me a little while to figure out what is going on.
I'll take it line by line.
When n == 222 this line:
    strcat(roman, hundreds[n / 100]);

is this:
    strcat(roman, hundreds[2]);

and hundreds[2] is an out-of-bounds index. However, hundreds[2] just happens to evaluate to the same as tens[0] which is the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 100;
    strcat(roman, tens[n / 10]);

which is this:
    strcat(roman, tens[2]);

And tens[2] is also an out-of-bounds index. And tens[2] just happens to evaluate to the same as units[0] which is also the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 10;
    strcat(roman, units[n]);

which is this:
    strcat(roman, units[2]);

And yet again units[2] is an out-of-bounds index. This time units[2] just happens to evaluate to "CCXXII" from the test! So after this roman is "CCXXII" and the test passes! Amazing!

I edit the code to this:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "CC" };
    const char * tens[]     = { "", "X", "XX" };
    const char * units[]    = { "", "I", "II" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green.

So now I'm wondering if there are any lessons I can learn from this episode. It was not a good idea to run the tests (to try and get an initial red) when doing so would knowingly cause the (unfinished) program to exhibit undefined behaviour. In cyber-dojo terms an amber traffic-light is not the same as a red traffic-light. After adding the second test I should have edited to_roman as follows:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "" };
    const char * tens[]     = { "", "X", "" };
    const char * units[]    = { "", "I", "" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

Then I would have got a proper red:
test: to_roman.tests.c:17: assert_roman: Assertion `0' failed.
to_roman(222)
expected: "CCXXII"
  actual: ""
...


lessons from testing

I have run hundreds of test-driven coding dojos using cyber-dojo.
I see the same test anti-patterns time after time after time.
Do some of your tests exhibit the same anti-patterns?



are you missing a TDD step?

Here's a TDD state diagram.
  • start by writing a test for new functionality
  • see it fail
  • make it pass
  • refactor
  • round and round you go
It looks a bit like an animal. Let's give it some eyes!



But there's something missing!
There's no red-to-red self-transition.
My animal has no left ear!
I'll give it a left ear.




What is this left ear?
It's for changes made at red.
I see the test fail.
I read the diagnostic.
Then I stay at red and improve the diagnostic.
When I'm happy with the diagnostic I get rid of it by making the test pass.
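
For concreteness, here is a minimal sketch in C of one such red-to-red change (this code isn't from the presentation; fizz_buzz_stub and assert_equal are made-up names). The test stays red, but a bare assert becomes a custom assert that prints expected and actual before failing:
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* a made-up, deliberately wrong implementation, so the test stays red */
static void fizz_buzz_stub(char * result, size_t size, int n)
{
    (void)n;
    strncpy(result, "", size);
}

/* the left-ear change: still red, but now the failure says what differed */
static void assert_equal(const char * expected, const char * actual)
{
    if (strcmp(expected, actual) != 0)
    {
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

int main(void)
{
    char actual[16] = "";
    fizz_buzz_stub(actual, sizeof actual, 3);
    assert_equal("Fizz", actual); /* was: assert(strcmp("Fizz", actual) == 0); */
    return 0;
}

The change doesn't move the traffic-light; it just makes the red worth reading.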



This was part of my lessons from testing presentation which reviews common test anti-patterns I see on cyber-dojo.

Note: I'm being careful not to call this red-to-red transition a refactoring since refactoring is for changes made at green.


pro git

Is an excellent book by Scott Chacon (isbn 978-1-4302-1833-3). As usual I'm going to quote from a few pages:
Git as a content-addressable filesystem is a very powerful tool that you can easily use as more than just a VCS.
In a DVCS, clients don't just check out the latest snapshot of the files: they fully mirror the repository... Every checkout is really a full backup of all the data.
Conceptually, most other systems store information as a list of file-based changes. These systems think of the information they keep as a set of files and the changes made to each file over time... Git doesn't think of or store its data in this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem... This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
It is important to note that the fetch command pulls the data to your local repository - it doesn't automatically merge it with any of your work or modify what you're currently working on. You have to merge it manually into your work when you're ready.
Running git pull generally fetches data from the server you originally cloned from and automatically tries to merge it into the code you're currently working on.
The way Git branches is incredibly lightweight, making branching operations nearly instantaneous and switching back and forth between branches generally just as fast. Unlike many other VCSs, Git encourages a workflow that branches and merges often, even multiple times a day.
A branch in Git is simply a lightweight moveable pointer to one of these commits.
To switch to an existing branch, you run the git checkout command.
Git determines the best common ancestor to use for its merge base; this is different than CVS or Subversion (before version 1.5), where the developer doing the merge has to figure out the best merge base for themselves.
In Git it's common to create, work on, merge, and delete branches several times a day.
In Git there are two main ways to integrate changes from one branch into another; the merge and the rebase.
At the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
If someone at any point in the history of your project added a single huge file, every clone for all time will be forced to download that large file, even if it was removed from the project in the very next commit. Because it's reachable from the history it will always be there.