commas in C - can you pass the puzzle ?

In a recent Deep-C training course in Bangalore I was discussing sequence points with some C programmers. I explained that the comma operator is one the very few operators that creates a sequence point. I like the comma. It's the name of a beautiful butterfly. K&R's famous Hello World program had a comma between hello and world. I should really put a comma into my blog picture!

During several cyber-dojos it became clear to me that many of the programmers (despite having programmed in C for several years), did not understand that in C, not all commas are the same. I've created a small piece of C code to try help C programmers understand the humble comma...

Have a look at the following 5 lines of code. Do you know what each line does?
int x = (1,2,3);
int x =  1,2,3;

    x =  1,2,3;
    x = (1,2,3);
    x =f(1,2,3); 
.
.
.
.
.
.
Scroll down for my answers once you've decided...
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
int x = (1,2,3);

This declares an int called x and initializes it to the result of the expression (1,2,3). The commas inside this expression are operators. 1 is evaluated and its value (1) discarded, then a comma provides a sequence point, then 2 is evaluated and its value (2) discarded, then a comma provides a sequence point, then 3 is evaluted and its value is the value of the expression (1,2,3). So x is initialized to 3. You'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
int x =  1,2,3;

This is different. If this compiled it would declare an int called x and initialize it to 1 and then declare two more ints called 2 and 3. It has the same structure as int x = 1,y,z; which declares three ints called x, y, and z. The commas are not operators, they are punctuators/separators. You can't declare variables called 2 or 3. It does not compile.
    x =  1,2,3;

In this fragment x is assumed to have already been declared. It is not a declaration. The commas are operators again. Assignment has higher precedence than the comma operator so this binds as (x = 1),2,3;. So 1 is assigned to x, and the result of this assignment expression (1) is discarded, then there is a sequence point, then 2 is evaluated and its value (2) is discarded, then there is a sequence point, then 3 is evaluated and its value (3) is discarded. You'll probaby get warnings saying there are no side-effects in the expressions 2 and 3.
    x = (1,2,3);

Again, x is assumed to have already been declared. It is not a declaration. The commas are operators again. This is the same as the first fragment except it is not a declaration. x is assigned the value of the expression (1,2,3) which is 3. Again you'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
    x =f(1,2,3);

Again, x is assumed to have already been declared. As has a function called f which accepts three int arguments. These commas are not operators. They are punctuators/separators. They separate the three expressions forming the three arguments to f. But they do not introduce any sequence points.

How did you do?

another interesting TDD episode

The classic TDD cycle says that you should start with a test for new functionality and see it fail.
There is real value in not skipping this step; not jumping straight to writing code to try to make it pass.
  • One reason is improving the diagnostic. Without care and attention diagnostics are unlikely to diagnose much.
  • A second reason is to be sure the test is actually running! Suppose for example, you're using JUnit and you forget its @Test annotation? Or the public specifier?
  • A third reason is because sometimes, as we saw last time, you get an unexpected green! Here's another nice example of exactly this which happened to me during a cyber-dojo demo today.
I was doing the fizz-buzz practice in C.
I started by writing my first test, like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    fizz_buzz(actual, sizeof actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("fizz_buzz(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
}

I made this fail by writing the initial code as follows (the (void)n is to momentarily avoid the "n is unused" warning which my makefile promotes to an error using the -Werror option):
void fizz_buzz(char * result, size_t size, int n)
{
    (void)n;
    strncpy(result, "Hello", size);
}

which gave me the diagnostic:
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(3)
expected: "Fizz"
  actual: "Hello"

I made this pass with the following slime
void fizz_buzz(char * result, size_t size, int n)
{
    if (n == 3)
        strncpy(result, "Fizz", size);
}

Next, I returned to the test and added a test for 6:
static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
    assert_fizz_buzz("Fizz", 6);
}

I ran the test, fully expecting it to fail, but it passed!
Can you see the problem?
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
The problem is in assert_fizz_buzz which starts like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    ...
}

Here's what's happening:
  • assert_fizz_buzz("Fizz", 3) is called
  • char actual[16] is defined
  • fizz_buzz(actual, sizeof actual, 3) is called
  • if (n == 3) is true
  • "Fizz" is strncpy'd into actual
  • fizz_buzz(actual, sizeof actual, 3) returns
  • strcmp says that expected equals actual
  • ...
  • assert_fizz_buzz("Fizz", 6) is called
  • char actual[16] is defined
  • actual exactly overlays its previous location so its first 5 bytes are still 'F','i','z','z','\0'
  • fizz_buzz(actual, sizeof actual, 6) is called
  • if (n == 3) is false
  • fizz_buzz(actual, sizeof actual, 6) returns
  • strcmp says that expected equals actual

My mistake was in the test; actual has automatic storage duration so does not get initialized. It's initial value is indeterminate. The first call to assert_fizz_buzz is accidentally interfering with the second call. Tests should be isolated from each other. I tweaked the test as follows:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16] = { '\0' };
    ...
}

I ran the test again and this time it failed :-)
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(6)
expected: "Fizz"
  actual: ""

I made the test pass:
void fizz_buzz(char * result, size_t size, int n)
{
    if (n % 3 == 0)
        strncpy(result, "Fizz", size);
}

Let's hear it for starting with a test for new functionality and seeing it fail.

an interesting TDD episode

I'm doing the roman-numerals kata in C.
I write a test as follows:
#include "to_roman.hpp"
#include <assert.h>
#include <string.h>

int main(void)
{
    char actual[32] = { '\0' };
    to_roman(actual, 111);
    assert(strcmp("CXI", actual) == 0);
}

I write a do-nothing implementation of to_roman.
I run the tests and I get (I kid you not) this:
...
test: to_roman.tests.c:26: main: Assertion `__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p ("CXI") && __builtin_constant_p (actual) && (__s1_len = __builtin_strlen ("CXI"), __s2_len = __builtin_strlen (actual), (!((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) || __s1_len >= 4) && (!((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) || __s2_len >= 4)) ? __builtin_strcmp ("CXI", actual) : (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) && (__s1_len = __builtin_strlen ("CXI"), __s1_len < 4) ? (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (actual); register int __result = (((const unsigned char *) (const char *) ("CXI"))[0] - __s2[0]); if (__s1_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[2] - __s2[2]); if (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) ("CXI"))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) && (__s2_len = __builtin_strlen (actual), __s2_len < 4) ? (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s1 = (const unsigned char *) (const char *) ("CXI"); register int __result = __s1[0] - ((const unsigned char *) (const char *) (actual))[0]; if (__s2_len > 0 && __result == 0) { __result = (__s1[1] - ((const unsigned char *) (const char *) (actual))[1]); if (__s2_len > 1 && __result == 0) { __result = (__s1[2] - ((const unsigned char *) (const char *) (actual))[2]); if (__s2_len > 2 && __result == 0) __result = (__s1[3] - ((const unsigned char *) (const char *) (actual))[3]); } } __result; }))) : __builtin_strcmp ("CXI", actual)))); }) == 0' failed.
...

So I work towards improving the diagnostic with a custom assert, as follows:
#include "to_roman.h"
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void assert_roman(const char * expected, int n)
{
    char actual[32] = { '\0' };
    to_roman(actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("to_roman(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

int main(void)
{
    assert_roman("CXI", 111);
}

I run this and my diagnostic is as follows:
test: to_roman.tests.c:16: assert_roman: Assertion `0' failed.
to_roman(111)
expected: "CXI"
  actual: ""
...

Much better :-)
Now I start to implement to_roman
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "CXI");
}

And I'm at green.
I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "C");
    strcat(roman, "X");
    strcat(roman, "I");
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "C" };
    const char * tens[]     = { "X" };
    const char * units[]    = { "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[0]);
    strcat(roman, tens[0]);
    strcat(roman, units[0]);
}

Remembering that in my test, n is one-hundred-and-eleven, I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[1]);
    strcat(roman, tens[1]);
    strcat(roman, units[1]);
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green. Now I add a new test:
int main(void)
{
    assert_roman("CXI", 111);
    assert_roman("CCXXII", 222);
}

I run it and am amazed to see it pass.
It takes me a little while to figure out what is going on.
I'll take it line by line.
When n == 222 this line:
    strcat(roman, hundreds[n / 100]);

is this
    strcat(roman, hundreds[2]);

and hundreds[2] is an out-of-bounds index. However, hundreds[2] just happens to evaluate to the same as tens[0] which is the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 100;
    strcat(roman, tens[n / 10]);

which is this:
    strcat(roman, tens[2]);

And tens[2] is also an out-of-bounds index. And tens[2] just happens to evaluate to the same as units[0] which is also the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 10;
    strcat(roman, units[n]);

which is this:
    strcat(roman, units[2]);

And yet again units[2] is an out-of-bounds index. This time units[2] just happens to evaluate to "CCXXII" from the test! So after this roman is "CCXXII" and the test passes! Amazing!

I edit the code to this:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "CC" };
    const char * tens[]     = { "", "X", "XX" };
    const char * units[]    = { "", "I", "II" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green.

So now I'm wondering if there are any lessons I can learn from this episode. It was not a good idea to run the tests (to try and get an initial red) when doing so would knowingly cause the (unfinished) program to exhihibit undefined behaviour. In cyber-dojo terms an amber traffic-light is not the same as a red traffic-light. After adding the second test I should have edited to_roman as follows:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "" };
    const char * tens[]     = { "", "X", "" };
    const char * units[]    = { "", "I", "" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

Then I would have got a proper red:
test: to_roman.tests.c:17: assert_roman: Assertion `0' failed.
to_roman(222)
expected: "CCXXII"
  actual: ""
...


lessons from testing

I have run hundreds of test-driven coding dojos using cyber-dojo.
I see the same test anti-patterns time after time after time.
Do some of your tests exhibit the same same anti-patterns?



are you missing a TDD step?

Here's a TDD state diagram.
  • start by writing a test for new functionality
  • see it fail
  • make it pass
  • refactor
  • round and round you go
It looks a bit like an animal. Let's give it some eyes!



But there's something missing!
There's no red-to-red self-transition.
My animal has no left ear!
I'll give it a left ear.




What is this left ear? It's for changes made at red. I see the test fail. I read the diagnostic.
Then I stay at red and improve the diagnostic.
When I'm happy with diagnostic I get rid of it by making the test pass.



This was part of my lessons from testing presentation which reviews common test anti-patterns I see on cyber-dojo.

Note: I'm being careful not to call this red-to-red transition a refactoring since refactoring is for changes made at green.


pro git

Is an excellent book by Scott Chacon (isbn 978-1-4302-1833-3). As usual I'm going to quote from a few pages:
Git as a content-addressable filesystem is a very powerful tool that you can easily use as more than just a VCS.
In a DVCS, clients don't just check out the latest snapshot of the files: they fully mirror the repository... Every checkout is really a full backup of all the data.
Conceptually, most other systems store information as a list of file-based changes. These systems think of the information they keep as a set of files and the changes made to each file over time... Git doesn't think of or store its data in this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem... This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
It is important to note that the fetch command pulls the data to your local repository - it doesn't automatically merge it with any of your work or modify what you're currently working on. You have to merge it manually into your work when you're ready.
Running git pull generally fetches data from the server you originally cloned from and automatically tries to merge it into the code you're currently working on.
The way Git branches is incredibly lightweight, making branching operations nearly instantaneous and switching back and forth between branches generally just as fast. Unlike make other VCSs, Git encourages a workflow that branches and merges often, even multiple times a day.
A branch in Git is simply a lightweight moveable pointer to one of these commits.
To switch to an existing branch, you run the git checkout command.
Git determines the best common ancestor to use for its merge base; this is different than CVS or Subversion (before version 1.5), where the developer doing the merge has to figure out the best merge base for themselves.
In Git it's common to create, work on, merge, and delete branches several times a day.
In Git there are two main ways to integrate changes from one branch into another; the merge and the rebase.
At the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
If someone at any point in the history of your project added a single huge file, every clone for all time will be forced to download that large file, even if it was removed from the project in the very next commit. Because it's reachable from the history it will always be there.


management 3.0

is an excellent book by Jurgen Appelo, subtitled Leading agile developers, Developing agile leaders (isbn 978-0-321-71247-9). As usual I'm going to quote from a few pages:
The hierarchy is needed for authorization; the network is needed for communication.
Big species consume more and breed slower.
The Red Queen's Race is an evolutionary hypothesis describing that a complex system needs continuous improvement to simply maintain its current fitness, relative to the systems it is co-evolving with. Some scientists claim that the Red Queen's Race, or the principle of co-evolving species, is an even more important driver of evolution that any other kind of environmental change.
We can consider the internal structure of each system to be a code for the environment and the other species that it is evolving with.
There is no accurate (or rather, perfect) representation of a system which is simpler than the system itself.
We can figure out why the human heart fails (reductionism) but we can never create a heart that won't fail (constructionism).
Managers must learn that they are "in charge" but not "in control".
Recent research has shown that the copying of ideas is the most successful of all strategies.
Uncertainty results in a bias towards self-interest.
Feedback is only feedback when there is a purpose behind it.
Research shows that self-discipline is twice as important as IQ for final grades of students. Effort matters more than talent.
Focus on delivering value.
We need continuous business improvement.


scrum

is an excellent book by Jeff Sutherland, subtitled A revolutionary approach to building teams, beating deadlines, and boosting productivity (isbn 978-847-94108-4). As usual I'm going to quote from a few pages:
The most powerful part of Scrum from his [Jeff Johnson] point of view? Demos.
Scrum is not about the developers. It's about the customers and stakeholders.
I learned a lot about systems theory and how a system only has certain stable states. As a cell evolves, it moves from one stable state to another. Figuring out the rules to move a complex adaptive system from one state to another, and how to make the next state a positive one rather than a negative one, was something I spent nearly a decade on. To change a cell, you first inject energy into the system. At first there's chaos, there seem to be no rules, everything is in flux...
"How many gantt charts have you seen in your career?" I asked.
"Hundreds," he replied.
"How many of them were right?"
He paused. "None."
In business we all too often focus solely on individuals, even if production is a team effort.
It's the system that surrounds us, rather than any intrinsic quality, that accounts for the vast majority of our behaviour.
Every three weeks each team had to demonstrate to their colleagues what it was working on. This was an open demonstration; anyone could come. And if that demo wasn't both working and cool, [MediaLab] directors killed the project.
"Sprints." We called them that because the name evokes a quality of intensity.
Nothing gets moved to Done unless it can be used by the customer.
After engaging for a while in Sprints and Stand-ups, you stop seeing time as a linear arrow into the future but, rather, as something that is fundamentally cyclical.
People think in narratives, in stories.
The Product Owner has to be available to the team, to explain what needs to be done and why. While the Product Owner is ultimately accountable for the Backlog, there needs to be a constant dialogue with the team.
Orientation isn't just a state you're in; it's a process. You're always orienting... [OODA]