Hi. I'm Jon Jagger, director of software at Kosli.
I built cyber-dojo, the place teams practice programming.

The Vizzini school of bad management

- Am I going mad or did the word "think" escape your lips?
- Hurry up
- Inconceivable
- Faster!
- You know what a hurry we're in
- I'm waiting!
- Catch up with us quickly
- I do not accept excuses
- Did I make it clear that your job is on the line?
- There will be no one to hear you scream
- Stop doing that. We can relax, it's almost over
renaming a sub-directory with a space in it in git
Suppose you have a git sub-directory with a space in it, e.g. /outer/old name/

It appears you cannot rename the sub-directory with a space in it to a different name which also has a space in it. For example, the following does not work:

git mv -f "outer/old name/" "outer/new name/"

However, after much trial and error, I have found the following, which does work:

git mv -f "outer/old name/" "outer/temp-name/"
git commit -m "rename subfolder with space part 1/2"
git mv -f "outer/temp-name/" "outer/new name/"
git commit -m "rename subfolder with space part 2/2"

Hope this proves useful to someone.
my kanban 1s board game
Here's a slide deck explaining the essence of my kanban 1s board game.
Jon Jagger's Kanban 1s Board Game
- You can play an early session with no clips so the players can see how inventory builds up (you can also push done story-cards to the next edge's corner, rather than waiting for them to be pulled).
- The clips that hold the story-cards are a crucial part of the game. They make it a kanban game.
- You can limit the number of clips per edge to create a natural work-in-progress (wip) limit.
- You can add a new rule: players can also spend a 1 to split a story-card in two, eg a 4 into a 3 and a 1 (assuming they have a spare clip).
- You can record the day a story-card comes off the backlog, and also the day it gets to done and thus measure the cycle time.
- You can simulate scrum-style discrete sprints.
- You can vary the number of dice at different edges.
Isolating legacy C code from external dependencies
Code naturally resists being isolated if it isn't designed to be isolatable.
Isolating legacy code from external dependencies can be awkward.
In C and C++ the transitive nature of #includes is the most obvious and direct
reflection of the high coupling such code exhibits.
However, there is a technique
you can use to isolate a source file by cutting all its #includes.
It relies on a little-known third way of writing a #include.
From the C standard:
6.10.2 Source file inclusion
... A preprocessing directive of the form:
#include pp-tokens

(that does not match one of the two previous forms) is permitted. The preprocessing tokens after include
in the directive are processed just as in normal text. ... The directive resulting after all replacements shall match one of the two previous forms.
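Here's a minimal stand-alone sketch of this third form in action (HEADER is just an illustrative macro name, not part of the technique below):

/* computed_include.c */
#define HEADER(name) <name>
#include HEADER(stdio.h) /* after macro replacement: #include <stdio.h> */

int main(void)
{
    puts("the third form of #include works");
    return 0;
}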
An example: suppose you have a legacy C source file that you want to write some unit tests for:
/* legacy.c */
#include "wibble.h"
#include <stdio.h>
...
int legacy(int a, int b)
{
    FILE * stream = fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, "%d:%d:%d", a, b, a * b);
    fwrite(buffer, 1, sizeof buffer, stream);
    fclose(stream);
    return result;
}

Your first step is to create a file called nothing.h as follows:
/* nothing! */
nothing.h is a file containing nothing and is an example of the Null Object Pattern.
Then you refactor legacy.c to this:
/* legacy.c */
#if defined(UNIT_TEST)
# define LOCAL(header) "nothing.h"
# define SYSTEM(header) "nothing.h"
#else
# define LOCAL(header) #header
# define SYSTEM(header) <header>
#endif

#include LOCAL(wibble.h)  /* <--- */
#include SYSTEM(stdio.h)  /* <--- */
...
int legacy(int a, int b)
{
    FILE * stream = fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, "%d:%d:%d", a, b, a * b);
    fwrite(buffer, 1, sizeof buffer, stream);
    fclose(stream);
    return result;
}

Now structure your unit-tests for legacy.c as follows. First you write null implementations of the external dependencies you want to fake (more Null Object Pattern):
/* legacy.test.c: Part 1 */
static FILE * fopen(const char * restrict filename,
                    const char * restrict mode)
{
    return 0;
}
static size_t fwrite(const void * restrict ptr,
                     size_t size, size_t nelem,
                     FILE * restrict stream)
{
    return 0;
}
static int fclose(FILE * stream)
{
    return 0;
}

Then #include the source file. Note carefully that you're #including legacy.c here, not legacy.h, and that you're #defining UNIT_TEST so that legacy.c will have no #includes of its own:
/* legacy.test.c: Part 2 */
#define UNIT_TEST
#include "legacy.c"

Then write your tests:
/* legacy.test.c: Part 3 */
#include <assert.h>

void first_unit_test_for_legacy(void)
{
    /* writes "2:9:18" which is 6 chars */
    assert(legacy(2,9) == 6);
}

int main(void)
{
    first_unit_test_for_legacy();
    return 0;
}

When you compile legacy.test.c you will find your first problem: it does not compile! You have cut away all the #includes, which cuts away not only the function declarations but also the type definitions, such as FILE, which is a type used in the code under test, as well as in the real and the null fopen, fwrite, and fclose functions. What you need to do now is introduce a seam only for the functions:
/* stdio.seam.h */
#ifndef STDIO_SEAM_INCLUDED
#define STDIO_SEAM_INCLUDED

#include <stdio.h>

struct stdio_t
{
    FILE * (*fopen)(const char * restrict filename,
                    const char * restrict mode);
    size_t (*fwrite)(const void * restrict ptr,
                     size_t size, size_t nelem,
                     FILE * restrict stream);
    int (*fclose)(FILE * stream);
};

extern const struct stdio_t stdio;

#endif

Now you Lean On The Compiler and refactor legacy.c to use stdio.seam.h:
/* legacy.c */
#if defined(UNIT_TEST)
# define LOCAL(header) "nothing.h"
# define SYSTEM(header) "nothing.h"
#else
# define LOCAL(header) #header
# define SYSTEM(header) <header>
#endif

#include LOCAL(wibble.h)
#include LOCAL(stdio.seam.h)  /* <--- */
...
int legacy(int a, int b)
{
    FILE * stream = stdio.fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, "%d:%d:%d", a, b, a * b);
    stdio.fwrite(buffer, 1, sizeof buffer, stream);
    stdio.fclose(stream);
    return result;
}

Now you can structure your null functions as follows:
/* legacy.test.c: Part 1 */
#include "stdio.seam.h"

static FILE * null_fopen(const char * restrict filename,
                         const char * restrict mode)
{
    return 0;
}
static size_t null_fwrite(const void * restrict ptr,
                          size_t size, size_t nelem,
                          FILE * restrict stream)
{
    return 0;
}
static int null_fclose(FILE * stream)
{
    return 0;
}

const struct stdio_t stdio =
{
    .fopen  = null_fopen,
    .fwrite = null_fwrite,
    .fclose = null_fclose,
};

And voilà, you have a unit test. Now you have your knife in the seam you can push it in a bit further. For example, you can do a little spying:
/* legacy.test.c: Part 1 */
#include "stdio.seam.h"
#include <assert.h>
#include <string.h>

static FILE * null_fopen(const char * restrict filename,
                         const char * restrict mode)
{
    return 0;
}
static size_t spy_fwrite(const void * restrict ptr,
                         size_t size, size_t nelem,
                         FILE * restrict stream)
{
    assert(strcmp("2:9:18", ptr) == 0);
    return 0;
}
static int null_fclose(FILE * stream)
{
    return 0;
}

const struct stdio_t stdio =
{
    .fopen  = null_fopen,
    .fwrite = spy_fwrite,
    .fclose = null_fclose,
};

This approach is pretty brutal, but it might just allow you to create an initial seam which you can then gradually prise open. If nothing else it allows you to create characterisation tests to familiarise yourself with legacy code.
You'll also need to create a trivial implementation of stdio.seam.h that the real code uses:
/* stdio.seam.c */
#include "stdio.seam.h"
#include <stdio.h>

const struct stdio_t stdio =
{
    .fopen  = fopen,
    .fwrite = fwrite,
    .fclose = fclose,
};

The gcc -include compiler option might also prove useful:
-include file
Process file as if #include "file" appeared as the first line of the primary source file.
Using this you can create the following file:
/* include.seam.h */
#ifndef INCLUDE_SEAM
#define INCLUDE_SEAM

#if defined(UNIT_TEST)
# define LOCAL(header) "nothing.h"
# define SYSTEM(header) "nothing.h"
#else
# define LOCAL(header) #header
# define SYSTEM(header) <header>
#endif

#endif

and then compile with the -include include.seam.h option. This allows your legacy.c file to look like this:
#include LOCAL(wibble.h)
#include LOCAL(stdio.seam.h)
...
int legacy(int a, int b)
{
    FILE * stream = stdio.fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, "%d:%d:%d", a, b, a * b);
    stdio.fwrite(buffer, 1, sizeof buffer, stream);
    stdio.fclose(stream);
    return result;
}
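To tie the pieces together, here is a sketch of what the two builds might look like. This is an assumption on my part: it uses gcc, and the flags and file names are illustrative rather than prescriptive:

# unit-test build: legacy.test.c textually #includes legacy.c, and
# UNIT_TEST (needed at -include time) cuts legacy.c's own #includes
gcc -std=c99 -DUNIT_TEST -include include.seam.h legacy.test.c -o legacy.test
./legacy.test

# production build: the real headers and the real seam implementation
gcc -std=c99 -include include.seam.h -c legacy.c
gcc -std=c99 -c stdio.seam.c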
every teardrop is a waterfall
I was listening to Coldplay the other day and got to thinking about waterfalls.
The classic waterfall diagram is written something like this:
Analysis
    leading down to...
        Design
            leading down to...
                Implementation
                    leading down to...
                        Testing.
The Testing phase at the end of the process is perhaps the biggest giveaway that something is very wrong. In waterfall, the testing phase at the end is what's known as a euphemism. Or, more technically, a lie. Testing at the end of waterfall is really Debugging. Debugging at the end of the process is one of the key dynamics that prevents waterfall from working. There are at least two reasons:
The first is that of all the activities performed in software development, debugging is the one that is the least estimable. And that's saying something! You don't know how long it's going to take to find the source of a bug let alone fix it. I recall listening to a speaker at a conference who polled the audience to see who'd spent the most time tracking down a bug (the word bug is another euphemism). It was just like an auction! Someone called out "3 days". Someone else shouted "2 weeks". Up and up it went. The poor "winner" had spent all day, every day, 9am-5pm for 3 months hunting one bug. And it wasn't even a very large audience. This 'debug it into existence' approach is one of the reasons waterfall projects take 90% of the time to get to 90% "done" (done is another euphemism) and then another 90% of the time to get to 100% done.
The second reason is Why do cars have brakes?. In waterfall, even if testing was testing rather than debugging, putting it at the end of the process means you'll have been driving around during analysis, design and implementation with no brakes! You won't be able to stop! And again, this tells you why waterfall projects take 90% of the time to get to 90% done and then another 90% of the time to get to 100% done. Assuming of course that they don't crash.
In Test First Development, the testing really is testing and it really is first. The tests become an executable specification. Specifying is the opposite of debugging. The first 8 letters of specification are S, P, E, C, I, F, I, C.
A test is specific in exactly the same way a debugging session is not.
Coupling, overcrowding, refactoring, and death
I read The Curious Incident of the Dog in the Night Time by Mark Haddon last week. I loved it.
At one point the main character, Christopher, talks about this equation:
P(g+1) = α P(g) (1 - P(g))

This equation was described in the 1970s by Robert May, George Oster, and Jim Yorke. The gist is that it models a population over time, the population at generation g+1 being determined by the population at generation g. If there is no overcrowding then each member of the population at generation g, denoted P(g), produces α offspring, all of whom survive. So the population at generation g+1, denoted P(g+1), equals α P(g). The additional term, (1 - P(g)), represents feedback from overcrowding. Some interesting things happen depending on the value of α:
- α < 1: The population goes extinct.
- 1 < α < 3: The population rises to a value and then stays there.
- 3 < α < 3.57: The population alternates between boom and bust.
- 3.57 < α: The population appears to fluctuate randomly (all four regimes are sketched in the code below).
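Here's a minimal C sketch of the iteration so you can see the four regimes for yourself. The population P is normalised to lie between 0 and 1; the starting value of 0.5 and the generation counts are arbitrary choices of mine:

/* logistic.c: iterate P(g+1) = alpha * P(g) * (1 - P(g)) */
#include <stdio.h>

static void simulate(double alpha, double p)
{
    /* skip 1000 generations to get past any transient behaviour */
    for (int g = 0; g != 1000; g++)
        p = alpha * p * (1.0 - p);

    /* print the next four generations to show the long-run pattern */
    printf("alpha = %.2f:", alpha);
    for (int g = 0; g != 4; g++)
    {
        p = alpha * p * (1.0 - p);
        printf(" %.4f", p);
    }
    printf("\n");
}

int main(void)
{
    simulate(0.9, 0.5); /* extinction: tends to 0        */
    simulate(2.5, 0.5); /* settles to a fixed value      */
    simulate(3.2, 0.5); /* boom and bust: alternates     */
    simulate(3.9, 0.5); /* apparently random fluctuation */
    return 0;
}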
You can think about the process of writing software with this equation.
You can think of over-crowding as being analogous to over-coupling. We feel that a codebase is hard to work with, difficult to live in, if it resists our attempts to work with it. When it resists it is the coupling that is resisting.
You can also think of death as being analogous to refactoring. Just as death reduces overcrowding, so refactoring reduces coupling.
Refactoring is a hugely important dynamic in software development. Without refactoring a codebase can grow without check. Growing without check is bad. It leads to overcrowding. Overcrowding hinders growth.