is an excellent book by Gene Kim, Jez Humble, Patrick Debois, and John Willis
(ISBN 978-1-942788-00-3).
As usual I'm going to quote from a few pages.
Make infrastructure easier to rebuild than to repair.
The average age of a Netflix AWS instance is twenty-four days.
Interrupting technology workers is easy, because the consequences are invisible
to almost everyone.
In complex systems, adding more inspection steps and approval processes
actually increases the likelihood of future failures.
Over the following year, they eliminated testing as a separate phase of work,
instead integrating it into everyone's daily work. They doubled the features
being delivered per month and halved the number of defects.
Bureaucracies are incredibly resilient and are designed to survive adverse
conditions - one can remove half the bureaucrats, and the process will
still survive.
When we have a tightly coupled architecture, small changes
can result in large scale failure.
Our deployment pipeline infrastructure becomes as foundational for
our development processes as our version control infrastructure.
If we find that unit or acceptance tests are too difficult and expensive
to write and maintain, it's likely that we have an architecture that is
too tightly coupled.
Any successful product or organization will necessarily evolve over its
life cycle... eBay and Google are each on their fifth entire rewrite of
their architecture from top to bottom.
... which can lead to the unfortunate metric of mean time until declared innocent.
The principle of small batch sizes also applies to code reviews.
80% of MTTR (mean time to recovery) is spent trying to determine what changed.
High performing DevOps organizations will fail and make mistakes more often...
If high performers are performing thirty times more frequently but with only
half the change failure rate, they're obviously having more failures.
[Roy Rapoport, Netflix]
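The arithmetic behind that remark can be sketched in a few lines. Only the two ratios (thirty times the frequency, half the change failure rate) come from the quote; the baseline of 100 deployments and a 10% failure rate is a made-up figure for illustration, and the function name is hypothetical.

```python
def expected_failures(deployments: int, change_failure_rate: float) -> float:
    """Expected number of failed changes for a given number of deployments."""
    return deployments * change_failure_rate


# Hypothetical baseline organization: illustrative numbers, not from the book.
baseline_deploys = 100
baseline_rate = 0.10

# High performer: thirty times the deployments at half the change failure
# rate (these two ratios are the ones given in the quote).
high_deploys = baseline_deploys * 30
high_rate = baseline_rate / 2

baseline_failures = expected_failures(baseline_deploys, baseline_rate)
high_failures = expected_failures(high_deploys, high_rate)

# How many times more failures the high performer sees in absolute terms.
ratio = high_failures / baseline_failures
print(f"baseline: {baseline_failures:.0f} failures, "
      f"high performer: {high_failures:.0f} failures, "
      f"ratio: {ratio:.1f}x")
```

Whatever baseline is chosen, the ratio works out to 30 × 0.5 = 15: the high performer fails less often per change but more often in absolute terms.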
Spiders repair rips and tears in the web as they occur, not waiting for
the failures to accumulate. [Dr Steven Spear]