

You can only ask “where are we?” and “where are we going?” when you know the answer to “where have we come from?”

In the same way, understanding the history of a software project can help us to understand its trajectory and structure. 

Source code management (SCM) analysis can reveal patterns that point to problems, and can help us understand how to build a better product in the future.

Mining the BindHQ SCM History

I joined BindHQ in Q3 2020, initially as a contractor. Before my official start date, I wanted to get as much background on the products as possible.

In addition to reading docs, reviewing tickets, running the source code and looking at the finished product, I ran a mixture of off-the-shelf and handcrafted tools to perform some broad analysis of the history of the main projects that comprise the BindHQ product:

  1. Hand-crafted scripts
  2. GitHub’s analytics
  3. Hercules and Labours: https://github.com/src-d/hercules
    1. I used this to generate an “overwrite matrix” to see which users were overwriting each other’s code
  4. Git Hammer: https://github.com/asharov/git-hammer
  5. Git of Theseus: https://github.com/erikbern/git-of-theseus
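The hand-crafted scripts themselves are not reproduced here. As a minimal sketch of the kind of script that can feed an "authors over time" chart, the following counts commits per author per month from `git log` output. The function names and the optional `paths` argument (for example, a hypothetical `tests/` directory to approximate test authorship) are assumptions for illustration, not the scripts actually used.

```python
import subprocess
from collections import Counter


def parse_author_months(log_text):
    """Count commits per (author, "YYYY-MM") pair.

    Expects one "author|YYYY-MM" line per commit, as produced by
    `git log --format=%an|%ad --date=format:%Y-%m`.
    """
    counts = Counter()
    for line in log_text.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Split on the last "|" so author names containing "|" still parse.
        author, _, month = line.rpartition("|")
        counts[(author, month)] += 1
    return counts


def author_months_for_repo(repo_path, paths=()):
    """Run git in repo_path and return per-author, per-month commit counts.

    `paths` optionally restricts the log to some paths (hypothetical
    example: ("tests/",) to look only at test authorship).
    """
    cmd = [
        "git", "-C", repo_path, "log",
        "--format=%an|%ad", "--date=format:%Y-%m",
        "--", *paths,
    ]
    log_text = subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout
    return parse_author_months(log_text)
```

Calling `author_months_for_repo(".")` inside a checkout returns a `Counter` that can be handed to any plotting library. Running it a second time with a test directory as `paths` gives a rough comparison between overall authorship and test authorship.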

It’s fascinating to see authors contributing to projects over time and compare it with the test authorship:

[Figure: contributions by author over time]

[Figure: test contributions by author over time]

We could see here, for example, that one person contributed a vast number of tests over time.

One of the most interesting plots was the overwrite plot which shows who overwrites whose code:

[Figure: overwrite matrix showing who overwrites whose code]

Another fun metric is the percentage of lines still present in code after n years, which is a feature that Git of Theseus offers:

[Figure: percentage of lines still present in the code after n years]

It can be very interesting to compare these charts for different components of the application!
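For a quick per-file look at the same idea, without the full history walk that Git of Theseus performs, one option is to group a file's current lines by the year they were last written. The sketch below parses the output of `git blame --line-porcelain <file>`. It is a simpler snapshot than a true survival curve (it only sees lines that are still alive), and the function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def surviving_lines_by_year(porcelain_text):
    """Count a file's current lines by the year they were last written.

    Expects the output of `git blame --line-porcelain <file>`, where every
    blamed line carries an "author-time <epoch seconds>" header.
    """
    years = Counter()
    for line in porcelain_text.split("\n"):
        # Source content lines are prefixed with a tab; skip them so code
        # that happens to contain "author-time" is never miscounted.
        if line.startswith("\t"):
            continue
        if line.startswith("author-time "):
            epoch = int(line.split(" ", 1)[1])
            year = datetime.fromtimestamp(epoch, tz=timezone.utc).year
            years[year] += 1
    return years


def year_shares(years):
    """Convert per-year line counts into fractions of the file."""
    total = sum(years.values())
    if not total:
        return {}
    return {year: count / total for year, count in sorted(years.items())}
```

Comparing the resulting shares for files in different components gives a rough sense of which parts of the codebase are mostly old code and which are being rewritten frequently.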

Seeing code removed from our accounting package was interesting:

[Figure: code removed from the accounting package over time]

Was mining our SCM History useful?

In isolation, it was interesting but there weren’t any earth-shattering insights upfront, except that some pieces of the project seemed to be held up by a single person (which is a business risk we have subsequently addressed).
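One rough way to spot that kind of single-person dependency is to measure, for each top-level directory, what share of file changes came from its most active author. The sketch below assumes the output of `git log --name-only --format=@@@%an`. The `@@@` marker is an arbitrary choice made here to tell author lines from file paths, and the function name is hypothetical. It is not the analysis that produced the charts above.

```python
from collections import Counter, defaultdict

MARKER = "@@@"  # arbitrary prefix chosen here to flag author lines


def top_author_share(log_text):
    """Map each top-level directory to (dominant author, share of changes).

    Expects `git log --name-only --format=@@@%an` output: an author line
    per commit, followed by the paths that commit touched.
    """
    touches = defaultdict(Counter)
    author = None
    for line in log_text.split("\n"):
        if line.startswith(MARKER):
            author = line[len(MARKER):]
        elif line and author is not None:
            # Files in the repository root are grouped under ".".
            top_level = line.split("/", 1)[0] if "/" in line else "."
            touches[top_level][author] += 1

    result = {}
    for directory, by_author in touches.items():
        name, count = by_author.most_common(1)[0]
        result[directory] = (name, count / sum(by_author.values()))
    return result
```

A directory where one author accounts for nearly all changes is not necessarily a problem, but it is a good candidate for the kind of follow-up conversation described below.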

The deeper value was as a conversation starter; employees were enthusiastic about talking about how the projects came to be, and it was really helpful to have conversations like “what led to that huge amount of code being removed from the accounting package in mid-2019?”

As an outsider coming in, this helped me to establish when features were added or removed, and under what circumstances.

What was the output?

Having used the data from the history as a jumping-off point for conversations, I delivered a development timeline of one of BindHQ’s products.

That was, in my view, a fairly valuable document because it helped me understand where we came from, so we could move forward.

Wrapping up

A rich and sophisticated understanding of the world benefits from philosophical, factual, scientific, theological, literary, physical, and historical insight.

Likewise, one component of understanding a software project is getting to grips with its history, and there are useful tools that help to put the current state of a software system in context.

SCM analysis is only one vector of source code analytics - it is never the “full story”, but sometimes it shows useful patterns, and is a great conversation starter.
