ORIGINALLY POSTED ON IXA.IO
We use Redmine at Infoxchange to manage our product backlog using the Agile plugin from Redmine CRM. While this is generally good enough for making sure we don’t lose any cards and has features like burndown charting you still have to keep your own metrics. This is sort of annoying because you’re sure the data is in there somehow and what do you do when you want to explore a new metric?
This is where iPython Notebooks and the data analysis framework Pandas come in. iPython Notebooks are an interactive web-based workspace where you can intermix documentation and Python code, with graphs and tables you produce output directly into the document. Individual “cells” of the notebook are cached so you can work on part of the program, and experiment (great for data analysis) without having to run big slow number crunching or data download steps.
Pandas is a library for loading and manipulating data. It is based on the well-known numpy and scipy scientific packages and extends them to be able to load data from almost any file type or source (i.e. a Redmine CSV straight from your Redmine server) without having to know much programming. Your data becomes intuitively exposed. It also has built-in plotting and great integration with iPython Notebooks.<!–more–>The first metric I wanted to collect was to study the relationship between our story estimates (in points), our tasking estimates (in hours) and reality. First let’s plot our velocity. It is a matter of saving a custom query whose CSV export URL I can copy into my notebook. This means I can run the notebook at any time to get the latest data.
Remember that iPython notebooks cache the results of cells so I can place accessing Redmine into its own cell and I won’t have to constantly be downloading the data. I can even work offline if I want.
import pandas
data = pandas.read_csv('https://redmine/projects/devops/issues.csv?query_id=111&key={key}'.format(key=key))
Pandas exposes the result as a data frame, which is something we can manipulate in meaningful ways. For instance, we can extract all of the values of a column withdata['column name']
. We can also select all of the rows that match a certain condition:
data[data['Target version'] != 'Backlog']
We can go a step further and group this data by the sprint it belongs to:
data[data['Target version'] != 'Backlog']\
.groupby('Target version')
We can then even create a new series of data by summing the points column each group to find our velocity (as_index means we wish to make the sprint version the row identifier for each row, this will be useful when plotting the data):
data[data['Target version'] != 'Backlog']\
.groupby('Target version', as_index=True)\
['Story Points'].sum()
If you’ve entered each of these into an iPython Notebook and executed them you’ll notice that the table appeared underneath your code block.
We can use this data series to do calculations. For example to find the moving average of our sprint velocity we can do this:
velocity = data[data['Target version'] != 'Backlog']\
.groupby('Target version', as_index=True)\
['Story Points'].sum()
avg_velocity = pandas.rolling_mean(velocity, window=5, min_periods=1)
We can then plot our series together (note the passing of ax to the second plot, this allows us to share the graphs on one plot):
ax = velocity.plot(kind='bar', color='steelblue', label="Velocity",
legend=True, title="Velocity", figsize=(12,8))
avg_velocity.plot(ax=ax, color='r', style='.-', label="Average velocity (window=5)",
legend=True)
ax.xaxis.grid(False)
You can view the full notebook for this demo online. If you want to learn more about Pandas read 10 minutes to Pandas. Of course what makes iPython Notebooks so much more powerful than Excel, SPSS and R is our ability to use any 3rd party Python package we like. Another time I’ll show how we can use a 3rd party Python-Redmine API to load dataframes to extract metrics from the individual issues’ journals.