The Ladybird Guide to Business IntelligenceJanuary 26, 2011 11:33 am work
Recently I have found myself frustrated by the lack of a very simple overview for Business Intelligence explaining what problems it solves, and how.
For example, the Pentaho BI platform FAQ has a promising first question: “What is a business intelligence (BI) platform?” The answer is typical of BI overviews I have seen:
A comprehensive development and runtime environment for building complete solutions to business intelligence problems. The Pentaho BI Platform is the infrastructure and core services that integrate business intelligence components to complete the BI Suite. This includes the infrastructure necessary to build, deploy, execute and support applications.
I don’t know about you, but that gives me more questions than answers. What type of problems are business intelligence problems? What are the core services provided by a BI platform? What are BI components, and a complete BI suite? In short, what does it do?
The wikipedia article on business intelligence is a bit better, but still gets into heavy acronyms quite early.
I think I have figured out what Don Norman calls a conceptual model which is about right, so for those who have struggled as I have recently, here is the Ladybird guide to Business Intelligence.
What problems does BI solve?
Let’s say you are the CEO of a company, and you want to track what the costs of the company are, across payroll, purchasing, marketing and sales, overall and by division. You also want to track revenues by division, product line, market and month. For each variable, you’d like to drill down when you see a figure that looks odd. Payroll in Asia increased 20% this year – did we buy a company? Are there savings to be made?
All of this information spans dozens of different computer systems, applications, databases. What you want is one application to rule them all, from which you can get nice graphical clickable data.
Let’s say you’re a free software project manager or community manager. You have lots of infrastructure for people working on the project – source control, mailing lists, forums, translation infrastructure, documentation, bug tracking, downloads, …
You want to know if your community is growing, shrinking or stagnant. You’d like to know if translators are up, and spot when something is up – we lost 3 Thai translators last cycle, does the Thai translation team have a problem? Is there a problem with wiki spam? A correlation between people active on the forums and commits to the project? Some of these questions span different applications, systems, and databases. What you want is one application to rule them all, where you can get a quick overview of what’s happening in the community, and click on something to drill down into the data, or create complex queries to spot correlations and patterns across different apps.
BI software is ideally suited to helping in both of these situations.
How does it work?
Very simply, a BI platform is a web application that allows you to create queries and visualise the results across a variety of data sources. At its simplest, you bring big lumps of data together and extract some useful numbers from it. If you’ve ever used a pivot table in a spreadsheet application, you’ve written a BI query.
Now we get into the acronyms and the jargon. Here’s a quick lexicon of commonly used BI terms:
- Extract/Transform/Load – An ETL module allows you to script and automate the extraction of data from a funky data source (say, CSV files on a server, an auto-generated spreadsheet, or screen-scraping data from a HTTP query, or just an SQL database), and transform it into some other format (typically basic transformations like joins, mapping inputs to database fields, or applying simple arithmetic to convert to an agreed unit), and then store the result in a database.
- Online Analytical Processing – a fancy name for “queries”. There is a de facto standard query format called MDX and the database needs to be optimised for “multidimensional queries” (aka joins – like pivot tables in a spreadsheet).
- Data Warehouse
- A fancy name for database.
- The presentation of the results of queries in a graphical way.
In brief, then, a BI suite provides you with a way to suck in data from a variety of sources, store the data (if you need to) in a custom database which is optimised for querying across different data sources, a nice way to define the queries in which you are interested, and then present the results of those queries in a nice graphical way.
If you don’t need to do any transformation of data, and you can operate directly on SQL databases, then you can typically provide the BI platform access to them directly. If you have any unusual data sources, or want to transform data, you will need an ETL module. If you are dealing with a lot of data and want to optimise query time, you might need a specific OLAP server. A query editor will help you create queries to get the information you want out of your data. You will need a reporting module to convert query results from raw tabular form to pie charts, bar charts and the like. And the BI server provides hooks for all of these various modules to work together, sucking in, storing, manipulating and presenting data in interesting ways.
Is this all right?
I would love to know if my mental model is flawed – so if I’m missing anything important, or I’ve said something which is a pile of rubbish, please do add a comment and let me know.
I know how hard it can be to cut through the jargon in an area where it’s ubiquitous and the first step in enterprise software is usually the hardest, so hopefully this will be useful to someone other than myself.