This article was originally published to a blog which has been decommissioned.

When building a histogram, how many bins should it have?  There’s no definitive answer to the question, but the number of bins used for a histogram can drastically impact how visualized data is interpreted.  Details such as data distribution patterns and sample quantity can impact how data is displayed in bins, and quite often the number of bins is arbitrarily chosen.

In a Business Intelligence solution, determining the number of bins is more difficult than with static reports, since the volume and distribution of data can change with every update.  This article shares one method for building bins that automatically adjust to changes in the total quantity and range of the data.

Numerous methods exist for determining the number of bins in a histogram.  For this example, I’ve chosen Sturges’ formula.  I wish that I had a mathematical basis for choosing it, but unfortunately I do not.  I chose Sturges’ formula because it works and it can easily be coded into a BI solution.  The equation is: k = ⌊log₂(N) + 1⌋, where k represents the number of bins, N represents the number of data points, and the brackets mean the result is rounded down to a whole number.  The equation works well with almost any quantity of data in a typical BI solution.  A count of 100 will have 7 bins and a count of 1 billion will have 30 bins.  Either of those bin quantities is visually consumable on a histogram.
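As a rough illustration of the arithmetic, here is a minimal Python sketch. It is not the BI implementation from the original article; the function names `sturges_bin_count` and `bin_edges` are hypothetical, and the equal-width bins spanning the minimum to the maximum value are an assumption made for this example.

```python
import math


def sturges_bin_count(n: int) -> int:
    """Bin count from the formula above: floor(log2(n) + 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # floor(log2(n) + 1) equals floor(log2(n)) + 1 because 1 is an integer.
    return math.floor(math.log2(n)) + 1


def bin_edges(values: list[float]) -> list[float]:
    """Equal-width bin edges from min(values) to max(values) (assumed layout)."""
    k = sturges_bin_count(len(values))
    low, high = min(values), max(values)
    width = (high - low) / k
    # k bins need k + 1 edges; the last edge is pinned to the maximum.
    return [low + i * width for i in range(k)] + [high]


print(sturges_bin_count(100))            # 7
print(sturges_bin_count(1_000_000_000))  # 30
```

Because both the bin count and the edges are recomputed from the data itself, rerunning the calculation after a data refresh picks up any change in row count or value range.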



