3 Best (Often Better) Alternatives To Histograms

Avoid the most dangerous pitfall of histograms

Bex T.
Towards AI
10 min read · Sep 12, 2023

Image by me with Leonardo AI

Binning Bias, The Biggest Flaw of Histograms

Histograms are probably the first plot you used when you started out as a data scientist. They are intuitive and make it easy to see the shape of a distribution.

However, as you gain experience, you will find that histograms are not so peachy. A histogram groups values into intervals called bins, and the height of each bar shows how many points fall into that bin. Consider this example:

Image by author
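
To make the counting behind such a plot concrete, here is a minimal pure-Python sketch of equal-width binning. The `bin_counts` helper and the synthetic `scores` list are assumptions made for illustration, not the data or code behind the figure; in practice a library routine such as `numpy.histogram` does the same job.

```python
import random


def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical exam scores (NOT the article's data), clipped to 0..100.
rng = random.Random(0)
scores = [min(100.0, max(0.0, rng.gauss(70, 10))) for _ in range(500)]

edges, counts = bin_counts(scores, 10)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

Each printed row corresponds to one bar: the interval is the bar's position on the x-axis and the count is its height.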

From this histogram, we can immediately see that most scores are between 60 and 80. Let’s see what happens if we change the number of bins from 10 to 20:

Image by author

The same trend is still apparent. Let’s increase the bin count again, this time from 20 to 40:
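
One way to explore this kind of sensitivity yourself is to re-bin the same values at several bin counts and compare the tallest bar each time. The sketch below is only an illustration under stated assumptions: the `histogram` helper and the randomly generated `scores` are hypothetical and are not the data behind the figures in this article.

```python
import random
from collections import Counter


def histogram(values, n_bins):
    """Return (left_edge, right_edge, count) for n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum goes into the last bin.
    idx = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    return [(lo + i * width, lo + (i + 1) * width, idx.get(i, 0))
            for i in range(n_bins)]


# Hypothetical exam scores (NOT the article's data), clipped to 0..100.
rng = random.Random(42)
scores = [min(100.0, max(0.0, rng.gauss(70, 10))) for _ in range(500)]

summary = {}
for n_bins in (10, 20, 40):
    bins = histogram(scores, n_bins)
    # The tallest bar is the interval a reader's eye is drawn to.
    left, right, count = max(bins, key=lambda b: b[2])
    summary[n_bins] = (left, right, count)
    print(f"{n_bins:>2} bins: tallest bar covers "
          f"{left:.1f} to {right:.1f} with {count} points")
```

The data never changes between iterations; only the bin count does. Any difference in the reported interval or bar height is therefore produced by the binning choice alone.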

Responses (16)

So in summary:
- PMF: discrete data only (any distribution)
- CDF: any data (any distribution)
- KDE: continuous data (any distribution)
Correct?

Pure gold. Thank you for a good article on a common but unexplored problem.

Thank you for this article. I like that you emphasize the importance of looking at the data before summarising it. However, I do not agree with all of your points:
- A well-crafted histogram is purely descriptive. It does not parametrise the data in…