When Statistics become stories
I am thinking of creating an exercise around this for the AI-ML workshop to be conducted later this month at NID Gandhinagar for New Media Design students.
At this stage of the workshop, we would have covered the basic concepts of programming and Jupyter notebooks.
Section One - Introducing Pandas
I got the French population and age distribution data from here, and cleaned it into the following structure:
Out:
   year   males  females   total  age
0  2018  364155   347749  711904    0
1  2017  370453   355472  725925    1
2  2016  378518   363162  741680    2
3  2015  387906   372402  760308    3
4  2014  399232   387042  786274    4
We would start by loading this data and introduce the concepts of:
- Reading the data(in this case from
- Exploring the structure of the data (DataFrame) and accessing it by rows and columns.
- Trying basic operations on the data to answer questions such as: for which age ranges does the male population exceed the female population, and vice versa?
- Exploring the use of ? to access the documentation of a method or attribute.
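The steps in this list could be sketched roughly as follows. This is a minimal sketch, assuming the cleaned data is available as CSV (the actual file and format are not stated here). The five sample rows shown earlier are inlined so the cell runs without any file, and names such as `csv_text` and `more_males` are only illustrative.

```python
import io

import pandas as pd

# Assumption: the cleaned data is CSV. These are the five sample rows
# shown above, inlined so the sketch runs without an external file.
csv_text = """year,males,females,total,age
2018,364155,347749,711904,0
2017,370453,355472,725925,1
2016,378518,363162,741680,2
2015,387906,372402,760308,3
2014,399232,387042,786274,4
"""

# Reading the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Exploring the structure: size, column names, one row, one column.
print(df.shape)
print(df.columns.tolist())
print(df.loc[0])          # first row, selected by index label
print(df["males"])        # a single column (a Series)

# Basic operations: ages where one group outnumbers the other.
more_males = df[df["males"] > df["females"]]
more_females = df[df["females"] > df["males"]]
print("More males at ages:", more_males["age"].tolist())
print("More females at ages:", more_females["age"].tolist())

# In a Jupyter notebook, `df.head?` shows the documentation for head().
# Outside a notebook, help(df.head) gives the same text.
```

With the full dataset instead of these five rows, the same two filters would answer the male/female question across the whole age range.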
Section Two - Plotting the data
After having played around with the data and different methods, we would shift to plotting it and see whether the plots can answer the questions we explored in the previous section.
I am thinking of introducing them to plotting pie charts, bar graphs, and line plots. The age distribution of a country is generally represented as a population pyramid, so here we would try to plot that pyramid for the French population.
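One possible way to draw such a pyramid is with two sets of horizontal bars, males to the left of zero and females to the right. The sketch below assumes matplotlib is used for plotting (the plotting library is not fixed yet) and uses only the five sample rows from Section One, so the result is just the bottom of the pyramid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Sample rows from Section One (ages 0 to 4 only).
df = pd.DataFrame({
    "age": [0, 1, 2, 3, 4],
    "males": [364155, 370453, 378518, 387906, 399232],
    "females": [347749, 355472, 363162, 372402, 387042],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Males are plotted as negative widths so their bars extend to the left.
ax.barh(df["age"], -df["males"], color="tab:blue", label="Males")
ax.barh(df["age"], df["females"], color="tab:orange", label="Females")

# Show absolute values on the x axis, hiding the negative sign.
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{abs(x):,.0f}"))

ax.set_xlabel("Population")
ax.set_ylabel("Age")
ax.set_title("Population pyramid (sample rows only)")
ax.legend()
```

In a notebook the figure is displayed after the cell runs. The same code applied to the full table would give the complete pyramid.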
Section Three - Exercise for students
A similar age distribution dataset for the UK population is available here. We would ask the students to apply what they have learned in the two sections above and plot the population pyramid for the UK.
Section Four - Census and age distribution of the Indian population
Akash Gutha has a repository and an IPython notebook that:
- Fetches the relevant data (an Excel sheet) from the Indian Census site.
- Cleans up the data, assigns names to the columns, and produces related plots.
We would work on top of those steps to:
- Cover how the Census releases data, along with the accompanying guide that helps people make sense of it.
- Plot the population pyramid for India.
- Observe the differences between the population distributions of India and the UK/France.
- Have an open discussion about the spikes at certain ages.
- Share the screenshots from Gurman's presentation that explains the spikes.
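When comparing pyramids across countries, the absolute counts differ a lot, so it may help to convert each age group to a percentage of the total population first. The helper below is a hypothetical sketch, not something from Akash's notebook. `to_percent_share` is an illustrative name, and it assumes each country's table has the same `age`, `males`, and `females` columns as the French data. It is demonstrated only on the French sample rows.

```python
import pandas as pd


def to_percent_share(df):
    """Return males/females per age as a percentage of the whole population.

    Hypothetical helper: assumes columns named age, males, and females.
    """
    grand_total = df["males"].sum() + df["females"].sum()
    out = df[["age"]].copy()
    out["males_pct"] = 100 * df["males"] / grand_total
    out["females_pct"] = 100 * df["females"] / grand_total
    return out


# Sample rows from Section One (ages 0 to 4 only), so the shares here are
# relative to this small sample, not to the full French population.
france_sample = pd.DataFrame({
    "age": [0, 1, 2, 3, 4],
    "males": [364155, 370453, 378518, 387906, 399232],
    "females": [347749, 355472, 363162, 372402, 387042],
})

france_share = to_percent_share(france_sample)
print(france_share.round(2))
```

Plotting these percentage columns instead of the raw counts would put India, the UK, and France on the same scale, which should make differences in shape, and the spikes at certain ages, easier to see.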
At this point we would conclude the session on handling data and information visualization. Possibly we will follow it with more hands-on exercises for the students.