8.6 Assignment - Lecture 8
In this assignment you will practice preparing and plotting data as groundwork for simple statistical analysis.
8.6.1 Part A - Individual
In this part, you will graph statistics about disabilities in a given country. The data are not in a ready-to-use form, so you must enter and format them manually in a spreadsheet. Once the data are entered, you will import the dataset into Matlab and use a series of plotting commands to visualize it.
1. Download the following PDF, taken from the United Nations Statistics Division. This file includes data on the number of persons with disabilities, broken down by age group and sex. Note that age groups vary between countries. Every country has a table that summarizes the total number of persons with disabilities; some countries also have additional tables for rural and urban regions. Use ONLY the table for the total number of disabilities and ignore the additional ones. This summary consists of three tables (both sexes, male, and female), and each contains the total population in each age group, the number of persons with disabilities in that age group, and the number of persons with disabilities per one thousand. For some countries one of these columns may be missing; this will not affect the assignment.
2. Find the country you were assigned for this assignment and enter its table into an Excel spreadsheet. Include column headers explaining what each column contains, and row headers for the different age groups. If age groups overlap, disregard the ones with the longest range. For example, given rows for age groups 0-5, 6-10, 11-15, and an additional one for 0-15, disregard the row for 0-15. You will have to enter the data for all three tables (both sexes, male, and female). However, because the data are redundant, enter only two of the three columns for each table: the columns titled "Total", "Number of disabled persons", and "Number per one thousand" are related, so any one of them can be computed from the other two. If your dataset has all three columns, simply leave one out and enter only two per table. If your country is missing a column to begin with, enter the available data. When a field in a table is empty and shows dots (...) instead of a value, enter a zero. In the end, your spreadsheet should have 7 columns: one for the row headers and two for each of the three tables. Save the spreadsheet as a CSV (Comma Separated Value) file, NOT as an Excel workbook.
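As a quick check on that redundancy: if T is the total population of an age group, D the number of persons with disabilities in it, and R the number per one thousand, the three columns satisfy the relations below, so any one of them can be recovered from the other two (the last form only works when R is nonzero).

```latex
R = 1000\,\frac{D}{T}, \qquad D = \frac{R\,T}{1000}, \qquad T = 1000\,\frac{D}{R}
```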
3. Import the dataset into Matlab. You should end up with one vector of row headers, one vector of column headers (both hold text, so cell arrays work well), and one numerical matrix of values.
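One possible way to do the import from the command line is sketched below; the Import Wizard or `readtable` would work as well. The sketch assumes a hypothetical file name, `disabilities.csv`, with a header row followed by one text column of age groups and six numeric columns. So that it runs on its own, it first writes a tiny placeholder file with made-up numbers (not UN data); with your real file you would skip that part.

```matlab
% --- Placeholder file so the sketch runs on its own (hypothetical values) ---
fname = 'disabilities.csv';                       % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, 'Age group,Both Total,Both Disabled,Male Total,Male Disabled,Female Total,Female Disabled\n');
fprintf(fid, '0-14,1000,20,510,11,490,9\n');
fprintf(fid, '15-39,2000,50,1000,27,1000,23\n');
fprintf(fid, '40+,1500,90,700,40,800,50\n');
fclose(fid);

% --- Import: one header line, then one text column and six numeric columns ---
fid = fopen(fname, 'r');
headerLine = fgetl(fid);                          % first line holds the column headers
allHeaders = strtrim(strsplit(headerLine, ','));  % strtrim drops a stray '\r' from Windows files
data = textscan(fid, ['%s' repmat('%f', 1, 6)], 'Delimiter', ',');
fclose(fid);

rowHeaders = data{1};                             % N-by-1 cell array of age-group labels
colHeaders = allHeaders(2:end);                   % headers of the six numeric columns
values = [data{2:end}];                           % N-by-6 numeric matrix
```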
4. Using Matlab, compute the missing columns. You may use whatever methods you like for this step, but you must end up with a full matrix of 9 columns (three for each of the three original tables). Be sure to update the column header vector to reflect the new columns.
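A minimal sketch of step 4, assuming you kept the "Total" and "Number of disabled persons" columns and entered them in the order both sexes, male, female. The `values` matrix holds made-up placeholder numbers standing in for the imported data, and the names `allData` and `fullHeaders` are illustrative. If you kept a different pair of columns, swap in the matching rearrangement of per-thousand = 1000 × disabled / total.

```matlab
% Placeholder inputs (hypothetical values) standing in for the imported data.
% ASSUMED column order: [Both total, Both disabled, Male total, Male disabled,
%                        Female total, Female disabled]
values = [1000 20  510 11  490  9;
          2000 50 1000 27 1000 23;
          1500 90  700 40  800 50];
groups = {'Both', 'Male', 'Female'};

allData = zeros(size(values, 1), 9);    % 3 tables x 3 columns each
fullHeaders = cell(1, 9);
for k = 1:3
    total    = values(:, 2*k - 1);
    disabled = values(:, 2*k);
    perThousand = 1000 * disabled ./ total;   % a row of zeros (from "...") gives 0/0 = NaN
    allData(:, 3*k-2:3*k) = [total, disabled, perThousand];
    fullHeaders(3*k-2:3*k) = {[groups{k} ' Total'], [groups{k} ' Disabled'], ...
                              [groups{k} ' Per 1000']};
end
```

The result is ordered Total, Disabled, Per 1000 for each table in turn, so `allData(:, 3)`, `allData(:, 6)`, and `allData(:, 9)` are the per-thousand columns for both sexes, male, and female.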
5. Using a subplot grid with 3 rows and 2 columns, plot the following 6 graphs. I suggest first creating each graph without subplot, plotting them one at a time in the same figure window. When you are confident about your results, add the subplot command before each plot. For all plots, include proper titles, x-axis labels, and y-axis labels. Include legends only when specified. Graphs:
- A bar graph, where each vertical bar represents an age group (in ascending order), and the y-axis represents the absolute number of persons with disabilities in that age group.
- A bar graph, where each vertical bar represents an age group (in ascending order), and the y-axis represents the number of persons with disabilities per one thousand in that age group. For this plot, you must set the axes to explicitly show the y-axis in the range 0..1000. Use the
command for this purpose.
- A bar graph in which every age group is represented by two vertical bars, one for male and one for female persons with disabilities. The y-axis again shows persons per one thousand and must be set to the range 0..1000. Include a legend to distinguish the differently colored bars.
- In this plot, you will visually compare two datasets. Select another country of your choice, preferably one with comparable age groups. Enter this country's column "Disabled persons, both sexes - number per one thousand" directly in Matlab (not Excel). Be sure to create a vector that maps its values onto the age groups of the first dataset; where age groups overlap, use good judgment. Using the 'plot' command (not a bar graph), plot the first and second datasets in one figure. The two lines should have different colors. Include a legend in this plot. You do not need to scale the axes.
- A plot that shows all absolute values for every age group: the total population, the male and female populations, the total number of persons with disabilities, and the numbers of male and female persons with disabilities. There should be 6 lines in your graph. Give each a different color, and include a legend.
- A pie chart that displays the share of the population made up of male and female persons with disabilities. The pie chart should have 3 pieces. Include a legend. You should work out on your own how to create a pie chart by using
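The six graphs can be laid out roughly as in the sketch below. It is one possible arrangement, not a reference solution: the age groups, population counts, and second-country values are hypothetical placeholders (in the assignment they come from your imported and completed matrix), and the pie chart assumes its three pieces are disabled males, disabled females, and persons without a disability. The y-range is fixed here with `axis([xmin xmax ymin ymax])`.

```matlab
% Placeholder inputs (hypothetical, not UN data).
rowHeaders = {'0-14'; '15-39'; '40+'};                    % age groups, ascending
total    = [1000  510 490; 2000 1000 1000; 1500 700 800];  % columns: both, male, female
disabled = [  20   11   9;   50   27   23;   90  40  50];
perK     = 1000 * disabled ./ total;                      % persons with disabilities per 1000
otherPerK = [15; 30; 70];   % hypothetical second country, mapped to the same age groups
x = (1:numel(rowHeaders))';                               % one x position per age group

figure;

% 1) Absolute number of persons with disabilities (both sexes)
subplot(3, 2, 1);
bar(x, disabled(:, 1));
set(gca, 'XTick', x, 'XTickLabel', rowHeaders);
title('Persons with disabilities, both sexes');
xlabel('Age group'); ylabel('Number of persons');

% 2) Per one thousand, with the y-axis fixed to 0..1000
subplot(3, 2, 2);
bar(x, perK(:, 1));
axis([0.5, numel(x) + 0.5, 0, 1000]);
set(gca, 'XTick', x, 'XTickLabel', rowHeaders);
title('Persons with disabilities per 1000, both sexes');
xlabel('Age group'); ylabel('Per 1000');

% 3) Grouped bars: an N-by-2 matrix gives two bars (male, female) per age group
subplot(3, 2, 3);
bar(x, perK(:, 2:3));
axis([0.5, numel(x) + 0.5, 0, 1000]);
set(gca, 'XTick', x, 'XTickLabel', rowHeaders);
title('Persons with disabilities per 1000, by sex');
xlabel('Age group'); ylabel('Per 1000');
legend('Male', 'Female');

% 4) Line plot comparing two countries
subplot(3, 2, 4);
plot(x, perK(:, 1), 'b-o', x, otherPerK, 'r-s');
set(gca, 'XTick', x, 'XTickLabel', rowHeaders);
title('Per 1000, both sexes: two countries');
xlabel('Age group'); ylabel('Per 1000');
legend('Country 1', 'Country 2');

% 5) Six absolute-value lines; each column of the N-by-6 matrix gets its own color
subplot(3, 2, 5);
plot(x, [total, disabled]);
set(gca, 'XTick', x, 'XTickLabel', rowHeaders);
title('Absolute values by age group');
xlabel('Age group'); ylabel('Number of persons');
legend('Total population', 'Male population', 'Female population', ...
       'Disabled, both sexes', 'Disabled, male', 'Disabled, female');

% 6) Pie chart. ASSUMPTION: the three pieces are disabled males, disabled
%    females, and everyone without a disability.
subplot(3, 2, 6);
slices = [sum(disabled(:, 2)), sum(disabled(:, 3)), ...
          sum(total(:, 1)) - sum(disabled(:, 1))];
pie(slices);
title('Share of total population');
legend('Disabled, male', 'Disabled, female', 'Without disability');
```

Passing a matrix to `bar` or `plot` draws one bar series or line per column, which is why graph 3 uses `perK(:, 2:3)` and graph 5 uses `[total, disabled]`.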
8.6.2 Part B - Group Component
As part of your final project for this course, you (your team) will be required to perform some statistical analysis related to your Community Service Learning project. You are free to choose any aspect, but make sure that it conveys meaningful information.
For example, if your project deals with designing a playground for disabled children, it may be helpful to know some of the following:
- What percentage of the population between 8 and 14 years of age has a disability?
- How has that figure changed over the past few years?
- What cities/regions/states are offering playgrounds specifically designed for children with disabilities?
- How have children with disabilities benefited from such playgrounds? For example, are there statistical data on improved integration of disabled children?
While this is just an example, your team should figure out what kind of analysis is realistic and possible. For the group portion of this assignment, you and your team members should identify a problem you would like to address in your final presentation, along with the data needed to address it. An obvious constraint is finding appropriate data for your specific problem. You will have to search the Internet thoroughly, perhaps ask your clients, or call agencies that may have this kind of information. You can use the online resources mentioned in this lecture.
8.6.3 Part C - Individual Component
After your team has gathered the necessary data, each team member should individually load the data into Matlab, using either the Import Wizard or a command-line function. Then pick several (2-3) aspects of the data and plot them. You can use the examples in Part A for guidance. Make sure that your plots contain proper labels and titles. Place all plots in a subplot grid of appropriate size.
For example, suppose your team has downloaded a dataset that records the number of disabled children by age, time period, region, household income level, and so on. You would import this text database into Matlab, pick several data pairs (e.g. time period versus number of disabled children), and plot them using Matlab.
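For Part C, the size of the subplot grid depends on how many plots you make. The sketch below picks a roughly square grid for any number of plots and fills it in a loop. The `pairs` cell array, its labels, and its numbers are hypothetical placeholders for columns of your team's dataset.

```matlab
% Each row describes one plot: {x data, y data, title, x-label, y-label}.
% The numbers below are made-up placeholders, not real survey data.
pairs = {
    2010:2014, [12 15 14 18 21], 'Disabled children by year',   'Year',           'Count';
    1:4,       [30 22 17 11],    'Disabled children by income', 'Income bracket', 'Count';
    1:3,       [40 25 35],       'Disabled children by region', 'Region index',   'Count'
};

nPlots = size(pairs, 1);
nCols  = ceil(sqrt(nPlots));       % roughly square grid
nRows  = ceil(nPlots / nCols);     % enough rows to hold every plot

figure;
for k = 1:nPlots
    subplot(nRows, nCols, k);
    plot(pairs{k, 1}, pairs{k, 2}, '-o');
    title(pairs{k, 3});
    xlabel(pairs{k, 4});
    ylabel(pairs{k, 5});
end
```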