M4 - Data Classification

 


[Map layout: Comparison of Classification Methods for Census Data (Normalized)]

This laboratory assignment did a phenomenal job of showing me the true power of ArcGIS Pro and how incredible its capabilities are. This module emphasized the importance of understanding the various classification methods and what they mean for the data and the way the data are presented. The module worked in tandem with this week's laboratory assignment, as we were able to experiment with the classification methods in ArcGIS Pro and determine which works best for displaying the data relevant to the exercise.

Classification methods and pros/cons of each:
Equal Interval
This classification method divides the full range of attribute values into the designated number of equally wide classes. For example, if one desires five classes and the data range from 0 to 500, the equal interval method produces five classes of width 100: 0-100, 101-200, 201-300, 301-400, and 401-500. This method excels at showing a value's position relative to the rest of the data (for example, whether something falls in the top fifth of its counterparts). The equal interval method fails, however, with unequally distributed data: because the class widths are fixed, it may group together values that are far apart while separating values that are close together.
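As a rough illustration of the idea (not ArcGIS Pro's internal implementation), a minimal NumPy sketch of equal interval breaks might look like this; the sample values are hypothetical:

```python
import numpy as np

# Hypothetical attribute values, e.g. residents per census tract
values = np.random.default_rng(0).uniform(0, 500, size=200)

k = 5  # desired number of classes
lo, hi = values.min(), values.max()
breaks = np.linspace(lo, hi, k + 1)          # k + 1 equally spaced boundaries
classes = np.digitize(values, breaks[1:-1])  # class index 0..k-1 for each value
```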
Quantile
This method divides the features evenly among all classes, so each class contains the same number of features. It reveals patterns in linearly distributed data and copes with data that have large gaps or outliers, and it guarantees that no class is left empty. However, it tends to conceal how often values actually occur by folding outliers into the extreme classes, and it may place values that are close in range into different classes simply to keep the class counts equal.
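Under the same assumptions (a hypothetical NumPy array standing in for a tract attribute), quantile breaks can be sketched by cutting at evenly spaced percentiles:

```python
import numpy as np

values = np.random.default_rng(1).normal(100, 30, size=200)  # hypothetical data

k = 5
# Boundaries at the 0th, 20th, 40th, 60th, 80th, and 100th percentiles,
# so each class receives (roughly) the same number of features.
breaks = np.quantile(values, np.linspace(0, 1, k + 1))
classes = np.digitize(values, breaks[1:-1])
```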
Standard Deviation
The Standard Deviation method works by calculating the mean and standard deviation of the dataset and then breaking the data into classes at multiples of the standard deviation from the mean (e.g., ½ s.d., 1 s.d., 2 s.d., and so on). This method reveals how the data are distributed by showcasing how values vary from the mean. However, because the classes are expressed in standard deviations rather than in the original units, it is difficult to read concrete figures off the map when analyzing the data.
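A comparable sketch for standard deviation classes, again on hypothetical data, places the boundaries at fixed multiples of the standard deviation around the mean:

```python
import numpy as np

values = np.random.default_rng(2).normal(100, 30, size=200)  # hypothetical data

mean, sd = values.mean(), values.std()
# Boundaries at -1.5, -0.5, +0.5, and +1.5 standard deviations from the mean,
# yielding five classes labeled by distance from the mean rather than by value.
breaks = mean + sd * np.array([-1.5, -0.5, 0.5, 1.5])
classes = np.digitize(values, breaks)
```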
Natural Breaks
The Natural Breaks (Jenks) method works by utilizing groupings already present in the dataset. It places class boundaries where the largest gaps between values occur, so that values within a class are as similar as possible. This method reveals "natural" groupings in the data and is user-friendly, as the algorithm defines the designated number of classes on its own. However, the underlying optimization is impractical to perform by hand and cannot be carried out with a calculator alone.
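Because the Jenks optimization is tedious by hand, it is usually delegated to software. One sketch, assuming the PySAL mapclassify package is installed (an alternative to, not the implementation inside, ArcGIS Pro):

```python
import numpy as np
import mapclassify  # PySAL's classification library; assumed available

values = np.random.default_rng(3).normal(100, 30, size=200)  # hypothetical data

nb = mapclassify.NaturalBreaks(values, k=5)  # Jenks natural breaks, five classes
print(nb.bins)  # upper boundary of each class
print(nb.yb)    # class index assigned to each value
```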

Overall, I believe the quantile method is superior to the others for displaying this data to an audience targeting the senior population. The method breaks the data into a designated number of classes (in this case, five), with an equal number of features in each class. This proved effective in highlighting the senior population, which was often the smallest of the subgroups of individuals in a census tract. With the quantile method, one may clearly identify the census tracts with high percentages of seniors, because even the highest class is guaranteed a full share of tracts (for the reasons previously outlined).
The population count normalized by area is the data presentation I would utilize. Normalizing the count accounts for the size of each census tract rather than only the number of residents in it. This matters for spatial analysis, because there is often more to a situation than first meets the eye: solely assessing the percentage of residents over 65 shows only half of the picture. Census tracts vary significantly in size, and those differences could (and should) carry significant weight in decision-making. The percentage presentation alone fails to account for the factors that normalization captures.
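To make the normalization concrete, here is a minimal sketch with made-up tract figures; the arrays and numbers are hypothetical, not drawn from the lab data:

```python
import numpy as np

# Hypothetical census-tract attributes
pop_over_65 = np.array([1200.0, 340.0, 980.0, 150.0])  # senior residents per tract
area_sq_km = np.array([12.5, 3.1, 48.0, 0.9])          # tract area in square km

# Dividing the count by the area yields a density that is comparable
# across tracts of very different sizes.
senior_density = pop_over_65 / area_sq_km
print(senior_density)  # seniors per square kilometer
```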