Module 1.3 - Data Quality: Assessment
Roadway Completeness Comparison
This week's laboratory assignment concluded the course's focus on data quality, specifically data quality comparison and assessment. I particularly enjoyed this module because I understand how critical data accuracy is and what it can mean for analysis and decision making. Minor differences in data quality can change the decisions made on a particular matter. For this analysis, two road networks were provided, both located in Jackson County, Oregon. While they appear similar from a distance, the analysis highlighted many important differences. Looking at the big picture, I determined that the TIGER 2000 road network is more complete when completeness is measured by length, as it has a longer total length than the Street Centerlines network. Using the Calculate Geometry tool, I determined the following total road network lengths:
Street Centerlines = 10,805.8 km
TIGER Roads = 11,382.7 km
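As a quick check on the two totals above, the gap between the networks can be computed directly. This is a minimal sketch: the variable names are illustrative, and dividing by the Street Centerlines total is an assumed baseline, not something the totals themselves dictate.

```python
# Total network lengths reported above, in kilometers.
street_centerlines_km = 10805.8
tiger_roads_km = 11382.7

# Absolute gap between the two networks.
gap_km = tiger_roads_km - street_centerlines_km

# Relative gap, expressed against the Street Centerlines total
# (assumed baseline; dividing by the TIGER total gives a slightly smaller figure).
gap_pct = gap_km / street_centerlines_km * 100

print(f"TIGER is longer by {gap_km:.1f} km ({gap_pct:.1f}%)")
```

With these figures, the TIGER network is longer by roughly 577 km, a little over 5% of the Street Centerlines total.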
Analysis Steps
I began my analysis by adding a field named ‘Length_KM’ to the Street Centerlines and TIGER Roads Reprojected layers. I used the Calculate Geometry Attributes tool to calculate the roadway lengths in kilometers for each layer, then used the Summarize Within tool to determine the total length of each road network in kilometers. To limit the data to the grid layer, I ran the Clip tool on both road networks so that only the roads within the grid remained. I then used the Split tool on each road network to separate its roads by gridcode. To bring the data into a single attribute table, I used the Add Join tool. Finally, I added a field to this table and populated it with the percent difference equation to calculate the difference in roadway completeness for each grid cell. Length is a critical component in measuring completeness, as outlined in Haklay, M. 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. In that article, Haklay conducted a similar analysis comparing the completeness of OpenStreetMap (OSM) and Meridian roadways, which also involved reprojecting the road data so the datasets matched and comparing road lengths within 1 km grid cells.
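The percent difference step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the formula (Street Centerlines minus TIGER, divided by Street Centerlines) is one common convention and may not match the exact equation used in the lab, the function and variable names are hypothetical, and the grid lengths are made-up placeholder values, not results from this analysis.

```python
from typing import Optional


def percent_difference(centerlines_km: float, tiger_km: float) -> Optional[float]:
    """Assumed convention: (centerlines - TIGER) / centerlines * 100.

    Positive: Street Centerlines has more road length in the cell.
    Negative: TIGER has more road length in the cell.
    Returns None when the cell has no Street Centerlines length,
    because the ratio is undefined there.
    """
    if centerlines_km == 0:
        return None
    return (centerlines_km - tiger_km) / centerlines_km * 100


# Placeholder rows: (grid code, Street Centerlines km, TIGER km).
sample_grids = [
    (1, 12.0, 15.0),
    (2, 8.0, 6.0),
    (3, 0.0, 2.5),
]

# Percent difference per grid cell, keyed by grid code.
results = {code: percent_difference(sc, tg) for code, sc, tg in sample_grids}

# Tally which network is longer in each cell by comparing lengths directly.
tiger_longer = sum(1 for _, sc, tg in sample_grids if tg > sc)
centerlines_longer = sum(1 for _, sc, tg in sample_grids if sc > tg)
```

Under this convention, a negative value marks a grid cell where TIGER has more road length. Comparing the two lengths directly for the tally avoids losing cells where the ratio is undefined.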
Out of the total 297 grid polygons, I found that the TIGER roads are more complete in 155 and the Street Centerlines are more complete in 142. This is consistent with the initial big-picture view, in which the TIGER 2000 road network was slightly longer than the Street Centerlines network. I consider this important when selecting a better-quality data source for other projects and analyses: one would want to use the most up-to-date, accurate data source available rather than one that is missing elements. This data quality assessment was therefore valuable to learn, and I will certainly use the techniques covered in this analysis when deciding which data source to use in future projects and work assignments.