This map is an American snapshot; it provides an accessible visualization of the geographic distribution, population density, and racial diversity of the American people in every neighborhood in the entire country. The map displays 308,745,538 dots, one for each person residing in the United States at the location they were counted during the 2010 Census. The map is presented in both black-and-white and full-color versions. In the color version, each dot is color-coded by the individual’s race and ethnicity.
All of the data displayed on the map are from the U.S. Census Bureau’s 2010 Summary File 1 dataset, made publicly available through the National Historical Geographic Information System. The data are based on the “census block,” the smallest area of geography for which data are collected (roughly equivalent to a city block in an urban area).
The map was created by Dustin Cable, a former demographic researcher at the University of Virginia’s Weldon Cooper Center for Public Service. Brandon Martin-Anderson from the MIT Media Lab deserves credit for the original inspiration for the project. This map builds on his work by adding the Census Bureau’s racial data, and by correcting for mapping errors.
Each of the 308 million dots is smaller than a pixel on your computer screen at most zoom levels. Therefore, the “smudges” you see at the national and regional levels are actually aggregations of many individual dots. The dots themselves are only resolvable at the city and neighborhood zoom levels.
Each dot on the map is also color-coded by race and ethnicity. Whites are coded as blue; African-Americans, green; Asians, red; Hispanics, orange; and all other racial categories are coded as brown.
Shades of Purple, Teal, and Other Colors
Since dots are smaller than one pixel at most zoom levels, colors are assigned to a pixel depending on the number of colored dots within that pixel. For example, if a pixel contains a number of White (blue dots) and Asian (red dots) residents, the pixel will be colored a particular shade of purple according to the proportion of each within that pixel.
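The proportional blending described above can be illustrated with a short Python sketch. It is not the map's rendering code: the RGB values chosen for each color and the helper name blend_pixel are assumptions made for illustration, and a real renderer may reach a similar result through alpha compositing rather than an explicit weighted average.

```python
# Color scheme from the text; the exact RGB values are assumptions.
COLORS = {
    "white": (0, 0, 255),       # blue
    "black": (0, 128, 0),       # green
    "asian": (255, 0, 0),       # red
    "hispanic": (255, 165, 0),  # orange
    "other": (139, 69, 19),     # brown
}


def blend_pixel(counts):
    """Return the count-weighted average RGB color for one pixel.

    `counts` maps a category name to the number of dots of that
    category falling inside the pixel. Returns None for an empty pixel.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return tuple(
        round(sum(COLORS[race][ch] * n for race, n in counts.items()) / total)
        for ch in range(3)
    )


# Three White dots and one Asian dot give a blue-leaning purple.
print(blend_pixel({"white": 3, "asian": 1}))  # (64, 0, 191)
```

Under this sketch, shifting the mix toward Asian residents moves the pixel toward red, and an even mix lands near the middle of the purple range.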
Different shades of purple, teal, and other colors can therefore be a measure of racial integration in a particular area. However, a place that may seem racially integrated at wider zoom levels may obscure racial segregation at the city or neighborhood level.
Take the Minneapolis-St. Paul metro area as an example:
While Minneapolis and St. Paul may appear purple and racially integrated when zoomed out at the state level, a closer look reveals a greater degree of segregation between different neighborhoods in both cities. While some areas remain relatively integrated, there are clear delineations between Asian, black, and white neighborhoods.
Lightly Populated Areas
Toggling between color-coded and non-color-coded map views in lightly populated areas provides more contrast to see differences in population density. Take North and South Dakota as illustrative examples:
In the black and white version, it is easier to see the smaller towns and low-density areas than in the color-coded version. Different monitor settings and configurations may make it harder or easier to see color variations in lightly populated areas, but the non-color-coded map should always show differences in population density fairly well.
Dots Located in Parks, Cemeteries, and Lakes
The locations of the dots do not represent actual addresses. The most detailed geographic identifier in Census Bureau data is the census block. Individual dots are randomly located within a particular census block to match aggregate population totals for that block. As a result, dots in some census blocks may be located in the middle of parks, cemeteries, lakes, or other clearly non-residential areas within that census block. No greater geographic resolution for the 2010 Census data is publicly available (and for good reason).
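Random placement within a block can be sketched in plain Python as rejection sampling: draw a point uniformly from the block's bounding box and keep it only if it falls inside the block outline. The actual project used GDAL and Shapely for this step; the standard-library version below, including the function names and the max_tries guard, is an assumption meant only to show the idea.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon `ring`."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges that cross the horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_block(ring, population, rng=random, max_tries=1_000_000):
    """Return `population` random points inside the block outline `ring`."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    tries = 0
    while len(points) < population:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("block outline may be degenerate")
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical L-shaped block with 50 residents.
block = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = random_points_in_block(block, 50, random.Random(0))
print(len(dots))  # 50
```

Because only the block outline is consulted, nothing stops a dot from landing in a lake or cemetery that sits inside the block, which is exactly the effect described above.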
A more accurate portrayal of the geographic distribution of residents is possible if data are available on the location of parks, buildings, and/or physical addresses. Individual dots could then be placed conditionally, based on these data.
The following is an example of using additional data to improve the dot density map for the City of Charlottesville, Virginia:
No Extra Data
Using Additional Address and Park Data
By conditioning the location of dots based on physical address and excluding locations with parks or commercial property, the dot map for Charlottesville becomes a more accurate portrayal of the population distribution of the city. However, the City of Charlottesville is unusual in that this data is made publicly available. There are no nationwide datasets for all parks or physical addresses. As a result, the national-level Racial Dot Map does not make these adjustments.
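The text does not say exactly how the Charlottesville dots were conditioned on address data, so the following Python sketch shows just one plausible approach: each resident of a block is assigned to a randomly chosen residential address point in that block, with optional jitter so that dots do not stack on top of each other. The function name, the jitter parameter, and the sample coordinates are all hypothetical.

```python
import random


def place_dots_at_addresses(addresses, population, jitter=0.0, rng=random):
    """Place one dot per resident at a randomly chosen address point.

    `addresses` is a list of (x, y) residential address locations inside
    one census block (parks and commercial parcels already excluded).
    """
    if not addresses:
        raise ValueError("no address points available for this block")
    dots = []
    for _ in range(population):
        ax, ay = rng.choice(addresses)
        dots.append((ax + rng.uniform(-jitter, jitter),
                     ay + rng.uniform(-jitter, jitter)))
    return dots


# Hypothetical block with two residential addresses and ten residents.
addresses = [(1.0, 1.0), (2.0, 3.0)]
dots = place_dots_at_addresses(addresses, 10, rng=random.Random(1))
print(len(dots))  # 10
```

A block with no address points would need a fallback, such as plain random placement within the block outline.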
All of the data displayed on the map are from the 2010 Summary File 1 (SF1) tables from the U.S. Census Bureau. Table P5, “Hispanic or Latino Origin by Race,” was merged with block-level state shapefiles from the National Historical Geographic Information System. Five racial categories were created based on the data in table P5: non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, Hispanic or Latino, and a single category for all other races, including multiracial identifications. The sum of all five categories equals the total population.
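As a rough illustration of how the five categories can be derived so that they always sum to the total, the Python sketch below computes the "other" group as a remainder. The dictionary keys are descriptive stand-ins rather than the actual SF1 field names, and the counts are made up.

```python
def five_categories(p5):
    """Collapse one block's P5 counts into the five map categories.

    The keys of `p5` are hypothetical descriptive names, not the
    official SF1 column identifiers.
    """
    white = p5["nh_white_alone"]
    black = p5["nh_black_alone"]
    asian = p5["nh_asian_alone"]
    hispanic = p5["hispanic_or_latino"]
    # Everyone else: the remaining non-Hispanic single-race groups
    # plus non-Hispanic residents reporting two or more races.
    other = p5["total"] - (white + black + asian + hispanic)
    return {
        "white": white,
        "black": black,
        "asian": asian,
        "hispanic": hispanic,
        "other": other,
    }


# Made-up counts for a single census block.
block = {
    "total": 100,
    "nh_white_alone": 50,
    "nh_black_alone": 20,
    "nh_asian_alone": 10,
    "hispanic_or_latino": 15,
}
print(five_categories(block)["other"])  # 5
```

Computing the last category as a remainder guarantees that the number of dots drawn for a block matches its total population.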
Python was used to read the 50 state and District of Columbia shapefiles (with the merged SF1 data). The GDAL and Shapely libraries were used to read the data and create the point objects. The code retrieves the population data for each census block, creates the appropriate number of geographic points randomly distributed within each census block, and outputs the point information to a database file. The resulting file has x-y coordinates for each point, a quadkey reference to the Google Maps tile system, and a categorical variable for race. The final database file has 308,745,538 observations and is about 21 GB in size. The processing time was about five hours for the entire nation.
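For readers unfamiliar with quadkeys, here is a minimal Python sketch of how a point's tile and quadkey can be derived under the standard Web Mercator tiling scheme. It is not taken from the project's code; the function names are assumptions, and the project may have computed its quadkeys from projected x-y coordinates rather than from latitude and longitude.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat, lon, zoom):
    """Return the (column, row) of the tile containing a point."""
    n = 2 ** zoom  # tiles per side at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tx = min(n - 1, max(0, int(x * n)))
    ty = min(n - 1, max(0, int(y * n)))
    return tx, ty


def tile_to_quadkey(tx, ty, zoom):
    """Interleave the bits of the tile column and row into a quadkey."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


print(tile_to_quadkey(3, 5, 3))  # 213

# Approximate location of Charlottesville, Virginia, at zoom level 4.
tx, ty = latlon_to_tile(38.03, -78.48, 4)
print(tx, ty, tile_to_quadkey(tx, ty, 4))  # 4 6 0320
```

A useful property of quadkeys is that a tile's key is a prefix of the keys of all tiles it contains at deeper zoom levels, so sorting points by quadkey groups together all points that share a tile.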
The database file was then sorted by quadkey and converted to .csv format. SAS was able to do this within an hour without crashing.
Processing 2.0.1 for 64-bit Windows was used to create the map tiles. The Java code reads each point from the .csv file and plots a dot on a 512×512 .png map tile using the quadkey reference and x-y coordinates. The racial categorical variable is used to color-code each plotted dot. This process used the default JAVA2D renderer, but other platforms may work better using P2D. Map tiles were created for Google Maps’ zoom levels 4 through 13 to make the final map. A non-color-coded map was also produced to help add more contrast for lightly populated areas. In total, the color-coded and non-color-coded maps contain 1.2 million .png files totaling about 7 GB. Producing all of the map tiles in Processing took about 16 hours for the two maps.
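The tile-drawing step itself was written in Java for Processing. To show the underlying arithmetic, the Python sketch below works out which pixel of a 512×512 tile a dot falls on, assuming standard Web Mercator tiling and latitude/longitude input. The function name and the input format are assumptions; the original code reads stored x-y coordinates whose exact form is not described here.

```python
import math

TILE_SIZE = 512        # tile width and height in pixels, as in the text
MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile_pixel(lat, lon, zoom, tile_size=TILE_SIZE):
    """Return ((tile_x, tile_y), (pixel_x, pixel_y)) for a point."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    sin_lat = math.sin(math.radians(lat))
    # Position measured in tile units from the map's top-left corner.
    wx = (lon + 180.0) / 360.0 * n
    wy = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n
    tx = min(n - 1, max(0, int(wx)))
    ty = min(n - 1, max(0, int(wy)))
    # The fractional part within the tile, scaled to pixels.
    px = min(tile_size - 1, max(0, int((wx - tx) * tile_size)))
    py = min(tile_size - 1, max(0, int((wy - ty) * tile_size)))
    return (tx, ty), (px, py)


# Approximate location of Charlottesville, Virginia, at zoom level 4.
tile, pixel = latlon_to_tile_pixel(38.03, -78.48, 4)
print(tile, pixel)
```

At low zoom levels many dots map to the same pixel, which is why the national view shows blended smudges instead of individual dots.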
The Google Maps API is used to display the map tiles. Map tiles with zero population are never created using the above method. Therefore, an index was used to tell the map application whether a tile exists in order to prevent 404 errors.
The entire code is up on GitHub and was adapted from code developed by Brandon Martin-Anderson and Peter Richardson in order to account for the racial coding and errors in reading the shapefiles.