ArcGIS REST API

Data apportionment

The GeoEnrichment service employs a sophisticated geographic retrieval methodology to aggregate data for rings and other polygons. A geographic retrieval methodology determines how data is gathered and summarized or aggregated for input features. For standard geographic units, such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. For example, if an input study trade area contains a selection of ZIP Codes, the data retrieval is a simple process of gathering the data for those areas.

How data is summarized

The geographic retrieval process for ring buffers, drive-time service areas, and other non-standard geography polygons is more complicated, because the input polygon may only partially intersect the geographic areas whose data needs to be aggregated. The following diagram illustrates this case. The polygon in the center represents an input study area that is being enriched. For example, the GeoEnrichment service can calculate the total population for this area. The labeled polygons represent Census geographies that contain total population values. In the United States, these can be Block Groups with enrichment data; in Canada, they can be Enumeration Areas.

polygon

The GeoEnrichment service employs a Weighted Centroid geographic retrieval methodology to aggregate data for rings and other polygons. The Weighted Centroid retrieval approach uses Census Block data to better apportion block groups that are not exclusively contained within a ring. In the United States, Canada, and many other countries, Census blocks are the smallest unit of Census geography. These small areas are used to create all other levels of Census geography. For example, in the United States, one or many blocks are aggregated to create a Block Group.

Note:

The GeoEnrichment service uses a dasymetric apportionment technique to aggregate Census-based demographic data for smaller areas. If an area is very small and does not intersect any Census block data, the service returns zero values. The GeoEnrichment service provides an estimate of population and does not count rooftops.

The Weighted Centroid method is illustrated in the following figure:

Weighted Centroid method

In the previous figure, Census Blocks are illustrated as black points. Using area P3 as an example, the population weight for this area is determined by summing the weights of the blocks that fall within the study area. The sum of these weights gives the proportion of area P3 that is within the study area. When summarizing a demographic variable such as Total Population, the service uses this proportion to aggregate and summarize the data. For example, if 90 percent of the population of the P3 blocks is within the study area and the Total Population of P3 is 100 people, you can determine that 90 people in area P3 are inside the study area.

The weight w1 of the site P1 is calculated as the sum of the weights of the block points that belong to the intersection of the site P1 and the target polygon T:

$$w_1 = \sum_{\beta \in P_1 \cap T} W_1(\beta)$$

Here, β is a block and W1(β) is the weight of this block in the site P1.

To summarize a demographic variable such as Total Population, the weights need to be determined for all intersecting geographies. The GeoEnrichment service calculates the weight W1(β) as the ratio of the total population associated with the block β belonging to the site P1 to the sum of the total population values for all blocks belonging to the site P1:

$$W_1(\beta) = \frac{\mathrm{Pop}(\beta)}{\sum_{\beta' \in P_1} \mathrm{Pop}(\beta')}$$

where Pop(β) denotes the total population associated with block β.
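The weight calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the service's implementation: the function names, the block identifiers, and the block populations are all made up for the example, and the set of blocks inside the target polygon is assumed to be known already.

```python
def block_weights(block_pops):
    """Weight of each block: its population divided by the site's total block population."""
    total = sum(block_pops.values())
    if total == 0:
        # No population in any block: every weight is zero.
        return {block: 0.0 for block in block_pops}
    return {block: pop / total for block, pop in block_pops.items()}


def site_weight(block_pops, blocks_in_target):
    """Sum of the weights of the site's blocks that fall inside the target polygon."""
    weights = block_weights(block_pops)
    return sum(w for block, w in weights.items() if block in blocks_in_target)


def apportion(site_value, block_pops, blocks_in_target):
    """Share of a site-level value (for example Total Population) assigned to the target."""
    return site_value * site_weight(block_pops, blocks_in_target)


# Hypothetical site P3 with four blocks; three of them lie inside the study area.
p3_blocks = {"b1": 40, "b2": 30, "b3": 20, "b4": 10}
inside_study_area = {"b1", "b2", "b3"}

weight = site_weight(p3_blocks, inside_study_area)       # about 0.9
estimate = apportion(100, p3_blocks, inside_study_area)  # about 90 people
```

With these made-up numbers the site weight is approximately 0.9, which reproduces the 90-out-of-100 example given earlier for area P3.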

Data apportionment outside the United States

The GeoEnrichment service uses the underlying statistical boundaries (e.g. postal code boundaries in Turkey) and population data in each country to aggregate demographic attributes for any given study area.

A traditional GIS approach to aggregating the population for a study area would find all the statistical areas that have their center point inside a ring buffer and sum their populations to get the total. The challenge with this approach is that it excludes the statistical areas that have their center point outside the ring even though a portion of those statistical areas might be inside the ring. For example, how would the population of the postal code in Turkey, highlighted below (i.e. Postcode 20160 with a population value of 94,294), be considered in the aggregation? Because neither the whole area nor its center point is completely within the ring buffer, Postcode 20160 would be excluded from aggregation using the traditional GIS approach.

Postcode5 layer

To better handle statistical areas that are only partially contained inside the ring buffer, Esri uses a weighted population approach (also known as dasymetric mapping) to aggregate population and other demographic attributes for smaller study areas. In the above example, Esri has a behind-the-scenes weighted population layer of almost 11 million points that represent how population is distributed throughout each of the statistical areas (e.g. postal codes). This allows the GeoEnrichment service to better determine how the population of 94,294 is distributed across the Postcode 20160 statistical area and to estimate the portion of the population that is within the ring buffer study area. For the entire 10-kilometer ring study area, this process is repeated for all partially intersected postal code statistical areas to determine that there is an estimated population of 544,097.
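The difference between the two approaches can be sketched as follows. This is a simplified illustration, not Esri's implementation: it assumes planar coordinates in kilometers, a circular ring buffer, and a handful of invented statistical areas and population points. The function and field names are hypothetical.

```python
from math import hypot


def in_ring(point, center, radius_km):
    """True if a point lies inside (or on) a circular ring buffer."""
    return hypot(point[0] - center[0], point[1] - center[1]) <= radius_km


def centroids_in_polygon(areas, center, radius_km):
    """Traditional approach: count an area's whole population only if its center point is inside."""
    return sum(a["population"] for a in areas if in_ring(a["centroid"], center, radius_km))


def weighted_population(areas, center, radius_km):
    """Weighted approach: count the share of each area's population whose points are inside."""
    total = 0.0
    for a in areas:
        all_weight = sum(w for _, w in a["points"])
        if all_weight == 0:
            continue
        inside_weight = sum(w for pt, w in a["points"] if in_ring(pt, center, radius_km))
        total += a["population"] * inside_weight / all_weight
    return total


# Invented data: area "A" is fully inside the ring, area "B" straddles its edge.
areas = [
    {"name": "A", "population": 1000, "centroid": (2, 0),
     "points": [((1, 0), 1), ((3, 0), 1)]},
    {"name": "B", "population": 500, "centroid": (12, 0),
     "points": [((9, 0), 3), ((15, 0), 2)]},
]
ring_center, ring_radius_km = (0, 0), 10

traditional = centroids_in_polygon(areas, ring_center, ring_radius_km)  # 1000
weighted = weighted_population(areas, ring_center, ring_radius_km)      # 1300.0
```

In this toy data set the traditional approach drops area "B" entirely because its center point is outside the ring, while the weighted approach credits the three-fifths of its population weight that falls inside.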

Challenges with the weighted population approach for large study areas

The weighted population approach described above for handling partially intersected statistical areas works very well for small input study areas (e.g. a ring buffer of less than 100 kilometers in diameter). Because this technique uses very detailed population distribution points, aggregating these points can become slow and consume an enormous amount of computing power for large study areas. Therefore, the GeoEnrichment service falls back to the traditional GIS approach when aggregating data for large study areas.

Small and large study areas are defined by the diameter of the area. For non-circular areas, such as custom polygons or drive-time service areas, the diameter is defined as the larger of the height and the width of the box bounding that area. This bounding box is often referred to as the extent of the study area. For example, the location in Turkey from the example above is shown on the map below with a radial buffer of 50 kilometers (i.e. a diameter of 100 kilometers); any study area this size or smaller will use the weighted population approach for aggregating data. Study areas that are larger than this size will fall back to the traditional GIS approach.
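The size rule above can be expressed as a small helper. This is a sketch under stated assumptions: the extent is given in a projected coordinate system with units of kilometers, the 100-kilometer threshold is taken from the text and may differ by country, and the function names are hypothetical. The service itself makes this decision; the helper only predicts it for ring and custom polygon study areas.

```python
DEFAULT_THRESHOLD_KM = 100.0  # diameter threshold taken from the text; assumed, may vary by country


def extent_diameter_km(xmin, ymin, xmax, ymax):
    """Diameter of a study area: the larger of the width and height of its bounding box."""
    return max(xmax - xmin, ymax - ymin)


def expected_method(xmin, ymin, xmax, ymax, threshold_km=DEFAULT_THRESHOLD_KM):
    """Predict which aggregation approach applies to a study area of this extent."""
    diameter = extent_diameter_km(xmin, ymin, xmax, ymax)
    if diameter <= threshold_km:
        return "BlockApportionment"   # weighted population approach
    return "CentroidsInPolygon"       # traditional GIS approach


# A 50 km radial buffer has a 100 km wide extent, so it stays on the weighted approach.
small_ring = expected_method(-50, -50, 50, 50)
# A custom polygon whose extent is 80 km wide and 130 km tall exceeds the threshold.
large_polygon = expected_method(0, 0, 80, 130)
```

Note that the diameter is driven by the longer side of the extent, so a long, narrow polygon can be treated as a large study area even when its area is small.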

50 km ring buffer

Understanding what aggregation approach is used

When using the GeoEnrichment service, the enrich response indicates which aggregation approach was used to aggregate demographic data. For example, a 10-kilometer ring for Turkey provides the following response:

"features" : [ {
          "attributes" : {
            "ID" : "1",
            "OBJECTID" : 2,
            "sourceCountry" : "TR",
            "myID" : "point2",
            "areaType" : "RingBufferBands",
            "bufferUnits" : "esriKilometers",
            "bufferUnitsAlias" : "Kilometers",
            "bufferRadii" : 10,
            "aggregationMethod" : "BlockApportionment:TR.Postcodes5",
            "HasData" : 1,
            "TOTPOP" : 544097

The aggregationMethod property in the response describes how the data is aggregated for the input study area:

aggregationMethod=BlockApportionment:TR.Postcodes5

This means that the data for the 10-kilometer ring was aggregated with the BlockApportionment (weighted population) approach using the Postcodes5 statistical boundaries.

Conversely, a 120 kilometer study area has an aggregation method of:

aggregationMethod=CentroidsInPolygon:TR.Postcodes5

120 kilometer study area response

"features" : [ {
          "attributes" : {
            "ID" : "1",
            "OBJECTID" : 7,
            "sourceCountry" : "TR",
            "myID" : "point2",
            "areaType" : "RingBufferBands",
            "bufferUnits" : "esriKilometers",
            "bufferUnitsAlias" : "Kilometers",
            "bufferRadii" : 120,
            "aggregationMethod" : "CentroidsInPolygon:TR.Postcodes5",
            "HasData" : 1,
            "TOTPOP" : 3071779

This means that the traditional GIS approach of intersecting the center points of the postal code boundaries (i.e. CentroidsInPolygon) was used with the Postcodes5 statistical boundaries for this larger study area.
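A client can read the aggregationMethod value from each feature's attributes and split it into its two parts. The sketch below assumes the value always has the form shown in the two responses above (an approach name, a colon, then a boundary layer identifier); the function names are hypothetical, and the attribute dictionaries are trimmed copies of the excerpts above.

```python
def parse_aggregation_method(value):
    """Split 'Approach:Layer' into (approach, layer); layer is None if no colon is present."""
    approach, sep, layer = value.partition(":")
    return approach, (layer if sep else None)


def used_weighted_population(attributes):
    """True if the feature was aggregated with the weighted population approach."""
    approach, _ = parse_aggregation_method(attributes["aggregationMethod"])
    return approach == "BlockApportionment"


# Attributes trimmed from the 10 km and 120 km responses shown above.
ring_10km = {"bufferRadii": 10,
             "aggregationMethod": "BlockApportionment:TR.Postcodes5",
             "TOTPOP": 544097}
ring_120km = {"bufferRadii": 120,
              "aggregationMethod": "CentroidsInPolygon:TR.Postcodes5",
              "TOTPOP": 3071779}

approach, layer = parse_aggregation_method(ring_10km["aggregationMethod"])
# approach == "BlockApportionment", layer == "TR.Postcodes5"
```

Checking this value is a simple way to flag results that were produced with the traditional GIS approach and may therefore be coarser near the edge of the study area.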

Improving how population is estimated for large study areas

A new property in the GeoEnrichment service called detailedAggregationMethod uses the weighted population approach for study areas as large as 300 kilometers in diameter. This property limits the number of areas that can be processed at a time. For small study areas, the GeoEnrichment service allows up to 100 input study areas in a single request, but this limit is overridden when detailedAggregationMethod is set to true. Users can run only a single area at a time when detailedAggregationMethod is set, because aggregation for any area larger than 100 kilometers is computationally intensive. Without this restriction, the service could not support multiple concurrent requests with very large study areas; the result would be request timeouts, very poor performance, overloaded servers, and other negative impacts.
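On the client side, the per-request limits described above suggest batching study areas before sending them. The following is a minimal sketch of that batching only; it does not build or send an enrich request, the function and constant names are hypothetical, and the limits (100 areas normally, 1 area with detailedAggregationMethod) are the ones stated in the text.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_AREAS_STANDARD = 100  # limit stated in the text for ordinary requests
MAX_AREAS_DETAILED = 1    # limit stated in the text when detailedAggregationMethod is true


def max_areas_per_request(detailed: bool) -> int:
    """Number of study areas allowed in one request for the chosen mode."""
    return MAX_AREAS_DETAILED if detailed else MAX_AREAS_STANDARD


def split_into_requests(study_areas: Sequence[T], detailed: bool = False) -> Iterator[List[T]]:
    """Yield batches of study areas, each small enough for a single request."""
    size = max_areas_per_request(detailed)
    for start in range(0, len(study_areas), size):
        yield list(study_areas[start:start + size])


# Placeholder study areas; real ones would be points, rings, or polygons.
areas = [f"area-{i}" for i in range(250)]
standard_batches = list(split_into_requests(areas))
detailed_batches = list(split_into_requests(areas[:3], detailed=True))
```

With 250 placeholder areas this produces three standard batches (100, 100, and 50 areas), while three areas in detailed mode become three single-area batches.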

Note:

  • When using detailedAggregationMethod, larger tolerances can be used for the weighted population approach. In most countries, the tolerance at which the aggregation method switches from the weighted population approach to the traditional GIS approach is increased from 100 kilometers to 150 kilometers.
  • The GeoEnrichment service always uses the weighted population aggregation approach for all drive-time service areas up to and including 90 minutes.