# Apportionment reliability

The reliability of any GeoEnrichment result depends on two things: the quality of each country's census, from which population and other demographic variables are derived, and the quality of the weighted footprint of human settlement used to apportion the selected GeoEnrichment variables onto a given polygon.

Each country is given two scores:

1. Area to population reliability, which considers only the size (area) of the census polygon and the number of people estimated to live in that polygon.
2. Overall reliability, which encompasses all aspects of reliability and includes the area to population score.

## Area to population reliability

There are currently two methods for calculating each of these scores. The older method is simpler. For countries released in October 2020 and later, a more robust method is used.

### Old method: Ratio of the population polygon to the number of people (1.0 – 5.0)

The larger the area of a census tabulation area, the less likely the specific locations where people live can reliably be found. For large areas with relatively low populations, this means the likelihood of correctly locating where those people live is even lower. The table below shows the combinations of area and population density used to classify a census tabulation polygon's reliability for the ratio of population to area.

### New method: Mean of scores for polygon size and level of population (1.0 – 5.0)

The old method underestimated the potential for error in population estimates of over 500 people in a census unit. The new method uses the same ranges but assigns reliability scores as follows:

| Population of a given census polygon | Reliability Score |
| --- | --- |
| Over 100,001 | 100 |
| 10,001 to 100,000 | 25 |
| 2,501 to 10,000 | 10 |
| 501 to 2,500 | 5 |
| 101 to 500 | 2 |
| Less than 100 | 1 |

| Area of a given census polygon | Reliability Score |
| --- | --- |
| Over 100.1 km² | 100 |
| 16.1 to 100 km² | 25 |
| 4.1 to 16.0 km² | 10 |
| 1.1 to 4.0 km² | 5 |
| 0.11 to 1.0 km² | 2 |
| Less than 0.1 km² | 1 |

The mean of these scores is then scaled using a simple linear equation to a range of 1.0 (best) to 5.0 (worst), so the range is identical to the older method.
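The new method can be sketched in a few lines of Python. This is a minimal illustration, not Esri's implementation: the function names are hypothetical, band boundaries are treated as inclusive upper limits (the published ranges leave small gaps, such as between 100 and 101 people), and the "simple linear equation" is assumed to map a mean score of 1 to 1.0 and a mean score of 100 to 5.0.

```python
# Bands are (inclusive upper bound, score); values above the last bound score 100.
POPULATION_BANDS = [(100, 1), (500, 2), (2500, 5), (10000, 10), (100000, 25)]
AREA_BANDS_KM2 = [(0.1, 1), (1.0, 2), (4.0, 5), (16.0, 10), (100.0, 25)]


def band_score(value, bands, top_score=100):
    """Return the score of the first band whose upper bound is >= value."""
    for upper, score in bands:
        if value <= upper:
            return score
    return top_score


def scale_to_reliability(mean_score):
    """Assumed linear scaling: 1 -> 1.0 (best), 100 -> 5.0 (worst)."""
    return 1.0 + 4.0 * (mean_score - 1.0) / 99.0


def area_to_population_reliability(population, area_km2):
    """Mean of the population and area scores, scaled to 1.0 - 5.0."""
    population_score = band_score(population, POPULATION_BANDS)
    area_score = band_score(area_km2, AREA_BANDS_KM2)
    return scale_to_reliability((population_score + area_score) / 2)


# A census polygon with 3,000 people covering 2 km2:
# population score 10, area score 5, mean 7.5.
print(round(area_to_population_reliability(3000, 2.0), 3))
```

Under these assumptions, small and sparsely populated polygons score close to 1.0, while a very large polygon holding a very large population scores 5.0.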

## Overall reliability

In both the old and the new method, the overall reliability score is the mean of multiple factors, including that method's Area to Population Reliability score.

### Old method

The old method combines Area to Population Reliability with the complexity of the weighted footprint of human settlement and the Census Quality reliability scores based on reporting from the United Nations Statistical Division.

#### Complexity of settlement footprint relative to NoData and zero population cells (1.0 to 5.0)

Esri processes Landsat 8 panchromatic imagery (15-meter resolution) to find texture. When levels of texture are sufficiently high, the likelihood that it represents human settlement is high. However, because this model is largely completed using raster data, resampling causes the edges of the footprint to be underestimated. The amount of underestimated area is proportional to the complexity of the (raster) human settlement footprint. Complexity is measured as the sum of distances from a given cell to all NoData cells within 8 kilometers; this figure is then scaled to a range of 1.0 to 5.0. NoData occurs at coastlines and where Landsat 8 imagery shows low amounts of texture.

#### Census Quality reliability scores (1.0 - 5.0)

The United Nations Statistical Division reports characteristics for each country’s census with respect to completeness, the age of the most recent census and of the subsequent estimates used to derive the current estimate, and the type of census (de jure vs. de facto). Examples of these scores can be found on the U.N. Statistical Division’s website. Esri scores each of these characteristics as shown below and produces a combined mean score that is applied as a constant value everywhere in the country.

| Census Type | Reliability Score |
| --- | --- |
| Census - de jure - complete tabulation | 1 |
| Estimate - de jure | 2 |
| Census - de facto - complete tabulation | 3 |
| Estimate - de facto | 4 |
| Sample survey - de facto | 5 |

| Completeness / Reliability | Reliability Score |
| --- | --- |
| Final figure, complete | 1 |
| Provisional figure | 2 |
| Final figure, incomplete/questionable reliability | 5 |
| Provisional figure with questionable reliability | 5 |

| Age of Census or Estimate | Reliability Score |
| --- | --- |
| 1 - 2 years (rarely available this soon) | 1 |
| 3 years | 2 |
| 3 - 5 years | 3 |
| 5 - 6 years | 4 |
| 6 - 10 years | 5 |
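The combined census quality score described in this section can be illustrated with a small lookup. This is a sketch only: the function and dictionary names are hypothetical, and the age score is passed in directly because the published age ranges overlap (3 years and 5 years each appear in two rows), so it is read from the table by hand rather than computed.

```python
from statistics import mean

CENSUS_TYPE_SCORES = {
    "Census - de jure - complete tabulation": 1,
    "Estimate - de jure": 2,
    "Census - de facto - complete tabulation": 3,
    "Estimate - de facto": 4,
    "Sample survey - de facto": 5,
}

COMPLETENESS_SCORES = {
    "Final figure, complete": 1,
    "Provisional figure": 2,
    "Final figure, incomplete/questionable reliability": 5,
    "Provisional figure with questionable reliability": 5,
}


def census_quality_score(census_type, completeness, age_score):
    """Mean of the three characteristic scores (1.0 best, 5.0 worst)."""
    return mean([
        CENSUS_TYPE_SCORES[census_type],
        COMPLETENESS_SCORES[completeness],
        age_score,
    ])


# A provisional de facto estimate whose age scores 3:
print(census_quality_score("Estimate - de facto", "Provisional figure", 3))
```

Because the result is a single value per country, it does not vary with the location or size of the polygon being enriched.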

More detailed information about the data sources and methods for each country can be found on the Esri Demographics Global data overview and U.S. data overview pages.

#### Inherent biases to the old method

Because of the varied nature of each country and how census information is collected, there are some biases in this score. In general:

• Small countries tend to have better scores.
• Countries with small, similarly sized tabulation units have better scores.
• Countries with a wide variety of tabulation geography sizes (for example, Saudi Arabia or the United States) have middling scores.

### New method

Two new metrics replace the old method's complexity of settlement footprint. Both rely on the settlement footprint expressed as settlement points, which are explained further in Data Apportionment.

1. Percent of Area Settled—This is determined using a raster analysis at 75-m resolution where the count of cells in the footprint of settlement is divided by the count of cells representing the census polygon.

| Percent of Area Settled | Reliability Score |
| --- | --- |
| Less than 0.1% | 100 |
| 0.1 to 4.99% | 25 |
| 5.0 to 19.99% | 10 |
| 20.0 to 49.9% | 5 |
| 50.0 to 98.9% | 2 |
| 99.0 to 100% | 1 |


2. Ratio of Settlement points (centroids of 75-m raster cells) to population—Depending on how settlement points were derived, especially those not based on address points or building footprint centroids, it is possible to have extra points representing buildings people do not live in, or other landscape disturbances Esri's algorithm may mistake for human settlement. Ideally each settlement point represents where 15-50 people live.

| Ratio of population to settlement point | Reliability Score |
| --- | --- |
| Less than 2.9, or over 500 | 100 |
| 3.0 to 6.9, or 200.1 to 500 | 25 |
| 7.0 to 11.9, or 100.1 to 200 | 10 |
| 12.0 to 17.9, or 50.1 to 100 | 5 |
| 18.0 to 23.9, or 36.1 to 50.0 | 2 |
| Between 24 and 36 | 1 |

The census quality reliability scores for census type and completeness remain the same, while age of census was modified as follows in the new method:

| Age of Census or Estimate | Reliability Score |
| --- | --- |
| 1 - 2 years (rarely available this soon) | 1 |
| 3 - 5 years | 3 |
| 6 - 20+ years | 5 |

The mean of all the new method scores is scaled using a simple linear equation to a range of 1.0 (best) to 5.0 (worst), and is then averaged with the census reliability scores to produce the overall reliability score.
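The new-method lookups and the final combination can be sketched as follows. Several details here are assumptions rather than documented behavior: the function names are hypothetical, gaps between the published ranges are closed by treating each band as running up to the start of the next, the linear scaling is assumed to map 1 to 1.0 and 100 to 5.0, and the scaled value is assumed to be averaged with the mean of the census scores. The caller decides which 1 to 100 scores are passed in.

```python
from statistics import mean

# (exclusive upper bound, score); 99% and above scores 1.
PERCENT_SETTLED_BANDS = [(0.1, 100), (5.0, 25), (20.0, 10), (50.0, 5), (99.0, 2)]
# People per settlement point: too few (exclusive upper bounds) ...
LOW_RATIO_BANDS = [(3.0, 100), (7.0, 25), (12.0, 10), (18.0, 5), (24.0, 2)]
# ... and the ideal band followed by too many (inclusive upper bounds).
HIGH_RATIO_BANDS = [(36.0, 1), (50.0, 2), (100.0, 5), (200.0, 10), (500.0, 25)]


def percent_settled_score(percent):
    for upper, score in PERCENT_SETTLED_BANDS:
        if percent < upper:
            return score
    return 1


def settlement_ratio_score(people_per_point):
    for upper, score in LOW_RATIO_BANDS:
        if people_per_point < upper:
            return score
    for upper, score in HIGH_RATIO_BANDS:
        if people_per_point <= upper:
            return score
    return 100


def census_age_score(age_years):
    """New-method age score: 1 - 2 years -> 1, 3 - 5 -> 3, 6 or more -> 5."""
    if age_years <= 2:
        return 1
    return 3 if age_years <= 5 else 5


def scale_to_reliability(mean_score):
    """Assumed linear scaling: 1 -> 1.0 (best), 100 -> 5.0 (worst)."""
    return 1.0 + 4.0 * (mean_score - 1.0) / 99.0


def overall_reliability(raw_scores, census_scores):
    """Scale the mean of the 1 - 100 scores, then average with the census mean."""
    scaled = scale_to_reliability(mean(raw_scores))
    return mean([scaled, mean(census_scores)])


raw = [percent_settled_score(12.0), settlement_ratio_score(30)]
census = [1, 1, census_age_score(4)]
print(round(overall_reliability(raw, census), 3))
```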

## Applying the reliability score

The reliability score can be used to estimate the smallest polygon that can be accurately enriched in a given country. To make this estimate, square the reliability score and multiply the result by three. This gives the area, in square kilometers, that a polygon needs in order to expect the best quality results when enriching polygons in that country. The reliability score can be adjusted by up to 1.0 depending on whether the polygon to be enriched covers an urban or a rural area. Subtract 1.0 for an urban area, because census data is generally more reliable in urban areas. Conversely, add 1.0 for a rural area.

As the reliability score increases (becomes poorer), so must the size of the enriched polygons in order to obtain reliable results.
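The rule of thumb above amounts to a one-line calculation. The sketch below is illustrative: the function name and the `setting` parameter are hypothetical, the full 1.0 adjustment is applied for urban and rural areas, and the overall reliability score (`apportionmentConfidence`) is assumed to be the score intended.

```python
def minimum_polygon_area_km2(reliability_score, setting=None):
    """Estimate the smallest reliably enrichable polygon area in km2.

    setting: "urban" subtracts 1.0 from the score, "rural" adds 1.0,
    anything else leaves the score unchanged.
    """
    adjustment = {"urban": -1.0, "rural": 1.0}.get(setting, 0.0)
    adjusted_score = reliability_score + adjustment
    return 3.0 * adjusted_score ** 2


# Using a score of 2.576:
for setting in (None, "urban", "rural"):
    print(setting, round(minimum_polygon_area_km2(2.576, setting), 1))
```

With a score of 2.576, the estimate is roughly 19.9 km² without adjustment, 7.5 km² for an urban polygon, and 38.4 km² for a rural polygon.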

##### Tip:

It is important to remember that while a country's reliability may be average or even poor, there may be areas of better reliability within the country which could reliably support enriching smaller polygons.

## Example usage

The reliability score values are determined on a per country basis. This makes them easily discoverable using the Countries query method.

Request example 1

To find the reliability estimates for a country, use the Countries discovery method. The values that represent the reliability estimates are populationToPolygonSizeRating and apportionmentConfidence.

``https://geoenrich.arcgis.com/arcgis/rest/services/World/GeoEnrichmentServer/Geoenrichment/Countries/US?f=pjson``

JSON response example 1
```json
{
  "messages": [],
  "countries": [
    {
      "id": "US",
      "name": "United States",
      "abbr3": "USA",
      "altName": "UNITED STATES",
      "continent": "North America",
      "distanceUnits": "Miles",
      "esriUnits": "esriMiles",
      "defaultExtent": {
        "xmin": -178.48633078,
        "ymin": 18.8717424169,
        "xmax": -66.9076521618,
        "ymax": 71.403759084
      },
      "defaultDatasetID": "USA_ESRI_2018",
      "datasets": [
        "USA_ESRI_2018",
        "USA_ASR_2018",
        "USA_RMP_2018",
        "USA_ACS_2018",
        "Landscape"
      ],
      "hierarchies": [
        {
          "ID": "census",
          "alias": "Standard",
          "shortDescription": "This data source for the standard US data is ESRI. Vintage 2018.",
          "longDescription": "<p>This data source is provided by Esri Inc.</p><p>Esri offers comprehensive demographic, lifestyle segmentation, consumer spending, and business content for a variety of geographic levels in the United States for use in applications such as site selection, profiling customers, analyzing markets, evaluating competitors, identifying opportunities, and many more.</p><p><a href=\\'https://doc.arcgis.com/en/esri-demographics/data/us-intro.htm\\' target=\\'_blank\\'>Learn more.</a></p>",
          "datasets": [
            "USA_ESRI_2018",
            "USA_ACS_2018",
            "USA_ASR_2018",
            "USA_RMP_2018"
          ],
          "levelsInfo": {
            "geographyLevels": [
              "Entire Country",
              "States",
              "Counties",
              "ZIP Codes",
              "Block Groups",
              "CBSAs",
              "Census Tracts",
              "Cities and Towns (Places)",
              "Congressional Districts",
              "County Subdivisions",
              "DMAs"
            ]
          },
          "variablesInfo": {
            "categories": [
              "Age",
              "At Risk",
              "Behaviors",
              "Education",
              "Health",
              "Households",
              "Housing",
              "Income",
              "Jobs",
              "Key Facts",
              "Marital Status",
              "Policy",
              "Population",
              "Poverty",
              "Race",
              "Spending",
              "Supply and Demand",
              "Tapestry"
            ]
          },
          "populationToPolygonSizeRating": 2.191,
          "apportionmentConfidence": 2.576
        },
        {
          "ID": "landscape",
          "alias": "Landscape",
          "shortDescription": "This data source for the landscape US data is ESRI. Vintage 2012.",
          "longDescription": "<p>The Esri Landscape Layers are a collection of data, currently available for the United States, that are applicable to a wide range of uses such as biogeographic analysis, natural resource management, and land use and conservation planning.  There are map layers that describe the physical structure of the land, such as hydrography, soil characteristics, geologic units, and land surface forms.  Plus, there are a variety of map layers in the biological and climatological domains, such as ecological systems, evapotranspiration, and critical habitat and other protected areas.  The term “landscape” also refers to the recoverable resources and manmade features that influence how we use the land and water.  Coal bed methane basins, oil shale basins, agricultural potential, and infrastructure, such as pipelines and transmission lines are examples of these types of landscape layers.</p><p><a href=\\'https://arcg.is/WILhrp\\' target=\\'_blank\\'>Learn more.</a></p>",
          "datasets": [
            "Landscape"
          ],
          "levelsInfo": {
            "geographyLevels": [
              "st",
              "huc4",
              "huc8",
              "huc12",
              "cy",
              "huc10",
              "huc2",
              "huc6",
              "us"
            ]
          },
          "variablesInfo": {
            "categories": [
              "Landscape"
            ]
          },
          "populationToPolygonSizeRating": 2.191,
          "apportionmentConfidence": 2.576
        }
      ],
      "defaultDataCollection": "KeyGlobalFacts",
      "dataCollections": "",
      "defaultReportTemplate": "Demographic and Income Profile",
      "currencySymbol": "$",
      "currencyFormat": "$0;-$0"
    }
  ],
  "childResources": []
}
```
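Once a Countries response has been parsed into a dictionary, the two reliability values can be collected per data hierarchy. The sketch below is illustrative only: the helper names are hypothetical, no request is actually sent, and the sample dictionary is a trimmed copy of the response shown in this section.

```python
from urllib.parse import urlencode

COUNTRIES_ENDPOINT = (
    "https://geoenrich.arcgis.com/arcgis/rest/services/World/"
    "GeoEnrichmentServer/Geoenrichment/Countries"
)


def countries_url(country_code, response_format="pjson"):
    """Build the Countries discovery URL for a single country."""
    return f"{COUNTRIES_ENDPOINT}/{country_code}?{urlencode({'f': response_format})}"


def reliability_by_hierarchy(response):
    """Map (country id, hierarchy ID) to its two reliability values."""
    scores = {}
    for country in response.get("countries", []):
        for hierarchy in country.get("hierarchies", []):
            scores[(country["id"], hierarchy["ID"])] = {
                "populationToPolygonSizeRating": hierarchy.get("populationToPolygonSizeRating"),
                "apportionmentConfidence": hierarchy.get("apportionmentConfidence"),
            }
    return scores


# Trimmed copy of the response above, already parsed from JSON.
sample_response = {
    "countries": [
        {
            "id": "US",
            "hierarchies": [
                {"ID": "census", "populationToPolygonSizeRating": 2.191, "apportionmentConfidence": 2.576},
                {"ID": "landscape", "populationToPolygonSizeRating": 2.191, "apportionmentConfidence": 2.576},
            ],
        }
    ]
}

print(countries_url("US"))
print(reliability_by_hierarchy(sample_response)[("US", "census")])
```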

Request example 2

The values representing reliability estimates are also returned in the results of the enrich method. The names of the variables are the same as in the Countries discovery method: populationToPolygonSizeRating and apportionmentConfidence.

``https://geoenrich.arcgis.com/arcgis/rest/services/World/geoenrichmentserver/GeoEnrichment/enrich?studyareas=[{"address":{"text":" 102 Aqua Ct New Smyrna Beach FL 32168"}}]&studyareasoptions={"areaType": "NetworkServiceArea","bufferUnits": "Minutes","bufferRadii": [15],"travel_mode":"Walking"}&dataCollections=["KeyGlobalFacts"]&returngeometry=false&f=pjson``

JSON response example 2
```json
{
  "results": [
    {
      "paramName": "GeoEnrichmentResult",
      "dataType": "GeoEnrichmentResult",
      "value": {
        "version": "2.0",
        "FeatureSet": [
          {
            "displayFieldName": "",
            "fieldAliases": {
              "ID": "ID",
              "OBJECTID": "Object ID",
              "sourceCountry": "sourceCountry",
              "X": "X",
              "Y": "Y",
              "areaType": "areaType",
              "bufferUnits": "bufferUnits",
              "bufferUnitsAlias": "bufferUnitsAlias",
              "aggregationMethod": "aggregationMethod",
              "populationToPolygonSizeRating": "Population to polygon size rating for the country",
              "apportionmentConfidence": "Apportionment confidence for the country",
              "HasData": "HasData",
              "TOTPOP": "Total Population",
              "TOTHH": "Total Households",
              "AVGHHSZ": "Average Household Size",
              "TOTMALES": "Male Population",
              "TOTFEMALES": "Female Population"
            },
            "spatialReference": {
              "wkid": 4326,
              "latestWkid": 4326
            },
            "fields": [
              {
                "name": "ID",
                "type": "esriFieldTypeString",
                "alias": "ID",
                "length": 256
              },
              {
                "name": "OBJECTID",
                "type": "esriFieldTypeOID",
                "alias": "Object ID"
              },
              {
                "name": "sourceCountry",
                "type": "esriFieldTypeString",
                "alias": "sourceCountry",
                "length": 256
              },
              {
                "name": "X",
                "type": "esriFieldTypeDouble",
                "alias": "X"
              },
              {
                "name": "Y",
                "type": "esriFieldTypeDouble",
                "alias": "Y"
              },
              {
                "name": "areaType",
                "type": "esriFieldTypeString",
                "alias": "areaType",
                "length": 256
              },
              {
                "name": "bufferUnits",
                "type": "esriFieldTypeString",
                "alias": "bufferUnits",
                "length": 256
              },
              {
                "name": "bufferUnitsAlias",
                "type": "esriFieldTypeString",
                "alias": "bufferUnitsAlias",
                "length": 256
              },
              {
                "type": "esriFieldTypeDouble"
              },
              {
                "name": "aggregationMethod",
                "type": "esriFieldTypeString",
                "alias": "aggregationMethod",
                "length": 256
              },
              {
                "name": "populationToPolygonSizeRating",
                "type": "esriFieldTypeDouble",
                "alias": "Population to polygon size rating for the country"
              },
              {
                "name": "apportionmentConfidence",
                "type": "esriFieldTypeDouble",
                "alias": "Apportionment confidence for the country"
              },
              {
                "name": "HasData",
                "type": "esriFieldTypeInteger",
                "alias": "HasData"
              },
              {
                "name": "TOTPOP",
                "type": "esriFieldTypeDouble",
                "alias": "Total Population",
                "fullName": "KeyGlobalFacts.TOTPOP",
                "component": "demographics",
                "decimals": 0,
                "units": "count"
              },
              {
                "name": "TOTHH",
                "type": "esriFieldTypeDouble",
                "alias": "Total Households",
                "fullName": "KeyGlobalFacts.TOTHH",
                "component": "demographics",
                "decimals": 0,
                "units": "count"
              },
              {
                "name": "AVGHHSZ",
                "type": "esriFieldTypeDouble",
                "alias": "Average Household Size",
                "fullName": "KeyGlobalFacts.AVGHHSZ",
                "component": "scripts",
                "decimals": 2,
                "units": "count"
              },
              {
                "name": "TOTMALES",
                "type": "esriFieldTypeDouble",
                "alias": "Male Population",
                "fullName": "KeyGlobalFacts.TOTMALES",
                "component": "demographics",
                "decimals": 0,
                "units": "count"
              },
              {
                "name": "TOTFEMALES",
                "type": "esriFieldTypeDouble",
                "alias": "Female Population",
                "fullName": "KeyGlobalFacts.TOTFEMALES",
                "component": "demographics",
                "decimals": 0,
                "units": "count"
              }
            ],
            "features": [
              {
                "attributes": {
                  "ID": "0",
                  "OBJECTID": 1,
                  "sourceCountry": "US",
                  "X": -80.94857302553523,
                  "Y": 29.03368152986305,
                  "areaType": "NetworkServiceArea",
                  "bufferUnits": "Minutes",
                  "bufferUnitsAlias": "Walk Time Minutes",
                  "aggregationMethod": "BlockApportionment:US.BlockGroups",
                  "populationToPolygonSizeRating": 2.191,
                  "apportionmentConfidence": 2.576,
                  "HasData": 1,
                  "TOTPOP": 198,
                  "TOTHH": 122,
                  "AVGHHSZ": 1.62,
                  "TOTMALES": 92,
                  "TOTFEMALES": 105
                }
              }
            ]
          }
        ]
      }
    }
  ],
  "messages": []
}
```

Notes:

• Reliability estimates cannot be used when a study area includes parts of more than one country. In that case, populationToPolygonSizeRating and apportionmentConfidence are returned as NULL.
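A client reading enrich results may want to check for these NULL values before relying on the scores. The following sketch shows one way to do so on a response that has already been parsed from JSON. The function name and the `usable` flag are hypothetical, and the sample data is a trimmed, partly invented illustration of the response structure shown above (the second feature stands in for a study area spanning two countries).

```python
def reliability_from_enrich(response):
    """Collect reliability values per enriched feature, flagging NULLs."""
    rows = []
    for result in response.get("results", []):
        for feature_set in result.get("value", {}).get("FeatureSet", []):
            for feature in feature_set.get("features", []):
                attributes = feature.get("attributes", {})
                rating = attributes.get("populationToPolygonSizeRating")
                confidence = attributes.get("apportionmentConfidence")
                rows.append({
                    "id": attributes.get("ID"),
                    "populationToPolygonSizeRating": rating,
                    "apportionmentConfidence": confidence,
                    # JSON null is parsed as None in Python.
                    "usable": rating is not None and confidence is not None,
                })
    return rows


sample_response = {
    "results": [{
        "value": {
            "FeatureSet": [{
                "features": [
                    {"attributes": {"ID": "0", "populationToPolygonSizeRating": 2.191,
                                    "apportionmentConfidence": 2.576}},
                    {"attributes": {"ID": "1", "populationToPolygonSizeRating": None,
                                    "apportionmentConfidence": None}},
                ]
            }]
        }
    }]
}

for row in reliability_from_enrich(sample_response):
    print(row["id"], row["usable"])
```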