Which college district has the fewest low-income families?

A local cable operator in the county ran a pilot program to provide low-cost computers and Internet access to low-income families with children in high school. The pilot showed a marked improvement in these students' school performance, and it has brought the company a fair amount of positive publicity and goodwill in the community.

Company officials now want to set up a similar program for community college students. The company provides Internet access to the five community college districts in the county, and officials are aware that the colleges are under a lot of pressure: they face funding cuts at the same time as increased demand for enrollment. To cope, the colleges are turning more and more to distance learning, primarily via the Internet. By providing computers and Internet access, the cable company can enable more low-income students to take advantage of online classes.

This case study uses the ArcGIS API for Python to find the community college district that has the fewest low-income families, as a candidate location for the program.

We will use the summarize_within tool to get the number of low-income families within each community college district. We will also visualize the result using the map widget.

Connect to your ArcGIS Online organization

We first establish a connection to our organization, which could be an ArcGIS Online organization or an ArcGIS Enterprise deployment. To run the code with the ArcGIS API for Python, we need to provide the credentials of a user in that organization.

Input
from arcgis.gis import GIS
import pandas as pd
Input
gis = GIS('home')

Get data for analysis

Input
san_diego_data = gis.content.search('title:CommunityCollege_CensusTracts owner:api_data_owner',
                                    'Feature layer',
                                    outside_org=True)
Input
san_diego_data
Output
[<Item title:"CommunityCollege_CensusTracts" type:Feature Layer Collection owner:api_data_owner>]
Input
from IPython.display import display

for item in san_diego_data:
    display(item)
CommunityCollege_CensusTracts
Feature Layer Collection by api_data_owner
Last Modified: April 11, 2020
0 comments, 3 views
Input
san_diego_item = san_diego_data[0] # get first item from the list of items
Input
for lyr in san_diego_item.layers:
    print(lyr.properties.name)
census_tract_income
Community_College_Dist

Since the item is a Feature Layer Collection, accessing the layers property will give us a list of Feature Layers.
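
Indexing the layers list by position works here, but it depends on the order in which the layers were published. A minimal sketch of a name-based lookup follows. Note that find_layer is a hypothetical helper, not part of the ArcGIS API for Python; it assumes only that each layer exposes properties.name, as the loop above shows. The stand-in objects exist only so the sketch runs without a GIS connection.

```python
from types import SimpleNamespace


def find_layer(layers, name):
    """Return the first layer whose properties.name equals `name`.

    Hypothetical helper: relies only on the `properties.name` attribute.
    """
    for lyr in layers:
        if lyr.properties.name == name:
            return lyr
    raise LookupError(f"no layer named {name!r}")


# Stand-ins for san_diego_item.layers, so the sketch runs offline.
demo_layers = [
    SimpleNamespace(properties=SimpleNamespace(name="census_tract_income")),
    SimpleNamespace(properties=SimpleNamespace(name="Community_College_Dist")),
]

districts = find_layer(demo_layers, "Community_College_Dist")
print(districts.properties.name)
```

With a live connection, the same call would be find_layer(san_diego_item.layers, 'census_tract_income').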

Input
census_tract_income = san_diego_item.layers[0]
Input
community_college_dist = san_diego_item.layers[1] 
Input
m1 = gis.map('San Diego')
m1
Output
Input
m1.add_layer(community_college_dist)
Input
m2 = gis.map('San Diego')
m2
Output
Input
m2.add_layer(census_tract_income)

Find the community college district with the fewest low-income families

Convert the layer into a spatially enabled pandas DataFrame so that we can calculate the number of households in each tract with income less than $30,000.

Input
sdf = pd.DataFrame.spatial.from_layer(census_tract_income)
Input
sdf.columns
Output
Index(['FID', 'TRACT', 'INCOME_ALL', 'INCOME_LES', 'INCOME_10K', 'INCOME_15K',
       'INCOME_20K', 'INCOME_25K', 'INCOME_30K', 'INCOME_35K', 'INCOME_40K',
       'INCOME_45K', 'INCOME_50K', 'INCOME_60K', 'INCOME_75K', 'INCOME_100',
       'INCOME_125', 'INCOME_150', 'INCOME_200', 'Shape__Area',
       'Shape__Length', 'SHAPE'],
      dtype='object')
Input
sdf.head()
Output
FID TRACT INCOME_ALL INCOME_LES INCOME_10K INCOME_15K INCOME_20K INCOME_25K INCOME_30K INCOME_35K ... INCOME_50K INCOME_60K INCOME_75K INCOME_100 INCOME_125 INCOME_150 INCOME_200 Shape__Area Shape__Length SHAPE
0 1 7700 4148 243 205 158 195 229 279 278 ... 445 526 370 379 73 127 125 1.724049e+06 6919.424522 {'rings': [[[-13051046.6746253, 3866695.333166...
1 2 7800 2510 294 132 180 160 135 250 116 ... 280 263 178 107 64 52 9 2.889814e+06 11223.567885 {'rings': [[[-13049196.649225, 3869830.7042951...
2 3 7901 2953 240 156 154 191 209 233 168 ... 325 393 233 150 49 42 25 1.785775e+06 5749.634908 {'rings': [[[-13051806.5792234, 3868598.509832...
3 4 7903 2429 154 163 184 174 171 139 195 ... 288 145 310 124 30 43 19 1.075471e+06 4651.499315 {'rings': [[[-13050375.5212048, 3868973.977334...
4 5 7904 3157 335 219 187 208 218 199 188 ... 304 316 326 162 53 67 19 1.318394e+06 4961.527797 {'rings': [[[-13050786.6266337, 3868042.625540...

5 rows × 22 columns

The census tract layer contains the number of households in each of several income categories, such as less than $10,000, $10,000 to $15,000, $15,000 to $20,000, and so on.

The aim of the project is to provide support to families with an annual income of less than $30,000.

We will add a field to the census tract dataframe that sums the number of households in each tract with income less than $30,000, that is, the five brackets from INCOME_LES through INCOME_25K.

Input
sdf['income_lt_30k'] = sdf['INCOME_LES'] + sdf['INCOME_10K'] + sdf['INCOME_15K'] + sdf['INCOME_20K'] + sdf['INCOME_25K']
Input
sdf.income_lt_30k.head()
Output
0    1030
1     901
2     950
3     846
4    1167
Name: income_lt_30k, dtype: int64
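
As a quick check, these five values can be reproduced by hand from the rows that sdf.head() printed earlier. The sketch below uses plain Python and copies the five bracket counts from that output; the variable names are illustrative and not part of the notebook.

```python
# Bracket counts copied from the sdf.head() output, keyed by TRACT, in the
# order INCOME_LES, INCOME_10K, INCOME_15K, INCOME_20K, INCOME_25K.
head_rows = {
    7700: (243, 205, 158, 195, 229),
    7800: (294, 132, 180, 160, 135),
    7901: (240, 156, 154, 191, 209),
    7903: (154, 163, 184, 174, 171),
    7904: (335, 219, 187, 208, 218),
}

# Households below $30,000 per tract: the sum of the five lowest brackets.
income_lt_30k = {tract: sum(counts) for tract, counts in head_rows.items()}

for tract, total in income_lt_30k.items():
    print(tract, total)
```

The printed totals match the income_lt_30k values shown above.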
Input
sdf.head()
Output
FID TRACT INCOME_ALL INCOME_LES INCOME_10K INCOME_15K INCOME_20K INCOME_25K INCOME_30K INCOME_35K ... INCOME_60K INCOME_75K INCOME_100 INCOME_125 INCOME_150 INCOME_200 Shape__Area Shape__Length SHAPE income_lt_30k
0 1 7700 4148 243 205 158 195 229 279 278 ... 526 370 379 73 127 125 1.724049e+06 6919.424522 {'rings': [[[-13051046.6746253, 3866695.333166... 1030
1 2 7800 2510 294 132 180 160 135 250 116 ... 263 178 107 64 52 9 2.889814e+06 11223.567885 {'rings': [[[-13049196.649225, 3869830.7042951... 901
2 3 7901 2953 240 156 154 191 209 233 168 ... 393 233 150 49 42 25 1.785775e+06 5749.634908 {'rings': [[[-13051806.5792234, 3868598.509832... 950
3 4 7903 2429 154 163 184 174 171 139 195 ... 145 310 124 30 43 19 1.075471e+06 4651.499315 {'rings': [[[-13050375.5212048, 3868973.977334... 846
4 5 7904 3157 335 219 187 208 218 199 188 ... 316 326 162 53 67 19 1.318394e+06 4961.527797 {'rings': [[[-13050786.6266337, 3868042.625540... 1167

5 rows × 23 columns

Input
sdf.shape
Output
(605, 23)

We will import the spatially enabled dataframe back into the GIS and create a feature layer.

Input
census_tract = gis.content.import_data(sdf,
                                       title='CensusTract',
                                       tags='datascience')
Input
census_tract
Output
CensusTract
Feature Layer Collection by arcgis_python
Last Modified: September 06, 2019
0 comments, 0 views

Get the number of low-income households in each district

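We will summarize census tracts by community college district to find the total number of low-income households in each district. If a tract falls in two or more districts, the value for that tract is split proportionally between the districts, based on the area of the tract in each district. The summary field is written as "income_lt_ SUM" because the income_lt_30k field appears under the shortened name income_lt_ in the imported layer, which is also why the result column is called SUM_income_lt_.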
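
The proportional split can be illustrated with a small calculation. This is only a sketch of the area-weighting idea described above, not the tool's internal implementation; the function name, the household count, and the overlap areas are made up for illustration.

```python
def split_by_area(value, overlap_areas):
    """Apportion `value` across districts in proportion to overlap area.

    overlap_areas maps a district name to the area of the tract that
    falls inside that district (any consistent unit).
    """
    total_area = sum(overlap_areas.values())
    return {
        district: value * area / total_area
        for district, area in overlap_areas.items()
    }


# Hypothetical tract: 1,000 low-income households, with 3 square miles in
# district A and 1 square mile in district B.
shares = split_by_area(1000, {"District A": 3.0, "District B": 1.0})
print(shares)  # {'District A': 750.0, 'District B': 250.0}
```

The shares always add up to the tract's original value, so no households are counted twice or lost when a tract straddles a boundary.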

Input
from arcgis.features.summarize_data import summarize_within
from datetime import datetime as dt
Input
tracts_within_boundary = summarize_within(community_college_dist,
                                          census_tract,
                                          summary_fields=["income_lt_ SUM"],
                                          shape_units='SquareMiles',
                                          output_name='TractsWithinBoundary' + str(dt.now().microsecond))
Input
tracts_within_boundary
Output
TractsWithinBoundary888389
Feature Layer Collection by arcgis_python
Last Modified: April 11, 2020
0 comments, 1 views
Input
m3 = gis.map('San Diego')
m3
Output
Input
m3.add_layer(tracts_within_boundary)

The map displays the community college districts returned by summarize_within. Each district now carries the summed number of households with income less than $30,000 per year.

Input
tracts_within_boundary_lyr = tracts_within_boundary.layers[0]
Input
sdf = pd.DataFrame.spatial.from_layer(tracts_within_boundary_lyr)
Input
sdf.columns
Output
Index(['OBJECTID_1', 'OBJECTID', 'DISTRICT', 'Shape_Leng', 'Shape__Area',
       'Shape__Length', 'Shape__Area_2', 'Shape__Length_2',
       'SUM_Area_SquareMiles', 'SUM_income_lt_', 'AnalysisArea', 'SHAPE'],
      dtype='object')
Input
sdf.sort_values(['SUM_income_lt_'], inplace=True)
Input
sdf.head()
Output
OBJECTID_1 OBJECTID DISTRICT Shape_Leng Shape__Area Shape__Length Shape__Area_2 Shape__Length_2 SUM_Area_SquareMiles SUM_income_lt_ AnalysisArea SHAPE
1 2 3 MIRA COSTA COMMUNITY COLLEGE 5.292542e+05 6.666802e+08 193041.705158 0.045057 1.554687 179.904966 28288 180.057317 {'rings': [[[-117.405857134, 33.343909379], [-...
0 1 5 SOUTHWESTERN COMMUNITY COLLEGE 4.845452e+05 6.272847e+08 175596.854611 0.042637 1.460371 171.085801 40861 171.343524 {'rings': [[[-117.190352975, 32.7128696710001]...
3 4 1 GROSSMONT-CUYAMACA COMMUNITY COLLEGE 9.623860e+05 4.176966e+09 349055.694465 0.283456 2.967744 1137.093734 48777 1137.329803 {'rings': [[[-116.788795455, 32.9637127680001]...
4 5 2 PALOMAR COMMUNITY COLLEGE 1.538205e+06 9.475183e+09 560765.010295 0.639828 4.732599 2554.695554 56549 2554.787798 {'rings': [[[-117.487635615, 33.5050662620001]...
2 3 4 SAN DIEGO COMMUNITY COLLEGE 6.086365e+05 7.999274e+08 221005.474787 0.054256 1.811543 217.566586 127843 217.585904 {'rings': [[[-117.1356743, 32.9486238640001], ...
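
The ranking can also be read off programmatically. The sketch below copies the five SUM_income_lt_ totals from the output above into a plain dictionary, so that it runs without a GIS connection, and picks the smallest; the variable names are illustrative.

```python
# District totals copied from the sorted output above.
low_income_households = {
    "MIRA COSTA COMMUNITY COLLEGE": 28288,
    "SOUTHWESTERN COMMUNITY COLLEGE": 40861,
    "GROSSMONT-CUYAMACA COMMUNITY COLLEGE": 48777,
    "PALOMAR COMMUNITY COLLEGE": 56549,
    "SAN DIEGO COMMUNITY COLLEGE": 127843,
}

# District with the fewest low-income households.
fewest = min(low_income_households, key=low_income_households.get)
print(fewest, low_income_households[fewest])

# Each district's share of the county-wide total, as a percentage.
total = sum(low_income_households.values())
for district, count in low_income_households.items():
    print(f"{district}: {100 * count / total:.1f}%")
```

On the dataframe itself, sdf.loc[sdf['SUM_income_lt_'].idxmin(), 'DISTRICT'] should return the same district.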

Visualize the district with the fewest low-income households

Input
m4 = gis.map('San Diego')
m4
Output
Input
m4.add_layer(tracts_within_boundary, {"renderer":"ClassedSizeRenderer",
                                      "field_name": "SUM_income_lt_"})

The Mira Costa district has the fewest low-income households: 28,288, compared with 40,861 in Southwestern, the next lowest. That's where the pilot program could be set up.

Conclusion

We have located the district with the fewest low-income families. We can assess the success of the project over the next 6 months and then give recommendations on expanding the program to other areas in the country.
