What is geospatial big data?
Geospatial big data refers not only to large amounts of data related to the locations and positions of the earth's various features, but also to large amounts of data that change very quickly and vary in origin and quality.
Geospatial data is a crucial component of big data, adding context that allows for better, multidimensional insights that facilitate smarter decisions, improved business operations and innovation to deliver a strong competitive advantage.
Big data is defined by volume, velocity and variety: large amounts of data arriving quickly in a number of formats.
Volume: Big data "size" is a constantly moving target, ranging from a few dozen terabytes to many petabytes of data. Some examples of volume in big data:
Every 60 seconds:
- 100,000 tweets
- 2.4 million Google searches
- 11 million instant messages
- 170 million email messages
- 1,800 TB of data
For example: The New York Stock Exchange generates about one terabyte of new trade data per day.
Velocity: Velocity refers to the speed of data generation, that is, how fast data arrives. Two kinds of velocity are relevant to big data:
- Frequency of generation
- Frequency of handling, recording and publishing
For example: Facebook users upload more than 900 million photos a day.
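To compare the per-minute and per-day figures quoted above, it can help to convert them to a common per-second rate. The sketch below does only that arithmetic. The helper name `per_second` is illustrative, and the input counts are the ones quoted in this guide rather than independently verified values.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def per_second(count, period_seconds):
    """Convert a count observed over a period into an average per-second rate."""
    return count / period_seconds

# Figures quoted in this guide: 100,000 tweets per minute, 900 million photos per day.
tweets_per_second = per_second(100_000, SECONDS_PER_MINUTE)
photos_per_second = per_second(900_000_000, SECONDS_PER_DAY)

print(f"Tweets: about {tweets_per_second:,.0f} per second")
print(f"Photos: about {photos_per_second:,.0f} per second")
```

These are averages. Real data streams are bursty, so peak rates can be much higher than the mean.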
Variety: The type and nature of the data define the variety of big data. Data drawn from text, video, audio, images, social media, mobile devices, email messages, documents and so on represents different varieties of data. It can be structured, unstructured or semi-structured. This diversity makes up the third dimension of big data: variety.
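As a small illustration of variety, the sketch below pulls the same coordinate pair out of a structured CSV row, a semi-structured JSON document, and an unstructured sentence. The sample records and field names are invented for this example, and the regular expression is a simplification that only handles plain decimal pairs.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
csv_text = "id,lat,lon\n1,34.05,-118.25\n"
row = next(csv.DictReader(io.StringIO(csv_text)))
structured = (float(row["lat"]), float(row["lon"]))

# Semi-structured: self-describing keys, but fields may be nested or optional.
json_text = '{"id": 1, "location": {"lat": 34.05, "lon": -118.25}, "tags": ["storm"]}'
doc = json.loads(json_text)
semi_structured = (doc["location"]["lat"], doc["location"]["lon"])

# Unstructured: free text, so the coordinates must be found by pattern matching.
note = "Flooding reported near 34.05, -118.25 at about 3 pm."
match = re.search(r"(-?\d+\.\d+),\s*(-?\d+\.\d+)", note)
unstructured = (float(match.group(1)), float(match.group(2)))

print(structured, semi_structured, unstructured)
```

The amount of parsing effort grows from the first case to the last, which is one reason variety is treated as a dimension of its own.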
Visualizing this geospatial big data creates opportunities for analyzing spatial relationships, exploring multiple dimensions across geographies, and predicting or modeling events in meaningful and timely ways. Where are disease outbreaks occurring now? Where have they historically occurred? Where are current population shifts increasing insurance risks? What spatial relationships and behavior patterns emerge before, during and after catastrophic storms?
Through The Science of Where, Esri software adds a new dimension to big data problem solving. See big data and Esri for insights on how the ArcGIS platform helps you realize everything your big data has to offer. At version 10.5, ArcGIS Enterprise introduced ArcGIS GeoAnalytics Server, which provides the ability to perform big data analysis on your Web GIS.
ArcGIS GeoAnalytics Server is a big data processing and analysis capability of ArcGIS Enterprise. It provides a distributed computing framework that powers a collection of analysis tools for analyzing large volumes of data. Through aggregation, regression, detection, clustering, and so on, you can visualize, understand, and act upon your big data. GeoAnalytics Server allows you to gain insights that may otherwise be hidden in your data, such as patterns, trends, and anomalies.
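To make the idea of aggregation concrete, here is a minimal plain-Python sketch that counts points per square bin. It is a conceptual illustration only, not the GeoAnalytics Server implementation. The function name `aggregate_points_to_bins` is hypothetical, and the sketch assumes planar coordinates in the same unit as the bin size.

```python
import math
from collections import Counter

def aggregate_points_to_bins(points, bin_size):
    """Count points per square bin; each bin is keyed by its (column, row) index."""
    counts = Counter()
    for x, y in points:
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        counts[key] += 1
    return counts

# Five sample points in arbitrary planar units.
points = [(0.5, 0.5), (0.9, 0.1), (1.2, 0.4), (3.7, 2.2), (3.1, 2.9)]
bins = aggregate_points_to_bins(points, bin_size=1.0)

for key, count in sorted(bins.items()):
    print(key, count)
```

Reducing many points to a handful of bin counts is what makes patterns visible at scale: the output size depends on the number of occupied bins, not on the number of input records.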
GeoAnalytics Server works with your vector (points, lines, and polygons) and tabular data and can read directly from CSV files, .txt files, shapefiles, and big data sources, such as cloud storage, HDFS, and Hive. GeoAnalytics Server also works with your existing GIS data, using feature layers as input.
GeoAnalytics Server tools focus on the different spatial analysis approaches you can take with big data: Analyze Patterns, Find Locations, Manage Data, Summarize Data, Use Proximity, and Data Enrichment. Whether you need to complete a quick spatial join, run regression analysis on multiple datasets, or find areas of data clustering, the GeoAnalytics Server toolbox provides many options to explore your data. In addition to the provided tools, you can customize analysis to complete your workflows and analyses through Python, using distributed computation and tools on your GeoAnalytics Server.
All analysis is performed on your GeoAnalytics Server, and results can be stored either in ArcGIS Enterprise so you can continue to explore, analyze, map, and share those results, or to your own data sources for further processing.
What you get in ArcGIS GeoAnalytics
GeoAnalytics Server uses
GeoAnalytics Server is helpful when your current GIS analyses aren’t processing your data fast enough, because it accelerates traditional workflows so you get results sooner. It is also helpful when you have large datasets that you need to analyze spatially. GeoAnalytics Server is a good solution in the following situations:
- Your existing tools and workflows aren’t processing your data fast enough.
- Your data is growing and you need a better way of managing and analyzing it.
- You need to transform your data into something more manageable to use in other GIS analyses (for example, using ArcGIS Pro analysis tools).
- Your data has a lot of noise and you want to explore it to identify important points.
- You want to use spatial statistical analysis and machine learning tools suitable for large datasets.
Examples of analysis with ArcGIS GeoAnalytics Server
GeoAnalytics tools are versatile across industries. The following examples illustrate how GeoAnalytics Server can be used with different goals in mind:
- As a crime analyst, you can understand the location and time of crimes in your state, as well as the proximity of crimes to areas of interest, such as events, police stations, and city centers. Related tools are Aggregate Points and Join Features.
- As a manager at a state Department of Transportation, you can analyze decades of traffic and crash data to determine the interstates with the most incidents. You can also analyze when certain vehicles were speeding and braking and correlate them with the locations of vehicular accidents. Related tools are Find Point Clusters and Reconstruct Tracks.
- As a water utility technician, you can sort through work orders for leaks and join them to a dataset of soil types to determine if leaks have occurred in areas where there is particularly corrosive soil. Related tools are Create Space Time Cube and Find Hot Spots.
- As a city GIS analyst, you can use ArcGIS GeoEvent Server to ingest GPS data on all city vehicles, like public works vehicles and snow plows, and see where the vehicles have travelled, areas that have less coverage, and instances where vehicles exceeded the speed limit. Related tools are Reconstruct Tracks, Aggregate Points, and Detect Incidents.
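The vehicle-tracking scenario in the last example can be sketched in plain Python: group GPS observations by vehicle, order them by time, then flag segments whose average speed exceeds a limit. This is a conceptual illustration of what track reconstruction and incident detection mean, not the behavior of the Reconstruct Tracks or Detect Incidents tools. The function names, record layout, and sample data are all assumptions made for this sketch.

```python
from collections import defaultdict
from math import hypot

def reconstruct_tracks(observations):
    """Group (vehicle_id, t_seconds, x_m, y_m) records by vehicle, ordered by time."""
    tracks = defaultdict(list)
    for vehicle_id, t, x, y in observations:
        tracks[vehicle_id].append((t, x, y))
    for track in tracks.values():
        track.sort()  # tuples sort by time first
    return dict(tracks)

def detect_speeding(track, limit_m_per_s):
    """Return (t_start, t_end, speed) for each segment faster than the limit."""
    incidents = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        if t1 == t0:
            continue  # skip duplicate timestamps to avoid dividing by zero
        speed = hypot(x1 - x0, y1 - y0) / (t1 - t0)
        if speed > limit_m_per_s:
            incidents.append((t0, t1, speed))
    return incidents

# Unordered sample observations in planar meters (invented data).
observations = [
    ("plow-1", 20, 300.0, 0.0),
    ("plow-1", 0, 0.0, 0.0),
    ("plow-2", 0, 0.0, 50.0),
    ("plow-1", 10, 100.0, 0.0),
    ("plow-2", 10, 80.0, 50.0),
]
tracks = reconstruct_tracks(observations)
for vehicle_id, track in sorted(tracks.items()):
    print(vehicle_id, detect_speeding(track, limit_m_per_s=15.0))
```

With real GPS data the coordinates would usually be geographic, so distances would need a geodesic calculation or a projected coordinate system rather than `hypot`.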
GeoAnalytics Server enables distributed analysis on a single machine or across a set of three machines. With this distributed computing power, your analysis can be performed more quickly and with larger quantities of data than could previously be computed on a desktop machine. Results from your analysis can be stored in ArcGIS Enterprise for use in web maps, apps, and other information products, or you can write back to your own data store.
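The general pattern behind distributed analysis is to split the data into partitions, summarize each partition independently, and merge the partial results. The sketch below illustrates that pattern on one machine, with threads standing in for separate server machines. It says nothing about how GeoAnalytics Server is implemented internally, and the function names and sample records are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_by_category(partition):
    """Worker step: summarize one partition independently of the others."""
    return Counter(category for category, _value in partition)

def distributed_count(records, n_partitions):
    """Split records into partitions, summarize each, then merge the partial results."""
    partitions = [records[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_by_category, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

records = [("crash", 1), ("leak", 2), ("crash", 3), ("crash", 4), ("leak", 5), ("storm", 6)]
result = distributed_count(records, n_partitions=3)
print(dict(result))
```

The merged result is the same regardless of how many partitions are used, which is the property that lets this kind of analysis scale out across machines.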
To get started with GeoAnalytics Server, install an ArcGIS Enterprise base deployment and ArcGIS Data Store configured as the spatiotemporal big data store. If you will use three machines in your GeoAnalytics Server site, you should also set up a three-machine spatiotemporal big data store. For details on how to set up your deployment to enable GeoAnalytics Server, see Set up GeoAnalytics Server and Best practices for GeoAnalytics Server sites.
The arcgis.geoanalytics module, available in the ArcGIS API for Python, provides submodules, data types, classes and functions to process your big data using ArcGIS GeoAnalytics Server. GeoAnalytics Server enables distributed analysis across multiple ArcGIS Server machines, so you can analyze more data in less time by harnessing the compute power of several machines. Navigate to the Get Started with GeoAnalytics Server page to learn how to configure GeoAnalytics Server with your ArcGIS Enterprise deployment.
You can verify that your Enterprise has been properly configured by calling the module's is_supported() function. Once configuration is confirmed, the geoanalytics tools work with big data file share items registered in the Web GIS and with feature layers.
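A minimal sketch of that check might look like the following. The wrapper name `geoanalytics_ready` is hypothetical, and the portal URL and credentials in the commented usage are placeholders. The sketch assumes that `is_supported()` accepts the GIS connection as its argument and returns a truthy value when GeoAnalytics is available.

```python
def geoanalytics_ready(gis):
    """Return True if the given Web GIS connection reports GeoAnalytics support."""
    # Imported lazily so this file can be loaded where arcgis is not installed.
    from arcgis import geoanalytics
    return bool(geoanalytics.is_supported(gis))

# Example usage (commented out because it needs a live ArcGIS Enterprise portal;
# the URL and credentials below are placeholders, not real values):
#
# from arcgis.gis import GIS
# gis = GIS("https://portal.example.com/portal", "username", "password")
# if not geoanalytics_ready(gis):
#     raise RuntimeError("GeoAnalytics Server is not configured for this portal")
```

Running this check once at the start of a notebook or script gives a clear failure early, before any analysis tool is called.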
The geoanalytics module presents various tools grouped into submodules:
- The arcgis.geoanalytics.analyze_patterns module provides tools that help you identify, quantify, and visualize spatial patterns in your data.
- The arcgis.geoanalytics.find_locations module provides tools that help you identify areas or features that meet any number of criteria you specify.
- The arcgis.geoanalytics.manage_data module provides tools for the day-to-day management of geographic and tabular data.
- The arcgis.geoanalytics.summarize_data module provides tools that output descriptive statistics of features and their attributes based on spatial relationships to other features.
- The arcgis.geoanalytics.use_proximity module provides tools that define geometries to help analyze which features are near other features.
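One way to explore the submodules listed above is to import one by name and inspect what it exposes. The sketch below is an illustration only: the dictionary and the `list_tools` helper are hypothetical names, the summaries are shortened from this guide, and calling `list_tools` requires the ArcGIS API for Python to be installed.

```python
import importlib

# Submodule names as listed in this guide, with a shortened summary of each.
GEOANALYTICS_SUBMODULES = {
    "analyze_patterns": "identify, quantify, and visualize spatial patterns",
    "find_locations": "find areas or features that meet specified criteria",
    "manage_data": "day-to-day management of geographic and tabular data",
    "summarize_data": "descriptive statistics based on spatial relationships",
    "use_proximity": "geometries for analyzing which features are near others",
}

def list_tools(submodule_name):
    """Import arcgis.geoanalytics.<submodule_name> and return its public callables."""
    if submodule_name not in GEOANALYTICS_SUBMODULES:
        raise ValueError(f"Unknown geoanalytics submodule: {submodule_name!r}")
    module = importlib.import_module(f"arcgis.geoanalytics.{submodule_name}")
    return sorted(
        name for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    )

for name, summary in GEOANALYTICS_SUBMODULES.items():
    print(f"arcgis.geoanalytics.{name}: {summary}")
```

With the library installed, `list_tools("summarize_data")` returns the public callable names in that submodule. The list may include imported helpers as well as analysis tools, so treat it as a starting point and confirm each tool against the API reference.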
In this guide, we have seen how to use GeoAnalytics Server to analyze big data. In the next guide, we will learn how data can be made accessible to GeoAnalytics Server.