Optimizing Custom Data Providers

Filter Data on the Remote Data Source

Not all remote data sources have options for filtering data, but if yours does, your provider can be optimized to request only the data that are needed. This both reduces the amount of data sent over the network and helps ensure that the ArcGIS client does not have to perform unnecessary data processing.

You can access the query parameters of any request inside the provider. The code snippet below shows how.

const request = require('request')

Model.prototype.getData = function (req, callback) {
  const { params: { host, id }, query: { where } } = req
  request(`https://${host}/resource/${id}?${where}`, (err, res, body) => {
    if (err) return callback(err)
    // format the response body as GeoJSON before returning it
    callback(null, body)
  })
}

Note that the where parameter in feature service requests is SQL-like and may require parsing and careful translation before adding it to the remote API request.

For instance, if you want to query results by more than one attribute, in SQL you would use an AND operator between attributes. If your provider is directly contacting a SQL database, then the query filter generated by your ArcGIS client may work with little to no modification of the where statement. However, many REST APIs use query strings that reference attributes of the collection items, such as: https://host/collection/:id?attribute1=val1&attribute2=val2.
In this example, the raw where clause generated by a filter in an ArcGIS client could not be directly passed to the API and would require some modification in the provider code. See the MongoDB sample code for an example of how a SQL-like statement is translated.
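
For a lighter-weight illustration, the sketch below translates a simple where clause into query-string parameters. The whereToQueryString helper is hypothetical and only handles equality clauses joined by AND; real where clauses can contain other operators (>, OR, IN, ...) that need fuller SQL parsing.

// Hypothetical helper: translate "attr1 = 'val1' AND attr2 = 'val2'"
// into "attr1=val1&attr2=val2". Only simple equality clauses joined
// by AND are supported in this sketch.
function whereToQueryString (where) {
  return where
    .split(/\s+AND\s+/i)
    .map(clause => {
      const match = clause.match(/^\s*(\w+)\s*=\s*'?([^']*)'?\s*$/)
      if (!match) throw new Error(`Unsupported where clause: ${clause}`)
      const [, attribute, value] = match
      return `${encodeURIComponent(attribute)}=${encodeURIComponent(value)}`
    })
    .join('&')
}

// whereToQueryString("attribute1 = 'val1' AND attribute2 = 'val2'")
// => "attribute1=val1&attribute2=val2"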

Caching Responses

Caching responses can greatly improve the performance of your custom data provider and feature service by eliminating requests when previous responses are reusable. Managing the cache in your custom data provider requires little code, but to get the most out of caching, a careful understanding of your remote data source and of how the cache works is crucial.

The Custom Data Feeds Cache

In the example below, the cache "time to live" (ttl) is defined as 30 seconds. If a duplicate request is received within that 30-second window, the cached response is returned by the custom data provider instead of sending a request to the remote data source. To implement caching, a ttl property with a value in seconds needs to be included in the object passed to the callback.

const cacheTtl = 30; // thirty seconds

class Model {
  async getData (req, callback) {
    // custom data provider code that builds `geojson` and its metadata
    callback(null, {
      ...geojson,
      metadata: {
        idField,
        maxRecordCount
      },
      filtersApplied,
      ttl: cacheTtl, // cache this response for thirty seconds
      crs,
    })
  }
}

The cache uses a "Least Recently Used" (LRU) eviction strategy. It has a capacity of 500 elements, and each element is referred to by a cache key generated from the query string of the request. When a unique query is sent to the CDF server, a new key and cache element are created. When the cache reaches capacity, the least recently used element is discarded to make room for a newer one. Cache elements are also discarded after their ttl expires.
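
To make the eviction behavior concrete, below is a minimal sketch of an LRU cache. It is not the actual CDF implementation (which also honors the ttl and keys entries by query string), but it shows how a bounded cache decides which element to discard.

// Minimal LRU sketch (illustration only, not the CDF implementation).
// A JavaScript Map preserves insertion order, so re-inserting a key on
// access moves it to the "most recently used" end; eviction removes the
// first key, which is the least recently used.
class LruCache {
  constructor (capacity) {
    this.capacity = capacity
    this.entries = new Map()
  }

  get (key) {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key)
    this.entries.delete(key)
    this.entries.set(key, value) // mark as most recently used
    return value
  }

  set (key, value) {
    if (this.entries.has(key)) {
      this.entries.delete(key)
    } else if (this.entries.size >= this.capacity) {
      // discard the least recently used entry (the first key in the Map)
      this.entries.delete(this.entries.keys().next().value)
    }
    this.entries.set(key, value)
  }
}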

The primary decision when implementing the cache is how long a cached element should persist. If your custom data provider accesses a remote data source that updates frequently, then a very small ttl, or no cache at all, may be appropriate. However, if the remote data source is rarely updated, or slightly stale data are acceptable, then a large ttl can be used.

When deciding whether to use the cache functionality, be aware that the presence of a valid cached response will bypass the getData() function. In other words, if a query that matches a currently cached query is received, the cached response will be sent back without ever executing the getData() function.

Setting the idField Metadata Property

Custom data feeds will automatically create an OBJECTID for each feature in a set if the provider code does not explicitly set the idField metadata property. In the best-case scenario, the remote data set contains an integer field with a unique value for each item that can be used as the OBJECTID. To explicitly assign the value of the feature's OBJECTID, follow the steps below.

  1. In the geojson.metadata, set idField to the name of the property that holds the unique identifier; the value of that property must be numeric.

      geojson.metadata = {
        name: "Your feature layer name",
        idField: "id",
        description: "Your feature layer description."
      }
  2. In the code where you construct the GeoJSON, assign the chosen unique property value.

      properties: {
        id: item.unique_integer_field,
        name: item.name,
        year: item.year
      }

Assigning a value to the geojson.metadata property idField is highly recommended in most cases because custom data feeds will then not have to create an OBJECTID for each feature in the set. However, if the remote data set does not have a suitable property that can act as a unique identifier, and one cannot be added, custom data feeds will compute and assign a 64-bit OBJECTID for each feature from a hash of the entire feature's GeoJSON.
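
Putting the two steps together, a minimal getData sketch might look like the following. This is an illustration only: fetchItems is a hypothetical call to your remote data source, and the point geometry and property names are placeholders.

class Model {
  async getData (req, callback) {
    try {
      const items = await fetchItems(req) // hypothetical remote-source call
      const geojson = {
        type: 'FeatureCollection',
        features: items.map(item => ({
          type: 'Feature',
          geometry: { type: 'Point', coordinates: [item.lon, item.lat] },
          properties: {
            id: item.unique_integer_field, // unique integer used as the OBJECTID
            name: item.name,
            year: item.year
          }
        })),
        metadata: {
          name: 'Your feature layer name',
          idField: 'id',
          description: 'Your feature layer description.'
        }
      }
      callback(null, geojson)
    } catch (err) {
      callback(err)
    }
  }
}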
