-
class
arcgis.learn.text.
EntityRecognizer
(data=None, lang='en', backbone='spacy', **kwargs) Creates an entity recognition model to extract text entities from unstructured text documents.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Parameter
Description
data
Optional data object returned from
prepare_data()
function. The data object can be None when using a Hugging Face Transformer model that is already fine-tuned on an entity-recognition task. In that case the model should be used directly for inference.
lang
Optional string. Language-specific code, named according to the language’s ISO code. The default value is ‘en’ for English.
backbone
Optional string. Specify spacy, mistral or the HuggingFace transformer model name to be used to train the entity recognizer model. Default set to spacy.
Entity recognition via spaCy is based on <https://spacy.io/api/entityrecognizer>
To learn more about the available transformer models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about the available transformer models fine-tuned on Named Entity Recognition Task, kindly visit:- https://huggingface.co/models?pipeline_tag=token-classification
To learn more about mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
kwargs
Parameter
Description
verbose
Optional string. Default set to error. The log level you want to set. It means the amount of information you want to display while training or calling the various methods of this class. Allowed values are - debug, info, warning, error and critical. Applicable only for models with HuggingFace transformer backbones.
seq_len
Optional Integer. Default set to 512. Maximum sequence length (at sub-word level after tokenization) of the training data to be considered for training the model. Applicable only for models with HuggingFace transformer backbones.
mixed_precision
Optional Bool. Default set to False. If set True, then mixed precision training is used to train the model. Applicable only for models with HuggingFace transformer backbones.
pretrained_path
Optional String. Path where pre-trained model is saved. Accepts a Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
prompt
Optional String. This parameter is applicable if the selected model backbone is from the LLM family.
This parameter is used to describe the task and the guardrails for the task.
examples
Optional List. The list comprises tuple(s) where the first element denotes the text for entity extraction, while the second element is a dictionary used for mapping named entities.
This parameter is applicable if the selected model backbone is from the LLM family.
Pydantic Schema: List[Tuple[str, Dict[str, List]]]
Example: [("Jim stays in London", {"name": ["Jim"], "location": ["London"]})]
If examples are not supplied, a data object must be provided.
- Returns
EntityRecognizer
Object
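The examples structure described above (a list of (text, entity mapping) tuples) is easy to get subtly wrong, so it can help to check it before handing it to an LLM backbone. The following is a minimal sketch: validate_ner_examples is a hypothetical helper, not part of arcgis.learn, and the check that every entity string occurs in its text is an assumption about what makes a useful example rather than a documented requirement.

```python
from typing import Dict, List, Tuple

NerExample = Tuple[str, Dict[str, List[str]]]


def validate_ner_examples(examples: List[NerExample]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    for i, item in enumerate(examples):
        if not (isinstance(item, tuple) and len(item) == 2):
            problems.append(f"example {i}: expected a (text, entities) tuple")
            continue
        text, entities = item
        if not isinstance(text, str) or not isinstance(entities, dict):
            problems.append(f"example {i}: expected (str, dict)")
            continue
        for label, values in entities.items():
            if not isinstance(values, list):
                problems.append(f"example {i}: '{label}' must map to a list")
                continue
            for value in values:
                # Assumption (not from the reference): entity strings should
                # appear verbatim in the example text.
                if not isinstance(value, str) or value not in text:
                    problems.append(f"example {i}: {value!r} is not a string found in the text")
    return problems


examples = [("Jim stays in London", {"name": ["Jim"], "location": ["London"]})]
print(validate_ner_examples(examples))
```

An empty result means the list matches the documented shape and could be passed as the examples keyword argument together with a prompt.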
-
classmethod
available_backbone_models
(architecture) Get available models for the given entity recognition backbone
Parameter
Description
architecture
Required string. Name of the architecture, or ‘llm’, that one wishes to use.
To learn more about the available models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about llm and mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- Returns
a tuple containing the available models for the given entity recognition backbone
-
property
available_metrics
List of available metrics that are displayed in the training table. Set monitor value to be one of these while calling the fit method.
-
extract_entities
(text_list, drop=True, batch_size=4, show_progress=True) Extracts the entities from [documents in the mentioned path or text_list].
Field defined as ‘address_tag’ in
prepare_data()
function’s class mapping attribute will be treated as a location. When the trained model extracts multiple locations from a single document, that document is replicated for each location in the resulting dataframe.
Parameter
Description
text_list
Required string(path) or list(documents). List of documents for entity extraction OR path to the documents.
drop
Optional bool. Whether documents without an address should be dropped from the results. Default is set to True.
batch_size
Optional integer. Number of items to process at once. (Reduce it if getting CUDA Out of Memory Errors). Default is set to 4. Not applicable for models with spaCy backbone.
show_progress
Optional Bool. If set to True, displays a progress bar depicting the items processed so far. Applicable only when a list of texts is passed.
- Returns
Pandas DataFrame
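The row replication and the drop flag described for extract_entities can be pictured with plain Python records. This is only an illustration of the documented output shape, not the library's implementation; the helper name expand_by_location and the column name Address are assumptions made for the example.

```python
from typing import Dict, List


def expand_by_location(docs: List[Dict], address_key: str = "Address",
                       drop: bool = True) -> List[Dict]:
    """Emit one row per extracted location, mimicking the described dataframe."""
    rows = []
    for doc in docs:
        locations = doc.get(address_key) or []
        if not locations:
            # A document without any address is kept only when drop is False.
            if not drop:
                rows.append({**doc, address_key: None})
            continue
        for location in locations:
            # The document is replicated once per location.
            rows.append({**doc, address_key: location})
    return rows


docs = [
    {"text": "report one", "Address": ["12 Main St", "4 Oak Ave"]},
    {"text": "report two", "Address": []},
]
for row in expand_by_location(docs, drop=False):
    print(row)
```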
-
fit
(epochs=20, lr=None, one_cycle=True, early_stopping=False, checkpoint=True, **kwargs) Train the model for the specified number of epochs and using the specified learning rates
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
epochs
Optional integer. Number of cycles of training on the data. Increase it if underfitting. Default is 20.
lr
Optional float or slice of floats. Learning rate to be used for training the model. If
lr=None
, an optimal learning rate is automatically deduced for training the model.
Note
Passing slice of floats as lr value is not supported for models with spaCy backbone.
one_cycle
Optional boolean. Parameter to select 1cycle learning rate schedule. If set to False no learning rate schedule is used.
Note
Not applicable for models with spaCy backbone
early_stopping
Optional boolean. Parameter to add early stopping. If set to ‘True’ training will stop if parameter monitor value stops improving for 5 epochs.
Note
Not applicable for models with spaCy backbone
checkpoint
Optional boolean or string. Parameter to save checkpoint during training. If set to True the best model based on monitor will be saved during training. If set to ‘all’, all checkpoints are saved. If set to False, checkpointing will be off. Setting this parameter loads the best model at the end of training.
Note
Not applicable for models with spaCy backbone
tensorboard
Optional boolean. Parameter to write the training log. If set to ‘True’ the log will be saved at <dataset-path>/training_log, which can be visualized in tensorboard. Requires tensorboardx version 2.1.
The default value is ‘False’.
Note
Not applicable for Text Models
monitor
Optional string. Specifies which metric to monitor while checkpointing and early stopping. Defaults to ‘valid_loss’. The value should be one of the metrics displayed in the training table. Use {model_name}.available_metrics to list the available metrics to set here.
Note
Not applicable for models with spaCy backbone
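Since monitor must be one of the values reported by available_metrics, a small wrapper can pick a valid metric before calling fit. The sketch below assumes an already constructed model object with a transformer backbone (the notes above say these options do not apply to spaCy, and fit is not supported for llm/mistral). The helper name fit_with_monitor and its fallback to the first reported metric are choices made for this example, not documented behaviour.

```python
def fit_with_monitor(model, epochs=20, preferred="valid_loss"):
    """Train the model, monitoring `preferred` when the model reports it."""
    metrics = list(model.available_metrics)
    if not metrics:
        raise ValueError("the model reports no available metrics")
    # Assumption: fall back to the first reported metric if the preferred
    # one is not available for this model.
    monitor = preferred if preferred in metrics else metrics[0]
    model.fit(epochs=epochs, early_stopping=True, checkpoint=True, monitor=monitor)
    return monitor
```

Passing the model in as an argument keeps the helper independent of how the model was created (constructor, from_model, or otherwise).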
-
freeze
() Freeze up to last layer group to train only the last layer group of the model.
This method is not supported when the backbone is configured as llm/mistral.
-
classmethod
from_model
(emd_path, data=None, **kwargs) Creates an EntityRecognizer model object from a Deep Learning Package(DLPK) or Esri Model Definition (EMD) file.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using this method.
- Returns
EntityRecognizer
Object
-
classmethod
from_pretrained
(backbone, **kwargs) Creates an EntityRecognizer model object from an already fine-tuned Hugging Face Transformer backbone.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
backbone
Required string. Specify the Hugging Face Transformer backbone name fine-tuned on Named Entity Recognition(NER)/ Token Classification task.
To get more details on available transformer models fine-tuned on Named Entity Recognition(NER) Task, kindly visit:- https://huggingface.co/models?pipeline_tag=token-classification
kwargs
Parameter
Description
verbose
Optional string. Default set to error. The log level you want to set. It means the amount of information you want to display while calling the various methods of this class. Allowed values are - debug, info, warning, error and critical.
- Returns
EntityRecognizer
Object
-
load
(name_or_path) To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Loads a saved EntityRecognizer model from disk.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
name_or_path
Required string. Path to Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
-
lr_find
(allow_plot=True) Runs the Learning Rate Finder. Helps in choosing the optimum learning rate for training the model.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
allow_plot
Optional boolean. Display the plot of losses against the learning rates and mark the optimal value of the learning rate on the plot. The default value is ‘True’.
-
metrics_per_label
() Calculates precision, recall and F1 scores for each label/entity on which the model was trained.
-
plot_losses
(show=True) Plot training and validation losses.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
show
Optional bool. Defaults to True. If set to False, the figure is not plotted but is returned; if set to True, the function plots the figure and returns nothing.
- Returns
-
save
(name_or_path, **kwargs) Saves the model weights, creates an Esri Model Definition and Deep Learning Package zip for deployment to Image Server or ArcGIS Pro.
Parameter
Description
name_or_path
Required string. Name of the model to save. It stores it at the pre-defined location. If path is passed then it stores at the specified path with model name as directory name and creates all the intermediate directories.
publish
Optional boolean. Publishes the DLPK as an item. Default is set to False.
gis
Optional
GIS
Object. Used for publishing the item. If not specified then active gis user is taken.
compute_metrics
Optional boolean. Used for computing model metrics. Default is set to True.
save_optimizer
Optional boolean. Used for saving the model-optimizer state along with the model. Default is set to False. Not applicable for models with spaCy backbone.
kwargs
Optional Parameters: Boolean overwrite if True, it will overwrite the item on ArcGIS Online/Enterprise, default False. Boolean zip_files if True, it will create the Deep Learning Package (DLPK) file while saving the model.
-
show_results
(ds_type='valid') Runs entity extraction on a random batch from the mentioned ds_type.
Parameter
Description
ds_type
Optional string, defaults to valid.
- Returns
Pandas DataFrame
TextClassifier
-
class
arcgis.learn.text.
TextClassifier
(data, backbone='bert-base-cased', **kwargs) Creates a
TextClassifier
Object.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Based on the Hugging Face transformers library
Parameter
Description
data
Optional data object returned from
prepare_textdata
function. The data object can be None when using a Hugging Face Transformer model that is already fine-tuned on a classification task. In that case the model should be used directly for inference.
backbone
Optional string. Specify gpt or the HuggingFace transformer model name to be used to train the classifier. Default set to bert-base-cased.
To learn more about the available models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about the available transformer models fine-tuned on Text Classification Task, kindly visit:- https://huggingface.co/models?pipeline_tag=text-classification
To learn more about mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
kwargs
Parameter
Description
verbose
Optional string. Default set to error. The log level you want to set. It means the amount of information you want to display while training or calling the various methods of this class. Allowed values are - debug, info, warning, error and critical.
seq_len
Optional Integer. Default set to 512. Maximum sequence length (at sub-word level after tokenization) of the training data to be considered for training the model.
thresh
Optional Float. This parameter is used to set the threshold value to pick labels in case of multi-label text classification problem. Default value is set to 0.25
mixed_precision
Optional Bool. Default set to False. If set True, then mixed precision training is used to train the model
pretrained_path
Optional String. Path where pre-trained model is saved. Accepts a Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
prompt
Optional String. This parameter is applicable if the selected model backbone is from the LLM family.
This parameter is used to describe the task and the guardrails for the task.
examples
Optional dictionary. The dictionary’s keys represent labels or classes, with the corresponding values being lists of sentences belonging to each class.
This parameter is applicable if the selected model backbone is from the LLM family.
Pydantic notation
Optional[Dict[str, List]]
Example:
{"Label_1": [example 1, example 2], "Label_2": [example 1, example 2]}
If examples are not supplied, a data object must be provided.
- Returns
TextClassifier
Object
-
accuracy
() Calculates the following metric:
accuracy: the number of correctly predicted labels in the validation set divided by the total number of items in the validation set
- Returns
a floating point number depicting the accuracy of the classification model.
-
classmethod
available_backbone_models
(architecture) Get available models for the given transformer backbone
Parameter
Description
architecture
Required string. Name of the transformer or llm backbone one wishes to use.
To learn more about the available models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- Returns
a tuple containing the available models for the given transformer backbone
-
property
available_metrics
List of available metrics that are displayed in the training table. Set monitor value to be one of these while calling the fit method.
-
fit
(epochs=10, lr=None, one_cycle=True, early_stopping=False, checkpoint=True, tensorboard=False, monitor='valid_loss', **kwargs) Train the model for the specified number of epochs and using the specified learning rates.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
epochs
Optional integer. Number of cycles of training on the data. Increase it if underfitting. Default is 10.
lr
Optional float or slice of floats. Learning rate to be used for training the model. If
lr=None
, an optimal learning rate is automatically deduced for training the model.
one_cycle
Optional boolean. Parameter to select 1cycle learning rate schedule. If set to False no learning rate schedule is used.
early_stopping
Optional boolean. Parameter to add early stopping. If set to ‘True’ training will stop if parameter monitor value stops improving for 5 epochs. A minimum difference of 0.001 is required for it to be considered an improvement.
checkpoint
Optional boolean or string. Parameter to save checkpoint during training. If set to True the best model based on monitor will be saved during training. If set to ‘all’, all checkpoints are saved. If set to False, checkpointing will be off. Setting this parameter loads the best model at the end of training.
tensorboard
Optional boolean. Parameter to write the training log. If set to ‘True’ the log will be saved at <dataset-path>/training_log, which can be visualized in tensorboard. Requires tensorboardx version 2.1.
The default value is ‘False’.
Note
Not applicable for Text Models
monitor
Optional string. Specifies which metric to monitor while checkpointing and early stopping. Defaults to ‘valid_loss’. The value should be one of the metrics displayed in the training table. Use {model_name}.available_metrics to list the available metrics to set here.
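The early_stopping rule above (stop once the monitored value has not improved by at least 0.001 for 5 epochs) can be illustrated in plain Python. This is a sketch of that rule as described, not the library's code: comparing against the best value seen so far, using a strict greater-than test, and the helper name stopping_epoch are all assumptions.

```python
def stopping_epoch(values, patience=5, min_delta=0.001, lower_is_better=True):
    """Return the zero-based epoch at which training would stop, or None."""
    best = None
    stale = 0  # consecutive epochs without a sufficient improvement
    for epoch, value in enumerate(values):
        if best is None:
            best = value
            continue
        gain = (best - value) if lower_is_better else (value - best)
        if gain > min_delta:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# valid_loss improves once, then changes by less than 0.001 per epoch.
losses = [1.0, 0.8, 0.7999, 0.7998, 0.7997, 0.7996, 0.7995, 0.5]
print(stopping_epoch(losses))
```

For a metric where higher is better, such as accuracy, the same helper can be called with lower_is_better=False.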
-
freeze
() Freeze up to last layer group to train only the last layer group of the model.
This method is not supported when the backbone is configured as llm/mistral.
-
classmethod
from_model
(emd_path, data=None, **kwargs) Creates a TextClassifier model object from a Deep Learning Package (DLPK) or Esri Model Definition (EMD) file.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using this method.
Parameter
Description
emd_path
Required string. Path to Deep Learning Package (DLPK) or Esri Model Definition (EMD) file.
data
Required fastai Databunch or None. Returned data object from
prepare_textdata
function or None for inferencing.
-
classmethod
from_pretrained
(backbone, **kwargs) Creates a TextClassifier model object from an already fine-tuned Hugging Face Transformer backbone.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
backbone
Required string. Specify the Hugging Face Transformer backbone name fine-tuned on Text Classification task.
To get more details on available transformer models fine-tuned on Text Classification Task, kindly visit:- https://huggingface.co/models?pipeline_tag=text-classification
- Returns
TextClassifier
Object
-
get_misclassified_records
() This method is not supported when the backbone is configured as llm/mistral.
- Returns
The misclassified records for this classification model.
-
load
(name_or_path) To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Loads a saved TextClassifier model from disk.
This method is not supported when the backbone is configured as llm/mistral or when a model extension is used.
Parameter
Description
name_or_path
Required string. Path to Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
-
lr_find
(allow_plot=True) Runs the Learning Rate Finder. Helps in choosing the optimum learning rate for training the model.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
allow_plot
Optional boolean. Display the plot of losses against the learning rates and mark the optimal value of the learning rate on the plot. The default value is ‘True’.
-
metrics_per_label
() - Returns
precision, recall and f1 score for each label in the classification model.
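As a reminder of what the per-label numbers mean, the following sketch computes precision, recall and F1 for each label from parallel lists of true and predicted labels. It assumes single-label classification and uses the standard definitions; per_label_scores is a hypothetical helper written for this illustration, and the library may compute or format its results differently.

```python
from collections import Counter
from typing import Dict, List


def per_label_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Standard per-label precision, recall and F1 for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


print(per_label_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```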
-
plot_losses
() Plot validation and training losses after fitting the model.
This method is not supported when the backbone is configured as llm/mistral.
-
predict
(text_or_list, show_progress=True, thresh=None, explain=False, explain_index=None, batch_size=64) Predicts the class label(s) for the input text
Parameter
Description
text_or_list
Required String or List. text or a list of texts for which we wish to find the class label(s).
prompt
Optional String. This parameter is applicable if the selected model backbone is from the LLM family.
This parameter is used to describe the task and the guardrails for the task.
show_progress
Optional Bool. If set to True, displays a progress bar depicting the items processed so far. Applicable only when a list of texts is passed.
thresh
Optional Float. The threshold value set to get the class label(s). Applicable only for multi-label classification task. Default is the value set during the model creation time, otherwise the value of 0.25 is set.
explain
Optional Bool. If set to True it shall generate SHAP based explanation. Kindly visit:- https://shap.readthedocs.io/en/latest/
explain_index
Optional List. Index of the rows for which explanation is required. If the value is None, it will generate an explanation for every row.
batch_size
Optional integer. Number of inputs to be processed at once. Try reducing the batch size in case of out of memory errors. Default value : 64
- Returns
In case of single label classification problem, a tuple containing the text, its predicted class label and the confidence score.
In case of multi label classification problem, a tuple containing the text, its predicted class labels, a list containing 1’s for the predicted labels, 0’s otherwise and list containing a score for each label
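For the multi-label case, the relationship between the per-label scores, the threshold and the list of 1s and 0s can be sketched as follows. The helper pick_labels and the sample scores are illustrative only, and whether the library treats a score exactly equal to the threshold as selected is an assumption here.

```python
from typing import List, Tuple


def pick_labels(scores: List[float], labels: List[str],
                thresh: float = 0.25) -> Tuple[List[str], List[int]]:
    """Return the selected labels and a 1/0 flag per label."""
    # Assumption: a score equal to the threshold counts as selected.
    flags = [1 if score >= thresh else 0 for score in scores]
    chosen = [label for label, flag in zip(labels, flags) if flag]
    return chosen, flags


labels = ["sports", "politics", "tech"]
scores = [0.9, 0.1, 0.3]  # made-up values, not real model output
print(pick_labels(scores, labels))
print(pick_labels(scores, labels, thresh=0.5))
```

Raising thresh makes the prediction more conservative: fewer labels are returned for each text.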
-
save
(name_or_path, framework='PyTorch', publish=False, gis=None, compute_metrics=True, save_optimizer=False, **kwargs) Saves the model weights, creates an Esri Model Definition and Deep Learning Package zip for deployment.
Parameter
Description
name_or_path
Required string. Folder path to save the model.
framework
Optional string. Defines the framework of the model. (Only supported by SingleShotDetector, currently.) If the framework used is TF-ONNX, batch_size can be passed as an optional keyword argument. Framework choice: ‘PyTorch’ and ‘TF-ONNX’.
publish
Optional boolean. Publishes the DLPK as an item.
gis
Optional
GIS
Object. Used for publishing the item. If not specified then active gis user is taken.
compute_metrics
Optional boolean. Used for computing model metrics.
save_optimizer
Optional boolean. Used for saving the model-optimizer state along with the model. Default is set to False.
kwargs
Optional Parameters: Boolean overwrite if True, it will overwrite the item on ArcGIS Online/Enterprise, default False. Boolean zip_files if True, it will create the Deep Learning Package (DLPK) file while saving the model.
- Returns
the qualified path at which the model is saved
-
show_results
(rows=5, **kwargs) Prints the rows of the dataframe with target and prediction columns.
Parameter
Description
rows
Optional Integer. Number of rows to print.
- Returns
dataframe
SequenceToSequence
-
class
arcgis.learn.text.
SequenceToSequence
(data, backbone='t5-base', **kwargs) Creates a
SequenceToSequence
Object. Based on the Hugging Face transformers library.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Parameter
Description
data
Required text data object, returned from
prepare_textdata
function.
backbone
Optional string. Specify the HuggingFace transformer model name to be used to train the model. Default set to ‘t5-base’.
To learn more about the available models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
kwargs
Parameter
Description
verbose
Optional string. Default set to error. The log level you want to set. It means the amount of information you want to display while training or calling the various methods of this class. Allowed values are - debug, info, warning, error and critical.
seq_len
Optional Integer. Default set to 512. Maximum sequence length (at sub-word level after tokenization) of the training data to be considered for training the model.
mixed_precision
Optional Bool. Default set to False. If set True, then mixed precision training is used to train the model
pretrained_path
Optional String. Path where pre-trained model is saved. Accepts a Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
prompt
Optional String. This parameter is applicable if the selected model backbone is from the LLM family.
This parameter is used to describe the task and the guardrails for the task.
examples
Optional list of tuples. Each tuple has two elements, representing the input sentence and the target sentence.
This parameter is applicable if the selected model backbone is from the LLM family.
Pydantic notation
Optional[List[List[str, str]]]
Example:
[["input_1", "output_1"], ["input_2", "output_2"]]
If examples are not supplied, a data object must be provided.
- Returns
SequenceToSequence
model object for sequence_translation task.
-
classmethod
available_backbone_models
(architecture) Get available models for the given transformer backbone
Parameter
Description
architecture
Required string. Name of the transformer backbone one wishes to use.
To learn more about the available models or choose models that are suitable for your dataset, kindly visit:- https://huggingface.co/transformers/pretrained_models.html
To learn more about mistral https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- Returns
a tuple containing the available models for the given transformer backbone
-
property
available_metrics
List of available metrics that are displayed in the training table. Set monitor value to be one of these while calling the fit method.
-
fit
(epochs=10, lr=None, one_cycle=True, early_stopping=False, checkpoint=True, tensorboard=False, monitor='valid_loss', **kwargs) Train the model for the specified number of epochs and using the specified learning rates.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
epochs
Optional integer. Number of cycles of training on the data. Increase it if underfitting. Default is 10.
lr
Optional float or slice of floats. Learning rate to be used for training the model. If
lr=None
, an optimal learning rate is automatically deduced for training the model.
one_cycle
Optional boolean. Parameter to select 1cycle learning rate schedule. If set to False no learning rate schedule is used.
early_stopping
Optional boolean. Parameter to add early stopping. If set to ‘True’ training will stop if parameter monitor value stops improving for 5 epochs. A minimum difference of 0.001 is required for it to be considered an improvement.
checkpoint
Optional boolean or string. Parameter to save checkpoint during training. If set to True the best model based on monitor will be saved during training. If set to ‘all’, all checkpoints are saved. If set to False, checkpointing will be off. Setting this parameter loads the best model at the end of training.
tensorboard
Optional boolean. Parameter to write the training log. If set to ‘True’ the log will be saved at <dataset-path>/training_log, which can be visualized in tensorboard. Requires tensorboardx version 2.1.
The default value is ‘False’.
Note
Not applicable for Text Models
monitor
Optional string. Specifies which metric to monitor while checkpointing and early stopping. Defaults to ‘valid_loss’. The value should be one of the metrics displayed in the training table. Use {model_name}.available_metrics to list the available metrics to set here.
-
freeze
() Freeze up to last layer group to train only the last layer group of the model.
This method is not supported when the backbone is configured as llm/mistral.
-
classmethod
from_model
(emd_path, data=None, **kwargs) Creates a SequenceToSequence model object from a Deep Learning Package (DLPK) or Esri Model Definition (EMD) file.
To load a custom DLPK using the model extensibility support, instantiate an object of the class using this method.
Parameter
Description
emd_path
Required string. Path to Deep Learning Package (DLPK) or Esri Model Definition (EMD) file.
data
Optional fastai Databunch. Returned data object from
prepare_textdata
function or None for inferencing. Default value: None
-
get_model_metrics
() Calculates the following metrics:
accuracy: the number of correctly predicted labels in the validation set divided by the total number of items in the validation set
bleu-score: this value indicates the similarity between model predictions and the ground truth text. Maximum value is 1.
- Returns
a dictionary containing the metrics for the sequence-to-sequence model.
-
load
(name_or_path) To load a custom DLPK using the model extensibility support, instantiate an object of the class using from_model.
Loads a saved SequenceToSequence model from disk.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
name_or_path
Required string. Path to Deep Learning Package (DLPK) or Esri Model Definition(EMD) file.
-
lr_find
(allow_plot=True) Runs the Learning Rate Finder. Helps in choosing the optimum learning rate for training the model.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
allow_plot
Optional boolean. Display the plot of losses against the learning rates and mark the optimal value of the learning rate on the plot. The default value is ‘True’.
-
plot_losses
(show=True) Plot training and validation losses.
This method is not supported when the backbone is configured as llm/mistral.
Parameter
Description
show
Optional bool. Defaults to True. If set to False, the figure is not plotted but is returned; if set to True, the function plots the figure and returns nothing.
- Returns
-
predict
(text_or_list, batch_size=64, show_progress=True, explain=False, explain_index=None, **kwargs) Predicts the translated outcome.
Parameter
Description
text_or_list
Required input string or list of input strings.
batch_size
Optional integer. Number of inputs to be processed at once. Try reducing the batch size in case of out of memory errors. Default value : 64
show_progress
Optional bool. To show or not to show the progress of prediction task. Default value : True
explain
Optional bool. Enables SHAP-based importance. Default value: False
explain_index
Optional list. Indices of the input rows for which the importance score will be generated. Default value: None
kwargs
Parameter
Description
num_beams
Optional integer. Number of beams for beam search. 1 means no beam search. Default value is set to 1
max_length
Optional integer. The maximum length of the sequence to be generated. Default value is set to 20
min_length
Optional integer. The minimum length of the sequence to be generated. Default value is set to 10
- Returns
A list of tuples (input, predicted output string).
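The generation keyword arguments listed above are passed to predict alongside the regular parameters. A minimal sketch, assuming an already trained or loaded SequenceToSequence object is passed in as model: the wrapper name translate and its length check are additions for this example, while the keyword names and defaults are the ones documented above.

```python
def translate(model, texts, num_beams=1, max_length=20, min_length=10, batch_size=64):
    """Run predict on a list of strings and return only the output strings."""
    if min_length > max_length:
        # Sanity check added for this sketch; not a documented library rule.
        raise ValueError("min_length must not exceed max_length")
    results = model.predict(
        texts,
        batch_size=batch_size,
        show_progress=False,
        num_beams=num_beams,
        max_length=max_length,
        min_length=min_length,
    )
    # predict is documented to return (input, predicted output) tuples.
    return [target for _source, target in results]
```

A larger num_beams widens the beam search at the cost of speed, and max_length usually needs raising from its default of 20 when longer outputs are expected.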
-
save
(name_or_path, framework='PyTorch', publish=False, gis=None, compute_metrics=True, save_optimizer=False, **kwargs) Saves the model weights, creates an Esri Model Definition and Deep Learning Package zip for deployment.
Parameter
Description
name_or_path
Required string. Folder path to save the model.
framework
Optional string. Defines the framework of the model. (Only supported by SingleShotDetector, currently.) If the framework used is TF-ONNX, batch_size can be passed as an optional keyword argument. Framework choice: ‘PyTorch’ and ‘TF-ONNX’.
publish
Optional boolean. Publishes the DLPK as an item.
gis
Optional
GIS
Object. Used for publishing the item. If not specified then active gis user is taken.
compute_metrics
Optional boolean. Used for computing model metrics.
save_optimizer
Optional boolean. Used for saving the model-optimizer state along with the model. Default is set to False.
kwargs
Optional Parameters: Boolean overwrite if True, it will overwrite the item on ArcGIS Online/Enterprise, default False. Boolean zip_files if True, it will create the Deep Learning Package (DLPK) file while saving the model.
- Returns
the qualified path at which the model is saved
Inference Only Models
FillMask
-
class
arcgis.learn.text.
FillMask
(backbone=None, **kwargs) Creates a
FillMask
Object. Based on the Hugging Face transformers library.
Parameter
Description
backbone
Optional string. Specify the HuggingFace transformer model name which will be used to generate the suggestion token.
To learn more about the available models for fill-mask task, kindly visit:- https://huggingface.co/models?pipeline_tag=fill-mask
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
FillMask
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a FillMask model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
FillMask
Object
-
predict_token
(text_or_list, num_suggestions=5, show_progress=True) Predicts suggestions for the masked token in the given text or list of texts.
Parameter
Description
text_or_list
Required string or list. A text/sentence or a list of texts/sentences for which one wishes to generate recommendations for the masked token.
num_suggestions
Optional integer. The number of suggestions to return. The maximum number of suggestions that can be generated for a missing token is 10. Default value is 5.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
- Returns
A list, or a list of lists, of dict. Each result is a list of dictionaries with the following keys:
sequence (str) – The corresponding input with the mask token prediction.
score (float) – The corresponding probability.
token_str (str) – The predicted token (to replace the masked one).
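As a minimal sketch of consuming the return value described above, the helper below picks the highest-scoring suggestion from one result (a list of dictionaries with sequence, score, and token_str keys). The helper name best_suggestion and the sample data are hypothetical and made up for illustration; real results would come from predict_token.

```python
from typing import Dict, List, Union

Suggestion = Dict[str, Union[str, float]]


def best_suggestion(suggestions: List[Suggestion]) -> Suggestion:
    """Return the suggestion with the highest score (hypothetical helper)."""
    return max(suggestions, key=lambda s: s["score"])


# Hand-written sample in the documented shape; not actual model output.
sample = [
    {"sequence": "The sky is blue.", "score": 0.62, "token_str": "blue"},
    {"sequence": "The sky is clear.", "score": 0.21, "token_str": "clear"},
    {"sequence": "The sky is dark.", "score": 0.09, "token_str": "dark"},
]

top = best_suggestion(sample)
print(top["token_str"], top["score"])
```

When a list of texts is passed, the documented return value holds one such list per input, so the helper would be applied to each inner list in turn.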
-
save
(name_or_path) Saves the model files to the specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
supported_backbones
= ['Albert', 'Bart', 'Bert', 'BigBird', 'Camembert', 'ConvBert', 'Data2VecText', 'Deberta', 'DebertaV2', 'DistilBert', 'Electra', 'Ernie', 'Esm', 'Flaubert', 'FNet', 'Funnel', 'IBert', 'LayoutLM', 'Longformer', 'Luke', 'MBart', 'Mega', 'MegatronBert', 'MobileBert', 'MPNet', 'Mra', 'Mvp', 'Nezha', 'Nystromformer', 'Perceiver', 'QDQBert', 'Reformer', 'RemBert', 'Roberta', 'RobertaPreLayerNorm', 'RoCBert', 'RoFormer', 'SqueezeBert', 'Tapas', 'Wav2Vec2', 'XLM', 'XLMRoberta', 'XLMRobertaXL', 'Xmod', 'Yoso'] supported transformer architectures
QuestionAnswering
-
class
arcgis.learn.text.
QuestionAnswering
(backbone=None, **kwargs) Creates a QuestionAnswering object, based on the Hugging Face transformers library.
Parameter
Description
backbone
Optional string. Specify the HuggingFace transformer model name which will be used to extract the answers from a given passage/context.
To learn more about the models available for the question-answering task, see https://huggingface.co/models?pipeline_tag=question-answering
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
QuestionAnswering
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a QuestionAnswering model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
QuestionAnswering
Object
-
get_answer
(text_or_list, context, show_progress=True, explain=False, explain_start_word=True, explain_index=None, **kwargs) Finds answers to the given question(s) in the given passage/context.
Parameter
Description
text_or_list
Required string or list. Questions or a list of questions one wishes to seek an answer for.
context
Required string. The context associated with the question(s) which contains the answers.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
explain
Optional boolean. If set to True, generates a SHAP-based explanation.
explain_start_word
Optional boolean. If set to True, generates a SHAP-based explanation for the start word of the answer. If set to False, generates the explanation for the last word of the answer.
For example, given the context “Point cloud datasets are typically collected using Lidar sensors (light detection and ranging)” and the question “How is Point cloud dataset collected?”, the answer is “Lidar sensors”. If explain_start_word is True, the explanation shows the importance of the different context words that lead to the selection of “Lidar” as the starting word of the span. If explain_start_word is False, the explanation is generated for the word “sensors”.
explain_index
Optional list. Index of the question for which the answer needs to be generated.
kwargs
Parameter
Description
num_answers
Optional integer. The number of answers to return. The answers will be chosen by order of likelihood. Default value is set to 1.
max_answer_length
Optional integer. The maximum length of the predicted answers. Default value is set to 15.
max_question_length
Optional integer. The maximum length of the question after tokenization. Questions will be truncated if needed. Default value is set to 64.
impossible_answer
Optional bool. Whether or not to accept “impossible” as an answer. Default value is set to False.
- Returns
A list, or a list of lists, containing the answer(s) for the input question(s).
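The keyword arguments above can be collected in one place before calling get_answer. The sketch below assumes only the documented parameter names and defaults; the helper answer_kwargs is hypothetical, and the final call is shown as a comment because it needs a loaded QuestionAnswering model.

```python
from typing import Any, Dict

# Defaults as documented for get_answer's keyword arguments.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "num_answers": 1,
    "max_answer_length": 15,
    "max_question_length": 64,
    "impossible_answer": False,
}


def answer_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults (hypothetical helper).

    Raises ValueError for names that are not documented keyword
    arguments, so a typo such as ``num_answer`` is caught by this helper.
    """
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown keyword argument(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


kwargs = answer_kwargs(num_answers=3, max_answer_length=30)
print(kwargs)

# With a loaded model (not created here), the call would look like:
# answers = model.get_answer(questions, context, **kwargs)
```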
-
save
(name_or_path) Saves the model files to the specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
supported_backbones
= ['Albert', 'Bart', 'Bert', 'BigBird', 'BigBirdPegasus', 'Bloom', 'Camembert', 'Canine', 'ConvBert', 'Data2VecText', 'Deberta', 'DebertaV2', 'DistilBert', 'Electra', 'Ernie', 'ErnieM', 'Falcon', 'Flaubert', 'FNet', 'Funnel', 'GPT2', 'GPTNeo', 'GPTNeoX', 'GPTJ', 'IBert', 'LayoutLMv2', 'LayoutLMv3', 'LED', 'Lilt', 'Longformer', 'Luke', 'Lxmert', 'MarkupLM', 'MBart', 'Mega', 'MegatronBert', 'MobileBert', 'MPNet', 'Mpt', 'Mra', 'MT5', 'Mvp', 'Nezha', 'Nystromformer', 'OPT', 'QDQBert', 'Reformer', 'RemBert', 'Roberta', 'RobertaPreLayerNorm', 'RoCBert', 'RoFormer', 'Splinter', 'SqueezeBert', 'T5', 'UMT5', 'XLM', 'XLMRoberta', 'XLMRobertaXL', 'XLNet', 'Xmod', 'Yoso'] supported transformer architectures
TextGenerator
-
class
arcgis.learn.text.
TextGenerator
(backbone=None, **kwargs) Creates a TextGenerator object, based on the Hugging Face transformers library.
Parameter
Description
backbone
Optional string. Specify the HuggingFace transformer model name which will be used to generate the text.
To learn more about the models available for the text-generation task, see https://huggingface.co/models?pipeline_tag=text-generation
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
TextGenerator
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a TextGenerator model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
TextGenerator
Object
-
generate_text
(text_or_list, show_progress=True, **kwargs) Generates text for a given text or list of incomplete sentences.
Parameter
Description
text_or_list
Required string or list. A text/sentence or a list of texts/sentences to complete.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
kwargs
Parameter
Description
min_length
Optional integer. The minimum length of the sequence to be generated. Default value is set to the min_length parameter of the model config.
max_length
Optional integer. The maximum length of the sequence to be generated. Default value is set to max_length parameter of the model config.
num_return_sequences
Optional integer. The number of independently computed returned sequences for each element in the batch. Default value is set to 1.
num_beams
Optional integer. Number of beams for beam search. 1 means no beam search. Default value is set to 1.
length_penalty
Optional float. Exponential penalty to the length. 1.0 means no penalty. Set to a value < 1.0 to encourage the model to generate shorter sequences, or to a value > 1.0 to encourage longer sequences. Default value is set to 1.0.
early_stopping
Optional bool. Whether to stop the beam search when at least num_beams sentences are finished per batch. Default value is set to False.
- Returns
A list, or a list of lists, containing the generated text for the input prompt(s) / sentence(s).
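To make the effect of length_penalty concrete, the sketch below scores two finished beam hypotheses by dividing the cumulative log-probability by length ** length_penalty. This follows the convention described in the Hugging Face generation documentation as best understood here; it is an assumption about what happens inside the backbone, not code from arcgis.learn, and the hypotheses and their log-probabilities are invented.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    sum_logprob: float  # cumulative log-probability (always <= 0)
    length: int  # number of generated tokens


def beam_score(hyp: Hypothesis, length_penalty: float) -> float:
    """Length-adjusted score: sum of log-probs / length ** length_penalty."""
    return hyp.sum_logprob / (hyp.length ** length_penalty)


def pick_best(hyps: List[Hypothesis], length_penalty: float) -> Hypothesis:
    """Return the hypothesis with the highest length-adjusted score."""
    return max(hyps, key=lambda h: beam_score(h, length_penalty))


# Invented hypotheses, for illustration only.
short = Hypothesis("short completion", sum_logprob=-4.0, length=4)
long_ = Hypothesis("a noticeably longer completion", sum_logprob=-6.0, length=8)

for penalty in (0.5, 1.0, 2.0):
    best = pick_best([short, long_], penalty)
    print(penalty, best.text)
```

With these invented numbers the shorter hypothesis wins at 0.5 and the longer one wins at 1.0 and 2.0, which matches the direction described for the parameter. Under this convention the penalty applies to beam search, that is, when num_beams is greater than 1.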
-
save
(name_or_path) Saves the model files to the specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
supported_backbones
= ['Bart', 'Bert', 'BertGeneration', 'BigBird', 'BigBirdPegasus', 'BioGpt', 'Blenderbot', 'BlenderbotSmall', 'Bloom', 'Camembert', 'Llama', 'CodeGen', 'CpmAnt', 'CTRL', 'Data2VecText', 'Electra', 'Ernie', 'Falcon', 'Fuyu', 'Git', 'GPT2', 'GPT2', 'GPTBigCode', 'GPTNeo', 'GPTNeoX', 'GPTNeoXJapanese', 'GPTJ', 'Llama', 'Marian', 'MBart', 'Mega', 'MegatronBert', 'Mistral', 'Mixtral', 'Mpt', 'Musicgen', 'Mvp', 'OpenLlama', 'OpenAIGPT', 'OPT', 'Pegasus', 'Persimmon', 'Phi', 'PLBart', 'ProphetNet', 'QDQBert', 'Reformer', 'RemBert', 'Roberta', 'RobertaPreLayerNorm', 'RoCBert', 'RoFormer', 'Rwkv', 'Speech2Text2', 'TransfoXL', 'TrOCR', 'Whisper', 'XGLM', 'XLM', 'XLMProphetNet', 'XLMRoberta', 'XLMRobertaXL', 'XLNet', 'Xmod'] supported transformer architectures
TextSummarizer
-
class
arcgis.learn.text.
TextSummarizer
(backbone=None, **kwargs) Creates a TextSummarizer object, based on the Hugging Face transformers library.
Parameter
Description
backbone
Optional string. Specify the HuggingFace transformer model name which will be used to summarize the text.
To learn more about the models available for the summarization task, see https://huggingface.co/models?pipeline_tag=summarization
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
TextSummarizer
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a TextSummarizer model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
TextSummarizer
Object
-
save
(name_or_path) Saves the model files to the specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
summarize
(text_or_list, show_progress=True, **kwargs) Summarizes the given text or list of texts.
Parameter
Description
text_or_list
Required string or list. A text/passage or a list of texts/passages to generate the summary for.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
kwargs
Parameter
Description
min_length
Optional integer. The minimum length of the sequence to be generated. Default value is set to the min_length parameter of the model config.
max_length
Optional integer. The maximum length of the sequence to be generated. Default value is set to the max_length parameter of the model config.
num_return_sequences
Optional integer. The number of independently computed returned sequences for each element in the batch. Default value is set to 1.
num_beams
Optional integer. Number of beams for beam search. 1 means no beam search. Default value is set to 1.
length_penalty
Optional float. Exponential penalty to the length. 1.0 means no penalty. Set to a value < 1.0 to encourage the model to generate shorter sequences, or to a value > 1.0 to encourage longer sequences. Default value is set to 1.0.
early_stopping
Optional bool. Whether to stop the beam search when at least num_beams sentences are finished per batch. Default value is set to False.
- Returns
A list, or a list of lists, containing the summary/summaries for the input prompt(s) / sentence(s).
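Because summarize returns either a flat list or a list of lists, downstream code often needs one uniform shape. The sketch below wraps a flat result so that every input maps to a list of outputs. The helper as_nested is hypothetical, the strings stand in for whatever item type the model returns, and the assumption that a nested result corresponds to num_return_sequences greater than 1 is not confirmed by the text above.

```python
from typing import Any, List


def as_nested(result: List[Any]) -> List[List[Any]]:
    """Return a list of lists regardless of the incoming shape."""
    if result and all(isinstance(item, list) for item in result):
        return result  # already one inner list per input
    return [[item] for item in result]


# Placeholder values in the two documented shapes; not model output.
flat = ["summary of text 1", "summary of text 2"]
nested = [["summary 1a", "summary 1b"], ["summary 2a", "summary 2b"]]

print(as_nested(flat))
print(as_nested(nested))
```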
-
supported_backbones
= ['Bart', 'BigBirdPegasus', 'Blenderbot', 'BlenderbotSmall', 'EncoderDecoder', 'FSMT', 'GPTSanJapanese', 'LED', 'LongT5', 'M2M100', 'Marian', 'MBart', 'MT5', 'Mvp', 'NllbMoe', 'Pegasus', 'PegasusX', 'PLBart', 'ProphetNet', 'SeamlessM4T', 'SeamlessM4Tv2', 'SwitchTransformers', 'T5', 'UMT5', 'XLMProphetNet'] supported transformer architectures
TextTranslator
-
class
arcgis.learn.text.
TextTranslator
(source_language='es', target_language='en', **kwargs) Creates a TextTranslator object, based on the Hugging Face transformers library. To learn more about the models available for the translation task, see https://huggingface.co/models?pipeline_tag=translation&search=Helsinki
Parameter
Description
source_language
Optional string. Specify the language of the text you would like to translate. Default value is ‘es’ (Spanish).
target_language
Optional string. The language into which one wishes to translate the input text. Default value is ‘en’ (English).
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
TextTranslator
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a TextTranslator model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
TextTranslator
Object
-
save
(name_or_path) Saves the translator model files on a specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
translate
(text_or_list, show_progress=True, **kwargs) Translate the given text or list of text into the target language
Parameter
Description
text_or_list
Required string or list. A text/passage or a list of texts/passages to translate.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
kwargs
Parameter
Description
min_length
Optional integer. The minimum length of the sequence to be generated. Default value is set to the min_length parameter of the model config.
max_length
Optional integer. The maximum length of the sequence to be generated. Default value is set to the max_length parameter of the model config.
num_return_sequences
Optional integer. The number of independently computed returned sequences for each element in the batch. Default value is set to 1.
num_beams
Optional integer. Number of beams for beam search. 1 means no beam search. Default value is set to 1.
length_penalty
Optional float. Exponential penalty to the length. 1.0 means no penalty. Set to a value < 1.0 to encourage the model to generate shorter sequences, or to a value > 1.0 to encourage longer sequences. Default value is set to 1.0.
early_stopping
Optional bool. Whether to stop the beam search when at least num_beams sentences are finished per batch. Default value is set to False.
- Returns
A list, or a list of lists, containing the translation of the input prompt(s) / sentence(s) into the target language.
ZeroShotClassifier
-
class
arcgis.learn.text.
ZeroShotClassifier
(backbone=None, **kwargs) Creates a ZeroShotClassifier object, based on the Hugging Face transformers library.
Parameter
Description
backbone
Optional string. Specify the HuggingFace transformer model name which will be used to classify the input text.
To learn more about the models available for the zero-shot-classification task, see https://huggingface.co/models?pipeline_tag=zero-shot-classification
kwargs
Parameter
Description
pretrained_path
Optional string. Path to a directory where pretrained model files are saved. If pretrained_path is provided, the model is loaded from that path on the local disk.
working_dir
Optional string. Path to a directory on the local filesystem. If the directory is not present, it will be created. This directory is used as the location to save the model.
- Returns
ZeroShotClassifier
Object
-
classmethod
from_model
(emd_path, **kwargs) Creates a ZeroShotClassifier model object from an Esri Model Definition (EMD) file.
Parameter
Description
emd_path
Required string. Path to the Esri Model Definition (EMD) file or to the folder containing the saved model files.
- Returns
ZeroShotClassifier
Object
-
predict
(text_or_list, candidate_labels, show_progress=True, **kwargs) Predicts the class label(s) for the input text
Parameter
Description
text_or_list
Required string or list. The sequence or a list of sequences to classify.
candidate_labels
Required string or list. The set of possible class labels to classify each sequence into. Can be a single label, a string of comma-separated labels, or a list of labels.
show_progress
Optional boolean. If set to True, displays a progress bar depicting the items processed so far.
kwargs
Parameter
Description
multi_class
Optional boolean. Whether or not multiple candidate labels can be true. Default value is set to False.
hypothesis
Optional string. The template used to turn each label into an NLI-style hypothesis. This template must include a {} or similar syntax for the candidate label to be inserted into the template. Default value is set to “This example is {}.”.
- Returns
A list of dict. Each result is a dictionary with the following keys:
sequence (str) – The sequence for which this is the output.
labels (List[str]) – The labels sorted by order of likelihood.
scores (List[float]) – The probabilities for each of the labels.
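As a rough illustration of the candidate_labels and hypothesis parameters, the sketch below normalizes the labels (a single label, a comma-separated string, or a list) and fills each one into the template. It only mirrors the behaviour described above; build_hypotheses is a hypothetical helper, not part of arcgis.learn, and the labels are made up.

```python
from typing import List, Union


def build_hypotheses(
    candidate_labels: Union[str, List[str]],
    hypothesis: str = "This example is {}.",
) -> List[str]:
    """Turn each candidate label into an NLI-style hypothesis string."""
    if isinstance(candidate_labels, str):
        # A single label or a comma-separated string of labels.
        labels = [part.strip() for part in candidate_labels.split(",")]
    else:
        labels = [label.strip() for label in candidate_labels]
    # Drop empty entries left by stray commas or blank strings.
    labels = [label for label in labels if label]
    return [hypothesis.format(label) for label in labels]


print(build_hypotheses("flood, wildfire, earthquake"))
print(build_hypotheses(["roads"], hypothesis="The text is about {}."))
```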
-
save
(name_or_path) Saves the model files to the specified path on the local disk.
Parameter
Description
name_or_path
Required string. Path to save model files on the local disk.
- Returns
Absolute path for the saved model
-
supported_backbones
= ['Albert', 'Bart', 'Bert', 'BigBird', 'BigBirdPegasus', 'BioGpt', 'Bloom', 'Camembert', 'Canine', 'Llama', 'ConvBert', 'CTRL', 'Data2VecText', 'Deberta', 'DebertaV2', 'DistilBert', 'Electra', 'Ernie', 'ErnieM', 'Esm', 'Falcon', 'Flaubert', 'FNet', 'Funnel', 'GPT2', 'GPT2', 'GPTBigCode', 'GPTNeo', 'GPTNeoX', 'GPTJ', 'IBert', 'LayoutLM', 'LayoutLMv2', 'LayoutLMv3', 'LED', 'Lilt', 'Llama', 'Longformer', 'Luke', 'MarkupLM', 'MBart', 'Mega', 'MegatronBert', 'Mistral', 'Mixtral', 'MobileBert', 'MPNet', 'Mpt', 'Mra', 'MT5', 'Mvp', 'Nezha', 'Nystromformer', 'OpenLlama', 'OpenAIGPT', 'OPT', 'Perceiver', 'Persimmon', 'Phi', 'PLBart', 'QDQBert', 'Reformer', 'RemBert', 'Roberta', 'RobertaPreLayerNorm', 'RoCBert', 'RoFormer', 'SqueezeBert', 'T5', 'Tapas', 'TransfoXL', 'UMT5', 'XLM', 'XLMRoberta', 'XLMRobertaXL', 'XLNet', 'Xmod', 'Yoso'] supported transformer architectures