Leveraging TabPFN for Human Activity Recognition Using Mobile Dataset
Introduction
Human Activity Recognition (HAR) using mobile data has become an important area of research and application due to the increasing ubiquity of smartphones, wearables, and other mobile devices that can collect a wealth of sensor data. HAR is a crucial task in various fields, including healthcare, fitness, workplace safety, and smart cities, where the goal is to classify human activities (e.g., walking, running, sitting) based on sensor data. Traditional methods for HAR often require substantial computational resources and complex hyperparameter tuning, making them difficult to deploy in real-time applications. TabPFN (Tabular Prior-Data Fitted Network), a Transformer-based model designed for fast and efficient classification of small tabular datasets, offers a promising solution to overcome these challenges.
TabPFN’s advantages are particularly well suited to a range of HAR use cases. In healthcare, it can support fall detection for the elderly and chronic disease monitoring, enabling timely interventions. For fitness and wellness, it can classify activities such as walking or running in real time, enhancing the user experience in mobile apps and wearable devices. It can improve workplace safety by identifying risky worker activities in hazardous industrial environments, such as mines and oil rigs, helping to reduce accidents. In smart cities and urban mobility, HAR data from pedestrians and commuters can be classified efficiently to optimize traffic flow, public transport systems, and urban planning initiatives. HAR can also support emergency response efforts during disasters by helping locate people in need of help. TabPFN's speed, simplicity, and effectiveness make it a strong choice for these real-time HAR applications.
Necessary imports
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from arcgis.gis import GIS
from arcgis.learn import MLModel, prepare_tabulardata
Connect to ArcGIS
gis = GIS("home")
Access the datasets
Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks.
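To make the feature naming concrete, the sketch below derives a few illustrative time-domain statistics (mean, standard deviation, and median absolute deviation) from one window of synthetic tri-axial accelerometer samples. The window length, signal values, and variable names here are assumptions for illustration only, not the dataset's actual extraction pipeline.

```python
import numpy as np

# Illustrative only: one 128-sample window of synthetic tri-axial
# accelerometer data (X, Y, Z). The real dataset derives hundreds of
# such statistics per window; `window` and its shape are assumptions.
rng = np.random.default_rng(0)
window = rng.normal(0.0, 0.3, size=(128, 3))

features = {}
for axis, name in enumerate(["X", "Y", "Z"]):
    sig = window[:, axis]
    features[f"tBodyAcc-mean()-{name}"] = sig.mean()
    features[f"tBodyAcc-std()-{name}"] = sig.std()
    # mad = median absolute deviation, following the dataset's naming scheme
    features[f"tBodyAcc-mad()-{name}"] = np.median(np.abs(sig - np.median(sig)))

print(len(features))  # 9: three statistics per axis for one signal block
```

The frequency-domain (`fBodyAcc-...`) features follow the same pattern, computed on the Fourier transform of each windowed signal.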
# access the training data
data_table = gis.content.get('1fafacc88bc3491696f981758a72de50')
data_table
# Download the train data and save it to a local folder
data_path = data_table.get_data()
# Read the downloaded data
train_har_data = pd.read_csv(data_path)
train_har_data.head(5)
| tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.271144 | -0.033031 | -0.121829 | -0.987884 | -0.867081 | -0.945087 | -0.991858 | -0.906651 | -0.943951 | -0.909793 | ... | -0.685932 | 0.007629 | 0.068842 | -0.762768 | -0.751408 | -0.786647 | 0.234417 | -0.040345 | 22 | STANDING |
| 1 | 0.278211 | -0.020855 | -0.103400 | -0.996593 | -0.980402 | -0.988998 | -0.997065 | -0.977596 | -0.988861 | -0.941245 | ... | -0.771901 | 0.001727 | 0.295551 | -0.035877 | -0.360496 | -0.661464 | 0.221240 | 0.223323 | 17 | STANDING |
| 2 | 0.276012 | -0.015713 | -0.103117 | -0.982340 | -0.834824 | -0.973649 | -0.986465 | -0.862017 | -0.976193 | -0.914101 | ... | -0.206414 | 0.127391 | 0.028581 | 0.053358 | 0.637500 | -0.826721 | 0.212775 | -0.018280 | 25 | STANDING |
| 3 | 0.272753 | -0.016910 | -0.101737 | -0.997409 | -0.996203 | -0.983416 | -0.997425 | -0.996439 | -0.984400 | -0.944557 | ... | -0.894735 | 0.096469 | 0.319319 | 0.229398 | 0.267721 | -0.672092 | 0.195500 | -0.194527 | 16 | SITTING |
| 4 | 0.275565 | -0.014967 | -0.107715 | -0.995365 | -0.988601 | -0.988218 | -0.995880 | -0.989519 | -0.986747 | -0.937645 | ... | -0.888258 | 0.152336 | 0.217308 | -0.377648 | 0.733588 | -0.749599 | 0.048129 | -0.156654 | 11 | SITTING |
5 rows × 563 columns
train_har_data.shape
(1020, 563)
Next, we will access the test dataset, which is significantly larger, containing 6,332 samples.
# access the test data
test_data_table = gis.content.get('e65312babe5b4efbaa2842235b79f653')
test_data_table
# Download the test data and save it to a local folder
test_data_path = test_data_table.get_data()
# Read the test data
test_har_data = pd.read_csv(test_data_path).drop(["Unnamed: 0"], axis=1)
test_har_data.head(5)
| tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.288585 | -0.020294 | -0.132905 | -0.995279 | -0.983111 | -0.913526 | -0.995112 | -0.983185 | -0.923527 | -0.934724 | ... | -0.710304 | -0.112754 | 0.030400 | -0.464761 | -0.018446 | -0.841247 | 0.179941 | -0.058627 | 1 | STANDING |
| 1 | 0.278419 | -0.016411 | -0.123520 | -0.998245 | -0.975300 | -0.960322 | -0.998807 | -0.974914 | -0.957686 | -0.943068 | ... | -0.861499 | 0.053477 | -0.007435 | -0.732626 | 0.703511 | -0.844788 | 0.180289 | -0.054317 | 1 | STANDING |
| 2 | 0.276629 | -0.016570 | -0.115362 | -0.998139 | -0.980817 | -0.990482 | -0.998321 | -0.979672 | -0.990441 | -0.942469 | ... | -0.699205 | 0.123320 | 0.122542 | 0.693578 | -0.615971 | -0.847865 | 0.185151 | -0.043892 | 1 | STANDING |
| 3 | 0.277293 | -0.021751 | -0.120751 | -0.997328 | -0.961245 | -0.983672 | -0.997596 | -0.957236 | -0.984379 | -0.940598 | ... | -0.572995 | 0.012954 | 0.080936 | -0.234313 | 0.117797 | -0.847971 | 0.188982 | -0.037364 | 1 | STANDING |
| 4 | 0.277175 | -0.014713 | -0.106756 | -0.999188 | -0.990526 | -0.993365 | -0.999211 | -0.990687 | -0.992168 | -0.943323 | ... | -0.765901 | 0.105620 | -0.090278 | -0.132403 | 0.498814 | -0.849773 | 0.188812 | -0.035063 | 1 | STANDING |
5 rows × 563 columns
test_har_data.shape
(6332, 563)
Prepare training data for TabPFN
# View column names except the columns - 'subject','Activity'
ls = list(train_har_data.columns)
X = [item for item in ls if item not in ['subject','Activity']]
X
['tBodyAcc-mean()-X', 'tBodyAcc-mean()-Y', 'tBodyAcc-mean()-Z', 'tBodyAcc-std()-X', 'tBodyAcc-std()-Y', 'tBodyAcc-std()-Z', 'tBodyAcc-mad()-X', 'tBodyAcc-mad()-Y', 'tBodyAcc-mad()-Z', 'tBodyAcc-max()-X', 'tBodyAcc-max()-Y', 'tBodyAcc-max()-Z', 'tBodyAcc-min()-X', 'tBodyAcc-min()-Y', 'tBodyAcc-min()-Z', 'tBodyAcc-sma()', 'tBodyAcc-energy()-X', 'tBodyAcc-energy()-Y', 'tBodyAcc-energy()-Z', 'tBodyAcc-iqr()-X', 'tBodyAcc-iqr()-Y', 'tBodyAcc-iqr()-Z', 'tBodyAcc-entropy()-X', 'tBodyAcc-entropy()-Y', 'tBodyAcc-entropy()-Z', 'tBodyAcc-arCoeff()-X,1', 'tBodyAcc-arCoeff()-X,2', 'tBodyAcc-arCoeff()-X,3', 'tBodyAcc-arCoeff()-X,4', 'tBodyAcc-arCoeff()-Y,1', 'tBodyAcc-arCoeff()-Y,2', 'tBodyAcc-arCoeff()-Y,3', 'tBodyAcc-arCoeff()-Y,4', 'tBodyAcc-arCoeff()-Z,1', 'tBodyAcc-arCoeff()-Z,2', 'tBodyAcc-arCoeff()-Z,3', 'tBodyAcc-arCoeff()-Z,4', 'tBodyAcc-correlation()-X,Y', 'tBodyAcc-correlation()-X,Z', 'tBodyAcc-correlation()-Y,Z', 'tGravityAcc-mean()-X', 'tGravityAcc-mean()-Y', 'tGravityAcc-mean()-Z', 'tGravityAcc-std()-X', 'tGravityAcc-std()-Y', 'tGravityAcc-std()-Z', 'tGravityAcc-mad()-X', 'tGravityAcc-mad()-Y', 'tGravityAcc-mad()-Z', 'tGravityAcc-max()-X', 'tGravityAcc-max()-Y', 'tGravityAcc-max()-Z', 'tGravityAcc-min()-X', 'tGravityAcc-min()-Y', 'tGravityAcc-min()-Z', 'tGravityAcc-sma()', 'tGravityAcc-energy()-X', 'tGravityAcc-energy()-Y', 'tGravityAcc-energy()-Z', 'tGravityAcc-iqr()-X', 'tGravityAcc-iqr()-Y', 'tGravityAcc-iqr()-Z', 'tGravityAcc-entropy()-X', 'tGravityAcc-entropy()-Y', 'tGravityAcc-entropy()-Z', 'tGravityAcc-arCoeff()-X,1', 'tGravityAcc-arCoeff()-X,2', 'tGravityAcc-arCoeff()-X,3', 'tGravityAcc-arCoeff()-X,4', 'tGravityAcc-arCoeff()-Y,1', 'tGravityAcc-arCoeff()-Y,2', 'tGravityAcc-arCoeff()-Y,3', 'tGravityAcc-arCoeff()-Y,4', 'tGravityAcc-arCoeff()-Z,1', 'tGravityAcc-arCoeff()-Z,2', 'tGravityAcc-arCoeff()-Z,3', 'tGravityAcc-arCoeff()-Z,4', 'tGravityAcc-correlation()-X,Y', 'tGravityAcc-correlation()-X,Z', 
'tGravityAcc-correlation()-Y,Z', 'tBodyAccJerk-mean()-X', 'tBodyAccJerk-mean()-Y', 'tBodyAccJerk-mean()-Z', 'tBodyAccJerk-std()-X', 'tBodyAccJerk-std()-Y', 'tBodyAccJerk-std()-Z', 'tBodyAccJerk-mad()-X', 'tBodyAccJerk-mad()-Y', 'tBodyAccJerk-mad()-Z', 'tBodyAccJerk-max()-X', 'tBodyAccJerk-max()-Y', 'tBodyAccJerk-max()-Z', 'tBodyAccJerk-min()-X', 'tBodyAccJerk-min()-Y', 'tBodyAccJerk-min()-Z', 'tBodyAccJerk-sma()', 'tBodyAccJerk-energy()-X', 'tBodyAccJerk-energy()-Y', 'tBodyAccJerk-energy()-Z', 'tBodyAccJerk-iqr()-X', 'tBodyAccJerk-iqr()-Y', 'tBodyAccJerk-iqr()-Z', 'tBodyAccJerk-entropy()-X', 'tBodyAccJerk-entropy()-Y', 'tBodyAccJerk-entropy()-Z', 'tBodyAccJerk-arCoeff()-X,1', 'tBodyAccJerk-arCoeff()-X,2', 'tBodyAccJerk-arCoeff()-X,3', 'tBodyAccJerk-arCoeff()-X,4', 'tBodyAccJerk-arCoeff()-Y,1', 'tBodyAccJerk-arCoeff()-Y,2', 'tBodyAccJerk-arCoeff()-Y,3', 'tBodyAccJerk-arCoeff()-Y,4', 'tBodyAccJerk-arCoeff()-Z,1', 'tBodyAccJerk-arCoeff()-Z,2', 'tBodyAccJerk-arCoeff()-Z,3', 'tBodyAccJerk-arCoeff()-Z,4', 'tBodyAccJerk-correlation()-X,Y', 'tBodyAccJerk-correlation()-X,Z', 'tBodyAccJerk-correlation()-Y,Z', 'tBodyGyro-mean()-X', 'tBodyGyro-mean()-Y', 'tBodyGyro-mean()-Z', 'tBodyGyro-std()-X', 'tBodyGyro-std()-Y', 'tBodyGyro-std()-Z', 'tBodyGyro-mad()-X', 'tBodyGyro-mad()-Y', 'tBodyGyro-mad()-Z', 'tBodyGyro-max()-X', 'tBodyGyro-max()-Y', 'tBodyGyro-max()-Z', 'tBodyGyro-min()-X', 'tBodyGyro-min()-Y', 'tBodyGyro-min()-Z', 'tBodyGyro-sma()', 'tBodyGyro-energy()-X', 'tBodyGyro-energy()-Y', 'tBodyGyro-energy()-Z', 'tBodyGyro-iqr()-X', 'tBodyGyro-iqr()-Y', 'tBodyGyro-iqr()-Z', 'tBodyGyro-entropy()-X', 'tBodyGyro-entropy()-Y', 'tBodyGyro-entropy()-Z', 'tBodyGyro-arCoeff()-X,1', 'tBodyGyro-arCoeff()-X,2', 'tBodyGyro-arCoeff()-X,3', 'tBodyGyro-arCoeff()-X,4', 'tBodyGyro-arCoeff()-Y,1', 'tBodyGyro-arCoeff()-Y,2', 'tBodyGyro-arCoeff()-Y,3', 'tBodyGyro-arCoeff()-Y,4', 'tBodyGyro-arCoeff()-Z,1', 'tBodyGyro-arCoeff()-Z,2', 'tBodyGyro-arCoeff()-Z,3', 'tBodyGyro-arCoeff()-Z,4', 
'tBodyGyro-correlation()-X,Y', 'tBodyGyro-correlation()-X,Z', 'tBodyGyro-correlation()-Y,Z', 'tBodyGyroJerk-mean()-X', 'tBodyGyroJerk-mean()-Y', 'tBodyGyroJerk-mean()-Z', 'tBodyGyroJerk-std()-X', 'tBodyGyroJerk-std()-Y', 'tBodyGyroJerk-std()-Z', 'tBodyGyroJerk-mad()-X', 'tBodyGyroJerk-mad()-Y', 'tBodyGyroJerk-mad()-Z', 'tBodyGyroJerk-max()-X', 'tBodyGyroJerk-max()-Y', 'tBodyGyroJerk-max()-Z', 'tBodyGyroJerk-min()-X', 'tBodyGyroJerk-min()-Y', 'tBodyGyroJerk-min()-Z', 'tBodyGyroJerk-sma()', 'tBodyGyroJerk-energy()-X', 'tBodyGyroJerk-energy()-Y', 'tBodyGyroJerk-energy()-Z', 'tBodyGyroJerk-iqr()-X', 'tBodyGyroJerk-iqr()-Y', 'tBodyGyroJerk-iqr()-Z', 'tBodyGyroJerk-entropy()-X', 'tBodyGyroJerk-entropy()-Y', 'tBodyGyroJerk-entropy()-Z', 'tBodyGyroJerk-arCoeff()-X,1', 'tBodyGyroJerk-arCoeff()-X,2', 'tBodyGyroJerk-arCoeff()-X,3', 'tBodyGyroJerk-arCoeff()-X,4', 'tBodyGyroJerk-arCoeff()-Y,1', 'tBodyGyroJerk-arCoeff()-Y,2', 'tBodyGyroJerk-arCoeff()-Y,3', 'tBodyGyroJerk-arCoeff()-Y,4', 'tBodyGyroJerk-arCoeff()-Z,1', 'tBodyGyroJerk-arCoeff()-Z,2', 'tBodyGyroJerk-arCoeff()-Z,3', 'tBodyGyroJerk-arCoeff()-Z,4', 'tBodyGyroJerk-correlation()-X,Y', 'tBodyGyroJerk-correlation()-X,Z', 'tBodyGyroJerk-correlation()-Y,Z', 'tBodyAccMag-mean()', 'tBodyAccMag-std()', 'tBodyAccMag-mad()', 'tBodyAccMag-max()', 'tBodyAccMag-min()', 'tBodyAccMag-sma()', 'tBodyAccMag-energy()', 'tBodyAccMag-iqr()', 'tBodyAccMag-entropy()', 'tBodyAccMag-arCoeff()1', 'tBodyAccMag-arCoeff()2', 'tBodyAccMag-arCoeff()3', 'tBodyAccMag-arCoeff()4', 'tGravityAccMag-mean()', 'tGravityAccMag-std()', 'tGravityAccMag-mad()', 'tGravityAccMag-max()', 'tGravityAccMag-min()', 'tGravityAccMag-sma()', 'tGravityAccMag-energy()', 'tGravityAccMag-iqr()', 'tGravityAccMag-entropy()', 'tGravityAccMag-arCoeff()1', 'tGravityAccMag-arCoeff()2', 'tGravityAccMag-arCoeff()3', 'tGravityAccMag-arCoeff()4', 'tBodyAccJerkMag-mean()', 'tBodyAccJerkMag-std()', 'tBodyAccJerkMag-mad()', 'tBodyAccJerkMag-max()', 'tBodyAccJerkMag-min()', 
'tBodyAccJerkMag-sma()', 'tBodyAccJerkMag-energy()', 'tBodyAccJerkMag-iqr()', 'tBodyAccJerkMag-entropy()', 'tBodyAccJerkMag-arCoeff()1', 'tBodyAccJerkMag-arCoeff()2', 'tBodyAccJerkMag-arCoeff()3', 'tBodyAccJerkMag-arCoeff()4', 'tBodyGyroMag-mean()', 'tBodyGyroMag-std()', 'tBodyGyroMag-mad()', 'tBodyGyroMag-max()', 'tBodyGyroMag-min()', 'tBodyGyroMag-sma()', 'tBodyGyroMag-energy()', 'tBodyGyroMag-iqr()', 'tBodyGyroMag-entropy()', 'tBodyGyroMag-arCoeff()1', 'tBodyGyroMag-arCoeff()2', 'tBodyGyroMag-arCoeff()3', 'tBodyGyroMag-arCoeff()4', 'tBodyGyroJerkMag-mean()', 'tBodyGyroJerkMag-std()', 'tBodyGyroJerkMag-mad()', 'tBodyGyroJerkMag-max()', 'tBodyGyroJerkMag-min()', 'tBodyGyroJerkMag-sma()', 'tBodyGyroJerkMag-energy()', 'tBodyGyroJerkMag-iqr()', 'tBodyGyroJerkMag-entropy()', 'tBodyGyroJerkMag-arCoeff()1', 'tBodyGyroJerkMag-arCoeff()2', 'tBodyGyroJerkMag-arCoeff()3', 'tBodyGyroJerkMag-arCoeff()4', 'fBodyAcc-mean()-X', 'fBodyAcc-mean()-Y', 'fBodyAcc-mean()-Z', 'fBodyAcc-std()-X', 'fBodyAcc-std()-Y', 'fBodyAcc-std()-Z', 'fBodyAcc-mad()-X', 'fBodyAcc-mad()-Y', 'fBodyAcc-mad()-Z', 'fBodyAcc-max()-X', 'fBodyAcc-max()-Y', 'fBodyAcc-max()-Z', 'fBodyAcc-min()-X', 'fBodyAcc-min()-Y', 'fBodyAcc-min()-Z', 'fBodyAcc-sma()', 'fBodyAcc-energy()-X', 'fBodyAcc-energy()-Y', 'fBodyAcc-energy()-Z', 'fBodyAcc-iqr()-X', 'fBodyAcc-iqr()-Y', 'fBodyAcc-iqr()-Z', 'fBodyAcc-entropy()-X', 'fBodyAcc-entropy()-Y', 'fBodyAcc-entropy()-Z', 'fBodyAcc-maxInds-X', 'fBodyAcc-maxInds-Y', 'fBodyAcc-maxInds-Z', 'fBodyAcc-meanFreq()-X', 'fBodyAcc-meanFreq()-Y', 'fBodyAcc-meanFreq()-Z', 'fBodyAcc-skewness()-X', 'fBodyAcc-kurtosis()-X', 'fBodyAcc-skewness()-Y', 'fBodyAcc-kurtosis()-Y', 'fBodyAcc-skewness()-Z', 'fBodyAcc-kurtosis()-Z', 'fBodyAcc-bandsEnergy()-1,8', 'fBodyAcc-bandsEnergy()-9,16', 'fBodyAcc-bandsEnergy()-17,24', 'fBodyAcc-bandsEnergy()-25,32', 'fBodyAcc-bandsEnergy()-33,40', 'fBodyAcc-bandsEnergy()-41,48', 'fBodyAcc-bandsEnergy()-49,56', 'fBodyAcc-bandsEnergy()-57,64', 
'fBodyAcc-bandsEnergy()-1,16', 'fBodyAcc-bandsEnergy()-17,32', 'fBodyAcc-bandsEnergy()-33,48', 'fBodyAcc-bandsEnergy()-49,64', 'fBodyAcc-bandsEnergy()-1,24', 'fBodyAcc-bandsEnergy()-25,48', 'fBodyAcc-bandsEnergy()-1,8.1', 'fBodyAcc-bandsEnergy()-9,16.1', 'fBodyAcc-bandsEnergy()-17,24.1', 'fBodyAcc-bandsEnergy()-25,32.1', 'fBodyAcc-bandsEnergy()-33,40.1', 'fBodyAcc-bandsEnergy()-41,48.1', 'fBodyAcc-bandsEnergy()-49,56.1', 'fBodyAcc-bandsEnergy()-57,64.1', 'fBodyAcc-bandsEnergy()-1,16.1', 'fBodyAcc-bandsEnergy()-17,32.1', 'fBodyAcc-bandsEnergy()-33,48.1', 'fBodyAcc-bandsEnergy()-49,64.1', 'fBodyAcc-bandsEnergy()-1,24.1', 'fBodyAcc-bandsEnergy()-25,48.1', 'fBodyAcc-bandsEnergy()-1,8.2', 'fBodyAcc-bandsEnergy()-9,16.2', 'fBodyAcc-bandsEnergy()-17,24.2', 'fBodyAcc-bandsEnergy()-25,32.2', 'fBodyAcc-bandsEnergy()-33,40.2', 'fBodyAcc-bandsEnergy()-41,48.2', 'fBodyAcc-bandsEnergy()-49,56.2', 'fBodyAcc-bandsEnergy()-57,64.2', 'fBodyAcc-bandsEnergy()-1,16.2', 'fBodyAcc-bandsEnergy()-17,32.2', 'fBodyAcc-bandsEnergy()-33,48.2', 'fBodyAcc-bandsEnergy()-49,64.2', 'fBodyAcc-bandsEnergy()-1,24.2', 'fBodyAcc-bandsEnergy()-25,48.2', 'fBodyAccJerk-mean()-X', 'fBodyAccJerk-mean()-Y', 'fBodyAccJerk-mean()-Z', 'fBodyAccJerk-std()-X', 'fBodyAccJerk-std()-Y', 'fBodyAccJerk-std()-Z', 'fBodyAccJerk-mad()-X', 'fBodyAccJerk-mad()-Y', 'fBodyAccJerk-mad()-Z', 'fBodyAccJerk-max()-X', 'fBodyAccJerk-max()-Y', 'fBodyAccJerk-max()-Z', 'fBodyAccJerk-min()-X', 'fBodyAccJerk-min()-Y', 'fBodyAccJerk-min()-Z', 'fBodyAccJerk-sma()', 'fBodyAccJerk-energy()-X', 'fBodyAccJerk-energy()-Y', 'fBodyAccJerk-energy()-Z', 'fBodyAccJerk-iqr()-X', 'fBodyAccJerk-iqr()-Y', 'fBodyAccJerk-iqr()-Z', 'fBodyAccJerk-entropy()-X', 'fBodyAccJerk-entropy()-Y', 'fBodyAccJerk-entropy()-Z', 'fBodyAccJerk-maxInds-X', 'fBodyAccJerk-maxInds-Y', 'fBodyAccJerk-maxInds-Z', 'fBodyAccJerk-meanFreq()-X', 'fBodyAccJerk-meanFreq()-Y', 'fBodyAccJerk-meanFreq()-Z', 'fBodyAccJerk-skewness()-X', 'fBodyAccJerk-kurtosis()-X', 
'fBodyAccJerk-skewness()-Y', 'fBodyAccJerk-kurtosis()-Y', 'fBodyAccJerk-skewness()-Z', 'fBodyAccJerk-kurtosis()-Z', 'fBodyAccJerk-bandsEnergy()-1,8', 'fBodyAccJerk-bandsEnergy()-9,16', 'fBodyAccJerk-bandsEnergy()-17,24', 'fBodyAccJerk-bandsEnergy()-25,32', 'fBodyAccJerk-bandsEnergy()-33,40', 'fBodyAccJerk-bandsEnergy()-41,48', 'fBodyAccJerk-bandsEnergy()-49,56', 'fBodyAccJerk-bandsEnergy()-57,64', 'fBodyAccJerk-bandsEnergy()-1,16', 'fBodyAccJerk-bandsEnergy()-17,32', 'fBodyAccJerk-bandsEnergy()-33,48', 'fBodyAccJerk-bandsEnergy()-49,64', 'fBodyAccJerk-bandsEnergy()-1,24', 'fBodyAccJerk-bandsEnergy()-25,48', 'fBodyAccJerk-bandsEnergy()-1,8.1', 'fBodyAccJerk-bandsEnergy()-9,16.1', 'fBodyAccJerk-bandsEnergy()-17,24.1', 'fBodyAccJerk-bandsEnergy()-25,32.1', 'fBodyAccJerk-bandsEnergy()-33,40.1', 'fBodyAccJerk-bandsEnergy()-41,48.1', 'fBodyAccJerk-bandsEnergy()-49,56.1', 'fBodyAccJerk-bandsEnergy()-57,64.1', 'fBodyAccJerk-bandsEnergy()-1,16.1', 'fBodyAccJerk-bandsEnergy()-17,32.1', 'fBodyAccJerk-bandsEnergy()-33,48.1', 'fBodyAccJerk-bandsEnergy()-49,64.1', 'fBodyAccJerk-bandsEnergy()-1,24.1', 'fBodyAccJerk-bandsEnergy()-25,48.1', 'fBodyAccJerk-bandsEnergy()-1,8.2', 'fBodyAccJerk-bandsEnergy()-9,16.2', 'fBodyAccJerk-bandsEnergy()-17,24.2', 'fBodyAccJerk-bandsEnergy()-25,32.2', 'fBodyAccJerk-bandsEnergy()-33,40.2', 'fBodyAccJerk-bandsEnergy()-41,48.2', 'fBodyAccJerk-bandsEnergy()-49,56.2', 'fBodyAccJerk-bandsEnergy()-57,64.2', 'fBodyAccJerk-bandsEnergy()-1,16.2', 'fBodyAccJerk-bandsEnergy()-17,32.2', 'fBodyAccJerk-bandsEnergy()-33,48.2', 'fBodyAccJerk-bandsEnergy()-49,64.2', 'fBodyAccJerk-bandsEnergy()-1,24.2', 'fBodyAccJerk-bandsEnergy()-25,48.2', 'fBodyGyro-mean()-X', 'fBodyGyro-mean()-Y', 'fBodyGyro-mean()-Z', 'fBodyGyro-std()-X', 'fBodyGyro-std()-Y', 'fBodyGyro-std()-Z', 'fBodyGyro-mad()-X', 'fBodyGyro-mad()-Y', 'fBodyGyro-mad()-Z', 'fBodyGyro-max()-X', 'fBodyGyro-max()-Y', 'fBodyGyro-max()-Z', 'fBodyGyro-min()-X', 'fBodyGyro-min()-Y', 'fBodyGyro-min()-Z', 
'fBodyGyro-sma()', 'fBodyGyro-energy()-X', 'fBodyGyro-energy()-Y', 'fBodyGyro-energy()-Z', 'fBodyGyro-iqr()-X', 'fBodyGyro-iqr()-Y', 'fBodyGyro-iqr()-Z', 'fBodyGyro-entropy()-X', 'fBodyGyro-entropy()-Y', 'fBodyGyro-entropy()-Z', 'fBodyGyro-maxInds-X', 'fBodyGyro-maxInds-Y', 'fBodyGyro-maxInds-Z', 'fBodyGyro-meanFreq()-X', 'fBodyGyro-meanFreq()-Y', 'fBodyGyro-meanFreq()-Z', 'fBodyGyro-skewness()-X', 'fBodyGyro-kurtosis()-X', 'fBodyGyro-skewness()-Y', 'fBodyGyro-kurtosis()-Y', 'fBodyGyro-skewness()-Z', 'fBodyGyro-kurtosis()-Z', 'fBodyGyro-bandsEnergy()-1,8', 'fBodyGyro-bandsEnergy()-9,16', 'fBodyGyro-bandsEnergy()-17,24', 'fBodyGyro-bandsEnergy()-25,32', 'fBodyGyro-bandsEnergy()-33,40', 'fBodyGyro-bandsEnergy()-41,48', 'fBodyGyro-bandsEnergy()-49,56', 'fBodyGyro-bandsEnergy()-57,64', 'fBodyGyro-bandsEnergy()-1,16', 'fBodyGyro-bandsEnergy()-17,32', 'fBodyGyro-bandsEnergy()-33,48', 'fBodyGyro-bandsEnergy()-49,64', 'fBodyGyro-bandsEnergy()-1,24', 'fBodyGyro-bandsEnergy()-25,48', 'fBodyGyro-bandsEnergy()-1,8.1', 'fBodyGyro-bandsEnergy()-9,16.1', 'fBodyGyro-bandsEnergy()-17,24.1', 'fBodyGyro-bandsEnergy()-25,32.1', 'fBodyGyro-bandsEnergy()-33,40.1', 'fBodyGyro-bandsEnergy()-41,48.1', 'fBodyGyro-bandsEnergy()-49,56.1', 'fBodyGyro-bandsEnergy()-57,64.1', 'fBodyGyro-bandsEnergy()-1,16.1', 'fBodyGyro-bandsEnergy()-17,32.1', 'fBodyGyro-bandsEnergy()-33,48.1', 'fBodyGyro-bandsEnergy()-49,64.1', 'fBodyGyro-bandsEnergy()-1,24.1', 'fBodyGyro-bandsEnergy()-25,48.1', 'fBodyGyro-bandsEnergy()-1,8.2', 'fBodyGyro-bandsEnergy()-9,16.2', 'fBodyGyro-bandsEnergy()-17,24.2', 'fBodyGyro-bandsEnergy()-25,32.2', 'fBodyGyro-bandsEnergy()-33,40.2', 'fBodyGyro-bandsEnergy()-41,48.2', 'fBodyGyro-bandsEnergy()-49,56.2', 'fBodyGyro-bandsEnergy()-57,64.2', 'fBodyGyro-bandsEnergy()-1,16.2', 'fBodyGyro-bandsEnergy()-17,32.2', 'fBodyGyro-bandsEnergy()-33,48.2', 'fBodyGyro-bandsEnergy()-49,64.2', 'fBodyGyro-bandsEnergy()-1,24.2', 'fBodyGyro-bandsEnergy()-25,48.2', 'fBodyAccMag-mean()', 
'fBodyAccMag-std()', 'fBodyAccMag-mad()', 'fBodyAccMag-max()', 'fBodyAccMag-min()', 'fBodyAccMag-sma()', 'fBodyAccMag-energy()', 'fBodyAccMag-iqr()', 'fBodyAccMag-entropy()', 'fBodyAccMag-maxInds', 'fBodyAccMag-meanFreq()', 'fBodyAccMag-skewness()', 'fBodyAccMag-kurtosis()', 'fBodyBodyAccJerkMag-mean()', 'fBodyBodyAccJerkMag-std()', 'fBodyBodyAccJerkMag-mad()', 'fBodyBodyAccJerkMag-max()', 'fBodyBodyAccJerkMag-min()', 'fBodyBodyAccJerkMag-sma()', 'fBodyBodyAccJerkMag-energy()', 'fBodyBodyAccJerkMag-iqr()', 'fBodyBodyAccJerkMag-entropy()', 'fBodyBodyAccJerkMag-maxInds', 'fBodyBodyAccJerkMag-meanFreq()', 'fBodyBodyAccJerkMag-skewness()', 'fBodyBodyAccJerkMag-kurtosis()', 'fBodyBodyGyroMag-mean()', 'fBodyBodyGyroMag-std()', 'fBodyBodyGyroMag-mad()', 'fBodyBodyGyroMag-max()', 'fBodyBodyGyroMag-min()', 'fBodyBodyGyroMag-sma()', 'fBodyBodyGyroMag-energy()', 'fBodyBodyGyroMag-iqr()', 'fBodyBodyGyroMag-entropy()', 'fBodyBodyGyroMag-maxInds', 'fBodyBodyGyroMag-meanFreq()', 'fBodyBodyGyroMag-skewness()', 'fBodyBodyGyroMag-kurtosis()', 'fBodyBodyGyroJerkMag-mean()', 'fBodyBodyGyroJerkMag-std()', 'fBodyBodyGyroJerkMag-mad()', 'fBodyBodyGyroJerkMag-max()', 'fBodyBodyGyroJerkMag-min()', 'fBodyBodyGyroJerkMag-sma()', 'fBodyBodyGyroJerkMag-energy()', 'fBodyBodyGyroJerkMag-iqr()', 'fBodyBodyGyroJerkMag-entropy()', 'fBodyBodyGyroJerkMag-maxInds', 'fBodyBodyGyroJerkMag-meanFreq()', 'fBodyBodyGyroJerkMag-skewness()', 'fBodyBodyGyroJerkMag-kurtosis()', 'angle(tBodyAccMean,gravity)', 'angle(tBodyAccJerkMean),gravityMean)', 'angle(tBodyGyroMean,gravityMean)', 'angle(tBodyGyroJerkMean,gravityMean)', 'angle(X,gravityMean)', 'angle(Y,gravityMean)', 'angle(Z,gravityMean)']
len(X)
561
Data preprocessing for TabPFN classifier model
To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions.
# Data processing to reduce the features to 100 or less as required for TabPFN models
X = train_har_data.drop(columns=['Activity'])
y = train_har_data['Activity']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))
X_reduced_lda = lda.fit_transform(X_scaled, y)
X_train_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])
X_train_lda_df['Activity'] = y.reset_index(drop=True)
X_train_lda_df.shape
(1020, 6)
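The reduced frame has six columns, the five LDA components plus the Activity label, because scikit-learn caps LDA's n_components at min(n_features, n_classes − 1); with six activity classes, min(100, 6 − 1) resolves to 5. A minimal sketch with synthetic data (shapes and names here are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in: 60 samples, 20 features, 6 classes (as in HAR)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 20))
y_demo = np.repeat(np.arange(6), 10)

n_classes = len(set(y_demo))
lda_demo = LinearDiscriminantAnalysis(n_components=min(100, n_classes - 1))
X_demo_reduced = lda_demo.fit_transform(X_demo, y_demo)
print(X_demo_reduced.shape)  # (60, 5): at most n_classes - 1 components
```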
# Visualize the final processed training data columns
X_train_lda_df.columns
Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')
In the training dataframe above, Activity is the target label to be predicted, and the remaining features serve as the explanatory variables. We define the explanatory variables as follows:
# define the explanatory variables
X = list(X_train_lda_df.columns)
X = X[:-1]
len(X)
5
Once the explanatory variables X are defined, they are used as input in the prepare_tabulardata method from the tabular learner in arcgis.learn. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model.
The input parameters required for the tool are used as follows:
data = prepare_tabulardata(X_train_lda_df, 'Activity', explanatory_variables=X)
Visualize training data
To get a sense of what the training data looks like, the show_batch() method will randomly pick a few training samples and visualize them. The samples show the explanatory variables and the Activity target label to predict.
data.show_batch(rows=5)
| Activity | LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | |
|---|---|---|---|---|---|---|
| 28 | STANDING | -19.431367 | -11.174415 | 0.816325 | -0.687153 | 2.809134 |
| 130 | SITTING | -18.377005 | -9.429310 | 0.342490 | -0.757008 | -4.074501 |
| 311 | WALKING | 17.852727 | -1.390446 | -8.164604 | 4.176480 | -0.160302 |
| 734 | WALKING_UPSTAIRS | 23.633640 | 2.301121 | 2.475455 | -10.447339 | -0.302447 |
| 847 | LAYING | -26.620913 | 14.783496 | -0.705847 | 0.511383 | -0.568820 |
Model training
First, we initialize the model as follows:
Model initialization
The default initialization of the TabPFN classifier model object is shown below:
tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier', device='cpu', N_ensemble_configurations=32)
Fit the model
Next, we will train the model.
tabpfn_classifier.fit()
tabpfn_classifier.score()
0.9901960784313726
The model score of roughly 0.99 on the validation set indicates excellent results.
Visualize results in validation set
It is good practice to compare the model's results with the ground truth. The code below picks random samples and shows the ground truth Activity and the model's predicted Activity_results side by side, allowing us to preview the results of the trained model.
tabpfn_classifier.show_results()
| Activity | LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity_results | |
|---|---|---|---|---|---|---|---|
| 101 | LAYING | -27.925746 | 17.698939 | -0.211993 | 0.137931 | -0.872844 | LAYING |
| 299 | WALKING | 22.230659 | 0.572533 | -8.318169 | 1.275016 | 0.139479 | WALKING |
| 693 | SITTING | -18.977440 | -10.123049 | 0.551960 | -0.028800 | -3.010251 | SITTING |
| 884 | WALKING | 23.685543 | -0.261172 | -11.338481 | 3.464041 | -0.628183 | WALKING |
| 967 | SITTING | -18.285170 | -11.017798 | 1.873937 | -0.702550 | -1.983438 | SITTING |
Predict using the TabPFN classifier model
Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples.
Before using the trained TabPFN model to predict the classes of the test dataset, we will first apply Linear Discriminant Analysis (LDA) to reduce the test data to the same feature space as the training data. This ensures consistency between the training and test datasets, enabling the trained TabPFN model to effectively classify the larger test sample.
# Align test data with the train data format
X = test_har_data.drop(columns=['Activity'])
y = test_har_data['Activity']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))
X_reduced_lda = lda.fit_transform(X_scaled, y)
X_test_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])
X_test_lda_df['Activity'] = y.reset_index(drop=True)
print(X_test_lda_df.shape)
(6332, 6)
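Note that the scaler and LDA are re-fitted on the test set here, mirroring the workflow above. In a deployment setting, where test labels are unavailable, one would typically reuse the transformers fitted on the training data, so that test samples are projected into the training feature space. A sketch of that pattern with synthetic data (all names and shapes here are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for train/test splits: 30 features, 6 classes
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(120, 30)), np.repeat(np.arange(6), 20)
X_te = rng.normal(size=(40, 30))

# Fit both transformers on the training data only
scaler = StandardScaler().fit(X_tr)
lda = LinearDiscriminantAnalysis(n_components=5).fit(scaler.transform(X_tr), y_tr)

# transform (not fit_transform) keeps the test set in the training space
X_te_reduced = lda.transform(scaler.transform(X_te))
print(X_te_reduced.shape)  # (40, 5)
```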
X_test_lda_df.head(5)
| LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity | |
|---|---|---|---|---|---|---|
| 0 | -10.188443 | -8.641377 | 0.606669 | 1.120983 | 3.836137 | STANDING |
| 1 | -9.735631 | -6.716675 | 0.537841 | -0.543385 | 2.295157 | STANDING |
| 2 | -8.954351 | -7.376296 | 0.798942 | -0.507465 | 2.508069 | STANDING |
| 3 | -10.400401 | -7.267321 | 1.035134 | 0.272738 | 2.034312 | STANDING |
| 4 | -9.596161 | -6.980061 | 0.480017 | -0.284537 | 1.103180 | STANDING |
Predict
activity_predicted_tabpfn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')
activity_predicted_tabpfn.tail(5)
| LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity | prediction_results | |
|---|---|---|---|---|---|---|---|
| 6327 | 18.095783 | 2.593775 | 8.704337 | 5.424257 | 0.443448 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6328 | 17.094286 | 1.997284 | 5.270752 | 2.839847 | 0.550822 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6329 | 15.909594 | 1.537803 | 4.887237 | 4.771153 | -0.157321 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6330 | 11.944985 | 0.834660 | 0.116338 | -6.285236 | 0.045984 | WALKING_UPSTAIRS | WALKING_UPSTAIRS |
| 6331 | 14.575570 | 1.737412 | -0.866397 | -5.458011 | -1.150021 | WALKING_UPSTAIRS | WALKING_UPSTAIRS |
Accuracy assessment
Next, we will evaluate the model's performance. This will print out multiple model metrics that we can use to assess the model quality. These metrics include a combination of multiple evaluation criteria, such as accuracy, precision, recall and F1-Score, which collectively measure the model's performance on the validation set.
# Extract ground truth and predictions
y_true = activity_predicted_tabpfn['Activity']
y_pred = activity_predicted_tabpfn['prediction_results']
# Calculate Accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
# Calculate Precision
precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'Precision: {precision:.2f}')
# Calculate Recall
recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'Recall: {recall:.2f}')
# Calculate F1-Score
f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'F1 Score: {f1:.2f}')
# classification_report
print("\nClassification Report:")
print(classification_report(y_true, y_pred))
Accuracy: 96.83%
Precision: 0.97
Recall: 0.97
F1 Score: 0.97
Classification Report:
precision recall f1-score support
LAYING 1.00 1.00 1.00 1219
SITTING 1.00 0.83 0.91 1119
STANDING 0.86 0.99 0.93 1197
WALKING 1.00 1.00 1.00 1031
WALKING_DOWNSTAIRS 1.00 0.99 1.00 835
WALKING_UPSTAIRS 0.99 1.00 1.00 931
accuracy 0.97 6332
macro avg 0.97 0.97 0.97 6332
weighted avg 0.97 0.97 0.97 6332
The performance metrics obtained from the trained TabPFN model on the test dataset of 6,332 samples indicate excellent classification quality.
Accuracy (96.83%): The model correctly classified approximately 97% of the samples, a strong indication of its ability to generalize to unseen data, despite being trained on a smaller dataset of just 1,020 samples.
Precision (0.97): Precision measures the proportion of true positive predictions among all positive predictions made by the model. A precision of 0.97 means that 97% of the predicted activity classes are correct, indicating that the model rarely makes false positive errors.
Recall (0.97): Recall represents the model's ability to correctly identify all relevant instances of a class. A recall of 0.97 means that the model correctly identifies 97% of all actual positive instances, with minimal false negatives.
F1 Score (0.97): The F1 Score is the harmonic mean of precision and recall; a value of 0.97 shows that the model balances the two very well, indicating that it is both accurate and sensitive in detecting the correct activity classes.
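As a quick arithmetic check, the F1 score can be reproduced as the harmonic mean of the reported precision and recall:

```python
# F1 is the harmonic mean of precision and recall
precision, recall = 0.97, 0.97
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.97
```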
Overall, these metrics demonstrate that the TabPFN model performs exceptionally well, achieving near-perfect classification with minimal errors. This performance is particularly impressive given that it was trained on a relatively small sample size of 1,020 data points, highlighting its efficiency and effectiveness in handling human activity recognition tasks.
Conclusion
This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.83%, and precision, recall, and F1 scores all reaching 0.97. The TabPFN model's speed, simplicity, and strong performance in classifying human activities highlight its potential for applications in healthcare, fitness, smart cities, and disaster relief operations, offering an efficient and scalable solution for HAR systems.
TabPFN license information
| License Description |
|---|
| Built with TabPFN - tabpfn.TabPFNClassifier https://priorlabs.ai/tabpfn-license |