Leveraging TabPFN for Human Activity Recognition Using Mobile Dataset
Introduction
Human Activity Recognition (HAR) using mobile data has become an important area of research and application due to the increasing ubiquity of smartphones, wearables, and other mobile devices that can collect a wealth of sensor data. HAR is a crucial task in various fields, including healthcare, fitness, workplace safety, and smart cities, where the goal is to classify human activities (e.g., walking, running, sitting) based on sensor data. Traditional methods for HAR often require substantial computational resources and complex hyperparameter tuning, making them difficult to deploy in real-time applications. TabPFN (Tabular Prior-Data Fitted Network), a Transformer-based model designed for fast and efficient classification of small tabular datasets, offers a promising solution to overcome these challenges.
TabPFN’s advantages are particularly well suited to a range of HAR use cases. In healthcare, it can support fall detection for the elderly and chronic disease monitoring, enabling timely interventions. For fitness and wellness, it can classify activities such as walking or running in real time, enhancing the user experience in mobile apps and wearable devices. It can improve workplace safety by identifying risky worker activities in hazardous industrial environments, such as mines and oil rigs, helping to reduce accidents. In smart cities and urban mobility, HAR data from pedestrians and commuters can be classified efficiently to optimize traffic flow, public transport systems, and urban planning initiatives. HAR can also support emergency response efforts during disasters by helping locate people in need of help. TabPFN's speed, simplicity, and effectiveness make it a strong choice for these real-time HAR applications.
Necessary imports
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from arcgis.gis import GIS
from arcgis.learn import MLModel, prepare_tabulardata
Connect to ArcGIS
gis = GIS("home")
Access the datasets
Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks.
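To make the feature naming concrete, the sketch below derives a few illustrative time-domain statistics (mean, standard deviation, and median absolute deviation) from one window of synthetic tri-axial accelerometer samples. The window length, signal values, and variable names here are assumptions for illustration only, not the dataset's actual extraction pipeline.

```python
import numpy as np

# Illustrative only: one 128-sample window of synthetic tri-axial
# accelerometer data (X, Y, Z). The real dataset derives hundreds of
# such statistics per window; `window` and its shape are assumptions.
rng = np.random.default_rng(0)
window = rng.normal(0.0, 0.3, size=(128, 3))

features = {}
for axis, name in enumerate(["X", "Y", "Z"]):
    sig = window[:, axis]
    features[f"tBodyAcc-mean()-{name}"] = sig.mean()
    features[f"tBodyAcc-std()-{name}"] = sig.std()
    # mad = median absolute deviation, following the dataset's naming scheme
    features[f"tBodyAcc-mad()-{name}"] = np.median(np.abs(sig - np.median(sig)))

print(len(features))  # 9: three statistics per axis for one signal block
```

The frequency-domain (`fBodyAcc-...`) features follow the same pattern, computed on the Fourier transform of each windowed signal.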
# access the training data
data_table = gis.content.get('1fafacc88bc3491696f981758a72de50')
data_table
# Download the train data and save it to a local folder
data_path = data_table.get_data()
# Read the downloaded data
train_har_data = pd.read_csv(data_path)
train_har_data.head(5)
| tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.271144 | -0.033031 | -0.121829 | -0.987884 | -0.867081 | -0.945087 | -0.991858 | -0.906651 | -0.943951 | -0.909793 | ... | -0.685932 | 0.007629 | 0.068842 | -0.762768 | -0.751408 | -0.786647 | 0.234417 | -0.040345 | 22 | STANDING |
| 1 | 0.278211 | -0.020855 | -0.103400 | -0.996593 | -0.980402 | -0.988998 | -0.997065 | -0.977596 | -0.988861 | -0.941245 | ... | -0.771901 | 0.001727 | 0.295551 | -0.035877 | -0.360496 | -0.661464 | 0.221240 | 0.223323 | 17 | STANDING |
| 2 | 0.276012 | -0.015713 | -0.103117 | -0.982340 | -0.834824 | -0.973649 | -0.986465 | -0.862017 | -0.976193 | -0.914101 | ... | -0.206414 | 0.127391 | 0.028581 | 0.053358 | 0.637500 | -0.826721 | 0.212775 | -0.018280 | 25 | STANDING |
| 3 | 0.272753 | -0.016910 | -0.101737 | -0.997409 | -0.996203 | -0.983416 | -0.997425 | -0.996439 | -0.984400 | -0.944557 | ... | -0.894735 | 0.096469 | 0.319319 | 0.229398 | 0.267721 | -0.672092 | 0.195500 | -0.194527 | 16 | SITTING |
| 4 | 0.275565 | -0.014967 | -0.107715 | -0.995365 | -0.988601 | -0.988218 | -0.995880 | -0.989519 | -0.986747 | -0.937645 | ... | -0.888258 | 0.152336 | 0.217308 | -0.377648 | 0.733588 | -0.749599 | 0.048129 | -0.156654 | 11 | SITTING |
5 rows × 563 columns
train_har_data.shape
(1020, 563)
Next, we will access the test dataset, which is significantly larger, containing 6,332 samples.
# access the test data
test_data_table = gis.content.get('e65312babe5b4efbaa2842235b79f653')
test_data_table
# Download the test data and save it to a local folder
test_data_path = test_data_table.get_data()
# Read the test data
test_har_data = pd.read_csv(test_data_path).drop(["Unnamed: 0"], axis=1)
test_har_data.head(5)
| tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.288585 | -0.020294 | -0.132905 | -0.995279 | -0.983111 | -0.913526 | -0.995112 | -0.983185 | -0.923527 | -0.934724 | ... | -0.710304 | -0.112754 | 0.030400 | -0.464761 | -0.018446 | -0.841247 | 0.179941 | -0.058627 | 1 | STANDING |
| 1 | 0.278419 | -0.016411 | -0.123520 | -0.998245 | -0.975300 | -0.960322 | -0.998807 | -0.974914 | -0.957686 | -0.943068 | ... | -0.861499 | 0.053477 | -0.007435 | -0.732626 | 0.703511 | -0.844788 | 0.180289 | -0.054317 | 1 | STANDING |
| 2 | 0.276629 | -0.016570 | -0.115362 | -0.998139 | -0.980817 | -0.990482 | -0.998321 | -0.979672 | -0.990441 | -0.942469 | ... | -0.699205 | 0.123320 | 0.122542 | 0.693578 | -0.615971 | -0.847865 | 0.185151 | -0.043892 | 1 | STANDING |
| 3 | 0.277293 | -0.021751 | -0.120751 | -0.997328 | -0.961245 | -0.983672 | -0.997596 | -0.957236 | -0.984379 | -0.940598 | ... | -0.572995 | 0.012954 | 0.080936 | -0.234313 | 0.117797 | -0.847971 | 0.188982 | -0.037364 | 1 | STANDING |
| 4 | 0.277175 | -0.014713 | -0.106756 | -0.999188 | -0.990526 | -0.993365 | -0.999211 | -0.990687 | -0.992168 | -0.943323 | ... | -0.765901 | 0.105620 | -0.090278 | -0.132403 | 0.498814 | -0.849773 | 0.188812 | -0.035063 | 1 | STANDING |
5 rows × 563 columns
test_har_data.shape
(6332, 563)
Prepare training data for TabPFN
# View column names except the columns - 'subject','Activity'
ls = list(train_har_data.columns)
X = [item for item in ls if item not in ['subject','Activity']]
X
['tBodyAcc-mean()-X', 'tBodyAcc-mean()-Y', 'tBodyAcc-mean()-Z', 'tBodyAcc-std()-X', 'tBodyAcc-std()-Y', 'tBodyAcc-std()-Z', 'tBodyAcc-mad()-X', 'tBodyAcc-mad()-Y', 'tBodyAcc-mad()-Z', 'tBodyAcc-max()-X', 'tBodyAcc-max()-Y', 'tBodyAcc-max()-Z', 'tBodyAcc-min()-X', 'tBodyAcc-min()-Y', 'tBodyAcc-min()-Z', 'tBodyAcc-sma()', 'tBodyAcc-energy()-X', 'tBodyAcc-energy()-Y', 'tBodyAcc-energy()-Z', 'tBodyAcc-iqr()-X', 'tBodyAcc-iqr()-Y', 'tBodyAcc-iqr()-Z', 'tBodyAcc-entropy()-X', 'tBodyAcc-entropy()-Y', 'tBodyAcc-entropy()-Z', 'tBodyAcc-arCoeff()-X,1', 'tBodyAcc-arCoeff()-X,2', 'tBodyAcc-arCoeff()-X,3', 'tBodyAcc-arCoeff()-X,4', 'tBodyAcc-arCoeff()-Y,1', 'tBodyAcc-arCoeff()-Y,2', 'tBodyAcc-arCoeff()-Y,3', 'tBodyAcc-arCoeff()-Y,4', 'tBodyAcc-arCoeff()-Z,1', 'tBodyAcc-arCoeff()-Z,2', 'tBodyAcc-arCoeff()-Z,3', 'tBodyAcc-arCoeff()-Z,4', 'tBodyAcc-correlation()-X,Y', 'tBodyAcc-correlation()-X,Z', 'tBodyAcc-correlation()-Y,Z', 'tGravityAcc-mean()-X', 'tGravityAcc-mean()-Y', 'tGravityAcc-mean()-Z', 'tGravityAcc-std()-X', 'tGravityAcc-std()-Y', 'tGravityAcc-std()-Z', 'tGravityAcc-mad()-X', 'tGravityAcc-mad()-Y', 'tGravityAcc-mad()-Z', 'tGravityAcc-max()-X', 'tGravityAcc-max()-Y', 'tGravityAcc-max()-Z', 'tGravityAcc-min()-X', 'tGravityAcc-min()-Y', 'tGravityAcc-min()-Z', 'tGravityAcc-sma()', 'tGravityAcc-energy()-X', 'tGravityAcc-energy()-Y', 'tGravityAcc-energy()-Z', 'tGravityAcc-iqr()-X', 'tGravityAcc-iqr()-Y', 'tGravityAcc-iqr()-Z', 'tGravityAcc-entropy()-X', 'tGravityAcc-entropy()-Y', 'tGravityAcc-entropy()-Z', 'tGravityAcc-arCoeff()-X,1', 'tGravityAcc-arCoeff()-X,2', 'tGravityAcc-arCoeff()-X,3', 'tGravityAcc-arCoeff()-X,4', 'tGravityAcc-arCoeff()-Y,1', 'tGravityAcc-arCoeff()-Y,2', 'tGravityAcc-arCoeff()-Y,3', 'tGravityAcc-arCoeff()-Y,4', 'tGravityAcc-arCoeff()-Z,1', 'tGravityAcc-arCoeff()-Z,2', 'tGravityAcc-arCoeff()-Z,3', 'tGravityAcc-arCoeff()-Z,4', 'tGravityAcc-correlation()-X,Y', 'tGravityAcc-correlation()-X,Z', 
'tGravityAcc-correlation()-Y,Z', 'tBodyAccJerk-mean()-X', 'tBodyAccJerk-mean()-Y', 'tBodyAccJerk-mean()-Z', 'tBodyAccJerk-std()-X', 'tBodyAccJerk-std()-Y', 'tBodyAccJerk-std()-Z', 'tBodyAccJerk-mad()-X', 'tBodyAccJerk-mad()-Y', 'tBodyAccJerk-mad()-Z', 'tBodyAccJerk-max()-X', 'tBodyAccJerk-max()-Y', 'tBodyAccJerk-max()-Z', 'tBodyAccJerk-min()-X', 'tBodyAccJerk-min()-Y', 'tBodyAccJerk-min()-Z', 'tBodyAccJerk-sma()', 'tBodyAccJerk-energy()-X', 'tBodyAccJerk-energy()-Y', 'tBodyAccJerk-energy()-Z', 'tBodyAccJerk-iqr()-X', 'tBodyAccJerk-iqr()-Y', 'tBodyAccJerk-iqr()-Z', 'tBodyAccJerk-entropy()-X', 'tBodyAccJerk-entropy()-Y', 'tBodyAccJerk-entropy()-Z', 'tBodyAccJerk-arCoeff()-X,1', 'tBodyAccJerk-arCoeff()-X,2', 'tBodyAccJerk-arCoeff()-X,3', 'tBodyAccJerk-arCoeff()-X,4', 'tBodyAccJerk-arCoeff()-Y,1', 'tBodyAccJerk-arCoeff()-Y,2', 'tBodyAccJerk-arCoeff()-Y,3', 'tBodyAccJerk-arCoeff()-Y,4', 'tBodyAccJerk-arCoeff()-Z,1', 'tBodyAccJerk-arCoeff()-Z,2', 'tBodyAccJerk-arCoeff()-Z,3', 'tBodyAccJerk-arCoeff()-Z,4', 'tBodyAccJerk-correlation()-X,Y', 'tBodyAccJerk-correlation()-X,Z', 'tBodyAccJerk-correlation()-Y,Z', 'tBodyGyro-mean()-X', 'tBodyGyro-mean()-Y', 'tBodyGyro-mean()-Z', 'tBodyGyro-std()-X', 'tBodyGyro-std()-Y', 'tBodyGyro-std()-Z', 'tBodyGyro-mad()-X', 'tBodyGyro-mad()-Y', 'tBodyGyro-mad()-Z', 'tBodyGyro-max()-X', 'tBodyGyro-max()-Y', 'tBodyGyro-max()-Z', 'tBodyGyro-min()-X', 'tBodyGyro-min()-Y', 'tBodyGyro-min()-Z', 'tBodyGyro-sma()', 'tBodyGyro-energy()-X', 'tBodyGyro-energy()-Y', 'tBodyGyro-energy()-Z', 'tBodyGyro-iqr()-X', 'tBodyGyro-iqr()-Y', 'tBodyGyro-iqr()-Z', 'tBodyGyro-entropy()-X', 'tBodyGyro-entropy()-Y', 'tBodyGyro-entropy()-Z', 'tBodyGyro-arCoeff()-X,1', 'tBodyGyro-arCoeff()-X,2', 'tBodyGyro-arCoeff()-X,3', 'tBodyGyro-arCoeff()-X,4', 'tBodyGyro-arCoeff()-Y,1', 'tBodyGyro-arCoeff()-Y,2', 'tBodyGyro-arCoeff()-Y,3', 'tBodyGyro-arCoeff()-Y,4', 'tBodyGyro-arCoeff()-Z,1', 'tBodyGyro-arCoeff()-Z,2', 'tBodyGyro-arCoeff()-Z,3', 'tBodyGyro-arCoeff()-Z,4', 
'tBodyGyro-correlation()-X,Y', 'tBodyGyro-correlation()-X,Z', 'tBodyGyro-correlation()-Y,Z', 'tBodyGyroJerk-mean()-X', 'tBodyGyroJerk-mean()-Y', 'tBodyGyroJerk-mean()-Z', 'tBodyGyroJerk-std()-X', 'tBodyGyroJerk-std()-Y', 'tBodyGyroJerk-std()-Z', 'tBodyGyroJerk-mad()-X', 'tBodyGyroJerk-mad()-Y', 'tBodyGyroJerk-mad()-Z', 'tBodyGyroJerk-max()-X', 'tBodyGyroJerk-max()-Y', 'tBodyGyroJerk-max()-Z', 'tBodyGyroJerk-min()-X', 'tBodyGyroJerk-min()-Y', 'tBodyGyroJerk-min()-Z', 'tBodyGyroJerk-sma()', 'tBodyGyroJerk-energy()-X', 'tBodyGyroJerk-energy()-Y', 'tBodyGyroJerk-energy()-Z', 'tBodyGyroJerk-iqr()-X', 'tBodyGyroJerk-iqr()-Y', 'tBodyGyroJerk-iqr()-Z', 'tBodyGyroJerk-entropy()-X', 'tBodyGyroJerk-entropy()-Y', 'tBodyGyroJerk-entropy()-Z', 'tBodyGyroJerk-arCoeff()-X,1', 'tBodyGyroJerk-arCoeff()-X,2', 'tBodyGyroJerk-arCoeff()-X,3', 'tBodyGyroJerk-arCoeff()-X,4', 'tBodyGyroJerk-arCoeff()-Y,1', 'tBodyGyroJerk-arCoeff()-Y,2', 'tBodyGyroJerk-arCoeff()-Y,3', 'tBodyGyroJerk-arCoeff()-Y,4', 'tBodyGyroJerk-arCoeff()-Z,1', 'tBodyGyroJerk-arCoeff()-Z,2', 'tBodyGyroJerk-arCoeff()-Z,3', 'tBodyGyroJerk-arCoeff()-Z,4', 'tBodyGyroJerk-correlation()-X,Y', 'tBodyGyroJerk-correlation()-X,Z', 'tBodyGyroJerk-correlation()-Y,Z', 'tBodyAccMag-mean()', 'tBodyAccMag-std()', 'tBodyAccMag-mad()', 'tBodyAccMag-max()', 'tBodyAccMag-min()', 'tBodyAccMag-sma()', 'tBodyAccMag-energy()', 'tBodyAccMag-iqr()', 'tBodyAccMag-entropy()', 'tBodyAccMag-arCoeff()1', 'tBodyAccMag-arCoeff()2', 'tBodyAccMag-arCoeff()3', 'tBodyAccMag-arCoeff()4', 'tGravityAccMag-mean()', 'tGravityAccMag-std()', 'tGravityAccMag-mad()', 'tGravityAccMag-max()', 'tGravityAccMag-min()', 'tGravityAccMag-sma()', 'tGravityAccMag-energy()', 'tGravityAccMag-iqr()', 'tGravityAccMag-entropy()', 'tGravityAccMag-arCoeff()1', 'tGravityAccMag-arCoeff()2', 'tGravityAccMag-arCoeff()3', 'tGravityAccMag-arCoeff()4', 'tBodyAccJerkMag-mean()', 'tBodyAccJerkMag-std()', 'tBodyAccJerkMag-mad()', 'tBodyAccJerkMag-max()', 'tBodyAccJerkMag-min()', 
'tBodyAccJerkMag-sma()', 'tBodyAccJerkMag-energy()', 'tBodyAccJerkMag-iqr()', 'tBodyAccJerkMag-entropy()', 'tBodyAccJerkMag-arCoeff()1', 'tBodyAccJerkMag-arCoeff()2', 'tBodyAccJerkMag-arCoeff()3', 'tBodyAccJerkMag-arCoeff()4', 'tBodyGyroMag-mean()', 'tBodyGyroMag-std()', 'tBodyGyroMag-mad()', 'tBodyGyroMag-max()', 'tBodyGyroMag-min()', 'tBodyGyroMag-sma()', 'tBodyGyroMag-energy()', 'tBodyGyroMag-iqr()', 'tBodyGyroMag-entropy()', 'tBodyGyroMag-arCoeff()1', 'tBodyGyroMag-arCoeff()2', 'tBodyGyroMag-arCoeff()3', 'tBodyGyroMag-arCoeff()4', 'tBodyGyroJerkMag-mean()', 'tBodyGyroJerkMag-std()', 'tBodyGyroJerkMag-mad()', 'tBodyGyroJerkMag-max()', 'tBodyGyroJerkMag-min()', 'tBodyGyroJerkMag-sma()', 'tBodyGyroJerkMag-energy()', 'tBodyGyroJerkMag-iqr()', 'tBodyGyroJerkMag-entropy()', 'tBodyGyroJerkMag-arCoeff()1', 'tBodyGyroJerkMag-arCoeff()2', 'tBodyGyroJerkMag-arCoeff()3', 'tBodyGyroJerkMag-arCoeff()4', 'fBodyAcc-mean()-X', 'fBodyAcc-mean()-Y', 'fBodyAcc-mean()-Z', 'fBodyAcc-std()-X', 'fBodyAcc-std()-Y', 'fBodyAcc-std()-Z', 'fBodyAcc-mad()-X', 'fBodyAcc-mad()-Y', 'fBodyAcc-mad()-Z', 'fBodyAcc-max()-X', 'fBodyAcc-max()-Y', 'fBodyAcc-max()-Z', 'fBodyAcc-min()-X', 'fBodyAcc-min()-Y', 'fBodyAcc-min()-Z', 'fBodyAcc-sma()', 'fBodyAcc-energy()-X', 'fBodyAcc-energy()-Y', 'fBodyAcc-energy()-Z', 'fBodyAcc-iqr()-X', 'fBodyAcc-iqr()-Y', 'fBodyAcc-iqr()-Z', 'fBodyAcc-entropy()-X', 'fBodyAcc-entropy()-Y', 'fBodyAcc-entropy()-Z', 'fBodyAcc-maxInds-X', 'fBodyAcc-maxInds-Y', 'fBodyAcc-maxInds-Z', 'fBodyAcc-meanFreq()-X', 'fBodyAcc-meanFreq()-Y', 'fBodyAcc-meanFreq()-Z', 'fBodyAcc-skewness()-X', 'fBodyAcc-kurtosis()-X', 'fBodyAcc-skewness()-Y', 'fBodyAcc-kurtosis()-Y', 'fBodyAcc-skewness()-Z', 'fBodyAcc-kurtosis()-Z', 'fBodyAcc-bandsEnergy()-1,8', 'fBodyAcc-bandsEnergy()-9,16', 'fBodyAcc-bandsEnergy()-17,24', 'fBodyAcc-bandsEnergy()-25,32', 'fBodyAcc-bandsEnergy()-33,40', 'fBodyAcc-bandsEnergy()-41,48', 'fBodyAcc-bandsEnergy()-49,56', 'fBodyAcc-bandsEnergy()-57,64', 
'fBodyAcc-bandsEnergy()-1,16', 'fBodyAcc-bandsEnergy()-17,32', 'fBodyAcc-bandsEnergy()-33,48', 'fBodyAcc-bandsEnergy()-49,64', 'fBodyAcc-bandsEnergy()-1,24', 'fBodyAcc-bandsEnergy()-25,48', 'fBodyAcc-bandsEnergy()-1,8.1', 'fBodyAcc-bandsEnergy()-9,16.1', 'fBodyAcc-bandsEnergy()-17,24.1', 'fBodyAcc-bandsEnergy()-25,32.1', 'fBodyAcc-bandsEnergy()-33,40.1', 'fBodyAcc-bandsEnergy()-41,48.1', 'fBodyAcc-bandsEnergy()-49,56.1', 'fBodyAcc-bandsEnergy()-57,64.1', 'fBodyAcc-bandsEnergy()-1,16.1', 'fBodyAcc-bandsEnergy()-17,32.1', 'fBodyAcc-bandsEnergy()-33,48.1', 'fBodyAcc-bandsEnergy()-49,64.1', 'fBodyAcc-bandsEnergy()-1,24.1', 'fBodyAcc-bandsEnergy()-25,48.1', 'fBodyAcc-bandsEnergy()-1,8.2', 'fBodyAcc-bandsEnergy()-9,16.2', 'fBodyAcc-bandsEnergy()-17,24.2', 'fBodyAcc-bandsEnergy()-25,32.2', 'fBodyAcc-bandsEnergy()-33,40.2', 'fBodyAcc-bandsEnergy()-41,48.2', 'fBodyAcc-bandsEnergy()-49,56.2', 'fBodyAcc-bandsEnergy()-57,64.2', 'fBodyAcc-bandsEnergy()-1,16.2', 'fBodyAcc-bandsEnergy()-17,32.2', 'fBodyAcc-bandsEnergy()-33,48.2', 'fBodyAcc-bandsEnergy()-49,64.2', 'fBodyAcc-bandsEnergy()-1,24.2', 'fBodyAcc-bandsEnergy()-25,48.2', 'fBodyAccJerk-mean()-X', 'fBodyAccJerk-mean()-Y', 'fBodyAccJerk-mean()-Z', 'fBodyAccJerk-std()-X', 'fBodyAccJerk-std()-Y', 'fBodyAccJerk-std()-Z', 'fBodyAccJerk-mad()-X', 'fBodyAccJerk-mad()-Y', 'fBodyAccJerk-mad()-Z', 'fBodyAccJerk-max()-X', 'fBodyAccJerk-max()-Y', 'fBodyAccJerk-max()-Z', 'fBodyAccJerk-min()-X', 'fBodyAccJerk-min()-Y', 'fBodyAccJerk-min()-Z', 'fBodyAccJerk-sma()', 'fBodyAccJerk-energy()-X', 'fBodyAccJerk-energy()-Y', 'fBodyAccJerk-energy()-Z', 'fBodyAccJerk-iqr()-X', 'fBodyAccJerk-iqr()-Y', 'fBodyAccJerk-iqr()-Z', 'fBodyAccJerk-entropy()-X', 'fBodyAccJerk-entropy()-Y', 'fBodyAccJerk-entropy()-Z', 'fBodyAccJerk-maxInds-X', 'fBodyAccJerk-maxInds-Y', 'fBodyAccJerk-maxInds-Z', 'fBodyAccJerk-meanFreq()-X', 'fBodyAccJerk-meanFreq()-Y', 'fBodyAccJerk-meanFreq()-Z', 'fBodyAccJerk-skewness()-X', 'fBodyAccJerk-kurtosis()-X', 
'fBodyAccJerk-skewness()-Y', 'fBodyAccJerk-kurtosis()-Y', 'fBodyAccJerk-skewness()-Z', 'fBodyAccJerk-kurtosis()-Z', 'fBodyAccJerk-bandsEnergy()-1,8', 'fBodyAccJerk-bandsEnergy()-9,16', 'fBodyAccJerk-bandsEnergy()-17,24', 'fBodyAccJerk-bandsEnergy()-25,32', 'fBodyAccJerk-bandsEnergy()-33,40', 'fBodyAccJerk-bandsEnergy()-41,48', 'fBodyAccJerk-bandsEnergy()-49,56', 'fBodyAccJerk-bandsEnergy()-57,64', 'fBodyAccJerk-bandsEnergy()-1,16', 'fBodyAccJerk-bandsEnergy()-17,32', 'fBodyAccJerk-bandsEnergy()-33,48', 'fBodyAccJerk-bandsEnergy()-49,64', 'fBodyAccJerk-bandsEnergy()-1,24', 'fBodyAccJerk-bandsEnergy()-25,48', 'fBodyAccJerk-bandsEnergy()-1,8.1', 'fBodyAccJerk-bandsEnergy()-9,16.1', 'fBodyAccJerk-bandsEnergy()-17,24.1', 'fBodyAccJerk-bandsEnergy()-25,32.1', 'fBodyAccJerk-bandsEnergy()-33,40.1', 'fBodyAccJerk-bandsEnergy()-41,48.1', 'fBodyAccJerk-bandsEnergy()-49,56.1', 'fBodyAccJerk-bandsEnergy()-57,64.1', 'fBodyAccJerk-bandsEnergy()-1,16.1', 'fBodyAccJerk-bandsEnergy()-17,32.1', 'fBodyAccJerk-bandsEnergy()-33,48.1', 'fBodyAccJerk-bandsEnergy()-49,64.1', 'fBodyAccJerk-bandsEnergy()-1,24.1', 'fBodyAccJerk-bandsEnergy()-25,48.1', 'fBodyAccJerk-bandsEnergy()-1,8.2', 'fBodyAccJerk-bandsEnergy()-9,16.2', 'fBodyAccJerk-bandsEnergy()-17,24.2', 'fBodyAccJerk-bandsEnergy()-25,32.2', 'fBodyAccJerk-bandsEnergy()-33,40.2', 'fBodyAccJerk-bandsEnergy()-41,48.2', 'fBodyAccJerk-bandsEnergy()-49,56.2', 'fBodyAccJerk-bandsEnergy()-57,64.2', 'fBodyAccJerk-bandsEnergy()-1,16.2', 'fBodyAccJerk-bandsEnergy()-17,32.2', 'fBodyAccJerk-bandsEnergy()-33,48.2', 'fBodyAccJerk-bandsEnergy()-49,64.2', 'fBodyAccJerk-bandsEnergy()-1,24.2', 'fBodyAccJerk-bandsEnergy()-25,48.2', 'fBodyGyro-mean()-X', 'fBodyGyro-mean()-Y', 'fBodyGyro-mean()-Z', 'fBodyGyro-std()-X', 'fBodyGyro-std()-Y', 'fBodyGyro-std()-Z', 'fBodyGyro-mad()-X', 'fBodyGyro-mad()-Y', 'fBodyGyro-mad()-Z', 'fBodyGyro-max()-X', 'fBodyGyro-max()-Y', 'fBodyGyro-max()-Z', 'fBodyGyro-min()-X', 'fBodyGyro-min()-Y', 'fBodyGyro-min()-Z', 
'fBodyGyro-sma()', 'fBodyGyro-energy()-X', 'fBodyGyro-energy()-Y', 'fBodyGyro-energy()-Z', 'fBodyGyro-iqr()-X', 'fBodyGyro-iqr()-Y', 'fBodyGyro-iqr()-Z', 'fBodyGyro-entropy()-X', 'fBodyGyro-entropy()-Y', 'fBodyGyro-entropy()-Z', 'fBodyGyro-maxInds-X', 'fBodyGyro-maxInds-Y', 'fBodyGyro-maxInds-Z', 'fBodyGyro-meanFreq()-X', 'fBodyGyro-meanFreq()-Y', 'fBodyGyro-meanFreq()-Z', 'fBodyGyro-skewness()-X', 'fBodyGyro-kurtosis()-X', 'fBodyGyro-skewness()-Y', 'fBodyGyro-kurtosis()-Y', 'fBodyGyro-skewness()-Z', 'fBodyGyro-kurtosis()-Z', 'fBodyGyro-bandsEnergy()-1,8', 'fBodyGyro-bandsEnergy()-9,16', 'fBodyGyro-bandsEnergy()-17,24', 'fBodyGyro-bandsEnergy()-25,32', 'fBodyGyro-bandsEnergy()-33,40', 'fBodyGyro-bandsEnergy()-41,48', 'fBodyGyro-bandsEnergy()-49,56', 'fBodyGyro-bandsEnergy()-57,64', 'fBodyGyro-bandsEnergy()-1,16', 'fBodyGyro-bandsEnergy()-17,32', 'fBodyGyro-bandsEnergy()-33,48', 'fBodyGyro-bandsEnergy()-49,64', 'fBodyGyro-bandsEnergy()-1,24', 'fBodyGyro-bandsEnergy()-25,48', 'fBodyGyro-bandsEnergy()-1,8.1', 'fBodyGyro-bandsEnergy()-9,16.1', 'fBodyGyro-bandsEnergy()-17,24.1', 'fBodyGyro-bandsEnergy()-25,32.1', 'fBodyGyro-bandsEnergy()-33,40.1', 'fBodyGyro-bandsEnergy()-41,48.1', 'fBodyGyro-bandsEnergy()-49,56.1', 'fBodyGyro-bandsEnergy()-57,64.1', 'fBodyGyro-bandsEnergy()-1,16.1', 'fBodyGyro-bandsEnergy()-17,32.1', 'fBodyGyro-bandsEnergy()-33,48.1', 'fBodyGyro-bandsEnergy()-49,64.1', 'fBodyGyro-bandsEnergy()-1,24.1', 'fBodyGyro-bandsEnergy()-25,48.1', 'fBodyGyro-bandsEnergy()-1,8.2', 'fBodyGyro-bandsEnergy()-9,16.2', 'fBodyGyro-bandsEnergy()-17,24.2', 'fBodyGyro-bandsEnergy()-25,32.2', 'fBodyGyro-bandsEnergy()-33,40.2', 'fBodyGyro-bandsEnergy()-41,48.2', 'fBodyGyro-bandsEnergy()-49,56.2', 'fBodyGyro-bandsEnergy()-57,64.2', 'fBodyGyro-bandsEnergy()-1,16.2', 'fBodyGyro-bandsEnergy()-17,32.2', 'fBodyGyro-bandsEnergy()-33,48.2', 'fBodyGyro-bandsEnergy()-49,64.2', 'fBodyGyro-bandsEnergy()-1,24.2', 'fBodyGyro-bandsEnergy()-25,48.2', 'fBodyAccMag-mean()', 
'fBodyAccMag-std()', 'fBodyAccMag-mad()', 'fBodyAccMag-max()', 'fBodyAccMag-min()', 'fBodyAccMag-sma()', 'fBodyAccMag-energy()', 'fBodyAccMag-iqr()', 'fBodyAccMag-entropy()', 'fBodyAccMag-maxInds', 'fBodyAccMag-meanFreq()', 'fBodyAccMag-skewness()', 'fBodyAccMag-kurtosis()', 'fBodyBodyAccJerkMag-mean()', 'fBodyBodyAccJerkMag-std()', 'fBodyBodyAccJerkMag-mad()', 'fBodyBodyAccJerkMag-max()', 'fBodyBodyAccJerkMag-min()', 'fBodyBodyAccJerkMag-sma()', 'fBodyBodyAccJerkMag-energy()', 'fBodyBodyAccJerkMag-iqr()', 'fBodyBodyAccJerkMag-entropy()', 'fBodyBodyAccJerkMag-maxInds', 'fBodyBodyAccJerkMag-meanFreq()', 'fBodyBodyAccJerkMag-skewness()', 'fBodyBodyAccJerkMag-kurtosis()', 'fBodyBodyGyroMag-mean()', 'fBodyBodyGyroMag-std()', 'fBodyBodyGyroMag-mad()', 'fBodyBodyGyroMag-max()', 'fBodyBodyGyroMag-min()', 'fBodyBodyGyroMag-sma()', 'fBodyBodyGyroMag-energy()', 'fBodyBodyGyroMag-iqr()', 'fBodyBodyGyroMag-entropy()', 'fBodyBodyGyroMag-maxInds', 'fBodyBodyGyroMag-meanFreq()', 'fBodyBodyGyroMag-skewness()', 'fBodyBodyGyroMag-kurtosis()', 'fBodyBodyGyroJerkMag-mean()', 'fBodyBodyGyroJerkMag-std()', 'fBodyBodyGyroJerkMag-mad()', 'fBodyBodyGyroJerkMag-max()', 'fBodyBodyGyroJerkMag-min()', 'fBodyBodyGyroJerkMag-sma()', 'fBodyBodyGyroJerkMag-energy()', 'fBodyBodyGyroJerkMag-iqr()', 'fBodyBodyGyroJerkMag-entropy()', 'fBodyBodyGyroJerkMag-maxInds', 'fBodyBodyGyroJerkMag-meanFreq()', 'fBodyBodyGyroJerkMag-skewness()', 'fBodyBodyGyroJerkMag-kurtosis()', 'angle(tBodyAccMean,gravity)', 'angle(tBodyAccJerkMean),gravityMean)', 'angle(tBodyGyroMean,gravityMean)', 'angle(tBodyGyroJerkMean,gravityMean)', 'angle(X,gravityMean)', 'angle(Y,gravityMean)', 'angle(Z,gravityMean)']
len(X)
561
Data preprocessing for TabPFN classifier model
To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions.
# Data processing to reduce the features to 100 or less as required for TabPFN models
X = train_har_data.drop(columns=['Activity'])
y = train_har_data['Activity']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))
X_reduced_lda = lda.fit_transform(X_scaled, y)
X_train_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])
X_train_lda_df['Activity'] = y.reset_index(drop=True)
X_train_lda_df.shape
(1020, 6)
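The reduced frame has six columns, the five LDA components plus the Activity label, because scikit-learn caps LDA's n_components at min(n_features, n_classes − 1); with six activity classes, min(100, 6 − 1) resolves to 5. A minimal sketch with synthetic data (shapes and names here are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in: 60 samples, 20 features, 6 classes (as in HAR)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 20))
y_demo = np.repeat(np.arange(6), 10)

n_classes = len(set(y_demo))
lda_demo = LinearDiscriminantAnalysis(n_components=min(100, n_classes - 1))
X_demo_reduced = lda_demo.fit_transform(X_demo, y_demo)
print(X_demo_reduced.shape)  # (60, 5): at most n_classes - 1 components
```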
# Visualize the final processed training data columns
X_train_lda_df.columns
Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')
In the training dataframe above, Activity is the target label to be predicted, and the remaining features serve as the explanatory variables. We define the explanatory variables as follows:
# define the explanatory variables
X = list(X_train_lda_df.columns)
X = X[:-1]
len(X)
5
Once the explanatory variables X are defined, they are used as input in the prepare_tabulardata method from the tabular learner in arcgis.learn. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model.
The input parameters required for the tool are used as follows:
data = prepare_tabulardata(X_train_lda_df, 'Activity', explanatory_variables=X)
Visualize training data
To get a sense of what the training data looks like, the show_batch() method will randomly pick a few training samples and visualize them. The samples show the explanatory variables and the Activity target label to predict.
data.show_batch(rows=5)
| Activity | LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | |
|---|---|---|---|---|---|---|
| 28 | STANDING | -19.431367 | -11.174415 | 0.816325 | -0.687153 | 2.809134 |
| 130 | SITTING | -18.377005 | -9.429310 | 0.342490 | -0.757008 | -4.074501 |
| 311 | WALKING | 17.852727 | -1.390446 | -8.164604 | 4.176480 | -0.160302 |
| 734 | WALKING_UPSTAIRS | 23.633640 | 2.301121 | 2.475455 | -10.447339 | -0.302447 |
| 847 | LAYING | -26.620913 | 14.783496 | -0.705847 | 0.511383 | -0.568820 |
Model training
First, we initialize the model as follows:
Model initialization
The default initialization of the TabPFN classifier model object is shown below:
tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier', device='cpu', N_ensemble_configurations=32)
Fit the model
Next, we will train the model.
tabpfn_classifier.fit()
tabpfn_classifier.score()
0.9901960784313726
The model score of roughly 0.99 on the validation set indicates excellent results.
Visualize results in validation set
It is good practice to compare the model's results with the ground truth. The code below picks random samples and shows the ground truth Activity and the model's predicted Activity_results side by side, allowing us to preview the results of the trained model.
tabpfn_classifier.show_results()
| Activity | LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity_results | |
|---|---|---|---|---|---|---|---|
| 101 | LAYING | -27.925746 | 17.698939 | -0.211993 | 0.137931 | -0.872844 | LAYING |
| 299 | WALKING | 22.230659 | 0.572533 | -8.318169 | 1.275016 | 0.139479 | WALKING |
| 693 | SITTING | -18.977440 | -10.123049 | 0.551960 | -0.028800 | -3.010251 | SITTING |
| 884 | WALKING | 23.685543 | -0.261172 | -11.338481 | 3.464041 | -0.628183 | WALKING |
| 967 | SITTING | -18.285170 | -11.017798 | 1.873937 | -0.702550 | -1.983438 | SITTING |
Predict using the TabPFN classifier model
Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples.
Before using the trained TabPFN model to predict the classes of the test dataset, we will first apply Linear Discriminant Analysis (LDA) to reduce the test data to the same feature space as the training data. This ensures consistency between the training and test datasets, enabling the trained TabPFN model to effectively classify the larger test sample.
# Align test data with the train data format
X = test_har_data.drop(columns=['Activity'])
y = test_har_data['Activity']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))
X_reduced_lda = lda.fit_transform(X_scaled, y)
X_test_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])
X_test_lda_df['Activity'] = y.reset_index(drop=True)
print(X_test_lda_df.shape)
(6332, 6)
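Note that the scaler and LDA are re-fitted on the test set here, mirroring the workflow above. In a deployment setting, where test labels are unavailable, one would typically reuse the transformers fitted on the training data, so that test samples are projected into the training feature space. A sketch of that pattern with synthetic data (all names and shapes here are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for train/test splits: 30 features, 6 classes
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(120, 30)), np.repeat(np.arange(6), 20)
X_te = rng.normal(size=(40, 30))

# Fit both transformers on the training data only
scaler = StandardScaler().fit(X_tr)
lda = LinearDiscriminantAnalysis(n_components=5).fit(scaler.transform(X_tr), y_tr)

# transform (not fit_transform) keeps the test set in the training space
X_te_reduced = lda.transform(scaler.transform(X_te))
print(X_te_reduced.shape)  # (40, 5)
```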
X_test_lda_df.head(5)
| LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity | |
|---|---|---|---|---|---|---|
| 0 | -10.188443 | -8.641377 | 0.606669 | 1.120983 | 3.836137 | STANDING |
| 1 | -9.735631 | -6.716675 | 0.537841 | -0.543385 | 2.295157 | STANDING |
| 2 | -8.954351 | -7.376296 | 0.798942 | -0.507465 | 2.508069 | STANDING |
| 3 | -10.400401 | -7.267321 | 1.035134 | 0.272738 | 2.034312 | STANDING |
| 4 | -9.596161 | -6.980061 | 0.480017 | -0.284537 | 1.103180 | STANDING |
Predict
activity_predicted_tabpfn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')
activity_predicted_tabpfn.tail(5)
| LDA1 | LDA2 | LDA3 | LDA4 | LDA5 | Activity | prediction_results | |
|---|---|---|---|---|---|---|---|
| 6327 | 18.095783 | 2.593775 | 8.704337 | 5.424257 | 0.443448 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6328 | 17.094286 | 1.997284 | 5.270752 | 2.839847 | 0.550822 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6329 | 15.909594 | 1.537803 | 4.887237 | 4.771153 | -0.157321 | WALKING_DOWNSTAIRS | WALKING_DOWNSTAIRS |
| 6330 | 11.944985 | 0.834660 | 0.116338 | -6.285236 | 0.045984 | WALKING_UPSTAIRS | WALKING_UPSTAIRS |
| 6331 | 14.575570 | 1.737412 | -0.866397 | -5.458011 | -1.150021 | WALKING_UPSTAIRS | WALKING_UPSTAIRS |
Accuracy assessment
Next, we will evaluate the model's performance. This will print out multiple model metrics that we can use to assess the model quality. These metrics include a combination of multiple evaluation criteria, such as accuracy, precision, recall and F1-Score, which collectively measure the model's performance on the validation set.
# Extract ground truth and predictions
y_true = activity_predicted_tabpfn['Activity']
y_pred = activity_predicted_tabpfn['prediction_results']
# Calculate Accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
# Calculate Precision
precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'Precision: {precision:.2f}')
# Calculate Recall
recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'Recall: {recall:.2f}')
# Calculate F1-Score
f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)
print(f'F1 Score: {f1:.2f}')
# classification_report
print("\nClassification Report:")
print(classification_report(y_true, y_pred))
Accuracy: 96.83%
Precision: 0.97
Recall: 0.97
F1 Score: 0.97
Classification Report:
precision recall f1-score support
LAYING 1.00 1.00 1.00 1219
SITTING 1.00 0.83 0.91 1119
STANDING 0.86 0.99 0.93 1197
WALKING 1.00 1.00 1.00 1031
WALKING_DOWNSTAIRS 1.00 0.99 1.00 835
WALKING_UPSTAIRS 0.99 1.00 1.00 931
accuracy 0.97 6332
macro avg 0.97 0.97 0.97 6332
weighted avg 0.97 0.97 0.97 6332
The performance metrics obtained from the trained TabPFN model on the test dataset of 6,332 samples indicate excellent classification quality.
Accuracy (96.83%): The model correctly classified approximately 97% of the samples, a strong indication of its ability to generalize to unseen data, despite being trained on a smaller dataset of just 1,020 samples.
Precision (0.97): Precision measures the proportion of true positive predictions among all positive predictions made by the model. A precision of 0.97 means that 97% of the predicted activity classes are correct, indicating that the model rarely makes false positive errors.
Recall (0.97): Recall represents the model's ability to correctly identify all relevant instances of a class. A recall of 0.97 means that the model correctly identifies 97% of all actual positive instances, with minimal false negatives.
F1 Score (0.97): The F1 Score is the harmonic mean of precision and recall; a value of 0.97 shows that the model balances the two very well, indicating that it is both accurate and sensitive in detecting the correct activity classes.
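As a quick arithmetic check, the F1 score can be reproduced as the harmonic mean of the reported precision and recall:

```python
# F1 is the harmonic mean of precision and recall
precision, recall = 0.97, 0.97
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.97
```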
Overall, these metrics demonstrate that the TabPFN model performs exceptionally well, achieving near-perfect classification with minimal errors. This performance is particularly impressive given that it was trained on a relatively small sample size of 1,020 data points, highlighting its efficiency and effectiveness in handling human activity recognition tasks.
Conclusion
This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.83%, and precision, recall, and F1 scores all reaching 0.97. The TabPFN model's speed, simplicity, and strong performance in classifying human activities highlight its potential for applications in healthcare, fitness, smart cities, and disaster relief operations, offering an efficient and scalable solution for HAR systems.
TabPFN license information
| License Description |
|---|
| Built with TabPFN - tabpfn.TabPFNClassifier https://priorlabs.ai/tabpfn-license |