Sensor-Based Human Behavior Recognition Using Artificial Intelligence (AI) Models
Sensor-based human behavior recognition is a classic problem in ubiquitous computing, human-computer interaction, and ambient assisted living. With the proliferation of sensor-equipped mobile devices, a multitude of ubiquitous sensors allow seamless monitoring of contextual information based on the digital traces people leave while interacting with Web applications, static infrastructure, and wearable devices. Sensor data are used in a range of applications, including fitness trackers, smart homes, and healthcare support, so the generalizability and portability of the algorithms that process them are essential. In real-world scenarios, researchers face many challenges when developing sensor-based algorithms for recognizing human behavior. In particular, the same behavior can be performed differently by different individuals, or even by the same person at different times, while some fundamentally different behaviors produce very similar characteristics in the sensor data.

Early studies used manual feature engineering to extract features that quantify human behaviors. For example, brain activities in EEG waveforms are characterized by their location, amplitude, and frequency, while body movements in IMU data are typically represented by changes in the speed and direction of velocity over time. Although manual feature engineering can be effective for recognizing human behavior, it is time-consuming and heavily dependent on expert knowledge. Recent sensor-based human behavior recognition algorithms therefore use deep neural networks to learn the important sensor features automatically. Typical architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and hybrid models that combine both. Since the introduction of attention mechanisms, attention-based models have also been widely adopted for representing sensor data.
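To make the manual feature-engineering baseline concrete before turning to the contributions, the following minimal sketch (in Python with NumPy; the window length, step, and choice of statistics are illustrative assumptions, not taken from any particular study) computes speed- and direction-related statistics over sliding windows of a tri-axial accelerometer stream:

```python
import numpy as np

def imu_window_features(accel, window=128, step=64):
    """Hand-crafted features over sliding windows of a tri-axial
    accelerometer stream of shape [T, 3]. Window and step sizes
    are illustrative assumptions."""
    feats = []
    for start in range(0, len(accel) - window + 1, step):
        seg = accel[start:start + window]                # [window, 3]
        mag = np.linalg.norm(seg, axis=1)                # movement intensity
        feats.append(np.concatenate([
            seg.mean(axis=0),                            # per-axis mean
            seg.std(axis=0),                             # per-axis variability
            [mag.mean(), mag.std()],                     # speed-like statistics
            np.sign(np.diff(seg, axis=0)).mean(axis=0),  # direction-change tendency
        ]))
    return np.stack(feats)                               # [n_windows, 11]

# Example: 10 s of synthetic 50 Hz tri-axial accelerometer data
features = imu_window_features(np.random.randn(500, 3))
```

Each window yields a small feature vector that a conventional classifier could consume; the direction-change term loosely mirrors the "changes in the speed and direction of velocity" cue mentioned above.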
The thesis presents substantial contributions to the development of generalizable and portable frameworks for recognizing human behavior.
The first key contribution of the thesis is a pilot study on the manual feature engineering approach, in which a multiscale feature engineering method is developed to enhance the sensor-based feature representation. For further exploration, a real-world case study is conducted, which indicates that the important features in biological signals (e.g., fatigue in the human body) differ greatly from those observed in physiological signals (such as human movements).
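The thesis' exact multiscale design is not spelled out here, so the following is only a hypothetical sketch of what such a scheme can look like: simple statistics are extracted at several assumed window lengths and concatenated, so that both short bursts and slow trends are represented in one feature vector.

```python
import numpy as np

def multiscale_features(signal, scales=(64, 128, 256)):
    """Hypothetical multiscale feature extraction for a 1-D signal.
    The window lengths and statistics are assumptions, not the
    thesis' exact design."""
    per_scale = []
    for w in scales:
        # overlapping windows of length w (50% overlap)
        windows = np.lib.stride_tricks.sliding_window_view(signal, w)[::w // 2]
        stats = np.stack([
            windows.mean(axis=1),                        # local level
            windows.std(axis=1),                         # local variability
            np.abs(np.diff(windows, axis=1)).mean(axis=1)  # local roughness
        ], axis=1)
        per_scale.append(stats.mean(axis=0))             # pool windows at this scale
    return np.concatenate(per_scale)                     # one vector spanning all scales

vec = multiscale_features(np.random.randn(1024))         # -> shape (9,)
```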
In the second contribution, the thesis provides a comprehensive discussion of deep neural network-based human behavior recognition methods, pointing out that while state-of-the-art (SOTA) methods have studied the inter-relationships between sensors well, the intra-relationships within a sensor are overlooked. Based on this finding, PearNet, a Pearson correlation-based graph attention neural network, is proposed to model the intra-sensor spatial-temporal relationships.
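A minimal sketch of the core idea, as described above, might look as follows; the correlation threshold, the segment construction, and the single attention layer are simplifying assumptions rather than the published PearNet architecture. Nodes are temporal segments of one sensor channel, edges connect Pearson-correlated segments, and attention aggregates each node's neighbors.

```python
import torch
import torch.nn.functional as F

def pearson_adjacency(segments, threshold=0.2):
    """Edges between intra-sensor segments whose Pearson correlation
    exceeds a threshold (the threshold value is an assumption)."""
    x = segments - segments.mean(dim=1, keepdim=True)
    x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
    corr = x @ x.t()                                  # [N, N] Pearson correlations
    return (corr.abs() > threshold).float()

class GraphAttentionLayer(torch.nn.Module):
    """One simplified graph-attention step in the spirit of PearNet:
    each segment-node attends to its Pearson-correlated neighbors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim)
        self.attn = torch.nn.Linear(2 * out_dim, 1)

    def forward(self, nodes, adj):
        h = self.proj(nodes)                          # [N, out_dim]
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = self.attn(pairs).squeeze(-1)         # [N, N] pairwise scores
        scores = scores.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(scores, dim=-1)         # attend only to neighbors
        return F.elu(alpha @ h)

# Nodes: 8 overlapping segments of one EEG channel, 64 samples each
segments = torch.randn(8, 64)
out = GraphAttentionLayer(64, 32)(segments, pearson_adjacency(segments))  # [8, 32]
```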
The third major contribution of this thesis is the further exploration of intra-sensor relationship learning methods. To improve efficiency, a convolution-based multi-head attention is developed, which allows the attention heads to be expanded with fewer parameters and less training time than the traditional multi-head approach.
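One possible reading of such head expansion is sketched below; the layer is an illustrative assumption, not the thesis' exact design. Q, K, and V are projected once, and a cheap depthwise convolution derives the additional heads, so each added head costs only kernel-sized weights rather than a full projection matrix.

```python
import torch

class ConvMultiHeadAttention(torch.nn.Module):
    """Sketch of convolution-based head expansion (an assumption about
    the design): one shared QKV projection, then a depthwise Conv1d
    expands each tensor from 1 head into n_heads."""
    def __init__(self, d_model, n_heads, kernel_size=3):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = torch.nn.Linear(d_model, 3 * d_model)
        # depthwise conv: each feature channel spawns n_heads variants
        self.expand = torch.nn.Conv1d(d_model, n_heads * d_model,
                                      kernel_size, padding=kernel_size // 2,
                                      groups=d_model)
        self.out = torch.nn.Linear(n_heads * d_model, d_model)

    def forward(self, x):                        # x: [B, T, d_model]
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def heads(z):                            # [B, T, d] -> [B, H, T, d]
            z = self.expand(z.transpose(1, 2))   # [B, H*d, T]
            return z.view(b, d, self.n_heads, t).permute(0, 2, 3, 1)

        q, k, v = heads(q), heads(k), heads(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        z = (attn @ v).permute(0, 2, 1, 3).reshape(b, t, -1)
        return self.out(z)

layer = ConvMultiHeadAttention(d_model=64, n_heads=4)
y = layer(torch.randn(2, 100, 64))               # [2, 100, 64]
```

In this sketch the expansion conv is shared across Q, K, and V as a further simplification; the intended point is only that head count and projection size are decoupled.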
The fourth significant contribution is the development of an adaptive attention convolutional neural network (SENSORNET) that learns both the inter- and intra-sensor relationships. SENSORNET addresses the poor portability of a pre-trained neural network to a new pervasive application where sensor data are limited. To solve this problem, the model integrates the flexibility of self-attention with the multi-scale feature locality of convolution. Moreover, a patch-wise self-attention with stacked multi-heads is introduced to enrich the sensor feature representation. SENSORNET generalizes to pervasive applications with any number of sensor inputs and is much smaller than the state-of-the-art self-attention and convolution hybrid baseline while achieving similar performance.
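A rough sketch of this recipe, with all layer sizes, kernel widths, patch lengths, and head counts chosen arbitrarily for illustration, could combine a multi-scale convolutional stem (which accepts any number of sensor channels) with patch-wise self-attention as follows:

```python
import torch

class PatchSelfAttentionBlock(torch.nn.Module):
    """Rough sketch of the recipe described above, not the published
    SENSORNET: a multi-scale convolutional stem for local features,
    followed by self-attention over temporal patches."""
    def __init__(self, n_sensors, d_model=64, patch_len=16, n_heads=4):
        super().__init__()
        # two kernel sizes give multi-scale feature locality
        self.conv_s = torch.nn.Conv1d(n_sensors, d_model // 2, 3, padding=1)
        self.conv_l = torch.nn.Conv1d(n_sensors, d_model // 2, 7, padding=3)
        self.patch_len = patch_len
        self.patch_proj = torch.nn.Linear(patch_len * d_model, d_model)
        self.attn = torch.nn.MultiheadAttention(d_model, n_heads,
                                                batch_first=True)

    def forward(self, x):                        # x: [B, n_sensors, T]
        h = torch.cat([self.conv_s(x), self.conv_l(x)], dim=1)  # [B, d, T]
        b, d, t = h.shape
        p = self.patch_len
        h = h[..., : t - t % p]                  # drop the ragged tail
        patches = (h.view(b, d, -1, p)           # [B, d, n_patches, p]
                    .permute(0, 2, 1, 3)
                    .reshape(b, -1, d * p))
        tokens = self.patch_proj(patches)        # [B, n_patches, d]
        out, _ = self.attn(tokens, tokens, tokens)
        return out                               # attended patch embeddings

block = PatchSelfAttentionBlock(n_sensors=6)
emb = block(torch.randn(2, 6, 128))              # [2, 8, 64]
```

Because only the stem depends on the channel count, swapping in a different number of sensors requires changing a single argument, which is one way the "any number of sensor inputs" property can be realized.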
In summary, the thesis makes four noteworthy contributions. First, it introduces a multiscale feature engineering method in a pilot study that improves sensor-based feature representation, together with a real-world case study highlighting the differences between essential features in biological and physiological signals. Second, it discusses existing human behavior recognition methods based on deep neural networks and proposes PearNet, a Pearson correlation-based graph attention neural network that models intra-sensor spatial-temporal relationships. Third, to enhance the efficiency of intra-sensor relationship learning, it develops a convolution-based multi-head attention. Finally, it presents SENSORNET, an adaptive attention convolutional neural network that addresses the portability issue of pre-trained networks in new pervasive applications with limited sensor data. SENSORNET integrates self-attention with the multi-scale feature locality of convolution, is smaller than the state-of-the-art hybrid models with similar performance, and generalizes to pervasive applications with any number of sensor inputs.