A position and orientation-aware one-shot learning framework for medical action recognition from signal data
In this work, we propose a position- and orientation-aware one-shot learning framework for medical action recognition from signal data. The proposed framework comprises two stages; each stage includes a signal-level image generation (SIG) module, a cross-attention (CsA) module, and a dynamic time warping (DTW) module, together with information fusion between the proposed privacy-preserved position and orientation features. The SIG method transforms the raw skeleton data into privacy-preserved features for training. The CsA module guides the network to reduce medical action recognition bias and to focus more on the body parts that matter for each specific action, thereby addressing issues caused by similar medical actions. The DTW module is employed to minimize temporal mismatch between instances and further improve model performance. Furthermore, the proposed privacy-preserved orientation-level features are utilized to assist the position-level features in both stages to enhance medical action recognition performance. Extensive experimental results on the widely used NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets demonstrate the effectiveness of the proposed method, which outperforms the other state-of-the-art methods under the general dataset partitioning by 2.7%, 6.2%, and 4.1%, respectively.
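As a rough, hypothetical illustration of what position-level and orientation-level features derived from raw skeleton sequences might look like, the sketch below computes root-centred joint positions and unit bone-direction vectors. The array shapes, the bone list, and the normalization choices are assumptions for illustration only, not the paper's SIG implementation.

```python
# Hypothetical sketch: position/orientation features from a skeleton sequence
# of shape (T, J, 3): T frames, J joints, 3D coordinates.
import numpy as np

# Example parent->child bone pairs (an illustrative subset of an NTU-style 25-joint skeleton).
BONES = [(0, 1), (1, 20), (20, 2), (2, 3), (20, 4), (4, 5), (5, 6)]

def position_features(skeleton: np.ndarray, root_joint: int = 0) -> np.ndarray:
    """Root-centred joint positions, shape (T, J, 3)."""
    return skeleton - skeleton[:, root_joint:root_joint + 1, :]

def orientation_features(skeleton: np.ndarray) -> np.ndarray:
    """Unit bone-direction vectors, shape (T, len(BONES), 3)."""
    parents = skeleton[:, [p for p, _ in BONES], :]
    children = skeleton[:, [c for _, c in BONES], :]
    bones = children - parents
    norms = np.linalg.norm(bones, axis=-1, keepdims=True)
    return bones / np.clip(norms, 1e-8, None)

# Usage with a random stand-in sequence (60 frames, 25 joints):
seq = np.random.rand(60, 25, 3).astype(np.float32)
pos = position_features(seq)        # (60, 25, 3)
ori = orientation_features(seq)     # (60, 7, 3)
```

Bone directions discard absolute body location and scale, which is one simple way such orientation-level descriptors can reduce identity-revealing cues while keeping motion information.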
In the proposed POA-OSL, the position and orientation features are first extracted from the raw skeleton sequences to better represent human actions. The SIG module is then applied to suppress privacy-sensitive information. After that, the CsA and DTW modules are used to address the similar-action and temporal-mismatch issues, respectively. Subsequently, the orientation feature-assisted training method, which consists of multi-level fusion, is introduced. Finally, a ProtoNet-based model is adopted to obtain the experimental results.
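To make the matching stage concrete, the following minimal sketch pairs a classic DTW alignment cost with a ProtoNet-style nearest-exemplar decision over per-frame embeddings. The encoder is assumed to exist upstream, and the plain Euclidean frame distance and function names are illustrative assumptions, not the paper's exact CsA/DTW implementation.

```python
# Minimal sketch: DTW cost between a query embedding sequence and each
# one-shot class exemplar, used as the distance for a nearest-prototype decision.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW over two embedding sequences a (Ta, D) and b (Tb, D)."""
    ta, tb = len(a), len(b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])           # per-frame distance
            cost[i, j] = d + min(cost[i - 1, j],               # insertion
                                 cost[i, j - 1],               # deletion
                                 cost[i - 1, j - 1])           # match
    return float(cost[ta, tb])

def one_shot_predict(query: np.ndarray, exemplars: dict) -> str:
    """Assign the query to the class whose single exemplar has minimal DTW cost."""
    return min(exemplars, key=lambda c: dtw_distance(query, exemplars[c]))

# Usage: each novel class contributes one exemplar embedding sequence.
rng = np.random.default_rng(0)
support = {"cough": rng.normal(size=(40, 64)), "falling": rng.normal(size=(55, 64))}
query_seq = rng.normal(size=(48, 64))
print(one_shot_predict(query_seq, support))
```

Aligning sequences before comparison is what lets instances of the same action be matched even when they are performed at different speeds, which is the temporal-mismatch problem the DTW module targets.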
In this section, we conduct experiments on three widely used public datasets for action recognition: NTU-60, NTU-120, and PKU-MMD. Quantitative results are presented and compared with other state-of-the-art (SOTA) one-shot learning methods for human action recognition. Moreover, we design experiments for analyzing specific medical actions and visualize the results. Furthermore, we carry out ablation studies to demonstrate the effectiveness of the transformed features and the proposed CsA and DTW modules. Finally, experiments with different parameter settings are presented.
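For readers unfamiliar with the one-shot protocol on these benchmarks, the short sketch below shows how accuracy is typically computed when each novel class provides a single exemplar embedding. The cosine-similarity measure and the array shapes are assumptions for illustration, not the evaluation code used in this work.

```python
# Hypothetical sketch of one-shot accuracy: one exemplar embedding per novel
# class, each query assigned to its nearest exemplar by cosine similarity.
import numpy as np

def one_shot_accuracy(exemplars: np.ndarray, queries: np.ndarray,
                      query_labels: np.ndarray) -> float:
    """exemplars: (C, D), one per class; queries: (N, D); query_labels: (N,)."""
    ex = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    qu = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    predictions = np.argmax(qu @ ex.T, axis=1)   # nearest exemplar per query
    return float(np.mean(predictions == query_labels))

# Usage with random stand-in embeddings for 20 novel classes:
rng = np.random.default_rng(1)
acc = one_shot_accuracy(rng.normal(size=(20, 128)),
                        rng.normal(size=(200, 128)),
                        rng.integers(0, 20, size=200))
print(f"one-shot accuracy: {acc:.3f}")
```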
As shown in Table V, cough and chest pain are the most difficult medical actions to recognize, because the movements of these two actions are subtle in both the spatial and temporal dimensions. In contrast, staggering and falling achieve the two highest accuracies on the NTU-120 dataset, at 95.7% and 99.5%, respectively. It can also be observed that the performance on headache from PKU-MMD is markedly improved from 50.1% to 69.7% by applying the proposed POA-OSL (MF), which further verifies that POA-OSL enhances the discriminating ability across different datasets.
@ARTICLE{10814994,
  author={Xie, Leiyu and Yang, Yuxing and Fu, Zeyu and Naqvi, Syed Mohsen},
  journal={IEEE Transactions on Multimedia},
  title={Position and Orientation Aware One-Shot Learning for Medical Action Recognition from Signal Data},
  year={2024},
  volume={},
  number={},
  pages={1-14},
  keywords={Feature extraction; Biomedical imaging; Skeleton; One shot learning; Training; Human activity recognition; Privacy; Data models; Protection; Data privacy; One-shot learning; medical action recognition; attention mechanism; feature fusion; healthcare},
  doi={10.1109/TMM.2024.3521703}
}