4.1. ECG Data Acquisition and Analysis
The ECG data acquisition process, signal noise removal, analysis, and assessment of the proposed AF detection algorithms used in the developed embedded system are described in the following paragraphs. A typical ECG signal consists of several types of waves, one complex, and a number of intervals and segments, as shown in Figure 12.
P Wave: Represents atrial depolarization.
Q Wave: A slight downward deflection after the P wave.
R Wave: A large upward deflection following the Q wave.
S Wave: A slight downward deflection following the R wave.
T Wave: Represents ventricular repolarization.
U Wave: Sometimes present but usually not seen due to its low peak value.
The intervals of these waves are used to diagnose several heart diseases [25].
Interval: Represents the time between two specific ECG waves. The intervals commonly measured on an ECG are the PR interval, QT interval, and RR interval.
Segment: Represents the length between two specific points on an ECG signal that is supposed to be at the baseline amplitude. The segments on an ECG signal include the PR segment and the ST segment.
Complex: The only complex on an ECG signal is the QRS complex.
We used the AD8232 single-lead ECG sensor, which extracts, amplifies, and filters small bioelectric signals from the human heart. With the three electrodes placed at the typical positions on the right arm (RA), left arm (LA), and right leg (RL), we obtain a single-lead ECG waveform. The waveform recorded in our experiment contains several key components (PR interval, QRS complex, and QT interval) that represent different electrical activities of the heart, as illustrated in Figure 13.
RR interval: Indicates the time interval between two adjacent R waves. In arrhythmias, these intervals may become irregular.
PR interval: This measures the time from the beginning of the P wave to the beginning of the QRS complex.
QRS complex: Represents ventricular depolarization and consists of three essential waves, i.e., the Q wave, R wave, and S wave. By analyzing the QRS complex, certain conditions, such as drug toxicity and electrolyte imbalance, are likely to be detected.
QT interval: This represents the time between the beginning of the Q wave and the end of the T wave, which is related to ventricular depolarization and repolarization [25]. If the QT interval exceeds the normal value, there is an increased risk of ventricular fibrillation.
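Since irregular RR intervals are the primary marker of AF, the RR-interval definition above can be made concrete with a short sketch. This is an illustrative example rather than the code used in this work; the 99 Hz sampling rate matches the acquisition setup reported in this section, while the peak-height threshold and refractory distance are assumed values.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 99  # sampling rate in Hz, as reported for the acquisition setup


def rr_intervals(ecg, fs=FS):
    """Detect R peaks and return the RR intervals in seconds."""
    # R waves are the tallest deflections; require a minimum height and a
    # refractory distance of ~0.4 s (i.e., at most ~150 bpm) between peaks.
    peaks, _ = find_peaks(ecg, height=np.max(ecg) * 0.6, distance=int(0.4 * fs))
    return np.diff(peaks) / fs


# Synthetic example: narrow Gaussian "R waves" every 0.8 s (75 bpm).
t = np.arange(0, 10, 1 / FS)
ecg = sum(np.exp(-((t - r) ** 2) / (2 * 0.01**2)) for r in np.arange(0.5, 10, 0.8))
rr = rr_intervals(np.asarray(ecg))
```

On a regular rhythm such as this synthetic trace, the RR intervals cluster tightly around 0.8 s; in AF they would scatter irregularly.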
We collected the raw ECG signal values from the Arduino IDE serial monitor for 20.21 s at a sampling rate of 99 Hz, yielding a total of 2001 raw ADC values.
Figure 14 illustrates the raw and filtered ECG signals, with the horizontal axis representing time and the vertical axis representing voltage. The upper panel shows the raw ECG signal recorded over approximately 20.21 s, highlighting various peaks of the QRS complex, noise, and baseline wandering. The lower panel displays the filtered ECG signal, which was processed using a band-pass filter. This filter effectively removes high-frequency noise and low-frequency baseline drifts, resulting in a cleaner signal that retains the essential characteristics of the ECG waveform, including the P-waves, QRS complexes, and T-waves. Additionally, the previously created time vector can be used to convert the sample numbers into actual time in seconds.
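The band-pass step shown in Figure 14 can be sketched as follows. The exact cutoff frequencies are not stated in the text, so the common ECG passband of 0.5–40 Hz is assumed here, with a zero-phase Butterworth filter standing in for whichever implementation was actually used:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 99  # sampling rate (Hz) reported in the text


def bandpass(signal, low=0.5, high=40.0, fs=FS, order=2):
    """Zero-phase Butterworth band-pass; cutoffs are assumed, not from the paper."""
    nyq = fs / 2
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    # filtfilt runs the filter forward and backward, so no phase distortion
    return filtfilt(b, a, signal)


# Demo: a 10 Hz "QRS-like" component plus a slow 0.1 Hz baseline drift.
t = np.arange(0, 20.21, 1 / FS)
raw = np.sin(2 * np.pi * 10 * t) + 2.0 * np.sin(2 * np.pi * 0.1 * t)
clean = bandpass(raw)
```

The 0.1 Hz drift lies well below the 0.5 Hz lower cutoff and is suppressed, while the in-band 10 Hz component passes essentially unchanged, mirroring the raw-versus-filtered panels in Figure 14.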
The hardware sample was initially collected as a CSV file and then prepared for input into the trained model. Specifically, the CSV was converted into a 3D array of series values. This input was subsequently passed through the preprocessing pipeline to be prepared for the applied model. The Raspberry Pi 4B receives ECG data from the ESP8266 via a serial connection and processes the signals to execute the preloaded multimodal deep learning model (VGG16 + distilled CNN-BiLSTM) for atrial fibrillation (AF) diagnosis. This model, developed using Python 3.12 with the Keras library and TensorFlow as the backend, analyzes the data and provides diagnostic outcomes. The Raspberry Pi displays the results upon the completion of the testing phase. The latency of detection for the proposed hardware device, comprising a Raspberry Pi 4B, an ESP8266 microcontroller, and an AD8232 ECG sensor, is approximately 0.85 s.
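The CSV-to-3D-array conversion described above might look like the following minimal sketch. The function name, the normalization step, and the (batch, timesteps, channels) layout are assumptions chosen to match a typical Keras sequence-model pipeline, not the exact code deployed on the Raspberry Pi:

```python
import csv
import io

import numpy as np


def load_ecg_csv(text, n_samples=2001):
    """Parse raw ADC values from CSV text into a (1, n_samples, 1) float array,
    the 3D (batch, timesteps, channels) layout a Keras sequence model expects.
    Zero-mean / unit-variance normalization is an assumed preprocessing step."""
    values = [float(row[0]) for row in csv.reader(io.StringIO(text)) if row]
    x = np.asarray(values[:n_samples], dtype=np.float32)
    x = (x - x.mean()) / (x.std() + 1e-8)
    return x.reshape(1, -1, 1)


# Example with a tiny fake recording (a real file would hold 2001 ADC values).
demo = "\n".join(str(512 + i % 5) for i in range(10))
batch = load_ecg_csv(demo, n_samples=10)
```

The resulting array can then be passed directly to `model.predict(batch)` on the Raspberry Pi.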
4.2. Model Evaluation on ECG Dataset
The three trained models were thoroughly evaluated on the held-out test portion of the ECG dataset, which comprised 20% of the entire dataset.
Table 2 presents the accuracy, precision, recall, and F1 score of these models after 80 epochs of training, which gives a quantitative measure of the efficiency of the proposed models. The table compares the performance metrics of three applied models, LSTM, CNN-LSTM, and CNN-BiLSTM, on ECG data, showing that the CNN-BiLSTM model achieves the highest accuracy (92.73%) and F1 score (0.93). In contrast, the LSTM model performs the least effectively, with an accuracy of 71.24% and an F1 score of 0.71.
Table 3 summarizes the performance metrics of three models, i.e., ViT, ResNet32, and VGG16, on ECG image data, with VGG16 achieving the highest accuracy (89.97%) and F1 score (0.90). Despite having significantly fewer parameters (165.5k), VGG16 outperforms the other models, with ResNet32 showing slightly lower performance and the ViT model having the lowest accuracy (83.57%) and F1 score (0.82).
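For reference, the accuracy, precision, recall, and F1 score reported in Tables 2 and 3 follow the standard confusion-matrix definitions, which can be computed directly; the toy labels below are illustrative only:

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = AF)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1


# Toy example with 8 test samples.
acc, prec, rec, f1 = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                    [1, 1, 0, 0, 0, 1, 1, 0])
```

In practice these values would be computed over the 20% test split described above.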
4.6. Model’s Detection Interpretation Using XAI
An advanced explainable AI method, Local Interpretable Model-Agnostic Explanations (LIME), has been employed to gain deeper insight into the model’s focus and decision-making process. LIME facilitates the interpretation of individual predictions by determining whether our VGG16 model focuses on the relevant areas of the ECG images. It treats the model as a black box and only requires access to its predictions while creating easily interpretable visualizations such as heatmaps.
Figure 16 shows the heatmap generated by LIME for the VGG16 model, where the yellow regions highlight the portions of the ECG signal that the model considered most significant when making its prediction. The heatmap confirms that the VGG16 model focused on the relevant areas, indicating that its internal representations were functioning effectively during training. We also assessed model performance from multiple perspectives: by analyzing accuracy, precision, recall, and the confusion matrix, we gained insight into how well each model identified true positives, performed overall classification, and generalized to unseen data, ultimately guiding model selection for this task.
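To illustrate the mechanism behind LIME without the lime library itself, the following numpy-only sketch reproduces its core perturbation idea on a 1-D signal: segments are randomly switched off, the black-box model is queried on each perturbed copy, and a linear surrogate is fitted whose weights rank segment importance. The segment count, sample count, and toy model here are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)


def lime_style_weights(signal, model, n_segments=10, n_samples=500):
    """Model-agnostic importance per segment, in the spirit of LIME:
    randomly mask segments, query the black box, fit a linear surrogate."""
    seg_len = len(signal) // n_segments
    masks = rng.integers(0, 2, size=(n_samples, n_segments))  # 1 = keep segment
    scores = np.empty(n_samples)
    for i, m in enumerate(masks):
        perturbed = signal.copy()
        for s in range(n_segments):
            if not m[s]:
                perturbed[s * seg_len:(s + 1) * seg_len] = 0.0  # "switch off"
        scores[i] = model(perturbed)
    # Least-squares linear surrogate: score ~ masks @ w + b
    X = np.column_stack([masks, np.ones(n_samples)])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w[:-1]  # per-segment weights; a large weight marks an important region


# Toy black box that only looks at segment 3 (e.g., where a QRS complex sits).
sig = rng.normal(size=100)
black_box = lambda x: float(np.abs(x[30:40]).sum())
weights = lime_style_weights(sig, black_box)
```

The surrogate assigns its largest weight to the segment the black box actually depends on, which is exactly the behavior the yellow regions in Figure 16 visualize for the VGG16 model.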
Table 6 compares various atrial fibrillation (AF) detection studies, highlighting the best-performing deep learning model, datasets, accuracy, F1 scores, and hardware devices. The proposed work achieved a 94.07% accuracy and an F1 score of 0.94 using a multimodal approach (CNN-BiLSTM + VGG16) on the PTB-XL ECG dataset, implemented on an embedded device built around an ESP8266 and a Raspberry Pi 4B. In contrast, other works have achieved higher accuracy using different models and datasets, such as the Multimodal Feature Fusion Framework in [9], with a 99.70% accuracy on the MIT-BIH arrhythmia dataset. This study is the first to use only three ECG leads to identify atrial fibrillation, outperforming previous research that utilized the PTB-XL dataset. Furthermore, we investigated a multimodal approach, knowledge distillation for edge devices, and an embedded system for AF detection from ECG.