https://itc.ktu.lt/index.php/ITC/issue/feed Information Technology and Control 2024-09-25T08:13:44+03:00 Prof. Rytis Maskeliūnas [email protected] Open Journal Systems <p>Periodical journal <em>Information Technology and Control / Informacinės technologijos ir valdymas</em> covers a wide field of computer science and control systems related problems. All articles should be prepared considering the requirements of the journal. Please use <a style="font-size: normal; text-decoration: underline;" href="https://itc.ktu.lt/public/journals/13/Guidelines for Preparing a Paper for Information Technology and Control (5).doc.rtf">„Article Template“</a> to prepare your paper properly. Together with your article, please submit a signed <a href="https://itc.ktu.lt/public/journals/13/info/Authors_Guarantee_Form_ITC.DOCX">Author's Guarantee Form</a>.</p> https://itc.ktu.lt/index.php/ITC/article/view/35112 BiLSTM-Attention-CNN Model Based on ISSA Optimization for Cyberbullying Detection in Chinese Text 2023-11-24T07:01:15+02:00 Wenting Fan [email protected] <p>Cyberbullying has become increasingly common due to the extent and anonymity afforded to users by online social media, and poses a significant risk to the physical and mental health of people. In this study, we propose an ISSA-based model to detect cyberbullying in Chinese text (ISSA-BiLSTM-Attention-CNN) that can determine whether a given comment reflects cyberbullying. The model contains an attention mechanism and the improved sparrow search algorithm (ISSA) for optimization that enables it to focus on important textual information and make full use of the optimal hyperparameters. Before applying the CNN to collect and learn a sufficient number of local features, the model initially uses the bidirectional LSTM (BiLSTM) to concatenate the results of forward and backward processing of the given text. 
The experimental results show that the proposed method outperforms baseline methods, with an accuracy of 90.2% and an F-measure of 89.9%.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35588 AMF-SparseInst: Attention-guided Multi-Scale Feature fusion network based on SparseInst 2024-04-09T23:55:44+03:00 Yiyi Chen [email protected] Liang Wan [email protected] Shusheng Li [email protected] Liang Liao [email protected] <p><em>SparseInst</em> generates redundant background features during multi-scale feature fusion, which causes feature loss for small, low-resolution objects whose pixels resemble the background. To address this issue, we propose a real-time instance segmentation model named AMF-SparseInst (Attention-guided Multi-Scale Feature SparseInst), which effectively highlights the most critical features of small objects against cluttered backgrounds. Firstly, we design a pyramid pooling module (called SimAM-ASPP), which consists of depthwise separable convolutions with three different expansion rates and a 3D attention mechanism (called SimAM). It captures contextual information from different receptive fields and focuses on small object features. Secondly, we design the Lite-BiFPN module to associate and integrate different levels of semantic information from top to bottom and from bottom to top. Finally, we propose a feature enhancement module (FEM), which uses N3 and N5 respectively to reweight fusion features in the spatial and channel dimensions to enhance the effective information of multi-scale fusion features. Experimental results demonstrate the superiority of AMF-SparseInst over the benchmark on COCO 2017 test-dev. Specifically, AMF-SparseInst improves overall segmentation accuracy by 3.6% while increasing speed by 2.5 FPS. 
Moreover, it achieves a favorable balance between accuracy and speed on the Cityscapes validation set.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36121 Six-Degree-of-Freedom Pose Estimation of Class-Level Objects Based on P2T-Net 2024-04-26T08:56:44+03:00 Guanjin Li [email protected] <p>6D pose estimation of objects is widely used in augmented reality, robot operation, and unmanned driving. Due to the complexity and variability of real application scenarios, the task must cope with interference such as changes in lighting and distance, sensor noise, and mutual occlusion among cluttered objects. Implementing methods that combine low hardware cost with high accuracy and low time cost remains a challenging problem. Recognizing the class of an object, determining its region in the image, and estimating its 6D pose are likewise still challenging. In this paper, we propose a conceptually simple and data-efficient category-level 6-degree-of-freedom pose estimation network that uses the Pyramid Pooling Transformer as the foundation network to enhance accuracy in image classification, semantic segmentation, object detection, and instance segmentation for applications with low hardware cost. In the cross-modal fusion phase, the implicit Deep recovery technique is used to improve the RGB-D feature representation capability, and the compact pyramid refinement operation efficiently fuses multiple layers of features at high speed with few parameters. Compared with traditional methods, the proposed method is more resistant to occlusion: MAP at 10° 2cm and 10° 5cm reaches 81.4% and 87.1%, and MAP at 5° 2cm and 5° 5cm reaches 69.2% and 72.9%, ahead of NOCS and SPD in comparison tests on the public CAMERA and REAL datasets. 
Its advantages are especially clear when large-scale hardware and large datasets are not feasible.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35741 Design Path of Robot Motion Control System Based on Graphic Element Information 2024-03-12T07:58:45+02:00 Zhongshi Huang [email protected] Binbin Liang [email protected] <p>As the core of the numerical control system, the motion control system of robots determines the machining efficiency and quality of the numerical control machine tool. This study aims to improve the stability of robot motion control systems, thereby further enhancing the efficiency and economic benefits of numerical control machine tools. To this end, a robot motion control system based on graphic element information and the machining path is designed. Combining the simulated annealing algorithm with the S-type acceleration and deceleration control algorithm makes the robot motion control system more accurate and efficient. The results show that, compared with the traditional motion control system, the improved system reduced the empty stroke length by more than 50%. The S-type acceleration and deceleration control algorithm effectively improved the stability of the swing arm and reduced the contour error. While maintaining cutting accuracy, the improved method could improve the smoothness of the part, reduce burrs, and make the part more suitable. 
The proposed method can effectively improve the performance of robot motion control systems and meet the requirements for product quality in practical applications.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36432 A Semi-supervised Generative Adversarial Network Algorithm for Alzheimer's Disease Analysis 2024-04-02T17:19:08+03:00 Jian Yan [email protected] Renzhou Gui [email protected] Hao Liang [email protected] <p>Currently, the study of Alzheimer's disease (AD) imaging classification based on deep learning has become a research hotspot. However, because AD samples are few and often unlabeled, the classification task is difficult. In this paper, a semi-supervised generative adversarial network algorithm is designed. Firstly, an improved generative network algorithm is designed to extract and inherit features related to AD, while ignoring variations unrelated to the disease, to generate new samples, thereby expanding the sample size and augmenting the data. Then, an unsupervised clustering algorithm is constructed to generate sample clustering categories, so that the new samples carry labels for different types of AD brain atrophy. The test results show that the algorithm achieves good and stable clustering on the real sample test dataset (ADNI-1) and identifies four types of AD brain atrophy patterns. The Calinski-Harabasz Index of the algorithm is about 2388, and the Silhouette Coefficient is about 0.588. By these cluster indexes, the algorithm has better clustering performance than traditional clustering methods such as k-means. 
These results will contribute to further study of the classification of AD and to the analysis and diagnosis of its etiology.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36021 Agricultural UAV Obstacle Avoidance System Based on a Depth Image Inverse Projection Algorithm and B-Spline Curve Trajectory Optimization Algorithm 2024-04-17T08:21:48+03:00 Zhanlin Liu [email protected] Sheng Wen [email protected] Gaofeng Huang [email protected] Sipei Li [email protected] Zhiyin Deng [email protected] <p>As multirotor plant protection drones evolve, using them for pest control has emerged as a mainstream trend. Nevertheless, lemon orchards often feature trees planted in an unstructured manner, with irregularly distributed obstacles such as utility poles posing safety threats to the operational tasks of unmanned aerial vehicles (UAVs) used for plant protection. To address such issues, this study proposes a novel UAV obstacle avoidance system designed for lemon orchards. The system has two components: a sensing and mapping subsystem using a depth image inverse projection algorithm, and a path planning subsystem that utilizes B-spline curve trajectory optimization. The system comprises a hardware description and software integration of the UAV, a map construction algorithm to sense obstacles in front of the UAV, and a path planning algorithm for obstacle avoidance. Two experimental scenarios were developed to evaluate the system’s flight performance: a flight test using the Gazebo simulation platform and a real-world test in a lemon orchard. In the simulation results, the flight trajectory’s average deviation from the original path was 2.77 m, and the maximum yaw angular velocity reached 1.001 rad/s. In the real-flight experiments, the flight trajectory’s average deviation from the original path was 2.90 m, and the maximum yaw angular velocity reached 1.545 rad/s. 
Both the simulation and real-flight experiments demonstrate that the system is effective in avoiding obstacles and planning paths in lemon orchards, providing a safe, smooth, and stable flight trajectory that meets the operational safety requirements for plant protection UAVs.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35101 Detecting the Medical Plant Association from PubMed Using Hypergraph-based Clustering with Dominating Set 2023-09-15T14:12:59+03:00 Pradeepa Sampath [email protected] Elizabeth Jomy [email protected] Ramya Kalyanaraman [email protected] Vimal Shamuganathan [email protected] Ruben Gonzalez Crespo [email protected] Prasun Chakrabarti [email protected] <p>Medicinal plants provide immunity against diseases and can also be taken as a precaution against them. It is pivotal to know the benefits of these plants against various ailments. Identifying these plants’ essential properties can have a great impact on medicinal research and practice. This research focuses on identifying the cardinal properties of five plants, namely Aloe Vera, Fennel, Fenugreek, Mint, and Tulsi, using text analytic features and NLP functions. Text data on medicinal plants are extracted from the biomedical literature dataset. Text mining is used to extract the implicit relations between medicinal plants and their biomedical properties. The intricate relationship between the keywords and the medicinal plants is captured using hypergraph clustering and dominating sets. The correlation between the keywords and the plants is visualized by clustering. With an emphasis on their potential in preventative and medical care, this model lists the common characteristics and health advantages of medicinal plants. 
Strong clustering is indicated by the modularity score of 0.577, with five separate communities each reflecting a unique set of features. In order to facilitate future studies, these findings offer a methodical and data-driven viewpoint.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36602 A Hybrid Summarization Model for Legal Judgment Document Based on Domain Knowledge 2024-05-08T07:13:19+03:00 Yumei Song [email protected] Ruizhang Huang [email protected] Yanping Chen [email protected] Chuan Lin [email protected] Shuai Yu [email protected] Ruixue Tang [email protected] Yongbin Qin [email protected] <p>Legal judgment document summarization, as a task specific to the legal domain, involves automatically generating a fluent, informative, and well-organized summary from the original legal judgment document. Unlike traditional text summarization tasks, this domain-specific task places higher demands on content accuracy and completeness in the summary, while also requiring the preservation of the professional expression found in the original text. Consequently, conventional summarization methods often struggle to perform effectively in the legal domain. In response to this challenge, this paper introduces a hybrid summarization model tailored for legal judgment documents. Our model harnesses the strengths of both extractive and abstractive summarization methods, incorporating domain knowledge to enhance the summary generation process. We conduct extensive experiments to verify the effectiveness of our proposed method and compare the results with a baseline using ROUGE evaluation metrics. 
The experimental findings show that our model provides more accurate and readable summaries than traditional methods.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35805 Cross-supervised Crowd Counting via Multi-scale Channel Attention 2023-12-06T14:36:11+02:00 Kexin Yan [email protected] Fangjun Luan [email protected] Shuai Yuan [email protected] Guoqi Liu [email protected] <p>Due to large-scale variability in crowd images and the overlap and occlusion of people in high-density regions, traditional CNNs with fixed-size convolution kernels or transformers lacking 2D locality and channel adaptation struggle to cope. While Transformers have a global receptive field for long sequence tasks, CNNs exhibit better generalization and 2D locality. In order to combine the advantages of both approaches, this paper proposes a dual-branch multi-scale attention network (DBMSA-Net). First, we propose a multi-scale channel attention convolution module to extract features at different scales while enhancing channel adaptation. Furthermore, local features are augmented using a feed-forward neural network that is more suitable for visual tasks. Then an efficient lightweight multi-scale regression head is employed to predict density maps. Finally, progressive cross-head supervision is introduced as a loss function to dynamically supervise instance label noise and mitigate its effect. 
Extensive experiments are conducted on three crowd counting datasets (ShanghaiTech Part A, ShanghaiTech Part B, UCF‐QNRF) to validate the effectiveness of the proposed method, and the results show that DBMSA-Net outperforms state-of-the-art methods.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/34856 Effect Analysis of Malicious Flow Classification Model Based on Representation Learning on Network Flow Anomaly Detection 2024-04-13T17:50:09+03:00 Yan Hu [email protected] Xiaole Duan [email protected] Yuan Chen [email protected] Zhu Zhao [email protected] <p>Network traffic anomaly detection, as a key link of network security, has received increasing attention in recent years. Aiming at abnormal flow caused by improper network usage, this paper proposes a network flow anomaly detection model using representation learning. In this model, raw flow data are treated directly as images through representation learning, and malicious flow is then classified by performing image classification tasks. The model is tested on the USTC-TFC2016 dataset. The experimental results show that the model achieves a classification accuracy of 0.9990 in the characterization of both flow sessions and total flow, and PR and F1 values are all above 0.9907. In addition, the classification accuracy of the three classifiers for flow data is more than 98%, and the classification accuracy of normal flow and malicious flow is 100%. The experimental results show that the proposed method meets the needs of practical applications and has excellent classification performance. 
This provides a new research angle and direction for network flow anomaly detection.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35704 Multi-Scale Temporal Convolutional Networks and Multi-Head Attention for Robust Log Anomaly Detection 2023-11-23T11:41:50+02:00 Zhigang Zhang [email protected] Wei Li [email protected] Yizhe Wang [email protected] Zhe Wang [email protected] Xiang Sheng [email protected] Tianxiang Zhou [email protected] <p>System logs are instrumental in understanding computer system behavior and ensuring system stability and reliability, making anomaly detection in system logs crucial. However, with the increasing scale and complexity of modern software systems, log data is growing exponentially, rendering traditional manual log inspection methods inefficient. Moreover, the evolution of log messages over time results in a lower accuracy rate for anomaly detection. To address these issues, this paper proposes a log anomaly detection method based on multi-scale temporal convolution networks and multi-head attention. This method utilizes temporal convolution networks to extract temporal information from log data and extracts hidden features of logs through different receptive fields of multi-scale convolution kernels. By integrating the multi-head attention mechanism, the sequential dependencies of logs can be better captured. We conducted repeated experiments on the authoritative public HDFS and BGL log datasets to evaluate their detection accuracy and robustness. 
The experiments demonstrate that MTCNLog outperforms existing anomaly detection methods and is robust to the continuous evolution of logs.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36943 Few-Shot Learning on Edge Devices Using CLIP: A Resource-Efficient Approach for Image Classification 2024-05-14T19:44:11+03:00 Jin Lu [email protected] <p>In the field of deep learning, traditional image classification tasks typically require extensive annotated datasets and complex model training processes, which pose significant challenges for deployment on resource-constrained edge devices. To address these challenges, this study introduces a few-shot learning method based on OpenAI's CLIP model that significantly reduces computational demands by eliminating the need to run a text encoder at the inference stage. By pre-computing the embedding centers of classification text with a small set of image-text data, our approach enables the direct use of CLIP’s image encoder and pre-calculated text embeddings for efficient image classification. This adaptation not only allows for high-precision classification tasks on edge devices with limited computing capabilities but also achieves accuracy and recall rates that closely approximate those of the pre-trained ResNet approach while using far less data. Furthermore, our method halves the memory usage compared to other large-scale visual models of similar capacity by avoiding the use of a text encoder during inference, making it particularly suitable for low-resource environments. This comparative advantage underscores the efficiency of our approach in handling few-shot image classification tasks, demonstrating both competitive accuracy and practical viability in resource-limited settings. 
The outcomes of this research not only highlight the potential of the CLIP model in few-shot learning scenarios but also pave a new path for efficient, low-resource deep learning applications in edge computing environments.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35062 A CEEMDAN-based Stacking Ensemble Learning Method for SO2 Emission Forecast in a Wet FGD Process 2023-10-26T09:40:32+03:00 Xiaoli Li [email protected] Quanbo Liu [email protected] Kang Wang [email protected] <p>There has recently been increasing attention paid to sulphur dioxide (SO<sub>2</sub>) pollution owing to its hazardous effect on both human health and the atmospheric environment. To handle this problem, the wet flue gas desulphurization (FGD) system has found wide application in SO<sub>2</sub>-emitting industries. Accurate prediction of SO<sub>2</sub> emissions in treated flue gas provides timely operating guidance for the FGD system. However, the wet FGD process is characterized by highly nonlinear dynamics and non-stationarity, which pose significant difficulties and limitations for traditional modeling methods. To address these issues, this article proposes an integrated model for SO<sub>2</sub> emission forecasting in an FGD process. The integrated model combines several techniques, including complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), stacking ensemble learning (SEL), and permutation-based entropy (PEN). CEEMDAN first decomposes the SO<sub>2</sub> emission signal; the complexity of each decomposed sub-series is then analyzed by PEN, and those with similar scores are combined; finally, a stacking-based ensemble learning model that incorporates different types of member models is developed for modeling purposes. 
The proposed method was validated and evaluated on measurements from a real FGD system in a 600 MW coal-fired unit, and experimental results illustrate the superiority of our method.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/34530 An Efficient Framework for Sensor Data Collection by UAV Based on Clustering, Two-Fold Ant Colony Optimization and Node Grouping 2023-11-22T07:22:40+02:00 Magdy Shayboub [email protected] Eman Reda [email protected] Hassan Al-Mahdi [email protected] Hamed Nassar [email protected] <p>Unmanned Aerial Vehicles (UAVs) are a promising solution for sensor data collection (DC) in large-scale areas. The challenge is to minimize the DC route, which will reduce UAV energy consumption and data latency. The novelty of this paper lies in its approach to optimizing sensor data collection by UAVs. It combines Ant Colony Optimization (ACO) and K-means algorithms to establish an initial shortest route and introduces a method for grouping sensor nodes (SNs) along the route based on the UAV's footprint, reducing data latency and energy consumption for both the UAV and the sensors. First, an initial shortest route that traverses all SNs is established based on the ACO and K-means algorithms. Second, we group the SNs along the initial route using the footprint of the UAV, so that the latter can collect the data of the group in one stop, instead of stopping at each SN. By sequencing the hovering locations, we obtain a (shorter) intermediate route. Third, we shorten this route even further by applying ACO to the set of hovering locations of the intermediate route. The solution has been implemented fully in Python. The results show that the route length gets progressively shorter with each phase. To evaluate the performance of the solution objectively, we have compared it with four state-of-the-art solutions. 
The results show vividly that the proposed solution produces a DC route 19.28% shorter than the shortest route produced by the four competitive solutions. Moreover, it demonstrates a remarkable improvement by retaining 44% of energy in most SNs, while over 99% energy depletion is observed in the five state-of-the-art competitive solutions.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/33935 Real-time Interpreter for Short Sentences in Indian Sign Language Using MediaPipe and Deep Learning 2023-05-03T16:11:43+03:00 Suguna Mariappan [email protected] Ponmalar Murugesan [email protected] Hemapriya Muthamil Selvan [email protected] <p>The expression of thoughts and feelings through communication plays a major part in human life and in building relationships with others. Most of the hearing population express their thoughts in their own or a known language through voice-oriented communication. People belonging to the deaf-mute community use hand gestures and facial expressions for communication, which is called sign language. It is difficult to build a conversation between the hearing and non-hearing communities. To ease conversation between deaf-mute people and the external world and to bridge the communication gap between hearing and non-hearing people, we developed an interpreter that translates sign language to text. 
Most systems developed for the recognition of Indian Sign Language are built for alphabets and numbers. We built a model for 15 meaningful short sentences of Indian sign gestures using custom-built video datasets captured with OpenCV and keypoints of the hands, pose, and face extracted with MediaPipe. The model is trained using LSTM and achieved training and testing accuracies of 99.17% and 97.78%, respectively.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/34807 FHPE-Net: Pedestrian Intention Prediction Using Fusion with Head Pose Estimation Based on RNN 2023-12-01T15:56:27+02:00 Zhiyong Yang [email protected] Zihang Guo [email protected] Ruixiang Zhang [email protected] Jieru Guo [email protected] Yu Zhou [email protected] <p>Accurate real-time 
prediction of pedestrian crossing intent during the autonomous driving process is crucial for ensuring the safety of both pedestrians and passengers, as well as improving riding comfort. However, existing methods for pedestrian crossing intent detection mostly rely on extracting complete pose information of pedestrians, leading to reduced accuracy when pedestrians are occluded. To address this issue, this paper proposes FHPE-Net: a lightweight, multi-branch prediction model that utilizes only the head pose features of pedestrians. In pedestrian crossing scenarios, pedestrian behavior is highly influenced by surrounding vehicles and the environment. FHPE-Net encodes pedestrian head poses and global context semantic image sequences to comprehensively capture spatiotemporal interaction features between pedestrians, vehicles, and the environment, thereby enhancing the accuracy of pedestrian crossing intent prediction. To improve the robustness of the FHPE-Net method, this study further processes bounding box positions and vehicle velocity features, making it more stable and reliable in complex traffic scenarios. Finally, a novel U-BiGRUs module is introduced for feature fusion, and an optimal fusion strategy is employed to achieve the best predictive performance in terms of F1 score and accuracy (ACC). Extensive ablation experiments are conducted on the PIE dataset, and performance analysis demonstrates that FHPE-Net achieves an accuracy of 90%, outperforming baseline methods such as PCPA and Multi-RNN, while using only pedestrian head pose features. This research holds significant guidance in enhancing traffic safety and optimizing urban traffic management. 
Furthermore, it provides essential technological support for advancing the commercialization of autonomous driving.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/36925 Preclinical Computer Virtual Reduction of Fracture Surgical Robot Based on Iterative Closest Point Algorithm 2024-05-17T08:15:29+03:00 Xinxing Zhang [email protected] Jun Qian [email protected] Yunsheng Mao [email protected] Yingqi Zhang [email protected] Juncai Ye [email protected] Yi Xun [email protected] Qinghua Yang [email protected] <p>Reduction is a crucial stage in the surgical treatment of bone fractures. Detailed fracture information can be obtained from computed tomography (CT) scans of the patient before surgery, enabling physicians to plan the reduction preoperatively, shorten the operation time, and thus increase the probability of a satisfactory result. The primary purpose of this paper is to design a computer-aided automatic registration method for fracture point cloud data, so as to simplify the fracture reduction process. In this paper, we propose an integrated fracture reduction system. The system enables direct semi-automatic processing from CT images to fracture reduction. First, a 3D fracture model is reconstructed from CT images using the modified Marching Cube (MC) algorithm and is discretized to generate a point cloud. Second, the K-dimensional (KD) tree algorithm is used to cluster and segment the point clouds to identify different fracture fragments. Last, by combining the Normal Distributions Transform (NDT) with a modified Iterative Closest Point (ICP) algorithm, coarse alignment and fine registration of the point clouds are achieved step by step. This method has been successfully applied to the reduction of a tibial fracture. For the tests performed, the processing time of each step, the point cloud, and the 3D model after registration are reported. 
The semi-automatic integrated system based on preoperative CT scanning is used to realize fracture reduction, which provides a feasible foundation for minimally invasive and accurate fracture reduction surgery.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35708 A Lung Image Deep Learning Detection Model Based on Cross Residual Attention and Multi-feature Fusion 2024-02-01T13:16:32+02:00 Haosong Gou [email protected] Fanjie Zhao [email protected] Mingwei Tang [email protected] Gaoyi Zhang [email protected] Mingfeng Zhao [email protected] <p>Deep learning has become one of the most active topics in medical image processing. Currently, medical image research and applications suffer from two problems: a lack of data sets and an imbalance of classification categories. To solve these problems, we propose a method of residual attention and multi-feature fusion for lung image detection. Firstly, to integrate micro- and macro-feature extraction for medical image processing, two independent residual fusion strategies are designed, namely the Cross Residual Feature Extraction module (CRFE) and the Residual Attention Module (RAM). Secondly, a three-channel mechanism is designed for the Image Compensation Model (IFM). Using three channels and two residual fusion strategies, a multi-composite fusion architecture is produced to improve classifier performance. 
Finally, experimental results demonstrate that the proposed model outperforms recent medical image compensation methods.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/34619 Research on Intelligent Translation Method for Short Texts Based on Improved RNN Algorithm 2023-11-19T10:34:03+02:00 Yan Wang [email protected] Ying Wang [email protected] <p>As the trend towards internationalization accelerates and communication between countries and peoples becomes more important, the need for language translation becomes more urgent. Machine translation has received much attention as it requires less labor and fewer material resources than human translation. However, current machine translation is still far from being fully automated and of high quality. This study uses characters as input to the translation model and proposes a word vector generation method with an embedded CRNN, namely CRNN-embed. The model adopts a bidirectional GRU structure and introduces two attention mechanisms, CA-Cross Att and MC-SefAtt. The BLEU value of the CRNN-embed model improved by 2.57 percentage points compared to the baseline system after the attention mechanisms were introduced. The BLEU values of the study model were higher than those of both the RNN-search and RNN-embed models, by 0.43 and 0.96 percentage points in char1 and by 2.02 and 3.06 percentage points in char2, respectively. As the size of the dataset increased, the model’s BLEU values and n-word accuracy also increased, and its translations improved significantly. The accuracy and fluency of this model are higher than those of the traditional neural machine translation model. 
The study model produced better translation results than similar translation models.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control https://itc.ktu.lt/index.php/ITC/article/view/35213 Face Attribute Transfer Fusing Feature Enhancement and Structural Diversity Loss Function 2024-01-11T08:29:14+02:00 Yulin Sun [email protected] Chao Zhang [email protected] Fudong Yu [email protected] Haonan Xu [email protected] Qunqin Pan [email protected] <p>In the process of face attribute transfer, non-frontal and occluded face images often suffer from low generation quality, missing facial edges, and a lack of diversity. To address these challenges, we present FES-StarGANv2, an unsupervised multi-domain face attribute transfer network. In the feature extraction phase, we incorporate an attention-guided feature fusion module aimed at enhancing image details while preserving the overall integrity of the transferred images. Moreover, a style code extraction module is presented that refines the style code of the target domain and enhances the learning capability of the generator. To further augment image diversity and authenticity, a face image optimization module and a structural diversity loss function are integrated. Experimental results reveal that, in comparison with the baseline StarGANv2, our approach achieves substantial improvements of 23% and 3.9% in the FID and LPIPS metrics, reaching 13 and 0.453, respectively. Notably, in terms of visual quality, significant enhancements were observed, particularly in addressing issues of low image quality and edge deficiencies. The FES-StarGANv2 approach effectively addresses the challenges associated with non-frontal and occluded facial images.</p> 2024-09-25T00:00:00+03:00 Copyright (c) 2024 Information Technology and Control