- research-article, August 2024
Exploring Spatial Frequency Information for Enhanced Video Prediction Quality
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8955–8968, https://doi.org/10.1109/TMM.2024.3384062
Video prediction is a challenging spatiotemporal prediction task that generates future frames based on historical observations. Although recently proposed deep learning-based methods significantly outperform legacy approaches, there still exist gaps ...
- research-article, July 2024
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9657–9670, https://doi.org/10.1109/TMM.2024.3396272
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal ...
- research-article, June 2024
Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9839–9853, https://doi.org/10.1109/TMM.2024.3400675
Owing to the capacity of performing full-time target searches, cross-modality vehicle re-identification based on unmanned aerial vehicles (UAV) is gaining more attention in both video surveillance and public security. However, this promising and ...
- research-article, May 2024
A Two-Stage Personalized Virtual Try-On Framework With Shape Control and Texture Guidance
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10225–10236, https://doi.org/10.1109/TMM.2024.3405718
The Diffusion model has a strong ability to generate wild images. However, the model can just generate inaccurate images with the guidance of text, which makes it very challenging to directly apply the text-guided generative model for virtual try-on ...
- research-article, May 2024
Frequency-Based Matcher for Long-Tailed Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10395–10405, https://doi.org/10.1109/TMM.2024.3407679
The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, ...
- research-article, May 2024
Towards Robust Person Re-Identification by Adversarial Training With Dynamic Attack Strategy
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10367–10380, https://doi.org/10.1109/TMM.2024.3407677
Recently, person re-identification has gained significant attention from both academic and industry fields due to its potential applications in surveillance and security. However, the security of re-identification systems has not been widely investigated, ...
- research-article, May 2024
MuJo-SF: Multimodal Joint Slot Filling for Attribute Value Prediction of E-Commerce Commodities
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10354–10366, https://doi.org/10.1109/TMM.2024.3407667
Supplementing product attribute information is a critical step for E-commerce platforms, which further benefits various downstream tasks, including product recommendation, product search, and product knowledge graph construction. Intuitively, the visual ...
- research-article, May 2024
Self-Similarity Prior Distillation for Unsupervised Remote Physiological Measurement
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10290–10305, https://doi.org/10.1109/TMM.2024.3405720
Remote photoplethysmography (rPPG) is a non-invasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the ...
- research-article, May 2024
SADCMF: Self-Attentive Deep Consistent Matrix Factorization for Micro-Video Multi-Label Classification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10331–10341, https://doi.org/10.1109/TMM.2024.3406196
Currently, there is a growing scholarly and industrial interest in micro-video-centric research. Within these domains, multi-label learning has emerged as a fundamental yet attractive subject. Existing methods primarily place emphasis on feature ...
- research-article, May 2024
Split Computing With Scalable Feature Compression for Visual Analytics on the Edge
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10121–10133, https://doi.org/10.1109/TMM.2024.3406165
Running deep visual analytics models for real-time applications is challenging for mobile devices. Offloading the computation to an edge server can mitigate the computation bottleneck at the mobile device, but may decrease the analytics performance due to the ...
- research-article, May 2024
Pyramid Fusion Transformer for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9630–9643, https://doi.org/10.1109/TMM.2024.3396281
The recently proposed MaskFormer [Cheng et al. (2021)] gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates ...
- research-article, May 2024
Cross-Modal Quantization for Co-Speech Gesture Generation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10251–10263, https://doi.org/10.1109/TMM.2024.3405743
Learning proper representations for speech and gesture is essential for co-speech gesture generation. Existing approaches either utilize direct representations or independently encode the speech and gesture, which neglect the joint representation to ...
- research-article, May 2024
Progressive Diversity Generation for Single Domain Generalization
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10200–10210, https://doi.org/10.1109/TMM.2024.3405732
Single domain generalization (single-DG) is a realistic yet challenging domain generalization scenario where a model trained on a single domain generalizes well to multiple unseen domains. ...
- research-article, May 2024
Opinion-Unaware Blind Image Quality Assessment Using Multi-Scale Deep Feature Statistics
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10211–10224, https://doi.org/10.1109/TMM.2024.3405729
Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-...
- research-article, May 2024
Few-Shot Fine-Grained Image Classification via Multi-Frequency Neighborhood and Double-Cross Modulation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10264–10278, https://doi.org/10.1109/TMM.2024.3405713
Traditional fine-grained image classification typically relies on large-scale training samples with annotated ground truth. However, some fine-grained categories in the real world have few available images, and the existing few-shot models have difficulty ...
- research-article, May 2024
Localized Linear Temporal Dynamics for Self-Supervised Skeleton Action Recognition
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10189–10199, https://doi.org/10.1109/TMM.2024.3405712
Self-supervised skeleton action recognition has gained notable attention for its reduced reliance on annotated data. Contrastive learning methods, in particular, have emerged as prominent approaches. These works typically utilize a spatial-temporal ...
- research-article, May 2024
Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9949–9961, https://doi.org/10.1109/TMM.2024.3405662
Image-Text Sentiment Analysis task has garnered increased attention in recent years due to the surge in user-generated content on social media platforms. Previous research efforts have made noteworthy progress by leveraging the affective concepts shared ...
- research-article, May 2024
Manifold-Based Incomplete Multi-View Clustering via Bi-Consistency Guidance
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10001–10014, https://doi.org/10.1109/TMM.2024.3405650
Incomplete multi-view clustering primarily focuses on dividing unlabeled data into corresponding categories with missing instances, and has received intensive attention due to its superiority in real applications. Considering the influence of incomplete ...
- research-article, May 2024
Enhancing Unsupervised Semantic Segmentation Through Context-Aware Clustering
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10081–10093, https://doi.org/10.1109/TMM.2024.3405648
Despite the great progress of semantic segmentation with supervised learning, annotating large amounts of pixel-wise labels is, however, very expensive and time-consuming. To this end, Unsupervised Semantic Segmentation (USS) has been proposed to learn ...
- research-article, May 2024
Difference-Aware Distillation for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10069–10080, https://doi.org/10.1109/TMM.2024.3405619
In recent years, various distillation methods for semantic segmentation have been proposed. However, these methods typically train the student model to imitate the intermediate features or logits of the teacher model directly, thereby overlooking the high-...