- research-article, August 2024
Exploring Spatial Frequency Information for Enhanced Video Prediction Quality
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8955–8968, https://doi.org/10.1109/TMM.2024.3384062
Video prediction is a challenging spatiotemporal prediction task that generates future frames based on historical observations. Although recently proposed deep learning-based methods significantly outperform legacy approaches, there still exist gaps ...
- research-article, August 2024
Multi-Space Point Geometry Compression With Progressive Relation-Aware Transformer
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8969–8980, https://doi.org/10.1109/TMM.2024.3384057
Deep network-based point cloud geometry compression is becoming more crucial and attractive due to constantly expanding 3D applications. The current strategy employing holistic point clouds as input imposes limitations on the compressed point cloud size ...
- research-article, July 2024
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9657–9670, https://doi.org/10.1109/TMM.2024.3396272
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal ...
- research-article, June 2024
Multimodal Progressive Modulation Network for Micro-Video Multi-Label Classification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10134–10144, https://doi.org/10.1109/TMM.2024.3405724
Micro-videos, as an increasingly popular form of user-generated content (UGC), naturally include diverse multimodal cues. However, in pursuit of consistent representations, existing methods neglect the simultaneous consideration of exploring modality ...
- research-article, June 2024
Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With Text Feedback
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9936–9948, https://doi.org/10.1109/TMM.2024.3417694
We study the task of image retrieval with text feedback, where a reference image and modification text are composed to retrieve the desired target image. To accomplish this goal, existing methods always get the multimodal representations through different ...
- research-article, June 2024
Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9839–9853, https://doi.org/10.1109/TMM.2024.3400675
Owing to the capacity of performing full-time target searches, cross-modality vehicle re-identification based on unmanned aerial vehicles (UAV) is gaining more attention in both video surveillance and public security. However, this promising and ...
- research-article, June 2024
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10041–10054, https://doi.org/10.1109/TMM.2024.3405660
Due to the limited number of stable image feature descriptors and the simplistic concatenation approach to hash generation, existing hashing methods have not achieved a satisfactory balance between robustness and discrimination. To this end, a novel ...
- research-article, June 2024
Width-Adaptive CNN: Fast CU Partition Prediction for VVC Screen Content Coding
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9372–9382, https://doi.org/10.1109/TMM.2024.3410116
Screen content coding (SCC) in Versatile Video Coding (VVC) improves the coding efficiency of screen content videos (SCVs) significantly but results in high computational complexity due to the quad-tree plus multi-type tree (QTMT) structure of the coding ...
- research-article, June 2024
Estimating the Semantics via Sector Embedding for Image-Text Retrieval
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10342–10353, https://doi.org/10.1109/TMM.2024.3407664
Based on deterministic single-point embedding, most extant image-text retrieval methods only focus on the match of ground truth while suffering from one-to-many correspondence, where besides annotated positives, many similar instances of another modality ...
- research-article, May 2024
A Two-Stage Personalized Virtual Try-On Framework With Shape Control and Texture Guidance
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10225–10236, https://doi.org/10.1109/TMM.2024.3405718
The diffusion model has a strong ability to generate in-the-wild images. However, when guided by text alone, the model can generate only inaccurate images, which makes it very challenging to directly apply the text-guided generative model for virtual try-on ...
- research-article, May 2024
PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10029–10040, https://doi.org/10.1109/TMM.2024.3405649
Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties, namely, multi-level awareness, occlusion ...
- research-article, May 2024
Frequency-Based Matcher for Long-Tailed Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10395–10405, https://doi.org/10.1109/TMM.2024.3407679
The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, ...
- research-article, May 2024
Towards Robust Person Re-Identification by Adversarial Training With Dynamic Attack Strategy
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10367–10380, https://doi.org/10.1109/TMM.2024.3407677
Recently, person re-identification has gained significant attention from both academia and industry due to its potential applications in surveillance and security. However, the security of re-identification systems has not been widely investigated, ...
- research-article, May 2024
RUN: Rethinking the UNet Architecture for Efficient Image Restoration
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10381–10394, https://doi.org/10.1109/TMM.2024.3407656
Recent advanced image restoration (IR) methods typically stack homogeneous operators hierarchically in the UNet architecture. To achieve higher accuracy, these models are now going deeper and more complex, making them resource-intensive. After ...
- research-article, May 2024
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10015–10028, https://doi.org/10.1109/TMM.2024.3405622
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding sources by predicting pixel-wise maps. Previous methods assume that each sound component in an audio signal always has a visual counterpart in the image. However, this ...
- research-article, May 2024
SADCMF: Self-Attentive Deep Consistent Matrix Factorization for Micro-Video Multi-Label Classification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10331–10341, https://doi.org/10.1109/TMM.2024.3406196
Currently, there is a growing scholarly and industrial interest in micro-video-centric research. Within these domains, multi-label learning has emerged as a fundamental yet attractive subject. Existing methods primarily place emphasis on feature ...
- research-article, May 2024
Cross-Modality Vessel Re-Identification With Deep Alignment Decomposition Network
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10318–10330, https://doi.org/10.1109/TMM.2024.3406193
Cross-modality vessel re-identification (ReID) presents a formidable challenge in the domain of maritime surveillance, necessitating the development of robust methodologies to accurately match vessels across disparate imaging modalities. This paper ...
- research-article, May 2024
Pyramid Fusion Transformer for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9630–9643, https://doi.org/10.1109/TMM.2024.3396281
The recently proposed MaskFormer [Cheng et al. (2021)] gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates ...
- research-article, May 2024
Progressive Diversity Generation for Single Domain Generalization
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10200–10210, https://doi.org/10.1109/TMM.2024.3405732
Single domain generalization (single-DG) is a realistic yet challenging domain generalization scenario where a model trained on a single domain generalizes well to multiple unseen domains. ...
- research-article, May 2024
Opinion-Unaware Blind Image Quality Assessment Using Multi-Scale Deep Feature Statistics
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10211–10224, https://doi.org/10.1109/TMM.2024.3405729
Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-...