Computer vision


Showing 1 - 20 of 1,420 Results

research-article
August 2024
Exploring Spatial Frequency Information for Enhanced Video Prediction Quality
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8955–8968, https://doi.org/10.1109/TMM.2024.3384062
Video prediction is a challenging spatiotemporal prediction task that generates future frames based on historical observations. Although recently proposed deep learning-based methods significantly outperform legacy approaches, there still exist gaps ...
research-article
August 2024
Multi-Space Point Geometry Compression With Progressive Relation-Aware Transformer
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8969–8980, https://doi.org/10.1109/TMM.2024.3384057
Deep network-based point cloud geometry compression is becoming more crucial and attractive due to constantly expanding 3D applications. The current strategy employing holistic point clouds as input imposes limitations on the compressed point cloud size ...
research-article
July 2024
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9657–9670, https://doi.org/10.1109/TMM.2024.3396272
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal ...
research-article
June 2024
Multimodal Progressive Modulation Network for Micro-Video Multi-Label Classification
- Peiguang Jing,
- Xuan Zhao,
- Fugui Fan,
- Fan Yang,
- Yun Li,
- Yuting Su
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10134–10144, https://doi.org/10.1109/TMM.2024.3405724
Micro-videos, as an increasingly popular form of user-generated content (UGC), naturally include diverse multimodal cues. However, in pursuit of consistent representations, existing methods neglect the simultaneous consideration of exploring modality ...
research-article
June 2024
Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With Text Feedback
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9936–9948, https://doi.org/10.1109/TMM.2024.3417694
We study the task of image retrieval with text feedback, where a reference image and modification text are composed to retrieve the desired target image. To accomplish this goal, existing methods always get the multimodal representations through different ...
research-article
June 2024
Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9839–9853, https://doi.org/10.1109/TMM.2024.3400675
Owing to the capacity of performing full-time target searches, cross-modality vehicle re-identification based on unmanned aerial vehicles (UAV) is gaining more attention in both video surveillance and public security. However, this promising and ...
research-article
June 2024
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10041–10054, https://doi.org/10.1109/TMM.2024.3405660
Due to the limited number of stable image feature descriptors and the simplistic concatenation approach to hash generation, existing hashing methods have not achieved a satisfactory balance between robustness and discrimination. To this end, a novel ...
research-article
June 2024
Width-Adaptive CNN: Fast CU Partition Prediction for VVC Screen Content Coding
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9372–9382, https://doi.org/10.1109/TMM.2024.3410116
Screen content coding (SCC) in Versatile Video Coding (VVC) improves the coding efficiency of screen content videos (SCVs) significantly but results in high computational complexity due to the quad-tree plus multi-type tree (QTMT) structure of the coding ...
research-article
June 2024
Estimating the Semantics via Sector Embedding for Image-Text Retrieval
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10342–10353, https://doi.org/10.1109/TMM.2024.3407664
Based on deterministic single-point embedding, most extant image-text retrieval methods only focus on the match of ground truth while suffering from one-to-many correspondence, where besides annotated positives, many similar instances of another modality ...
research-article
May 2024
A Two-Stage Personalized Virtual Try-On Framework With Shape Control and Texture Guidance
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10225–10236, https://doi.org/10.1109/TMM.2024.3405718
The Diffusion model has a strong ability to generate wild images. However, the model can just generate inaccurate images with the guidance of text, which makes it very challenging to directly apply the text-guided generative model for virtual try-on ...
research-article
May 2024
PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10029–10040, https://doi.org/10.1109/TMM.2024.3405649
Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties, namely, multi-level awareness, occlusion ...
research-article
May 2024
Frequency-Based Matcher for Long-Tailed Semantic Segmentation
- Shan Li,
- Lu Yang,
- Pu Cao,
- Liulei Li,
- Huadong Ma
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10395–10405, https://doi.org/10.1109/TMM.2024.3407679
The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, ...
research-article
May 2024
Towards Robust Person Re-Identification by Adversarial Training With Dynamic Attack Strategy
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10367–10380, https://doi.org/10.1109/TMM.2024.3407677
Recently, person re-identification has gained significant attention from both academic and industry fields due to its potential applications in surveillance and security. However, the security of re-identification systems has not been widely investigated, ...
research-article
May 2024
RUN: Rethinking the UNet Architecture for Efficient Image Restoration
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10381–10394, https://doi.org/10.1109/TMM.2024.3407656
Recent advanced image restoration (IR) methods typically stack homogeneous operators hierarchically in the UNet architecture. To achieve higher accuracy, these models are now going deeper and more complex, making them resource-intensive. After ...
research-article
May 2024
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
- Chen Liu,
- Peike Li,
- Hu Zhang,
- Lincheng Li,
- Zi Huang,
- Dadong Wang,
- Xin Yu
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10015–10028, https://doi.org/10.1109/TMM.2024.3405622
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding sources by predicting pixel-wise maps. Previous methods assume that each sound component in an audio signal always has a visual counterpart in the image. However, this ...
research-article
May 2024
SADCMF: Self-Attentive Deep Consistent Matrix Factorization for Micro-Video Multi-Label Classification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10331–10341, https://doi.org/10.1109/TMM.2024.3406196
Currently, there is a growing scholarly and industrial interest in micro-video-centric research. Within these domains, multi-label learning has emerged as a fundamental yet attractive subject. Existing methods primarily place emphasis on feature ...
research-article
May 2024
Cross-Modality Vessel Re-Identification With Deep Alignment Decomposition Network
- Zaidao Wen,
- Jinhui Wu,
- Yafei Lv,
- Qian Wu
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10318–10330, https://doi.org/10.1109/TMM.2024.3406193
Cross-modality vessel re-identification (ReID) presents a formidable challenge in the domain of maritime surveillance, necessitating the development of robust methodologies to accurately match vessels across disparate imaging modalities. This paper ...
research-article
May 2024
Pyramid Fusion Transformer for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9630–9643, https://doi.org/10.1109/TMM.2024.3396281
The recently proposed MaskFormer [Cheng et al. (2021)] gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates ...
research-article
May 2024
Progressive Diversity Generation for Single Domain Generalization
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10200–10210, https://doi.org/10.1109/TMM.2024.3405732
Single domain generalization (single-DG) is a realistic yet challenging domain generalization scenario where a model trained on a single domain generalizes well to multiple unseen domains. ...
research-article
May 2024
Opinion-Unaware Blind Image Quality Assessment Using Multi-Scale Deep Feature Statistics
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10211–10224, https://doi.org/10.1109/TMM.2024.3405729
Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-...

