Machine learning

Searched The ACM Guide to Computing Literature (3,777,522 records)

Showing 1 - 20 of 1,045 Results

research-article
August 2024
Exploring Spatial Frequency Information for Enhanced Video Prediction Quality
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8955–8968, https://doi.org/10.1109/TMM.2024.3384062
Video prediction is a challenging spatiotemporal prediction task that generates future frames based on historical observations. Although recently proposed deep learning-based methods significantly outperform legacy approaches, there still exist gaps ...
research-article
July 2024
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9657–9670, https://doi.org/10.1109/TMM.2024.3396272
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal ...
research-article
June 2024
Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9839–9853, https://doi.org/10.1109/TMM.2024.3400675
Owing to the capacity of performing full-time target searches, cross-modality vehicle re-identification based on unmanned aerial vehicles (UAV) is gaining more attention in both video surveillance and public security. However, this promising and ...
research-article
May 2024
A Two-Stage Personalized Virtual Try-On Framework With Shape Control and Texture Guidance
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10225–10236, https://doi.org/10.1109/TMM.2024.3405718
The Diffusion model has a strong ability to generate wild images. However, the model can just generate inaccurate images with the guidance of text, which makes it very challenging to directly apply the text-guided generative model for virtual try-on ...
research-article
May 2024
Frequency-Based Matcher for Long-Tailed Semantic Segmentation
- Shan Li,
- Lu Yang,
- Pu Cao,
- Liulei Li,
- Huadong Ma
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10395–10405, https://doi.org/10.1109/TMM.2024.3407679
The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, ...
research-article
May 2024
Towards Robust Person Re-Identification by Adversarial Training With Dynamic Attack Strategy
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10367–10380, https://doi.org/10.1109/TMM.2024.3407677
Recently, person re-identification has gained significant attention from both academic and industry fields due to its potential applications in surveillance and security. However, the security of re-identification systems has not been widely investigated, ...
research-article
May 2024
MuJo-SF: Multimodal Joint Slot Filling for Attribute Value Prediction of E-Commerce Commodities
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10354–10366, https://doi.org/10.1109/TMM.2024.3407667
Supplementing product attribute information is a critical step for E-commerce platforms, which further benefits various downstream tasks, including product recommendation, product search, and product knowledge graph construction. Intuitively, the visual ...
research-article
May 2024
Self-Similarity Prior Distillation for Unsupervised Remote Physiological Measurement
- Xinyu Zhang,
- Weiyu Sun,
- Hao Lu,
- Ying Chen,
- Yun Ge,
- Xiaolin Huang,
- Jie Yuan,
- Yingcong Chen
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10290–10305, https://doi.org/10.1109/TMM.2024.3405720
Remote photoplethysmography (rPPG) is a non-invasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the ...
research-article
May 2024
SADCMF: Self-Attentive Deep Consistent Matrix Factorization for Micro-Video Multi-Label Classification
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10331–10341, https://doi.org/10.1109/TMM.2024.3406196
Currently, there is a growing scholarly and industrial interest in micro-video-centric research. Within these domains, multi-label learning has emerged as a fundamental yet attractive subject. Existing methods primarily place emphasis on feature ...
research-article
May 2024
Split Computing With Scalable Feature Compression for Visual Analytics on the Edge
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10121–10133, https://doi.org/10.1109/TMM.2024.3406165
Running deep visual analytics models for real-time applications is challenging for mobile devices. Offloading the computation to edge server can mitigate computation bottleneck at the mobile device, but may decrease the analytics performance due to the ...
research-article
May 2024
Pyramid Fusion Transformer for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9630–9643, https://doi.org/10.1109/TMM.2024.3396281
The recently proposed MaskFormer [Cheng et al. (2021)] gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates ...
research-article
May 2024
Cross-Modal Quantization for Co-Speech Gesture Generation
- Zheng Wang,
- Wei Zhang,
- Long Ye,
- Dan Zeng,
- Tao Mei
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10251–10263, https://doi.org/10.1109/TMM.2024.3405743
Learning proper representations for speech and gesture is essential for co-speech gesture generation. Existing approaches either utilize direct representations or independently encode the speech and gesture, which neglect the joint representation to ...
research-article
May 2024
Progressive Diversity Generation for Single Domain Generalization
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10200–10210, https://doi.org/10.1109/TMM.2024.3405732
Single domain generalization (single-DG) is a realistic yet challenging domain generalization scenario where a model trained on a single domain generalizes well to multiple unseen domains. ...
research-article
May 2024
Opinion-Unaware Blind Image Quality Assessment Using Multi-Scale Deep Feature Statistics
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10211–10224, https://doi.org/10.1109/TMM.2024.3405729
Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field; however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-...
research-article
May 2024
Few-Shot Fine-Grained Image Classification via Multi-Frequency Neighborhood and Double-Cross Modulation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10264–10278, https://doi.org/10.1109/TMM.2024.3405713
Traditional fine-grained image classification typically relies on large-scale training samples with annotated ground truth. However, some fine-grained categories in the real world have few available images, and the existing few-shot models have difficulty ...
research-article
May 2024
Localized Linear Temporal Dynamics for Self-Supervised Skeleton Action Recognition
- Xinghan Wang,
- Yadong Mu
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10189–10199, https://doi.org/10.1109/TMM.2024.3405712
Self-supervised skeleton action recognition has gained notable attention for its reduced reliance on annotated data. Contrastive learning methods, in particular, have emerged as prominent approaches. These works typically utilize a spatial-temporal ...
research-article
May 2024
Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9949–9961, https://doi.org/10.1109/TMM.2024.3405662
The Image-Text Sentiment Analysis task has garnered increased attention in recent years due to the surge in user-generated content on social media platforms. Previous research efforts have made noteworthy progress by leveraging the affective concepts shared ...
research-article
May 2024
Manifold-Based Incomplete Multi-View Clustering via Bi-Consistency Guidance
- Huibing Wang,
- Mingze Yao,
- Yawei Chen,
- Yunqiu Xu,
- Haipeng Liu,
- Wei Jia,
- Xianping Fu,
- Yang Wang
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10001–10014, https://doi.org/10.1109/TMM.2024.3405650
Incomplete multi-view clustering primarily focuses on dividing unlabeled data into corresponding categories with missing instances, and has received intensive attention due to its superiority in real applications. Considering the influence of incomplete ...
research-article
May 2024
Enhancing Unsupervised Semantic Segmentation Through Context-Aware Clustering
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10081–10093, https://doi.org/10.1109/TMM.2024.3405648
Despite the great progress of semantic segmentation with supervised learning, annotating large amounts of pixel-wise labels is very expensive and time-consuming. To this end, Unsupervised Semantic Segmentation (USS) has been proposed to learn ...
research-article
May 2024
Difference-Aware Distillation for Semantic Segmentation
- Jianping Gou,
- Xiabin Zhou,
- Lan Du,
- Yibing Zhan,
- Wu Chen,
- Zhang Yi
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10069–10080, https://doi.org/10.1109/TMM.2024.3405619
In recent years, various distillation methods for semantic segmentation have been proposed. However, these methods typically train the student model to imitate the intermediate features or logits of the teacher model directly, thereby overlooking the high-...
