short-paper

Cross-modal Retrieval by Real Label Partial Least Squares

Authors:

Qingming Huang

MM '16: Proceedings of the 24th ACM international conference on Multimedia

Pages 227 - 231

https://doi.org/10.1145/2964284.2967216

Published: 01 October 2016

Abstract

This paper proposes a novel method named Real Label Partial Least Squares (RL-PLS) for cross-modal retrieval. Previous works treat only texts and images as the two modalities in PLS. In RL-PLS, because the class label is more directly related to the semantics, we take the class label as an auxiliary modality. Specifically, we build two kernel PLS (KPLS) models and project both images and texts into the label space, where the similarity between images and texts can be measured more accurately. Furthermore, unlike traditional methods, we do not restrict the label indicator values to binary values; in RL-PLS they are real-valued. Each label indicator value has two parts: its sign (positive or negative) represents the sample's class, while its absolute value represents the local structure within the class. In this way, the discriminative ability of RL-PLS is greatly improved. To show the effectiveness of RL-PLS, experiments are conducted on two cross-modal retrieval datasets (Wiki and Pascal VOC 2007), on which competitive results are obtained.
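
The two ideas in the abstract, real-valued label indicators and matching in the label space, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the absolute value is computed, so the rule used here (closeness to the class centroid) is an assumption, as are the function names and the toy data. The KPLS projection itself is omitted: the query and candidate vectors in the ranking step are hypothetical stand-ins for already-projected texts and images.

```python
import math

def real_label_indicators(features, labels, classes):
    """Build a real-valued label indicator matrix (one row per sample).

    Sign: positive for the sample's own class, negative for all others.
    Magnitude: ASSUMPTION, not from the paper. It is 1 / (1 + distance to the
    class centroid), so samples near the centre of their class get larger values.
    """
    dim = len(features[0])
    centroids = {}
    for c in classes:
        members = [f for f, y in zip(features, labels) if y == c]
        centroids[c] = [sum(f[d] for f in members) / len(members) for d in range(dim)]

    rows = []
    for f, y in zip(features, labels):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(f, centroids[y])))
        magnitude = 1.0 / (1.0 + dist)
        rows.append([magnitude if c == y else -magnitude for c in classes])
    return rows

def cosine(u, v):
    """Cosine similarity between two label-space vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_in_label_space(query, candidates):
    """Return candidate indices ordered by decreasing similarity to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy training data (hypothetical): three "art" samples and two "sport" samples.
features = [[0.0, 0.0], [0.0, 1.0], [0.0, 5.0], [10.0, 0.0], [10.0, 2.0]]
labels = ["art", "art", "art", "sport", "sport"]
classes = ["art", "sport"]
Y = real_label_indicators(features, labels, classes)

# Hypothetical label-space projections: one text query, three candidate images.
text_query = [0.6, -0.2]
image_candidates = [[0.5, -0.4], [-0.3, 0.6], [0.2, -0.1]]
ranking = rank_in_label_space(text_query, image_candidates)
```

In a full pipeline, a matrix like `Y` would serve as the regression target for each of the two KPLS models, and the ranking step would run on their outputs for test images and texts.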

Index Terms

Cross-modal Retrieval by Real Label Partial Least Squares
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia

October 2016

1542 pages

ISBN:9781450336031

DOI:10.1145/2964284

General Chairs:
Alan Hanjalic, Delft University of Technology
Cees Snoek, Qualcomm Research Netherlands / University of Amsterdam
Marcel Worring, University of Amsterdam

Moderator:
Dick Bulterman, CWI / VU University Amsterdam

Program Chairs:
Benoit Huet, EURECOM
Aisling Kelliher, Virginia Tech
Yiannis Kompatsiaris, CERTH-ITI
Jin Li, Microsoft

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

863 Program of China
Natural Science Foundation of China (NSFC)
National Basic Research Program of China (973 Program)

Conference

MM '16

Sponsor:

SIGMM

MM '16: ACM Multimedia Conference

October 15 - 19, 2016

Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total Citations: 11
Total Downloads: 297
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 2

Reflects downloads up to 16 Oct 2024

Cited By

Zhang Z, He J, Kumar A, Rahman S (2024). AI-based Space Occupancy Estimation Using Environmental Sensor Data. 2024 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pages 1-5. Online publication date: 19-Feb-2024.
https://doi.org/10.1109/ISGT59692.2024.10454176

Song G, Wang S, Huang Q, Tian Q (2021). Learning Feature Representation and Partial Correlation for Multimodal Multi-Label Data. IEEE Transactions on Multimedia, 23:1882-1894. Online publication date: 2021.
https://doi.org/10.1109/TMM.2020.3004963

Wu Y, Wang S, Huang Q (2020). Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 22(5):1310-1322. Online publication date: May-2020.
https://doi.org/10.1109/TMM.2019.2942494

Qi Y, Zhang H (2019). Joint Graph Regularization in a Homogeneous Subspace for Cross-Media Retrieval. Journal of Advanced Computational Intelligence and Intelligent Informatics, 23(5):939-946. Online publication date: 20-Sep-2019.
https://doi.org/10.20965/jaciii.2019.p0939

Yu E, Sun J, Li J, Chang X, Han X, Hauptmann A (2019). Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 21(5):1276-1288. Online publication date: May-2019.
https://doi.org/10.1109/TMM.2018.2877127

Shang F, Zhang H, Sun J, Liu L, Zeng H (2018). A Cross-Media Retrieval Algorithm Based on Consistency Preserving of Collaborative Representation. Journal of Advanced Computational Intelligence and Intelligent Informatics, 22(2):280-289. Online publication date: 20-Mar-2018.
https://doi.org/10.20965/jaciii.2018.p0280

Yu E, Sun J, Wang L, Wan W, Zhang H (2018). Coupled feature selection based semi-supervised modality-dependent cross-modal retrieval. Multimedia Tools and Applications. Online publication date: 21-Apr-2018.
https://doi.org/10.1007/s11042-018-5958-9

Dong X, Yu E, Gao M, Zhu L, Sun J, Zhang H, Liu X, Mu Y, Jiang Y, Luo J (2017). Semi-supervised Distance Consistent Cross-modal Retrieval. Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, pages 25-31. Online publication date: 23-Oct-2017.
https://dl.acm.org/doi/10.1145/3132734.3132735

Shao J, Zhao Z, Su F, Yue T, Wu W, Yang J, Tian Q, Zimmermann R (2017). Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval. Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pages 332-339. Online publication date: 23-Oct-2017.
https://dl.acm.org/doi/10.1145/3126686.3126726

Song G, Wang S, Huang Q, Tian Q (2017). Multimodal Similarity Gaussian Process Latent Variable Model. IEEE Transactions on Image Processing, 26(9):4168-4181. Online publication date: Sep-2017.
https://doi.org/10.1109/TIP.2017.2713045
