Cross-modal Retrieval by Real Label Partial Least Squares

Published: 01 October 2016

Abstract

This paper proposes a novel method, Real Label Partial Least Squares (RL-PLS), for cross-modal retrieval. Previous works treat only texts and images as the two modalities in PLS. In RL-PLS, because the class label is more directly related to semantics, we take the class label as an assistant modality. Specifically, we build two KPLS models and project both images and texts into the label space, where the similarity of images and texts can be measured more accurately. Furthermore, unlike traditional methods, we do not restrict the label indicator values to binary values; in RL-PLS they are real-valued. Each label indicator value comprises two parts: its sign (positive or negative) represents the sample class, while its absolute value represents the local structure within the class. In this way, the discriminative ability of RL-PLS is greatly improved. To show the effectiveness of RL-PLS, experiments are conducted on two cross-modal retrieval tasks (Wiki and Pascal VOC 2007), on which competitive results are obtained.
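
The real-valued label indicators and the label-space comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the absolute value is derived from the local structure, so the weighting used here (1 / (1 + distance to the class centroid)) is purely an assumption, as are the function names `real_label_indicators` and `rank_in_label_space`. The two KPLS projection models are omitted; the label-space projections of the text query and the images are hypothetical hand-written vectors.

```python
import math


def real_label_indicators(features, labels, num_classes):
    """Build real-valued label indicator vectors.

    Sign: positive for the sample's own class, negative for every other class.
    Magnitude (ASSUMPTION, not taken from the paper): 1 / (1 + distance to the
    sample's class centroid), so samples near the centre of their class
    receive a larger absolute value.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, c in zip(features, labels):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    # Per-class centroids in feature space (zero vector for empty classes).
    centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else [0.0] * dim
        for c in range(num_classes)
    ]

    indicators = []
    for x, c in zip(features, labels):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        weight = 1.0 / (1.0 + dist)
        indicators.append(
            [weight if k == c else -weight for k in range(num_classes)]
        )
    return indicators


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_in_label_space(query, candidates):
    """Return candidate indices sorted by decreasing cosine similarity."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy training features: three samples of class 0, two samples of class 1.
features = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 0, 1, 1]
Y = real_label_indicators(features, labels, num_classes=2)

# Hypothetical label-space projections of one text query and three images.
text_query = [0.8, -0.7]
image_projections = [[-0.5, 0.6], [0.9, -0.4], [0.1, 0.1]]
ranking = rank_in_label_space(text_query, image_projections)
```

In this sketch the rows of `Y` would serve as regression targets for the image-side and text-side models, and `ranking` orders the images by how closely their projected label vectors agree with that of the text query.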

    Published In

    MM '16: Proceedings of the 24th ACM international conference on Multimedia
    October 2016
    1542 pages
    ISBN:9781450336031
    DOI:10.1145/2964284
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-modal retrieval
    2. images and documents
    3. multimedia
    4. partial least squares

    Qualifiers

    • Short-paper

    Funding Sources

    • 863 Program of China
    • Natural Science Foundation of China (NSFC)
    • National Basic Research Program of China (973 Program)

    Conference

    MM '16: ACM Multimedia Conference
    October 15 - 19, 2016
    Amsterdam, The Netherlands

    Acceptance Rates

    MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Cited By

    • (2024) AI-based Space Occupancy Estimation Using Environmental Sensor Data. 2024 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pp. 1-5. DOI: 10.1109/ISGT59692.2024.10454176. Online publication date: 19-Feb-2024
    • (2021) Learning Feature Representation and Partial Correlation for Multimodal Multi-Label Data. IEEE Transactions on Multimedia, 23, pp. 1882-1894. DOI: 10.1109/TMM.2020.3004963. Online publication date: 2021
    • (2020) Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 22:5, pp. 1310-1322. DOI: 10.1109/TMM.2019.2942494. Online publication date: May-2020
    • (2019) Joint Graph Regularization in a Homogeneous Subspace for Cross-Media Retrieval. Journal of Advanced Computational Intelligence and Intelligent Informatics, 23:5, pp. 939-946. DOI: 10.20965/jaciii.2019.p0939. Online publication date: 20-Sep-2019
    • (2019) Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 21:5, pp. 1276-1288. DOI: 10.1109/TMM.2018.2877127. Online publication date: May-2019
    • (2018) A Cross-Media Retrieval Algorithm Based on Consistency Preserving of Collaborative Representation. Journal of Advanced Computational Intelligence and Intelligent Informatics, 22:2, pp. 280-289. DOI: 10.20965/jaciii.2018.p0280. Online publication date: 20-Mar-2018
    • (2018) Coupled feature selection based semi-supervised modality-dependent cross-modal retrieval. Multimedia Tools and Applications. DOI: 10.1007/s11042-018-5958-9. Online publication date: 21-Apr-2018
    • (2017) Semi-supervised Distance Consistent Cross-modal Retrieval. Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, pp. 25-31. DOI: 10.1145/3132734.3132735. Online publication date: 23-Oct-2017
    • (2017) Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval. Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pp. 332-339. DOI: 10.1145/3126686.3126726. Online publication date: 23-Oct-2017
    • (2017) Multimodal Similarity Gaussian Process Latent Variable Model. IEEE Transactions on Image Processing, 26:9, pp. 4168-4181. DOI: 10.1109/TIP.2017.2713045. Online publication date: Sep-2017
