
Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction

Published: 10 February 2021

Abstract

Code summarization aims to generate a comment for a given block of source code, and it is normally performed by training machine learning models on existing code block-comment pairs. In practice, code comments serve different intentions: some explain how a method works, while others explain why it was written. Previous work has shown that a relationship exists between a code block and the category of the comment associated with it. In this article, we investigate to what extent this relationship can be exploited to improve code summarization performance. We first classify comments into six intention categories, namely “what,” “why,” “how-to-use,” “how-it-is-done,” “property,” and “others,” and manually label 20,000 code-comment pairs. Based on this dataset, we conduct an experiment to investigate the performance of different state-of-the-art code summarization approaches across these categories. We find that the performance of different approaches varies substantially across categories, and the category on which a model performs best differs from model to model. In particular, no model performs best on the “why” and “property” categories. We then design a composite approach to demonstrate that comment category prediction can boost code summarization. The approach trains a classifier on the labeled data to infer the category of a code block, selects the most suitable summarization model for the inferred category, and outputs that model's result. Our composite approach outperforms approaches that do not consider comment categories, obtaining relative improvements of 8.57% and 16.34% in ROUGE-L and BLEU-4 scores, respectively.
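The composite pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names below (`make_composite`, the toy classifier, and the stand-in models) are hypothetical. The idea is simply to infer a comment category for each code block and dispatch to whichever summarization model performed best on that category during evaluation.

```python
# Hedged sketch of the composite approach: a category classifier routes each
# code block to the summarization model that performs best on that category.
# All components here are hypothetical stand-ins for illustration only.

CATEGORIES = ["what", "why", "how-to-use", "how-it-is-done", "property", "others"]

def make_composite(classify, models, best_model_for):
    """Build a composite summarizer.

    classify:       code -> intention category
    models:         model name -> summarization function (code -> comment)
    best_model_for: category -> model name, chosen on held-out labeled data
    """
    def summarize(code):
        category = classify(code)                  # step 1: infer the category
        model = models[best_model_for[category]]   # step 2: pick the best model
        return model(code)                         # step 3: emit its summary
    return summarize

# Toy stand-in components:
classify = lambda code: "how-to-use" if "def " in code else "what"
models = {
    "retrieval": lambda code: "retrieved comment for: " + code.splitlines()[0],
    "seq2seq": lambda code: "generated comment for: " + code.splitlines()[0],
}
best_model_for = {c: "seq2seq" if c in ("why", "property") else "retrieval"
                  for c in CATEGORIES}

summarizer = make_composite(classify, models, best_model_for)
summarizer("def foo():\n    pass")  # -> "retrieved comment for: def foo():"
```

The design choice worth noting is that the routing table (`best_model_for`) is learned offline from labeled validation data, so adding a new summarization model only requires re-measuring per-category performance, not retraining the classifier.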




    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 30, Issue 2
    Continuous Special Section: AI and SE
    April 2021, 463 pages
    ISSN: 1049-331X
    EISSN: 1557-7392
    DOI: 10.1145/3446657
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 February 2021
    Accepted: 01 November 2020
    Revised: 01 November 2020
    Received: 01 May 2020
    Published in TOSEM Volume 30, Issue 2


    Author Tags

    1. Code summarization
    2. code comment
    3. comment classification

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council's Discovery Early Career Researcher Award (DECRA)
    • National Key R&D Program of China
    • NSFC Program


    Cited By

    • (2024) Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey. ACM Transactions on Software Engineering and Methodology 33, 6 (2024), 1-31. DOI: 10.1145/3664810
    • (2024) Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed. ACM Transactions on Software Engineering and Methodology 33, 6 (2024), 1-35. DOI: 10.1145/3652156
    • (2024) MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 74-86. DOI: 10.1145/3643916.3644401
    • (2024) An Extractive-and-Abstractive Framework for Source Code Summarization. ACM Transactions on Software Engineering and Methodology 33, 3 (2024), 1-39. DOI: 10.1145/3632742
    • (2024) Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization. ACM Transactions on Software Engineering and Methodology 33, 3 (2024), 1-37. DOI: 10.1145/3631975
    • (2024) Esale: Enhancing Code-Summary Alignment Learning for Source Code Summarization. IEEE Transactions on Software Engineering 50, 8 (2024), 2077-2095. DOI: 10.1109/TSE.2024.3422274
    • (2024) On the Effectiveness of Large Language Models in Statement-level Code Summarization. In Proceedings of the 24th IEEE International Conference on Software Quality, Reliability and Security (QRS), 216-227. DOI: 10.1109/QRS62785.2024.00030
    • (2024) iiPCS: Intent-Based In-Context Learning for Project-Specific Code Summarization. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10650776
    • (2024) Enhancing source code classification effectiveness via prompt learning incorporating knowledge features. Scientific Reports 14, 1 (2024). DOI: 10.1038/s41598-024-69402-7
    • (2024) A review of automatic source code summarization. Empirical Software Engineering 29, 6 (2024). DOI: 10.1007/s10664-024-10553-6
