
Snippet Comment Generation Based on Code Context Expansion

Published: 23 November 2023

Abstract

Code comments play an important role in program comprehension, and automatic comment generation helps improve software maintenance efficiency. The comments that annotate a method are mainly of two kinds: header comments and snippet comments. A header comment describes the functionality of the entire method and therefore appears as a general comment at the beginning of the method. Snippet comments annotate individual code segments in the body of a method, where each such segment is called a code snippet. Both kinds help developers quickly understand code semantics, thereby improving code readability and maintainability. However, existing automatic comment generation models focus mainly on header comments, because public datasets are available to validate their performance. By contrast, collecting datasets for snippet comments is challenging, because the scope of a snippet comment is difficult to determine. Even worse, code snippets are often too short to capture complete syntactic and semantic information. To address these challenges, we propose a novel Snippet Comment Generation approach called SCGen. First, we utilize the context of the code snippet to expand its syntactic and semantic information. Specifically, we collect 600,243 snippet code-comment pairs from 959 Java projects. Then, we capture the variables in each code snippet and extract the variable-related statements from its context. After that, we devise an algorithm to parse and traverse the abstract syntax tree (AST) of each code snippet and its corresponding context. Finally, SCGen generates snippet comments by feeding the source code snippet and the corresponding AST information into a sequence-to-sequence model. We conducted extensive experiments on the collected dataset to evaluate SCGen. Our approach obtains 18.23 in BLEU-4, 18.83 in METEOR, and 23.65 in ROUGE-L, which outperforms state-of-the-art comment generation models.
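The context-expansion step (capturing variables from a snippet and extracting variable-related statements from the surrounding method) can be illustrated with a minimal Python sketch. This is not SCGen's implementation, which the abstract does not specify: the function names `identifiers` and `expand_context`, the regex-based tokenization, and the abbreviated keyword list are all assumptions made for illustration. A faithful version would work on a parsed AST rather than on raw text.

```python
import re

# Hypothetical, abbreviated keyword list (assumption); a real Java lexer
# would cover the full set of reserved words and type names.
JAVA_KEYWORDS = {
    "int", "long", "double", "float", "boolean", "char", "byte", "short",
    "void", "new", "return", "if", "else", "for", "while", "do", "switch",
    "case", "break", "continue", "null", "true", "false", "this", "final",
    "String",
}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def identifiers(statement):
    """Return the set of non-keyword identifiers appearing in one statement."""
    return {tok for tok in IDENTIFIER.findall(statement) if tok not in JAVA_KEYWORDS}


def expand_context(method_lines, start, end):
    """Collect statements outside [start, end) that mention a snippet identifier.

    method_lines: statements of the enclosing method, one per list entry.
    start, end:   half-open index range of the snippet within method_lines.
    """
    snippet_vars = set()
    for line in method_lines[start:end]:
        snippet_vars |= identifiers(line)

    related = []
    for i, line in enumerate(method_lines):
        if start <= i < end:
            continue  # the snippet itself is not part of its own context
        if identifiers(line) & snippet_vars:
            related.append(line)
    return related


if __name__ == "__main__":
    method = [
        "int total = 0;",
        "String label = name.trim();",
        "for (int v : values) {",
        "total += v;",
        "}",
        "return total;",
    ]
    # The snippet is the loop on lines 2..4; the context statements that
    # share an identifier with it are the declaration and the return.
    print(expand_context(method, 2, 5))
```

Because the sketch matches raw identifiers, it also picks up method names and words inside string literals; restricting the match to declared variables would require the AST information the abstract mentions.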

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1
January 2024
933 pages
EISSN: 1557-7392
DOI: 10.1145/3613536
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 November 2023
Online AM: 31 July 2023
Accepted: 03 July 2023
Revised: 22 May 2023
Received: 22 August 2022
Published in TOSEM Volume 33, Issue 1

Author Tags

  1. Snippet comment generation
  2. code summarization
  3. neural machine translation
  4. contextual information

Qualifiers

  • Research-article

Funding Sources

  • Key-Area Research and Development Program of Guangdong Province
  • National Natural Science Foundation of China
  • Guangdong Basic and Applied Basic Research Foundation
