Research Article

Mutual-Assistance Learning for Object Detection

Published: 27 September 2023

Abstract

Object detection is a fundamental yet challenging task in computer vision. Despite the great strides made in recent years, modern detectors may still produce unsatisfactory results owing to factors such as non-universal object features and a single regression manner. In this paper, we draw on the idea of mutual-assistance (MA) learning and accordingly propose a robust one-stage detector, referred to as MADet, to address these weaknesses. First, the spirit of MA is manifested in the head design of the detector: decoupled classification and regression features are reintegrated to provide shared offsets, avoiding the inconsistency between feature-prediction pairs induced by zero or erroneous offsets. Second, the spirit of MA is captured in the optimization paradigm of the detector: anchor-based and anchor-free regression fashions are utilized jointly to boost the capability to retrieve objects with varied characteristics, such as large aspect ratios or occlusion by similar-sized objects. Furthermore, we meticulously devise a quality assessment mechanism to facilitate adaptive sample selection and loss term reweighting. Extensive experiments on standard benchmarks verify the effectiveness of our approach: on MS-COCO, MADet achieves 42.5% AP with a vanilla ResNet50 backbone, dramatically surpassing multiple strong baselines and setting a new state of the art.
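The joint use of anchor-based and anchor-free regression described in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's actual implementation: the function names, the single-location greedy selection, and the use of IoU as the quality score are illustrative assumptions; MADet's real quality assessment mechanism and head architecture are more involved. The sketch only shows the core idea that two regression fashions decode candidate boxes at the same location, and a quality score picks which branch to trust (and could reweight its loss term).

```python
import numpy as np

# Anchor-based decoding in the Faster R-CNN style: a box is regressed as
# offsets (dx, dy, dw, dh) relative to a reference anchor box.
def decode_anchor_based(anchor, deltas):
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx, cy = ax + deltas[0] * aw, ay + deltas[1] * ah
    w, h = aw * np.exp(deltas[2]), ah * np.exp(deltas[3])
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

# Anchor-free decoding in the FCOS style: a box is regressed as distances
# (left, top, right, bottom) from a feature-map point.
def decode_anchor_free(point, dists):
    (px, py), (l, t, r, b) = point, dists
    return np.array([px - l, py - t, px + r, py + b])

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def mutual_assistance_select(anchor, deltas, point, dists, gt_box):
    """Decode both regression fashions at one location, score each against
    the ground truth, and keep the higher-quality branch; the returned
    quality score could then reweight that branch's loss term."""
    ab_box = decode_anchor_based(anchor, deltas)
    af_box = decode_anchor_free(point, dists)
    q_ab, q_af = iou(ab_box, gt_box), iou(af_box, gt_box)
    if q_ab >= q_af:
        return ab_box, q_ab, "anchor-based"
    return af_box, q_af, "anchor-free"
```

Intuitively, this is why the abstract highlights objects with large aspect ratios: when no preset anchor shape matches an elongated object, the anchor-free branch can still produce a well-fitting box and wins the quality comparison, while well-matched anchors let the anchor-based branch dominate elsewhere.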


Cited By

  • "Understanding Negative Proposals in Generic Few-Shot Object Detection," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 5818–5829, Jul. 2024, doi: 10.1109/TCSVT.2024.3367666.
  • "Retentive Compensation and Personality Filtering for Few-Shot Remote Sensing Object Detection," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 5805–5817, Jul. 2024, doi: 10.1109/TCSVT.2024.3367168.
  • "Cross-Level Attentive Feature Aggregation for Change Detection," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 6051–6062, Jul. 2024, doi: 10.1109/TCSVT.2023.3344092.
  • "SparseSwin," Neurocomputing, vol. 580, May 2024, doi: 10.1016/j.neucom.2024.127433.
  • "Smelly, dense, and spreaded," Expert Syst. Appl., vol. 255, part B, Dec. 2024, doi: 10.1016/j.eswa.2024.124576.
  • "Oriented R-CNN and Beyond," Int. J. Comput. Vis., vol. 132, no. 7, pp. 2420–2442, Jul. 2024, doi: 10.1007/s11263-024-01989-w.
  • "Boosting Knowledge Distillation via Intra-Class Logit Distribution Smoothing," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 6, pp. 4190–4201, Oct. 2023, doi: 10.1109/TCSVT.2023.3327113.

Published In
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 12
        Dec. 2023
        1966 pages

        Publisher

        IEEE Computer Society

        United States
