Skip to main content

Showing 1–50 of 238 results for author: Bai, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2410.10312  [pdf, other

    cs.IT

    Achievable Second-Order Asymptotics for MAC and RAC with Additive Non-Gaussian Noise

    Authors: Yiming Wang, Lin Bai, Zhuangfei Wu, Lin Zhou

    Abstract: We first study the two-user additive noise multiple access channel (MAC) where the noise distribution is arbitrary. For such a MAC, we use spherical codebooks and either joint nearest neighbor (JNN) or successive interference cancellation (SIC) decoding. Under both decoding methods, we derive second-order achievable rate regions and compare the finite blocklength performance between JNN and SIC de… ▽ More

    Submitted 14 October, 2024; originally announced October 2024.

  2. arXiv:2410.09420  [pdf, ps, other

    math.OC cs.LG math.NA

    Anderson Acceleration in Nonsmooth Problems: Local Convergence via Active Manifold Identification

    Authors: Kexin Li, Luwei Bai, Xiao Wang, Hao Wang

    Abstract: Anderson acceleration is an effective technique for enhancing the efficiency of fixed-point iterations; however, analyzing its convergence in nonsmooth settings presents significant challenges. In this paper, we investigate a class of nonsmooth optimization algorithms characterized by the active manifold identification property. This class includes a diverse array of methods such as the proximal p… ▽ More

    Submitted 15 October, 2024; v1 submitted 12 October, 2024; originally announced October 2024.

  3. arXiv:2410.09111  [pdf, other

    physics.ao-ph cs.AI cs.LG

    IceDiff: High Resolution and High-Quality Sea Ice Forecasting with Generative Diffusion Prior

    Authors: Jingyi Xu, Siwei Tu, Weidong Yang, Shuhao Li, Keyi Liu, Yeqi Luo, Lipeng Ma, Ben Fei, Lei Bai

    Abstract: Variation of Arctic sea ice has significant impacts on polar ecosystems, transporting routes, coastal communities, and global climate. Tracing the change of sea ice at a finer scale is paramount for both operational applications and scientific studies. Recent pan-Arctic sea ice forecasting methods that leverage advances in artificial intelligence has made promising progress over numerical models.… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 9 pages, 4 figures

  4. arXiv:2410.08531  [pdf, other

    cs.CV

    Diffusion Models Need Visual Priors for Image Generation

    Authors: Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou

    Abstract: Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and limited conditional information. To address this issue, we propose Diffusion on Diffusion (DoD), an innovative multi-stage generation framework that first extract… ▽ More

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: Preprint

  5. arXiv:2410.07540  [pdf, other

    cs.CV

    CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

    Authors: Guankun Wang, Han Xiao, Huxin Gao, Renrui Zhang, Long Bai, Xiaoxiao Yang, Zhen Li, Hongsheng Li, Hongliang Ren

    Abstract: submucosal dissection (ESD) enables rapid resection of large lesions, minimizing recurrence rates and improving long-term overall survival. Despite these advantages, ESD is technically challenging and carries high risks of complications, necessitating skilled surgeons and precise instruments. Recent advancements in Large Visual-Language Models (LVLMs) offer promising decision support and predictiv… ▽ More

    Submitted 9 October, 2024; originally announced October 2024.

  6. arXiv:2410.05805  [pdf, other

    cs.CV cs.AI

    PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling

    Authors: Junchao Gong, Siwei Tu, Weidong Yang, Ben Fei, Kun Chen, Wenlong Zhang, Xiaokang Yang, Wanli Ouyang, Lei Bai

    Abstract: Precipitation nowcasting plays a pivotal role in socioeconomic sectors, especially in severe convective weather warnings. Although notable progress has been achieved by approaches mining the spatiotemporal correlations with deep learning, these methods still suffer severe blurriness as the lead time increases, which hampers accurate predictions for extreme precipitation. To alleviate blurriness, r… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

  7. arXiv:2410.04171  [pdf, other

    cs.CV cs.AI

    IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis

    Authors: Shitong Shao, Zikai Zhou, Lichen Bai, Haoyi Xiong, Zeke Xie

    Abstract: The multi-step sampling mechanism, a key feature of visual diffusion models, has significant potential to replicate the success of OpenAI's Strawberry in enhancing performance by increasing the inference computational cost. Sufficient prior studies have demonstrated that correctly scaling up computation in the sampling process can successfully lead to improved generation quality, enhanced image ed… ▽ More

    Submitted 7 October, 2024; v1 submitted 5 October, 2024; originally announced October 2024.

  8. arXiv:2409.16321  [pdf, other

    cs.AI cs.LG physics.ao-ph

    WeatherFormer: Empowering Global Numerical Weather Forecasting with Space-Time Transformer

    Authors: Junchao Gong, Tao Han, Kang Chen, Lei Bai

    Abstract: Numerical Weather Prediction (NWP) system is an infrastructure that exerts considerable impacts on modern society.Traditional NWP system, however, resolves it by solving complex partial differential equations with a huge computing cluster, resulting in tons of carbon emission. Exploring efficient and eco-friendly solutions for NWP attracts interest from Artificial Intelligence (AI) and earth scien… ▽ More

    Submitted 21 September, 2024; originally announced September 2024.

  9. arXiv:2409.12467  [pdf, other

    cs.CV cs.AI cs.LG

    SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference

    Authors: Zhen Chen, Xingjian Luo, Jinlin Wu, Long Bai, Zhen Lei, Hongliang Ren, Sebastien Ourselin, Hongbin Liu

    Abstract: Surgical phase recognition is critical for assisting surgeons in understanding surgical videos. Existing studies focused more on online surgical phase recognition, by leveraging preceding frames to predict the current frame. Despite great progress, they formulated the task as a series of frame-wise classification, which resulted in a lack of global context of the entire procedure and incoherent pr… ▽ More

    Submitted 19 September, 2024; originally announced September 2024.

  10. arXiv:2409.07253  [pdf, other

    cs.LG cs.CV

    Alignment of Diffusion Models: Fundamentals, Challenges, and Future

    Authors: Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie

    Abstract: Diffusion models have emerged as the leading paradigm in generative modeling, excelling in various applications. Despite their success, these models often misalign with human intentions, generating outputs that may not match text prompts or possess desired properties. Inspired by the success of alignment in tuning large language models, recent studies have investigated aligning diffusion models wi… ▽ More

    Submitted 12 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 35 pages, 5 figures, 3 tables

  11. GOPT: Generalizable Online 3D Bin Packing via Transformer-based Deep Reinforcement Learning

    Authors: Heng Xiong, Changrong Guo, Jian Peng, Kai Ding, Wenjie Chen, Xuchong Qiu, Long Bai, Jianfeng Xu

    Abstract: Robotic object packing has broad practical applications in the logistics and automation industry, often formulated by researchers as the online 3D Bin Packing Problem (3D-BPP). However, existing DRL-based methods primarily focus on enhancing performance in limited packing environments while neglecting the ability to generalize across multiple environments characterized by different bin dimensions.… ▽ More

    Submitted 12 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 8 pages, 6 figures. This paper has been accepted by IEEE Robotics and Automation Letters

  12. arXiv:2409.01392  [pdf, other

    cs.CL cs.AI cs.CV

    GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI

    Authors: Xiangyuan Xue, Zeyu Lu, Di Huang, Wanli Ouyang, Lei Bai

    Abstract: Much previous AI research has focused on developing monolithic models to maximize their intelligence and capability, with the primary goal of enhancing performance on specific tasks. In contrast, this paper explores an alternative approach: collaborative AI systems that use workflows to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  13. arXiv:2408.11438  [pdf, other

    cs.LG cs.CV physics.ao-ph

    DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation

    Authors: Wuxin Wang, Weicheng Ni, Tao Han, Lei Bai, Boheng Duan, Kaijun Ren

    Abstract: Recent advancements in deep learning (DL) have led to the development of several Large Weather Models (LWMs) that rival state-of-the-art (SOTA) numerical weather prediction (NWP) systems. Up to now, these models still rely on traditional NWP-generated analysis fields as input and are far from being an autonomous system. While researchers are exploring data-driven data assimilation (DA) models to g… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 37pages, 12 figures, 6 tables

  14. arXiv:2408.10854  [pdf, other

    physics.ao-ph cs.AI cs.CV

    MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities. Downscaling (DS), a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions from global-scale forecast results. Previous downscaling methods, inspired by CN… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.06629  [pdf, other

    cs.CV

    Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning

    Authors: Tianning Zhang, Feng Liu, Yuming Yuan, Rui Su, Wanli Ouyang, Lei Bai

    Abstract: Existing EEW approaches often treat phase picking, location estimation, and magnitude estimation as separate tasks, lacking a unified framework. Additionally, most deep learning models in seismology rely on full three-component waveforms and are not suitable for real-time streaming data. To address these limitations, we propose a novel unified seismic neural network called Fast Information Streami… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  16. arXiv:2408.04958  [pdf, other

    cs.CV cs.RO

    Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

    Authors: Long Bai, Guankun Wang, Mobarakol Islam, Lalithkumar Seenivasan, An Wang, Hongliang Ren

    Abstract: Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicat… ▽ More

    Submitted 1 September, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by Information Fusion. Code and data availability: https://github.com/longbai1006/Surgical-VQLAPlus

  17. arXiv:2408.04593  [pdf, other

    cs.CV cs.RO eess.IV

    SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

    Authors: Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren

    Abstract: The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-sh… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Empirical study. Previous work "SAM Meets Robotic Surgery" is accessible at: arXiv:2308.07156

  18. arXiv:2408.04426  [pdf, other

    cs.CV cs.RO

    A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery

    Authors: Mengya Xu, Ziqi Guo, An Wang, Long Bai, Hongliang Ren

    Abstract: As a crucial and intricate task in robotic minimally invasive surgery, reconstructing surgical scenes using stereo or monocular endoscopic video holds immense potential for clinical applications. NeRF-based techniques have recently garnered attention for the ability to reconstruct scenes implicitly. On the other hand, Gaussian splatting-based 3D-GS represents scenes explicitly using 3D Gaussians a… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in MICCAI 2024 EARTH Workshop. Code availability: https://github.com/Epsilon404/surgicalnerf

  19. arXiv:2408.03877  [pdf, other

    cs.LG cs.AI

    Knowledge Probing for Graph Representation Learning

    Authors: Mingyu Zhao, Xingyu Huang, Ziyu Lyu, Yanlin Wang, Lixin Cui, Lu Bai

    Abstract: Graph learning methods have been extensively applied in diverse application areas. However, what kind of inherent graph properties e.g. graph proximity, graph structural information has been encoded into graph representation learning for downstream tasks is still under-explored. In this paper, we propose a novel graph probing framework (GraphProbe) to investigate and interpret whether the family o… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

  20. arXiv:2407.20213  [pdf, other

    cs.RO cs.CV

    Registering Neural 4D Gaussians for Endoscopic Surgery

    Authors: Yiming Huang, Beilei Cui, Ikemura Kei, Jiekai Zhang, Long Bai, Hongliang Ren

    Abstract: The recent advance in neural rendering has enabled the ability to reconstruct high-quality 4D scenes using neural networks. Although 4D neural reconstruction is popular, registration for such representations remains a challenging task, especially for dynamic scene registration in surgical planning and simulation. In this paper, we propose a novel strategy for dynamic surgical neural scene registra… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  21. arXiv:2407.19435  [pdf, other

    cs.CV cs.AI cs.CL cs.HC cs.RO

    ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding

    Authors: Zhen Chen, Zongming Zhang, Wenwu Guo, Xingjian Luo, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu

    Abstract: Surgical instrument segmentation is crucial in surgical scene understanding, thereby facilitating surgical safety. Existing algorithms directly detected all instruments of pre-defined categories in the input image, lacking the capability to segment specific instruments according to the surgeon's intention. During different stages of surgery, surgeons exhibit varying preferences and focus toward di… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This work is accepted by IROS 2024 (Oral)

  22. arXiv:2407.14041  [pdf, other

    cs.CV

    Not All Noises Are Created Equally:Diffusion Noise Selection and Optimization

    Authors: Zipeng Qi, Lichen Bai, Haoyi Xiong, Zeke Xie

    Abstract: Diffusion models that can generate high-quality data from randomly sampled Gaussian noises have become the mainstream generative method in both academia and industry. Are randomly sampled Gaussian noises equally good for diffusion models? While a large body of works tried to understand and improve diffusion models, previous works overlooked the possibility to select or optimize the sampled noise t… ▽ More

    Submitted 27 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  23. arXiv:2407.12592  [pdf, other

    cs.CV

    VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: In the context of global climate change and frequent extreme weather events, forecasting future geospatial vegetation states under these conditions is of significant importance. The vegetation change process is influenced by the complex interplay between dynamic meteorological variables and static environmental variables, leading to high levels of uncertainty. Existing deterministic methods are in… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  24. arXiv:2407.10047  [pdf, other

    cs.CV

    HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

    Authors: Chengjie Jiang, Xiaowen Liu, Bowen Zheng, Lu Bai, Jing Li

    Abstract: Infrared and visible image fusion has been developed from vision perception oriented fusion methods to strategies which both consider the vision perception and high-level vision task. However, the existing task-driven methods fail to address the domain gap between semantic and geometric representation. To overcome these issues, we propose a high-level vision task-driven infrared and visible image… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  25. arXiv:2407.08418  [pdf, other

    cs.LG cs.CV

    PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

    Authors: ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and… ▽ More

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  26. arXiv:2407.06317  [pdf, other

    cs.AI cs.CV cs.RO

    Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

    Authors: Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang, Haifeng Lin

    Abstract: With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based… ▽ More

    Submitted 17 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  27. arXiv:2407.02816  [pdf, other

    cs.IT eess.SP math.ST

    Large and Small Deviations for Statistical Sequence Matching

    Authors: Lin Zhou, Qianyun Wang, Jingjing Wang, Lin Bai, Alfred O. Hero

    Abstract: We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequ… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Extended version of ISIT paper

  28. arXiv:2406.14399  [pdf, other

    cs.LG cs.CV physics.ao-ph stat.ML

    How far are today's time-series models from real-world weather forecasting applications?

    Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

    Abstract: The development of Time-Series Forecasting (TSF) techniques is often hindered by the lack of comprehensive datasets. This is particularly problematic for time-series weather forecasting, where commonly used datasets suffer from significant limitations such as small size, limited temporal coverage, and sparse spatial distribution. These constraints severely impede the optimization and evaluation of… ▽ More

    Submitted 11 October, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 29 pages, 14 figures

  29. arXiv:2406.14191  [pdf, other

    cs.CL cs.AI cs.LG

    Temporal Knowledge Graph Question Answering: A Survey

    Authors: Miao Su, Zixuan Li, Zhuo Chen, Long Bai, Xiaolong Jin, Jiafeng Guo

    Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic… ▽ More

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  30. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  31. arXiv:2406.12754  [pdf, other

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  32. arXiv:2406.10508  [pdf, other

    cs.CV

    Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis

    Authors: Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren

    Abstract: Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model fo… ▽ More

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in ICBIR 2024

  33. arXiv:2406.01645  [pdf, other

    cs.LG cs.AI

    FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

    Authors: Kun Chen, Tao Chen, Peng Ye, Hao Chen, Kang Chen, Tao Han, Wanli Ouyang, Lei Bai

    Abstract: Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However,… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  34. arXiv:2405.17790  [pdf, other

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  35. arXiv:2405.15412  [pdf, other

    physics.ao-ph cs.AI cs.LG

    ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

    Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical O… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  36. arXiv:2405.15151  [pdf, other

    cs.CV cs.GR cs.RO

    NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes

    Authors: Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Weijian Liang

    Abstract: Neural implicit representations have recently demonstrated considerable potential in the field of visual simultaneous localization and mapping (SLAM). This is due to their inherent advantages, including low storage overhead and representation continuity. However, these methods necessitate the size of the scene as input, which is impractical for unknown scenes. Consequently, we propose NeB-SLAM, a… ▽ More

    Submitted 7 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.14742  [pdf, other

    cs.LG cs.AI

    HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning

    Authors: Zhuo Xu, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Edwin R. Hancock

    Abstract: Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE), that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing the hard node assignment to decompose a sample graph into a family of separated subgraphs. We compre… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.13796  [pdf, other

    cs.LG cs.AI

    Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

    Authors: Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets preven… ▽ More

    Submitted 29 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  39. arXiv:2405.13711  [pdf, other

    cs.LG cs.AI math.DS physics.ao-ph

    VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation

    Authors: Yi Xiao, Qilong Jia, Wei Xue, Lei Bai

    Abstract: Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data. Variational assimilation methods rely on the maximum likelihood approach to formulate a variational cost, with the optimal state estimate derived by minimizing this cost. Although traditional variational meth… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  40. arXiv:2405.10948  [pdf, other

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni… ▽ More

    Submitted 22 March, 2024; originally announced May 2024.

  41. arXiv:2405.10550  [pdf, other

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  42. arXiv:2405.10218  [pdf, other

    cs.LG cs.AI

    ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks

    Authors: Zhehan Zhao, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Lixiang Xu, Edwin R. Hancock

    Abstract: Graph Neural Networks (GNNs) are powerful tools for graph classification. One important operation for GNNs is the downsampling or pooling that can learn effective embeddings from the node representations. In this paper, we propose a new hierarchical pooling operation, namely the Edge-Node Attention-based Differentiable Pooling (ENADPool), for GNNs to learn effective graph representations. Unlike t… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  43. arXiv:2405.08672  [pdf, other

    eess.IV cs.CV

    EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

    Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: early accepted by MICCAI 2024

  44. arXiv:2405.03376  [pdf, other

    cs.LG cs.CV

    CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

    Authors: Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai

    Abstract: The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI… ▽ More

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Main text and supplementary, 22 pages, 13 figures

  45. arXiv:2405.00216  [pdf, other

    cs.CL cs.AI cs.LG

    Graphical Reasoning: LLM-based Semi-Open Relation Extraction

    Authors: Yicheng Tao, Yiqun Wang, Longju Bai

    Abstract: This paper presents a comprehensive exploration of relation extraction utilizing advanced language models, specifically Chain of Thought (CoT) and Graphical Reasoning (GRE) techniques. We demonstrate how leveraging in-context learning with GPT-3.5 can significantly enhance the extraction process, particularly through detailed example-based reasoning. Additionally, we introduce a novel graphical re… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  46. arXiv:2404.18343  [pdf, other

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  47. arXiv:2404.02668  [pdf, other

    cs.CV

    RS-Mamba for Large Remote Sensing Image Dense Prediction

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional… ▽ More

    Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages,8 figures

  48. arXiv:2404.01767  [pdf, other

    cs.CL

    Class-Incremental Few-Shot Event Detection

    Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called c… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  49. arXiv:2404.01695  [pdf, other

    cs.LG

    Selective Temporal Knowledge Graph Reasoning

    Authors: Zhongni Hou, Xiaolong Jin, Zixuan Li, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Temporal Knowledge Graph (TKG), which characterizes temporally evolving facts in the form of (subject, relation, object, timestamp), has attracted much attention recently. TKG reasoning aims to predict future facts based on given historical ones. However, existing TKG reasoning models are unable to abstain from predictions they are uncertain, which will inevitably bring risks in real-world applica… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  50. arXiv:2403.16407  [pdf, other

    cs.CV

    A Survey on Long Video Generation: Challenges, Methods, and Prospects

    Authors: Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai

    Abstract: Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications. One critical aspect of this field is the generation of long-duration videos, which presents unique challenges and opportunities. This paper presents the first survey of recent advancements in long video generation and summarises them into two key paradigms: divide and conq… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.