Showing 1–50 of 2,124 results for author: Kim, S

Searching in archive cs.
  1. arXiv:2410.13765  [pdf, other]

    cs.CL cs.IR

    Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval

    Authors: Yu Xia, Junda Wu, Sungchul Kim, Tong Yu, Ryan A. Rossi, Haoliang Wang, Julian McAuley

    Abstract: Large language models (LLMs) have been used to generate query expansions augmenting original queries for improving information search. Recent studies also explore providing LLMs with initial retrieval results to generate query expansions more grounded to document corpus. However, these methods mostly focus on enhancing textual similarities between search queries and target documents, overlooking d…

    Submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2410.13685  [pdf, other]

    cs.CV

    Label-free prediction of fluorescence markers in bovine satellite cells using deep learning

    Authors: Sania Sinha, Aarham Wasit, Won Seob Kim, Jongkyoo Kim, Jiyoon Yi

    Abstract: Assessing the quality of bovine satellite cells (BSCs) is essential for the cultivated meat industry, which aims to address global food sustainability challenges. This study aims to develop a label-free method for predicting fluorescence markers in isolated BSCs using deep learning. We employed a U-Net-based CNN model to predict multiple fluorescence signals from a single bright-field microscopy i…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 11 pages, 4 figures

  3. arXiv:2410.13250  [pdf]

    cs.HC cs.AI cs.CY

    Perceptions of Discriminatory Decisions of Artificial Intelligence: Unpacking the Role of Individual Characteristics

    Authors: Soojong Kim

    Abstract: This study investigates how personal differences (digital self-efficacy, technical knowledge, belief in equality, political ideology) and demographic factors (age, education, and income) are associated with perceptions of artificial intelligence (AI) outcomes exhibiting gender and racial bias and with general attitudes towards AI. Analyses of a large-scale experiment dataset (N = 1,206) indicate t…

    Submitted 17 October, 2024; originally announced October 2024.

  4. arXiv:2410.13232  [pdf, other]

    cs.CL

    Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

    Authors: Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, Jinyoung Yeo

    Abstract: Large language models (LLMs) have recently gained much attention in building autonomous agents. However, the performance of current LLM-based web agents in long-horizon tasks is far from optimal, often yielding errors such as repeatedly buying a non-refundable flight ticket. By contrast, humans can avoid such an irreversible mistake, as we have an awareness of the potential outcomes (e.g., losing…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Work in progress

  5. arXiv:2410.12692  [pdf, other]

    cs.CV cs.LG

    Machine Learning Approach to Brain Tumor Detection and Classification

    Authors: Alice Oh, Inyoung Noh, Jian Choo, Jihoo Lee, Justin Park, Kate Hwang, Sanghyeon Kim, Soo Min Oh

    Abstract: Brain tumor detection and classification are critical tasks in medical image analysis, particularly in early-stage diagnosis, where accurate and timely detection can significantly improve treatment outcomes. In this study, we apply various statistical and machine learning models to detect and classify brain tumors using brain MRI images. We explore a variety of statistical models including linear,…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 7 pages, 2 figures, 2 tables

  6. arXiv:2410.12268  [pdf, other]

    cs.HC

    VisAnatomy: An SVG Chart Corpus with Fine-Grained Semantic Labels

    Authors: Chen Chen, Hannah K. Bako, Peihong Yu, John Hooker, Jeffrey Joyal, Simon C. Wang, Samuel Kim, Jessica Wu, Aoxue Ding, Lara Sandeep, Alex Chen, Chayanika Sinha, Zhicheng Liu

    Abstract: Chart corpora, which comprise data visualizations and their semantic labels, are crucial for advancing visualization research. However, the labels in most existing chart corpora are high-level (e.g., chart types), hindering their utility for broader interactive applications like chart reuse, animation, and accessibility. In this paper, we contribute VisAnatomy, a chart corpus containing 942 real-w…

    Submitted 16 October, 2024; originally announced October 2024.

  7. arXiv:2410.11381  [pdf, other]

    cs.LG cs.AI cs.CL

    Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations

    Authors: Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang

    Abstract: The advent of the Attention mechanism and Transformer architecture enables contextually natural text generation and compresses the burden of processing entire source information into singular vectors. Based on these two main ideas, model sizes gradually increase to accommodate more precise and comprehensive information, leading to the current state-of-the-art LLMs being very large, with parameter…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 13 pages and 16 figures

    MSC Class: 68T50; ACM Class: I.2.7

  8. arXiv:2410.11338  [pdf, other]

    cs.LG cs.AI cs.RO

    DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation

    Authors: Jaehyun Park, Yunho Kim, Sejin Kim, Byung-Jun Lee, Sundong Kim

    Abstract: We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Preprint, under review. Comments welcome

  9. arXiv:2410.11324  [pdf, other]

    cs.AI cs.CV cs.LG

    Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task

    Authors: Yunho Kim, Jaehyun Park, Heejun Kim, Sejin Kim, Byung-Jun Lee, Sundong Kim

    Abstract: Effective long-term strategies enable AI systems to navigate complex environments by making sequential decisions over extended horizons. Similarly, reinforcement learning (RL) agents optimize decisions across sequences to maximize rewards, even without immediate feedback. To verify that Latent Diffusion-Constrained Q-learning (LDCQ), a prominent diffusion-based offline RL method, demonstrates stro…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Preprint, Under review. Comments welcome

  10. arXiv:2410.10166  [pdf, other]

    cs.LG cs.AI

    Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

    Authors: Yongjin Yang, Sihyeon Kim, Hojung Jung, Sangmin Bae, SangMook Kim, Se-Young Yun, Kimin Lee

    Abstract: Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size and noise present in human feedback datasets. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion mode…

    Submitted 14 October, 2024; originally announced October 2024.

  11. arXiv:2410.09908  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Retrieval Instead of Fine-tuning: A Retrieval-based Parameter Ensemble for Zero-shot Learning

    Authors: Pengfei Jin, Peng Shu, Sekeun Kim, Qing Xiao, Sifan Song, Cheng Chen, Tianming Liu, Xiang Li, Quanzheng Li

    Abstract: Foundation models have become a cornerstone in deep learning, with techniques like Low-Rank Adaptation (LoRA) offering efficient fine-tuning of large models. Similarly, methods such as Retrieval-Augmented Generation (RAG), which leverage vectorized databases, have further improved model performance by grounding outputs in external information. While these approaches have demonstrated notable succe…

    Submitted 13 October, 2024; originally announced October 2024.

  12. arXiv:2410.09771  [pdf, other]

    cs.CV

    Magnituder Layers for Implicit Neural Representations in 3D

    Authors: Sang Min Kim, Byeongchan Kim, Arijit Sehanobish, Krzysztof Choromanski, Dongseok Shim, Avinava Dubey, Min-hwan Oh

    Abstract: Improving the efficiency and performance of implicit neural representations in 3D, particularly Neural Radiance Fields (NeRF) and Signed Distance Fields (SDF) is crucial for enabling their use in real-time applications. These models, while capable of generating photo-realistic novel views and detailed 3D reconstructions, often suffer from high computational costs and slow inference times. To addre…

    Submitted 13 October, 2024; originally announced October 2024.

  13. arXiv:2410.09489  [pdf, other]

    cs.CL

    Towards Efficient Visual-Language Alignment of the Q-Former for Visual Reasoning Tasks

    Authors: Sungkyung Kim, Adam Lee, Junyoung Park, Andrew Chung, Jusang Oh, Jay-Yoon Lee

    Abstract: Recent advancements in large language models have demonstrated enhanced capabilities in visual reasoning tasks by employing additional encoders for aligning different modalities. While the Q-Former has been widely used as a general encoder for aligning several modalities including image, video, audio, and 3D with large language models, previous works on its efficient training and the analysis of i…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Findings

  14. arXiv:2410.08559  [pdf, other]

    cs.LG cs.AI

    Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive architecture

    Authors: Sehun Kim

    Abstract: We propose a self-supervised learning method for 12-lead Electrocardiogram (ECG) analysis, named ECG Joint Embedding Predictive Architecture (ECG-JEPA). ECG-JEPA employs a masking strategy to learn semantic representations of ECG data. Unlike existing methods, ECG-JEPA predicts at the hidden representation level rather than reconstructing raw data. This approach offers several advantages in the EC…

    Submitted 11 October, 2024; originally announced October 2024.

  15. arXiv:2410.07866  [pdf, ps, other]

    cs.AI

    System-2 Reasoning via Generality and Adaptation

    Authors: Sejin Kim, Sundong Kim

    Abstract: While significant progress has been made in task-specific applications, current models struggle with deep reasoning, generality, and adaptation -- key components of System-2 reasoning that are crucial for achieving Artificial General Intelligence (AGI). Despite the promise of approaches such as program synthesis, language models, and transformers, these methods often fail to generalize beyond thei…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024 Workshop on System 2 Reasoning At Scale

  16. arXiv:2410.07663  [pdf, other]

    eess.IV cs.CV

    TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution

    Authors: Sohwi Kim, Tae-Kyun Kim

    Abstract: Super-resolution methods are increasingly being specialized for both real-world and face-specific tasks. However, many existing approaches rely on simplistic degradation models, which limits their ability to handle complex and unknown degradation patterns effectively. While diffusion-based super-resolution techniques have recently shown impressive results, they are still constrained by the need fo…

    Submitted 10 October, 2024; originally announced October 2024.

  17. arXiv:2410.06523  [pdf, other]

    hep-th cond-mat.dis-nn cond-mat.supr-con cs.AI

    Phase Diagram from Nonlinear Interaction between Superconducting Order and Density: Toward Data-Based Holographic Superconductor

    Authors: Sejin Kim, Kyung Kiu Kim, Yunseok Seo

    Abstract: We address an inverse problem in modeling holographic superconductors. We focus our research on the critical temperature behavior depicted by experiments. We use a physics-informed neural network method to find a mass function $M(F^2)$, which is necessary to understand phase transition behavior. This mass function describes a nonlinear interaction between superconducting order and charge carrier d…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 22 pages, 20 figures

  18. arXiv:2410.06504  [pdf, other]

    cs.IT eess.SP

    Transformer-assisted Parametric CSI Feedback for mmWave Massive MIMO Systems

    Authors: Hyungyu Ju, Seokhyun Jeong, Seungnyun Kim, Byungju Lee, Byonghyo Shim

    Abstract: As a key technology to meet the ever-increasing data rate demand in beyond 5G and 6G communications, millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems have gained much attention recently. To make the most of mmWave massive MIMO systems, acquisition of accurate channel state information (CSI) at the base station (BS) is crucial. However, this task is by no means easy due…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 14 pages, 13 figures, accepted to IEEE Transactions on Wireless Communications

  19. Viscoelasticity Estimation of Sports Prosthesis by Energy-minimizing Inverse Kinematics and Its Validation by Forward Dynamics

    Authors: Yuta Shimane, Taiki Ishigaki, Sunghee Kim, Ko Yamamoto

    Abstract: In this study, we present a method for estimating the viscoelasticity of a leaf-spring sports prosthesis using advanced energy minimizing inverse kinematics based on the Piece-wise Constant Strain (PCS) model to reconstruct the three-dimensional dynamic behavior. Dynamic motion analysis of the athlete and prosthesis is important to clarify the effect of prosthesis characteristics on foot function.…

    Submitted 8 October, 2024; originally announced October 2024.

    Journal ref: Advanced Robotics, 2024, 1-13

  20. arXiv:2410.05560  [pdf]

    cs.CR

    Cyber Threats to Canadian Federal Election: Emerging Threats, Assessment, and Mitigation Strategies

    Authors: Nazmul Islam, Soomin Kim, Mohammad Pirooz, Sasha Shvetsov

    Abstract: As Canada prepares for the 2025 federal election, ensuring the integrity and security of the electoral process against cyber threats is crucial. Recent foreign interference in elections globally highlights the increasing sophistication of adversaries in exploiting technical and human vulnerabilities. Such vulnerabilities also exist in Canada's electoral system that relies on a complex network of IT…

    Submitted 7 October, 2024; originally announced October 2024.

  21. arXiv:2410.04771  [pdf, ps, other]

    cs.FL

    On the Complexity of Computing the Co-lexicographic Width of a Regular Language

    Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Tomasz Kociumaka, Bojana Kodric, Alberto Policriti, Nicola Prezza

    Abstract: Co-lex partial orders were recently introduced in (Cotumaccio et al., SODA 2021 and JACM 2023) as a powerful tool to index finite state automata, with applications to regular expression matching. They generalize Wheeler orders (Gagie et al., Theoretical Computer Science 2017) and naturally reflect the co-lexicographic order of the strings labeling source-to-node paths in the automaton. Briefly, th…

    Submitted 7 October, 2024; originally announced October 2024.

  22. arXiv:2410.04749  [pdf, other]

    cs.CV

    LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies

    Authors: Ameer Hamza, Abdullah, Yong Hyun Ahn, Sungyoung Lee, Seong Tae Kim

    Abstract: Generating Natural Language Explanations (NLEs) for model predictions on medical images, particularly those depicting thoracic pathologies, remains a critical and challenging task. Existing methodologies often struggle due to general models' insufficient domain-specific medical knowledge and privacy concerns associated with retrieval-based augmentation techniques. To address these issues, we propo…

    Submitted 7 October, 2024; originally announced October 2024.

  23. arXiv:2410.04690  [pdf, other]

    eess.AS cs.LG

    SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

    Authors: Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

    Abstract: We present SegINR, a novel approach to neural Text-to-Speech (TTS) that addresses sequence alignment without relying on an auxiliary duration predictor and complex autoregressive (AR) or non-autoregressive (NAR) frame-level sequence modeling. SegINR simplifies the process by converting text sequences directly into frame-level features. It leverages an optimal text encoder to extract embeddings, tr…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  24. arXiv:2410.04364  [pdf, other]

    cs.CV cs.AI cs.LG

    VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

    Authors: Dohun Lee, Bryan S Kim, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-to-image (T2I) diffusion models have revolutionized visual content creation, but extending these capabilities to text-to-video (T2V) generation remains a challenge, particularly in preserving temporal consistency. Existing methods that aim to improve consistency often cause trade-offs such as reduced imaging quality and impractical computational time. To address these issues we introduce Vide…

    Submitted 8 October, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: 24 pages, 14 figures, Project Page: https://dohunlee1.github.io/videoguide.github.io/

  25. arXiv:2410.03741  [pdf, other]

    cs.HC cs.AI

    Towards Democratization of Subspeciality Medical Expertise

    Authors: Jack W. O'Sullivan, Anil Palepu, Khaled Saab, Wei-Hung Weng, Yong Cheng, Emily Chu, Yaanik Desai, Aly Elezaby, Daniel Seung Kim, Roy Lan, Wilson Tang, Natalie Tapaskar, Victoria Parikh, Sneha S. Jain, Kavita Kulkarni, Philip Mansfield, Dale Webster, Juraj Gottweis, Joelle Barral, Mike Schaekermann, Ryutaro Tanno, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Euan Ashley, et al. (1 additional author not shown)

    Abstract: The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI syst…

    Submitted 1 October, 2024; originally announced October 2024.

  26. arXiv:2410.03376  [pdf, other]

    cs.LG cs.AI

    Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization

    Authors: Tung M. Luu, Thanh Nguyen, Tee Joshua Tian Jin, Sungwoon Kim, Chang D. Yoo

    Abstract: Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works focus on developing robust training-based procedures to tackle this problem, including enhancing the robustness of the de…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 8 pages, IROS 2024 (Code: https://github.com/tunglm2203/vq_robust_rl)

  27. arXiv:2410.03355  [pdf, other]

    cs.CV cs.AI

    LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

    Authors: Doohyuk Jang, Sihwan Park, June Yong Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang

    Abstract: Auto-Regressive (AR) models have recently gained prominence in image generation, often matching or even surpassing the performance of diffusion models. However, one major limitation of AR models is their sequential nature, which processes tokens one at a time, slowing down generation compared to models like GANs or diffusion-based methods that operate more efficiently. While speculative decoding h…

    Submitted 4 October, 2024; originally announced October 2024.

  28. arXiv:2410.03143  [pdf, other]

    eess.IV cs.CV cs.LG

    ECHOPulse: ECG controlled echocardio-grams video generation

    Authors: Yiwei Li, Sekeun Kim, Zihao Wu, Hanqi Jiang, Yi Pan, Pengfei Jin, Sifan Song, Yucheng Shi, Tianming Liu, Quanzheng Li, Xiang Li

    Abstract: Echocardiography (ECHO) is essential for cardiac assessments, but its video quality and interpretation heavily rely on manual expertise, leading to inconsistent results from clinical and portable devices. ECHO video generation offers a solution by improving automated monitoring through synthetic data and generating high-quality videos from routine health data. However, existing models often face…

    Submitted 11 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

  29. arXiv:2410.03061  [pdf, other]

    cs.CV cs.CL

    DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

    Authors: Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto

    Abstract: Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. In response, we present a new fra…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024

  30. arXiv:2410.02902  [pdf, other]

    cs.CL cs.AI

    Better Instruction-Following Through Minimum Bayes Risk

    Authors: Ian Wu, Patrick Fernandes, Amanda Bertsch, Seungone Kim, Sina Pakazad, Graham Neubig

    Abstract: General-purpose LLM judges capable of human-level evaluation provide not only a scalable and accurate way of evaluating instruction-following LLMs but also new avenues for supervising and improving their performance. One promising way of leveraging LLM judges for supervision is through Minimum Bayes Risk (MBR) decoding, which uses a reference-based evaluator to select a high-quality output from am…

    Submitted 7 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  31. arXiv:2410.02246  [pdf, other]

    cs.LG cs.AI

    PFGuard: A Generative Framework with Privacy and Fairness Safeguards

    Authors: Soyeon Kim, Yuji Roh, Geon Heo, Steven Euijong Whang

    Abstract: Generative models must ensure both privacy and fairness for Trustworthy AI. While these goals have been pursued separately, recent studies propose to combine existing privacy and fairness techniques to achieve both goals. However, naively combining these techniques can be insufficient due to privacy-fairness conflicts, where a sample in a minority group may be amplified for fairness, only to be su…

    Submitted 3 October, 2024; originally announced October 2024.

  32. arXiv:2410.01729  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating Robustness of Reward Models for Mathematical Reasoning

    Authors: Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Jungsoo Won, Dongha Lee, Jinyoung Yeo

    Abstract: Reward models are key in reinforcement learning from human feedback (RLHF) systems, aligning the model behavior with human preferences. Particularly in the math domain, there have been plenty of studies using reward models to align policies for improving reasoning capabilities. Recently, as the importance of reward models has been emphasized, RewardBench is proposed to understand their behavior. H…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Work in progress

  33. arXiv:2410.01531  [pdf, other]

    cs.LG cs.AI

    TiVaT: Joint-Axis Attention for Time Series Forecasting with Lead-Lag Dynamics

    Authors: Junwoo Ha, Hyukjae Kwon, Sungsoo Kim, Kisu Lee, Ha Young Kim

    Abstract: Multivariate time series (MTS) forecasting plays a crucial role in various real-world applications, yet simultaneously capturing both temporal and inter-variable dependencies remains a challenge. Conventional Channel-Dependent (CD) models handle these dependencies separately, limiting their ability to model complex interactions such as lead-lag dynamics. To address these limitations, we propose Ti…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 15 pages, 5 figures

    MSC Class: I.2.0

  34. arXiv:2410.01500  [pdf, other]

    cs.LG cs.AI

    Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

    Authors: Jun Hyeong Kim, Seonghwan Kim, Seokhyun Moon, Hyeongwoo Kim, Jeheon Woo, Woo Youn Kim

    Abstract: Transporting between arbitrary distributions is a fundamental goal in generative modeling. Recently proposed diffusion bridge models provide a potential solution, but they rely on a joint distribution that is difficult to obtain in practice. Furthermore, formulations based on continuous domains limit their applicability to discrete domains such as graphs. To overcome these limitations, we propose…

    Submitted 2 October, 2024; originally announced October 2024.

  35. arXiv:2410.01273  [pdf, other]

    cs.RO cs.CV cs.LG

    CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

    Authors: Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu

    Abstract: Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. An intuitive way for humans to express these goals is through abstract cues like verbal commands or rough sketches. Such human guidance may lack details or be noisy. Nonetheless, we expect robots to navigate as intended. For robots to interpret and e…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: project page https://worv-ai.github.io/canvas

  36. arXiv:2410.01210  [pdf, other]

    cs.CV cs.AI

    Polyp-SES: Automatic Polyp Segmentation with Self-Enriched Semantic Model

    Authors: Quang Vinh Nguyen, Thanh Hoang Son Vo, Sae-Ryung Kang, Soo-Hyung Kim

    Abstract: Automatic polyp segmentation is crucial for effective diagnosis and treatment in colonoscopy images. Traditional methods encounter significant challenges in accurately delineating polyps due to limitations in feature representation and the handling of variability in polyp appearance. Deep learning techniques, including CNN and Transformer-based methods, have been explored to improve polyp segmenta…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Asian Conference on Computer Vision 2024

    ACM Class: I.5.4; I.2.1; I.4.6; J.3

  37. arXiv:2410.01196  [pdf, other]

    stat.AP cs.LG stat.ML

    Diverse Expected Improvement (DEI): Diverse Bayesian Optimization of Expensive Computer Simulators

    Authors: John Joshua Miller, Simon Mak, Benny Sun, Sai Ranjeet Narayanan, Suo Yang, Zongxuan Sun, Kenneth S. Kim, Chol-Bum Mike Kweon

    Abstract: The optimization of expensive black-box simulators arises in a myriad of modern scientific and engineering applications. Bayesian optimization provides an appealing solution, by leveraging a fitted surrogate model to guide the selection of subsequent simulator evaluations. In practice, however, the objective is often not to obtain a single good solution, but rather a "basket" of good solutions f…

    Submitted 1 October, 2024; originally announced October 2024.

  38. arXiv:2410.01162  [pdf, other]

    eess.AS cs.CL cs.SD

    Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

    Authors: Wonjune Kang, Junteng Jia, Chunyang Wu, Wei Zhou, Egor Lakomkin, Yashesh Gaur, Leda Sari, Suyoun Kim, Ke Li, Jay Mahadeokar, Ozlem Kalinli

    Abstract: As speech becomes an increasingly common modality for interacting with large language models (LLMs), it is becoming desirable to develop systems where LLMs can take into account users' emotions or speaking styles when providing their responses. In this work, we study the potential of an LLM to understand these aspects of speech without fine-tuning its weights. To do this, we utilize an end-to-end…

    Submitted 1 October, 2024; originally announced October 2024.

  39. arXiv:2410.00521  [pdf, other]

    cs.RO cs.CV

    Design and Identification of Keypoint Patches in Unstructured Environments

    Authors: Taewook Park, Seunghwan Kim, Hyondong Oh

    Abstract: Reliable perception of targets is crucial for the stable operation of autonomous robots. A widely preferred method is keypoint identification in an image, as it allows direct mapping from raw images to 2D coordinates, facilitating integration with other algorithms like localization and path planning. In this study, we closely examine the design and identification of keypoint patches in cluttered e…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 12 pages, 8 figures, 7 tables

  40. arXiv:2410.00432  [pdf, other]

    cs.LG cs.AI

    Scalable Multi-Task Transfer Learning for Molecular Property Prediction

    Authors: Chanhui Lee, Dae-Woong Jeong, Sung Moon Ko, Sumin Lee, Hyunseung Kim, Soorin Yim, Sehui Han, Sungwoong Kim, Sungbin Lim

    Abstract: Molecules have a number of distinct properties whose importance and application vary. Often, in reality, labels for some properties are hard to achieve despite their practical importance. A common solution to such data scarcity is to use models of good generalization with transfer learning. This involves domain experts for designing source and target tasks whose features are shared. However, this…

    Submitted 1 October, 2024; originally announced October 2024.

    Journal ref: ICML2024-AI4Science Poster

  41. arXiv:2410.00046  [pdf, other]

    eess.IV cs.CV cs.LG

    Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation

    Authors: Yujin Oh, Sangjoon Park, Xiang Li, Wang Yi, Jonathan Paly, Jason Efstathiou, Annie Chan, Jun Won Kim, Hwa Kyung Byun, Ik Jae Lee, Jaeho Cho, Chan Woo Wee, Peng Shu, Peilong Wang, Nathan Yu, Jason Holmes, Jong Chul Ye, Quanzheng Li, Wei Liu, Woong Sub Koom, Jin Sung Kim, Kyungsang Kim

    Abstract: Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the…

    Submitted 27 September, 2024; originally announced October 2024.

    Comments: 39 pages

  42. arXiv:2409.19846  [pdf, other]

    cs.CV

    Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

    Authors: Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

    Abstract: Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what objects are present. However, they struggle with pixel-level recognition tasks like semantic segmentation, which additionally require understanding where the objects are located. In this work, we propose a novel method, PixelCLIP, to adapt the…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: To appear at NeurIPS 2024. Project page is available at https://cvlab-kaist.github.io/PixelCLIP

  43. arXiv:2409.19614  [pdf, other]

    cs.SD eess.AS

    Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals

    Authors: Jinyi Mi, Sehun Kim, Tomoki Toda

    Abstract: Automatic music transcription (AMT), aiming to convert musical signals into musical notation, is one of the important tasks in music information retrieval. Recently, previous works have applied high-resolution labels, i.e., the continuous onset and offset times of piano notes, as training targets, achieving substantial improvements in transcription performance. However, there still remain some iss…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted to APSIPA ASC 2024

  44. arXiv:2409.18260  [pdf, other]

    cs.CV cs.AI

    PCEvE: Part Contribution Evaluation Based Model Explanation for Human Figure Drawing Assessment and Beyond

    Authors: Jongseo Lee, Geo Ahn, Seong Tae Kim, Jinwoo Choi

    Abstract: For automatic human figure drawing (HFD) assessment tasks, such as diagnosing autism spectrum disorder (ASD) using HFD images, the clarity and explainability of a model decision are crucial. Existing pixel-level attribution-based explainable AI (XAI) approaches demand considerable effort from users to interpret the semantic information of a region in an image, which can be often time-consuming and…

    Submitted 3 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: This paper is under review

  45. arXiv:2409.18046  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning

    Authors: Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim

    Abstract: Recent advancements in image captioning have explored text-only training methods to overcome the limitations of paired image-text data. However, existing text-only training methods often overlook the modality gap between using text data during training and employing images during inference. To address this issue, we propose a novel approach called Image-like Retrieval, which aligns text features w…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP 2024

  46. arXiv:2409.16845  [pdf, other]

    cs.CV

    IRASNet: Improved Feature-Level Clutter Reduction for Domain Generalized SAR-ATR

    Authors: Oh-Tae Jang, Hae-Kang Song, Min-Jun Kim, Kyung-Hwan Lee, Geon Lee, Sung-Ho Kim, Hee-Sub Shin, Jae-Woo Ok, Min-Young Back, Jae-Hyuk Yoon, Kyung-Tae Kim

    Abstract: Recently, computer-aided design models and electromagnetic simulations have been used to augment synthetic aperture radar (SAR) data for deep learning. However, an automatic target recognition (ATR) model struggles with domain shift when using synthetic data because the model learns specific clutter patterns present in such data, which disturbs performance when applied to measured data with differ…

    Submitted 7 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: 16 pages, 11 figures

  47. arXiv:2409.16706  [pdf, other]

    cs.CV cs.AI

    Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation

    Authors: Youngwan Jin, Incheol Park, Hanbin Song, Hyeongjin Ju, Yagiz Nalcakan, Shiho Kim

    Abstract: This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our approach leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder-decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed globa…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 19 pages, 12 figures

  48. arXiv:2409.16630  [pdf, other]

    cs.LG cs.AI cs.CV

    Stochastic Subsampling With Average Pooling

    Authors: Bum Jun Kim, Sang Woo Kim

    Abstract: Regularization of deep neural networks has been an important issue to achieve higher generalization performance without overfitting problems. Although the popular method of Dropout provides a regularization effect, it causes inconsistent properties in the output, which may degrade the performance of deep neural networks. In this study, we propose a new module called stochastic average pooling, whi…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 17 pages, 8 figures

  49. arXiv:2409.16581  [pdf, other]

    cs.CV

    SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling

    Authors: Laurent Dillard, Hyeonsoo Lee, Weonsuk Lee, Tae Soo Kim, Ali Diba, Thijs Kooi

    Abstract: When developing Computer Aided Detection (CAD) systems for Digital Breast Tomosynthesis (DBT), the complexity arising from the volumetric nature of the modality poses significant technical challenges for obtaining large-scale accurate annotations. Without access to large-scale annotations, the resulting model may not generalize to different domains. Given the costly nature of obtaining DBT annotat…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures, 1 table

    MSC Class: 68T45; 92C55 ACM Class: I.4.9; I.5.4

  50. arXiv:2409.15784  [pdf]

    physics.app-ph cond-mat.mtrl-sci cs.LG physics.optics

    Deep-learning real-time phase retrieval of imperfect diffraction patterns from X-ray free-electron lasers

    Authors: Sung Yun Lee, Do Hyung Cho, Chulho Jung, Daeho Sung, Daewoong Nam, Sangsoo Kim, Changyong Song

    Abstract: Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing, especially in X-ray methodologies, where advanced light sources and detection technologies accumulate vast amounts of data that exceed meticulous human inspection capa…

    Submitted 24 September, 2024; originally announced September 2024.

    MSC Class: 68T07 ACM Class: J.2