-
Tuning Language Models by Mixture-of-Depths Ensemble
Authors:
Haoyan Luo,
Lucia Specia
Abstract:
Transformer-based Large Language Models (LLMs) traditionally rely on final-layer loss for training and final-layer representations for predictions, potentially overlooking the predictive power embedded in intermediate layers. Surprisingly, we find that focusing training efforts on these intermediate layers can yield training losses comparable to those of final layers, with complementary test-time performance. We introduce a novel tuning framework, Mixture-of-Depths (MoD), which trains late layers as ensembles contributing to the final logits through learned routing weights. With the auxiliary distillation loss and additional normalization modules, we ensure that the outputs of the late layers adapt to language modeling. Our MoD framework, which can be integrated with any existing tuning method, shows consistent improvement on various language modelling tasks. Furthermore, by replacing traditional trainable modules with MoD, our approach achieves similar performance with significantly fewer trainable parameters, demonstrating the potential of leveraging predictive power from intermediate representations during training.
Submitted 16 October, 2024;
originally announced October 2024.
-
Enhance Graph Alignment for Large Language Models
Authors:
Haitong Luo,
Xuying Meng,
Suhang Wang,
Tianxiang Zhao,
Fali Wang,
Hanyun Cao,
Yujun Zhang
Abstract:
Graph-structured data is prevalent in the real world. Recently, owing to their powerful emergent capabilities, Large Language Models (LLMs) have shown promising performance in modeling graphs. The key to effectively applying LLMs on graphs is converting graph data into a format LLMs can comprehend. Graph-to-token approaches are popular in enabling LLMs to process graph information. They transform graphs into sequences of tokens and align them with text tokens through instruction tuning, where self-supervised instruction tuning helps LLMs acquire general knowledge about graphs, and supervised fine-tuning specializes LLMs for the downstream tasks on graphs. Despite their initial success, we find that existing methods have a misalignment between self-supervised tasks and supervised downstream tasks, resulting in negative transfer from self-supervised fine-tuning to downstream tasks. To address this issue, we propose Graph Alignment Large Language Models (GALLM) to benefit from aligned task templates. In the self-supervised tuning stage, we introduce a novel text matching task using templates aligned with downstream tasks. In the task-specific tuning stage, we propose two category prompt methods that learn supervision information from additional explanation with further aligned templates. Experimental evaluations on four datasets demonstrate substantial improvements in supervised learning, multi-dataset generalizability, and particularly in zero-shot capability, highlighting the model's potential as a graph foundation model.
Submitted 15 October, 2024;
originally announced October 2024.
-
Visual-Geometric Collaborative Guidance for Affordance Learning
Authors:
Hongchen Luo,
Wei Zhai,
Jiao Wang,
Yang Cao,
Zheng-Jun Zha
Abstract:
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing affordance learning algorithms often adopt the label assignment paradigm and presume that there is a unique relationship between functional region and affordance label, yielding poor performance when adapting to unseen environments with large appearance variations. In this paper, we propose to leverage interactive affinity for affordance learning, i.e., extracting interactive affinity from human-object interaction and transferring it to non-interactive objects. Interactive affinity, which represents the contacts between different parts of the human body and local regions of the target object, can provide inherent cues of interconnectivity between humans and objects, thereby reducing the ambiguity of the perceived action possibilities. To this end, we propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues to excavate interactive affinity from human-object interactions jointly. Besides, a contact-driven affordance learning (CAL) dataset is constructed by collecting and labeling over 55,047 images. Experimental results demonstrate that our method outperforms the representative models regarding objective metrics and visual quality. Project: https://github.com/lhc1224/VCR-Net.
Submitted 15 October, 2024;
originally announced October 2024.
-
Transferable Belief Model on Quantum Circuits
Authors:
Qianli Zhou,
Hao Luo,
Lipeng Pan,
Yong Deng,
Eloi Bosse
Abstract:
The transferable belief model, as a semantic interpretation of Dempster-Shafer theory, enables agents to perform reasoning and decision making in imprecise and incomplete environments. The model offers distinct semantics for handling unreliable testimonies, allowing for a more reasonable and general process of belief transfer compared to the Bayesian approach. However, because both the belief masses and the structure of focal sets must be considered when updating belief functions, which adds computational complexity during reasoning, the transferable belief model has gradually lost favor among researchers in recent developments. In this paper, we implement the transferable belief model on quantum circuits and demonstrate that belief functions offer a more concise and effective alternative to Bayesian approaches within the quantum computing framework. Furthermore, leveraging the unique characteristics of quantum computing, we propose several novel belief transfer approaches. More broadly, this paper introduces a new perspective on basic information representation for quantum AI models, suggesting that belief functions are more suitable than the Bayesian approach for handling uncertainty on quantum circuits.
Submitted 17 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation
Authors:
Runze Chen,
Haiyong Luo,
Fang Zhao,
Jingze Yu,
Yupeng Jia,
Juan Wang,
Xuepeng Ma
Abstract:
Monocular depth estimation, enabled by self-supervised learning, is a key technique for 3D perception in computer vision. However, it faces significant challenges in real-world scenarios, which encompass adverse weather variations, motion blur, as well as scenes with poor lighting conditions at night. Our research reveals that we can divide monocular depth estimation into three sub-problems: depth structure consistency, local texture disambiguation, and semantic-structural correlation. Our approach tackles the non-robustness of existing self-supervised monocular depth estimation models to interference textures by adopting a structure-centered perspective and utilizing the scene structure characteristics demonstrated by semantics and illumination. We devise a novel approach to reduce over-reliance on local textures, enhancing robustness against missing or interfering patterns. Additionally, we incorporate a semantic expert model as the teacher and construct inter-model feature dependencies via learnable isomorphic graphs to enable aggregation of semantic structural knowledge. Our approach achieves state-of-the-art out-of-distribution monocular depth estimation performance across a range of public adverse scenario datasets. It demonstrates notable scalability and compatibility, without necessitating extensive model engineering. This showcases the potential for customizing models for diverse industrial applications.
Submitted 9 October, 2024;
originally announced October 2024.
-
Quantum dynamics in a spin-1/2 square lattice $J_{1}$-$J_{2}$-$δ$ altermagnet
Authors:
Yang Liu,
Shiqi Shao,
Saisai He,
Z. Y. Xie,
Jia-Wei Mei,
Hong-Gang Luo,
Jize Zhao
Abstract:
A key feature of the newly discovered altermagnet is that its spin degeneracy is lifted, although it has an antiferromagnetic order and zero net magnetization. In this work, we investigate a frustrated spin-1/2 $J_1$-$J_2$-$δ$ Heisenberg model on the square lattice by the tensor network method in combination with the linear spin-wave theory, with our focus on both the magnon excitations and longitudinal Higgs excitations. For a small $J_2$ and a finite range of $δ$ we demonstrate that such a model hosts an altermagnetic ground state. Its magnon spectrum is split into two branches and the largest splitting occurs at $\left(\pmπ/2, \pmπ/2\right)$ in the Brillouin zone. The magnitudes of splitting in the two magnon modes are equal with respect to the case of $δ=0$. Dynamical spin structure factors show that the splitting also occurs in the longitudinal Higgs modes, and the relative positions of the magnon modes and longitudinal Higgs modes in energy may change in the presence of a finite $δ$.
Submitted 9 October, 2024;
originally announced October 2024.
-
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Authors:
Ge Ya Luo,
Gian Mario Favero,
Zhi Hao Luo,
Alexia Jolicoeur-Martineau,
Christopher Pal
Abstract:
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average.
Submitted 8 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
-
Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection
Authors:
Hengrui Luo,
Meng Li
Abstract:
Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed their surprising ability to distinguish transformations (which we call symbolic feature selection) that remain obscure under current theoretical understanding. This work provides a finite-sample analysis of tree-based methods from a ranking perspective. We link oracle partitions in tree methods to response rankings at local splits, offering new insights into their finite-sample behavior in regression and feature selection tasks. Building on this local ranking perspective, we extend our analysis in two ways: (i) We examine the global ranking performance of individual trees and ensembles, including Classification and Regression Trees (CART) and Bayesian Additive Regression Trees (BART), providing finite-sample oracle bounds, ranking consistency, and posterior contraction results. (ii) Inspired by the ranking perspective, we propose concordant divergence statistics $\mathcal{T}_0$ to evaluate symbolic feature mappings and establish their properties. Numerical experiments demonstrate the competitive performance of these statistics in symbolic feature selection tasks compared to existing methods.
Submitted 3 October, 2024;
originally announced October 2024.
-
Quantifying Generalization Complexity for Large Language Models
Authors:
Zhenting Qi,
Hongyin Luo,
Xuliang Huang,
Zhuokai Zhao,
Yibo Jiang,
Xiangjun Fan,
Himabindu Lakkaraju,
James Glass
Abstract:
While large language models (LLMs) have shown exceptional capabilities in understanding complex queries and performing sophisticated tasks, their generalization abilities are often deeply entangled with memorization, necessitating more precise evaluation. To address this challenge, we introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of LLMs. Scylla disentangles generalization from memorization by assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data through 20 tasks across 5 levels of complexity. Through extensive experiments, we uncover a non-monotonic relationship between task complexity and the performance gap between ID and OOD data, which we term the generalization valley. Specifically, this phenomenon reveals a critical threshold - referred to as critical complexity - where reliance on non-generalizable behavior peaks, indicating the upper bound of LLMs' generalization capabilities. As model size increases, the critical complexity shifts toward higher levels of task complexity, suggesting that larger models can handle more complex reasoning tasks before over-relying on memorization. Leveraging Scylla and the concept of critical complexity, we benchmark 28 LLMs including both open-sourced models such as LLaMA and Qwen families, and close-sourced models like Claude and GPT, providing a more robust evaluation and establishing a clearer understanding of LLMs' generalization capabilities.
Submitted 3 October, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Addition is All You Need for Energy-efficient Language Models
Authors:
Hongyin Luo,
Wei Sun
Abstract:
Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precision. We propose the linear-complexity multiplication L-Mul algorithm that approximates floating point number multiplication with integer addition operations. Compared to 8-bit floating point multiplication, the new algorithm achieves higher precision while consuming significantly less bit-level computation. Since multiplying floating point numbers requires substantially higher energy compared to integer addition operations, applying the L-Mul operation in tensor processing hardware can potentially reduce the energy cost of element-wise floating point tensor multiplications by 95% and that of dot products by 80%. We calculated the theoretical error expectation of L-Mul, and evaluated the algorithm on a wide range of textual, visual, and symbolic tasks, including natural language understanding, structural reasoning, mathematics, and commonsense question answering. Our numerical analysis experiments agree with the theoretical error estimation, which indicates that L-Mul with 4-bit mantissa achieves precision comparable to float8_e4m3 multiplications, and L-Mul with 3-bit mantissa outperforms float8_e5m2. Evaluation results on popular benchmarks show that directly applying L-Mul to the attention mechanism is almost lossless. We further show that replacing all floating point multiplications with 3-bit mantissa L-Mul in a transformer model achieves precision equivalent to using float8_e4m3 as accumulation precision in both fine-tuning and inference.
Submitted 2 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection
Authors:
Huilin Deng,
Hongchen Luo,
Wei Zhai,
Yang Cao,
Yu Kang
Abstract:
Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects by establishing feature mapping between textual prompts and inspection images, demonstrating excellent research value in flexible industrial manufacturing. However, existing ZSAD methods are limited by closed-world settings, struggling to handle unseen defects with predefined prompts. Recently, adapting Multimodal Large Language Models (MLLMs) for Industrial Anomaly Detection (IAD) has emerged as a viable solution. Unlike fixed-prompt methods, MLLMs exhibit a generative paradigm with open-ended text interpretation, enabling more adaptive anomaly analysis. However, this adaptation faces inherent challenges as anomalies often manifest in fine-grained regions and exhibit minimal visual discrepancies from normal samples. To address these challenges, we propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception, simultaneously providing precise detection and comprehensive analysis of anomalies. Specifically, we design a Defect-Sensitive Structure Learning scheme that transfers patch-similarity cues from the visual branch to our MLLM for improved anomaly discrimination. Besides, we introduce a novel visual projector, Locality-enhanced Token Compression, which mines multi-level features in local contexts to enhance fine-grained detection. Furthermore, we introduce the Real Industrial Anomaly Detection (RIAD), a comprehensive IAD dataset with detailed anomaly descriptions and analyses, offering a valuable resource for MLLM-based IAD development. Extensive experiments on zero-shot benchmarks, including MVTec-AD, Visa, WFDD, and RIAD datasets, demonstrate our superior performance over state-of-the-art methods. The code and dataset will be available soon.
Submitted 30 September, 2024;
originally announced September 2024.
-
Grounding 3D Scene Affordance From Egocentric Interactions
Authors:
Cuiyu Liu,
Wei Zhai,
Yuhang Yang,
Hongchen Luo,
Sen Liang,
Yang Cao,
Zheng-Jun Zha
Abstract:
Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the environment, making it reliant on predefined semantic instructions. In contrast, humans develop complex interaction skills by observing and imitating how others interact with their surroundings. To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction. This task faces the challenges of spatial complexity and alignment complexity across multiple sources. To address these challenges, we propose the Egocentric Interaction-driven 3D Scene Affordance Grounding (Ego-SAG) framework, which utilizes interaction intent to guide the model in focusing on interaction-relevant sub-regions and aligns affordance features from different sources through a bidirectional query decoder mechanism. Furthermore, we introduce the Egocentric Video-3D Scene Affordance Dataset (VSAD), covering a wide range of common interaction types and diverse 3D environments to support this task. Extensive experiments on VSAD validate both the feasibility of the proposed task and the effectiveness of our approach.
Submitted 29 September, 2024;
originally announced September 2024.
-
Spatial Fluctuation of the Electric Field within SF6 Streamer Channel in Highly Non-Uniform Fields: Phenomenon, Validation, and Mechanism
Authors:
Zihao Feng,
Xinxin Wang,
Xiaobing Zou,
Haiyun Luo,
Yangyang Fu
Abstract:
The electric field within the streamer channel is a critical parameter in the calculation model for the nonlinear breakdown voltage of SF6, motivating the research presented in this paper. By using a 2D fluid model, we investigate the microscopic characteristics of the SF6 streamer channel in highly non-uniform fields and uncover a previously unexplained coherent structure: the spatial fluctuation of the electric field (SFEF). We confirm the physical validity of SFEF by modifying model parameters that could potentially introduce non-physical effects. Further comparative analysis reveals that SFEF is driven by an ion-conducting channel formed due to the strong electronegativity of SF6. This ion-conducting channel exhibits local characteristics, which fundamentally arise from the slow response of charged species to local charge relaxation. We identify that some charge separation originates from the accumulation of negative ions at the rear edge of the streamer head due to strong electric field shielding in this region. As the streamer propagates, charge separation is continuously generated and passively carried into the streamer channel, ultimately forming the SFEF. Finally, we confirm that SFEF does not occur in uniform fields, indicating that it is a phenomenon exclusive to highly non-uniform fields. These findings provide a deep insight into the electric field within the SF6 streamer channel and offer a potential avenue for further investigation into the mechanisms of SF6 nonlinear breakdown voltage.
Submitted 29 September, 2024;
originally announced September 2024.
-
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Authors:
Shaoxiong Ji,
Zihao Li,
Indraneil Paul,
Jaakko Paavola,
Peiqin Lin,
Pinzhen Chen,
Dayyán O'Brien,
Hengyu Luo,
Hinrich Schütze,
Jörg Tiedemann,
Barry Haddow
Abstract:
In this work, we introduce EMMA-500, a large-scale multilingual language model continue-trained on texts across 546 languages designed for enhanced multilingual performance, focusing on improving language coverage for low-resource languages. To facilitate continual pre-training, we compile the MaLA corpus, a comprehensive multilingual dataset enriched with curated datasets across diverse domains. Leveraging this corpus, we conduct extensive continual pre-training of the Llama 2 7B model, resulting in EMMA-500, which demonstrates robust performance across a wide collection of benchmarks, including a comprehensive set of multilingual tasks and PolyWrite, an open-ended generation benchmark developed in this study. Our results highlight the effectiveness of continual pre-training in expanding large language models' language capacity, particularly for underrepresented languages, demonstrating significant gains in cross-lingual transfer, task generalization, and language adaptability.
Submitted 26 September, 2024;
originally announced September 2024.
-
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
Authors:
Jinghao Zhang,
Wen Qian,
Hao Luo,
Fan Wang,
Feng Zhao
Abstract:
Diffusion models have made compelling progress on facilitating high-throughput daily production. Nevertheless, appealing customization requirements still depend on instance-level finetuning for authentic fidelity. Prior zero-shot customization works achieve semantic consistency through the condensed injection of identity features, while addressing detailed low-level signatures through complex model configurations and subject-specific fabrications, which significantly break the statistical coherence within the overall system and limit the applicability across various scenarios. To facilitate generic signature concentration with rectified efficiency, we present AnyLogo, a zero-shot region customizer with remarkable detail consistency, built upon a symbiotic diffusion system that eliminates cumbersome designs. Streamlined as vanilla image generation, we discern that rigorous signature extraction and creative content generation are promisingly compatible and can be systematically recycled within a single denoising model. In place of external configurations, the gemini status of the denoising model promotes reinforced subject transmission efficiency and a disentangled semantic-signature space with continuous signature decoration. Moreover, the sparse recycling paradigm is adopted to prevent the risk of duplication, with a compressed transmission quota for diversified signature stimulation. Extensive experiments on constructed logo-level benchmarks demonstrate the effectiveness and practicability of our methods.
Submitted 26 September, 2024;
originally announced September 2024.
-
Multi-functional reservoir computing
Authors:
Yao Du,
Haibo Luo,
Jianmin Guo,
Jinghua Xiao,
Yizhen Yu,
Xingang Wang
Abstract:
Whereas the power of reservoir computing (RC) in inferring chaotic systems has been well established in the literature, the studies are mostly restricted to mono-functional machines where the training and testing data are acquired from the same attractor. Here, using the strategies of attractor labeling and trajectory separation, we propose a new scheme of RC capable of learning multiple attractors generated by entirely different dynamics, namely multi-functional RC. Specifically, we demonstrate that by incorporating a label channel into the standard RC, a single machine is able to learn from data the dynamics of multiple chaotic attractors, while each attractor can be accurately retrieved by inputting just a scalar in the prediction phase. The dependence of the machine performance on the labeling and separation parameters is investigated, and it is found that the machine performance is optimized when the parameters take intermediate values. The working mechanism of multi-functional RC is analyzed by the method of functional networks in neuroscience, and it is revealed that each attractor is represented by a stable, unique functional network in the reservoir, and the optimal performance arises as a balance between the stability, complexity, and distinguishability of the functional networks.
Submitted 25 September, 2024;
originally announced September 2024.
-
MuxHand: A Cable-driven Dexterous Robotic Hand Using Time-division Multiplexing Motors
Authors:
Jianle Xu,
Shoujie Li,
Hong Luo,
Houde Liu,
Xueqian Wang,
Wenbo Ding,
Chongkun Xia
Abstract:
The robotic dexterous hand is responsible for both grasping and dexterous manipulation. The number of motors directly influences both the dexterity and the cost of such systems. In this paper, we present MuxHand, a robotic hand that employs a time-division multiplexing motor (TDMM) mechanism. This system allows 9 cables to be independently controlled by just 4 motors, significantly reducing cost while maintaining high dexterity. To enhance stability and smoothness during grasping and manipulation tasks, we have integrated magnetic joints into the three 3D-printed fingers. These joints offer superior impact resistance and self-resetting capabilities. We conduct a series of experiments to evaluate the grasping and manipulation performance of MuxHand. The results demonstrate that the TDMM mechanism can precisely control each cable connected to the finger joints, enabling robust grasping and dexterous manipulation. Furthermore, the fingertip load capacity reached 1.0 kg, and the magnetic joints effectively absorbed impact and corrected misalignments without damage.
Submitted 19 September, 2024;
originally announced September 2024.
-
Active Reconfigurable Intelligent Surface Empowered Synthetic Aperture Radar Imaging
Authors:
Yifan Sun,
Rang Liu,
Zhiping Lu,
Honghao Luo,
Ming Li,
Qian Liu
Abstract:
Synthetic Aperture Radar (SAR) utilizes the movement of the radar antenna over a specific area of interest to achieve higher spatial resolution imaging. In this paper, we aim to investigate the realization of SAR imaging for a stationary radar system with the assistance of active reconfigurable intelligent surface (ARIS) mounted on an unmanned aerial vehicle (UAV). As the UAV moves along the stationary trajectory, the ARIS can not only build a high-quality virtual line-of-sight (LoS) propagation path, but its mobility can also effectively create a much larger virtual aperture, which can be utilized to realize a SAR system. In this paper, we first present a range-Doppler (RD) imaging algorithm to obtain imaging results for the proposed ARIS-empowered SAR system. Then, to further improve the SAR imaging performance, we attempt to optimize the reflection coefficients of ARIS to maximize the signal-to-noise ratio (SNR) at the stationary radar receiver under the constraints of ARIS maximum power and amplification factor. An effective algorithm based on fractional programming (FP) and majorization minimization (MM) methods is developed to solve the resulting non-convex problem. Simulation results validate the effectiveness of ARIS-assisted SAR imaging and our proposed RD imaging and ARIS optimization algorithms.
Submitted 18 September, 2024;
originally announced September 2024.
-
GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models
Authors:
Hanjun Luo,
Yingbin Jin,
Xuecheng Liu,
Tong Shang,
Ruizhe Chen,
Zuozhu Liu
Abstract:
Large Language Models (LLMs) have supplanted traditional methods in numerous natural language processing tasks. Nonetheless, in Named Entity Recognition (NER), existing LLM-based methods underperform compared to baselines and require significantly more computational resources, limiting their application. In this paper, we introduce the task of generation-based extraction and in-context classification (GEIC), designed to leverage LLMs' prior knowledge and self-attention mechanisms for NER tasks. We then propose CascadeNER, a universal and multilingual GEIC framework for few-shot and zero-shot NER. CascadeNER employs model cascading to utilize two small-parameter LLMs to extract and classify independently, reducing resource consumption while enhancing accuracy. We also introduce AnythingNER, the first NER dataset specifically designed for LLMs, including 8 languages, 155 entity types and a novel dynamic categorization system. Experiments show that CascadeNER achieves state-of-the-art performance on low-resource and fine-grained scenarios, including CrossNER and FewNERD. Our work is openly accessible.
Submitted 25 September, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting
Authors:
Runze Chen,
Mingyu Xiao,
Haiyong Luo,
Fang Zhao,
Fan Wu,
Hao Xiong,
Qi Liu,
Meng Song
Abstract:
We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses, limited viewpoints, and inconsistent lighting. CSS addresses these challenges through robust geometric priors and advanced illumination modeling, enabling high-quality novel view synthesis under complex, real-world conditions. Our method demonstrates clear improvements over existing approaches, paving the way for more accurate and flexible applications in AR, VR, and large-scale 3D reconstruction.
Submitted 13 September, 2024;
originally announced September 2024.
-
Wakamatsu tilting subcategories and weak support tau-tilting subcategories in recollement
Authors:
Yongduo Wang,
Hongyang Luo,
Jian He,
Dejun Wu
Abstract:
In this article, we prove that if (A, B, C) is a recollement of abelian categories, then Wakamatsu tilting (resp. weak support tau-tilting) subcategories in A and C can induce Wakamatsu tilting (resp. weak support tau-tilting) subcategories in B, and the converses hold under natural assumptions. As an application, we mainly consider the relationship of tau-cotorsion torsion triples in (A, B, C).
Submitted 11 September, 2024;
originally announced September 2024.
-
Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras
Authors:
Zimu Liao,
Siyan Chen,
Rong Fu,
Yi Wang,
Zhongling Su,
Hao Luo,
Li Ma,
Linning Xu,
Bo Dai,
Hengjie Li,
Zhilin Pei,
Xingcheng Zhang
Abstract:
Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high fidelity and real-time rendering. However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D to 2D projection calculation. Additionally, there are inefficiencies in the tile-based splatting, especially for the extreme curvature and wide field of view of fisheye lenses, which are crucial for its broader real-life applications. To tackle these challenges, we introduce Fisheye-GS. This innovative method recalculates the projection transformation and its gradients for fisheye cameras. Our approach can be seamlessly integrated as a module into other efficient 3D rendering methods, emphasizing its extensibility, lightweight nature, and modular design. Since we only modified the projection component, it can also be easily adapted for use with different camera models. Compared to methods that train after undistortion, our approach demonstrates a clear improvement in visual quality.
Submitted 11 September, 2024; v1 submitted 7 September, 2024;
originally announced September 2024.
-
RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement
Authors:
Hao Luo,
Baoliang Chen,
Lingyu Zhu,
Peilin Chen,
Shiqi Wang
Abstract:
Scene observation from multiple perspectives would bring a more comprehensive visual experience. However, in the context of acquiring multiple views in the dark, the highly correlated views are seriously alienated, making it challenging to improve scene understanding with auxiliary views. Recent single image-based enhancement methods may not be able to provide consistently desirable restoration performance for all views due to the ignorance of potential feature correspondence among different views. To alleviate this issue, we make the first attempt to investigate multi-view low-light image enhancement. First, we construct a new dataset called Multi-View Low-light Triplets (MVLT), including 1,860 pairs of triple images with large illumination ranges and wide noise distribution. Each triplet is equipped with three different viewpoints towards the same scene. Second, we propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet). Specifically, in order to benefit from similar texture correspondence across different views, we design the recurrent feature enhancement, alignment and fusion (ReEAF) module, in which intra-view feature enhancement (Intra-view EN) followed by inter-view feature alignment and fusion (Inter-view AF) is performed to model the intra-view and inter-view feature propagation sequentially via multi-view collaboration. In addition, two different modules from enhancement to alignment (E2A) and from alignment to enhancement (A2E) are developed to enable the interactions between Intra-view EN and Inter-view AF, which explicitly utilize attentive feature weighting and sampling for enhancement and alignment, respectively. Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods. All of our dataset, code, and model will be available at https://github.com/hluo29/RCNet.
Submitted 6 September, 2024;
originally announced September 2024.
-
Style Transfer: From Stitching to Neural Networks
Authors:
Xinhe Xu,
Zhuoer Wang,
Yihan Zhang,
Yizhou Liu,
Zhaoyue Wang,
Zhihao Xu,
Muhan Zhao,
Huaiying Luo
Abstract:
This article compares two style transfer methods in image processing: the traditional method, which synthesizes new images by stitching together small patches from existing images, and a modern machine learning-based approach that uses a segmentation network to isolate foreground objects and apply style transfer solely to the background. The traditional method excels in creating artistic abstractions but can struggle with seamlessness, whereas the machine learning method preserves the integrity of foreground elements while enhancing the background, offering improved aesthetic quality and computational efficiency. Our study indicates that machine learning-based methods are more suited for real-world applications where detail preservation in foreground elements is essential.
Submitted 15 September, 2024; v1 submitted 1 September, 2024;
originally announced September 2024.
-
Deep Feature Embedding for Tabular Data
Authors:
Yuqian Wu,
Hengyi Luo,
Raymond S. T. Lee
Abstract:
Tabular data learning has extensive applications in deep learning, but existing embedding techniques are limited for numerical and categorical features, for example by an inability to capture complex relationships and engineering. This paper proposes a novel deep embedding framework that leverages lightweight deep neural networks to generate effective feature embeddings for tabular data in machine learning research. For numerical features, a two-step feature expansion and deep transformation technique is used to capture copious semantic information. For categorical features, a unique identification vector for each entity is referenced through a compact lookup table with a parameterized deep embedding function to unify the embedding dimensions, and is transformed into an embedding vector using a deep neural network. Experiments are conducted on real-world datasets for performance evaluation.
Submitted 30 August, 2024;
originally announced August 2024.
-
Turán number of complete bipartite graphs with bounded matching number
Authors:
Huan Luo,
Xiamiao Zhao,
Mei Lu
Abstract:
Let $\mathscr{F}$ be a family of graphs. A graph $G$ is $\mathscr{F}$-free if $G$ does not contain any $F\in \mathscr{F}$ as a subgraph. The Turán number $ex(n, \mathscr{F})$ is the maximum number of edges in an $n$-vertex $\mathscr{F}$-free graph. Let $M_{s}$ be the matching consisting of $s$ independent edges. Recently, Alon and Frank determined the exact value of $ex(n,\{K_{m},M_{s+1}\})$. Gerbner obtained several results about $ex(n,\{F,M_{s+1}\})$ when $F$ satisfies certain proportions. In this paper, we determine the exact value of $ex(n,\{K_{l,t},M_{s+1}\})$ when $s, n$ are large enough for every $3\leq l\leq t$. When $n$ is large enough, we also show that $ex(n,\{K_{2,2}, M_{s+1}\})=n+{s\choose 2}-\left\lceil\frac{s}{2}\right\rceil$ for $s\ge 12$ and $ex(n,\{K_{2,t},M_{s+1}\})=n+(t-1){s\choose 2}-\left\lceil\frac{s}{2}\right\rceil$ when $t\ge 3$ and $s$ is large enough.
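As a small illustration, the two closed-form expressions quoted in the abstract above can be evaluated directly. The sketch below only computes the right-hand sides; the function names are hypothetical, and it cannot check the abstract's conditions that $n$ (and, for $K_{2,t}$, $s$) be "large enough", since those thresholds are not stated here.

```python
from math import comb


def ex_k22_matching(n: int, s: int) -> int:
    """Evaluate n + C(s, 2) - ceil(s / 2).

    The abstract states this value for ex(n, {K_{2,2}, M_{s+1}}) when
    s >= 12 and n is large enough (the threshold on n is not checked).
    """
    if s < 12:
        raise ValueError("the abstract states this formula only for s >= 12")
    # (s + 1) // 2 equals ceil(s / 2) for non-negative integers.
    return n + comb(s, 2) - (s + 1) // 2


def ex_k2t_matching(n: int, s: int, t: int) -> int:
    """Evaluate n + (t - 1) * C(s, 2) - ceil(s / 2).

    The abstract states this value for ex(n, {K_{2,t}, M_{s+1}}) when
    t >= 3 and both s and n are large enough (thresholds not checked).
    """
    if t < 3:
        raise ValueError("the abstract states this formula only for t >= 3")
    return n + (t - 1) * comb(s, 2) - (s + 1) // 2


if __name__ == "__main__":
    print(ex_k22_matching(100, 12))
    print(ex_k2t_matching(100, 12, 3))
```

Whether a particular pair such as $n = 100$, $s = 12$ actually lies in the "large enough" range is an assumption of the usage lines, not something the abstract confirms.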
Submitted 25 August, 2024;
originally announced August 2024.
-
Frontal Slice Approaches for Tensor Linear Systems
Authors:
Hengrui Luo,
Anna Ma
Abstract:
Inspired by the row and column action methods for solving large-scale linear systems, in this work, we explore the use of frontal slices for solving tensor linear systems. In particular, this paper presents a novel approach for using frontal slices of a tensor $\mathcal{A}$ to solve tensor linear systems $\mathcal{A} * \mathcal{X} = \mathcal{B}$ where $*$ denotes the t-product. In addition, we consider variations of this method, including cyclic, block, and randomized approaches, each designed to optimize performance in different operational contexts. Our primary contribution lies in the development and convergence analysis of these methods. Experimental results on synthetically generated and real-world data, including applications such as image and video deblurring, demonstrate the efficacy of our proposed approaches and validate our theoretical findings.
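For readers unfamiliar with the t-product mentioned in the abstract above, the following is a minimal sketch of the commonly used definition, in which frontal slices are combined by circular convolution along the third mode: $\mathcal{C}_k = \sum_j \mathcal{A}_j \mathcal{B}_{(k-j) \bmod n_3}$. It assumes the paper uses this standard definition; tensors are represented as plain lists of frontal slices, the helper names are hypothetical, and this is not the authors' solver.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    inner = len(Y)
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matadd(X, Y):
    """Add two matrices of equal shape element-wise."""
    return [[a + b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(X, Y)]


def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x m x n3).

    A and B are lists of n3 frontal slices. Slice k of the result is the
    sum over j of A[j] @ B[(k - j) mod n3], i.e. a circular convolution
    of the frontal slices.
    """
    n3 = len(A)
    if len(B) != n3:
        raise ValueError("A and B must have the same number of frontal slices")
    rows, cols = len(A[0]), len(B[0][0])
    C = []
    for k in range(n3):
        acc = [[0] * cols for _ in range(rows)]
        for j in range(n3):
            acc = matadd(acc, matmul(A[j], B[(k - j) % n3]))
        C.append(acc)
    return C


if __name__ == "__main__":
    A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    # Identity tensor: identity matrix in the first slice, zeros elsewhere.
    I = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]
    print(t_product(A, I) == A)
```

The explicit double loop costs $O(n_3^2)$ slice multiplications; implementations usually apply an FFT along the third mode instead, which this sketch omits for clarity.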
Submitted 24 August, 2024;
originally announced August 2024.
-
Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge
Authors:
Mingyu Xiao,
Runze Chen,
Haiyong Luo,
Fang Zhao,
Juan Wang,
Xuepeng Ma
Abstract:
Map-free relocalization technology is crucial for applications in autonomous navigation and augmented reality, but relying on pre-built maps is often impractical. It faces significant challenges due to limitations in matching methods and the inherent lack of scale in monocular images. These issues lead to substantial rotational and metric errors and even localization failures in real-world scenarios. Large matching errors significantly impact the overall relocalization process, affecting both rotational and translational accuracy. Due to the inherent limitations of the camera itself, recovering the metric scale from a single image is crucial, as this significantly impacts the translation error. To address these challenges, we propose a map-free relocalization method enhanced by instance knowledge and depth knowledge. By leveraging instance-based matching information to improve global matching results, our method significantly reduces the possibility of mismatching across different objects. The robustness of instance knowledge across the scene helps the feature point matching model focus on relevant regions and enhance matching accuracy. Additionally, we use estimated metric depth from a single image to reduce metric errors and improve scale recovery accuracy. By integrating methods dedicated to mitigating large translational and rotational errors, our approach demonstrates superior performance in map-free relocalization techniques.
Submitted 18 September, 2024; v1 submitted 23 August, 2024;
originally announced August 2024.
-
Hierarchical Attention and Parallel Filter Fusion Network for Multi-Source Data Classification
Authors:
Han Luo,
Feng Gao,
Junyu Dong,
Lin Qi
Abstract:
Joint classification of hyperspectral image (HSI) and synthetic aperture radar (SAR) data is a crucial yet challenging task in the field of remote sensing image interpretation. However, feature modeling in existing methods is insufficient to exploit the abundant global, spectral, and local features simultaneously, leading to sub-optimal classification performance. To solve the problem, we propose a hierarchical attention and parallel filter fusion network for multi-source data classification. Concretely, we design a hierarchical attention module for hyperspectral feature extraction. This module integrates global, spectral, and local features simultaneously to provide more comprehensive feature representation. In addition, we develop a parallel filter fusion module which enhances cross-modal feature interactions among different spatial locations in the frequency domain. Extensive experiments on two multi-source remote sensing data classification datasets verify the superiority of our proposed method over current state-of-the-art classification approaches. Specifically, our proposed method achieves 91.44% and 80.51% of overall accuracy (OA) on the respective datasets, highlighting its superior performance.
Submitted 22 August, 2024;
originally announced August 2024.
-
Excellent and CO$_2$-resistant Ce$_{0.85}$Nd$_{0.1}$Cu$_{0.05}$O$_{2-δ}$-Nd$_x$Sr$_{1-x}$Fe$_{1-y}$Cu$_y$O$_{3-δ}$ dual-phase oxygen transport membranes
Authors:
Chao Zhang,
Yue Zhu,
Xiaopeng Wang,
Yanhao Huang,
Lingyong Zeng,
Kuan Li,
Peifeng Yu,
Kangwang Wang,
Longfu Li,
Zaichen Xiang,
Rui Chen,
Xuefeng Zhu,
Huixia Luo
Abstract:
Oxygen transport membranes (OTMs) have provided great opportunities in the last decades but suffer from a trade-off between stability and oxygen permeability. Here, we report a group of new planar dual-phase mixed ionic-electronic conducting (MIEC) OTMs consisting of Ce$_{0.85}$Nd$_{0.1}$Cu$_{0.05}$O$_2$ (CNCO) and Nd$_x$Sr$_{1-x}$Fe$_{1-y}$Cu$_y$O$_3$ (NSFCO; $x = 0.4, 0.6$; $y = 0.05, 0.1$) phases, showing excellent oxygen permeability while retaining comparable CO$_2$-resistant stability. The substitution of Cu as a bifunctional additive decreases the sintering temperature and enhances bulk diffusion and oxygen permeability with the co-doping of Nd. The oxygen permeation fluxes reached 2.62 and 1.52 mL min$^{-1}$ cm$^{-2}$ at 1000$^\circ$C through the optimal 60wt%Ce0.85Nd0.1Cu0.05O2-40wt%Nd0.4Sr0.6Fe0.9Cu0.1O3 (CNCO-NSFCO41) composition with He and CO$_2$ sweeping, respectively, higher than all reported dense dual-phase OTMs. Such excellent CO$_2$-tolerant permeability meets the needs of potential industrial applications. Analysis with Zhu's oxygen permeation model shows lower bulk diffusion resistance of CNCO-NSFCO41 than that of the reported 60wt%Ce0.85Pr0.1Cu0.05O2-40wt%Pr0.4Sr0.6Fe0.9Cu0.1O3 (CPCO-PSFCO41) and more limitation by the interfacial exchange at high temperature. All the prepared OTMs also show good long-term stability over 100 hours in both atmospheres. Our results confirm the excellent oxygen permeability and stability under a high-concentration CO$_2$ atmosphere, providing a material candidate for CO$_2$ capture in oxyfuel combustion.
Submitted 22 August, 2024;
originally announced August 2024.
-
Mapping Hydrogen Evolution Activity Trends of V-based A15 Superconducting Alloys
Authors:
Peifeng Yu,
Jie Zhan,
Xiaobing Zhang,
Kangwang Wang,
Lingyong Zeng,
Kuan Li,
Chao Zhang,
Longfu Li,
Ying Liang,
Kai Yan,
Yan Sun,
Huixia Luo
Abstract:
Exploring high-efficiency and low-cost electrocatalysts is valuable for water-splitting technologies. Recently, Si-group compounds have attracted increasing attention in electrocatalysis, considering the abundant Si-group elements on Earth. However, Si-group compounds for HER electrocatalysis have not been systematically studied. In this study, we unveil the activity trends of non-noble metal catalyst A15-type V3M (i.e., V3Si, V3Ge, and V3Sn) superconductors and show that V3Si is the most efficient HER catalyst because of the high electronic conductivity and suitable d-band center. Among them, V3Si requires only 33.4 mV to reach 10 mA cm-2, and only 57.6 mV and 114.6 mV to attain high current densities of 100 mA cm-2 and 500 mA cm-2, respectively. These low overpotentials are close to the 34.3 mV at 10 mA cm-2 of state-of-the-art Pt/C (20 %) but superior to the 168.5 mV of Pt/C (20 %) at 100 mA cm-2. Furthermore, V3Si exhibits exceptional durability with no obvious decay over 120 h at different current densities (i.e., 10 - 250 mA cm-2). The excellent HER activity of the V3Si alloy can be ascribed to the synergies of superior electronic conductivity and suitable d-band center. Moreover, DFT calculations reveal that the absolute hydrogen adsorption Gibbs free energy is decreased after introducing the V to Si. Beyond offering a stable and high-performance electrocatalyst in an acidic medium, this work inspires the rational design of desirable silicide electrocatalysts.
Submitted 22 August, 2024;
originally announced August 2024.
-
Structural and Superconducting Properties in the Te-doped Spinel CuRh2Se4
Authors:
Kuan Li,
Lingyong Zeng,
Longfu Li,
Rui Chen,
Peifeng Yu,
Kangwang Wang,
Chao Zhang,
Zaichen Xiang,
Huixia Luo
Abstract:
In this paper, we discuss the impact of tellurium (Te) doping on the spinel superconductor CuRh2Se4. We conducted a comprehensive evaluation of the structural and superconducting properties of the system using various techniques, including X-ray diffraction (XRD), resistivity, magnetization, and specific heat measurements. Based on our XRD analysis, we found that the spinel superconductor CuRh2Se4-xTex crystallizes in the space group Fd3m(227) with x in the region of 0 to 0.28, while the layered compound CuRh2Se4-xTex crystallizes in the space group P3m1 (164) with x in the region of 2.8 to 4.0. The upper critical magnetic field can be increased from 0.95(2) T for CuRh2Se4 to 3.44(1) T for CuRh2Se3.72Te0.28 by doping with elemental Te. However, the layered compound CuRh2Se4-xTex did not exhibit superconducting properties. Besides, the specific heat measurements of CuRh2Se4-xTex (x = 0, 0.1, 0.28) indicate that the Te element doping affects the electronic structure and interactions of the material and breaks the stability of the superconducting pairing, which leads to a decrease in the Tc. Finally, we show the electronic phase diagram of Tc with Te doping to summarise our findings.
Submitted 21 August, 2024;
originally announced August 2024.
-
Revealing the nontrivial topological surface states of catalysts for effective photochemical carbon dioxide conversion
Authors:
Kangwang Wang,
Longfu Li,
Peifeng Yu,
Nannan Tang,
Lingyong Zeng,
Kuan Li,
Chao Zhang,
Rui Chen,
Zaichen Xiang,
Huichao Wang,
Yongqing Cai,
Kai Yan,
Huixia Luo
Abstract:
Topological semimetals with protected surface states mark a new paradigm of research beyond the early landmarks of band-structure engineering, allowing fabrication of efficient catalysts that harness the rich metallic surface states to activate specific chemical processes. Herein, we demonstrate a facile solid-phase method for in-situ doping of Ir at the Os sites in Os3Sn7, an alloy with topological states, which significantly improves the photocatalytic performance for the reduction of CO2 to CO and CH4. Experimental evidence combined with theoretical calculations reveals that the nontrivial topological surface states greatly accelerate charge-separation/electron-enrichment and adsorption/activation of CO2 molecules, rendering highly efficient reaction channels to stimulate the formation of *COOH and *CO, as well as CHO*. This work shows the promise of achieving high photocatalytic performance by synthesizing topological catalysts and provides hints on the design of novel topological catalysts with superior photoactivity towards the CO2 reduction reaction.
Submitted 21 August, 2024;
originally announced August 2024.
-
Non-trivial Topological Surface States Regulation of 1T-OsCoTe$_2$ Enables Selective C-C Coupling for Highly Efficient Photochemical CO$_2$ Reduction Toward C$_{2+}$ hydrocarbons
Authors:
Kangwang Wang,
Mingjie Wu,
Peifeng Yu,
Hector F. Garces,
Ying Liang,
Longfu Li,
Lingyong Zeng,
Kuan Li,
Chao Zhang,
Kai Yan,
Huixia Luo
Abstract:
Despite ongoing research, the rational design of nontrivial topological semimetal surface states for the selective photocatalytic CO$_2$ conversion into valuable products remains full of challenges. Herein, we present the synthesis of 1T-OsCoTe$_2$ for the photoreduction upgrading of CO$_2$ to the tricarbon alkane C$_3$H$_8$ by integrating experimental work and theoretical calculation. Experimental studies suggested a high electron-based selectivity of 71.2% for C$_3$H$_8$ and an internal quantum efficiency of 54.6% at 380 nm. In-situ X-ray photoelectron spectroscopy and X-ray absorption fine structure spectroscopy demonstrated that Co and Os atoms coordinated with Te atoms enable an efficient Os-Te-Co electron transfer to activate the generation of *CH$_3$, *CHOCO and *CH$_2$OCOCO. Density functional theory calculations further confirmed the role of Os-Te-Co electron bridging in the improved CO$_2$ conversion kinetics. To our knowledge, this is the first report suggesting the role of Os atoms in accelerating the photocatalytic CO$_2$ conversion activity of the topological semimetal 1T-OsCoTe$_2$.
Submitted 21 August, 2024;
originally announced August 2024.
-
The Story Behind the Lines: Line Charts as a Gateway to Dataset Discovery
Authors:
Daomin Ji,
Hui Luo,
Zhifeng Bao,
J. Shane Culpepper
Abstract:
Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts, focusing on the use of line charts as queries to discover datasets within a large data repository that are capable of generating similar line charts. To solve this problem, we propose a novel approach called Fine-grained Cross-modal Relevance Learning Model (FCM), which aims to estimate the relevance between a line chart and a candidate dataset. To achieve this goal, FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Then, two novel segment-level encoders are adopted to learn representations for a line chart and a dataset, preserving fine-grained information, followed by a cross-modal matcher to match the learned representations in a fine-grained way. Furthermore, we extend FCM to support line chart queries generated based on data aggregation. Last, we propose a benchmark tailored for this problem since no such dataset exists. Extensive evaluation on the new benchmark verifies the effectiveness of our proposed method. Specifically, our proposed approach surpasses the best baseline by 30.1% and 41.0% in terms of prec@50 and ndcg@50, respectively.
Submitted 18 August, 2024;
originally announced August 2024.
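The abstract above reports gains in prec@50 and ndcg@50 without defining them. As a rough illustration only, the sketch below computes the usual textbook versions of precision@k and NDCG@k from a ranked list of relevance labels. The function names and the toy list `ranked` are hypothetical, and the ideal ranking is taken over the retrieved list itself, which is a simplifying assumption rather than the paper's evaluation protocol.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items whose relevance label is positive."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the best reordering of the same list.

    Assumption: the ideal ranking is built from the retrieved list only,
    not from every relevant item in the repository.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for five retrieved datasets, best rank first.
ranked = [1, 0, 1, 1, 0]
p_at_5 = precision_at_k(ranked, 5)
n_at_5 = ndcg_at_k(ranked, 5)
```

A list that is already sorted by relevance has an NDCG of 1, so the metric rewards placing relevant datasets early, while precision@k only counts how many of them appear in the top k.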
-
Doping Dependence of Spin-Momentum Locking in Bismuth-Based High-Temperature Cuprate Superconductors
Authors:
Hailan Luo,
Kayla Currier,
Chiu-Yun Lin,
Kenneth Gotlieb,
Ryo Mori,
Hiroshi Eisaki,
Alexei Fedorov,
Zahid Hussain,
Alessandra Lanzara
Abstract:
Non-zero spin-orbit coupling has been reported in several unconventional superconductors due to the absence of inversion symmetry breaking. This contrasts with cuprate superconductors, where such interaction has been neglected for a long time. The recent report of a non-trivial spin-orbit coupling in the overdoped Bi2212 cuprate superconductor has re-opened an old debate on both the source and role of such interaction and its evolution throughout the superconducting dome. Using high-resolution spin- and angle-resolved photoemission spectroscopy, we reveal a momentum-dependent spin texture throughout the hole-doped side of the superconducting phase diagram for single- and double-layer bismuth-based cuprates. The universality of the reported effect among different dopings and the disappearance of spin polarization upon lead substitution suggest a common source. We argue that local structural fluctuations of the CuO planes and the resulting charge imbalance may cause local inversion symmetry breaking and spin polarization, which might be crucial for understanding cuprate physics.
Submitted 12 August, 2024;
originally announced August 2024.
-
Convergence of Symbiotic Communications and Blockchain for Sustainable and Trustworthy 6G Wireless Networks
Authors:
Haoxiang Luo,
Gang Sun,
Cheng Chi,
Hongfang Yu,
Mohsen Guizani
Abstract:
Symbiotic communication (SC) is a new wireless communication paradigm, analogous to populations in a natural ecosystem, that enables multiple communication systems to cooperate and benefit mutually through service exchange and resource sharing. As a result, SC is seen as an important potential technology for future sixth-generation (6G) communications, addressing the problems of spectrum scarcity and energy inefficiency. Symbiotic relationships among communication systems can complement radio resources in 6G. However, the absence of established trust relationships among diverse communication systems presents a formidable hurdle in ensuring efficient and trusted resource and service exchange within SC frameworks. To better realize trusted SC services in 6G, we propose a solution that converges SC and blockchain, called a symbiotic blockchain network (SBN). Specifically, we first use cognitive backscatter communication to transform blockchain consensus into a symbiotic blockchain consensus (SBC), so that it is better suited to wireless networks. Then, for SBC, we propose a highly energy-efficient sharding scheme to meet the extremely low power consumption requirements in 6G. Finally, such a blockchain scheme guarantees trusted transactions of communication services in SC. In ablation experiments, the proposed SBN significantly reduces energy consumption and processing latency in adversarial networks, and it is expected to help achieve a sustainable and trusted 6G wireless network.
Submitted 11 August, 2024;
originally announced August 2024.
-
M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction
Authors:
Hui Luo,
Jiashuang Huang,
Hengrong Ju,
Tianyi Zhou,
Weiping Ding
Abstract:
Accurate cancer survival prediction is crucial for assisting clinical doctors in formulating treatment plans. Multimodal data, including histopathological images and genomic data, offer complementary and comprehensive information that can greatly enhance the accuracy of this task. However, the current methods, despite yielding promising results, suffer from two notable limitations: they do not effectively utilize global context, and they disregard modal uncertainty. In this study, we put forward a neural network model called M2EF-NNs, which leverages multimodal and multi-instance evidence fusion techniques for accurate cancer survival prediction. Specifically, to capture global information in the images, we use a pre-trained Vision Transformer (ViT) model to obtain patch feature embeddings of histopathological images. Then, we introduce a multimodal attention module that uses genomic embeddings as queries and learns the co-attention mapping between genomic and histopathological images to achieve an early interaction fusion of multimodal information and better capture their correlations. Subsequently, we are the first to apply the Dempster-Shafer evidence theory (DST) to cancer survival prediction. We parameterize the distribution of class probabilities using the processed multimodal features and introduce subjective logic to estimate the uncertainty associated with different modalities. By combining this with the Dempster-Shafer theory, we can dynamically adjust the weights of class probabilities after multimodal fusion to achieve trusted survival prediction. Finally, experimental validation on the TCGA datasets confirms the significant improvements achieved by our proposed method in cancer survival prediction and the enhanced reliability of the model.
Submitted 7 August, 2024;
originally announced August 2024.
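The M2EF-NNs abstract above relies on Dempster-Shafer evidence theory to fuse modalities. The sketch below shows only the classical Dempster rule of combination for two mass functions, as a way to see how agreement is reinforced and conflict is renormalized. It is not the paper's fusion module: the risk labels, the mass values, and the names `m_image` and `m_genomic` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    Mass assigned to the whole frame expresses uncertainty.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint hypotheses contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical two-class frame and per-modality masses (illustrative values).
LOW, HIGH = frozenset({"low_risk"}), frozenset({"high_risk"})
FRAME = LOW | HIGH
m_image = {LOW: 0.6, HIGH: 0.1, FRAME: 0.3}
m_genomic = {LOW: 0.5, HIGH: 0.2, FRAME: 0.3}
fused = dempster_combine(m_image, m_genomic)
```

In this toy case both sources lean toward the same class, so the fused mass on that class exceeds either input and the mass left on the full frame, which represents uncertainty, shrinks.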
-
Efficient Decision Trees for Tensor Regressions
Authors:
Hengrui Luo,
Akira Horiguchi,
Li Ma
Abstract:
We propose the tensor-input tree (TT) method for scalar-on-tensor and tensor-on-tensor regression problems. We first address the scalar-on-tensor problem by proposing scalar-output regression tree models whose input variables are tensors (i.e., multi-way arrays). We devise and implement fast randomized and deterministic algorithms for efficient fitting of scalar-on-tensor trees, making TT competitive against tensor-input GP models. Based on scalar-on-tensor tree models, we extend our method to tensor-on-tensor problems using additive tree ensemble approaches. Theoretical justification and extensive experiments on real and synthetic datasets are provided to illustrate the performance of TT.
Submitted 4 August, 2024;
originally announced August 2024.
-
PEAR: Phrase-Based Hand-Object Interaction Anticipation
Authors:
Zichen Zhang,
Hongchen Luo,
Wei Zhai,
Yang Cao,
Yu Kang
Abstract:
First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction manipulation (i.e., manipulation trajectories and hand poses with contact). Existing research typically anticipates only interaction intention while neglecting manipulation, resulting in incomplete predictions and an increased likelihood of intention errors due to the lack of manipulation constraints. To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation. To handle uncertainties in the interaction process, we employ a twofold approach. Firstly, we perform cross-alignment of verbs, nouns, and images to reduce the diversity of hand movement patterns and object functional attributes, thereby mitigating intention uncertainty. Secondly, we establish bidirectional constraints between intention and manipulation using dynamic integration and residual connections, ensuring consistency among elements and thus overcoming manipulation uncertainty. To rigorously evaluate the performance of the proposed model, we collect a new task-relevant dataset, EGO-HOIP, with comprehensive annotations. Extensive experimental results demonstrate the superiority of our method.
Submitted 31 July, 2024;
originally announced July 2024.
-
SuperVINS: A visual-inertial SLAM framework integrated deep learning features
Authors:
Hongkun Luo,
Chi Guo,
Yang Liu,
Zengke Li
Abstract:
In this article, we propose enhancements to VINS-Fusion by incorporating deep learning features and deep learning matching methods. We train a bag of words on deep learning features and use these features for loop closure detection. Additionally, we introduce the RANSAC algorithm in the deep learning feature matching module to optimize matching. SuperVINS, an improved version of VINS-Fusion, outperforms it in terms of positioning accuracy and robustness. Particularly in challenging scenarios like low illumination and rapid jitter, traditional geometric features fail to fully exploit image information, whereas deep learning features excel at capturing image features. To validate our proposed improvement scheme, we conducted experiments using open source datasets. We performed a comprehensive analysis of the experimental results from both qualitative and quantitative perspectives. The results demonstrate the feasibility and effectiveness of this deep learning-based approach for SLAM systems. To foster knowledge exchange in this field, we have made the code for this article publicly available at https://github.com/luohongk/SuperVINS.
Submitted 31 July, 2024;
originally announced July 2024.
-
CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning
Authors:
Yu Feng,
Zhen Tian,
Yifan Zhu,
Zongfu Han,
Haoran Luo,
Guangwei Zhang,
Meina Song
Abstract:
The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still suffer from high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains and avoid forgetting existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts on multi-head self-attention layers and then learns the inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.
Submitted 2 August, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
Accelerated Primal-Dual Proximal Gradient Splitting Methods for Convex-Concave Saddle-Point Problems
Authors:
Hao Luo
Abstract:
In this paper, based on a novel primal-dual dynamical model with adaptive scaling parameters and Bregman divergences, we propose new accelerated primal-dual proximal gradient splitting methods for solving bilinear saddle-point problems with provable optimal nonergodic convergence rates. First, using spectral analysis, we show that a naive extension of an acceleration model for unconstrained optimization problems to a quadratic game is unstable. Motivated by this, we present an accelerated primal-dual hybrid gradient (APDHG) flow which combines acceleration with careful velocity correction. To work with non-Euclidean distances, we also equip our APDHG model with general Bregman divergences and prove the exponential decay of a Lyapunov function. Then, new primal-dual splitting methods are developed based on proper semi-implicit Euler schemes of the continuous model, and the theoretical convergence rates are nonergodic and optimal with respect to the matrix norms, Lipschitz constants and convexity parameters. Thanks to the primal and dual scaling parameters, both the algorithm design and the convergence analysis automatically cover the convex and (partially) strongly convex objectives. Moreover, the use of Bregman divergences not only unifies the standard Euclidean distances and general cases in an elegant way, but also makes our methods more flexible and adaptive to problem-dependent metrics.
Submitted 3 September, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
-
VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary
Authors:
Hanjun Luo,
Ziye Deng,
Haoyu Huang,
Xuecheng Liu,
Ruizhe Chen,
Zuozhu Liu
Abstract:
With the rapid development of Text-to-Image (T2I) models, biases in human image generation against demographic social groups have become a significant concern, impacting fairness and ethical standards in AI. Some researchers have proposed methods to tackle the issue. However, existing methods are designed for specific models with fixed prompts, limiting their adaptability to fast-evolving models and diverse practical scenarios. Moreover, they neglect the impact of hallucinations, leading to discrepancies between expected and actual results. To address these issues, we introduce VersusDebias, a novel and universal debiasing framework for biases in arbitrary T2I models, consisting of an array generation (AG) module and an image generation (IG) module. The self-adaptive AG module generates specialized attribute arrays to post-process hallucinations and debias multiple attributes simultaneously. The IG module employs a small language model to modify prompts according to the arrays and drives the T2I model to generate debiased images, enabling zero-shot debiasing. Extensive experiments demonstrate VersusDebias's capability to debias arbitrary models across gender, race, and age simultaneously. In both zero-shot and few-shot scenarios, VersusDebias outperforms existing methods, showcasing its exceptional utility. Our work is accessible at https://github.com/VersusDebias/VersusDebias to ensure reproducibility and facilitate further research.
Submitted 16 August, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection
Authors:
Jiahao Wang,
Mingxuan Li,
Haichen Luo,
Jinguo Zhu,
Aijun Yang,
Mingzhe Rong,
Xiaohua Wang
Abstract:
The inspection of power transmission lines has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restrict their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assistant designed to offer professional and reliable inspection services for power transmission lines by engaging in dialogues with humans. We also construct a large-scale and high-quality dataset specialized for the inspection task. By employing a two-stage training strategy on the constructed dataset, Power-LLaVA demonstrates exceptional performance at a comparatively low training cost. Extensive experiments further demonstrate the capabilities of Power-LLaVA in power transmission line inspection. Code will be released.
Submitted 27 July, 2024;
originally announced July 2024.
-
DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design
Authors:
Zhi Hao Luo,
Luis Lara,
Ge Ya Luo,
Florian Golemo,
Christopher Beckham,
Christopher Pal
Abstract:
Text-conditioned generative models for images have yielded impressive results. Text-conditioned floorplan generation, as a special type of raster image generation task, has also received particular attention. However, there are many use cases in floorplan generation where numerical properties of the generated result are more important than the aesthetics. For instance, one might want to specify sizes for certain rooms in a floorplan and compare the generated floorplan with given specifications. Current approaches, datasets and commonly used evaluations do not support these kinds of constraints. As such, an attractive strategy is to generate an intermediate data structure that contains numerical properties of a floorplan which can be used to generate the final floorplan image. To explore this setting we (1) construct a new dataset for this data-structure to data-structure formulation of floorplan generation using two popular image-based floorplan datasets, RPLAN and ProcTHOR-10k, and provide the tools to convert further procedurally generated ProcTHOR floorplan data into our format. (2) We explore the task of floorplan generation given a partial or complete set of constraints and we design a series of metrics and benchmarks to enable evaluating how well samples generated from models respect the constraints. (3) We create multiple baselines by finetuning a large language model (LLM), Llama3, and demonstrate the feasibility of using floorplan data structure conditioned LLMs for the problem of floorplan generation respecting numerical constraints. We hope that our new datasets and benchmarks will encourage further research on different ways to improve the performance of LLMs and other generative modelling techniques for generating designs where quantitative constraints are only partially specified, but must be respected.
Submitted 22 July, 2024;
originally announced July 2024.
-
BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM
Authors:
Hanjun Luo,
Haoyu Huang,
Ziye Deng,
Xuecheng Liu,
Ruizhe Chen,
Zuozhu Liu
Abstract:
Text-to-Image (T2I) generative models are becoming increasingly crucial due to their ability to generate high-quality images, which also raises concerns about the social biases in their outputs, especially in the generation of human images. Sociological research has established systematic classifications of bias. However, existing bias research about T2I models conflates different types of bias, impeding methodological progress. In this paper, we introduce BIGbench, a unified benchmark for Biases of Image Generation, featuring a meticulously designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes, which ensures exceptional accuracy for analysis. Furthermore, BIGbench applies advanced multi-modal large language models to achieve fully automated and highly accurate evaluations. We apply BIGbench to evaluate eight representative general T2I models and three debiased methods. Our human evaluation results underscore BIGbench's effectiveness in aligning images and identifying various biases. In addition, our study reveals new research directions about biases, such as the effect of distillation and irrelevant protected attributes. Our benchmark is openly accessible at https://github.com/BIGbench2024/BIGbench2024/ to ensure reproducibility.
Submitted 16 August, 2024; v1 submitted 21 July, 2024;
originally announced July 2024.
-
Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness
Authors:
Harsh Kumar,
Suhyeon Yoo,
Angela Zavaleta Bernuy,
Jiakai Shi,
Huayin Luo,
Joseph Williams,
Anastasia Kuzminykh,
Ashton Anderson,
Rachel Kornfield
Abstract:
Although engagement in self-directed wellness exercises typically declines over time, integrating social support such as coaching can sustain it. However, traditional forms of support are often inaccessible due to high costs and complex coordination. Large Language Models (LLMs) show promise in providing human-like dialogues that could emulate social support. Yet, in-depth, in situ investigations of LLMs to support behavior change remain underexplored. We conducted two randomized experiments to assess the impact of LLM agents on user engagement with mindfulness exercises. The first, a single-session study, involved 502 crowdworkers; the second, a three-week study, included 54 participants. We explored two types of LLM agents: one providing information and another facilitating self-reflection. Both agents enhanced users' intentions to practice mindfulness. However, only the information-providing LLM, featuring a friendly persona, significantly improved engagement with the exercises. Our findings suggest that specific LLM agents may bridge the social support gap in digital health interventions.
Submitted 3 July, 2024;
originally announced July 2024.
-
Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model
Authors:
Tao Wang,
Wei Wen,
Jingzhi Zhai,
Kang Xu,
Haoming Luo
Abstract:
Point cloud segmentation is crucial for robotic visual perception and environmental understanding, enabling applications such as robotic navigation and 3D reconstruction. However, handling the sparse and unordered nature of point cloud data presents challenges for efficient and accurate segmentation. Inspired by the Mamba model's success in natural language processing, we propose the Serialized Point Cloud Mamba Segmentation Model (Serialized Point Mamba), which leverages a state-space model to dynamically compress sequences, reduce memory usage, and enhance computational efficiency. Serialized Point Mamba integrates local-global modeling capabilities with linear complexity, achieving state-of-the-art performance on both indoor and outdoor datasets. This approach includes novel techniques such as staged point cloud sequence learning, grid pooling, and Conditional Positional Encoding, facilitating effective segmentation across diverse point cloud tasks. Our method achieved 76.8 mIoU on ScanNet and 70.3 mIoU on S3DIS. In ScanNetv2 instance segmentation, it recorded 40.0 mAP. It also had the lowest latency and reasonable memory use, making it the SOTA among point semantic segmentation models based on Mamba.
Submitted 17 July, 2024;
originally announced July 2024.
-
Magnetic skin effect in Pb(Fe$_{1/2}$Nb$_{1/2}$)O$_3$
Authors:
N. Giles-Donovan,
A. D. Hillier,
K. Ishida,
B. V. Hampshire,
S. R. Giblin,
B. Roessli,
P. M. Gehring,
G. Xu,
X. Li,
H. Luo,
S. Cochran,
C. Stock
Abstract:
Relaxor-ferroelectrics display exceptional dielectric properties resulting from the underlying random dipolar fields induced by strong chemical inhomogeneity. An unusual structural aspect of relaxors is a skin effect, where the near-surface region in single crystals exhibits structures and critical phenomena that differ from the bulk. Relaxors are unique in that this skin effect extends over a macroscopic lengthscale of $\sim$ 100 $\mu$m whereas usual surface layers only extend over a few unit cells (or $\sim$ nm). We present a muon spectroscopy study of Pb(Fe$_{1/2}$Nb$_{1/2}$)O$_{3}$ (PFN), which displays ferroelectric order, including many relaxor-like dielectric properties such as a frequency-broadened dielectric response, and antiferromagnetism with spatially short-range polar correlations, and hence can be termed a multiferroic. In terms of the magnetic behavior determined by the Fe$^{3+}$ ($S=5/2$, $L\approx0$) ions, PFN has been characterized as a unique example of a "cluster spin-glass". We use variable momentum muon spectroscopy to study the depth dependence of the slow magnetic relaxations in a large 1 cm$^{3}$ crystal of PFN. Zero-field positive muon spin relaxation is parameterized using a stretched exponential, indicative of a distribution of relaxation rates of the Fe$^{3+}$ spins. This bandwidth of frequencies changes as a function of muon momentum, indicative of a change in the Fe$^{3+}$ relaxation rates as a function of muon implantation depth in our single crystal. Using negative muon elemental analysis, we find small-to-no measurable change in the Fe$^{3+}$/Nb$^{5+}$ concentration with depth, implying that chemical concentration alone cannot account for the change in the relaxational dynamics. PFN thus displays a magnetic skin effect analogous to the one reported in the structural properties of relaxor-ferroelectrics.
Submitted 15 July, 2024;
originally announced July 2024.
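The PFN abstract above says the zero-field muon spin relaxation is parameterized by a stretched exponential but does not write it out. A commonly used form is sketched below; the symbols $P_0$ (initial polarization), $\lambda$ (characteristic relaxation rate) and $\beta$ (stretching exponent) are assumptions for illustration and may differ from the parameterization used in the paper.

```latex
P(t) = P_0 \exp\!\left[-(\lambda t)^{\beta}\right], \qquad 0 < \beta \le 1
```

With $\beta = 1$ this reduces to a single exponential with one relaxation rate, while $\beta < 1$ is the usual signature of a distribution of relaxation rates, which is the interpretation the abstract gives for the Fe$^{3+}$ spins.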