Showing 1–7 of 7 results for author: Reizinger, P

Searching in archive stat.
  1. arXiv:2409.13728 [pdf, other]

    cs.CL cs.LG stat.ML

    Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts

    Authors: Anna Mészáros, Szilvia Ujváry, Wieland Brendel, Patrik Reizinger, Ferenc Huszár

    Abstract: LLMs show remarkable emergent abilities, such as inferring concepts from presumably out-of-distribution prompts, known as in-context learning. Though this success is often attributed to the Transformer architecture, our systematic understanding is limited. In complex real-world data sets, even defining what is out-of-distribution is not obvious. To better understand the OOD behaviour of autoregres…

    Submitted 9 September, 2024; originally announced September 2024.

  2. arXiv:2407.00143 [pdf, other]

    cs.LG cs.CV stat.ML

    InfoNCE: Identifying the Gap Between Theory and Practice

    Authors: Evgenia Rusak, Patrik Reizinger, Attila Juhos, Oliver Bringmann, Roland S. Zimmermann, Wieland Brendel

    Abstract: Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under certain assumptions, the learned representations uncover the ground-truth latent factors. We argue these theories overlook crucial aspects of how CL is deployed in practice. Specifically, they assume that within a positive pair, all latent factors either vary to a similar extent, or that some do not vary at all.…

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2406.14302 [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

    Authors: Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

    Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed)…

    Submitted 9 September, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2405.01964 [pdf, other]

    stat.ML cs.LG

    Position: Understanding LLMs Requires More Than Statistical Generalization

    Authors: Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

    Abstract: The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statist…

    Submitted 17 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted as a position paper at ICML 2024. Code: https://github.com/rpatrik96/llm-non-identifiability

  5. arXiv:2311.18048 [pdf, other]

    cs.LG cs.CE eess.SY stat.ME

    An Interventional Perspective on Identifiability in Gaussian LTI Systems with Independent Component Analysis

    Authors: Goutham Rajendran, Patrik Reizinger, Wieland Brendel, Pradeep Ravikumar

    Abstract: We investigate the relationship between system identification and intervention design in dynamical systems. While previous research demonstrated how identifiable representation learning methods, such as Independent Component Analysis (ICA), can reveal cause-effect relationships, it relied on a passive perspective without considering how to collect data. Our work shows that in Gaussian Linear Time-…

    Submitted 16 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CLeaR2024 camera ready. Code available at https://github.com/rpatrik96/lti-ica

  6. arXiv:2206.02416 [pdf, other]

    stat.ML cs.AI cs.LG

    Embrace the Gap: VAEs Perform Independent Mechanism Analysis

    Authors: Patrik Reizinger, Luigi Gresele, Jack Brady, Julius von Kügelgen, Dominik Zietlow, Bernhard Schölkopf, Georg Martius, Wieland Brendel, Michel Besserve

    Abstract: Variational autoencoders (VAEs) are a popular framework for modeling complex data distributions; they can be efficiently trained via variational inference by maximizing the evidence lower bound (ELBO), at the expense of a gap to the exact (log-)marginal likelihood. While VAEs are commonly used for representation learning, it is unclear why ELBO maximization would yield useful representations, sinc…

    Submitted 27 January, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: NeurIPS2022 final version

  7. Attention-based Curiosity-driven Exploration in Deep Reinforcement Learning

    Authors: Patrik Reizinger, Márton Szemenyei

    Abstract: Reinforcement Learning enables training an agent via interaction with the environment. However, in the majority of real-world scenarios, the extrinsic feedback is sparse or not sufficient, thus intrinsic reward formulations are needed to successfully train the agent. This work investigates and extends the paradigm of curiosity-driven exploration. First, a probabilistic approach is taken to exploit…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP2020, 5 pages, 8 figures, 2 tables