-
Assessment of Sports Concussion in Female Athletes: A Role for Neuroinformatics?
Authors:
Rachel Edelstein,
Sterling Gutterman,
Benjamin Newman,
John Darrell Van Horn
Abstract:
Over the past decade, the intricacies of sports-related concussions among female athletes have become readily apparent. Traditional clinical methods for diagnosing concussions suffer limitations when applied to female athletes, often failing to capture subtle changes in brain structure and function. Advanced neuroinformatics techniques and machine learning models have become invaluable assets in this endeavor. While these technologies have been extensively employed in understanding concussion in male athletes, there remains a significant gap in our comprehension of their effectiveness for female athletes. With its remarkable data analysis capacity, machine learning offers a promising avenue to bridge this deficit. By harnessing the power of machine learning, researchers can link observed phenotypic neuroimaging data to sex-specific biological mechanisms, unraveling the mysteries of concussions in female athletes. Furthermore, embedding methods within machine learning enable examining brain architecture and its alterations beyond the conventional anatomical reference frame. This, in turn, allows researchers to gain deeper insights into the dynamics of concussions, treatment responses, and recovery processes. To guarantee that female athletes receive the optimal care they deserve, researchers must employ advanced neuroimaging techniques and sophisticated machine-learning models. These tools enable an in-depth investigation of the underlying mechanisms responsible for concussion symptoms stemming from neuronal dysfunction in female athletes. This paper endeavors to address the crucial issue of sex differences in multimodal neuroimaging experimental design and machine learning approaches within female athlete populations, ultimately ensuring that they receive the tailored care they require when facing the challenges of concussions.
Submitted 9 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Linking Symptom Inventories using Semantic Textual Similarity
Authors:
Eamonn Kennedy,
Shashank Vadlamani,
Hannah M Lindsey,
Kelly S Peterson,
Kristen Dams OConnor,
Kenton Murray,
Ronak Agarwal,
Houshang H Amiri,
Raeda K Andersen,
Talin Babikian,
David A Baron,
Erin D Bigler,
Karen Caeyenberghs,
Lisa Delano-Wood,
Seth G Disner,
Ekaterina Dobryakova,
Blessen C Eapen,
Rachel M Edelstein,
Carrie Esopenko,
Helen M Genova,
Elbert Geuze,
Naomi J Goodrich-Hunsaker,
Jordan Grafman,
Asta K Haberg,
Cooper B Hodges
, et al. (57 additional authors not shown)
Abstract:
An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long-standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content, a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.
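As a rough illustration of linking inventory items by textual similarity, the sketch below pairs items from two inventories by cosine similarity. It is not the method of the paper: the pre-trained STS models are replaced by a toy bag-of-words vector, and the item wordings, the `link_items` helper, and the 0.5 threshold are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_items(inventory_a, inventory_b, threshold=0.5):
    """Pair each item of inventory_a with its most similar item of inventory_b.

    Items whose best match scores below the (assumed) threshold stay unlinked.
    """
    links = {}
    for a in inventory_a:
        best = max(inventory_b, key=lambda b: cosine(embed(a), embed(b)))
        if cosine(embed(a), embed(best)) >= threshold:
            links[a] = best
    return links


# Hypothetical item wordings, invented for illustration only.
inventory_a = ["trouble falling asleep", "frequent headaches"]
inventory_b = ["headaches", "difficulty falling asleep", "feeling irritable"]
links = link_items(inventory_a, inventory_b)
print(links)
```

A learned sentence embedding would also relate wordings that share no tokens (for example "trouble" and "difficulty"), which a token-count vector cannot do.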
Submitted 8 September, 2023;
originally announced September 2023.
-
Absynthe: Abstract Interpretation-Guided Synthesis
Authors:
Sankha Narayan Guria,
Jeffrey S. Foster,
David Van Horn
Abstract:
Synthesis tools have seen significant success in recent times. However, past approaches often require a complete and accurate embedding of the source language in the logic of the underlying solver, an approach difficult for industrial-grade languages. Other approaches couple the semantics of the source language with purpose-built synthesizers, necessarily tying the synthesis engine to a particular language model. In this paper, we propose Absynthe, an alternative approach based on user-defined abstract semantics that aims to be both lightweight and language agnostic, yet effective in guiding the search for programs. A synthesis goal in Absynthe is specified as an abstract specification in a lightweight user-defined abstract domain and concrete test cases. The synthesis engine is parameterized by the abstract semantics and independent of the source language. Absynthe validates candidate programs against test cases using the actual concrete language implementation to ensure correctness. We formalize the synthesis rules for Absynthe and describe how the key ideas are scaled-up in our implementation in Ruby. We evaluated Absynthe on the SyGuS strings benchmark and found it competitive with other enumerative search solvers. Moreover, Absynthe's ability to combine abstract domains allows the user to move along a cost spectrum, i.e., expressive domains prune more programs but require more time. Finally, to verify Absynthe can act as a general purpose synthesis tool, we use Absynthe to synthesize Pandas data frame manipulating programs in Python using simple abstractions like types and column labels of a data frame. Absynthe reaches parity with AutoPandas, a deep learning based tool for the same benchmark suite. In summary, our results demonstrate Absynthe is a promising step forward towards a general-purpose approach to synthesis that may broaden the applicability of synthesis to more $\ldots$
Submitted 24 April, 2023; v1 submitted 25 February, 2023;
originally announced February 2023.
-
On the long-term archiving of research data
Authors:
Cyril Pernet,
Claus Svarer,
Ross Blair,
John D. Van Horn,
Russell A. Poldrack
Abstract:
Accessing research data at any time is what FAIR (Findable, Accessible, Interoperable, Reusable) data sharing aims to achieve at scale. Yet, we argue that it is not sustainable to keep accumulating and maintaining all datasets for rapid access, considering the monetary and ecological cost of maintaining repositories. Here, we address the issue of cold data storage: when to dispose of data for offline storage, how this can be done while maintaining FAIR principles, and who should be responsible for cold archiving and long-term preservation.
Submitted 3 January, 2023;
originally announced January 2023.
-
A Formal Model of Checked C
Authors:
Liyi Li,
Yiyun Liu,
Deena L. Postol,
Leonidas Lampropoulos,
David Van Horn,
Michael Hicks
Abstract:
We present a formal model of Checked C, a dialect of C that aims to enforce spatial memory safety. Our model pays particular attention to the semantics of dynamically sized, potentially null-terminated arrays. We formalize this model in Coq, and prove that any spatial memory safety errors can be blamed on portions of the program labeled unchecked; this is a Checked C feature that supports incremental porting and backward compatibility. While our model's operational semantics uses annotated ("fat") pointers to enforce spatial safety, we show that such annotations can be safely erased: Using PLT Redex we formalize an executable version of our model and a compilation procedure from it to an untyped C-like language, and use randomized testing to validate that generated code faithfully simulates the original. Finally, we develop a custom random generator for well-typed and almost-well-typed terms in our Redex model, and use it to search for inconsistencies between our model and the Clang Checked C implementation. We find these steps to be a useful way to co-develop a language (Checked C is still in development) and a core model of it.
Submitted 31 January, 2022;
originally announced January 2022.
-
RbSyn: Type- and Effect-Guided Program Synthesis
Authors:
Sankha Narayan Guria,
Jeffrey S. Foster,
David Van Horn
Abstract:
In recent years, researchers have explored component-based synthesis, which aims to automatically construct programs that operate by composing calls to existing APIs. However, prior work has not considered efficient synthesis of methods with side effects, e.g., web app methods that update a database. In this paper, we introduce RbSyn, a novel type- and effect-guided synthesis tool for Ruby. An RbSyn synthesis goal is specified as the type for the target method and a series of test cases it must pass. RbSyn works by recursively generating well-typed candidate method bodies whose write effects match the read effects of the test case assertions. After finding a set of candidates that separately satisfy each test, RbSyn synthesizes a solution that branches to execute the correct candidate code under the appropriate conditions. We formalize RbSyn on a core, object-oriented language $\lambda_{syn}$ and describe how the key ideas of the model are scaled-up in our implementation for Ruby. We evaluated RbSyn on 19 benchmarks, 12 of which come from popular, open-source Ruby apps. We found that RbSyn synthesizes correct solutions for all benchmarks, with 15 benchmarks synthesizing in under 9 seconds, while the slowest benchmark takes 83 seconds. Using observed reads to guide synthesis is effective: using type-guidance alone times out on 10 of 12 app benchmarks. We also found that using less precise effect annotations leads to worse synthesis performance. In summary, we believe type- and effect-guided synthesis is an important step forward in synthesis of effectful methods from test cases.
Submitted 7 April, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Corpse Reviver: Sound and Efficient Gradual Typing via Contract Verification
Authors:
Cameron Moy,
Phúc C. Nguyen,
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
Gradually-typed programming languages permit the incremental addition of static types to untyped programs. To remain sound, languages insert run-time checks at the boundaries between typed and untyped code. Unfortunately, performance studies have shown that the overhead of these checks can be disastrously high, calling into question the viability of sound gradual typing. In this paper, we show that by building on existing work on soft contract verification, we can reduce or eliminate this overhead.
Our key insight is that while untyped code cannot be trusted by a gradual type system, there is no need to consider only the worst case when optimizing a gradually-typed program. Instead, we statically analyze the untyped portions of a gradually-typed program to prove that almost all of the dynamic checks implied by gradual type boundaries cannot fail, and can be eliminated at compile time. Our analysis is modular, and can be applied to any portion of a program.
We evaluate this approach on a dozen existing gradually-typed programs previously shown to have prohibitive performance overhead---with a median overhead of $3.5\times$ and up to $73.6\times$ in the worst case---and eliminate all overhead in most cases, suffering only $1.6\times$ overhead in the worst case.
Submitted 9 October, 2020; v1 submitted 24 July, 2020;
originally announced July 2020.
-
Type-Level Computations for Ruby Libraries
Authors:
Milod Kazerounian,
Sankha Narayan Guria,
Niki Vazou,
Jeffrey S. Foster,
David Van Horn
Abstract:
Many researchers have explored ways to bring static typing to dynamic languages. However, to date, such systems are not precise enough when types depend on values, which often arises when using certain Ruby libraries. For example, the type safety of a database query in Ruby on Rails depends on the table and column names used in the query. To address this issue, we introduce CompRDL, a type system for Ruby that allows library method type signatures to include type-level computations (or comp types for short). Combined with singleton types for table and column names, comp types let us give database query methods type signatures that compute a table's schema to yield very precise type information. Comp types for hash, array, and string libraries can also increase precision and thereby reduce the need for type casts. We formalize CompRDL and prove its type system sound. Rather than type check the bodies of library methods with comp types---those methods may include native code or be complex---CompRDL inserts run-time checks to ensure library methods abide by their computed types. We evaluated CompRDL by writing annotations with type-level computations for several Ruby core libraries and database query APIs. We then used those annotations to type check two popular Ruby libraries and four Ruby on Rails web apps. We found the annotations were relatively compact and could successfully type check 132 methods across our subject programs. Moreover, the use of type-level computations allowed us to check more expressive properties, with fewer manually inserted casts, than was possible without type-level computations. In the process, we found two type errors and a documentation error that were confirmed by the developers. Thus, we believe CompRDL is an important step forward in bringing precise static type checking to dynamic languages.
Submitted 6 April, 2019;
originally announced April 2019.
-
Size-Change Termination as a Contract
Authors:
Phuc C. Nguyen,
Thomas Gilray,
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
Termination is an important but undecidable program property, which has led to a large body of work on static methods for conservatively predicting or enforcing termination. One such method is the size-change termination approach of Lee, Jones, and Ben-Amram, which operates in two phases: (1) abstract programs into "size-change graphs," and (2) check these graphs for the size-change property: the existence of paths that lead to infinite decreasing sequences.
We transpose these two phases with an operational semantics that accounts for the run-time enforcement of the size-change property, postponing (or entirely avoiding) program abstraction. This choice has two key consequences: (1) size-change termination can be checked at run-time and (2) termination can be rephrased as a safety property analyzed using existing methods for systematic abstraction.
We formulate run-time size-change checks as contracts in the style of Findler and Felleisen. The result complements existing contracts that enforce partial correctness specifications to obtain contracts for total correctness. Our approach combines the robustness of the size-change principle for termination with the precise information available at run-time. It has tunable overhead and can check for nontermination without the conservativeness necessary in static checking. To obtain a sound and computable termination analysis, we apply existing abstract interpretation techniques directly to the operational semantics, avoiding the need for custom abstractions for termination. The resulting analyzer is competitive with existing, purpose-built analyzers.
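The idea of checking the size-change property at run time can be sketched as a Python decorator. This is a simplified reading of the abstract, not the authors' formulation: the decorator name `size_change_checked`, the restriction to natural-number arguments compared with `<` and `==`, and the per-function call stack are all assumptions of this example. Each nested call records a size-change graph relative to its enclosing call, graphs are composed along the chain of active calls, and an idempotent graph with no strictly decreasing self-arc is reported as a violation.

```python
class SizeChangeViolation(Exception):
    """Raised when a chain of nested calls shows no guaranteed descent."""


def graph(old, new):
    """Arcs (i, j, strict): new[j] < old[i] (strict) or new[j] == old[i]."""
    arcs = set()
    for i, a in enumerate(old):
        for j, b in enumerate(new):
            if b < a:
                arcs.add((i, j, True))
            elif b == a:
                arcs.add((i, j, False))
    return frozenset(arcs)


def compose(g, h):
    """Follow an arc of g, then an arc of h; strict if either arc is strict."""
    best = {}
    for i, j, s in g:
        for j2, k, t in h:
            if j == j2:
                best[(i, k)] = best.get((i, k), False) or s or t
    return frozenset((i, k, s) for (i, k), s in best.items())


def size_change_checked(f):
    stack = []  # (args, graphs from every active ancestor call) per active call

    def wrapper(*args):
        graphs = set()
        if stack:
            prev_args, prev_graphs = stack[-1]
            step = graph(prev_args, args)
            graphs = {step} | {compose(h, step) for h in prev_graphs}
            for h in graphs:
                idempotent = compose(h, h) == h
                descends = any(i == j and s for i, j, s in h)
                if idempotent and not descends:
                    raise SizeChangeViolation(f"{f.__name__}{args}")
        stack.append((args, graphs))
        try:
            return f(*args)
        finally:
            stack.pop()

    return wrapper


@size_change_checked
def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))


@size_change_checked
def spin(n):
    return spin(n)  # never shrinks its argument
```

Under these assumptions, `ack(2, 2)` runs to completion, while `spin(3)` is stopped at its first recursive call instead of looping until the interpreter's recursion limit.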
Submitted 25 April, 2019; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Constructive Galois Connections
Authors:
David Darais,
David Van Horn
Abstract:
Galois connections are a foundational tool for structuring abstraction in semantics and their use lies at the heart of the theory of abstract interpretation. Yet, mechanization of Galois connections using proof assistants remains limited to restricted modes of use, preventing their general application in mechanized metatheory and certified programming.
This paper presents constructive Galois connections, a variant of Galois connections that is effective both on paper and in proof assistants; is complete with respect to a large subset of classical Galois connections; and enables more general reasoning principles, including the "calculational" style advocated by Cousot.
To design constructive Galois connections we identify a restricted mode of use of classical ones which is both general and amenable to mechanization in dependently-typed functional programming languages. Crucial to our metatheory is the addition of monadic structure to Galois connections to control a "specification effect." Effectful calculations may reason classically, while pure calculations have extractable computational content. Explicitly moving between the worlds of specification and implementation is enabled by our metatheory.
To validate our approach, we provide two case studies in mechanizing existing proofs from the literature: the first uses calculational abstract interpretation to design a static analyzer; the second forms a semantic basis for gradual typing. Both mechanized proofs closely follow their original paper-and-pencil counterparts, employ reasoning principles not captured by previous mechanization approaches, support the extraction of verified algorithms, and are novel.
Submitted 23 July, 2018;
originally announced July 2018.
-
Gradual Liquid Type Inference
Authors:
Niki Vazou,
Éric Tanter,
David Van Horn
Abstract:
Liquid typing provides a decidable refinement inference mechanism that is convenient but subject to two major issues: (1) inference is global and requires top-level annotations, making it unsuitable for inference of modular code components and prohibiting its applicability to library code, and (2) inference failure results in obscure error messages. These difficulties seriously hamper the migration of existing code to use refinements. This paper shows that gradual liquid type inference---a novel combination of liquid inference and gradual refinement types---addresses both issues. Gradual refinement types, which support imprecise predicates that are optimistically interpreted, can be used in argument positions to constrain liquid inference so that the global inference process effectively infers modular specifications usable for library components. Dually, when gradual refinements appear as the result of inference, they signal an inconsistency in the use of static refinements. Because liquid refinements are drawn from a finite set of predicates, in gradual liquid type inference we can enumerate the safe concretizations of each imprecise refinement, i.e. the static refinements that justify why a program is gradually well-typed. This enumeration is useful for static liquid type error explanation, since the safe concretizations exhibit all the potential inconsistencies that lead to static type errors. We develop the theory of gradual liquid type inference and explore its pragmatics in the setting of Liquid Haskell.
Submitted 30 October, 2019; v1 submitted 5 July, 2018;
originally announced July 2018.
-
Functional Pearl: Theorem Proving for All (Equational Reasoning in Liquid Haskell)
Authors:
Niki Vazou,
Joachim Breitner,
Will Kunkel,
David Van Horn,
Graham Hutton
Abstract:
Equational reasoning is one of the key features of pure functional languages such as Haskell. To date, however, such reasoning always took place externally to Haskell, either manually on paper, or mechanised in a theorem prover. This article shows how equational reasoning can be performed directly and seamlessly within Haskell itself, and be checked using Liquid Haskell. In particular, language learners --- to whom external theorem provers are out of reach --- can benefit from having their proofs mechanically checked. Concretely, we show how the equational proofs and derivations from Graham's textbook can be recast as proofs in Haskell (spoiler: they look essentially the same).
Submitted 9 June, 2018;
originally announced June 2018.
-
Soft Contract Verification for Higher-Order Stateful Programs
Authors:
Phuc C. Nguyen,
Thomas Gilray,
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
Software contracts allow programmers to state rich program properties using the full expressive power of an object language. However, since they are enforced at runtime, monitoring contracts imposes significant overhead and delays error discovery. So contract verification aims to guarantee all or most of these properties ahead of time, enabling valuable optimizations and yielding a more general assurance of correctness. Existing methods for static contract verification satisfy the needs of more restricted target languages, but fail to address the challenges unique to those conjoining untyped, dynamic programming, higher-order functions, modularity, and statefulness. Our approach tackles all these features at once, in the context of the full Racket system---a mature environment for stateful, higher-order, multi-paradigm programming with or without types. Evaluating our method using a set of both pure and stateful benchmarks, we are able to verify 99.94% of checks statically (all but 28 of 49,861).
Stateful, higher-order functions pose significant challenges for static contract verification in particular. In the presence of these features, a modular analysis must permit code from the current module to escape permanently to an opaque context (unspecified code from outside the current module) that may be stateful and therefore store a reference to the escaped closure. Also, contracts themselves, being predicates written in unrestricted Racket, may exhibit stateful behavior; a sound approach must be robust to contracts which are arbitrarily expressive and interwoven with the code they monitor. In this paper, we present and evaluate our solution based on higher-order symbolic execution, explain the techniques we used to address such thorny issues, formalize a notion of behavioral approximation, and use it to provide a mechanized proof of soundness.
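To make concrete what run-time contract monitoring involves (the checks that static verification tries to discharge ahead of time), here is a minimal Python sketch of higher-order contracts with blame. It is an illustration only and uses neither Racket's contract system nor the paper's verifier: the names `flat`, `arrow`, and the party labels "server" and "client" are assumptions of this example.

```python
class ContractViolation(Exception):
    pass


def flat(pred, name):
    """Contract for first-order values: check the predicate immediately."""
    def check(value, pos, neg):
        if not pred(value):
            raise ContractViolation(f"{pos} broke contract {name}: {value!r}")
        return value
    return check


def arrow(dom, rng):
    """Contract for functions: wrap f so each call is checked.

    Arguments are checked with the blame parties swapped (the caller supplies
    them), results with the original parties (the function supplies them).
    """
    def check(f, pos, neg):
        def wrapped(x):
            return rng(f(dom(x, neg, pos)), pos, neg)
        return wrapped
    return check


nat = flat(lambda v: isinstance(v, int) and v >= 0, "nat")

# (nat -> nat) -> nat: the server promises a nat if the client passes a nat -> nat.
apply_to_five = arrow(arrow(nat, nat), nat)(lambda g: g(5), "server", "client")

print(apply_to_five(lambda n: n + 1))

try:
    apply_to_five(lambda n: -1)  # the client's function returns a non-nat
except ContractViolation as err:
    print(err)
```

Every call through `apply_to_five` pays for the wrapper and the predicate checks, and the bad client function is only detected when it is actually invoked, which is the overhead and delayed error discovery the abstract refers to.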
Submitted 9 November, 2017;
originally announced November 2017.
-
Abstracting Definitional Interpreters
Authors:
David Darais,
Nicholas Labich,
Phuc C. Nguyen,
David Van Horn
Abstract:
In this functional pearl, we examine the use of definitional interpreters as a basis for abstract interpretation of higher-order programming languages. As it turns out, definitional interpreters, especially those written in monadic style, can provide a nice basis for a wide variety of collecting semantics, abstract interpretations, symbolic executions, and their intermixings.
But the real insight of this story is a replaying of an insight from Reynolds's landmark paper, Definitional Interpreters for Higher-Order Programming Languages, in which he observes definitional interpreters enable the defined-language to inherit properties of the defining-language. We show the same holds true for definitional abstract interpreters. Remarkably, we observe that abstract definitional interpreters can inherit the so-called "pushdown control flow" property, wherein function calls and returns are precisely matched in the abstract semantics, simply by virtue of the function call mechanism of the defining-language.
The first approaches to achieve this property for higher-order languages appeared within the last ten years, and have since been the subject of many papers. These approaches start from a state-machine semantics and uniformly involve significant technical engineering to recover the precision of pushdown control flow. In contrast, starting from a definitional interpreter, the pushdown control flow property is inherent in the meta-language and requires no further technical mechanism to achieve.
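The notion that one definitional interpreter can serve both concrete and abstract semantics can be sketched for a much smaller language than the paper considers. The example below is an assumption-laden illustration: it covers only first-order arithmetic expressions rather than a higher-order language, uses a plain parameter object instead of a monad, and the names `interp`, `Concrete`, and `Sign` are invented here. Recursive evaluation relies on Python's own call stack, which is the defining-language mechanism the abstract points to.

```python
def interp(expr, env, dom):
    """Definitional interpreter, parameterized by a value domain `dom`."""
    tag = expr[0]
    if tag == "num":
        return dom.lit(expr[1])
    if tag == "var":
        return env[expr[1]]
    if tag == "add":
        return dom.add(interp(expr[1], env, dom), interp(expr[2], env, dom))
    if tag == "mul":
        return dom.mul(interp(expr[1], env, dom), interp(expr[2], env, dom))
    raise ValueError(f"unknown expression: {expr!r}")


class Concrete:
    """Standard semantics: values are Python integers."""
    def lit(self, n): return n
    def add(self, a, b): return a + b
    def mul(self, a, b): return a * b


def add_signs(s, t):
    if s == "0":
        return {t}
    if t == "0" or s == t:
        return {s}
    return {"-", "0", "+"}  # opposite signs: any result is possible


def mul_signs(s, t):
    if "0" in (s, t):
        return {"0"}
    return {"+"} if s == t else {"-"}


class Sign:
    """Abstract semantics: values are sets of possible signs."""
    def lit(self, n):
        return frozenset({"-" if n < 0 else "+" if n > 0 else "0"})
    def add(self, a, b):
        return frozenset().union(*(add_signs(s, t) for s in a for t in b))
    def mul(self, a, b):
        return frozenset().union(*(mul_signs(s, t) for s in a for t in b))


# x * x + 1
expr = ("add", ("mul", ("var", "x"), ("var", "x")), ("num", 1))
concrete_result = interp(expr, {"x": -3}, Concrete())
abstract_result = interp(expr, {"x": frozenset({"-"})}, Sign())
print(concrete_result, sorted(abstract_result))
```

Swapping `Concrete()` for `Sign()` changes the analysis without touching `interp`, so the abstract run proves the result positive for every negative `x`, not just for `-3`.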
Submitted 15 July, 2017;
originally announced July 2017.
-
A Vision for Online Verification-Validation
Authors:
Matthew A. Hammer,
Bor-Yuh Evan Chang,
David Van Horn
Abstract:
Today's programmers face a false choice between creating software that is extensible and software that is correct. Specifically, dynamic languages permit software that is richly extensible (via dynamic code loading, dynamic object extension, and various forms of reflection), and today's programmers exploit this flexibility to "bring their own language features" to enrich extensible languages (e.g., by using common JavaScript libraries). Meanwhile, such library-based language extensions generally lack enforcement of their abstractions, leading to programming errors that are complex to avoid and predict.
To offer verification for this extensible world, we propose online verification-validation (OVV), which consists of language and VM design that enables a "phaseless" approach to program analysis, in contrast to the standard static-dynamic phase distinction. Phaseless analysis freely interposes abstract interpretation with concrete execution, allowing analyses to use dynamic (concrete) information to prove universal (abstract) properties about future execution.
In this paper, we present a conceptual overview of OVV through a motivating example program that uses a hypothetical database library. We present a generic semantics for OVV, and an extension to this semantics that offers a simple gradual type system for the database library primitives. The result of instantiating this gradual type system in an OVV setting is a checker that can progressively type successive continuations of the program until a continuation is fully verified. To evaluate the proposed vision of OVV for this example, we implement the VM semantics (in Rust), and show that this design permits progressive typing in this manner.
Submitted 21 August, 2016;
originally announced August 2016.
-
Constructive Galois Connections: Taming the Galois Connection Framework for Mechanized Metatheory
Authors:
David Darais,
David Van Horn
Abstract:
Galois connections are a foundational tool for structuring abstraction in semantics and their use lies at the heart of the theory of abstract interpretation. Yet, mechanization of Galois connections remains limited to restricted modes of use, preventing their general application in mechanized metatheory and certified programming.
This paper presents constructive Galois connections, a variant of Galois connections that is effective both on paper and in proof assistants; is complete with respect to a large subset of classical Galois connections; and enables more general reasoning principles, including the "calculational" style advocated by Cousot.
To design constructive Galois connections, we identify a restricted mode of use of classical ones which is both general and amenable to mechanization in dependently-typed functional programming languages. Crucial to our metatheory is the addition of monadic structure to Galois connections to control a "specification effect". Effectful calculations may reason classically, while pure calculations have extractable computational content. Explicitly moving between the worlds of specification and implementation is enabled by our metatheory.
To validate our approach, we provide two case studies in mechanizing existing proofs from the literature: one uses calculational abstract interpretation to design a static analyzer, the other forms a semantic basis for gradual typing. Both mechanized proofs closely follow their original paper-and-pencil counterparts, employ reasoning principles not captured by previous mechanization approaches, support the extraction of verified algorithms, and are novel.
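As a point of reference for the abstract above, the following is a minimal sketch of a classical Galois connection between sets of integers and a sign domain. It is not the constructive variant the paper introduces, and the names `sign`, `alpha`, `gamma`, and `is_galois_connection` are illustrative assumptions. Concretization is restricted to a finite universe so that the defining property can be checked exhaustively.

```python
from itertools import chain, combinations

# Abstract domain (an assumption of this sketch): sets of basic signs,
# ordered by set inclusion.
SIGNS = ("-", "0", "+")

def sign(n):
    """Sign of a single integer."""
    return "-" if n < 0 else "0" if n == 0 else "+"

def powerset(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def alpha(concrete):
    """Abstraction: the least set of signs covering every integer given."""
    return frozenset(sign(n) for n in concrete)

def gamma(abstract, universe):
    """Concretization: every integer of the finite universe with an allowed sign."""
    return frozenset(n for n in universe if sign(n) in abstract)

def is_galois_connection(universe):
    """Check alpha(c) <= a  iff  c <= gamma(a) for all concrete c and abstract a."""
    return all(
        (alpha(c) <= a) == (c <= gamma(a, universe))
        for c in powerset(universe)
        for a in powerset(SIGNS)
    )
```

Calling `is_galois_connection(range(-2, 3))` checks the adjunction property over all 32 subsets of the universe and all 8 abstract values.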
Submitted 26 October, 2016; v1 submitted 21 November, 2015;
originally announced November 2015.
-
Higher-order symbolic execution for contract verification and refutation
Authors:
Phuc C. Nguyen,
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
We present a new approach to automated reasoning about higher-order programs by endowing symbolic execution with a notion of higher-order, symbolic values. Our approach is sound and relatively complete with respect to a first-order solver for base type values. Therefore, it can form the basis of automated verification and bug-finding tools for higher-order programs.
To validate our approach, we use it to develop and evaluate a system for verifying and refuting behavioral software contracts of components in a functional language, which we call soft contract verification. In doing so, we discover a mutually beneficial relation between behavioral contracts and higher-order symbolic execution.
Our system uses higher-order symbolic execution, leveraging contracts as a source of symbolic values including unknown behavioral values, and employs an updatable heap of contract invariants to reason about flow-sensitive facts. Whenever a contract is refuted, it reports a concrete counterexample reproducing the error, which may involve solving for an unknown function. The approach is able to analyze first-class contracts, recursive data structures, unknown functions, and control-flow-sensitive refinements of values, which are all idiomatic in dynamic languages. It makes effective use of an off-the-shelf solver to decide problems without heavy encodings. The approach is competitive with a wide range of existing tools---including type systems, flow analyzers, and model checkers---on their own benchmarks. We have built a tool which analyzes programs written in Racket, and report on its effectiveness in verifying and refuting contracts.
Submitted 20 March, 2016; v1 submitted 16 July, 2015;
originally announced July 2015.
-
Mechanically Verified Calculational Abstract Interpretation
Authors:
David Darais,
David Van Horn
Abstract:
Calculational abstract interpretation, long advocated by Cousot, is a technique for deriving correct-by-construction abstract interpreters from the formal semantics of programming languages.
This paper addresses the problem of deriving correct-by-verified-construction abstract interpreters with the use of a proof assistant. We identify several technical challenges to overcome with the aim of supporting verified calculational abstract interpretation that is faithful to existing pencil-and-paper proofs, supports calculation with Galois connections generally, and enables the extraction of verified static analyzers from these proofs. To meet these challenges, we develop a theory of Galois connections in monadic style that include a specification effect. Effectful calculations may reason classically, while pure calculations have extractable computational content. Moving between the worlds of specification and implementation is enabled by our metatheory.
To validate our approach, we give the first mechanically verified proof of correctness for Cousot's "Calculational design of a generic abstract interpreter." Our proof "by calculus" closely follows the original paper-and-pencil proof and supports the extraction of a verified static analyzer.
Submitted 13 July, 2015;
originally announced July 2015.
-
Pushdown Control-Flow Analysis for Free
Authors:
Thomas Gilray,
Steven Lyde,
Michael D. Adams,
Matthew Might,
David Van Horn
Abstract:
Traditional control-flow analysis (CFA) for higher-order languages, whether implemented by constraint-solving or abstract interpretation, introduces spurious connections between callers and callees. Two distinct invocations of a function will necessarily pollute one another's return-flow. Recently, three distinct approaches have been published which provide perfect call-stack precision in a computable manner: CFA2, PDCFA, and AAC. Unfortunately, CFA2 and PDCFA are difficult to implement and require significant engineering effort. Furthermore, all three are computationally expensive; for a monovariant analysis, CFA2 is in $O(2^n)$, PDCFA is in $O(n^6)$, and AAC is in $O(n^9 \log n)$.
In this paper, we describe a new technique that builds on these but is both straightforward to implement and computationally inexpensive. The crucial insight is an unusual state-dependent allocation strategy for the addresses of continuations. Our technique imposes only a constant-factor overhead on the underlying analysis and, with monovariance, costs only $O(n^3)$ in the worst case.
This paper presents the intuitions behind this development, a proof of the precision of this analysis, and benchmarks demonstrating its efficacy.
Submitted 21 March, 2016; v1 submitted 11 July, 2015;
originally announced July 2015.
-
Incremental Computation with Names
Authors:
Matthew A. Hammer,
Jana Dunfield,
Kyle Headley,
Nicholas Labich,
Jeffrey S. Foster,
Michael Hicks,
David Van Horn
Abstract:
Over the past thirty years, there has been significant progress in developing general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design challenge in such approaches is how to provide efficient incremental support for a broad range of programs. In this paper, we argue that first-class names are a critical linguistic feature for efficient incremental computation. Names identify computations to be reused across differing runs of a program, and making them first class gives programmers a high level of control over reuse. We demonstrate the benefits of names by presenting NOMINAL ADAPTON, an ML-like language for incremental computation with names. We describe how to use NOMINAL ADAPTON to efficiently incrementalize several standard programming patterns -- including maps, folds, and unfolds -- and show how to build efficient, incremental probabilistic trees and tries. Since NOMINAL ADAPTON's implementation is subtle, we formalize it as a core calculus and prove it is from-scratch consistent, meaning it always produces the same answer as simply re-running the computation. Finally, we demonstrate that NOMINAL ADAPTON can provide large speedups over both from-scratch computation and ADAPTON, a previous state-of-the-art incremental computation system.
Submitted 23 March, 2021; v1 submitted 26 March, 2015;
originally announced March 2015.
-
Running Probabilistic Programs Backwards
Authors:
Neil Toronto,
Jay McCarthy,
David Van Horn
Abstract:
Many probabilistic programming languages allow programs to be run under constraints in order to carry out Bayesian inference. Running programs under constraints could enable other uses such as rare event simulation and probabilistic verification---except that all such probabilistic languages are necessarily limited because they are defined or implemented in terms of an impoverished theory of probability. Measure-theoretic probability provides a more general foundation, but its generality makes finding computational content difficult.
We develop a measure-theoretic semantics for a first-order probabilistic language with recursion, which interprets programs as functions that compute preimages. Preimage functions are generally uncomputable, so we derive an abstract semantics. We implement the abstract semantics and use the implementation to carry out Bayesian inference, stochastic ray tracing (a rare event simulation), and probabilistic verification of floating-point error bounds.
Submitted 16 January, 2015; v1 submitted 12 December, 2014;
originally announced December 2014.
-
Relatively Complete Counterexamples for Higher-Order Programs
Authors:
Phuc C. Nguyen,
David Van Horn
Abstract:
In this paper, we study the problem of generating inputs to a higher-order program causing it to error. We first study the problem in the setting of PCF, a typed, core functional language and contribute the first relatively complete method for constructing counterexamples for PCF programs. The method is relatively complete in the sense of Hoare logic; completeness is reduced to the completeness of a first-order solver over the base types of PCF. In practice, this means an SMT solver can be used for the effective, automated generation of higher-order counterexamples for a large class of programs.
We achieve this result by employing a novel form of symbolic execution for higher-order programs. The remarkable aspect of this symbolic execution is that even though symbolic higher-order inputs and values are considered, the path condition remains a first-order formula. Our handling of symbolic function application enables the reconstruction of higher-order counterexamples from this first-order formula.
After establishing our main theoretical results, we sketch how to apply the approach to untyped, higher-order, stateful languages with first-class contracts and show how counterexample generation can be used to detect contract violations in this setting. To validate our approach, we implement a tool generating counterexamples for erroneous modules written in Racket.
Submitted 21 April, 2015; v1 submitted 14 November, 2014;
originally announced November 2014.
-
Galois Transformers and Modular Abstract Interpreters
Authors:
David Darais,
Matthew Might,
David Van Horn
Abstract:
The design and implementation of static analyzers has become increasingly systematic. Yet for a given language or analysis feature, it often requires tedious and error-prone work to implement an analyzer and prove it sound. In short, static analysis features and their proofs of soundness do not compose well, causing a dearth of reuse in both implementation and metatheory.
We solve the problem of systematically constructing static analyzers by introducing Galois transformers: monad transformers that transport Galois connection properties. In concert with a monadic interpreter, we define a library of monad transformers that implement building blocks for classic analysis parameters like context, path, and heap (in)sensitivity. Moreover, these can be composed together independent of the language being analyzed.
Significantly, a Galois transformer can be proved sound once and for all, making it a reusable analysis component. As new analysis features and abstractions are developed and mixed in, soundness proofs need not be reconstructed, as the composition of a monad transformer stack is sound by virtue of its constituents. Galois transformers provide a viable foundation for reusable and composable metatheory for program analysis.
Finally, these Galois transformers shift the level of abstraction in analysis design and implementation to a level where non-specialists have the ability to synthesize sound analyzers over a number of parameters.
Submitted 5 October, 2015; v1 submitted 14 November, 2014;
originally announced November 2014.
-
Pruning, Pushdown Exception-Flow Analysis
Authors:
Shuying Liang,
Weibin Sun,
Matthew Might,
Andy Keep,
David Van Horn
Abstract:
Statically reasoning in the presence of exceptions and about the effects of exceptions is challenging: exception-flows are mutually determined by traditional control-flow and points-to analyses. We tackle the challenge of analyzing exception-flows from two angles. First, from the angle of pruning control-flows (both normal and exceptional), we derive a pushdown framework for an object-oriented language with full-featured exceptions. Unlike traditional analyses, it allows precise matching of throwers to catchers. Second, from the angle of pruning points-to information, we generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis. We then seamlessly weave the techniques into enhanced reachability computation, yielding highly precise exception-flow analysis, without becoming intractable, even for large applications. We evaluate our pruned, pushdown exception-flow analysis, comparing it with an established analysis on large scale standard Java benchmarks. The results show that our analysis significantly improves analysis precision over traditional analysis within a reasonable analysis time.
Submitted 10 September, 2014;
originally announced September 2014.
-
Pushdown flow analysis with abstract garbage collection
Authors:
J. Ian Johnson,
Ilya Sergey,
Christopher Earl,
Matthew Might,
David Van Horn
Abstract:
In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection push the boundaries of what we can learn about programs statically. This work illuminates and poses solutions to theoretical and practical challenges that stand in the way of combining the power of these techniques. Pushdown flow analysis grants unbounded yet computable polyvariance to the analysis of return-flow in higher-order programs. Abstract garbage collection grants unbounded polyvariance to abstract addresses which become unreachable between invocations of the abstract contexts in which they were created. Pushdown analysis solves the problem of precisely analyzing recursion in higher-order languages; abstract garbage collection is essential in solving the "stickiness" problem. Our benchmarks demonstrate that each method alone can reduce analysis times and boost precision by orders of magnitude. We combine these methods. The challenge in marrying these techniques is not subtle: computing the reachable control states of a pushdown system relies on limiting access during transition to the top of the stack; abstract garbage collection, on the other hand, needs full access to the entire stack to compute a root set, just as concrete collection does. Conditional pushdown systems were developed for just such a conundrum, but existing methods are ill-suited for the dynamic nature of garbage collection. We show fully precise and approximate solutions to the feasible paths problem for pushdown garbage-collecting control-flow analysis. Experiments reveal synergistic interplay between garbage collection and pushdown techniques, and the fusion demonstrates "better-than-both-worlds" precision.
Submitted 19 June, 2014;
originally announced June 2014.
-
Flow analysis, linearity, and PTIME
Authors:
David Van Horn,
Harry G. Mairson
Abstract:
Flow analysis is a ubiquitous and much-studied component of compiler technology---and its variations abound. Amongst the most well known is Shivers' 0CFA; however, the best known algorithm for 0CFA requires time cubic in the size of the analyzed program and is unlikely to be improved. Consequently, several analyses have been designed to approximate 0CFA by trading precision for faster computation. Henglein's simple closure analysis, for example, forfeits the notion of directionality in flows and enjoys an "almost linear" time algorithm. But in making trade-offs between precision and complexity, what has been given up and what has been gained? Where do these analyses differ and where do they coincide?
We identify a core language---the linear $λ$-calculus---where 0CFA, simple closure analysis, and many other known approximations or restrictions to 0CFA are rendered identical. Moreover, for this core language, analysis corresponds with (instrumented) evaluation. Because analysis faithfully captures evaluation, and because the linear $λ$-calculus is complete for PTIME, we derive PTIME-completeness results for all of these analyses.
Submitted 22 November, 2013;
originally announced November 2013.
-
Deciding $k$CFA is complete for EXPTIME
Authors:
David Van Horn,
Harry G. Mairson
Abstract:
We give an exact characterization of the computational complexity of the $k$CFA hierarchy. For any $k > 0$, we prove that the control flow decision problem is complete for deterministic exponential time. This theorem validates empirical observations that such control flow analysis is intractable. It also provides more general insight into the complexity of abstract interpretation.
Submitted 22 November, 2013;
originally announced November 2013.
-
The Complexity of Flow Analysis in Higher-Order Languages
Authors:
David Van Horn
Abstract:
This dissertation proves lower bounds on the inherent difficulty of deciding flow analysis problems in higher-order programming languages. We give exact characterizations of the computational complexity of 0CFA, the $k$CFA hierarchy, and related analyses. In each case, we precisely capture both the expressiveness and feasibility of the analysis, identifying the elements responsible for the trade-off.
0CFA is complete for polynomial time. This result relies on the insight that when a program is linear (each bound variable occurs exactly once), the analysis makes no approximation; abstract and concrete interpretation coincide, and therefore program analysis becomes evaluation under another guise. Moreover, this is true not only for 0CFA, but for a number of further approximations to 0CFA. In each case, we derive polynomial time completeness results.
For any $k > 0$, $k$CFA is complete for exponential time. Even when $k = 1$, the distinction in binding contexts results in a limited form of closures, which do not occur in 0CFA. This theorem validates empirical observations that $k$CFA is intractably slow for any $k > 0$. There is, in the worst case---and plausibly, in practice---no way to tame the cost of the analysis. Exponential time is required. The empirically observed intractability of this analysis can be understood as being inherent in the approximation problem being solved, rather than reflecting unfortunate gaps in our programming abilities.
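The linearity condition used in the abstract above (each bound variable occurs exactly once) can be made concrete with a small checker. This is a minimal sketch over an untyped λ-term representation; the classes `Var`, `Lam`, and `App` and the functions `count_free` and `is_linear` are assumptions introduced here for illustration and do not come from the dissertation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def count_free(term, name):
    """Count the free occurrences of `name` in `term`."""
    if isinstance(term, Var):
        return 1 if term.name == name else 0
    if isinstance(term, Lam):
        # An inner binder with the same name shadows the outer one.
        return 0 if term.param == name else count_free(term.body, name)
    return count_free(term.fn, name) + count_free(term.arg, name)

def is_linear(term):
    """True when every bound variable occurs exactly once in its scope."""
    if isinstance(term, Var):
        return True
    if isinstance(term, Lam):
        return count_free(term.body, term.param) == 1 and is_linear(term.body)
    return is_linear(term.fn) and is_linear(term.arg)

# Example terms: the identity, application, K (discards y), and self-application.
identity = Lam("x", Var("x"))
apply_fn = Lam("f", Lam("x", App(Var("f"), Var("x"))))
const_k = Lam("x", Lam("y", Var("x")))
self_app = Lam("x", App(Var("x"), Var("x")))
```

Here `identity` and `apply_fn` are linear, while `const_k` (which drops `y`) and `self_app` (which uses `x` twice) are not.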
Submitted 19 November, 2013;
originally announced November 2013.
-
Resolving and Exploiting the $k$-CFA Paradox
Authors:
Matthew Might,
Yannis Smaragdakis,
David Van Horn
Abstract:
Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's $k$-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities.
Recent results indicate that the relationship between the functional and object-oriented incarnations of $k$-CFA is not as well understood as thought. Van Horn and Mairson proved $k$-CFA for $k \geq 1$ to be EXPTIME-complete; hence, no polynomial-time algorithm can exist. Yet, there are several polynomial-time formulations of context-sensitive points-to analyses in object-oriented languages. Thus, it seems that functional $k$-CFA may actually be a profoundly different analysis from object-oriented $k$-CFA. We resolve this paradox by showing that the exact same specification of $k$-CFA is polynomial-time for object-oriented languages yet exponential-time for functional ones: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity and complexity. This illumination leads to an immediate payoff: by projecting the object-oriented treatment of objects onto closures, we derive a polynomial-time hierarchy of context-sensitive CFAs for functional programs.
Submitted 17 November, 2013;
originally announced November 2013.
-
Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation
Authors:
Shuying Liang,
Andrew W. Keep,
Matthew Might,
Steven Lyde,
Thomas Gilray,
Petey Aldous,
David Van Horn
Abstract:
We present Anadroid, a static malware analysis framework for Android apps. Anadroid exploits two techniques to soundly raise precision: (1) it uses a pushdown system to precisely model dynamically dispatched interprocedural and exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to soundly approximate all possible interleavings of asynchronous entry points in Android applications. (It also integrates static taint-flow analysis and least permissions analysis to expand the class of malicious behaviors which it can catch.) Anadroid provides rich user interface support for human analysts who must ultimately rule on the "maliciousness" of a behavior.
To demonstrate the effectiveness of Anadroid's malware analysis, we had teams of analysts analyze a challenge suite of 52 Android applications released as part of the Automated Program Analysis for Cybersecurity (APAC) DARPA program. The first team analyzed the apps using a version of Anadroid that uses traditional (finite-state-machine-based) control-flow-analysis found in existing malware analysis tools; the second team analyzed the apps using a version of Anadroid that uses our enhanced pushdown-based control-flow-analysis. We measured machine analysis time, human analyst time, and their accuracy in flagging malicious applications. With pushdown analysis, we found statistically significant (p < 0.05) decreases in time: from 85 minutes per app to 35 minutes per app in human plus machine analysis time; and statistically significant (p < 0.05) increases in accuracy with the pushdown-driven analyzer: from 71% correct identification to 95% correct identification.
Submitted 17 November, 2013;
originally announced November 2013.
-
AnaDroid: Malware Analysis of Android with User-supplied Predicates
Authors:
Shuying Liang,
Matthew Might,
David Van Horn
Abstract:
Today's mobile platforms provide only coarse-grained permissions to users with regard to how third-party applications use sensitive private data. Unfortunately, it is easy to disguise malware within the boundaries of legitimately-granted permissions. For instance, granting access to "contacts" and "internet" may be necessary for a text-messaging application to function, even though the user does not want contacts transmitted over the internet. To understand fine-grained application use of permissions, we need to statically analyze their behavior. Even then, malware detection faces three hurdles: (1) analyses may be prohibitively expensive, (2) automated analyses can only find behaviors that they are designed to find, and (3) the maliciousness of any given behavior is application-dependent and subject to human judgment. To remedy these issues, we propose semantic-based program analysis with a human in the loop as an alternative approach to malware detection. In particular, our analysis allows analyst-crafted semantic predicates to search and filter analysis results. Human-oriented semantic-based program analysis can systematically, quickly and concisely characterize the behaviors of mobile applications. We describe a tool that provides analysts with a library of the semantic predicates and the ability to dynamically trade speed and precision. It also provides analysts the ability to statically inspect details of every suspicious state of (abstract) execution in order to make a ruling as to whether or not the behavior is truly malicious with respect to the intent of the application. In addition, permission and profiling reports are generated to aid analysts in identifying common malicious behaviors.
Submitted 17 November, 2013;
originally announced November 2013.
-
Soft Contract Verification
Authors:
Phuc C. Nguyen,
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
Behavioral software contracts are a widely used mechanism for governing the flow of values between components. However, run-time monitoring and enforcement of contracts imposes significant overhead and delays discovery of faulty components to run-time.
To overcome these issues, we present soft contract verification, which aims to statically prove either complete or partial contract correctness of components, written in an untyped, higher-order language with first-class contracts. Our approach uses higher-order symbolic execution, leveraging contracts as a source of symbolic values including unknown behavioral values, and employs an updatable heap of contract invariants to reason about flow-sensitive facts. We prove the symbolic execution soundly approximates the dynamic semantics and that verified programs can't be blamed.
The approach is able to analyze first-class contracts, recursive data structures, unknown functions, and control-flow-sensitive refinements of values, which are all idiomatic in dynamic languages. It makes effective use of an off-the-shelf solver to decide problems without heavy encodings. The approach is competitive with a wide range of existing tools (including type systems, flow analyzers, and model checkers) on their own benchmarks.
Submitted 16 June, 2014; v1 submitted 23 July, 2013;
originally announced July 2013.
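The run-time monitoring that the abstract above contrasts with static verification can be illustrated with a minimal sketch. It is not taken from the paper: the names `Blame`, `flat`, `arrow`, and the party labels "server" and "client" are hypothetical, and the code only shows the general idea of a higher-order contract wrapper that assigns blame when a check fails at run time.

```python
class Blame(Exception):
    """Raised when a party violates its side of a contract (hypothetical name)."""

    def __init__(self, party, contract_name):
        super().__init__(f"{party} broke the contract {contract_name}")
        self.party = party


def flat(pred, name):
    """First-order contract: check a value against a predicate."""
    def check(value, pos, neg):
        if not pred(value):
            raise Blame(pos, name)
        return value
    return check


def arrow(dom, rng):
    """Function contract: dom guards arguments, rng guards results."""
    def check(f, pos, neg):
        def wrapped(x):
            # The caller (neg) supplies the argument, so the labels are swapped.
            checked_arg = dom(x, neg, pos)
            # The wrapped function (pos) is responsible for the result.
            return rng(f(checked_arg), pos, neg)
        return wrapped
    return check


is_int = flat(lambda v: isinstance(v, int), "int")
int_to_int = arrow(is_int, is_int)

# A well-behaved component and a faulty one, both monitored at run time.
inc = int_to_int(lambda x: x + 1, "server", "client")
bad = int_to_int(lambda x: str(x), "server", "client")
```

Every call to `inc` or `bad` pays for two predicate checks, and the fault in `bad` surfaces only when it is called. These are the overhead and the delayed discovery that the abstract names as motivation for proving contracts statically.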
-
From Principles to Practice with Class in the First Year
Authors:
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
We propose a bridge between functional and object-oriented programming in the first-year curriculum. Traditionally, curricula that begin with functional programming transition to a professional, usually object-oriented, language in the second course. This transition poses obstacles for students, and often results in confusing the details of development environments, syntax, and libraries with the fundamentals of OO programming that the course should focus on. Instead, we propose to begin the second course with a sequence of custom teaching languages which minimize the transition from the first course, and allow students to focus on core ideas. After working through the sequence of pedagogical languages, we then transition to Java, at which point students have a strong command of the basic principles. We have 3 years of experience with this course, with notable success.
Submitted 10 December, 2013; v1 submitted 19 June, 2013;
originally announced June 2013.
-
Abstracting Abstract Control (Extended)
Authors:
J. Ian Johnson,
David Van Horn
Abstract:
The strength of a dynamic language is also its weakness: run-time flexibility comes at the cost of compile-time predictability. Many of the hallmarks of dynamic languages such as closures, continuations, various forms of reflection, and a lack of static types make many programmers rejoice, while compiler writers, tool developers, and verification engineers lament. The dynamism of these features simply confounds statically reasoning about programs that use them. Consequently, static analyses for dynamic languages are few, far between, and seldom sound.
The "abstracting abstract machines" (AAM) approach to constructing static analyses has recently been proposed as a method to ameliorate the difficulty of designing analyses for such language features. The approach, so called because it derives a function for the sound and computable approximation of program behavior starting from the abstract machine semantics of a language, provides a viable approach to dynamic language analysis since all that is required is a machine description of the interpreter.
The original AAM recipe produces finite state abstractions, which cannot faithfully represent an interpreter's control stack. Recent advances have shown that higher-order programs can be approximated with pushdown systems. However, these automata-theoretic models break down on features that inspect or modify the control stack.
In this paper, we tackle the problem of bringing pushdown flow analysis to the domain of dynamic language features. We revise the abstracting abstract machines technique to target the stronger computational model of pushdown systems. In place of automata theory, we use only abstract machines and memoization. As case studies, we show the technique applies to a language with closures, garbage collection, stack-inspection, and first-class composable continuations.
Submitted 14 August, 2014; v1 submitted 14 May, 2013;
originally announced May 2013.
-
Pushdown Exception-Flow Analysis of Object-Oriented Programs
Authors:
Shuying Liang,
Matthew Might,
Thomas Gilray,
David Van Horn
Abstract:
Statically reasoning in the presence of and about exceptions is challenging: exceptions worsen the well-known mutual recursion between data-flow and control-flow analysis. The recent development of pushdown control-flow analysis for the λ-calculus hints at a way to improve analysis of exceptions: a pushdown stack can precisely match catches to throws in the same way it matches returns to calls. This work generalizes pushdown control-flow analysis to object-oriented programs and to exceptions. Pushdown analysis of exceptions improves precision over the next best analysis, Bravenboer and Smaragdakis's Doop, by orders of magnitude. By then generalizing abstract garbage collection to object-oriented programs, we reduce analysis time by half over pure pushdown analysis. We evaluate our implementation for Dalvik bytecode on standard benchmarks as well as several Android applications.
Submitted 11 February, 2013;
originally announced February 2013.
-
Optimizing Abstract Abstract Machines
Authors:
J. Ian Johnson,
Nicholas Labich,
Matthew Might,
David Van Horn
Abstract:
The technique of abstracting abstract machines (AAM) provides a systematic approach for deriving computable approximations of evaluators that are easily proved sound. This article contributes a complementary step-by-step process for subsequently going from a naive analyzer derived under the AAM approach, to an efficient and correct implementation. The end result of the process is a two to three order-of-magnitude improvement over the systematically derived analyzer, making it competitive with hand-optimized implementations that compute fundamentally less precise results.
Submitted 24 July, 2013; v1 submitted 15 November, 2012;
originally announced November 2012.
-
Introspective Pushdown Analysis of Higher-Order Programs
Authors:
Christopher Earl,
Ilya Sergey,
Matthew Might,
David Van Horn
Abstract:
In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection skirt just inside the boundaries of soundness and decidability. Alone, each method reduces analysis times and boosts precision by orders of magnitude. This work illuminates and conquers the theoretical challenges that stand in the way of combining the power of these techniques. The challenge in marrying these techniques is not subtle: computing the reachable control states of a pushdown system relies on limiting access during transition to the top of the stack; abstract garbage collection, on the other hand, needs full access to the entire stack to compute a root set, just as concrete collection does. Introspective pushdown systems resolve this conflict. Introspective pushdown systems provide enough access to the stack to allow abstract garbage collection, but they remain restricted enough to compute control-state reachability, thereby enabling the sound and precise product of pushdown analysis and abstract garbage collection. Experiments reveal synergistic interplay between the techniques, and the fusion demonstrates "better-than-both-worlds" precision.
Submitted 7 July, 2012;
originally announced July 2012.
-
Pushdown Abstractions of JavaScript
Authors:
David Van Horn,
Matthew Might
Abstract:
We design a family of program analyses for JavaScript that make no approximation in matching calls with returns, exceptions with handlers, and breaks with labels. We do so by starting from an established reduction semantics for JavaScript and systematically deriving its intensional abstract interpretation. Our first step is to transform the semantics into an equivalent low-level abstract machine: the JavaScript Abstract Machine (JAM). We then give an infinite-state yet decidable pushdown machine whose stack precisely models the structure of the concrete program stack. The precise model of stack structure in turn confers precise control-flow analysis even in the presence of control effects, such as exceptions and finally blocks. We give pushdown generalizations of traditional forms of analysis such as k-CFA, and prove the pushdown framework for abstract interpretation is sound and computable.
Submitted 20 December, 2011; v1 submitted 20 September, 2011;
originally announced September 2011.
-
Systematic Abstraction of Abstract Machines
Authors:
David Van Horn,
Matthew Might
Abstract:
We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines for higher-order and imperative programming languages. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique of store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection. In order to close the gap between formalism and implementation, we provide translations of the mathematics as running Haskell code for the initial development of our method.
Submitted 18 July, 2011;
originally announced July 2011.
-
Abstracting Abstract Machines: A Systematic Approach to Higher-Order Program Analysis
Authors:
David Van Horn,
Matthew Might
Abstract:
Predictive models are fundamental to engineering reliable software systems. However, designing conservative, computable approximations for the behavior of programs (static analyses) remains a difficult and error-prone process for modern high-level programming languages. What analysis designers need is a principled method for navigating the gap between semantics and analytic models: analysis designers need a method that tames the interaction of complex language features such as higher-order functions, recursion, exceptions, continuations, objects and dynamic allocation.
We contribute a systematic approach to program analysis that yields novel and transparently sound static analyses. Our approach relies on existing derivational techniques to transform high-level language semantics into low-level deterministic state-transition systems (with potentially infinite state spaces). We then perform a series of simple machine refactorings to obtain a sound, computable approximation, which takes the form of a non-deterministic state-transition system with a finite state space. The approach scales up uniformly to enable program analysis of realistic language features, including higher-order functions, tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.
Submitted 9 May, 2011;
originally announced May 2011.
-
Semantic Solutions to Program Analysis Problems
Authors:
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
Problems in program analysis can be solved by developing novel program semantics and deriving abstractions conventionally. For over thirty years, higher-order program analysis has been sold as a hard problem. Its solutions have required ingenuity and complex models of approximation. We claim that this difficulty is due to premature focus on abstraction and propose a new approach that emphasizes semantics. Its simplicity enables new analyses that are beyond the current state of the art.
Submitted 30 April, 2011;
originally announced May 2011.
-
A family of abstract interpretations for static analysis of concurrent higher-order programs
Authors:
Matthew Might,
David Van Horn
Abstract:
We develop a framework for computing two foundational analyses for concurrent higher-order programs: (control-)flow analysis (CFA) and may-happen-in-parallel analysis (MHP). We pay special attention to the unique challenges posed by the unrestricted mixture of first-class continuations and dynamically spawned threads. To set the stage, we formulate a concrete model of concurrent higher-order programs: the P(CEK*)S machine. We find that the systematic abstract interpretation of this machine is capable of computing both flow and MHP analyses. Yet, a closer examination finds that the precision for MHP is poor. As a remedy, we adapt a shape-analytic technique, singleton abstraction, to dynamically spawned threads (as opposed to objects in the heap). We then show that if MHP analysis is not of interest, we can substantially accelerate the computation of flow analysis alone by collapsing thread interleavings with a second layer of abstraction.
Submitted 14 June, 2011; v1 submitted 26 March, 2011;
originally announced March 2011.
-
Higher-Order Symbolic Execution via Contracts
Authors:
Sam Tobin-Hochstadt,
David Van Horn
Abstract:
We present a new approach to automated reasoning about higher-order programs by extending symbolic execution to use behavioral contracts as symbolic values, enabling symbolic approximation of higher-order behavior.
Our approach is based on the idea of an abstract reduction semantics that gives an operational semantics to programs with both concrete and symbolic components. Symbolic components are approximated by their contract and our semantics gives an operational interpretation of contracts-as-values. The result is an executable semantics that soundly predicts program behavior, including contract failures, for all possible instantiations of symbolic components. We show that our approach scales to an expressive language of contracts including arbitrary programs embedded as predicates, dependent function contracts, and recursive contracts. Supporting this feature-rich language of specifications leads to powerful symbolic reasoning using existing program assertions.
We then apply our approach to produce a verifier for contract correctness of components, including a sound and computable approximation to our semantics that facilitates fully automated contract verification. Our implementation is capable of verifying contracts expressed in existing programs, and of justifying valuable contract-elimination optimizations.
Submitted 26 April, 2012; v1 submitted 7 March, 2011;
originally announced March 2011.
-
Evaluating Call-By-Need on the Control Stack
Authors:
Stephen Chang,
David Van Horn,
Matthias Felleisen
Abstract:
Ariola and Felleisen's call-by-need λ-calculus replaces a variable occurrence with its value at the last possible moment. To support this gradual notion of substitution, function applications, once established, are never discharged. In this paper we show how to translate this notion of reduction into an abstract machine that resolves variable references via the control stack. In particular, the machine uses the static address of a variable occurrence to extract its current value from the dynamic control stack.
Submitted 16 September, 2010;
originally announced September 2010.
-
Stack-Summarizing Control-Flow Analysis of Higher-Order Programs
Authors:
Christopher Earl,
Matthew Might,
David Van Horn
Abstract:
Two sinks drain precision from higher-order flow analyses: (1) merging of argument values upon procedure call and (2) merging of return values upon procedure return. To combat the loss of precision, these two sinks have been addressed independently. In the case of procedure calls, abstract garbage collection reduces argument merging; while in the case of procedure returns, context-free approaches eliminate return value merging. It is natural to expect a combined analysis could enjoy the mutually beneficial interaction between the two approaches. The central contribution of this work is a direct product of abstract garbage collection with context-free analysis. The central challenge to overcome is the conflict between the core constraint of a pushdown system and the needs of garbage collection: a pushdown system can only see the top of the stack, yet garbage collection needs to see the entire stack during a collection. To make the direct product computable, we develop "stack summaries," a method for tracking stack properties at each control state in a pushdown analysis of higher-order programs.
Submitted 8 September, 2010;
originally announced September 2010.
-
Abstracting Abstract Machines
Authors:
David Van Horn,
Matthew Might
Abstract:
We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique we call store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.
Submitted 7 September, 2010; v1 submitted 26 July, 2010;
originally announced July 2010.
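The recipe summarized in the abstract above (store-allocated continuations plus a bounded store) can be sketched for the pure λ-calculus. This is an illustrative reconstruction, not the authors' code. Its assumptions: programs are closed, the address of a binding is simply the variable name (a 0CFA-style choice), continuation addresses are derived from program text, and a single store is shared by all states. The term encoding and the names `analyze`, `step`, and `HALT` are all hypothetical.

```python
from collections import defaultdict

# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)
HALT = ("halt",)  # address holding the empty continuation


def lookup(env, x):
    return dict(env)[x]


def extend(env, x, addr):
    return frozenset({**dict(env), x: addr}.items())


def step(state, store, finals):
    """Return successor states, joining new bindings into the shared store."""
    expr, env, kaddr = state
    if expr[0] == "var":
        closures = list(store[lookup(env, expr[1])])
        return [(lam, cenv, kaddr) for (lam, cenv) in closures]
    if expr[0] == "app":
        ka = ("ar-k", expr)  # continuation lives in the store, not on a stack
        store[ka].add(("ar", expr[2], env, kaddr))
        return [(expr[1], env, ka)]
    # expr is a lambda (a value): follow every continuation stored at kaddr
    succs = []
    for frame in list(store[kaddr]):
        if frame[0] == "mt":
            finals.add((expr, env))
        elif frame[0] == "ar":
            _, arg, aenv, next_k = frame
            kb = ("fn-k", arg)
            store[kb].add(("fn", expr, env, next_k))
            succs.append((arg, aenv, kb))
        else:  # "fn": apply the saved function to the value just computed
            _, (_, x, body), fenv, next_k = frame
            store[x].add((expr, env))
            succs.append((body, extend(fenv, x, x), next_k))
    return succs


def analyze(program):
    """Explore reachable abstract states until the store stops growing."""
    store = defaultdict(set)
    store[HALT].add(("mt",))
    init = (program, frozenset(), HALT)
    while True:
        before = sum(len(v) for v in store.values())
        seen, finals, work = set(), set(), [init]
        while work:
            state = work.pop()
            if state not in seen:
                seen.add(state)
                work.extend(step(state, store, finals))
        if sum(len(v) for v in store.values()) == before:
            return finals, seen


IDENT = ("lam", "x", ("var", "x"))
ARG = ("lam", "y", ("var", "y"))
OMEGA_HALF = ("lam", "x", ("app", ("var", "x"), ("var", "x")))

final_values, _ = analyze(("app", IDENT, ARG))
diverging_values, _ = analyze(("app", OMEGA_HALF, OMEGA_HALF))
```

Because addresses come from a finite set, the store and the state space are finite, so `analyze` terminates even on the diverging term `(λx. x x)(λx. x x)`, for which it reports no final values. For `(λx. x)(λy. y)` it reports the closure of `λy. y` as the only final value.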
-
Pushdown Control-Flow Analysis of Higher-Order Programs
Authors:
Christopher Earl,
Matthew Might,
David Van Horn
Abstract:
Context-free approaches to static analysis gain precision over classical approaches by perfectly matching returns to call sites, a property that eliminates spurious interprocedural paths. Vardoulakis and Shivers's recent formulation of CFA2 showed that it is possible (if expensive) to apply context-free methods to higher-order languages and gain the same boost in precision achieved over first-order programs.
To this young body of work on context-free analysis of higher-order programs, we contribute a pushdown control-flow analysis framework, which we derive as an abstract interpretation of a CESK machine with an unbounded stack. One instantiation of this framework marks the first polyvariant pushdown analysis of higher-order programs; another marks the first polynomial-time analysis. In the end, we arrive at a framework for control-flow analysis that can efficiently compute pushdown generalizations of classical control-flow analyses.
Submitted 24 July, 2010;
originally announced July 2010.