Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Mike Dodds
About me
Posts
Function Argument Nullability Using an LLM (Galois blog)
Published:
Galois blog post: Link
Generative AI for Specifications (Galois blog)
Published:
Galois blog post: Link
Galois / Twisp: Avoiding Foolishness in Distributed Systems (Galois blog)
Published:
Galois blog post: Link
The Impact of Provable Security: AWS and Supranational (Galois blog)
Published:
Galois blog post: Link
Building a Concurrency Verifier Using Crucible (Galois blog)
Published:
Galois blog post: Link
Who is verifying their cryptographic protocols? (Galois blog)
Published:
Galois blog post: Link
QUIC Testing, a Quick Replication (Galois blog)
Published:
Galois blog post: Link
Proofs Should Repair Themselves (Galois blog)
Published:
Galois blog post: Link
Portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Publications
Extending C for Checking Shape Safety
Published in Workshop on Graph Transformation for Verification and Concurrency (GT-VC), 2005
Abstract: The project Safe Pointers by Graph Transformation at the University of York has developed a method for specifying the shape of pointer-data structures by graph reduction, and a static checking algorithm for proving the shape safety of graph transformation rules modelling operations on pointer structures. In this paper, we outline how to apply this approach to the C programming language. We extend ANSI C with so-called transformers which model graph transformation rules, and with shape specifications for pointer structures. For the resulting language C-GRS, we present both a translation to C and an abstraction to graph transformation. Our main result is that the abstraction of transformers to graph transformation rules is correct in that the C code implementing transformers is compatible with the semantics of graph transformation.
Using Trace Data to Diagnose Non-Termination Errors
Published in Hat Day 2005: work in progress on the Hat tracing system for Haskell (Technical Report YCS-2005-395), 2005
Abstract: This paper discusses black-hat and hat-nonterm, two tools for locating and diagnosing non-termination errors in Haskell programs. Both of these tools give a small trace which is intended to illuminate the cause of the non-termination error. black-hat analyses programs which contain black holes, a particularly restricted kind of non-termination error, while hat-nonterm applies the approach used in black-hat to more general non-terminating programs. This paper discusses the traces generated by black-hat and hat-nonterm, as well as the approach used to generate these traces.
Graph Transformation in Constant Time
Published in International Conference on Graph Transformation (ICGT), 2006
Abstract: We present conditions under which graph transformation rules can be applied in time independent of the size of the input graph: graphs must contain a unique root label, nodes in the left-hand sides of rules must be reachable from the root, and nodes must have a bounded outdegree. We establish a constant upper bound for the time needed to construct all graphs resulting from an application of a fixed rule to an input graph. We also give an improved upper bound under the stronger condition that all edges outgoing from a node must have distinct labels. Then this result is applied to identify a class of graph reduction systems that define graph languages with a linear membership test. In a case study we prove that the (non-context-free) language of balanced binary trees with backpointers belongs to this class.
PhD Thesis: Graph Transformation and Pointer Structures
Published in University of York, Department of Computer Science, 2008
Abstract: This thesis is concerned with the use of graph-transformation rules to specify and manipulate pointer structures. In it, we show that graph transformation can form the basis of a practical and well-formalised approach to specifying pointer properties. We also show that graph transformation rules can be used as an efficient mechanism for checking the properties of graphs.
From Hyperedge Replacement to Separation Logic and Back
Published in International Conference on Graph Transformation (ICGT) - Doctoral Symposium, 2008
Abstract: Hyperedge-replacement grammars and separation-logic formulas both define classes of graph-like structures. In this paper, we relate the different formalisms by effectively translating restricted hyperedge-replacement grammars into formulas of a fragment of separation logic with recursive predicates, and vice versa. The translations preserve the classes of specified graphs, and hence the two approaches are of equivalent power. It follows that our fragment of separation logic inherits properties of hyperedge-replacement grammars, such as inexpressibility results. We also show that several operators of full separation logic cannot be expressed using hyperedge replacement.
Deny-Guarantee Reasoning
Published in European Symposium on Programming (ESOP), 2009
Abstract: Rely-guarantee is a well-established approach to reasoning about concurrent programs that use parallel composition. However, parallel composition is not how concurrency is structured in real systems. Instead, threads are started by ‘fork’ and collected with ‘join’ commands. This style of concurrency cannot be reasoned about using rely-guarantee, as the lifetime of a thread can be scoped dynamically. With parallel composition the scope is static.
Explicit Stabilisation for Modular Rely-Guarantee Reasoning
Published in European Symposium on Programming (ESOP), 2010
Abstract: We propose a new formalisation of stability for Rely-Guarantee, in which an assertion’s stability is encoded into its syntactic form. This allows two advances in modular reasoning. Firstly, it enables Rely-Guarantee, for the first time, to verify concurrent libraries independently of their clients’ environments. Secondly, in a sequential setting, it allows a module’s internal interference to be hidden while verifying its clients. We demonstrate our approach by verifying, using RGSep, the Version 7 Unix memory manager, uncovering a twenty-year-old bug in the process.
Concurrent Abstract Predicates
Published in European Conference on Object-Oriented Programming (ECOOP), 2010
Abstract: Abstraction is key to understanding and reasoning about large computer systems. Abstraction is simple to achieve if the relevant data structures are disjoint, but rather difficult when they are partially shared, as is often the case for concurrent modules. We present a program logic for reasoning abstractly about data structures that provides a fiction of disjointness and permits compositional reasoning. The internal details of a module are completely hidden from the client by concurrent abstract predicates. We reason about a module’s implementation using separation logic with permissions, and provide abstract specifications for use by client programs using concurrent abstract predicates. We illustrate our abstract reasoning by building two implementations of a lock module on top of hardware instructions, and two implementations of a concurrent set module on top of the lock module.
Modular reasoning for deterministic parallelism
Published in Principles of Programming Languages (POPL), 2011
Abstract: Weaving a concurrency control protocol into a program is difficult and error-prone. One way to alleviate this burden is deterministic parallelism. In this well-studied approach to parallelisation, a sequential program is annotated with sections that can execute concurrently, with automatically injected control constructs used to ensure observable behaviour consistent with the original program.
Automatic safety proofs for asynchronous memory operations
Published in Principles and Practice of Parallel Programming (PPoPP), 2011
Abstract: We present a work-in-progress proof system and tool, based on separation logic, for analysing memory safety of multicore programs that use asynchronous memory operations.
coreStar: the core of jStar
Published in 1st International Workshop on Intermediate Verification Languages (Boogie 2011), 2011
Abstract: Separation logic is a promising approach to program verification. However, currently there is no shared infrastructure for building verification tools. This increases the time to build and experiment with new ideas. In this paper, we outline coreStar, the verification framework underlying jStar. Our aim is to provide basic support for developing separation logic tools. This paper shows how a language can be encoded into coreStar, and gives details of how coreStar works to enable extensions.
jStar-eclipse: an IDE for automated verification of Java programs
Published in Principles and Practice of Parallel Programming (PPoPP), 2011
Abstract: jStar is a tool for automatically verifying Java programs. It uses separation logic to support abstract reasoning about object specifications. jStar can verify a number of challenging design patterns, including Subject/Observer, Visitor, Factory and Pooling. However, to use jStar one has to deal with a family of command-line tools that expect specifications in separate files and diagnose the errors by inspecting the text output from these tools.
A simple abstraction for complex concurrent indexes
Published in Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2011
Abstract: Indexes are ubiquitous. Examples include associative arrays, dictionaries, maps and hashes used in applications such as databases, file systems and dynamic languages. Abstractly, a sequential index can be viewed as a partial function from keys to values. Values can be queried by their keys, and the index can be mutated by adding or removing mappings. Whilst appealingly simple, this abstract specification is insufficient for reasoning about indexes accessed concurrently. We present an abstract specification for concurrent indexes. We verify several representative concurrent client applications using our specification, demonstrating that clients can reason abstractly without having to consider specific underlying implementations. Our specification would, however, mean nothing if it were not satisfied by standard implementations of concurrent indexes. We verify that our specification is satisfied by algorithms based on linked lists, hash tables and B-Link trees. The complexity of these algorithms, in particular the B-Link tree algorithm, can be completely hidden from the client’s view by our abstract specification.
Safe asynchronous multicore memory operations
Published in Automated Software Engineering (ASE), 2011
Abstract: Asynchronous memory operations provide a means for coping with the memory wall problem in multicore processors, and are available in many platforms and languages, e.g., the Cell Broadband Engine, CUDA and OpenCL. Reasoning about the correct usage of such operations involves complex analysis of memory accesses to check for races. We present a method and tool for proving memory-safety and race-freedom of multicore programs that use asynchronous memory operations. Our approach uses separation logic with permissions, and our tool automates this method, targeting a C-like core language. We describe our solutions to several challenges that arose in the course of this research. These include: syntactic reasoning about permissions and arrays, integration of numerical abstract domains, and utilization of an SMT solver. We demonstrate the feasibility of our approach experimentally by checking absence of DMA races on a set of programs drawn from the IBM Cell SDK.
Resource-sensitive synchronization inference by abduction
Published in Principles of Programming Languages (POPL), 2012
Abstract: We present an analysis which takes as its input a sequential program, augmented with annotations indicating potential parallelization opportunities, and a sequential proof, written in separation logic, and produces a correctly-synchronized parallelized program and proof of that program. Unlike previous work, ours is not an independence analysis; we insert synchronization constructs to preserve relevant dependencies found in the sequential program that may otherwise be violated by a naive translation. Separation logic allows us to parallelize fine-grained patterns of resource-usage, moving beyond straightforward points-to analysis. Our analysis works by using the sequential proof to discover dependencies between different parts of the program. It leverages these discovered dependencies to guide the insertion of synchronization primitives into the parallelized program, and to ensure that the resulting parallelized program satisfies the same specification as the original sequential program, and exhibits the same sequential behaviour. Our analysis is built using frame inference and abduction, two techniques supported by an increasing number of separation logic tools.
Library Abstraction for C/C++ Concurrency
Published in Principles of Programming Languages (POPL), 2013
Abstract: When constructing complex concurrent systems, abstraction is vital: programmers should be able to reason about concurrent libraries in terms of abstract specifications that hide the implementation details. Relaxed memory models present substantial challenges in this respect, as libraries need not provide sequentially consistent abstractions: to avoid unnecessary synchronisation, they may allow clients to observe relaxed memory effects, and library specifications must capture these.
Ribbon Proofs for Separation Logic
Published in European Symposium on Programming (ESOP), 2013
Abstract: We present ribbon proofs, a diagrammatic system for proving program correctness based on separation logic. Ribbon proofs emphasise the structure of a proof, so are intelligible and pedagogical. Because they contain less redundancy than proof outlines, and allow each proof step to be checked locally, they may be more scalable. Where proof outlines are cumbersome to modify, ribbon proofs can be visually manoeuvred to yield proofs of variant programs. This paper introduces the ribbon proof system, proves its soundness and completeness, and outlines a prototype tool for validating the diagrams in Isabelle.
Proof-Directed Parallelization Synthesis by Separation Logic
Published in Transactions on Programming Languages and Systems (TOPLAS), Volume 35, Issue 2, 2013
Abstract: We present an analysis which takes as its input a sequential program, augmented with annotations indicating potential parallelization opportunities, and a sequential proof, written in separation logic, and produces a correctly synchronized parallelized program and proof of that program. Unlike previous work, ours is not a simple independence analysis that admits parallelization only when threads do not interfere; rather, we insert synchronization to preserve dependencies in the sequential program that might be violated by a naïve translation. Separation logic allows us to parallelize fine-grained patterns of resource usage, moving beyond straightforward points-to analysis. The sequential proof need only represent shape properties, meaning we can handle complex algorithms without verifying every aspect of their behavior.
C/C++ Causal Cycles Confound Compositionality
Published in Tiny Transactions on Computer Science (TinyToCS) Volume 2, 2014
Abstract: C/C++ permit seemingly-impossible cycles in causality. This breaks compositionality: two apparently safe programs may fault when composed.
Towards Rigorously Faking Bidirectional Model Transformations
Published in Analysis of Model Transformations Workshop, 2014
Abstract: Bidirectional model transformations (bx) are mechanisms for automatically restoring consistency between multiple concurrently modified models. They are, however, challenging to implement; many model transformation languages do not support them at all. In this paper, we propose an approach for automatically obtaining the consistency guarantees of bx without the complexities of a bx language. First, we show how to “fake” true bidirectionality using pairs of unidirectional transformations and inter-model consistency constraints in Epsilon. Then, we propose to automatically verify that these transformations are consistency preserving (and thus indistinguishable from true bx) by defining translations to graph rewrite rules and nested conditions, and leveraging recent proof calculi for graph transformation verification.
Learning Assertions to Verify Linked-List Programs
Published in Software Engineering and Formal Methods (SEFM), 2015
Abstract: C programs that manipulate list-based dynamic data structures remain a challenging target for static verification. In this paper we employ the dynamic analysis of dsOli to locate and identify data structure operations in a program, and then use this information to automatically annotate that program with assertions in separation logic. These annotations comprise candidate pre/post-conditions and loop invariants suitable to statically verify memory safety with the verification tool VeriFast. By using both textbook and real-world examples on our prototype implementation, we show that the generated assertions are often discharged automatically. Even when this is not the case, candidate invariants are of great help to the verification engineer, significantly reducing the manual verification effort.
A Scalable, Correct Time-Stamped Stack
Published in Principles of Programming Languages (POPL), 2015
Abstract: Concurrent data-structures, such as stacks, queues, and deques, often implicitly enforce a total order over elements in their underlying memory layout. However, much of this order is unnecessary: linearizability only requires that elements are ordered if the insert methods ran in sequence. We propose a new approach which uses timestamping to avoid unnecessary ordering. Pairs of elements can be left unordered if their associated insert operations ran concurrently, and order imposed as necessary at the eventual removal.
Refining Existential Properties in Separation Logic Analyses
Published in arXiv [cs.LO], 2015
Abstract: In separation logic program analyses, tractability is generally achieved by restricting invariants to a finite abstract domain. As this domain cannot vary, loss of information can cause failure even when verification is possible in the underlying logic. In this paper, we propose a CEGAR-like method for detecting spurious failures and avoiding them by refining the abstract domain. Our approach is geared towards discovering existential properties, e.g. “list contains value x”. To diagnose failures, we use abduction, a technique for inferring command preconditions. Our method works backwards from an error, identifying necessary information lost by abstraction, and refining the forward analysis to avoid the error. We define domains for several classes of existential properties, and show their effectiveness on case studies adapted from Redis, Azureus and FreeRTOS.
Verifying Custom Synchronization Constructs Using Higher-Order Separation Logic
Published in Transactions on Programming Languages and Systems (TOPLAS), Volume 38, Issue 2, 2016
Abstract: Synchronization constructs lie at the heart of any reliable concurrent program. Many such constructs are standard (e.g., locks, queues, stacks, and hash-tables). However, many concurrent applications require custom synchronization constructs with special-purpose behavior. These constructs present a significant challenge for verification. Like standard constructs, they rely on subtle racy behavior, but unlike standard constructs, they may not have well-understood abstract interfaces. As they are custom built, such constructs are also far more likely to be unreliable.
Proving Linearizability Using Partial Orders
Published in European Symposium on Programming (ESOP), 2017
Abstract: Linearizability is the commonly accepted notion of correctness for concurrent data structures. It requires that any execution of the data structure is justified by a linearization — a linear order on operations satisfying the data structure’s sequential specification. Proving linearizability is often challenging because an operation’s position in the linearization order may depend on future operations. This makes it very difficult to incrementally construct the linearization in a proof.
Starling: Lightweight Concurrency Verification with Views
Published in Computer Aided Verification (CAV), 2017
Abstract: Modern program logics have made it feasible to verify the most complex concurrent algorithms. However, many such logics are complex, and most lack automated tool support. We propose Starling, a new lightweight logic and automated tool for concurrency verification. Starling takes a proof outline written in an abstracted Hoare-logic style, and converts it into proof terms that can be discharged by a sequential solver. Starling’s approach is generic in its structure, making it easy to target different solvers. In this paper we verify shared-variable algorithms using the Z3 SMT solver, and heap-based algorithms using the GRASShopper solver. We have applied our approach to a range of concurrent algorithms, including Rust’s atomic reference counter, the Linux ticketed lock, the CLH queue-lock, and a fine-grained list algorithm.
Compositional Verification of Compiler Optimisations on Relaxed Memory
Published in European Symposium on Programming (ESOP), 2018
Abstract: A valid compiler optimisation transforms a block in a program without introducing new observable behaviours to the program as a whole. Deciding which optimisations are valid can be difficult, and depends closely on the semantic model of the programming language. Axiomatic relaxed models, such as C++11, present particular challenges for determining validity, because such models allow subtle effects of a block transformation to be observed by the rest of the program. In this paper we present a denotational theory that captures optimisation validity on an axiomatic model corresponding to a fragment of C++11. Our theory allows verifying an optimisation compositionally, by considering only the block it transforms instead of the whole program. Using this property, we realise the theory in the first push-button tool that can verify real-world optimisations under an axiomatic memory model.
Model-Based Compliance Testing of PKCS#11 Providers
Published in Galois, Inc. - Technical Report GALOIS-20-01, 2020
Abstract: PKCS#11, also known as Cryptoki, is a widely adopted C-language API and interoperability standard for communicating with cryptographic libraries. While comprehensive and prescriptive, the standard is also extremely complex. The base specification alone describes approximately 50 functions through over 100 pages of documentation. Add-on specifications provide additional functionality, but also impose additional complexity. As the standard evolves through collaboration and expansion, new areas of imprecision and ambiguity are introduced which make it difficult for vendors to implement libraries that are 100% functionally accurate and compliant with the specification. Users of cryptographic libraries are familiar with the industry reality that PKCS#11 libraries will not always interoperate flawlessly with each other. In the best case, these divergences result in development delays as build issues and functional deficiencies are root-caused and remedied. In the worst case, customers may face production outages or data corruption due to incorrect assumptions or imperfect testing.
On the Formal Verification of the Stellar Consensus Protocol
Published in Workshop on Formal Methods for Blockchains (FMBC), 2020
Abstract: The Stellar Consensus Protocol (SCP) is a quorum-based BFT consensus protocol. However, instead of using threshold-based quorums, SCP is permissionless and its quorum system emerges from participants’ self-declared trust relationships. In this paper, we describe the methodology we deploy to formally verify the safety and liveness of SCP for arbitrary but fixed configurations. The proof uses a combination of Ivy and Isabelle/HOL. In Ivy, we model SCP in first-order logic, and we verify safety and liveness under eventual synchrony. In Isabelle/HOL, we prove the validity of our first-order encoding with respect to a more direct higher-order model. SCP is currently deployed in the Stellar Network, and we believe this is the first mechanized proof of both safety and liveness, specified in LTL, for a deployed BFT protocol.
Verified Cryptographic Code for Everybody
Published in Computer Aided Verification (CAV), 2021
Abstract: We have completed machine-assisted proofs of two highly-optimized cryptographic primitives, AES-256-GCM and SHA-384. We have verified that the implementations of these primitives, written in a mix of C and x86 assembly, are memory safe and functionally correct, by which we mean input-output equivalent to their algorithmic specifications. Our proofs were completed using SAW, a bounded cryptographic verification tool which we have extended to handle embedded x86. The code we have verified comes from AWS LibCrypto. This code is identical to BoringSSL and very similar to OpenSSL, from which it ultimately derives. We believe we are the first to formally verify these implementations, which protect the security of nearly everybody on the internet.
Formally Verifying Industry Cryptography
Published in IEEE Security & Privacy (Volume: 20, Issue: 3, May-June 2022), 2022
Abstract: Over the past five years, Galois has formally verified several cryptographic systems that are used in demanding industry environments. This article discusses our approach to these verification projects, focusing on the practical engineering challenges that exist when building and deploying proofs in industry.
Trustworthy Runtime Verification via Bisimulation (Experience Report)
Published in International Conference on Functional Programming (ICFP), 2023
Abstract: When runtime verification is used to monitor safety-critical systems, it is essential that monitoring code behaves correctly. The Copilot runtime verification framework pursues this goal by automatically generating C monitor programs from a high-level DSL embedded in Haskell. In safety-critical domains, every piece of deployed code must be accompanied by an assurance argument that is convincing to human auditors. However, it is difficult for auditors to determine with confidence that a compiled monitor cannot crash and implements the behavior required by the Copilot semantics.
Research Report: An Optim(l) Approach to Parsing Random-Access Formats
Published in Tenth LangSec Workshop at IEEE Security & Privacy, 2024
Abstract: We introduce a Domain Specific Language (DSL) that allows for the declarative specification of random-access formats (formats that include offsets or require out-of-order parsing, e.g., zip, ICC, and PDF). Our DSL is composed by layering three distinct computational languages: first, we have the base layer of primitive parsers (these could use various “off the shelf” parsing technology). Second, upon this base we layer our XRP (eXplicit Region Parser) DSL, a pure functional language for declaratively specifying random-access formats. XRP, although similar to parser combinators, gives greater control and expressiveness by making parsing regions explicit. Third, we embed XRP into Optim(l), a novel DSL that describes Directed Acyclic Graphs (DAGs) of computations. A node represents a computation in XRP, and an edge represents a dependency between the computations. From a declarative Optim(l) program we can generate an imperative module that provides an API for on-demand and incremental parsing of the random-access format.
Daedalus: Safer Document Parsing
Published in Programming Language Design and Implementation (PLDI), 2024
Abstract: Despite decades of contributions to the theoretical foundations of parsing and the many tools available to aid in parser development, many security attacks in the wild still exploit parsers. The issues are myriad—flaws in memory management in contexts lacking memory safety, flaws in syntactic or semantic validation of input, and misinterpretation of hundred-page-plus standards documents. It remains challenging to build and maintain parsers for common, mature data formats. In response to these challenges, we present Daedalus, a new domain-specific language (DSL) and toolchain for writing safe parsers. Daedalus is built around functional-style parser combinators, which suit the rich data dependencies often found in complex data formats. It adds domain-specific constructs for stream manipulation, allowing the natural expression of parsing noncontiguous formats. Balancing between expressivity and domain-specific constructs lends Daedalus specifications simplicity and leaves them amenable to analysis. As a stand-alone DSL, Daedalus is able to generate safe parsers in multiple languages, currently C++ and Haskell. We have implemented 20 data formats with Daedalus, including two large, complex formats—PDF and NITF—and our evaluation shows that Daedalus parsers are concise and performant. Our experience with PDF forms our largest case study. We worked with the PDF Association to build a reference implementation, which was subject to a red-teaming exercise along with a number of other PDF parsers and was the only parser to be found free of defects.
Macaw: A Machine Code Toolbox for the Busy Binary Analyst
Published in arXiv [cs.PL], 2024
Abstract: When attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools from complete scratch is rarely time- or cost-effective, however, given the scale and complexity of modern instruction set architectures.
Crux, a Precise Verifier for Rust and Other Languages
Published in arXiv [cs.PL], 2024
Abstract: We present Crux, a cross-language verification tool for Rust and C/LLVM. Crux targets bounded, intricate pieces of code that are difficult for humans to get right: for example, cryptographic modules and serializer / deserializer pairs. Crux builds on the same framework as the mature SAW-Cryptol toolchain, but Crux provides an interface where proofs are phrased as symbolic unit tests. Crux is designed for use in production environments, and has already seen use in industry.
Talks
How Do We Know Weak Memory Matters?
Published:
Parsing and Understanding in a Messy World
Published:
LLMs are Useful for Small Problems
Published:
Teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.