Habit: a new functional programming language

Habit is a pure functional language that explores the intersection of low-level programming problems and high-level programming paradigms.

We believe that the potential for such languages can be seen in recent verified software development projects, such as the construction of the seL4 microkernel. These projects are landmark achievements for formal verification, but the effort required makes such techniques inaccessible for the majority of software projects. For example, the functional correctness proofs for the seL4 kernel are an order of magnitude longer than the combined specification and implementation of the kernel, and construction of the proof required an estimated 20 person years. Because the seL4 kernel is written in C, some portion of this proof effort was needed to ensure that the implementation avoids common sources of bugs and security vulnerabilities in C programs such as buffer overflows and null pointer dereferences. Might it be possible instead to write software like this in a higher level programming language where such guarantees are obtained automatically as a result, for example, of type checking? How much impact might this have on the overall cost of verification? And can we still obtain the levels of raw performance and predictability that are typically required in systems programming when we are using higher level languages like this?

Features of Habit

Habit's syntax and type system are inspired by Haskell, while its semantics are inspired by ML. Its distinctive character arises from three particular focuses. First, Habit provides safe manipulation of low-level data by extending notions of algebraic datatypes to bit-level specification of value and memory layouts. Second, Habit includes powerful new mechanisms for type-level programming, providing more flexible, expressive forms of type-directed generic programming. Third, Habit has experimental support for a novel form of substructural typing, providing a foundation for capturing state, protocols, and other facets of program behavior in types without losing the expressiveness of functional programming.

Getting involved

The Habit language and its prototype implementation are open-source projects developed by a small team of researchers that is currently led by Mark P. Jones at Portland State University and by Garrett Morris at the University of Iowa. We welcome feedback and contributions.

Language Definition: The Habit language report (also available in source form on GitHub) gives a holistic specification of the language syntax and standard environment, including the predefined types and type classes. It also includes an extended example of Habit programming, showing how Habit's functional and low-level features come together in describing an efficient implementation of a priority queue. Note that there have been a few relatively small changes to the language design since this document was last updated in 2010, but most parts of the language are still as described in the report.

Implementation: A prototype compiler for Habit is under active development. Its implementation is currently split between two components:

Source code for both components is available on GitHub via the links above. We welcome early adopters who are interested in experimenting with the Habit language, and who might be willing to contribute to its development and documentation.

Other Materials: The design of Habit was featured in a course on Languages and Low-level Programming that was taught at Portland State University in Spring 2015, Spring 2016, and Fall 2017. The materials for this course describe some the tools, techniques, and challenges of developing low-level systems software and provide motivation for discussions of language design, including the design and some of the features of Habit.

History and Acknowledgements

The Habit language was originally developed as part of the HASP (High-Assurance Systems Programming) project at Portland State University, and built on the experience of House, an operating system written in Haskell.

Work on the mil-tools back-end during the period 2014-2018 has been supported in part by funding from the National Science Foundation, Award No. CNS-1422979.

Technical Papers

Implementation

The following papers describe some of the tools and methods that are used in our implementation of the current Habit compiler pipeline, comprising the Alb front-end and the mil-tools back-end.

▷ ▽ Mark P. Jones, Justin Bailey, and Theodore R. Cooper. "MIL, a Monadic Intermediate Language for Implementing Functional Languages". (2018)

This paper describes MIL, a "monadic intermediate language" that is designed for use in optimizing compilers for strict, strongly typed functional languages. By using a notation that exposes the construction and use of closures and algebraic datatype values, for example, the MIL optimizer is able to detect and eliminate many unnecessary uses of these structures prior to code generation. One feature that distinguishes MIL from other intermediate languages in this area is the use of a typed, parameterized notion for basic blocks. This both enables new optimization techniques, such as the ability to create specialized versions of basic blocks, and leads to a new approach for implementing changes in data representation.

To appear in Proceedings of the 30th symposium on Implementation and Application of Functional Languages (IFL 2018), Lowell, MA, September 2018. Draft version available in PDF.

▷ ▽ The High Assurance Systems Programming Project (Hasp). "A Compilation Strategy for the Habit Programming Language". Technical Report, November 2010.

This technical report was written to document the strategy for compiling Habit programs to Lambda Case in the initial version of the compiler front-end (Alb). Although there have been many changes to the implementation since this report was written, the basic structure and role of the key intermediate language that it describes—or really a family of languages, called "MPEG", which is an acronym for "Matches, Patterns, Expressions, and Guards"—is still broadly as described here.

An electronic version of this technical report is available in PDF.

Safe manipulation of low-level data

Algebraic data types provide a high-level view of data, guaranteeing safe and efficient construction and deconstruction of data while abstracting from details of representation. Bitdata provides the same guarantees of safe construction and deconstruction for word-sized (or smaller) values, while allowing exact specification of data layout, avoiding the need (and potential for error) in manual bit masking and manipulation. Habit also provides bytedata, which provides a similar level of control of representation for larger, memory-based data structures. The following papers describe bitdata and bytedata, giving motivation, examples, and the details of their typing and compilation.

▷ ▽ Iavor S. Diatchki, Mark P. Jones, and Rebekah Leslie. "High-level Views on Low-level Representations". (2005)

This paper explains how the high-level treatment of datatypes in functional languages---using features like constructor functions and pattern matching---can be made to coexist with bitdata. We use this term to describe the bit-level representations of data that are required in the construction of many different applications, including operating systems, device drivers, and assemblers. We explain our approach as a combination of two language extensions, each of which could potentially be adapted to any modern functional language. The first adds simple and elegant constructs for manipulating raw bitfield values, while the second provides a view-like mechanism for defining distinct new bitdata types with fine-control over the underlying representation. Our design leverages polymorphic type inference, as well as techniques for improvement of qualified types, to track both the type and the width of bitdata structures. We have implemented our extensions in a small functional language interpreter, and used it to show that our approach can handle a wide range of practical bitdata types.

Originally appeared in Proceedings of the International Conference on Functional Programming (ICFP 2005), Tallinn, Estonia, September 2005. Available in PDF.

▷ ▽ Iavor S. Diatchki and Mark P. Jones. "Strongly Typed Memory Areas: Programming Systems-Level Data Structures in a Functional Language". (2006)

Modern functional languages offer several attractive features to support development of reliable and secure software. However, in our efforts to use Haskell for systems programming tasks---including device driver and operating system construction---we have also encountered some significant gaps in functionality. As a result, we have been forced, either to code some non-trivial components in more traditional but unsafe languages like C or assembler, or else to adopt aspects of the foreign function interface that compromise on strong typing and type safety.

In this paper, we describe how we have filled one of these gaps by extending a Haskell-like language with facilities for working directly with low-level, memory-based data structures. Using this extension, we are able to program a wide range of examples, including hardware interfaces, kernel data structures, and operating system APIs. Our design allows us to address concerns about representation, alignment, and placement (in virtual or physical address spaces) that are critical in some systems applications, but clearly beyond the scope of most existing functional languages.

Our approach leverages type system features that are well-known and widely supported in existing Haskell implementations, including kinds, multiple parameter type classes, functional dependencies, and improvement. One interesting feature is the use of a syntactic abbreviation that makes it easy to define and work with functions at the type level.

In Proceedings of the ACM SIGPLAN 2006 Haskell Workshop, Portland, Oregon, September 2006. Available in PDF.

Expressive automatic type-level programming

Type classes provide a powerful facility for characterizing types, and so can provide strong typing for overloading (ad hoc polymorphism). Habit builds on existing type class systems in several ways. First, we provide enhanced syntactic support for functional dependencies, which allow us to characterize type-level computation. Second, we introduce a notion of negation for predicates, which both refine class specifications and can capture additional safety properties. Finally, we introduce alternatives in instance declarations; in combination with negated predicates (or functional dependencies) these provide a significant increase in expressiveness compared to other type class systems or other mechanisms for automatic type-level programming. The following paper give overview of programming with functional dependencies and of instance chains, which combine negated predicates and alternatives in instances.

▷ ▽ Mark P. Jones and Iavor Diatchki. "Language and Program Design for Functional Dependencies". (2008)

Eight years ago, functional dependencies, a concept from the theory of relational databases, were proposed as a mechanism for avoiding common problems with multiple parameter type classes in Haskell. In this context, functional dependencies give programmers a means to specify the semantics of a type class more precisely, and to obtain more accurate inferred types as a result. As time passed, however, several issues were uncovered---both in the design of a language to support functional dependencies, and in the ways that programmers use them---that led some to search for new, better alternatives.

This paper focuses on two related aspects of design for functional dependencies: (i) the design of language/type system extensions that implement them; and (ii) the design of programs that use them. Our goal is to clarify the issues of what functional dependencies are, how they should be used, and how the problems encountered with initial proposals and implementations can be addressed.

In Proceedings of the ACM Haskell Symposium (Haskell '08), Victoria, British Columbia, Canada, September 25, 2008. Available in PDF.

▷ ▽ J. Garrett Morris and Mark P. Jones. "Instance Chains: Type Class Programming Without Overlapping Instances". (2010)

Type classes have found a wide variety of uses in Haskell programs, from simple overloading of operators (such as equality or ordering) to complex invariants used to implement type-safe heterogeneous lists or limited subtyping. Unfortunately, many of the richer uses of type classes require extensions to the class system that have been incompletely described in the research literature and are not universally accepted within the Haskell community.

This paper describes a new type class system, implemented in a prototype tool called ilab, that simplifies and enhances Haskell-style type-class programming. In ilab, we replace overlapping instances with a new feature, instance chains, allowing explicit alternation and failure in instance declarations. We describe a technique for ascribing semantics to type class systems, relating classes, instances, and class constraints (such as kind signatures or functional dependencies) directly to a set-theoretic model of relations on types. Finally, we give a semantics for ilab and describe its implementation.

In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP '10), Baltimore, Maryland, USA, September 27-29, 2010. Available from the ACM DL or in PDF.

Sound foundations for linear and behavioral typing

A more experimental feature of Habit is its support for linear (and substructural) typing. Linear type systems provide a general mechanism to stateful safety properties with types. For example, we can use linear type to assure that file handles are not used after they are closed, track capabilities, or check communication protocols (expressed as session types). While there has been significant research on linear types and their usage, Habit provides a unique mechanism to integrate linear and functional programming. The following paper gives an overview of our linear type system, and additional examples of the integration of linear and functional programming.

▷ ▽ J. Garrett Morris. "The Best of Both Worlds: Linear Functional Programming Without Compromise". (2016)

We present a linear functional calculus with both the safety guarantees expressible with linear types and the rich language of combinators and composition provided by functional programming. Unlike previous combinations of linear typing and functional programming, we compromise neither the linear side (for example, our linear values are first-class citizens of the language) nor the functional side (for example, we do not require duplicate definitions of compositions for linear and unrestricted functions). To do so, we must generalize abstraction and application to encompass both linear and unrestricted functions. We capture the typing of the generalized constructs with a novel use of qualified types. Our system maintains the metatheoretic properties of the theory of qualified types, including principal types and decidable type inference. Finally, we give a formal basis for our claims of expressiveness, by showing that evaluation respects linearity, and that our language is a conservative extension of existing functional calculi.

In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016), Nara, Japan, 2016. An extended edition is available in PDF. Note that this version fixes several inadvertent omissions and typographical errors, and so should be preferred to the conference version. The conference version is available from the ACM DL.