By Larry Diehl
Symbiont Assembly is, first and foremost, an application platform for decentralized applications—called smart contracts, so there must be a programming language we use to write these applications. There are two main ways of accomplishing this goal: re-using a general-purpose language, such as Java, Golang, or Scala, or using a domain-specific programming language (DSL), like Bitcoin Script, or Solidity. In the enterprise blockchain space, there is some debate as to which strategy is better.
At Symbiont, we've chosen to develop a domain-specific language, called SymPL (Symbiont Programming Language), for writing smart contracts targeting the Assembly blockchain platform. The goal of this post is to describe the motivation behind the high-level design of the SymPL smart contract DSL. Specifically, I’ll describe how SymPL is able to provide best-in-class language safety while also being more familiar and easier to use than other smart contract DSLs.
Symbiont's customers are today using SymPL to write complex and rich privacy-preserving
smart contract applications for the Assembly platform, and deploying them in mission-critical use cases. They depend on the strong safety features provided by the language to ensure that their smart contracts operate correctly and securely. They are able to build larger, more sophisticated applications, faster and more easily, than if they had to learn the unfamiliar or cumbersome programming paradigms that are often associated with the level of safety SymPL provides.
There are many reasons why a blockchain platform supporting distributed applications should use a specially designed DSL, rather than shoehorn an existing language into that role:
The semantics for a smart contract language should execute deterministically (e.g. no rand function) so the network can agree on a consistent view of data.
The language semantics should provide resource-tracking, to prevent a bad computation from executing a denial-of-service attack (DOS) against the decentralized network.
The language should have privacy and concurrency mechanisms to safely but effectively share sensitive data. While concurrency support is common in non-blockchain languages, privacy support is not.
The language should help in preventing errors via safety mechanisms (e.g. a type system and static analysis tools). Bugs in a blockchain environment tend to be especially costly due to the often financial or otherwise critical nature of distributed applications.
Each of these points deserves its own blog post, but this post will focus on the last point, which is widely applicable to any modern programming language intended to minimize implementation bugs. Specifically, it will focus on motivating and describing the type system of SymPL, by first looking at ideal systems and then coming back to reality by integrating industry trends to help ease adoption. Safety and Ease-of-Use
Type systems are somewhat humorously referred to as the most successful application of formal methods in industry software, helping ensure that operations, such as functions or methods, respect their expected interfaces (i.e., types). Hence, it is sensible to use a typed programming language to write safety-critical applications.
If we want to maximize safety of applications at all costs, we might base a smart contract
language on a purely functional language like Haskell to avoid bugs resulting from complex imperative interactions. Going further, we might even choose a dependent type system, as that of Agda or Coq, where types are expressive enough to act as logical specifications of correct software.
When designing SymPL, our goal was to strike a balance between safety and ease of use, by introducing cutting-edge type system features to a broad audience in a way that isn't at all off-putting and that maximizes user friendliness. We are able to "square the circle" and provide safety and ease-of-use in a way that hasn't been done before really anywhere (when taking the combination of all features the language provides into account), and that's especially useful in the smart contract language space, which patently requires both. Features of SymPL
SUBTYPING AND FLOW SENSITIVITY
While gradual typing is not desirable in a blockchain setting where bugs can be introduced via a dynamically typed escape hatch, the subtyping and flow sensitivity language design research coming from gradually typed languages can be applied to proper typed languages. This enables well-typed code to be written using idioms and the look-and-feel of mainstream dynamically typed and object-oriented languages. BOOLEAN CONNECTIVE TYPES
We are absolutely not the first to try to appeal to idiomatic dynamically typed and object-oriented code via subtyping and flow sensitivity, and maximizing safety by eschewing gradual typing. Ceylon, Kotlin, the forthcoming Scala 3 (Dotty), and even Java 14 all do the same. A common feature of most languages mentioned so far is union and/or intersection types, taking their name from set-theoretic operations, or boolean algebra connectives. Algebraic data types are powerful tools from the world of functional programming, but are still not quite at mainstream adoption. Luckily, they are generalized by boolean connective types, which have a natural interpretation familiar to any CS major. If types are thought of as sets of values, subtyping can be thought of as subset inclusion. Furthermore, types (or sets of values) can be combined with familiar set-theoretic union and intersection operations, or union and intersection types (e.g. int is a subtype of Union[int, str]). Although such types may seem exotic, they are intuitive to use by just thinking about their set-theoretic interpretation.
PURITY OVER OO VS FP
While flow sensitivity is a safe way to appeal to familiar dynamically typed idioms, support for a more familiar object-oriented (OO) versus a functional programming (FP) programming style is still contentious. Languages like Scala and OCaml have demonstrated that with the right language design, both styles can elegantly be supported. Mechanisms such as record subtyping or row polymorphism demonstrate that there are more similarities to both approaches than meets the eye when rendered in terms of core language features. Furthermore, while pure functional programming (as popularized by Haskell) is associated with ease of static analysis / formal reasoning / formal methods, it is primarily _purity_ that enables this. Although it is not widely considered, pure object-oriented (or mixed OO/FP) programming also works well! SymPL's Core Language
We have found that it is possible to program in a mixed OO/FP style that is familiar to dynamically typed programmers, while also facilitating extensions to support formal reasoning. This is achieved by programming in a high-level, safe-but-familiar surface language (i.e., SymPL), which can be type checked in a principled way via translation (or, elaboration) to a well-designed, minimal core language.
SymPL's design should be contrasted in particular to that of Haskell-inspired blockchain languages like Plutus and DAML, which essentially use a variant of System Fω as their core language, which is similar to the core language used by the GHC compiler for Haskell. System Fω is a small and principled functional programming language that is well-studied and understood in the academic programming language theory (PLT) community. While System Fω is a beautiful core language for functional programming, that user base is small compared to that of programmers more comfortable with object-oriented and even dynamically typed languages. SymPL's core language is a strict generalization of System Fω, called System Fω<: (read as "System F-omega-sub"), which extends System Fω with subtyping. Additionally, the core language of SymPL supports boolean connective types (union, intersection, and even negation types), record subtyping, and equi recursive types. The combination of these allows for:
Safe implicit casts between compatible data, via subtyping.
A familiar object-oriented programming interface coexisting with a pure functional core language, via record subtyping.
Flow-sensitive refinement of types under branches of conditionals, via boolean connective types.
Safe representations of common datatypes like JSON without additional constructor wrappers, via equirecursive types.
An SML-inspired parameterized module system, supporting abstraction and information hiding, via higher-order and higher-rank type quantification.
SymPL as "Python-esque"
Having a powerful, principled core language is valuable, but you still need an approachable high-level surface language for users to work with on a daily basis. In designing the surface language of SymPL, we took particular inspiration from Python, which has in recent years grown to be among the most popular programming languages out there. Python, more than any other of the most popular languages, emphasizes strong readability and usability, having an almost pseudocode-like simplicity when implementing algorithms. This is especially important when smart contract code must be understood and agreed upon by multiple parties. Furthermore, mypy, Python's gradual type checker, already incorporates union types and flow-sensitive typing, though not intersection or negation types. This makes it easy for Python developers to learn SymPL due to familiarity.
SymPL is not Python, but was merely strongly inspired by it and has also historically used Python for various aspects of its implementation. It's definitely similar enough to make the learning process easy. Some of the biggest differences are as follows:
Execution and timeout, via gas, are deterministic to be able to run in a blockchain environment.
Gradual typing has been replaced by static typing to prevent unsafe dynamically typed code from being introduced.
Unsafe and difficult-to-reason-about features are not included, such as runtime redefinition/patching via metaprogramming, method overriding, and call-by-reference (we use call-by-value).
The addition of intersection types to express that a type implements multiple interfaces.
The addition of an SML-style module system to support creating abstract data types (ADT's), for contract authors to precisely control the interfaces they export to the public without access to implementation details via encapsulation.
The introduction of (equi)recursive types, which can be used to safely represent familiar untagged JSON (via untagged union types) data popular in dynamically typed language, but can also be used to represent algebraic data types from functional programming (by encoding disjoint unions as untagged unions with added tags).
Having described the motivations and features of the language, I'll now show some examples to help you get a taste of it. You'll notice the strong similarity in syntax to Python, which makes SymPL extremely easy to learn, read and write in.
Below we will demonstrate some noteworthy features of SymPL that help build a bridge between the functional, object-oriented, and dynamic programming language communities, none of which are available in conventional functional languages like Plutus or DAML:
Positive and negative flow-sensitive type refinement, appealing to idiomatic dynamically typed intuitions that do not manually unpack data (e.g. out of an Optional type).
(Equi)recursive and union types, also appealing to idiomatic dynamically typed intuitions that do not manually pack data (e.g. using explicit constructors of a JSON type).
Record subtyping, appealing to object-oriented programmers by enabling a safe and pure encoding of classes (e.g. a Person class with name attributes and a full_name method that performs formatting).
Intersection types, appealing to dynamic and object-oriented programmers by enabling a safe way to perform duck typing, or functions with generic arguments that satisfy some interface (e.g. taking an argument that at least has first_name and last_name attributes, but may have more arguments irrelevant to computing formal_name).
POSITIVE TYPE REFINEMENT
Let's start with an example of flow-sensitive typing, which most commonly occurs via an Optional type to represent potentially failing, or nullable, computations. In dynamically typed programs it is idiomatic for a function to either return a value of an expected type if it succeeds, or None if it does not. Functions using results of such partial/optional computations need to first inspect whether the expected data is really there, or else handle a default case. Consider the example below, which returns the string representation of a doubled number, or the empty string as a default case:
Let's dive a little deeper to understand what's going on underneath the hood of SymPL. The Optional[int] type is actually just a synonym for the union type Union[int, None]. In the "then" branch, the type of x is refined by intersecting it with the type in the conditional test. Hence, x has type Intersection[Union[int, None], int]. By distributivity, this is equivalent to Union[ Intersection[int, int], Intersection[None, int] ]. The left intersection simplifies to int and the right simplifies to Bottom, which is an error type that has no inhabiting values, leaving us with Union[int, Bottom]. Because Bottom acts as an identity value for set-theoretic union, x just has type int at the end, hence x * 2 is a well-typed expression. Without knowing all these laws, it's often easier to just appeal to set-theoretic intuitions by interpreting types as sets of values and Union and Intersection types as set-theoretic operations. This explains why Intersection[int, int] simplifies to int, because the set of integers intersected with itself is the entire set of integers. Similarly, Intersection[None, int] simplifies to Bottom because the intersection of the singleton set containing None and the set of integers is the empty set, which is represented by the type Bottom. Finally, Union[int, Bottom] simplifies to int because unioning the integers with the empty set (Bottom) does not add any new elements. Even though union and intersection types are somewhat exotic from an academic perspective, they are in practice easy to use for industry programmers because set theory is commonly taught in CS programs.
Finally, for experts I'll note that unlike Kotlin, SymPL's flow-sensitivity construct is compositional and does not break type preservation properties related to substitution. This is because flow-sensitive conditionals are desugared into a binding type-case construct that
shadows in the example above and avoids said issues, as done in CDuce.
NEGATIVE TYPE REFINEMENT
Next let's take a look at an alternative way we could have written the previous example by using another aspect of flow sensitivity. This time we will check whether the input is an instance of None instead of int:
In the "then" branch x has been refined to have type Intersection[Union[int, None], None], although x is not used so it doesn't matter. More interestingly, in the "else" branch x has been refined to have type Intersection[Union[int, None], Negation[None]], which intersects the optional with a negation type. A Negation type represents the complement of a set. The complement of None includes all types that are not None. You can work through the equivalences yourself using De Morgan's laws, or appeal to set-theoretic intuitions via set-complements. In the end, the result is that the expression x * 2 is well-typed in the "else" branch because x was originally int or None, and we know that it is not None, so it must be int!
(EQUI)RECURSIVE AND UNION TYPES
Previously we saw how to use a Union type to represent a potentially failing (Optional) computation. Now we will combine Union types with another SymPL feature, recursive types, to encode JSON:
Even though SymPL's core language of terms is functional, the type system still allows programming in an object-oriented style. Let's imagine creating an Person class in the high-level language with a full_name method:
Internally, this is translated into a primitive Record, which is like a Dict but with a static collection of fields. Python does not have a Record type, so below we are inventing syntax in order to demonstrate what happens when SymPL translates high-level code into its core language:
Now code can construct a value of type Person using the Person(...) constructor, and can access the first_name and last_name "properties" via record projections, and the full_name "method" by a record projection followed by a function call:
Also, note that in the definition of the Person(...) constructor, the full_name method does not take an explicit self argument, and can instead more efficiently access its properties by capturing them in a closure within a lambda.
Besides being used for flow-sensitivity, intersection types can also be used to enforce that a function argument implements multiple interfaces required for that function to be well-defined. For example, let's say that we defined separate interfaces for having a first_name and a last_name, and wanted to define a formal_name function that really only required those properties to be present, but didn't need to operate on more complex People objects (which may have many more irrelevant properties):
Record types like HasFirstName and HasLastName can act as interfaces specifying which properties or methods are required of classes. Taking the Intersection of two interfaces creates a new interface that requires both inner interfaces to be satisfied (i.e., having both the first_name and last_name attributes). Finally, record subtyping allows a Person to be passed into a function (formal_name) expecting a Intersection[HasFirstName, HasLastName]. This is because Person has both required attributes (first_name and last_name), even though it also has additional attributes (middle_name and full_name) that are are not required to satisfy the compound interface Intersection[HasFirstName, HasLastName].
We've barely scratched the surface when it comes to the features afforded to SymPL by its small but expressive core language System Fω<:, and its familiar and easy-to-use, Python-esque surface language. More complex classes can be encoded by combining records with either recursive or existential types, as demonstrated in TAPL. Algebraic data types can be encoded by combining recursive Union types with Record types, which act as tags for constructors. Exhaustive pattern match analysis can be performed by creating subtyping problems using union and intersection types, as done in Ceylon. Powerful abstraction and data hiding can be performed using an SML-style or OCaml-style parameterized module system via existential types. Polymorphic (or, generic) functions and parameterized types can be written to maximize code reuse, and so on and so forth.
Some other projects that SymPL takes inspiration from may be of interest to the curious reader, mainly CDuce) for its semantic subtyping and Whiley for its incorporation of formal verification into the language ecosystem (it also shares the Python-esque look-and-feel of SymPL). Subtyping, a way for terms of more specific types to be implicitly casted to terms of more general types, is fundamental in helping SymPL be more familiar to a broader audience. In the future we plan to add static analysis and formal verification capabilities, which also feel natural in a language with subtyping due to refinement (or liquid) types inducing a form of a subtyping relationship.
Even though much of the programming language industry's focus on safety and purity has
been limited to functional languages like Haskell and OCaml, it turns out that subtyping and boolean connective types can make a type safe and secure language, like SymPL, accessible to a wider audience coming from an object-oriented and dynamically typed background.