Jimmy

Nomenclature in Programming Languages Theory

technical
Somewhat Confident - 3/5
Dec 12, 2024

💡
This is a brief note on the course Principles of Programming Languages taught by Prof. Ethan Cecchetti.
It is not about how to program in practice, but about the fundamental theory of how to analyze, construct, and formalize programming languages using the tools of mathematical logic.
It may surprise some people that programming languages are tightly tied to logic, which is what the Curry–Howard correspondence states. With this in mind, programming language theory, a field of computer science, can be treated as a subject of pure math, especially discrete math. The development of programming language theory focuses on
  • developing new logics by adding new constructs and rules to the language
  • analyzing and proving the properties that the logic satisfies
Even though this subject does not necessarily involve any coding (or, say, implementation) and works purely on the math side, we should keep in mind that the motivation and intuition beneath it come from real programming problems.
induction
  • Why is induction powerful? It has a fundamentally succinct structure, just like unary-base numerals
  • weak (mathematical) induction
  • strong induction
  • structural induction
  • well-founded induction
Config is the state of the program
Config = Exp × Store
Store = Var ⇀ Int (a partial function)
operational semantics describes how states are reduced on an abstract machine
  • small-step operational semantics is a binary relation → ⊆ Config × Config, describing how to reduce a configuration one step at a time, until (possibly) it eventually cannot be reduced anymore
  • big-step operational semantics is a binary relation ⇓ ⊆ Config × Final, describing how to reduce a configuration directly to its final value
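The two styles above can be sketched for a toy expression language with integers, variables, and addition. This is a minimal illustrative sketch, not the course's formalization; the names `Int`, `Var`, `Add`, `step`, and `evaluate` are my own.

```python
# Small-step semantics for a toy language: Config = (Exp, Store),
# where Store maps variable names to ints (a partial function).
from dataclasses import dataclass

class Exp: pass

@dataclass
class Int(Exp):
    n: int

@dataclass
class Var(Exp):
    name: str

@dataclass
class Add(Exp):
    left: Exp
    right: Exp

def step(e, store):
    """One small step: <e, store> -> <e', store>."""
    if isinstance(e, Var):
        return Int(store[e.name])                     # look the variable up in the store
    if isinstance(e, Add):
        if not isinstance(e.left, Int):
            return Add(step(e.left, store), e.right)  # reduce the left subterm first
        if not isinstance(e.right, Int):
            return Add(e.left, step(e.right, store))  # then the right subterm
        return Int(e.left.n + e.right.n)              # both are values: add them
    raise ValueError("no step: already a value")

def evaluate(e, store):
    """Big-step as the transitive closure of small steps: <e, store> => n."""
    while not isinstance(e, Int):
        e = step(e, store)
    return e.n
```

For instance, `evaluate(Add(Var("x"), Int(1)), {"x": 2})` performs two small steps and yields `3`.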
denotational semantics describes the meaning in a mathematical model, i.e. it translates a programming term into a mathematical function. For example, a command c denotes a partial function ⟦c⟧ : Store ⇀ Store.
axiomatic semantics describes the relationship between properties before and after execution
  • partial vs. total correctness
Hoare logic consists of Hoare triples, written {P} c {Q}, which respectively stand for the precondition, the program, and the postcondition under the partial correctness assertion (PCA)
separation logic extends Hoare logic by stating propositions over the heap, or more generally over resources, where the ownership of these resources by propositions matters
predicate transformers: wlp(c, Q) is the weakest liberal precondition, which transforms the Hoare triple {P} c {Q} into an equivalent predicate implication P ⇒ wlp(c, Q)
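A semantic way to see wlp is to treat predicates as functions on stores and commands as store transformers: wlp(c, Q) holds in a store exactly when Q holds after running c from it. The sketch below assumes deterministic, terminating commands (where wlp and wp coincide); all names are illustrative.

```python
# Predicates: Store -> bool.  Commands: Store -> Store.
# Then {P} c {Q} holds iff P implies wlp(c, Q) pointwise.

def assign(var, f):
    """The command var := f(store), as a store transformer."""
    def run(store):
        new = dict(store)
        new[var] = f(store)
        return new
    return run

def seq(c1, c2):
    """The command c1; c2."""
    return lambda store: c2(c1(store))

def wlp(c, Q):
    """wlp(c, Q) holds in a store iff Q holds after running c from it
    (for terminating deterministic commands, this is exactly wlp)."""
    return lambda store: Q(c(store))

# Example: c is  x := x + 1; y := x * 2  and Q is  y > 4.
c = seq(assign("x", lambda s: s["x"] + 1),
        assign("y", lambda s: s["x"] * 2))
Q = lambda s: s["y"] > 4
P = wlp(c, Q)   # semantically equivalent to the predicate  x > 1
```

Here `P({"x": 2})` holds while `P({"x": 1})` does not, matching the syntactic computation wlp(x := e, Q) = Q[e/x] applied right to left through the sequence.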
λ-calculus, whose BNF grammar is e ::= x | λx. e | e e, follows a greedy policy when matching the λ-body; application is left-associative
  • α-conversion: renaming the bound variable of a function
  • β-reduction: application by substitution
    • however, this may lead to unsafe substitution due to name conflicts, i.e. when the parameter is among the free variables of the expression being substituted in. Therefore, we need α-conversion to handle it
  • λ-calculus is confluent under α-conversion and β-reduction, which means that if e reduces to e₁ and to e₂ under different reduction strategies (so reduction is nondeterministic), there always exists e₃ such that both e₁ and e₂ can reduce to e₃
The reduction strategies in λ-calculus include two deterministic ones: CBV (call-by-value) and CBN (call-by-name)
  • CBV evaluates the argument first, which is what most (functional) languages do
  • CBN substitutes the argument unevaluated and evaluates the function body first
  • full β-reduction is generally nondeterministic since it can reduce any redex
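The CBV/CBN difference can be made concrete in a CBV host language by passing thunks (zero-argument functions) and forcing them only on use; this is a simulation sketch, and the function names are my own.

```python
# Python is call-by-value; call-by-name is simulated with thunks.

def diverge():
    # Stand-in for a nonterminating (or erroneous) argument.
    raise RuntimeError("simulated divergence")

def const_one_cbv(x):
    return 1   # CBV: the caller already evaluated the argument

def const_one_cbn(x_thunk):
    return 1   # CBN: the thunk is never forced, so it never runs

# const_one_cbv(diverge()) raises: CBV evaluates the argument first.
result = const_one_cbn(lambda: diverge())   # fine under CBN: returns 1
```

This mirrors the classic fact that (λx. 1) Ω converges under CBN but diverges under CBV.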
evaluation context simplifies the definition of semantics
Encoding in λ-calculus involves representing numerals, booleans, control statements, etc. as λ-terms (Church encodings)
  • How about recursive functions? This uses the Y combinator, which defines the fixed point of a recursive function and hence encodes unbounded loops
  • To avoid the infinite loop caused by CBV on the Y combinator, one can use η-expansion to wrap the self-application f f, delaying its evaluation (the η-expanded form is already a value) while keeping its original result when applied to any argument. The modification is called the Z combinator
  • There is also a corresponding η-contraction that removes this pattern
  • There are infinitely many fixed-point combinators, among which the Θ combinator was coined by Turing
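The Z combinator can be transcribed directly into Python (a CBV language), which shows why the η-expansion matters: without it the self-application would unfold forever. The helper names (`fact_gen`, `fact`) are illustrative.

```python
# Z = λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))
# The inner λv. x x v is a value, so CBV does not eagerly unfold x x.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# fact_gen takes "the rest of the recursion" and returns one unfolding of it.
fact_gen = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

fact = Z(fact_gen)   # the fixed point: a genuine factorial with no named recursion
# fact(5) == 120
```

Replacing `lambda v: x(x)(v)` with the bare `x(x)` recovers the Y combinator and makes the definition loop forever under CBV.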
FL extends λ-calculus with some primitives and some clauses, including letrec f₁ = e₁, …, fₙ = eₙ in e, where the scope of each fᵢ is the entire letrec expression, so recursive functions can be represented and mutually recursive functions are doable
  • Different from λ-calculus, there exist terms that are neither values nor reducible; they are stuck, and lead to a new notion called runtime type error
  • The translation from FL to (CBV) λ-calculus requires not only translating the semantics but also handling runtime type errors
scope matters, and there are two kinds of scope. Consider an expression that binds x, captures it in a function, then rebinds x before the call (e.g. let x = 3 in let f = λy. x in let x = 2 in f 0)
  • static scope, where the binding happens immediately at the definition site: the example reduces to 3
    • most modern languages use static scoping since it benefits modular programming
  • dynamic scope, where the binding is resolved at evaluation time: the example reduces to 2
    • some dynamic languages use dynamic scoping, and it is easier to implement
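Both disciplines fit in one tiny interpreter: a closure records its definition-time environment, and the only difference is whether the call uses that recorded environment (static) or the caller's current one (dynamic). This sketch uses the standard example let x = 3 in let f = λy. x in let x = 2 in f 0; the AST encoding and names are my own.

```python
# Environments are dicts mapping names to values.
def interp(e, env, dynamic=False):
    kind = e[0]
    if kind == "num":                         # ("num", n)
        return e[1]
    if kind == "var":                         # ("var", name)
        return env[e[1]]
    if kind == "let":                         # ("let", name, bound, body)
        _, name, bound, body = e
        return interp(body, {**env, name: interp(bound, env, dynamic)}, dynamic)
    if kind == "lam":                         # ("lam", param, body)
        return ("closure", e[1], e[2], env)   # capture the definition-time env
    if kind == "app":                         # ("app", fn, arg)
        _, param, body, cenv = interp(e[1], env, dynamic)
        arg = interp(e[2], env, dynamic)
        call_env = env if dynamic else cenv   # the ONLY difference between the two
        return interp(body, {**call_env, param: arg}, dynamic)

# let x = 3 in let f = λy. x in let x = 2 in f 0
prog = ("let", "x", ("num", 3),
        ("let", "f", ("lam", "y", ("var", "x")),
         ("let", "x", ("num", 2),
          ("app", ("var", "f"), ("num", 0)))))
```

`interp(prog, {})` gives 3 (static), while `interp(prog, {}, dynamic=True)` gives 2.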
Notice that FL is stateless, just like λ-calculus: there is no way to change a “variable” once it has been assigned. Therefore, state is not needed for Turing-completeness, but it is a natural real-world reasoning tool, for example assignment x := e and pointers ref e
continuation is the function that takes the result of the current expression e as input and performs the rest of the computation. In essence, it is the evaluation context.
continuation-passing style (CPS) is a recursive chain of continuations, which uniquely defines the order of evaluation, so no evaluation context is needed to specify it.
capturing and saving continuations provides a way to replace the current call stack with an old, saved one, e.g. call/cc and let/cc in Scheme (or Racket).
In CPS, recursive functions become tail-recursive; in implementation, we substitute heap space for stack space, since continuations live on the heap as closures.
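The tail-recursion point can be seen in a CPS factorial: every recursive call is in tail position, and the pending multiplications move into heap-allocated closures instead of stack frames. The names below are illustrative.

```python
def fact_cps(n, k):
    """Compute k(n!) with all recursion in tail position."""
    if n == 0:
        return k(1)
    # The continuation captures "the rest of the computation":
    # once result = (n-1)! is known, multiply by n and continue with k.
    return fact_cps(n - 1, lambda result: k(n * result))

# Supplying the identity continuation extracts the final answer:
# fact_cps(5, lambda x: x) == 120
```

Each `lambda result: ...` closure holds the deferred `* n`, which is exactly the work that direct-style recursion would keep on the call stack.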
typed λ-calculus has less expressive power, as it constrains the validity of programs with no way to work around the restrictions.
STLC (simply typed λ-calculus) is one of the simplest typed λ-calculi
type inference is the process of inferring types even when they are not explicitly specified. In a sufficiently complicated type system, this problem can be undecidable.
  • unification finds a substitution σ (or say a unifier) such that for a pair of terms s and t, we have σ(s) = σ(t)
  • the most general unifier (MGU), if any unifier exists, is unique up to renaming: every other unifier can be obtained from it by further substitution. A recursive algorithm called Robinson's algorithm computes the MGU; it always finds the MGU for any untyped expression in STLC by recursively substituting over the constraints collected from all subterms
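A compact Robinson-style unifier fits in a few lines. In this sketch (term encoding and function names are my own), a variable is a bare string and a type constructor is a tuple like `("->", t1, t2)` or `("int",)`; the result is the MGU as a dict, or `None` on failure.

```python
def apply_subst(subst, t):
    """Apply a substitution to a term, following chains of variable bindings."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(subst, a) for a in t[1:])
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear in t? (Rejects infinite types.)"""
    t = apply_subst(subst, t)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst=None):
    subst = {} if subst is None else subst
    s, t = apply_subst(subst, s), apply_subst(subst, t)
    if s == t:
        return subst
    if isinstance(s, str):                  # s is a variable: bind it
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None                         # constructor clash
    for a, b in zip(s[1:], t[1:]):          # unify arguments pairwise
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# Example: unify  a -> int  with  bool -> b.
mgu = unify(("->", "a", ("int",)), ("->", ("bool",), "b"))
# mgu == {"a": ("bool",), "b": ("int",)}
```

The occurs check is what makes `unify("a", ("->", "a", ("int",)))` fail, which in STLC is exactly the constraint generated by λx. x x.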
subtyping, along with the subsumption rule, provides more flexibility in expressiveness. How a subtyping relation lifts through a type constructor takes three forms depending on the direction.
  • covariance
  • contravariance
  • invariance
    where g is any type constructor, for example ref
recursive typing can refer to the type itself, e.g. μα. τ, with which STLC can express nontermination; μ is the way to express the fixed point of a type.
polymorphism extends types with a universal type parameter, denoted ∀α. τ. STLC cannot express such a type, since there does not exist a single type that can stand on behalf of any type (the same in most interesting PLs). However, λx. x x becomes typable, e.g. at (μα. α → τ) → τ, once recursive types are allowed
normal form means that a λ-term cannot step any further
Church–Rosser property (confluence) says that if a term has a normal form, then it is unique up to α-equivalence
strong normalization describes a term for which every β-reduction sequence eventually produces a normal form, that is, terminates
STLC is a language in which every well-typed term is strongly normalizing, and hence it is less powerful than the untyped λ-calculus
information flow control tracks and restricts the flow of information across different levels. Information cannot flow from higher levels to lower levels. Every value can have a type carrying a tag (or say a label) specifying the visibility of its information. For instance, labels can be high and low (or secret vs. public); in real life, this can be kernel memory vs. user-space memory.
By adding typing rules and operational semantics for tagged types, we can enforce the flow policy described above (a guarantee known as noninterference).
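The label discipline can be sketched with dynamic tracking over a two-point lattice: every value carries a label, operations join the labels of their operands, and a public sink rejects high-labeled data. This is an illustrative runtime-monitor sketch, not the static type system from the course; all names are my own.

```python
LOW, HIGH = 0, 1   # a two-point lattice: LOW ⊑ HIGH

class Labeled:
    """A value tagged with a confidentiality label."""
    def __init__(self, value, label):
        self.value, self.label = value, label

    def __add__(self, other):
        # The sum reveals information about both operands,
        # so its label is the join (max) of the two labels.
        return Labeled(self.value + other.value, max(self.label, other.label))

def store_public(sink, labeled):
    """Only LOW data may flow to a public sink: no HIGH -> LOW flow."""
    if labeled.label > LOW:
        raise PermissionError("explicit flow from HIGH to LOW rejected")
    sink.append(labeled.value)

secret = Labeled(42, HIGH)
public = Labeled(1, LOW)
out = []
store_public(out, public + public)        # fine: LOW ⊔ LOW = LOW
# store_public(out, secret + public)      # rejected: HIGH ⊔ LOW = HIGH
```

A static type system enforces the same policy at compile time (and can also rule out implicit flows through control flow, which this dynamic sketch does not track).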