Jimmy

Nomenclature in Programming Languages Theory

technical
Somewhat Confident - 3/5
Dec 12, 2024

💡
This is a brief note on the course Principles of Programming Languages taught by Prof. Ethan Cecchetti.
It is not about how to program in practice, but about the fundamental theory of how to analyze, construct, and formalize programming languages using the tools of mathematical logic.
It may surprise some people that programming languages are tightly tied to logic, which is what the Curry–Howard correspondence states. With this in mind, programming language theory, a field of computer science, can be treated as a subject of pure math, especially discrete math. The development of programming language theory focuses on
  • developing new logics by adding new constructs and rules to the language
  • analyzing and proving the properties that the logic satisfies
Even though this subject does not necessarily involve any coding (or, say, implementation) and works purely on the math side, we should keep in mind that the motivation and intuition beneath it come from real programming problems.
induction
  • Why is induction powerful? It has a fundamentally succinct structure, just like unary-base numerals
  • weak (mathematical) induction
  • strong induction
  • structural induction
  • well-founded induction
Config is the state of the program
Config = Exp × Store
Store = Var ⇀ Int (a partial function)
operational semantics describes how states are reduced on an abstract machine
  • small-step operational semantics is a binary relation → ⊆ Config × Config, describing how to reduce a configuration one step at a time, until (possibly) it eventually cannot be reduced anymore
  • big-step operational semantics is a binary relation ⇓ ⊆ Config × Final, describing how to reduce a configuration directly to its final value
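The two styles above can be sketched for a toy expression language with integers, variables, and addition. This is a minimal illustrative sketch, not the course's formalization; the names `Int`, `Var`, `Add`, `step`, and `evaluate` are my own.

```python
# Small-step semantics for a toy language: Config = (Exp, Store),
# where Store maps variable names to ints (a partial function).
from dataclasses import dataclass

class Exp: pass

@dataclass
class Int(Exp):
    n: int

@dataclass
class Var(Exp):
    name: str

@dataclass
class Add(Exp):
    left: Exp
    right: Exp

def step(e, store):
    """One small step: <e, store> -> <e', store>."""
    if isinstance(e, Var):
        return Int(store[e.name])                     # look the variable up in the store
    if isinstance(e, Add):
        if not isinstance(e.left, Int):
            return Add(step(e.left, store), e.right)  # reduce the left subterm first
        if not isinstance(e.right, Int):
            return Add(e.left, step(e.right, store))  # then the right subterm
        return Int(e.left.n + e.right.n)              # both are values: add them
    raise ValueError("no step: already a value")

def evaluate(e, store):
    """Big-step as the transitive closure of small steps: <e, store> => n."""
    while not isinstance(e, Int):
        e = step(e, store)
    return e.n
```

For instance, `evaluate(Add(Var("x"), Int(1)), {"x": 2})` performs two small steps and yields `3`.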
denotational semantics describes the meaning in a mathematical model, i.e. it translates a programming term into a mathematical function. For example, a command c denotes a partial function ⟦c⟧ : Store ⇀ Store.
axiomatic semantics describes the relationship between properties before and after execution
  • partial vs. total correctness
Hoare logic consists of Hoare triples, written {P} c {Q}, which respectively stand for the precondition, the program, and the postcondition under the partial correctness assertion (PCA)
separation logic extends Hoare logic by stating propositions over the heap, or more generally over resources, where the ownership of these resources by propositions matters
predicate transformers: wlp(c, Q) is the weakest liberal precondition, which transforms the Hoare triple {P} c {Q} into an equivalent predicate implication P ⇒ wlp(c, Q)
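A semantic way to see wlp is to treat predicates as functions on stores and commands as store transformers: wlp(c, Q) holds in a store exactly when Q holds after running c from it. The sketch below assumes deterministic, terminating commands (where wlp and wp coincide); all names are illustrative.

```python
# Predicates: Store -> bool.  Commands: Store -> Store.
# Then {P} c {Q} holds iff P implies wlp(c, Q) pointwise.

def assign(var, f):
    """The command var := f(store), as a store transformer."""
    def run(store):
        new = dict(store)
        new[var] = f(store)
        return new
    return run

def seq(c1, c2):
    """The command c1; c2."""
    return lambda store: c2(c1(store))

def wlp(c, Q):
    """wlp(c, Q) holds in a store iff Q holds after running c from it
    (for terminating deterministic commands, this is exactly wlp)."""
    return lambda store: Q(c(store))

# Example: c is  x := x + 1; y := x * 2  and Q is  y > 4.
c = seq(assign("x", lambda s: s["x"] + 1),
        assign("y", lambda s: s["x"] * 2))
Q = lambda s: s["y"] > 4
P = wlp(c, Q)   # semantically equivalent to the predicate  x > 1
```

Here `P({"x": 2})` holds while `P({"x": 1})` does not, matching the syntactic computation wlp(x := e, Q) = Q[e/x] applied right to left through the sequence.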
λ-calculus, whose BNF grammar is e ::= x | λx. e | e e, follows a greedy policy when matching the λ-body; application is left-associative
  • α-conversion: renaming the bound variable of a function
  • β-reduction: application by substitution
    • however, this may lead to unsafe substitution due to name conflicts, i.e. when the parameter is among the free variables of the expression being substituted in. Therefore, we need α-conversion to handle it
  • λ-calculus is confluent under α-conversion and β-reduction, which means that if e reduces to e₁ and to e₂ under different reduction strategies (so reduction is nondeterministic), there always exists e₃ such that both e₁ and e₂ can reduce to e₃
The reduction strategies in λ-calculus include two deterministic ones: CBV (call-by-value) and CBN (call-by-name)
  • CBV evaluates the argument first, which is what most (functional) languages do
  • CBN substitutes the argument unevaluated and evaluates the function body first
  • full β-reduction is generally nondeterministic since it can reduce any redex
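The CBV/CBN difference can be made concrete in a CBV host language by passing thunks (zero-argument functions) and forcing them only on use; this is a simulation sketch, and the function names are my own.

```python
# Python is call-by-value; call-by-name is simulated with thunks.

def diverge():
    # Stand-in for a nonterminating (or erroneous) argument.
    raise RuntimeError("simulated divergence")

def const_one_cbv(x):
    return 1   # CBV: the caller already evaluated the argument

def const_one_cbn(x_thunk):
    return 1   # CBN: the thunk is never forced, so it never runs

# const_one_cbv(diverge()) raises: CBV evaluates the argument first.
result = const_one_cbn(lambda: diverge())   # fine under CBN: returns 1
```

This mirrors the classic fact that (λx. 1) Ω converges under CBN but diverges under CBV.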
evaluation context simplifies the definition of semantics
Encoding in λ-calculus involves representing numerals, booleans, control statements, etc. as λ-terms (Church encodings)
  • How about recursive functions? This uses the Y combinator, which defines the fixed point of a recursive function and hence encodes unbounded loops
  • To avoid the infinite loop caused by CBV on the Y combinator, one can use η-expansion to wrap the self-application f f, delaying its evaluation (the η-expanded form is already a value) while keeping its original result when applied to any argument. The modification is called the Z combinator
  • There is also a corresponding η-contraction that removes this pattern
  • There are infinitely many fixed-point combinators, among which the Θ combinator was coined by Turing
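The Z combinator can be transcribed directly into Python (a CBV language), which shows why the η-expansion matters: without it the self-application would unfold forever. The helper names (`fact_gen`, `fact`) are illustrative.

```python
# Z = λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))
# The inner λv. x x v is a value, so CBV does not eagerly unfold x x.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# fact_gen takes "the rest of the recursion" and returns one unfolding of it.
fact_gen = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

fact = Z(fact_gen)   # the fixed point: a genuine factorial with no named recursion
# fact(5) == 120
```

Replacing `lambda v: x(x)(v)` with the bare `x(x)` recovers the Y combinator and makes the definition loop forever under CBV.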
FL extends λ-calculus with some primitives and some clauses, including letrec f₁ = e₁, …, fₙ = eₙ in e, where the scope of each fᵢ is the entire letrec expression, so recursive functions can be represented and mutually recursive functions are doable
  • Different from λ-calculus, there exist terms that are neither values nor reducible; they are stuck, and lead to a new notion called runtime type error
  • The translation from FL to (CBV) λ-calculus requires not only translating the semantics but also handling runtime type errors
scope matters, and there are two kinds of scope. Consider an expression that binds x, captures it in a function, then rebinds x before the call (e.g. let x = 3 in let f = λy. x in let x = 2 in f 0)
  • static scope, where the binding happens immediately at the definition site: the example reduces to 3
    • most modern languages use static scoping since it benefits modular programming
  • dynamic scope, where the binding is resolved at evaluation time: the example reduces to 2
    • some dynamic languages use dynamic scoping, and it is easier to implement
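Both disciplines fit in one tiny interpreter: a closure records its definition-time environment, and the only difference is whether the call uses that recorded environment (static) or the caller's current one (dynamic). This sketch uses the standard example let x = 3 in let f = λy. x in let x = 2 in f 0; the AST encoding and names are my own.

```python
# Environments are dicts mapping names to values.
def interp(e, env, dynamic=False):
    kind = e[0]
    if kind == "num":                         # ("num", n)
        return e[1]
    if kind == "var":                         # ("var", name)
        return env[e[1]]
    if kind == "let":                         # ("let", name, bound, body)
        _, name, bound, body = e
        return interp(body, {**env, name: interp(bound, env, dynamic)}, dynamic)
    if kind == "lam":                         # ("lam", param, body)
        return ("closure", e[1], e[2], env)   # capture the definition-time env
    if kind == "app":                         # ("app", fn, arg)
        _, param, body, cenv = interp(e[1], env, dynamic)
        arg = interp(e[2], env, dynamic)
        call_env = env if dynamic else cenv   # the ONLY difference between the two
        return interp(body, {**call_env, param: arg}, dynamic)

# let x = 3 in let f = λy. x in let x = 2 in f 0
prog = ("let", "x", ("num", 3),
        ("let", "f", ("lam", "y", ("var", "x")),
         ("let", "x", ("num", 2),
          ("app", ("var", "f"), ("num", 0)))))
```

`interp(prog, {})` gives 3 (static), while `interp(prog, {}, dynamic=True)` gives 2.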
Notice that FL is stateless, just like λ-calculus: there is no way to change a “variable” once it has been assigned. Therefore, state is not needed for Turing-completeness, but it is a natural real-world reasoning tool, for example assignment x := e and pointers ref e
continuation is the function that takes the result of the current expression e as input and performs the rest of the computation. In essence, it is the evaluation context.
continuation-passing style (CPS) is a recursive chain of continuations, which uniquely defines the order of evaluation, so no evaluation context is needed to specify it.
capturing and saving continuations provides a way to replace the current call stack with an old, saved one, e.g. call/cc and let/cc in Scheme (or Racket).
In CPS, recursive functions become tail-recursive; in implementation, we substitute heap space for stack space, since continuations live on the heap as closures.
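The tail-recursion point can be seen in a CPS factorial: every recursive call is in tail position, and the pending multiplications move into heap-allocated closures instead of stack frames. The names below are illustrative.

```python
def fact_cps(n, k):
    """Compute k(n!) with all recursion in tail position."""
    if n == 0:
        return k(1)
    # The continuation captures "the rest of the computation":
    # once result = (n-1)! is known, multiply by n and continue with k.
    return fact_cps(n - 1, lambda result: k(n * result))

# Supplying the identity continuation extracts the final answer:
# fact_cps(5, lambda x: x) == 120
```

Each `lambda result: ...` closure holds the deferred `* n`, which is exactly the work that direct-style recursion would keep on the call stack.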
typed λ-calculus has less expressive power, as it constrains the validity of programs with no way to work around the restrictions.
STLC (simply typed λ-calculus) is one of the simplest typed λ-calculi
type inference is the process of inferring types even when they are not explicitly specified. In a sufficiently complicated type system, this problem can be undecidable.
  • unification finds a substitution σ (or say a unifier) such that for a pair of terms s and t, we have σ(s) = σ(t)
  • the most general unifier (MGU), if any unifier exists, is unique up to renaming: every other unifier can be obtained from it by further substitution. A recursive algorithm called Robinson's algorithm computes the MGU; it always finds the MGU for any untyped expression in STLC by recursively substituting over the constraints collected from all subterms
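A compact Robinson-style unifier fits in a few lines. In this sketch (term encoding and function names are my own), a variable is a bare string and a type constructor is a tuple like `("->", t1, t2)` or `("int",)`; the result is the MGU as a dict, or `None` on failure.

```python
def apply_subst(subst, t):
    """Apply a substitution to a term, following chains of variable bindings."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(subst, a) for a in t[1:])
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear in t? (Rejects infinite types.)"""
    t = apply_subst(subst, t)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst=None):
    subst = {} if subst is None else subst
    s, t = apply_subst(subst, s), apply_subst(subst, t)
    if s == t:
        return subst
    if isinstance(s, str):                  # s is a variable: bind it
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None                         # constructor clash
    for a, b in zip(s[1:], t[1:]):          # unify arguments pairwise
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# Example: unify  a -> int  with  bool -> b.
mgu = unify(("->", "a", ("int",)), ("->", ("bool",), "b"))
# mgu == {"a": ("bool",), "b": ("int",)}
```

The occurs check is what makes `unify("a", ("->", "a", ("int",)))` fail, which in STLC is exactly the constraint generated by λx. x x.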
subtyping, along with the subsumption rule, provides more flexibility in expressiveness. How a subtyping relation lifts through a type constructor takes three forms depending on the direction.
  • covariance
  • contravariance
  • invariance
    where g is any type constructor, for example ref
recursive typing can refer to the type itself, e.g. μα. τ, with which STLC can express nontermination; μ is the way to express the fixed point of a type.
polymorphism extends types with a universal type parameter, denoted ∀α. τ. STLC cannot express such a type, since there does not exist a single type that can stand on behalf of any type (the same in most interesting PLs). However, λx. x x becomes typable, e.g. at (μα. α → τ) → τ, once recursive types are allowed
normal form means that a λ-term cannot step any further
Church–Rosser property (confluence) says that if a term has a normal form, then it is unique up to α-equivalence
strong normalization describes a term for which every β-reduction sequence eventually produces a normal form, that is, terminates
STLC is a language in which every well-typed term is strongly normalizing, and hence it is less powerful than the untyped λ-calculus
information flow control tracks and restricts the flow of information across different levels. Information cannot flow from higher levels to lower levels. Every value can have a type carrying a tag (or say a label) specifying the visibility of its information. For instance, labels can be high and low (or secret vs. public); in real life, this can be kernel memory vs. user-space memory.
By adding typing rules and operational semantics for tagged types, we can enforce the flow policy described above (a guarantee known as noninterference).
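The label discipline can be sketched with dynamic tracking over a two-point lattice: every value carries a label, operations join the labels of their operands, and a public sink rejects high-labeled data. This is an illustrative runtime-monitor sketch, not the static type system from the course; all names are my own.

```python
LOW, HIGH = 0, 1   # a two-point lattice: LOW ⊑ HIGH

class Labeled:
    """A value tagged with a confidentiality label."""
    def __init__(self, value, label):
        self.value, self.label = value, label

    def __add__(self, other):
        # The sum reveals information about both operands,
        # so its label is the join (max) of the two labels.
        return Labeled(self.value + other.value, max(self.label, other.label))

def store_public(sink, labeled):
    """Only LOW data may flow to a public sink: no HIGH -> LOW flow."""
    if labeled.label > LOW:
        raise PermissionError("explicit flow from HIGH to LOW rejected")
    sink.append(labeled.value)

secret = Labeled(42, HIGH)
public = Labeled(1, LOW)
out = []
store_public(out, public + public)        # fine: LOW ⊔ LOW = LOW
# store_public(out, secret + public)      # rejected: HIGH ⊔ LOW = HIGH
```

A static type system enforces the same policy at compile time (and can also rule out implicit flows through control flow, which this dynamic sketch does not track).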