Formal Languages Automata and Computability CMPUT474

01 - Introduction

Recently, computing has escaped the need to happen within a formal system: LLMs let computation occur through natural language instructions. But, they are still computation, and thus must still obey its most general laws. This shows the utility of having abstract models of computation.

In the philosophy of this course, we define computation as mechanized mathematics. Math is the study of formal systems, described as mechanically checkable reasoning, i.e. by the vehicle of formal proof. Computer science studies the subset of this reasoning that can be mechanically generated as well. We also view a computer as a construct able to capture a universal formal system.

Fundamental lessons of this course:

Behavior of Simple Mechanisms

What is the difference between the description of a program and its behavior?

It is important to note that the description (e.g. code, etc.) of a program is something separate from the program's behavior. More on this later.

Cellular Automata

We can describe a 1D cellular automaton as a set of rewrite rules that act on a 1D array of bits over (discrete) time. To advance one generation, we iterate over memory and match the pattern around (below, \(n=3\)) each bit and replace it accordingly.

For some programs, predictable behavior emerges.

In theory, we could write a "simpler" program to output the same result; we can predict the state of any location in memory at a given time. In this sense, the pattern generated by the cellular automaton is compressible.

However, some rules (famously, rule \(30\)) are "unpredictable" (although they may have some structure). There may be no way to predict the state of memory; executing the program is the only way to find out. So, the program cannot be "compressed" at all.
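To make the rewrite-rule mechanism concrete, here is a minimal Python sketch (mine, not from the lecture) of an \(n=3\) cellular automaton stepper using Wolfram's rule numbering; the wrap-around boundary is an arbitrary choice:

# Bit i of rule_number gives the new cell value for the neighborhood whose
# three bits (left, mid, right) spell out i; rule_number = 30 is rule 30.

def step(cells: list[int], rule_number: int = 30) -> list[int]:
    n = len(cells)
    out = []
    for i in range(n):
        # Read the 3-cell neighborhood, wrapping around at the edges.
        left, mid, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right
        out.append((rule_number >> idx) & 1)
    return out

# One "on" cell in the middle; rule 30's chaotic triangle emerges.
cells = [0] * 31
cells[15] = 1
for _ in range(15):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)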

Simple Rules → Complex Behavior

In general, there is no way to predict the behavior of a program from its description, even when that description is simple.

Rule \(110\) specifically is a universal computer; it can be used to execute any computation (proven by Matthew Cook around 2000), and is among the simplest known Turing-complete systems.

Math is Hard

Mathematics is also, categorically, the study of consequences of simple sets of rules (axioms). Clearly, complex behavior can emerge from simple sets of axioms, so math may also be "incompressible", i.e. not amenable to general, overarching theories (sorry category theorists!).

Finiteness Vs Infiniteness

Every program is a string (a finite length sequence of characters from a finite alphabet \(\Sigma\)). Strings are a really nice way to describe programs; before this, we used the naturals, which was just gross. Read the original proof of Gödel's incompleteness theorem if you don't believe me.

Since each string has finite length (over a finite alphabet), the set of all strings is countable, so the set of all programs must also be countable. And since there are infinitely many strings, there are countably infinitely many programs.

This does NOT mean that every problem can be solved by a program though.

Decision Problems

A decision problem is a problem where it must be determined, for every string over an alphabet, whether it is "true" or "false" (e.g. the graph isomorphism problem). Informally, a decision problem is a question that either has a true or false answer. This implies a subset, namely the subset of "true" strings. In fact, there is a one-to-one correspondence between subsets and decision problems.

decision : String -> bool

If we consider decision problems in general (i.e. over the set \(\mathcal{S}\) of all strings), we find that each subset of strings (i.e. each element of \(\mathcal{P}(\mathcal{S})\)) corresponds to exactly one decision problem. We know that \(\#\mathcal{S} < \# \mathcal{P}(\mathcal{S})\) even when \(\mathcal{S}\) is infinite, so there must be more decision problems than strings, and thus there must be more decision problems than programs.
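As a tiny sketch of this correspondence (my own illustration, with a finite universe standing in for \(\mathcal{S}\)):

# A subset of strings *is* its membership test, and vice versa.

def decision_from_subset(subset: set[str]):
    return lambda w: w in subset                   # subset -> decision problem

def subset_from_decision(decision, universe: set[str]) -> set[str]:
    return {w for w in universe if decision(w)}    # decision problem -> subset

# e.g. the decision problem "does w contain an even number of 1s?"
even_ones = lambda w: w.count("1") % 2 == 0
print(subset_from_decision(even_ones, {"", "1", "11", "101"}))
# {'', '11', '101'} (set order arbitrary)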

Cantor's Theorem

We can't just compare the sizes of infinite sets "normally"; instead, we say two sets have the same cardinality iff a bijection can be drawn between them.

Cantor's Theorem

Theorem
For any set \(S\), \(\# \mathcal{P}(S) > \# S\), i.e. no surjection (and in particular no bijection) \(f : S \to \mathcal{P}(S)\) exists.

A decision problem is a subset of the set of all strings (its "true" strings), so the set of decision problems is the power set of the set of strings. Programs are also strings. If every decision problem were describable by a program, there would be a surjection from the set of strings onto the power set of strings, which we've just shown can't exist. Thus, there are decision problems that aren't describable with programs.

Self-Reference is Interesting

Cantor's theorem and the result that some decision problems don't have a program to describe them are both non-obvious; both proofs establish non-existence over an infinite domain. It is not a coincidence that both proofs involve self-reference:

Allowing and utilizing self-reference often cuts to the core of questions, yielding both weird behavior and important results. Particularly, self-reference is a shortcut to finding contradictions.

It is Always Possible to Write Non-halting Programs

As we've seen, it is possible to create universal programs (and relatedly, universal languages which can express any computation → whose interpreter must be a universal program). However, it is impossible to write a universal programming language in which every program halts. The existence of universal programs implies non-halting programs.

We use a diagonal proof. List all countably many programs expressible in the language as \(p_1, \dots\), then through some traversal run each program with each program's string representation as input (a countably infinite traversal trivially exists). Define program \(V\) that, on input \(\langle p \rangle\), runs \(p\) on \(\langle p \rangle\) (via the universal interpreter) and negates the result. \(V\) is a program we can write, so it is expressed by \(p_i\) for some \(i\in \mathbb{N}\), meaning we evaluate \(V(\langle V \rangle)\) at some point. This simulates \(V\) on \(\langle V \rangle\), which eventually simulates \(V\) on \(\langle V \rangle\) again, etc. Clearly, this program recurses forever and never returns.
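As a sketch, the same construction in Python-flavored pseudocode; `run` (a universal interpreter for the language) and `V_SOURCE` (\(V\)'s own source string) are hypothetical:

# V negates the result of running a program on its own source.

def V(program_source: str) -> bool:
    result = run(program_source, program_source)  # run p with <p> as input
    return not result

# V(V_SOURCE) must evaluate run(V_SOURCE, V_SOURCE), which re-enters V,
# which evaluates run(V_SOURCE, V_SOURCE) again, ... it never returns.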

Essentially, any language complicated enough to support universal computation also supports some recursive or loop structure, and thus infinite loops can be created in it.

Semantics is Hard Too

Questions about the semantics (meaning/behavior) of programs are susceptible to halting-style diagonal arguments, and are thus often provably impossible to answer in the general case, i.e. no program can answer them for all inputs.

Undecidable Questions about Semantics

Example

These limits exist on every formal system, e.g. programming languages. Your static code analyzer can only get so good.

However, questions about syntax (description) are less susceptible to these arguments.

02 - Number Systems and History of Computation

Real Numbers

Wait, why does this course have a real analysis co-req?

Construction

Before being able to use real numbers, we need to understand them, and thus we need to construct them. To bake an apple pie from scratch, one must invent the universe; we start by constructing \(\mathbb{N}\).

\(\mathbb{N}\) is constructed by the Peano axioms.

We extend \(\mathbb{N}\) to \(\mathbb{Z}\) by requiring closure under additive inverses.

We extend \(\mathbb{Z}\) to \(\mathbb{Q}\) by requiring closure under multiplicative inverses. \(\mathbb{Q}\) is defined as the classes defined by the equivalence relation \((a,b)\sim(c,d)\iff ad=bc\) over \(\mathbb{Z}\times (\mathbb{Z}\setminus\set{0})\), with definitions of addition, multiplication, etc. to match.

\(\mathbb{Q}\) is Archimedean and dense in \(\mathbb{R}\), but almost all real numbers are not rational. Proving this in full is a nontrivial exercise in real analysis; here we just prove the existence of an irrational number, namely \(\sqrt{2}\) as is customary.

Proof that \(\sqrt{2}\) is irrational

Proof
Suppose \(\sqrt{2}=\dfrac{p}{q}\) for coprime \(p,q\) (so \(\dfrac{p}{q}\) is in simplest form). Then \(p^2=2q^2\), so \(p^2\) and thus \(p\) are even. We assumed \(\dfrac{p}{q}\) is in simplest form, so \(q\) must be odd. \(p\) is even, so \(p=2r\) for \(r \in \mathbb{N}\). So, \(p^2=(2r)^2=2q^2 \implies 2r^2 =q^2\), so \(q^2\) and thus \(q\) are even. Absurd.

Any equation involving addition, multiplication, and their inverses can be solved in \(\mathbb{Q}\), but equations of the form \(x^n=a\) aren't solvable in general. Adding radicals \(\surd\) to \(\mathbb{Q}\) doesn't solve everything either; polynomials of degree \(5\) and higher aren't generally solvable in terms of the operations we have (the Abel–Ruffini theorem, a result of Galois theory). Again, we can define the algebraic numbers \(\mathbb{A}\) as the algebraic closure of the rationals, but not only does this not contain all of \(\mathbb{R}\) (it misses transcendentals like \(\pi\) and \(e\)), but we now have members of \(\mathbb{A}\) not in \(\mathbb{R}\), e.g. \(\sqrt{-1}=:i\).

We finally construct \(\mathbb{R}\) from \(\mathbb{Q}\) by considering all the possible limits of (Cauchy) sequences of rational numbers, and including these in the number system as well. E.g. although \(e\) is transcendental, we do have \(e:=\displaystyle\lim_{n\to \infty}\left(1+\dfrac{1}{n}\right)^n\). \(\mathbb{R}\) is the closure of \(\mathbb{Q}\) under the inclusion of limits of Cauchy sequences.

There's really no reason to stop here. We define the complex numbers \(\mathbb{C}\) as the algebraic closure of \(\mathbb{R}\), i.e. the set of solutions to any algebraic equation over \(\mathbb{R}\) (much like \(\mathbb{A}\) is the algebraic closure of \(\mathbb{Q}\)).

In general, new number systems are motivated by closure under some operation or construction. We can generalize the idea of a closure operator.

\(\mathbb{R}\)? In My Computations? It's Less Likely than You Think

Not all numbers are computable, i.e. such that a terminating algorithm exists for approximating them to arbitrary precision. By definition, if a closed form limit exists, we can compute it. Clearly, it's ontologically difficult to describe an uncomputable number. But, since there are countably infinitely many programs and uncountably infinitely many reals, almost all reals are uncomputable.

Clearly we can't use the Cauchy sequences as representations, since we would need infinitely many rationals to represent an uncomputable real, and dealing with limits is annoying for precise computation anyway. We have this problem with \(\mathbb{R}\) because its "closure operator" is real-analytic, as opposed to a clean, "algebraic" operator.

Cantor's Diagonal Argument

Proof
We consider the interval \([0, 1)\) in \(\mathbb{R}\). Each number in this interval can be represented as an infinite binary expansion, a sequence of \(1\)s and \(0\)s. Assume towards contradiction a bijection between \(\mathbb{N}\) and \([0,1)\). Construct a new real \(\alpha\) by flipping the first digit of the first number, the second digit of the second number, and in general the \(n\)th digit of the \(n\)th number. By construction, \(\alpha\) differs from every number in the enumeration; so, no bijection between \(\mathbb{N}\) and \([0, 1)\) can exist.
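Operationally (my framing, as a sketch): given any claimed enumeration as a function `digit(i, j)` returning the \(j\)th bit of the \(i\)th number, the diagonal number is itself computable from it, and it disagrees with every enumerated number:

# alpha's n-th bit differs from the n-th number's n-th bit by construction.

def diagonal_bit(digit, n: int) -> int:
    return 1 - digit(n, n)

# e.g. against the (far from exhaustive) enumeration where every bit of
# the i-th number equals i mod 2:
digit = lambda i, j: i % 2
print([diagonal_bit(digit, n) for n in range(6)])  # [1, 0, 1, 0, 1, 0]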

Around this time, paradoxes were appearing in math that threatened the foundational assumptions of the time, namely that all of math could be built atop set theory:

Emergence of Computation

The field of computer science emerged from a desire by the mathematical community to construct all of math from arithmetic on natural numbers, much like how \(\mathbb{Z}\), \(\mathbb{Q}\), \(\mathbb{R}\), etc. were constructed.

Church-Turing thesis

Theorem
An "effective" method for performing arithmetic on the natural numbers can be described as / reduced to a Turing machine.

03 - Finite State Computation

We start by defining the primitives we will use to construct a theory of computation.

Finite State Computation

A finite state computation is a computation that takes a string \(w\in \Sigma^{*}\) and either accepts or rejects it following computation by a finite automaton.

Intuitively, a finite automaton is defined by a set of states (what the computation has come to at this point in time) and a function that defines a new state based on the current state and the current input being read from the input string.

Finite Automaton (FA)

Definition
A finite automaton \(M\) is a \(5\)-tuple \(M=(Q, \Sigma, \delta, q_0, F)\) with a finite set of states \(Q\), a finite alphabet \(\Sigma\), a transition function \(\delta : Q\times\Sigma\to Q\), an initial state \(q_0\in Q\), and a set of accepting states \(F\subseteq Q\).

An FA can be expressed as a table indexed by the set of states and the alphabet; the entries define the transition function at those coordinates.

More commonly, an FA is expressed as a state diagram, where states are drawn as nodes and the transition function is defined by arrows connecting the nodes. Accepting states are drawn with a double stroke.

Computation

Definition
Given finite automaton \(M\) and input string \(w = \sigma_1\sigma_2\dots \sigma_n\), the execution of \(M\) on \(w\) is the sequence of states (configurations) \(c_0c_1\dots c_n\) such that \(c_0=q_0\) and, for all \(0 \leq i \leq n-1\), \(c_{i+1}=\delta_M(c_i, \sigma_{i+1})\), i.e. these states could happen in sequence.

Since we assumed the input string is of finite length and the execution is in lockstep with the input string's symbols, all computations will halt! In fact, if there are \(n\) input symbols, there will be exactly \(n\) steps (and \(n+1\) configurations) in the computation; this property is very useful, e.g. in the proof of THM 2 later.
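A minimal sketch (mine) of this execution loop, with \(\delta\) as a dict:

# run_dfa: exactly one delta step per input symbol, so it always halts.

def run_dfa(delta, q0, F: set, w: str) -> bool:
    q = q0
    for sigma in w:
        q = delta[(q, sigma)]   # c_{i+1} = delta(c_i, sigma_{i+1})
    return q in F

# Example: accept binary strings with an even number of 1s.
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
print(run_dfa(delta, "even", {"even"}, "1101"))  # False (three 1s)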

A state can be thought of as an equivalence class over the entire history of the computation: two histories are equivalent iff they end in the same state. In other words, there is no "memory" or "side effect" that affects how a state is interpreted; behavior is determined entirely by the current state and input symbol.

Languages

Given an alphabet \(\Sigma\), a language \(\Gamma\) is a subset of the set \(\Sigma^{*}\) of all strings, i.e. \(\Gamma\subseteq \Sigma^*\)

Examples of languages:

Constructing a FA to recognize a language will be familiar to anyone who has taken CMPUT 415.

Cardinality Questions

A given automaton recognizes exactly one language, but a regular language has infinitely many FAs that recognize it (e.g. we can split edges, etc.)

The set of strings \(\Sigma^*\) over finite alphabet \(\Sigma\) is countably infinite; since a language is characterized by which strings from \(\Sigma^{*}\) it contains, the set of all languages over \(\Sigma\) is isomorphic to \(\mathcal{P}(\Sigma^{*})\), and thus is uncountably infinite. So, there are more languages than strings.

The set of all finite automata is also countably infinite, since each FA is a finite object (finitely many states, and therefore a finite transition table) describable by a finite string. By the same argument, there are more languages than finite automata. So, almost all languages are not regular.

Languages are isomorphic to decision problems over strings; languages \(\cong\) subsets of strings \(\cong\) decision problems \(\cong\) unary relations. In all of these cases, we can use the same \(|S| < |\mathcal{P}(S)|\) reasoning.

Regular Expressions

A regular operation is an operation \(\square : \mathcal{P}(\Sigma^{*}) \times \mathcal{P}(\Sigma^{*}) \to \mathcal{P}(\Sigma^{*})\) between languages (or its unary analogue, like complement) that preserves regularity. I.e. if languages \(A\) and \(B\) are regular and \(\square\) is a regular operation, then \(A \, \square \, B\) is also a regular language.

| Operation Name | Symbol | Definition |
| --- | --- | --- |
| Complement | \(\overline{\Gamma}\) | \(\overline{\Gamma} := \set{w \mid w \not\in \Gamma}\) |
| Intersection | \(\Gamma \cap \Lambda\) | \(\Gamma \cap \Lambda:=\set{w\mid w \in \Gamma \land w \in \Lambda}\) |
| Union | \(\Gamma \cup \Lambda\) | \(\Gamma \cup \Lambda:=\set{w\mid w \in \Gamma \lor w \in \Lambda}\) |
| Concatenation | \(\Gamma\Lambda\) or \(\Gamma \circ \Lambda\) | \(\Gamma \Lambda :=\set{ab\mid a\in \Gamma \land b\in \Lambda}\) |
| (Kleene) Star | \(\Gamma^{*}\) | \(\Gamma^{*}:=\displaystyle\bigcup_{n=0}^{\infty}\set{w_1 w_2 \dots w_n \mid w_i \in \Gamma}\) |

Theorem 1

Theorem
The class of regular languages is closed under complement \(\overline{\Gamma}\).

Theorem 2

Theorem
The class of regular languages is closed under intersection \(\Gamma\cap\Lambda\).
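Both proofs are constructive: for complement, flip a DFA's accepting states (\(F \gets Q\setminus F\)); for intersection, run both DFAs in lockstep. A sketch (mine) of the latter product construction, compatible with `run_dfa` above and assuming total transition functions over a shared alphabet:

# Product construction: states are pairs; a pair accepts iff both halves do.

def product_dfa(delta1, q01, F1, delta2, q02, F2, alphabet):
    states1 = {q for (q, _) in delta1} | set(delta1.values())
    states2 = {q for (q, _) in delta2} | set(delta2.values())
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for p in states1 for q in states2 for a in alphabet}
    return delta, (q01, q02), {(p, q) for p in F1 for q in F2}

The lockstep simulation is exactly where the halting property above earns its keep: both machines consume the same symbol at every step.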

04 - Nondeterministic Computation

We generalize finite state machines to allow non-determinism; we now differentiate between deterministic FAs (DFAs) and nondeterministic FAs (NFAs).

Nondeterministic Finite Automaton (NFA)

Definition
A nondeterministic finite automaton \(M\) is a \(5\)-tuple \(M=(Q,\Sigma,\delta,q_0,F)\) where \(Q\), \(\Sigma\), \(q_0\), and \(F\) are as in a DFA, and the transition function is now \(\delta : Q\times\Sigma_\varepsilon\to \mathcal{P}(Q)\).

We define \(\varepsilon\) as the null character, which allows \(\varepsilon\)-transitions (spontaneous state transitions) that don't process the next input symbol. We define \(\Sigma_\varepsilon := \Sigma\cup \set{\varepsilon}\).

Generalizing \(\delta\) from a function to a relation is what makes an automaton non-deterministic; a function is a special kind of relation where each "input" uniquely determines an output (i.e. each image set contains a single element), whereas a relation allows multiple options that may be freely chosen between for a given "input".

Since a given state/input pair may have multiple successors, the shape of our program's execution generalizes from an execution sequence to an execution tree of state/input pairs, which is defined analogously to an execution sequence. Program \(M\) accepts string \(w\) iff the execution tree contains a configuration that has consumed all of \(w\) and is in an accepting state.

DFAs and NFAs Have the Same Computational Power

Two machines \(M_1\) and \(M_2\) are equivalent if they recognize the same language.

DFAs and NFAs are Equivalent

Theorem
Every NFA \(M\) has an equivalent DFA \(\tilde{M}\).

By corollary, the following statements are logically equivalent: (1) \(\Gamma\) is a regular language; (2) some DFA recognizes \(\Gamma\); (3) some NFA recognizes \(\Gamma\).
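The proof is the subset construction: the equivalent DFA's states are sets of NFA states. A sketch (mine) that computes the construction lazily while reading the input; `delta` maps (state, symbol) to an iterable of successors, and the empty string stands for \(\varepsilon\):

EPS = ""

def eps_closure(delta, states: set) -> set:
    # All states reachable via epsilon-transitions alone.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(delta, q0, F: set, w: str) -> bool:
    current = eps_closure(delta, {q0})         # the DFA's start "state"
    for a in w:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, a), ()))
        current = eps_closure(delta, moved)    # one simulated DFA step
    return bool(current & F)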

Regular Expressions (cont.)

By establishing an equivalence between DFAs and NFAs, we can use the new primitives of NFAs to prove statements about regular languages. In particular, we use \(\varepsilon\)-transitions to show that regular languages are closed under union, concatenation, and the Kleene star.

Theorem 3

Theorem
The class of regular languages is closed under union \(\Gamma\cup \Lambda\).

Theorem 4

Theorem
The class of regular languages is closed under concatenation \(\Gamma\Lambda\)

Theorem 5

Theorem
The class of regular languages is closed under the Kleene star \(\Gamma^{*}\)

05 - Regular Expressions

Regular Expressions

A regular expression is an expression formed of regular operators operating on regular languages. We see regular expressions as a meta-language that can construct regular languages syntactically, as opposed to the "semantic" way (listing the strings in the language). (Curiously, the language of regular expressions is itself not regular; we prove this in lecture 06.)

We recursively (in order of precedence: star, then concatenation, then union) define \(R\) as a regular expression if \(R\) is \(\sigma\) for some \(\sigma\in\Sigma\), \(\varepsilon\), \(\varnothing\), \((R_1 \cup R_2)\), \((R_1 R_2)\), or \((R_1^{*})\) for regular expressions \(R_1, R_2\).

We define \(\mathcal{L}(R)\) as the language specified by regular expression \(R\), implicitly with alphabet \(\Sigma\). We also define the shorthand \(R=\Sigma\) for \(R=\sigma_1 \cup\dots\cup \sigma_{|\Sigma|}\), i.e. the language formed by the symbols of \(\Sigma\).

Examples of regular expressions (for \(\Sigma=\set{0,1}\))

Identities for Regular Expressions

Lemma

Regular Expressions and Regular Languages are Equivalent

Theorem
A language \(\Gamma\) is regular if and only if there exists regular expression \(R\) such that \(\mathcal{L}(R)=\Gamma\).

Proof sketch \(\impliedby\): We simply construct and compose NFAs as we did when we defined the regular operators; we have more or less proven everything we need already. For the base cases, we have small NFAs recognizing \(\set{\sigma}\) (two states joined by a \(\sigma\)-transition), \(\set{\varepsilon}\) (a lone accepting start state), and \(\varnothing\) (a lone non-accepting state).

We must define more concepts to prove \(\implies\).

Generalized NFAs

Generalized NFA

Definition
A generalized NFA (GNFA) \(N\) is a \(5\)-tuple \(N=(Q,\mathcal{R},\delta,q_0,q_a)\) where \(Q\) is a finite set of states, \(\mathcal{R}\) is the set of regular expressions over \(\Sigma\), \(\delta : (Q\setminus\set{q_a})\times(Q\setminus\set{q_0})\to \mathcal{R}\) labels each pair of states with a regular expression, \(q_0\) is the initial state, and \(q_a\) is the (single) accepting state.

A GNFA accepts input \(w\) iff \(w\) can be split as \(w=w_1 w_2\dots w_k\) and there exists a sequence of states \(c_0=q_0, c_1,\dots,c_{k-1},c_k=q_a\) where \({w_{i+1}\in \mathcal{L}(\delta(c_i, c_{i+1}))}\) for \(i\in\set{0, \dots, k-1}\). So, moving to a new state requires that the regular expression defined by the pair of states (in \(\delta\)) recognizes the input that it consumes.

DFAs are a subset of GNFAs

Lemma
For any DFA, we can construct a GNFA that recognizes the same language.

Reduction Procedure for a GNFA

Algorithm
If the GNFA has two states, we are done.

Otherwise, choose a state (that isn't the initial or final one) and contract (in the graph theory sense) the GNFA with respect to that state. This requires coalescing the regular expressions (where \(q_\times\) is the state to be removed and \(q_i,q_j\) any pair of states on "either side" of \(q_\times\)): \(\delta(q_i, q_j) \gets \delta(q_i, q_j) \cup \delta(q_i, q_\times)\, \delta(q_\times, q_\times)^{*}\, \delta(q_\times, q_j)\)

We keep contracting until there are only two states left.

\(\mathcal{L}\) is Invariant under Reduction

Lemma
For any GNFA \(N\), \(\mathcal{L}(\text{Reduce}(N))=\mathcal{L}(N)\). Furthermore, if \(N\) has two states, then \(\mathcal{L}(N)=\mathcal{L}(\delta(q_0, q_a))\)

diagrams

TODO: add a table with all the NFA diagrams to the appendix in a "constructing NFAs from regular expressions section

06 - Limits of Finite State Computation

Not Everything Can Be Finite-state Computed

The language \(\Gamma_1:=\set{0^{n}1^{n}\mid n\geq 0}\) (i.e. the language formed by a substring of \(0\)s followed by a substring of \(1\)s of the same length) cannot be regular because as \(n\to \infty\), we would need an infinite number of states (or memory) to "remember" \(n\).

Clearly, any finite language is regular because we can "hard-code" each string into the FA directly, so all non-regular languages must be infinite. Furthermore, if a language is infinite, it has no bound on the length of strings, so its regular expression representation must use a Kleene star. Specifically:

Pumping Lemma for Regular Languages

Theorem
For regular language \(\Gamma\), there exists a length \(p\) such that any string \(w\in \Gamma\) longer than \(p\) (i.e. \(|w| > p\)) can be partitioned as \(w=xyz\) where \(|y| > 0\), \(|xy| \leq p\), and \(xy^{i}z\in\Gamma\) for all \(i \geq 0\).

I.e. for sufficiently long strings in \(\Gamma\), there is always a substring that can be repeated arbitrarily.

Proof sketch (general idea): Let \(\Gamma\) be regular and recognized by DFA \(M\). Pick \(p = |Q_M|\). If we have some string \(w\) that is longer than \(p\), the computation of \(M\) with \(w\) as input must pass through at least one state (denoted \(q_!\)) twice by the pigeonhole principle, and thus a loop exists. If we were traversing the DFA "for fun", we could choose to follow the loop as many times as we wanted. Thus, any modification of \(w\) that arbitrarily duplicates the substring corresponding to the loop will also be recognized by \(M\).

Note that the pumping lemma is necessary for regularity, but not sufficient: there are irregular languages that conform to the pumping lemma.

Proving a Language Isn't Regular

Using the Pumping Lemma

If a language is regular, it must adhere to the pumping lemma. Thus, by contrapositive, if a language \(\Gamma\) has a string \(w\) such that all \(w=xyz\) decompositions have some \(i\) such that \(xy^{i}z\not\in \Gamma\), then \(\Gamma\) can't be a regular language.

Example: Consider \(\Gamma_1:=\set{0^{n}1^{n}\mid n\geq 0}\); assume towards contradiction that it is regular. Then a DFA \(M_{\Gamma_1}\) with some \(p\in \mathbb{N}\) states that recognizes \(\Gamma_1\) exists. Consider the string \(w_\times:=0^{p}1^{p}\in \Gamma_1\). Any decomposition \(w_\times = xyz\) where \(|y|>0\) and \(|xy|\leq p\) implies \(y=0^{k}\) for some \(1 \leq k\leq p\), since \(xy\) itself must be a string of \(0\)s (the first \(p\) symbols of \(w_\times\) are \(0\)s). But, since all the \(1\)s in the string are in \(z\), the string \(xy^{2p}z\) must have more \(0\)s than \(1\)s, and thus \(xy^{2p}z\not\in \Gamma_1\). If \(\Gamma_1\) were regular, we would have \(xy^{2p}z\in \Gamma_1\) by the pumping lemma. So, \(\Gamma_1\) is not regular.

Using the Recursive Nature of Regular Expressions

Closure!

If \(\Box\) is a regular operator and \(\Gamma_{\text{Reg}}, \Lambda_{\text{Reg}}\) are regular languages, then we know \(\Xi:=\Gamma_{\text{Reg}} \Box \Lambda_{\text{Reg}}\) is also regular. Thus, by contrapositive, if \(\Xi_{\lnot \text{Reg}}:=\Gamma\Box\Lambda\) is not a regular language, then at least one of \(\Gamma\) and \(\Lambda\) isn't a regular language either. More specifically, if \(\Xi_{\lnot \text{Reg}}:=\Gamma_\text{Reg}\Box\Lambda\) for regular \(\Gamma_\text{Reg}\), then \(\Lambda\) must not be regular.

So, to prove some \(\Gamma\) is not regular, we express a language known not to be regular as the result of a regular operation applied to a regular language and \(\Gamma\).

Example: Consider \(\Gamma_2\) as the language of strings with alphabet \(\Sigma=\set{0,1}\) that have the same number of \(0\)s as \(1\)s. Note that \(\Gamma_1=\Gamma_2 \cap L(0^{*} 1^{*})\). Clearly \(L(0^{*} 1^{*})\) is regular (it is defined by a regular expression), and we know that \(\Gamma_1\) is not regular. So, \(\Gamma_2\) must be irregular as well.

Example: Consider \(\mathsf{R}\) as the language of regular expressions (specifically, where \(\varepsilon\) is a valid regular expression) and \(\mathsf{P}\) as the language consisting of strings of parens, i.e. \(\mathsf{P}:=\set{(,)}^{*}\). Clearly \(\mathsf{P}\) is regular. We note that \(\mathsf{R}\cap\mathsf{P}\) is the set of balanced parenthesizations; it is common knowledge that this language is not regular. Thus, since \(\mathsf{R}\cap\mathsf{P}\) is not regular but \(\mathsf{P}\) is, it must be that \(\mathsf{R}\) is not regular.

Note: closure arguments are recursive; proving a language isn't regular is achieved by proving some other (presumably simpler) language isn't regular. Using the pumping lemma is the only "base case" that we have other than direct proof, so all roads will lead to it eventually.

07 - Pushdown Automata

Finite-state computation can be generalized into a more powerful model of computation by removing the requirement that the set of states is finite. Of course, the program specification must still be finite, so we can't just use an infinite lookup table. But, if the state transition function/relation has structure, it may have a finite amount of information, and thus could be expressed finitely.

We extend our model by adding finitely-addressable memory (i.e. countably infinite memory, where addresses are in \(\mathbb{N}\)) that can be used to store the state of the computation.

A natural data structure to store in memory is a stack, where a stack pointer stores the address of the top of the stack. Thus, although the stack can grow without bound, changes are local, i.e. always occur at the stack pointer, which moves "continuously in \(\mathbb{N}\)".

At each step of the computation, a pushdown automaton will advance \(1\) step on the input string, read the character, pop from the stack, and use this information to advance to a new state and push another symbol onto the stack.

Deterministic Pushdown Automata (DPDA)

Deterministic Pushdown Automata (DPDA)

Definition
A deterministic pushdown automaton (DPDA) is a \(6\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, F)\) with a finite set of states \(Q\), an input alphabet \(\Sigma\), a stack alphabet \(\Gamma\), a transition function \(\delta\), an initial state \(q_0\in Q\), and accepting states \(F\subseteq Q\).

I.e. it is an extension of a DFA to include a stack, which has its own alphabet and factors in as a dimension of the transition function.

More on the transition function \(\delta\):

As with before, a configuration is a triple \(c=(q \in Q, u\in \Sigma^{*}, s\in \Gamma^{*})\) and an execution trace of string \(w\in \Sigma^{*}\) is the (unique) sequence of configurations generated by the computation on \(w\). \(M\) accepts \(w\) if it reaches a configuration \(c_\ell=(q_k , w, s)\) where \(q_k\in F\).

Clearly, the set of FAs is isomorphic to the subset of DPDAs that simply don't interact with the stack, i.e. given by \(M_\text{FA}:=(Q, \Sigma, \varnothing, \delta, q_0, F)\).

What's New?

Fundamentally, stacks let us traverse trees; in the degenerate cases of a tree being a path (i.e. a linked list, for the data structures brainrot enjoyers among us), a stack lets us count whatever patterns we want. So, languages like \(\set{0^{n}1^{n}\mid n\geq 0}\) can be recognized by DPDAs:
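A sketch (mine) of that counting trick with an explicit stack; the two "phases" (haven't/have seen a \(1\)) play the role of the finite state set \(Q\):

# Recognize {0^n 1^n : n >= 0}: push on 0s, pop on 1s, accept iff the
# stack empties exactly when the input does.

def accepts_0n1n(w: str) -> bool:
    stack = []
    seen_one = False
    for c in w:
        if c == "0":
            if seen_one:        # a 0 after a 1: wrong shape
                return False
            stack.append("$")
        elif c == "1":
            seen_one = True
            if not stack:       # more 1s than 0s
                return False
            stack.pop()
        else:
            return False
    return not stack            # leftover symbols mean more 0s than 1s

print([accepts_0n1n(s) for s in ["", "01", "0011", "010", "001"]])
# [True, True, True, False, False]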

Nondeterministic Pushdown Automata (PDA)

Nondeterministic Pushdown Automata (PDA)

Definition
A nondeterministic pushdown automaton (PDA) is a \(6\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, F)\) with the same components as a DPDA, except the transition function is now \(\delta : Q\times\Sigma_\varepsilon\times\Gamma_\varepsilon\to \mathcal{P}(Q\times\Gamma_\varepsilon)\) (where \(\Gamma_\varepsilon:=\Gamma\cup\set{\varepsilon}\)).

Like before for DFA → NFA, \(\delta\) is extended such that a given configuration could lead to multiple other configurations, or none at all. So, computations are not deterministic and the execution structure is a tree, not a sequence.

Also like before, since the set of functions \(A \to B\) is a subset of the set of relations over \(A\times B\), the set of DPDAs is a subset of the set of PDAs.

However, unlike before, PDAs are strictly more powerful than DPDAs; they are not equivalent like DFAs and NFAs are! So, adding a stack augments the power of NFAs more than the power of DFAs.

Asides:

I'm guessing some of this will get covered later remove if it is

With the memory model, we can use a lot more than just one stack:

08 - Context-Free Grammars

Reese lecture :))

So far, we've used specifications to generate target languages and defined programs that can mechanically recognize strings against a specification. For a given "language class", bijective pairs of specification and program structures exist, i.e. where every specification can be implemented as a program and every program implements a specification.

This pattern will continue: context-free languages are described by context-free grammars and can be recognized by pushdown automata (more on this in this lecture)

Context-free Grammars

Context-Free Grammar

Definition
A context-free grammar (CFG) \(G\) is a \(4\)-tuple \(G:=(V, \Sigma, E,S)\) where \(V\) is a finite set of variables, \(\Sigma\) is a finite set of terminals (disjoint from \(V\)), \(E\) is a finite set of production rules of the form \(A\to w\) for \(A\in V\) and \(w\in(V\cup\Sigma)^{*}\), and \(S\in V\) is the start variable.

A derivation in a CFG \(G\) is performed by starting from the start variable and iteratively substituting variable occurrences using production rules. Each derivation implies a parse tree whose nodes are strings (possibly containing variables) and whose edges are production rules.

A string \(w\) is generated by \(G\) (denoted \(S_G  \overset{*}{\Rightarrow} w\)) if it can be derived in \(G\) and only contains terminals. The language \(\Gamma_G\) of grammar \(G\) is the set of strings generated by \(G\), i.e. \(\set{w\in \Sigma^{*} \mid S \overset{*}{\Rightarrow} w}\); if \(G\) is a context-free grammar, then \(\Gamma_G\) is a context-free language. A language that isn't context-free must be described by a more powerful formalism, e.g. a context-sensitive grammar.

Context-free grammars generalize regular expressions; CFGs are to pushdown automata as regular expressions are to finite automata. The extra power of CFGs comes from self-reference; we are allowed to bind a name to a syntactic construction (rule) and reference it later, which opens the door to recursion and the computational oomph that it entails.

Our regular (lol) question persists: given a language, can we write a grammar for it?

Examples and Applications

Canonical examples of context-free grammars that aren't regular include the language of palindromes (over some alphabet \(\Sigma\)), the language of well-balanced parenthesizations (the Dyck language), and languages like \(\set{a^{n}b^{n}\mid n\in \mathbb{N}}\).

Grammars are also notable because they can be used to describe programming languages (take CMPUT 415 for more on this in practice). Most languages are mostly context-free, although many have context-sensitive quirks.

CFGs also let us create decent models of natural language:

Notation

For larger grammars, it makes sense to write them over multiple lines, and adopt "code" typesetting (as opposed to mathematical typesetting). Backus-Naur form is standard, but I prefer to use ANTLR's syntax for its brevity and applicability. Note that ANTLR4 has some syntactic sugar, like regular expressions as nonterminals.

Example: Grammar for common LISP
Adapted from https://github.com/antlr/grammars-v4/blob/master/lisp/lisp.g4


lisp_ : s_expression+ EOF;

s_expression
    : ATOMIC_SYMBOL
    | '(' s_expression '.' s_expression ')'
    | list
    ;

list : '(' s_expression+ ')';

ATOMIC_SYMBOL: LETTER ATOM_PART?;

ATOM_PART : (LETTER | NUMBER) ATOM_PART?;
LETTER: [a-z];
NUMBER: [1-9];
WS: [ \r\n\t]+ -> skip;

If we must use math notation, we can coalesce rules starting with the same variable by writing them as alternatives, delimited by \(\mid\). So, \(\set{S \to \varepsilon, S \to (S), S\to SS}\) can be written succinctly as \(S\to \varepsilon\mid (S) \mid SS\).

Elementary Results

CFGs are a superset of regular expressions

Theorem
Given a regular expression \(R\) that recognizes language \(\Gamma\), a context-free grammar \(G\) can be written that generates \(\Gamma\).

By corollary, the class of regular languages is a subset of the class of context-free languages.

Closures of CFGs

Theorem
The class of context-free languages is closed under union, concatenation, and the Kleene star.

Proof: for each operator \(\square\), we consider the grammar formed by combining the sets of variables and rules of the grammars for the two languages \(\Gamma\) and \(\Lambda\) (with start variables \(S_\Gamma, S_\Lambda\)); we define \(\Gamma\, \square \, \Lambda\) to have a new start variable \(S\), with rules \(S \to S_\Gamma \mid S_\Lambda\) for union, \(S \to S_\Gamma S_\Lambda\) for concatenation, and \(S \to S S_\Gamma \mid \varepsilon\) for the Kleene star.

Each time, just like with regular expressions, we defer the operations to the corresponding constructs for building grammars.

Whoopsie, made a oopsie

Warning
Context-free languages are not closed under intersection or complement. E.g. \(\set{a^{n}b^{n}c^{m}}\) and \(\set{a^{m}b^{n}c^{n}}\) are both context-free, but their intersection \(\set{a^{n}b^{n}c^{n}}\) is not.

Ambiguity

A grammar \(G\) is ambiguous if the same terminal string can be derived in different ways, i.e. if a given string in the language \(\Gamma_G\) has multiple different parse trees.

Context-free languages can be ambiguous; a context-free language is inherently ambiguous if every grammar that describes it is ambiguous.

Chomsky Normal Form

Canonical form time 😎

A context-free grammar is in Chomsky normal form when every rule is of the form \(A \to BC\), \(A \to a\), or \(S \to \varepsilon\) for variables \(A,B,C\) (where \(B,C \ne S\)) and terminal \(a\).

Chomsky normal form is a normal form

Theorem
Any context-free language can be generated by a context-free grammar in Chomsky normal form.

We prove by construction; the following is a complete list of language-preserving transformations over grammatical rules: add a fresh start variable \(S_0\) with rule \(S_0\to S\); remove \(\varepsilon\)-rules \(A\to\varepsilon\) (patching the rules that mention \(A\)); remove unit rules \(A\to B\); split rules with bodies longer than \(2\) into chains of binary rules; and replace terminals in binary bodies with fresh variables.

We can convert a grammar to Chomsky normal form by repeatedly applying these rules until no replacements can be made; the number of such replacements is finite. So, we are performing fixpoint iteration.

Chomsky normal form is useful because it requires parse trees to be binary. Thus, there is a strong relation between the length of a string and the height and/or number of nodes in its parse tree.
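This is what makes table-driven parsing tractable: the classic CYK algorithm decides membership in \(O(n^{3})\) time over a CNF grammar. A minimal sketch (mine; the \(S\to\varepsilon\) case is handled separately), using the balanced-parens grammar from earlier converted to CNF:

# CYK: table[i][j] holds the variables deriving w[i : i + j + 1].
# binary maps (B, C) -> {A | A -> BC}; unary maps a -> {A | A -> a}.

def cyk(w: str, binary, unary, start: str = "S") -> bool:
    n = len(w)
    if n == 0:
        return False   # handle S -> epsilon as a special case
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        table[i][0] = set(unary.get(a, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for B in table[i][split - 1]:
                    for C in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary.get((B, C), set())
    return start in table[0][n - 1]

# S -> epsilon | (S) | SS in CNF: S -> LR | LP | SS, P -> SR, L -> (, R -> )
binary = {("L", "R"): {"S"}, ("L", "P"): {"S"}, ("S", "S"): {"S"},
          ("S", "R"): {"P"}}
unary = {"(": {"L"}, ")": {"R"}}
print(cyk("(()())", binary, unary))  # True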

09 - Equivalence of PDAs and CFGs

Our central theorem:

PDAs and CFGs are Equivalent

Theorem
A language can be recognized by a pushdown automaton if and only if it can be generated by a context-free grammar.

For this proof, we use the notion of a generalized pushdown automaton (GPDA), which can push entire strings to the stack instead of just characters.

Equivalence of GPDAs and PDAs

Algorithm
GPDAs are equivalent to PDAs. All PDAs are trivially also GPDAs, so we prove by defining a procedure to construct an equivalent PDA given a GPDA: to push a string \(s=s_1\dots s_k\) in one action, introduce \(k-1\) fresh intermediate states that push the symbols one at a time (in reverse order).

We also define the restricted pushdown automaton (RPDA), which is a PDA with a single accepting state, empty stack upon termination, and where every stack action is either a single push or single pop.

Equivalence of RPDAs and PDAs

Theorem
RPDAs are equivalent to PDAs. All RPDAs are trivially also PDAs, so we prove by defining a procedure to construct an equivalent RPDA given a PDA. Specifically, we provide rewrite rules for "undesirable" constructs in a PDA:

Lemma I: CFG \(\implies\) PDA

We can draw an equivalence between the steps in a leftmost derivation performed in \(G\) and a sequence of configurations in some PDA \(M\). Then, we build the string by pushing and popping strings on \(M\)'s stack.

We define a GPDA that traverses through a string and pushes the corresponding rule bodies from variables to the stack. In this way, it performs the leftmost derivation. So, an accepting sequence of configurations exists in \(M\)'s computation of \(w\) if and only if a leftmost derivation of \(w\) exists in \(G\).

We know that GPDAs are equivalent to PDAs, so we can construct a PDA with the same behavior as the above GPDA. So, we've shown that given a CFG, we can construct a PDA that recognizes a string if and only if it can be generated by the CFG.

Lemma II: PDA \(\implies\) CFG

We prove again by construction. Given an RPDA, we can construct a CFG as follows:

review this and add more intuitive explanation!!

First, we claim that if \((q_a, w, \varepsilon) \overset{*}{\to} (q_b, wx, \varepsilon)\) in the RPDA, then \(A_{q_a,q_b}\overset{*}{\Rightarrow} x\) in the CFG; we then claim the converse. Both can be proven by induction; our base case is when both configurations are actually the same (i.e. where \(x=\varepsilon\)).

add proofs!!

10 - Limits of stack augmented computation

Although they are more powerful than FAs, PDAs are not universal computers either; there exist decision problems that no PDA can solve. Since PDAs are equivalent to CFGs, we will reason about CFGs below.

Pumping Lemma for CFGs

Pumping Lemma for CFGs

Theorem
For context-free language \(\Gamma\) described by CFG \(G = (V, \Sigma, R,S)\), there exists a length \(p\) such that any string \(w\in \Gamma\) longer than \(p\) (i.e. \(|w| > p\)) can be partitioned as \(w=uvxyz\) where \(|vy| > 0\), \(|vxy| \leq p\), and \(uv^{i}xy^{i}z\in\Gamma\) for all \(i\geq 0\).

I.e. for sufficiently long strings in \(\Gamma\), there is always a pair of substrings that can be repeated (in lockstep) arbitrarily.

Since \(G\) has a finite number of variables and each substitution increases the size of a generated string by a finite amount, there is an upper bound (here, \(p\)) on how long a string can get when each rule substitutes a given variable at most once. Thus, if a string is longer than this \(p\), there must be variable \(A\in V\) that gets substituted more than once over the course of a derivation, i.e. a rule \(R\ni \mathcal{R} : A \to \dots\) is applied more than once.

Proof

We consider the possible height of a parse tree for a string in language \(\Gamma(G)\) for CFG \(G\).

In this context, the branching factor \(b\) of the parse tree is the maximum length of a rule body in \(G\); a tree of height \(h\) can have at most \(b^{h}\) leaves. Conversely, a string of length \(\ell\) needs at least \(\ell\) leaves, and thus a tree of height at least \(\log_b{\ell}\), to generate.

If no variable repeats along any root-to-leaf path, each path contains at most \(|V|\) variables, so the tree has height at most \(|V|\) and can generate strings of length at most \(b^{|V|}\). Thus, if \(\ell \geq b^{|V|}+1\), any parse tree for the string has height at least \(|V|+1\), and some path \(S \to \text{leaf}\) exists along which some variable \(A\) repeats.

Pick \(p:=b^{|V| + 1}\) and string \(w\in \Gamma(G)\) of length \(\ell\) where \(\ell \geq p \implies \ell \geq b^{|V|+1} \geq b^{|V|}+1\). The smallest (by number of nodes) parse tree for \(w\) must have height at least \(|V|+1\), and thus some path \(T\) of length \(\geq |V|+1\). So, by pigeonhole principle, some variable \(A\) exists that repeats in the path; denote the instances \(A^1\) and \(A^2\).

We can partition \(w\) into \(uvxyz\) where \(A^{1}\) produces \(vxy\) and \(A^{2}\) produces \(x\).

Proving a Language Isn't Context-free

The strategy for proving a language isn't context-free is more or less the same as proving it isn't regular using the pumping lemma: we pick a string \(w\in\Gamma\) with \(|w| > p\), show that every partition \(w=uvxyz\) satisfying the lemma's conditions pumps out of \(\Gamma\) for some \(i\), and conclude by contrapositive that \(\Gamma\) isn't context-free.

E.g. it can ergonomically be proven that \(\Gamma:=\set{ww \mid w\in \set{0,1}^* }\) is not context-free by choosing string \(w:=0^{p}1^{p}0^{p}1^{p}\) (where \(p:=p_G\)).

Proving via closure under operations, while strictly possible, isn't as useful for context-free languages as it is for regular languages because context-free languages aren't closed under intersection. The most useful closure arguments involve intersections with regular languages.

11 - Turing Machines

q_cross and q_checkmark for accept/reject, better underline syntax, pick suitable font for left/right

Models of Computation

Earlier, we generalized from FAs to PDAs by equipping them with a stack. Now, we can further generalize PDAs to a Turing machine by replacing the stack with a more general data structure, i.e. one that can simulate a stack. Specifically, TMs have a tape: an infinite, finitely addressable tract of memory cells that an automaton can read from, write to, and move along in either direction.

Just like for PDAs, although their memory is unbounded, Turing machines allow infinite decision problems to be defined finitely. In fact, Turing machines are the strongest such automata:

Church-Turing Thesis (Lite)

Theorem
A Turing machine can simulate any algorithm, mechanical procedure, mechanical derivation, and program.

Coding an algorithm as a Turing machine is the theoretical CS equivalent of writing a computer program in assembly (or by pushing transistors around). In both cases, you are reaching into the entrails of a machine and building your program bit by bit by bit. Designing Turing machines is hard for the same reason that writing assembly is hard.

We also notice a pattern while designing Turing machines: most programs consists of a search/seek operation, then a write operation, then search/seek, then write, etc.

Turing Machines

Turing Machine (TM)

Definition
A Turing Machine \(M\) is a \(7\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, q_\text{accept}, q_\text{reject})\) with a finite set of states \(Q\), an input alphabet \(\Sigma\) (not containing the blank symbol \(\sqcup\)), a tape alphabet \(\Gamma \supseteq \Sigma\cup\set{\sqcup}\), a transition function \(\delta : Q\times\Gamma\to Q\times\Gamma\times\set{\mathsf{L},\mathsf{R}}\), an initial state \(q_0\in Q\), and accept/reject states \(q_\text{accept}, q_\text{reject}\in Q\).

The state of a TM at a moment in time (a configuration) can be described as \(aqb\) where \(a,b\in \Gamma^{*}\) are the strings that list the characters on either side of the read/write head and \(q\in Q\) is the current state. A computation is a "physically valid" sequence of configurations.

Turing machine \(M\) accepts string \(w\) if \(M\) can reach configuration \(c_k=uq_\text{accept}v\), for \(k\in \mathbb{N}\) (i.e. \(k < \infty\)) and strings \(u,v\in \Gamma^{*}\). \(M\) rejects if it can reach a configuration \(c_k=uq_\text{reject}v\). If neither of these happen, the computation does not halt on \(w\).
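A sketch (mine) of a single step in the \(aqb\) notation, with \(\delta\) as a dict and "_" standing in for the blank symbol:

BLANK = "_"

def tm_step(delta, a: str, q: str, b: str):
    # The head reads the first symbol of b (blank if b is exhausted).
    head = b[0] if b else BLANK
    q2, write, move = delta[(q, head)]
    rest = b[1:]
    if move == "R":
        return a + write, q2, rest
    # move == "L": the head lands on the last symbol of a; at the left end
    # of the tape it stays put, per the usual convention.
    if not a:
        return "", q2, write + rest
    return a[:-1], q2, a[-1] + write + rest

Iterating `tm_step` from configuration \((\varepsilon, q_0, w)\) until \(q\in\set{q_\text{accept}, q_\text{reject}}\) is exactly simulation; the loop may, of course, never exit.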

Example: \(\Lambda:=\set{w\# w\mid w\in \set{0,1}^{*}}\)

Recognizing and Deciding

Again, for Turing machine \(M\), we define \(\mathcal{L}(M)\) as the language of strings accepted by \(M\). So, \(\overline{\mathcal{L}(M)}\) is the language of strings that are either rejected by \(M\) or for which \(M\) doesn't halt.

Turing machine \(M\) decides \(\mathcal{L}(M)\) iff \(M\) halts for all possible input, i.e. for all \(w\in \Sigma^{*}\).

Language \(\Lambda\) is Turing-recognizable if \(\Lambda=\mathcal{L}(M)\) for some Turing machine \(M\). \(\Lambda\) is Turing-decidable if it is recognized by a Turing machine \(M\) that halts on every \(w\in \Sigma^{*}\supseteq \Lambda\).

Non-deterministic Turing Machines

As with every model of computation we've seen so far, non-determinism is achieved by generalizing the transition structure \(\delta\) from a function to a relation. Indeed, a non-deterministic Turing machine is defined the same way:

Non-deterministic Turing Machine

Definition
A Non-deterministic Turing machine is similar to a Turing Machine, but where the transition function \(\delta\) is replaced with a transition relation \(\delta \subseteq Q \setminus \set{q_\text{accept},q_\text{reject}} \times \Gamma\times Q\times \Gamma\times\set{\mathsf{L},\mathsf{R}}\), i.e. a function \(\delta : Q \setminus \set{q_\text{accept},q_\text{reject}} \times \Gamma\to \mathcal{P}(Q\times \Gamma\times\set{\mathsf{L},\mathsf{R}})\)

Again, like before, computations for nondeterministic Turing machines take the form of trees.

12 - Equivalence of Turing machine models

lots of this wording kinda sucks/isn't concise

The "basic" Turing machine described earlier can simulate any extended Turing machine below. So, when we reason about basic Turing machines, we can augment them with any of these extensions without weakening our statement:

Moreover, we can further restrict the basic model without giving up computational power

Extensions and Their Models

Adding a \(\text{STAY}\) Action

Every time we see a stay action, we simulate it with a left \(\mathsf{L}\) movement, then a right \(\mathsf{R}\) movement. The initial left movement writes the new symbol to the initial cell; the right movement rewrites whatever it reads, leaving the neighboring cell unchanged.

Adding Insert/Delete

At any given point, we have passed some finite number \(n\) of configurations; our input string \(w\) has some finite length \(|w|=m\) as well. So, at every point in a computation, there is some \(M \leq n + m\) such that every symbol past address \(M\) on the tape is a blank.

To insert, we loop from our current position. Each iteration, we keep track of the current symbol and overwrite it with the previous one, moving back then forward to keep track of the symbols. Eventually, we reach \(M\); we loop back to our original position without changing anything.

To delete, we follow a similar process, but we replace the current symbol with the following one. This also requires taking two steps forward and one step back each iteration.

Adding \(k\) Different Tapes

There are a few ways we could go about this.

Idea \(1\): we define some delimiter symbol \(\#\) to separate the different tape models on our single tape. We also define a symbol \(\nabla\) that denotes the current position of our simulated read/write head. Each time we want to make a change to a tape, we scan to the corresponding segment, find its \(\nabla\) marker, apply the simulated action there, and (if the segment needs to grow) shift everything to its right using the insert procedure.

Idea \(2\): Define tape \(1\) as the cells of the tape equal to \(1\) \(\mod k\), tape \(2\) as the cells equal to \(2\) \(\mod k\), etc. Then, for a given tape, travel to the correct offset and navigate left and right by moving \(k\) spaces.

Strictly speaking, idea \(1\) is stronger because it can simulate an unbounded/arbitrary/changing number of tapes, whereas idea \(2\) can only support a fixed \(k\).

Nondeterministic Turing Machine

As we have seen, non-deterministic Turing machines have configuration trees. Briefly put, we can traverse this configuration tree with breadth-first search. Although this tree may have infinite depth, its branching factor is finite, so BFS will reach every possible configuration eventually.

We can simulate a non-deterministic Turing machine \(\tilde{M}\) with \(3\) tapes:

  1. A copy of the input tape (never gets mutated; along for the ride)
  2. A copy of \(\tilde{M}\)'s tape in the current branch
  3. An encoding of the current location in \(\tilde{M}\)'s tree

Each loop, we increment the location in the tree according to BFS. Then, we simulate \(\tilde{M}\) from the point where we are currently located. If our simulation accepts, we return \(\text{accept}\); if our simulation rejects, we move on to the next location in the tree.
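The same loop works for any of our nondeterministic models. A sketch (mine), abstracting the machine behind hypothetical `successors` and `accepting` callables; note that it recognizes rather than decides, since an infinite tree with no accepting configuration keeps it searching forever:

from collections import deque

def nd_accepts(start, successors, accepting) -> bool:
    # BFS over the configuration tree: visits every configuration at depth
    # d before any at depth d + 1, so any accepting branch is found.
    queue, seen = deque([start]), {start}
    while queue:
        config = queue.popleft()
        if accepting(config):
            return True
        for nxt in successors(config):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False   # reached only if the configuration tree is finite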

13 - Models of Computation

We've seen that a basic Turing machine can simulate any program with access to infinite, finitely addressable external storage, which is a reasonable abstraction for familiar kinds of computing systems. We've also seen that a basic Turing machine can simulate any algorithm, i.e. any mechanical procedure.

Now, we want to know:

This will help us understand the corresponding limits of Turing machines via the equivalence established by the Church-Turing thesis.

Levels of Abstraction

Describing Turing machines is the theoretical CS equivalent of writing machine code from scratch; it is the lowest possible level to describe a procedure. But, just like programmers (generally) choose not to express their ideas in assembly, we may opt to use higher-level descriptions of algorithms instead, as allowed by the Church-Turing thesis.

We assume that all inputs are strings and that any sort of object we might wish to use as input (e.g. a number, another program, etc) can be described as a string. Given object \(O\), the string description of \(O\) is denoted \(\langle O \rangle\).

Concentric Models of Computation

As we've seen each new family of models of computation (first finite automata, then pushdown automata, then Turing machines), we've noticed that each new model can simulate the previous one. This leads to concentric sets of corresponding languages; a widely-accepted (sub)-taxonomy is known as the Chomsky Hierarchy.

better disambiguation about chomsky hierarchy

14 - Decidable Problems

Recall: a problem (language) is decidable iff an algorithm exists that halts on every input, accepting exactly the strings in the language.

Meta-Language Classes

Earlier, we constructed languages as a structure to collect and reason about sets of strings (sometimes with respect to a global alphabet \(\Sigma\)). A striking feature of theoretical CS is how the same structural patterns are repurposed over and over: equivalence, abstraction, reduction. So, reasoning about language classes and meta language classes is a natural progression.

A language class (or model of computation) \(\mathcal{M}\) is a set of languages. Language classes are an effective way to refer to all the languages recognized by a given type of structure.

A meta language class (with respect to \(\mathcal{M}\)) is a language class of descriptions of languages that have some property with respect to \(\mathcal{M}\).

| Symbol | Name | Set construction |
| --- | --- | --- |
| \(\text{A}_\mathcal{M}\) | Acceptance problem for \(\mathcal{M}\) | \(\set{\langle {M},w \rangle\mid M\in \mathcal{M} \text{ accepts or generates }w}\) |
| \(\text{E}_\mathcal{M}\) | Emptiness testing problem for \(\mathcal{M}\) | \(\set{\langle {M}\rangle\mid M\in \mathcal{M} \text{ has } L(M) = \varnothing}\) |
| \(\text{EQ}_\mathcal{M}\) | Equivalence testing problem for \(\mathcal{M}\) | \(\set{\langle {A},B \rangle\mid A,B\in \mathcal{M} \text{ such that } L(A)=L(B)}\) |
| \(\text{ALL}_\mathcal{M}\) | All string testing problem for \(\mathcal{M}\) | \(\set{\langle {M}\rangle\mid M\in \mathcal{M} \text{ has } L(M) = \Sigma^*}\) |

Decidability of Language Classes

We cover decidability results for language classes pertaining to structures we've already seen. Below, we assume \(M\) is an arbitrary "member" of the language class. We also assume that we've checked that \(\langle M \rangle\) is well-formed; trivially, we can use a TM to check this before the principal computation.

Theorems: Finite Automata and Regular Expressions

\(\text{A}_\text{DFA}\) is decidable

Theorem
The acceptance problem for DFAs is decidable, i.e. given a (description of a) DFA and a string, we can decide whether it accepts or rejects the string.

Since DFAs must halt (i.e. when they have consumed all their input), we can simply use a TM to simulate \(M\) on \(w\) given a description \(\langle M,w \rangle\). If this simulation ends on an accept state, the TM accepts. Otherwise, it rejects.

We simulate by building a model of \(M\) from \(\langle M \rangle\) in our TM, then keeping track of \(M\)'s current state and the current input position in \(w\).

\(\text{A}_\text{NFA}\) is decidable

Theorem
The acceptance problem for NFAs is decidable, i.e. given a (description of a) NFA and a string, we can decide whether it accepts or rejects the string.

We construct a TM that converts the NFA to a DFA (\(O(2^{n})\)), then inherit from the fact that \(\text{A}_\text{DFA}\) is decidable.

\(\text{A}_\text{RE}\) is decidable

Theorem
The acceptance problem for regular expressions is decidable, i.e. given a regular expression and a string, we can decide whether it matches the string.

Again, we construct a TM that converts the regular expression to an equivalent DFA, then inherit from the fact that \(\text{A}_\text{DFA}\) is decidable.

\(\text{E}_\text{DFA}\) is decidable

Theorem
The emptiness problem for DFAs is decidable, i.e. given a (description of a) DFA, we can decide whether or not the language it describes is \(\varnothing\).

Essentially, we want to search all the possible paths from the initial state \(q_0\) and see if any end on accepting states. This is done by marking \(q_0\), then iteratively marking every state that can be reached from \(q_0\) by consuming input until a fixpoint is reached. If (at some point) we reach an accepting state, we reject. Otherwise, if we reach a fixpoint without accepting states, we accept.
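A sketch (mine) of that marking fixpoint as plain graph reachability:

# L(M) is empty iff no accepting state is reachable from q0.

def dfa_language_empty(delta, q0, F: set) -> bool:
    marked, frontier = {q0}, [q0]
    while frontier:
        q = frontier.pop()
        for (state, _symbol), succ in delta.items():
            if state == q and succ not in marked:
                marked.add(succ)       # succ is reachable by consuming input
                frontier.append(succ)
    return not (marked & F)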

\(\text{EQ}_\text{DFA}\) is decidable

Theorem
The equivalence problem for DFAs is decidable, i.e. given a (descriptions of) two DFAs, we can decide whether or not they describe the same language.

Given \(\langle A,B \rangle\), we construct new DFA \(C\) that decides \(L(A)\,\triangle\, L(B)\), where \(\triangle\) is the symmetric difference operator (set-theoretic XOR) defined by \(S \,\triangle\, T := (S\cup T) \setminus (S\cap T)\); such a \(C\) must exist because regular languages are closed under union, intersection, and complement. Then, since \(\text{E}_\text{DFA}\) is decidable, we decide whether \(L(C)=\varnothing\). Since \(L(C)=\varnothing\iff L(A)=L(B)\) by construction, we have decided whether \(L(A)=L(B)\).

Theorems: Pushdown Automata and Context-free Grammars

Bound on derivation size for a CFGs in CNF

Lemma
Let \(\tilde{G}\) be a context-free grammar in CNF. Then, for any \(w\in L(\tilde{G})\) where \(|w| =:n > 0\), exactly \(2n-1\) substitutions are required to derive \(w\) using \(\tilde{G}\).

\(\text{A}_\text{CFG}\) is decidable

Theorem
The acceptance problem for CFGs is decidable, i.e. given a CFG and a string, we can decide whether the string can be derived in the CFG.

Note that we can't just try every derivation, since there could be infinitely many. If that were the case, our procedure wouldn't halt for \(w\not\in L(G)\).

Instead, we reconstruct \(G\) from its description \(\langle G \rangle\), then convert it into its Chomsky normal form (denoted \(\tilde{G}\)). From the bound introduced above, we know our derivation has exactly \(2n-1\) steps, so we essentially search a tree of bounded width (since the CFG is finite) and bounded height \(2n-1\). If we find a path that generates our input string, we accept; otherwise, we reject.

\(\text{A}_\text{PDA}\) is decidable

Theorem
The acceptance problem for PDAs is decidable, i.e. given a description of a PDA and a string, we can decide whether it accepts or rejects the string.

Again, we can't just simulate the PDA directly, since PDAs don't always terminate. Instead, we construct the PDA's corresponding CFG, then inherit from the fact that \(\text{A}_\text{CFG}\) is decidable.

\(\text{E}_\text{CFG}\) is decidable

Theorem
The emptiness problem for CFGs is decidable, i.e. given a CFG, we can decide whether or not the language it describes is \(\varnothing\).

Again, we can't just search through strings, since we wouldn't halt if the language is indeed empty.

Instead, we proceed similarly to how we decided \(\text{E}_\text{DFA}\). We mark all of the terminals, then iteratively mark any variable having a rule whose body consists entirely of marked symbols, repeating until we reach a fixpoint. If the start variable \(S\) has been marked, we reject; otherwise, we accept.
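A sketch (mine) of this fixpoint iteration, with rules given as (head, body) pairs:

# A symbol is "generating" if it's a terminal or the head of a rule whose
# whole body is generating; L(G) is nonempty iff S is generating.

def cfg_language_empty(rules, terminals: set, start: str = "S") -> bool:
    marked = set(terminals)
    changed = True
    while changed:                      # fixpoint iteration
        changed = False
        for head, body in rules:        # body: list of symbols
            if head not in marked and all(s in marked for s in body):
                marked.add(head)
                changed = True
    return start not in marked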

Note: We can't use our union/intersection/complement proof strategy to determine if \(\text{EQ}_\text{CFG}\) is decidable, since context-free languages aren't generally closed under intersection and complement. In fact, \(\text{EQ}_\text{CFG}\) is unrecognizable.

15 - Undecidable and Unrecognizable Languages

So far, we've proven that various meta language classes are decidable. We will now see examples and proofs of languages that aren't decidable.

Undecidability of Language Classes

Theorems: Turing Machines

\(\text{A}_\text{TM}\) is not decidable (i.e. is undecidable)

Theorem
The acceptance problem for Turing machines is undecidable, i.e. given a description of a Turing machine and a string, it is not generally possible to decide whether the Turing machine accepts or rejects the string.

Proof: assume towards a contradiction that \(\text{A}_\text{TM}\) is decidable. Then, there exists TM \(D\) that decides whether \(\langle M,w \rangle\in \text{A}_\text{TM}\). Define TM \(\text{Evil} : \langle M \rangle\mapsto \lnot D(\langle M, \langle M \rangle \rangle)\), which returns the opposite of the output of \(D\) when run with some \(\langle M, \langle M \rangle \rangle\) as input. Consider \(\text{Evil}(\langle \text{Evil} \rangle)\), which reduces to \(\lnot D(\langle \text{Evil}, \langle \text{Evil} \rangle \rangle)\). If \(D\) accepts \(\langle \text{Evil}, \langle \text{Evil} \rangle \rangle\), then by definition of \(D\), \(\text{Evil}\) accepts \(\langle \text{Evil} \rangle\); but \(\text{Evil}(\langle \text{Evil} \rangle)\) returns the negation of \(D\)'s output, so it rejects. The case where \(D\) rejects is symmetric; either way, we have a contradiction.

Thus, by contradiction, no such \(D\) can exist. Thus, \(\text{A}_\text{TM}\) is undecidable.

We can view this proof as a diagonalization argument. We enumerate all (countably many) machine descriptions \(\langle M_1 \rangle, \langle M_2 \rangle, \dots, \langle\text{Evil} \rangle, \dots\) as the columns, and machines \(M_1, M_2, \dots, \text{Evil}, \dots\) as the rows. Filling in the table, the row corresponding to \(\text{Evil}\) has entries \(\lnot D_{k,1}, \lnot D_{k,2}, \dots\), so we are unable to consistently fill in the diagonal entry \([\text{Evil}, \langle \text{Evil} \rangle]\).

\(\text{A}_\text{TM}\) is recognizable

Theorem
The acceptance problem for Turing machines can be recognized, i.e. given a description of a Turing machine and a string, it is possible to describe an algorithm that accepts the string if and only if the TM accepts the string.

I.e. it is possible to construct a universal Turing machine.

Proof: trivially, given the description \(\langle M \rangle\), we construct a TM that simulates \(M\) on input \(w\). If this simulation hits an accept state we accept, and if it hits a reject state, we reject. By the definition of "recognizability", we don't need to worry about the case where the simulation of \(M\) doesn't terminate.

We say Turing machines are universal.

Co-recognizability

A language \(L\) is co-recognizable if its complement \(\overline{L}\) is recognizable.

Decidable iff recognizable and co-recognizable

Theorem
Language \(L\) is decidable if and only if \(L\) is recognizable and co-recognizable

As a corollary:

\(\overline{\text{A}_\text{TM}}\) is not recognizable (i.e. \(\text{A}_\text{TM}\) is not co-recognizable)

Theorem
It is not possible to recognize whether a TM does not accept a given string.

Proving Undecidability Using Reduction

If language \(\Lambda\) is known to be undecidable, and a decider for language \(\Gamma\) could be adapted into a decider for \(\Lambda\), then we can conclude (by contradiction) that \(\Gamma\) must also be undecidable. This is a proof by reduction: we reduce \(\Lambda\) to \(\Gamma\).

We often reduce from the halting problem, since it is the canonical undecidable problem.

Theorems: Turing Machines

We define the halting problem over a class \(\mathcal{M}\) of machine descriptions as the problem \(\text{HALT}_{\mathcal{M}} :=\set{\langle M,w \rangle \mid M\in \mathcal{M} \text{ and } M \text{ halts on } w}\).

\(\text{HALT}_\text{TM}\) is undecidable

Theorem
It is not possible to decide the halting problem for Turing machines, i.e. given a description of a Turing machine and an input, it is not possible to decide whether the TM halts on the input.

Proof: we reduce \(\text{A}_\text{TM}\), which we know is undecidable, to \(\text{HALT}_\text{TM}\). Assume towards a contradiction that Turing machine \(H\) decides the halting problem. Then, we construct \(D\) that takes input \(\langle M,w \rangle\) as follows:

  1. Run \(H\) on \(\langle M,w \rangle\). If \(H\) rejects (i.e. \(M\) loops on \(w\)), reject: \(M\) certainly doesn't accept \(w\).
  2. If \(H\) accepts, simulate \(M\) on \(w\) (the simulation is guaranteed to terminate) and output its verdict.

We have just constructed a machine that decides \(\text{A}_\text{TM}\), a contradiction. Thus, \(H\) cannot exist as described. Thus, the halting problem is undecidable.
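In code, the adaptation is only a few lines. A sketch, assuming a hypothetical halting decider `H` and an assumed `simulate` helper that returns \(M\)'s verdict whenever \(M\) halts:

```python
# Sketch: any halting decider H(M, w) would yield a decider for A_TM.
def make_A_TM_decider(H, simulate):
    def decide(M, w):
        if not H(M, w):        # M loops on w, so it certainly doesn't accept
            return False
        return simulate(M, w)  # safe: the simulation is guaranteed to halt
    return decide
```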

\(\text{E}_\text{TM}\) is undecidable

Theorem
The emptiness problem for Turing machines is undecidable, i.e. we cannot generally decide whether the language described by a Turing machine is empty.

Proof: we reduce \(\text{A}_\text{TM}\) to \(\text{E}_\text{TM}\). Towards a contradiction, let \(E\) be a decider for \(\text{E}_\text{TM}\), which takes a single machine description as input. Construct TM \(D\) taking input \(\langle M,w \rangle\) that:

  1. Builds (the description of) a machine \(M_w\) that, on any input, simulates \(M\) on \(w\) and accepts iff \(M\) accepts. Note that \(L(M_w)=\varnothing\) iff \(M\) does not accept \(w\).
  2. Runs \(E\) on \(\langle M_w \rangle\) and outputs the opposite answer.

This decides \(\text{A}_\text{TM}\), a contradiction. So, \(\text{E}_\text{TM}\) is undecidable.

\(\text{EQ}_\text{TM}\) is undecidable

Theorem
The equality problem for Turing machines is undecidable, i.e. we cannot generally decide whether two Turing machines describe the same language.

Proof: we reduce \(\text{E}_\text{TM}\) to \(\text{EQ}_\text{TM}\). Given a decider \(D_\text{EQ}\) for \(\text{EQ}_\text{TM}\), we construct a decider for \(\text{E}_\text{TM}\) by first constructing a null Turing machine \(N\) that rejects all input (i.e. such that \(L(N)=\varnothing\)). Then, we use \(D_\text{EQ}\) to test equality between our input machine \(M\) and \(N\) (i.e. we give \(D_\text{EQ}\) the input \(\langle M,N \rangle\)). Clearly, we have built a decider for \(\text{E}_\text{TM}\), which we know to be undecidable. So, \(\text{EQ}_\text{TM}\) is undecidable by reduction.

Rice's Theorem

Notice how it seems like any meta language of Turing machine descriptions falls apart under self-reference. This isn't a coincidence, but a general, provable fact about semantics.

A language \(P\) is a nontrivial language over Turing machine descriptions if the following apply:

  1. \(P\) is semantic: membership depends only on the machine's language, i.e. if \(L(M_1)=L(M_2)\), then \(\langle M_1 \rangle\in P \iff \langle M_2 \rangle\in P\).
  2. \(P\) is nontrivial: there exist machines with \(\langle M \rangle\in P\) and machines with \(\langle M \rangle\not\in P\).

Rice's Theorem

Theorem
Any nontrivial language over Turing machine descriptions is undecidable.

Proof

Proof: Towards a contradiction, let \(D_P\) be a decider for \(P\). Assume WLOG that machines with empty language are not in \(P\) (otherwise, run the same argument on \(\overline{P}\), which is decidable iff \(P\) is). By nontriviality, there exists a Turing machine \(I\) with \(\langle I \rangle\in P\), and so \(L(I)\ne \varnothing\). We construct a decider \(D_\text{A}\) for \(\text{A}_\text{TM}\) that takes input \(\langle M,w \rangle\) as follows:

  1. Build (the description of) a machine \(M'\) that, on input \(x\), first simulates \(M\) on \(w\); if that simulation accepts, it then simulates \(I\) on \(x\) and accepts iff \(I\) does.
  2. Run \(D_P\) on \(\langle M' \rangle\) and output its answer.

We see that:

- If \(M\) accepts \(w\), then \(L(M')=L(I)\), so \(\langle M' \rangle\in P\) (since \(P\) is semantic).
- If \(M\) does not accept \(w\), then \(L(M')=\varnothing\), so \(\langle M' \rangle\not\in P\).

So, \(D_\text{A}\) decides \(\text{A}_\text{TM}\), a contradiction of the fact that \(\text{A}_\text{TM}\) is not decidable. Thus, \(D_P\) cannot exist as described, and thus \(P\) must be undecidable. Finally, by generality, any nontrivial language over Turing machine descriptions is undecidable.

Interpretation

Rice's theorem shows that every nontrivial semantic property of a program (i.e. a property of the language it recognizes) is undecidable. This generalizes many of the undecidability results we've seen, since meta-languages like \(\text{E}_\text{TM}\) and \(\text{EQ}_\text{TM}\) are defined by nontrivial semantic properties. In doing so, Rice's theorem establishes a boundary between the syntax and semantics of a program: syntactic properties are decidable, while (nontrivial) semantic ones aren't.

Rice's theorem Wikipedia page

16 - Recursive Functions

What if we did Turing machines, but worse?

Our models of computation so far have been based on Turing machines and strings; this was Turing's innovation that cleaned everything up. But, we will also explore the earlier model based on the naturals \(\mathbb{N}\) and recursive functions instead.

Primitive Recursive Functions

A function is computable if there exists an algorithm that calculates its output given an input; in this sense, every algorithm computes a computable function. But, since there are countably infinitely many algorithms and uncountably many functions \(f : \mathbb{N}\to \mathbb{N}\), there exist uncomputable functions.

Every computable function \(f : \mathbb{N}^{d}\to \mathbb{N}\) can be built from the following atomic operations (together with the recursion schemes introduced below):

| Name | Definition |
| --- | --- |
| Zero function | \(0(x):=0\) for all \(x\) |
| Successor function | \(S(x) := x+1\) for all \(x\) |
| Identity function | \(\text{Id}(x):=x\) for all \(x\) |
| Projection/indexing | \(\pi_i(x_1, \dots, x_n):=x_i\) for all \(\vec{x}\) |
| Composition | \((f\circ g)(x):=f(g(x))\) for all \(x\) |

A primitive recursive function \(h\) is any function \(h : \mathbb{N}^{a+1}\to \mathbb{N}\) that can be described with a base case of the form \(h(x,0) :=f(x)\) and a recursive case of the form \(h(x,S(y)) := g(x,y,h(x,y))\) for some (already primitive recursive) functions \(f,g\).

Primitive recursion is equivalent in power to a programming language with (terminating) for loops, which are sometimes easier to work with. We see the \(\implies\) direction by implementing the recursion above imperatively (assuming \(f\) and \(g\) are already implemented):

```python
# h(x, 0) = f(x); h(x, y + 1) = g(x, y, h(x, y)), computed with a for loop
def h(x, i):
    z = f(x)            # base case h(x, 0)
    for y in range(i):  # exactly i iterations, so it always terminates
        z = g(x, y, z)  # invariant: z = h(x, y + 1) after this line
    return z
```

Limits of Primitive Recursion

The fact that we only use (terminating) for loops should raise a red flag. We've learned that any universal model of computation must have non-halting programs. So, by contrapositive, since any for loop program (and, by equivalence, any primitive recursive function) must terminate, primitive recursion cannot be a universal model of computation. No for loop program can simulate all for loop programs.

Symbolically, if we have universal program \(U\) such that \(U(f,x)=f(x)\) for all \(x\in \mathbb{N}\) and primitive-recursive \(f : \mathbb{N}\to \mathbb{N}\), then \(U\) cannot be primitive-recursive.

Non-primitive Recursion

The Ackermann Function

Are all computable, halting functions primitive recursive? Turns out, nope!

We define the Ackermann function(s) as \(A_1(x,y)=x+y=x+1+\dots + 1\), \(A_2(x,y)=x\times y=x+ \dots + x\), \(A_3(x,y)=x^{y}=x\times \dots\times x\), etc. More generally, each level iterates the one below it: \(A_{n+1}(x,y+1)=A_n(x, A_{n+1}(x,y))\), with a suitable base case \(A_{n+1}(x,0)\) at each level (e.g. \(A_2(x,0)=0\) and \(A_3(x,0)=1\)).

Clearly, each \(A_n\) is primitive recursive for constant \(n\); we just defined it (primitively) recursively. But, this is not the case if we let \(n\) be an argument.
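A direct implementation makes the non-primitive structure visible: the recursion on \(n\) nests inside the recursion on \(y\). A sketch (the base cases for \(n\geq 2\) are one standard choice, an assumption on my part):

```python
# The Ackermann hierarchy, implemented directly.
def A(n, x, y):
    if n == 1:
        return x + y                     # level 1 is addition
    if y == 0:
        return 0 if n == 2 else 1        # x * 0 = 0; x^0 = 1; etc.
    return A(n - 1, x, A(n, x, y - 1))   # level n iterates level n - 1

print(A(2, 3, 4))  # 12 = 3 * 4
print(A(3, 3, 4))  # 81 = 3^4
print(A(4, 3, 2))  # 27 = 3^3 (tetration)
```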

Theorem

Theorem
The function \(f :\mathbb{N}^{2}\to \mathbb{N}\) given by \(f(x,y):=A_y(x,y)\) is not primitive recursive, even though it halts and is computable for all inputs.

Partial/\(\mu\) Recursion

As we remove constraints, the expressivity of our model increases.

We define a new constructor \(\mu f(x):=\min\set{y\mid f(x,y)=0}\) for a given function \(f(x,y)\). This may be a partial function, because an algorithm searching for this minimum may not halt, i.e. when no \(y\) with \(f(x,y)=0\) exists. A partial recursive function is any function built from the atomic operations using composition, primitive recursion, and the \(\mu\) operator.

Clearly, we can't implement \(\mu\) with for loops, but we can implement it with a while loop, namely one with the condition \(f(x,y)\ne 0\), as sketched below.
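A minimal sketch of the \(\mu\) operator as a while loop; note the call may never return:

```python
# Unbounded search for the least y with f(x, y) = 0.
# If no such y exists, the while loop runs forever: mu is partial.
def mu(f, x):
    y = 0
    while f(x, y) != 0:
        y += 1
    return y

# e.g. ceil(sqrt(x)): the least y with x - y*y <= 0
print(mu(lambda x, y: max(0, x - y * y), 10))  # 4, since 4*4 = 16 >= 10
```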

Hierarchy of Recursion

We observe the following hierarchy of strict inclusions for types of recursive functions:

\(\text{primitive recursive}\subsetneq \text{total recursive}\subsetneq \text{partial recursive}\)

Importantly, partial recursion is computationally equivalent to a Turing machine. By corollary, all algorithms can be expressed as while loop programs.

| Function | Loop type | Automaton | Language | Description |
| --- | --- | --- | --- | --- |
| Primitive recursive | For loops | Finite automaton | Regular language | Regular expression |
| Total recursive | Halting while loops | Pushdown automaton | Context-free language | Context-free grammar |
| Partial recursive | While loops | Turing machine | Turing-recognizable / recursively enumerable | Turing machine description |

Universal Partial Recursion and Gödel Numbering

It seems feasible that partial recursive functions could be universal. However, we first need to arithmetize the syntax of functions (i.e. describe a way to encode functions as numbers) so that we can pass them to the universal function as input. This is much harder to do with natural numbers than with strings.

We define the Gödel numbering function \(g : (\mathbb{N}\to \mathbb{N})\to \mathbb{N}\) that maps a function \(f\) to its Gödel number \(\bar{f}\). These numbers are encoded as powers of prime numbers, i.e. along the bijection between \(\mathbb{N}\) and finite multisets given by prime factorization. \(\bar{f}\) is built recursively: each atomic function gets a fixed code, and each application of composition, primitive recursion, or \(\mu\) is encoded as a product of prime powers of the codes of its parts.
I.e. the encoding recursively follows the structures of primitive and partial recursion. Since prime factorizations are unique (fundamental theorem of arithmetic), we can always recover the original structure.
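The core trick is easy to sketch: any finite sequence of naturals can be packed into (and unpacked from) a single natural via prime powers. A toy version:

```python
# Toy sketch of the prime-power trick behind Goedel numbering: pack a finite
# sequence of naturals into one natural, recoverable by unique factorization.
def primes():
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def encode(seq):
    n = 1
    for p, a in zip(primes(), seq):
        n *= p ** (a + 1)  # +1 so that zeros in the sequence leave a trace
    return n

def decode(n):
    seq = []
    for p in primes():
        if n == 1:
            return seq
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        seq.append(a - 1)

print(encode([2, 0, 1]))  # 2^3 * 3^1 * 5^2 = 600
print(decode(600))        # [2, 0, 1]
```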

We claim the following: there exists a universal partial recursive function \(U\) such that \(U(\bar{f}, x)=f(x)\) for every partial recursive \(f\) and all \(x\).

==What is a configuration of \(f\) on \(x\)? Also, proofs?==

Kleene Normal Form Theorem

Theorem
There exist primitive recursive functions \(f,g\) such that every partial recursive function \(h\) can be expressed as \(h(x)=g(x,\min\set{y\mid f(p,x,y)=0})\) for some \(p\in \mathbb{N}\).

By corollary, any while loop program can be expressed as a single while loop.

It turns out that for and while loops together form a Turing-complete model of computation (1936). So, all programs can be written with just for and while loops (in fact, just with while loops, since a while loop can simulate a for loop).

17 - Undecidability and Incompleteness

Logical Systems

Mathematical logic employs mechanically checkable reasoning to determine and prove mathematical statements. Different logical systems may be used (commonly, first-order logic), but every logical system requires:

  1. a language (which strings are well-formed formulas),
  2. rules of reasoning (how formulas may be derived from one another), and
  3. a semantics (what formulas mean, i.e. when they are true).

Language

The language of FOL statements can be expressed as a context-free grammar with variables (e.g. \(x\)) and constants (e.g. \(\pi\)) as terminals, and boolean operators (e.g. \(\lnot\)), functions (e.g. \(+\)), relations (e.g. \(=\)), parentheses, and quantifiers (\(\forall, \exists\)) as nonterminal rules and/or parts of nonterminal rules. Thus, every formula is a tree.

Reasoning

A proof in FOL is a finite, directed tree (maybe a DAG, actually?) where vertices are sentences (statements without free variables) and (directed) edges correspond to reasoning rules. An edge joins two sentences when one is derivable from the other via a valid reasoning rule. Thus, the leaves of the tree are premises and the root is the conclusion.

Since there are countably many sentences and proof trees, we can enumerate them mechanically to generate the set of provable consequences. Thus, the language \(C(\Gamma) :=\set{\sigma\mid \Gamma\vdash \sigma}\) is recognizable.

Rules for reasoning are syntactic.

Semantics

We need to define a way to interpret the strings in our language.

Model

Definition
A model \(M\) is a universe \(U\) such that all constants are in \(U\), all function symbols are bound to functions on \(U\), and all relation symbols are bound to relations on \(U\) (i.e. subsets of \(U^n\) for an \(n\)-ary relation).

Given a model, the truth or falsity of a FOL sentence is well-defined.

If sentence \(\sigma\) is true under some model \(M\), we denote \(\vDash_M \sigma\). We define the theory \(\text{Th}(M)\) of \(M\) as the language of all statements made true by \(M\), i.e. \(\text{Th}(M):=\set{\sigma\mid \, \vDash_M \sigma}\). Finally, we write \(\Gamma\vDash \sigma\) if we have \(\vDash_M \Gamma\) implies \(\vDash_M \sigma\) under any model \(M\).

With \(\vdash\) and \(\vDash\), we can now describe two properties of logical systems:

  1. Soundness: if \(\Gamma\vdash \sigma\), then \(\Gamma\vDash \sigma\) (everything provable is true).
  2. Completeness: if \(\Gamma\vDash \sigma\), then \(\Gamma\vdash \sigma\) (everything true is provable).

The Big Questions

Is it possible to find proofs mechanically?

Is there a formal procedure for deciding truth or falsity?

Results about Arithmetics

Define model \(A_+\) to be the arithmetic with \(+\), \(\mathbb{N}\) and relations \(<, \leq, =, \geq, >\). Define \(A_{+,\times}\) as \(A_+ \cup \set{\times}\).

\(\text{Th}(A_+)\) is decidable

Theorem
Given an expression in \(A_+\), it is possible to decide whether it is true or false.

\(\text{Th}(A_{+,\times})\) is not decidable, i.e. undecidable

Theorem
Given an expression in \(A_{+,\times}\), it is not generally possible to decide whether it is true or false.

This indicates the general result that any system with the capacity to implement/simulate Gödel numbering must inherit undecidability.

Model \(M\) is axiomatizable if there exists set of premises \(\Gamma\) such that \(C(\Gamma)=\text{Th}(M)\), i.e. if we can describe \(M\) as a set of axioms \(\Gamma\).

\(\text{Th}(A_{+,\times})\) is not axiomatizable

Theorem
It is not possible to construct a set of axioms for \(A_{+,\times}\).

As a corollary, if we have sound axioms \(\Gamma\subseteq \text{Th}(A_{+,\times})\), then there must exist some \(\sigma\in \text{Th}(A_{+,\times})\) such that \(\sigma\not\in C(\Gamma)\).

18 - Asymptotics and Algorithm Efficiency

Thus far, all terminating problems have been labelled "decidable", but we can reason further about "how long" a decidable program might take to terminate and "how much space" it would use. We've neglected such measures when designing our algorithms, and now we must pay for our sins.

Complexity

We define the worst-case runtime for Turing machine \(M\) deciding language \(L\) as a function \(t:\mathbb{N}\to \mathbb{N}\) where \(t(n)\) is defined as the maximum number of steps it takes \(M\) to halt on an input string \(w\) of length \(n\).

This definition is concrete, but unnecessarily jagged: the precise definition of \(t(n)\) might be chaotic and categorically difficult to compute. To smooth things out, we partition worst-case runtime functions into classes by their asymptotic efficiency, i.e. their behavior as \(n\to\infty\).

Common Asymptotics Definitions

Definition
We have \(f(n)\in O(g(n))\) when there exist a constant \(c > 0\) and \(n_0<\infty\) such that for all \(n\geq n_0\), we have \(f(n)\leq c\cdot g(n)\).

We have \(f(n)\in o(g(n))\) when for all \(\varepsilon > 0\), there exists some \(n_0<\infty\) such that for all \(n\geq n_0\), we have \(f(n)<\varepsilon\cdot g(n)\), implying \(\displaystyle\lim_{n\to \infty}\dfrac{f(n)}{g(n)}=0\).

Partitioning by the classes \(O\), \(\Omega\), \(o\), \(\omega\) defines order relations over functions, acting respectively like \(\leq, \geq, <, >\). Partitioning by \(\Theta\) defines an equivalence relation over functions.

Important Examples

Equivalence of logs: \(O(\log_k{n})=O(\ln{n})\)

Polynomials of a given degree: \(\displaystyle\sum\limits_{i=0}^{d}a_i n^{i}=O(n^{d})\)

Class of polynomial functions: \(2^{O(\log{n})}=n^{O(1)}=\displaystyle\bigcup_{k\in \mathbb{N}}O(n^{k})\). So, \(O(n^k)\subseteq n^{O(1)}\).

Complexity vs. Efficiency

Although complexity and efficiency are measured in the same units, they are different things: complexity is a property of a problem, whereas efficiency is a property (namely the worst-case runtime) of an algorithm.

Since complexity is global, we can construct language classes based on complexity. We denote with \(\text{TIME}(t(n))\) the class of languages decidable by a multi-tape (standard) TM in \(O(t(n))\).

Relating Worst Cases of Different Models

Single-tape \(\leftrightarrow\) multi-tape TM

Theorem
A multi-tape TM with worst-case runtime \(t(n)\) has an equivalent single-tape TM with worst-case runtime \(O(t(n)^{2})\).

To reason about non-deterministic TMs, we amend our definition of \(t\): for ND TMs, \(t(n)\) is the maximum number of steps taken on any computational branch given an input of length \(n\).

Deterministic \(\leftrightarrow\) non-deterministic TM

Theorem
A non-deterministic TM with worst-case runtime \(t(n)\) has an equivalent deterministic TM with worst-case runtime \(2^{O(t(n))}\).

19 - The Classes P and NP

The Class \(\mathbf{P}\)

The class \(\mathbf{P}\)

Definition
We define polynomial time \(P\) as the class of all languages that can be decided in polynomial time, i.e. \(P:=\displaystyle\bigcup_{k\in \mathbb{N}}\mathbf{TIME}(n^{k})\).

We consider polynomial time to be "reasonable" when designing algorithms, at least compared to exponential time.

When reasoning about problems, by social contract, we assume that structures are encoded in a "reasonable" way, e.g. graphs are adjacency matrices or adjacency lists. We can make a problem trivially easy by picking an "adversarial" encoding. E.g. primality testing is \(O(1)\) (loose bound) if we store numbers as products of prime powers.

Context-free languages are polynomial

Theorem
All context-free languages can be decided in polynomial time. So, \(L\in \text{CF}\implies L\in P\).
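The standard witness for this theorem is the CYK dynamic-programming algorithm, which computes which variables derive which substrings, bottom-up by substring length. A sketch, assuming the grammar is already in Chomsky normal form and using an ad hoc rule encoding (terminal rules as pairs, binary rules as triples):

```python
# CYK sketch: decides w in L(G) with O(n^3) table updates for a CNF grammar.
# term_rules holds pairs (A, a) for rules A -> a;
# bin_rules holds triples (A, B, C) for rules A -> BC.
def cyk(w, term_rules, bin_rules, start):
    n = len(w)
    if n == 0:
        return False  # handle S -> epsilon separately if the grammar has it
    # T[l][i] = set of variables that derive the length-l substring at i
    T = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, c in enumerate(w):
        for A, a in term_rules:
            if a == c:
                T[1][i].add(A)
    for l in range(2, n + 1):          # substring length
        for i in range(n - l + 1):     # start position
            for k in range(1, l):      # split point
                for A, B, C in bin_rules:
                    if B in T[k][i] and C in T[l - k][i + k]:
                        T[l][i].add(A)
    return start in T[n][0]

# a CNF grammar for {a^n b^n | n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
terms = [("A", "a"), ("B", "b")]
bins_ = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
print(cyk("aabb", terms, bins_, "S"))  # True
print(cyk("aab", terms, bins_, "S"))   # False
```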

Examples

We prove by describing a polytime algorithm.


Define the problem \(\text{PATH}\) as the language \(\set{\langle G,s,t \rangle\mid G \text{ is a directed graph with path from } s \text{ to } t}\).

\(\text{PATH}\) is in \(P\)

Theorem
\(\text{PATH}\) can be solved in polynomial time.
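A sketch via breadth-first search, assuming the graph is given as an adjacency-list dict:

```python
from collections import deque

# PATH via BFS: each vertex is enqueued at most once, so the runtime is
# polynomial (in fact, linear) in the size of <G, s, t>.
def path(adj, s, t):
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

print(path({1: [2], 2: [3], 4: [1]}, 1, 3))  # True:  1 -> 2 -> 3
print(path({1: [2], 2: [3], 4: [1]}, 3, 4))  # False: no edges leave 3
```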

Define the problem \(\text{RELPRIME}\) as the language \(\set{\langle x,y \rangle \mid x,y\in \mathbb{N}, \text{GCD}(x,y)=1}\).

\(\text{RELPRIME}\) is in \(P\).

Theorem
\(\text{RELPRIME}\) can be solved in polynomial time.
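A sketch via the Euclidean algorithm; the point is that the number of iterations is polynomial in the number of *bits* of the input, whereas naive trial division would be exponential in it:

```python
# RELPRIME via the Euclidean algorithm: the remainder at least halves every
# two iterations, so the step count is polynomial in the bit length of x, y.
def relprime(x, y):
    while y:
        x, y = y, x % y
    return x == 1  # x now holds gcd(x, y)

print(relprime(10, 21))  # True:  gcd = 1
print(relprime(10, 22))  # False: gcd = 2
```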

The Class \(\mathbf{NP}\)

A verifier \(V\) for language \(L\) is an algorithm (TM) \(V\) such that \(L\) can be defined \(L:=\set{w\mid V\text{ accepts} \langle w,c \rangle \text{ for some } c}\), where \(c\) is a certificate string. \(L\) is polynomially verifiable if \(V\) runs in polynomial time with respect to \(|w|\) (in which case \(V\) is a polynomial time verifier).

The class \(\mathbf{NP}\)

Definition
We define \(\mathbf{NP}\) as the class of languages with polynomial-time verifiers. Every language \(L\in \mathbf{NP}\) can be decided by a non-deterministic TM in polynomial time (and vice versa).

We define the class \(\mathbf{coNP}\) as the class \(\set{L \mid \overline{L}\in\mathbf{NP}}\).

==Note that being in NP is analogous (though not equivalent) to being recognizable. For recognizable \(L\), the recognizer may never halt when \(w\not\in L\) is used as input. However, for \(L\in \mathbf{NP}\), no certificate will exist for \(w\not\in L\), so it can't be verified in polynomial time==.

We define \(\mathbf{NTIME}(t(n))\) as the set of languages decidable in \(O(t(n))\) time by some non-deterministic TM. We can then characterize NP as \(\mathbf{NP}:=\displaystyle\bigcup_{k\in \mathbb{N}}\mathbf{NTIME}(n^{k})\).

Examples


Define the problem \(\text{HAMPATH}\) as the language \(\set{\langle G,s,t \rangle\mid G \text{ is a directed graph with a hamiltonian path from } s \text{ to } t}\). Recall that a hamiltonian path visits every vertex exactly once.


Define the problem \(\text{COMPOSITES}\) as the language \(\set{\langle x \rangle\mid x=pq \text{ for } p,q\in \mathbb{N} \text{ where } p,q\ne 1 \text{ and } p,q \ne x}\), i.e. the language consisting of composite numbers.


Define the problem \(\text{CLIQUE}\) as the language \(\left\{ \langle G, k \rangle \mid G \text{ is an undirected graph that contains a clique of size } k \right\}\)


Define the problem \(\text{SUBSET-SUM}\) as the language \(\left\{ \langle S, t \rangle \mid S \text{ is a finite set of integers and some subset of } S \text{ sums to } t \right\}\)
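To make the verifier definition concrete, here is a sketch of a polynomial-time verifier for \(\text{SUBSET-SUM}\), where the certificate is simply the claimed subset (treated as a multiset, an assumption about the encoding):

```python
from collections import Counter

# A polytime verifier sketch for SUBSET-SUM: the certificate c is the claimed
# subset itself. Checking c is easy; *finding* c is the expensive part.
def verify_subset_sum(S, t, c):
    if Counter(c) - Counter(S):  # c must only use elements available in S
        return False
    return sum(c) == t

print(verify_subset_sum([3, 7, 1, 8], 11, [3, 8]))  # True
print(verify_subset_sum([3, 7, 1, 8], 11, [11]))    # False: 11 is not in S
```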

Interpretation

In most of these examples, the corresponding decision problem apparently can't be solved in polynomial time because of the search aspect. For example, there are \(2^{|S|}\) subsets of a set \(S\), so any decision problem expressible as finding a subset is likely exponential.

Many algorithms proceed by brute-forcing all the possible solutions, then checking them one by one until a valid one is found; it's usually the brute-force enumeration, not the checking, that is super-polynomial.

\(\mathbf{P}\) Vs. \(\mathbf{NP}\)

Clearly, \(\mathbf{P}\subseteq\mathbf{NP}\); if a problem can be decided in polynomial time, clearly an instance can be verified in polynomial time. On the other hand, we've yet to find a problem in \(\mathbf{NP}\) that is provably not in \(\mathbf{P}\). The fundamental question \(\mathbf{P}\overset{?}{=} \mathbf{NP}\) remains famously unsolved.

20 - NP-Completeness

Polytime Reductions

A function \(f : \Sigma^{*}\to \Sigma^{*}\) is polytime-computable if there exists a TM that takes input \(w\), halts with \(f(w)\) written on the tape, and runs in polynomial time with respect to \(|w|\).

Language \(A\) is polytime-reducible to language \(B\), denoted \(A \leq_p B\), iff there exists a polytime-computable \(f : \Sigma^{*}\to \Sigma^{*}\) such that \(w\in A\) if and only if \(f(w)\in B\).

\(\mathbf{P}\) is closed under polytime reduction

Theorem
For languages \(A,B\), if we have \(A\leq_p B\) and \(B\in \mathbf{P}\), then we also have \(A\in \mathbf{P}\).

Satisfiability Problems

Problems of satisfiability were among the first problems to be analyzed for complexity, and they play a central role in defining complexity classes. Satisfiability is the canonical hard problem.

Define the problem \(\text{SAT}\) as the language \(\set{\langle \varphi \rangle \mid \varphi\text{ is a satisfiable boolean formula}}\), and the problem \(\text{3-SAT}\) as the language \(\set{\langle \varphi \rangle \mid \varphi\text{ is a satisfiable boolean formula in 3-CNF}}\), where \(\text{3-CNF}\) means "in conjunctive normal form, with each clause having \(3\) atoms". It is fairly trivial to see that both problems are in \(\mathbf{NP}\), since boolean formulas can be evaluated in polytime.

\(\text{SAT} \leq_p \text{3-SAT}\)

Theorem
\(\text{SAT}\) is reducible to \(\text{3-SAT}\)

We describe a procedure to convert a boolean formula \(\varphi\) to a 3-CNF formula \(\psi\) such that \(\varphi\) is satisfiable if and only if \(\psi\) is satisfiable. For every internal node of \(\varphi\), introduce a fresh variable \(a\) for its output; for a node \(a = b\land c\), add to \(\psi\) the clauses:

  1. \(\lnot a\lor b\)
  2. \(\lnot a\lor c\)
  3. \(a\lor \lnot b\lor \lnot c\)

(\(\lor\) and \(\lnot\) nodes get analogous clause sets, and clauses with fewer than \(3\) atoms are padded by repeating a literal.)

Each time, we encode the operator into multiple clauses by using \(a\) to "bind" the 3-CNF clauses together. E.g. for \(\land\), any satisfying assignment either has \(a\) or \(\lnot a\): if \(a\), clauses \(1\) and \(2\) force \(b\land c\); if \(\lnot a\), clause \(3\) forces \(\lnot b \lor \lnot c \equiv\lnot(b\land c)\), as expected. Finally, we add a unit clause asserting the root variable. So, \(\psi\) is satisfiable if and only if \(\varphi\) is satisfiable.

\(\text{3-SAT}\leq_p \text{CLIQUE}\)

Theorem
\(\text{3-SAT}\) is reducible to \(\text{CLIQUE}\)

We construct a graph \(G\) from a 3-CNF formula \(\varphi\) with \(k\) clauses as follows:

  1. Add one vertex per literal occurrence, grouped into triples by clause.
  2. Add an edge between every pair of vertices, except pairs within the same clause and pairs of contradictory literals (\(x\) and \(\lnot x\)).

Then, \(G\) has a \(k\)-clique if and only if \(\varphi\) is satisfiable: a \(k\)-clique must take one consistent literal from each clause, which corresponds to a satisfying assignment of \(\varphi\), and vice versa.
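The map itself is short. A sketch, assuming formulas are encoded as lists of clauses with signed integers for literals (an encoding choice of mine):

```python
from itertools import combinations

# Sketch of the 3SAT -> CLIQUE map. A formula is a list of clauses, each a
# list of 3 nonzero ints, where a negative int is a negated variable.
def to_clique_instance(clauses):
    # one vertex per literal occurrence: (clause index, slot, literal)
    nodes = [(i, j, lit)
             for i, clause in enumerate(clauses)
             for j, lit in enumerate(clause)]
    edges = {frozenset([u, v])
             for u, v in combinations(nodes, 2)
             if u[0] != v[0]      # no edges inside a clause
             and u[2] != -v[2]}   # no edges between x and ~x
    return nodes, edges, len(clauses)  # target clique size k = #clauses

# (x1 v x1 v x2) ^ (~x1 v ~x2 v ~x2) ^ (~x1 v x2 v x2)
nodes, edges, k = to_clique_instance([[1, 1, 2], [-1, -2, -2], [-1, 2, 2]])
print(k, len(nodes))  # 3 clauses, 9 vertices
```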

This construction is also given in CMPUT 304.

NP-Completeness

\(\mathbf{NP}\)-Completeness

Definition
Language/problem \(L\) is \(\mathbf{NP}\)-complete if \(L\) is in \(\mathbf{NP}\) and every other problem in \(\mathbf{NP}\) can be polytime-reduced to \(L\).

\(\mathbf{NP}\)-completeness is closed under polytime reduction

Theorem
If language \(A\) is \(\mathbf{NP}\)-complete, language \(B\) is in \(\mathbf{NP}\), and \(A \leq_\mathbf{P} B\), then \(B\) is \(\mathbf{NP}\)-complete.

We also find that if there exists an \(\mathbf{NP}\)-complete language \(L\) that is also in \(\mathbf{P}\), then we have \(\mathbf{P}=\mathbf{NP}\), since we could reduce any \(A\in \mathbf{NP}\) to \(L\in \mathbf{P}\).

Cook-Levin Theorem (1971)

Theorem
\(\text{SAT}\) is \(\mathbf{NP}\)-complete.

21 - Complexity Hierarchy

Space Complexity

Recall the definition of worst-case time complexity. There is an analogous definition for space: a TM has space complexity \(s(n)\) if it never visits more than \(s(n)\) tape cells on any input of length \(n\). Again, for a nondeterministic TM, no computational branch may visit more than \(s(n)\) cells.

We have corresponding space complexity classes: \(\text{SPACE}(s(n))\) is the class of languages decidable by a deterministic TM in \(O(s(n))\) space, and \(\text{NSPACE}(s(n))\) is the class decidable by a non-deterministic TM in \(O(s(n))\) space.

Relating Space and Time Complexity

Trivially, \(\text{TIME}(t(n))\subseteq \text{NTIME}(t(n))\) and \(\text{SPACE}(s(n))\subseteq \text{NSPACE}(s(n))\).

Time complexity bound by space complexity

Theorem
\(\text{TIME}(f(n))\subseteq \text{SPACE}(f(n))\) and \(\text{NTIME}(f(n))\subseteq \text{NSPACE}(f(n))\).

Space complexity bound by time complexity

Theorem
\(\text{SPACE}(f(n))\subseteq \text{TIME}(2^{O(f(n))})\) and \(\text{NSPACE}(f(n))\subseteq \text{NTIME}(2^{O(f(n))})\)

Basically, space can be reused whereas time cannot.

Savitch's Theorem

Theorem
For \(s(n)\geq n\), we have \(\text{NSPACE}(s(n))\subseteq \text{SPACE}(s(n)^2)\)

Constructibility

Function \(t(n) :\mathbb{N}\to \mathbb{N}\) is time-constructible if \(t(n)\geq n\log{n}\) and if the function that maps \(1^{n}\mapsto \text{binary representation of } t(n)\) can be computed in \(O(t(n))\).

Function \(s(n) :\mathbb{N}\to \mathbb{N}\) is space-constructible if \(s(n)\geq \log{n}\) and if the function that maps \(1^{n}\mapsto \text{binary representation of } s(n)\) can be computed in \(O(s(n))\).

These rule out slow-growing functions and uncomputable functions.

Space Hierarchy Theorem

Theorem
For any space-constructible \(s(n)\), there exists language \(L\) decidable in \(O(s(n))\) space but not \(o(s(n))\) space.

Time Hierarchy Theorem

Theorem
For any time-constructible \(t(n)\), there exists language \(L\) decidable in \(O(t(n))\) time but not \(o\left(\dfrac{t(n)}{\log{t(n)}}\right)\) time.

Important Classes

Concentrically, in order of increasing generality:

| Name | Expression | Example |
| --- | --- | --- |
| \(L\) | \(\text{SPACE}(\log{n})\) | \(0^k 1^k\) |
| \(\mathbf{NL}=\mathbf{coNL}\) | \(\text{NSPACE}(\log{n})\) | \(\text{PATH}\) |
| \(P\) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{TIME}(n^k)\) | \(A_\text{CFG}\), \(\text{RELPRIME}\), \(\text{CIRCUIT-VALUE}\) |
| \(NP\) or \(coNP\) (not known to be equal!) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{NTIME}(n^k)\) | \(\text{SAT}, \text{3-SAT}, \text{CLIQUE}\) |
| \(PSPACE=NPSPACE\) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{SPACE}(n^k)\) | \(\text{TQBF}\) |
| \(EXPTIME\) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{TIME}(2^{n^k})\) | \(\text{HALT IN }k\), with \(k\) coded in \(\log{k}\) size |
| \(NEXPTIME\) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{NTIME}(2^{n^k})\) | |
| \(EXPSPACE\) | \(\displaystyle\bigcup_{k\in \mathbb{N}}\text{SPACE}(2^{n^k})\) | \(\text{EQ}_{\text{REX}\uparrow}\), i.e. equality for regular expressions with exponents |

Reductions

The space complexity analogue of a polytime reduction is a log space reduction.

Log space reduction

Definition
A log space reduction \(A\leq_{L}B\) exists between languages \(A\) and \(B\) if there exists log-space computable function \(f\) such that \(w\in A\iff f(w)\in B\).

A log space reduction implies a polytime reduction, since \(\text{SPACE}(f(n))\subseteq \text{TIME}(2^{O(f(n))})\).

In turn, a function \(f\) is log space computable if there exists a TM with a read-only input tape, a write-only output tape, and a work tape using \(O(\log{|w|})\) space, that produces \(f(w)\) from \(w\).

Completeness

We can define completeness for each class we've seen in the same way we defined \(\mathbf{NP}\)-completeness earlier.

Also, it is worth noting which separations we know, i.e. which classes we know are not the same: by the hierarchy theorems, \(\mathbf{NL}\subsetneq \mathbf{PSPACE}\subsetneq \mathbf{EXPSPACE}\), \(\mathbf{P}\subsetneq \mathbf{EXPTIME}\), and \(\mathbf{NP}\subsetneq \mathbf{NEXPTIME}\). Most neighboring inclusions (e.g. \(\mathbf{P}\overset{?}{=}\mathbf{NP}\)) remain open.

X22 - Kolmogorov Complexity

All this talk of "compressibility" in the first lecture and we don't cover Kolmogorov complexity? This shall not stand.

prefix vs. prefix-free code

strings are x or w?

In these notes, we reason about the Kolmogorov complexity of strings, but the Kolmogorov complexity of any object with a description language can be reasoned about. E.g. if \(\mathbb{T}\) is the set of all Turing machines, then \(\set{\langle T \rangle \mid T\in \mathbb{T}}\) is the description language of Turing machines (with respect to the encoding \(\langle \cdot \rangle\)).

random access notes

- depends on a model of computation
- see also: code golf
- prefix-free has to do with nondeterminism? → can nondeterministically compute (?)
- diagonal argument
- bound by actual length of the string → $| \langle M,w \rangle |$ 
- entropy

Our definitions of time and space complexity describe the behavior of an algorithm while it is running (i.e. at runtime). There is also a notion of complexity that describes how compressible an object is, by considering bounds on the size of the shortest description of a program that produces it.

Plain Kolmogorov Complexity

Given a target string \(x\), consider combinations of Turing machines \(M\) and input strings \(w\) such that \(M\) leaves \(x\) on the tape when run with input \(w\). Thus, \(d:=\langle M,w \rangle\) is a description of \(x\). Let \(d^{\star}\) be a minimal-length description of \(x\).

Then, the plain Kolmogorov complexity \(C(x)\) of string \(x\) is the minimal length \(|d^{\star}|\) of a description of \(x\).

By the Church-Turing thesis (and theoretical CS common sense), we can reason about Kolmogorov complexity at a higher level than Turing machines. E.g. K-complexity puts a bound on how succinctly ideas can be expressed with natural language.

Sneaky Sneaky Shenanigans

There's almost a law of the universe that the first solution you try is almost always the worst one

Plain Kolmogorov complexity makes an assumption that it shouldn't: it assumes we know the lengths of strings. This seems reasonable, but becomes a problem when we reason about concatenation. Namely, we can't relate \(C(xy)\) to \(C(x)+C(y)\) without knowing where to divide \(xy\) into \(x\) and \(y\). We could store the length of one of the strings or use a delimiter, but both of these solutions effectively add characters to the input string.

With plain Kolmogorov complexity, \(C(xy) \leq C(x)+C(y) + O(1)\) is not generally true, as one might expect. We need an extra \(O(\log(\min\set{C(x),C(y)}))\) term on the right to make the inequality hold.

Prefix-free Kolmogorov complexity solves this problem.

Prefix-free Kolmogorov Complexity

A prefix-free code is a code (in the coding theory sense; here, a subset of \(\set{0,1}^*\)) such that no codeword is a prefix of another. In such a code, we can always determine where a codeword ends, so we don't need to use delimiters or length encoding. A prefix-free Turing machine writes prefix-free codewords (using alphabet \(\set{0,1}\)) to its tape; it is required to read its input exactly once, left to right.

We formally describe prefix-free Kolmogorov complexity as follows:

The prefix-free Kolmogorov complexity \(K(x)\) of string \(x\) is the length of the shortest prefix-free codeword that makes a fixed universal prefix-free Turing machine \(U\) output \(x\), i.e. \(K(x):=\min\set{|w| \mid w\in \set{0,1}^{*},\ U(w)=x}\).

Invariance Theorem

An optimal description language is a description language such that any description of an object in an alternate description language can be adapted to the optimal language with at most constant overhead. Namely, this constant overhead is a description of (an interpreter for) the alternate description language.

Invariance Theorem

Theorem
Let \(L_1, L_2\) be Turing-complete description languages, and let \(K_{L_1}\) and \(K_{L_2}\) be prefix-free complexity functions for \(L_1\) and \(L_2\) respectively. Then, for all strings \(w\), we have \({-c \leq K_{L_1}(w) - K_{L_2}(w) \leq c}\) for some constant \(c\).

Basic Results

Relating plain and prefix-free complexity

Theorem
\(K(x)\leq C(x) + 2\lg{C(x)} + O(1)\).

Proof: We define the following procedure for converting a basic Turing machine program to a prefix-free Turing machine program:

  1. Determine the length of the basic Turing machine program (\(O(n)\))
  2. Encode the length to a prefix-free encoding, e.g. by doubling each digit and adding a termination code (e.g. \(9_{10}\to 1001_2 \to (11)(00)(00)(11)\underline{(01)}\)).

The description for the prefix-free Turing machine program must include the following:

  1. A description of how to simulate a basic Turing machine with a prefix-free one (\(O(1)\))
  2. The coded length of the program (\(O(\lg{C(x)})\))
  3. The program itself (\(O(C(x))\))

So, \(K(x)\in O(C(x))\). The precise inequality comes from the encoding scheme we picked for the length of the prefix-free encoding.
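A sketch of the digit-doubling code from step 2: each binary digit of the length is written twice, and the aligned pair \(01\) (impossible among doubled digits) marks the end, so no delimiter or prior length knowledge is needed.

```python
# Prefix-free length encoding by digit doubling, matching the 9 -> 1100001101
# example above. decode_length returns the length and the remaining input.
def encode_length(n):
    return "".join(d + d for d in format(n, "b")) + "01"

def decode_length(code):
    digits = ""
    for i in range(0, len(code), 2):
        pair = code[i : i + 2]
        if pair == "01":  # terminator: we know exactly where to stop
            return int(digits, 2), code[i + 2:]
        digits += pair[0]
    raise ValueError("not a codeword")

print(encode_length(9))                    # "1100001101"
print(decode_length("1100001101program"))  # (9, 'program')
```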


Bound on plain complexity relative to length of string

Theorem
There exists constant \(c\) such that for all strings \(x\), we have \(C(x) \leq |x|+c\).

As a corollary, we have \(K(x)\leq |x|+2\lg{|x|}+O(1)\) (combining with the theorem relating plain and prefix-free complexity above).


We define the conditional Kolmogorov complexity \(K(x|y)\) of the string \(x\) given \(y\) as the length of the shortest program (in some fixed description language) that outputs \(x\) when given \(y\) as additional input.

For convenience, we write \(K((x,y))\) (i.e. the complexity of an encoding of the pair of strings \((x,y)\)) as \(K(x,y)\).

Extra information bound

Theorem
\(K(x| y) \leq K(x)\leq K(x,y) \leq \max\set{K(x|y) + K(y), K(y|x) + K(x)}\leq K(x)+K(y)\)

Proof sketches (each inequality holds up to an additive constant):

- \(K(x|y)\leq K(x)\): a shortest program for \(x\) can simply ignore the extra input \(y\).
- \(K(x)\leq K(x,y)\): from a shortest program for the pair, output just the first component.
- \(K(x,y)\leq K(x|y)+K(y)\) (and symmetrically with \(x,y\) swapped): produce \(y\) first, then feed it to the conditional program for \(x\).
- The final inequality follows since \(K(x|y)\leq K(x)\) and \(K(y|x)\leq K(y)\).

Sub-additivity

Theorem
\(K(xy)\leq K(x,y)\)

Note that there isn't a way to compare \(K(xy)\) to \(K(x|y)\), \(K(x)\), etc.


Symmetry of Information

Theorem
\(K(x,y)=K(y,x)=K(x\mid y, K(y)) + K(y)\) (up to additive constants)

Here, the last equality is proved with a counting argument.


Information non-increase

Theorem
For any computable function \(f\), we have \(K(f(x))\leq K(x)+K(f)\)

Kolmogorov Complexity is Uncomputable

Note that we can't compute Kolmogorov complexity by simply enumerating all valid programs and simulating them, because not all programs halt.

Proposition 1: Kolmogorov complexity can get arbitrarily large

Proposition
For every \(n\in \mathbb{N}\), there exists string \(x\) such that \(K(x)\geq n\)

Proof: if this weren't true, there would be some \(n\) such that every possible string can be produced by a program expressible with less than \(n\) bits. There are finitely many such programs but infinitely many strings, a contradiction.

Uncomputability of Kolmogorov complexity

Theorem
The function \(K(x)\) is not computable
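A proof sketch in code (the Berry paradox): assume a hypothetical computable `K`. Then the program below halts and outputs some string of complexity at least \(n\), yet the program itself (with \(n\) hard-coded) describes that string in only \(O(\log n)\) bits, a contradiction for large \(n\).

```python
from itertools import count, product

# K is the hypothetical computable complexity function; no real one exists.
def first_incompressible(n, K):
    # enumerate strings in length-then-lexicographic order
    for length in count(0):
        for bits in product("01", repeat=length):
            x = "".join(bits)
            if K(x) >= n:  # by Proposition 1, some such x exists
                return x
```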

Interpretations

Current LLMs realize an interesting thought experiment: code can now be described with natural language. There is an attitude that LLMs make programming "easier" because the programmer must know less to implement their program. However, the result of a program has some minimal Kolmogorov complexity, which bounds just how "easy" such a program can be to write. Describing the behavior of a program (as one must to prompt an LLM to write it) cannot be unboundedly easy.

X23 - Randomized Algorithms and their Complexity Classes

Randomized algorithms and their consequences have been a disaster for my GPA. Or they would be, if the science faculty would let me take the damn 499. Shut up and take my money!

Appendix

implementations of functions with primitive recursion — notes 16


  1. In my (admittedly basic) experience, algebraic structures are described as a tuple of sets and operators defined over those sets, together with a collection of axioms that describe the behavior of the operators, e.g. a ring is a \(3\)-tuple \((R, +, \cdot)\) where \(+,\cdot\) are binary operators over \(R\) such that [ring axioms]. I've found this style of description to be a useful constraint for understanding, because it forces me to reduce a structure to its essence, much as implementing a program in assembly requires absolute understanding of each internal mechanism. Anyway, this kind of description probably exists in every field of math, but I associate it with algebra the most.↩︎