Formal Languages Automata and Computability `CMPUT474`

01 - Introduction

Recently, computing has escaped the need to happen within a formal system: LLMs let computation occur through natural language instructions. But, they are still computation, and thus must still obey its most general laws. This shows the utility of having abstract models of computation.

In the philosophy of this course, we define computation as mechanized mathematics. Math is the study of formal systems, described as mechanically checkable reasoning, i.e. by the vehicle of formal proof. Computer science studies the subset of this reasoning that can be mechanically generated as well. We also view a computer as a construct able to capture a universal formal system.

Fundamental lessons of this course:

A program having a simple description doesn't guarantee it has simple behavior
Dealing with infinity requires careful reasoning
Self-reference often leads to weird behavior that provides deeper insight

Behavior of Simple Mechanisms

What is the difference between the description of a program and its behavior?

It is important to note that the description (e.g. code, etc.) of a program is something separate from the program's behavior. More on this later.

Cellular Automata

We can describe a 1D cellular automaton as a set of rewrite rules that act on a 1D array of bits over (discrete) time. To advance one generation, we iterate over memory and match the pattern around (below, \(n=3\)) each bit and replace it accordingly.

The description is the set of rewrite rules, and possibly the initial state of the array
The behavior of the program is the evolution of the array over time
Patterns consist of \(3\) bits → \(2^{3}=8\) patterns → \(2^8=256\) distinct rewrite rules from those patterns
Rules are named by ordering the patterns in lexicographic order and reading the bits of the re-write results. So, rule \(0\) replaces every pattern with \(0\) and rule \(255\) replaces them all with \(1\).
We could also conceptualize the cellular automaton describing a pattern in 2D memory (i.e. exchanging the time dimension for another spatial one)

For some programs, predictable behavior emerges.

Periodic behavior implies a location in memory cycles between the same values
The behavior may be aperiodic, but have recursive structure (canonically, the Sierpinski triangle generated by rule \(90\))
Behavior may be symmetric

In theory, we could write a "simpler" program to output the same result; we can predict the state of any location in memory at a given time. In this sense, the pattern generated by the cellular automata is compressible.

Aside: it seems like here we're using "compressibility" as a characterization of complexity, since by definition complex (high-entropy) behavior cannot be compressed. Reminds me of information theory

However, some rules (famously, rule \(30\)) are "unpredictable" (although they may have some structure). There may be no way to predict the state of memory; executing the program is the only way to find out. So, the program cannot be "compressed" at all.

Rule \(30\) is described as "being at the edge of chaos"

Simple Rules → Complex Behavior

In general, there is no way to predict the behavior of a program from its description, even when that description is simple.

We note with the 1D cellular automata, the "complexity" of behavior was wildly different despite the descriptions sharing the exact same structure
So, simple rules can lead to complex behavior

Rule \(30\) specifically is a universal computer; it can be used to execute any computation (proven in 2000), and is thus the simplest known Turing-complete system.

A universal computer is "closed under simulation" (nice quote!)
Perhaps "universal computability" is the true form of "incompressibility"
Conway's game of life is also a universal computer (here it is simulating a Turing machine)
Interpreters (e.g. LISP interpreter) are generally universal programs
The "interpreter" implied by the circuitry of a CPU executing instructions is a physical universal computer

Math is Hard

Mathematics is also, categorically, the study of consequences of simple sets of rules (axioms). Clearly, complex behavior can emerge from simple sets of axioms, so math may also be "incompressible", i.e. not amenable to general, overarching theories (sorry category theorists!).

Math tries to predict outcomes based on rules; science tries to predict rules based on outcomes. So, science is also hard since it grapples with the same rules/outcomes paradigm, just in reverse

Finiteness Vs Infiniteness

Every program is a string (a finite length sequence of characters from a finite alphabet \(\Gamma\)). Strings are a really nice way to describe programs; before this, we used the naturals, which was just gross. Read the original proof of Gödel's incompleteness theorem if you don't believe me.

Strings are not bounded in length; they can be arbitrarily long, just not infinitely long
So, there are infinitely many strings
Using strings as the basic structure for reasoning about programs is a contribution of Turing

Since strings are finite, the set of all programs must also be finite. But since there are infinitely many strings, there are infinitely many programs.

Turing machines must have finite state controllers
If programs can be infinitely long, any computation could devolve into a giant if-statement/lookup table that spans all possible behavior. Thus, all programs would be \(O(1)\). Absurd.

This does NOT mean that every problem can be solved by a program though.

Decision Problems

A decision problem is problem where it must be determined whether every string in an alphabet is "true" or "false" (e.g. graph isomorphism problem). Informally, a decision problem is a question that either has a true or false answer. This implies a subset, namely the subset of "true" strings. In fact, there is a one-to-one correspondence between subsets and decision problems.

decision : String -> bool

If we consider decision problems in general (i.e. over the set \(\mathcal{S}\) of all strings), we find that each subset of strings (i.e. each element of \(\mathcal{P}(\mathcal{S})\)) corresponds to exactly one decision problem. We know that \(\#\mathcal{S} < \# \mathcal{P}(\mathcal{S})\) even when \(\mathcal{S}\) is infinite, so there must be more decision problems than strings, and thus there must be more decision problems than programs.

So, not every decision problem has a program that can compute its answer
Equivalently, there are programs that do not have a finite description
Equivalently, not every set can be described in a finite way (since a program is a description of a subset)
- Aside: link to semantics here

Cantor's Theorem

We can't just compare the sizes of infinite sets "normally"; we find that sets have the same cardinality iff a bijection can be drawn between them.

\(\mathbb{N}\), \(\mathbb{Z}\), \(\mathbb{Q}\) all have the same cardinality, but \(\mathbb{R}\) is larger (take MATH 216 or 217 for this proof)

Cantor's Theorem

Theorem

For any set \(S\), \(\# \mathcal{P}(S) > \# S\), i.e. no bijection \(f : S \to \mathcal{P}(S)\) exists.

Proof: Towards a contradiction, let \(f : S \to \mathcal{P}(S)\) be a bijection. Consider \(T :=\set{s \in S \mid s \not\in f(s)}\). Let \(s_0\in S\) such that \(f(s_0)=T\) (this must exist since \(T \in \mathcal{P}(S)\) and \(f\) is bijective). The case \(s_0 \in T\) contradicts the definition of \(T\), but \(s_0 \not\in T\) implies that \(s_0 \in f(s_0)=T\), another contradiction. So, no such \(f\) can exist, so \(S\) and \(\mathcal{P}(S)\) have different cardinalities.

The set of decision problems is the power set of the set of strings, since each string can either be true or false. A decision problem is a subset of the set of all strings. Programs are also strings. Every decision problem being describable by a program implies a bijection from the set of strings to the power set of strings, which we've just shown can't exist. Thus, there are decision problems that aren't describable with programs.

Self-Reference is Interesting

Cantor's theorem and the result that some decision problems don't have a program to describe them are both non-obvious; both problems are proving non-existence in an infinite domain. It is not a coincidence that both proofs involved self-reference:

Cantor's theorem: considering a bijection involving a set defined in terms of the bijection itself
Decision problems: considering programs themselves as being part of the domain over which decision problems are defined

Allowing and utilizing self-reference often cuts to the core of questions, yielding both weird behavior and important results. Particularly, self-reference is a shortcut to finding contradictions.

It is Always Possible to Write Non-halting Programs

As we've seen, it is possible to create universal programs (and relatedly, universal languages which can express any computation → whose interpreter must be a universal program). However, it is impossible to write a universal programming language in which every program halts. The existence of universal programs implies non-halting programs.

We use a diagonal proof. List all countably infinite programs expressible by the language \(p_1, \dots\), then through some traversal run each program with each program's string representation as input (a countably infinite traversal trivially exists). Define traversal \(V\) that acts as normal unless a program is run with itself as input, in which case it negates the result. \(V\) is a program we can write, so it is expressed by \(p_i\) for some \(i\in \mathbb{N}\), meaning we evaluate \(V(V)\) at some point. This calls \(V\) again, which eventually evaluates \(V(V)\), which calls \(V\) again, etc. Clearly, this program recurses forever and never returns.

Essentially, any language complicated enough to support universal computation also supports some recursive or loop structure, and thus infinite loops can be created in it.

Semantics is Hard Too

Questions about the semantics (meaning/behavior) of programs are susceptible to halting-style diagonal arguments, and thus often provably impossible to answer in the general case, i.e. write a program to answer.

A central tenet of theoretical computer science is that there is no way to understand the behavior of programs in general.

Undecidable Questions about Semantics

Example

Will a given program halt on a given input? (Halting Problem)
Are two programs semantically equivalent, i.e. do they always provide the same output given the same input?
Can any program with a finite description be solved by another program, i.e. can its behavior be determined without simulating it?

We know these are undecidable by reducing the halting problem to them
Often, we can answer specific cases or subsets of cases, but never in the general case. E.g. it's easy to write a program that checks if a goto-branch loop exists; that type of non-halting behavior can be detected.

These limits exist on every formal system, e.g. programming languages. Your static code analyzer can only get so good.

However, questions about syntax (description) are less susceptible to these arguments.

E.g. we can prove that languages with while loops are more powerful than those with just for loops
Aside: the syntax description and semantics behavior equivalence is useful to keep in mind.

02 - Number Systems and History of Computation

Real Numbers

Wait, why does this course have a real analysis co-req?

Construction

Before being able to use real numbers, we need to understand them, and thus we need to construct them. To bake an apple pie from scratch, one must invent the universe; we start by constructing \(\mathbb{N}\).

\(\mathbb{N}\) is constructed by the Peano axioms.

We extend \(\mathbb{N}\) to \(\mathbb{Z}\) by requiring closure under additive inverses.

We extend \(\mathbb{Z}\) to \(\mathbb{Q}\) by requiring closure under multiplicative inverses. \(\mathbb{Q}\) is defined as the classes defined by the equivalence relation \((a,b)\sim(c,d)\iff ad=bc\) over \(\mathbb{Z}\times \mathbb{Z}\), with definitions of addition, multiplication, etc. to match.

Aside: Equivalence relations are useful when we have multiple "overlapping" structures; they are easy to work with because we define what's the same and let the structure inherit from that

\(\mathbb{Q}\) is archimedean and dense in \(\mathbb{R}\), but almost all real numbers are not rational. Proving this is a nontrivial exercise in real analysis; we prove existence of an irrational number, namely \(\sqrt{2}\) as is customary.

Proof that \(\sqrt{2}\) is irrational

Proof

Suppose \(\sqrt{2}=\dfrac{p}{q}\) for coprime \(p,q\) (so \(\dfrac{p}{q}\) is in simplest form). Then \(p^2=2q^2\), so \(p^2\) and thus \(p\) are even. We assumed \(\dfrac{p}{q}\) is in simplest form, so \(q\) must be odd. \(p\) is even, so \(p=2r\) for \(r \in \mathbb{N}\). So, \(p^2=(2r)^2=2q^2 \implies 2r^2 =q^2\), so \(q^2\) and thus \(q\) are even. Absurd.

Any equation involving addition, multiplication, and their inverses can be solved in \(\mathbb{Q}\), but equations of the form \(x^n=a\) aren't solvable in general. Adding radicals \(\surd\) to \(\mathbb{Q}\) doesn't solve everything either; polynomials of the degree \(5\) and higher aren't generally expressible in terms of the operations we have (a result of Galois theory). Again, we can define the algebraic numbers \(\mathbb{A}\) as the algebraic closure of the rationals, but not only does this not contain all of \(\mathbb{R}\) (i.e. transcendentals like \(\pi\) and \(e\)), but we now have members of \(\mathbb{A}\) not in \(\mathbb{R}\), e.g. \(\sqrt{-1}=:i\).

We finally construct \(\mathbb{R}\) from \(\mathbb{Q}\) by considering all the possible limits of (Cauchy) sequences of rational numbers, and including these in the number system as well. E.g. although \(e\) is transcendental, we do have \(e:=\displaystyle\lim_{n\to \infty}\left(1+\dfrac{1}{n}\right)^n\). \(\mathbb{R}\) is the closure of \(\mathbb{Q}\) under the inclusion of limits of cauchy sequences.

This definition mostly works enough to be a foundation for real analysis

There's really no reason to stop here. We define the complex numbers \(\mathbb{C}\) as the algebraic closure of \(\mathbb{R}\), i.e. the set of solutions to any algebraic equations in \(\mathbb{R}\) (much like \(\mathbb{A}\) is a algebraic closure of \(\mathbb{Q}\)).

In general, new number systems are motivated by closure under some operation or construction. We can generalize the idea of a closure operator.

\(\mathbb{R}\)? In My Computations? It's Less Likely than You Think

Not all numbers are computable, i.e. for which a terminating algorithm for approximating them to arbitrary precision exists. By definition, if a closed form limit exist, we can compute it. Clearly, it's ontologically difficult to describe an uncomputable number. But, since there are countably finitely many programs and uncountably infinitely many reals, most reals are uncomputable.

Clearly we can't use the cauchy sequences since we need infinitely many rationals to represent an uncomputable real, and dealing with limits is annoying for precise computation anyway. We have this problem with \(\mathbb{R}\) because its "closure operator" is real-analytic, as opposed to a clean, "algebraic" operator.

Cantor's Diagonal Argument

Proof

We consider the interval \([0, 1)\) in \(\mathbb{R}\). Each number in this interval can be represented as a infinite "base-2-decimal" sequence of \(1\)s and \(0\)s. Assume a bijection between \(\mathbb{N}\) and \([0,1)\). Construct a new real \(\alpha\) by toggling the first number at the first decimal place, the second one at the second decimal place, and the \(n\)th number at the \(n\)th decimal place. By construction, this real number is different from all the other ones in the bijection; so, the bijection cannot exist between \(\mathbb{N}\) and \([0, 1)\).

This structure is used to show many theoretical CS results, e.g. that there are more decision problems than strings.

Around this time, paradoxes were appearing in math that threatened the foundational assumptions of the time, namely that all of math could be built atop set theory:

Weierstrass function: everywhere continuous, nowhere differentiable. Intuition for calculus is based on everything being smooth and differentiable, with points of non-differentiability. Tools did not exist to deal with functions like that.
Russel's paradox (set theory): For \(R := \set{S \mid S \not\in S}\), is \(R \in R\)? This paradox comes from the axiom of choice, which is often omitted nowadays
Vitali set: partition of an open interval such that each partition has measure \(0\), but the union of the entire interval has measure \(3\).

Emergence of Computation

The field of computer science emerged from a desire by the mathematical community to construct all of math from arithmetic on natural numbers, much like how \(\mathbb{Z}\), \(\mathbb{Q}\), \(\mathbb{R}\), etc. were constructed.

Hilbert's problems (including 10th problem, which was proven unsolvable)
Russell and Whitehead attempt to build all of math on logic and set theory. To avoid paradoxes, they lay the foundations of what would become type theory (yay!)
- They are doing this with naturals, not strings. It turns out that natural numbers are really annoying to work with compared to strings.
- So corresponding every object with a natural number requires typing support
Hilbert tries to reduce everything to natural number arithmetic
Entscheidungsproblem: can a program be written to decide whether a logical sentence is true?
- Remember that number theory, etc. can be reduced to first order logic
- So, if this were true, "mathematics would be finished" because all of math could be proven mechanically
Gödel's incompleteness theorem: There is no decidable way to axiomatize the arithmetic of the natural numbers, i.e. any system complex enough to express \(\mathbb{N}\) would contain statements that it could not prove.
- Mathematics cannot be complete. Womp womp.
- This theorem also defined the simplest decidable systems
Church's undecidability theory
Turing: turing machines, strings, universality, halting problem
- Turing is also responsible for shifting the language for these problems to strings instead of the naturals: this marks the transition to "computer science" as it is known today.
1936: Church-Turing thesis unifies previous work and theorems:

Church-Turing thesis

Theorem

An "effective" method for performing arithmetic on the natural numbers can be described as / reduced to a Turing machine.

The primitives of computation are understood, so we start learning how to build computers.
- Bad timing: WW2 is about to start. So research is secret, not shared, and pointed towards ballistics, cryptography, etc.
Computers required upfront investment, so projects started at universities, but many profs left and created companies (startups), which got acquired by larger companies
Computation becomes commodified since it is useful for accounting, etc.
1980s: Personal Computers
- Spreadsheets, word processing, games
- Apple, IBM, Microsoft
1990s: Internet
- Browsers
- Google, amazon
2000s: Social and Media
- Facebook, youtube, netflix
- Data starts getting collected en-masse
2010s: machine learning
- Speech recognition, image understanding, machine translation, NLP, computer vision, starts to dominate CS
- We have enough data and computing power to make ML better at solving problems than bespoke human-designed systems
2020s AI
- OpenAI, etc
- Diffusion, transformer models
- Approaching human ability

03 - Finite State Computation

We start by defining the primitives we will use to construct a theory of computation

Are there simple languages we can write that correspond directly to primitives?
How restrictive does a model need to be before it stops becoming useful?

Finite State Computation

A finite state computation is a computation that takes a string \(w\in \Sigma^{*}\) and either accepts or rejects it following computation by a finite automaton.

Intuitively, a finite automaton is defined by a set of states (what the computation has come to at this point in time) and a function that defines a new state based on the current state and the current input being read from the input string.

Finite Automaton (FA)

Definition

A finite automaton \(M\) is a \(5\)-tuple \(M=(Q, \Sigma, \delta, q_0, F)\) with

A finite set \(Q\) of states, which includes the initial state \(q_0\)
A finite alphabet \(\Sigma\) from which input strings are constructed
A transition function \(\delta : Q \times \Sigma \to Q\)
A set of accepting states \(F \subseteq Q\) where the computation is allowed to halt

An FA can be expressed as a table indexed by the set of states and the set of characters that can form strings; the entries define the transition function at those coordinates.

More commonly, an FA is expressed as a state diagram, where states are drawn as nodes and the transition functions is defined by arrows connecting the nodes. Accepting states are drawn with double stroke.

Computation

Definition

Given finite automaton \(M\) and input string \(w = \sigma_1\sigma_2\dots \sigma_n\), the execution of \(M\) on \(w\) is the sequence of states (configurations) \(c_0c_1\dots c_n\) such that for all \(0 \leq i \leq n-1\), we have \(c_0=q_0\) and \(c_{i+1}=\delta_M(c_i, \sigma_{i+1})\), i.e. these states could happen in sequence.

\(M\) accepts \(w\) if the final state \(c_n\) is in \(F_M\). Otherwise, \(M\) rejects \(w\).

A finite automaton is a description of a program; a computation is the behavior of a program (given a specific input).

Since we assumed the input string is of finite length and the execution is in lockstep with the input string's symbols, all computations will halt! In fact, if there are \(n\) input symbols, there will be exactly \(n-1\) steps in the computation; this property is very useful, e.g. in the proof of THM 2 later.

This ceases to be true when considering infinite strings as input.

A state can be thought of as an equivalence class over the entire history of the computation. Namely, the representative of the "history class" is determined by its final state. In other words, there is no "memory" or "side effects" that affect how state is interpreted; behavior is determined entirely by the current state and input symbol

Languages

Given an alphabet \(\Sigma\), a language \(\Gamma\) is a subset of the set \(\Sigma^{*}\) of all strings, i.e. \(\Gamma\subseteq \Sigma^*\)

An FA \(M\) recognizes a language \(\Gamma\) if \(M\) accepts every string in \(\Gamma\), i.e. \(\Gamma=\set{w \in \Sigma^* \mid M \text{ accepts } w}\)
A language \(\Gamma\) is regular if and only if there exists some automaton \(M_\Gamma\) that recognizes it

Examples of languages:

The set of binary strings: \(\set{0,1}^\mathbb{N}\)
The set of strings with more \(1\)s than \(0\)s
The set of strings over ASCII that are valid lisp programs
The set of strings of even length
The set of periodic strings (strings that repeat some segment after a certain point)

Constructing a FA to recognize a language will be familiar to anyone who has taken CMPUT 415.

Cardinality Questions

A given automaton recognizes exactly one language, but a regular language has infinitely many FAs that recognize it (e.g. we can split edges, etc.)

Aside: we can define languages as equivalence classes over automata; automata are equivalent if they recognize the same language

The set of strings \(\Sigma^*\) of finite alphabet \(\Sigma\) is countably infinite, but since a program is characterized by which strings from \(\Sigma^{*}\) it accepts or rejects, the set of all languages over \(\Sigma\) is isomorphic to \(\mathcal{P}(\Sigma^{*})\), and thus is uncountably infinite. So, there are more languages than strings.

The set of all finite automata is also countably infinite since the input string, number of states and (therefore) size of transition functions are finite or countably infinite. By the same argument, there are more languages than finite automata. So, almost all languages are not regular.

Languages are isomorphic to decision problems over strings; languages \(\cong\) subsets of strings \(\cong\) decision problems \(\cong\) unary relations. In all of these cases, we can use the same \(|S| < |\mathcal{P}(S)|\) reasoning.

Regular Expressions

A regular operation is a binary operation \(\square : \mathcal{P}(\Sigma^{*}) \times \mathcal{P}(\Sigma^{*}) \to \mathcal{P}(\Sigma^{*})\) between languages that preserves regularity. I.e. if languages \(A\) and \(B\) are regular and \(\square\) is a regular operation, then \(A \, \square \, B\) is also a regular language.

So, regular operations are closed under regularity
A regular expression is an expression of languages and regular operations

Operation Name	Symbol	Definition
Complement	\(\overline{\Gamma}\)	\(\overline{\Gamma} := \set{w \mid w \not\in \Gamma}\)
Intersection	\(\Gamma \cap \Lambda\)	\(\Gamma \cap \Lambda:=\set{w\mid w \in \Gamma \land w \in \Lambda}\)
Union	\(\Gamma \cup \Lambda\)	\(\Gamma \cup \Lambda:=\set{w\mid w \in \Gamma \lor w \in \Lambda}\)
Concatenation	\(\Gamma\Lambda\) or \(\Gamma \circ \Lambda\)	\(\Gamma \Lambda :=\set{ab\mid a\in \Gamma \land b\in \Lambda}\)
(Kleene) Star	\(\Gamma^{*}\)	\(\Gamma^{*}:=\displaystyle\bigcup_{n=0}^{\infty}\set{w_1 w_2 \dots w_n \mid w_i \in \Gamma}\)

As a point of notation, \(w^n\) is shorthand for the \(n\)-times concatenation: \(ww\dots w\). We have \(w^0:=\varepsilon\)

Theorem 1

Theorem

The class of regular languages is closed under complement \(\overline{\Gamma}\).

Proof sketch: Let \(M\) recognize \(\Gamma\); define \(\overline{M}\) as the same as \(M\), but with accepting states \(Q_M \setminus F_M\) (as opposed to just \(F_M\)). If \(c_0\dots c_n\) is the sequence of configurations that accepts some input string \(w\), then the same sequence will be recognized by \(\overline{M}\). But, \(c_n\in F_M\) by definition, so we must have \(c_n\ne Q_M \setminus F_M \iff c_n\ne F_{\overline{M}}\), so \(w\) is not accepted by \(\overline{M}\). We can make the same argument in reverse to show a string accepted to \(\overline{M}\) is not accepted by \(M\). So, if \(\Gamma\) is regular, so is \(\overline{\Gamma}\).
Aside: a class is a collection of sets with a shared property; in this case, the collection of regular languages is a class because each regular language is a set.

Theorem 2

Theorem

The class of regular languages is closed under intersection \(\Gamma\cap\Lambda\).

Proof sketch: If \(M_\Gamma\) recognizes \(\Gamma\) and \(M_\Lambda\) recognizes \(\Lambda\), then we can construct \(M\) as "\(M_\Gamma\times M_\Lambda\)", i.e. such that \(Q_M:=Q_{M_\Gamma} \times Q_{M_\Lambda}\), \({q_0}_M=({q_0}_\Gamma, {q_0}_\Lambda)\), \(\delta_M((q_\Gamma, q_\Lambda), \sigma)=(\delta_{M_\Gamma}(q_\Gamma, \sigma), \delta_{M_\Lambda}(q_\Lambda, \sigma))\), and \((q_\Gamma, q_\Lambda)\in F_M \iff q_\Gamma\in F_{M_\Gamma} \land q_\Lambda\in F_{M_\Lambda}\). This executes both DFAs "in parallel". We know that for a given string \(w \in \Gamma\cup \Lambda\), a sequence of configurations of length \(n\) (since both times, all the input symbols get consumed) that accepts it exists \(\implies\) a corresponding sequence exists in \(M\) by construction.
Note: we can't just simulate \(M_1\), then \(M_2\) after; they need to be evaluated at the same time. This is why the proof isn't trivially conjoining the FAs to create the intersection.

04 - Nondeterministic Computation

We generalize finite state machines into non-determinism; we now differentiate between deterministic FAs (DFAs) and nondeterministic FAs (NFAs).

NFAs can model parallelism and nondeterminism
NFAs are usually more compact and expressive then their corresponding DFAs, and are thus easier to write and understand (although not necessarily to reason about)

Nondeterministic Finite Automaton (NFA)

Definition

A nondeterministic finite automaton \(M\) is a \(5\)-tuple \(M=(Q,\Sigma,\delta,q_0,F)\) where

A finite set \(Q\) of states, which includes the initial state \(q_0\)
A finite alphabet \(\Sigma_\varepsilon\) from which input strings are constructed
A transition relation \(\delta : Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)\) (i.e. \(\delta\subseteq Q \times \Sigma_\varepsilon \times Q\) with the additional constraint that choices of (the initial) \(Q\) and \(\Sigma_\varepsilon\) must be unique)
A set of accepting states \(F \subseteq Q\) where the computation is allowed to halt

We define \(\varepsilon\) as the null character, which allows \(\varepsilon\)-transitions (spontaneous state transitions) that don't process the next input symbol. We define \(\Sigma_\varepsilon := \Sigma\cup \set{\varepsilon}\).

Our transition relation \(\delta\) specifies a set of possible states given the current state and the input character, i.e. \(\delta(q, \sigma)\subseteq Q\) for all \(q,\in Q\), \(\sigma\in \Sigma\). Note that this set can be empty, i.e. there are no possible states left for a given state/input pair.
An NFA accepts a string if an accepting execution trace exists

Generalizing \(\delta\) from a function to a relation is what makes an automaton non-deterministic; a function is a special kind of relations where each "input" uniquely determines an output (i.e. all the subsets contain a single element), whereas a relation allows multiple elements that may be freely chosen between for a given "input".

Since a given state/input pair may have multiple successors, the shape of our program's execution generalizes from an execution sequence to an execution tree of state/input pairs, which is defined analogously to an execution sequence. Program \(M\) accepts string \(w\) iff the execution tree contains a configuration with an accepting state.

If the NFA contains an \(\varepsilon\)-loop, the will be infinitely deep (see infinite data structures)
Every node in the tree has finite out-degree since \(Q\) and \(\Sigma\) are finite
Sequences only have one traversal, so their traversal is deterministic. Trees have many traversals. (Aside: how many? take combinatorics)

DFAs and NFAs Have the Same Computational Power

Two machines \(M_1\) and \(M_2\) are equivalent if they recognize the same language.

DFAs and NFAs are Equivalent

Theorem

Every NFA \(M\) has an equivalent DFA \(\tilde{M}\).

Proof sketch: since each node has finite out-degree, we can write a DFA that traverses the NFA's execution tree in every possible way and simulates it.
The full construction and proof are in the [[Notes 4.pdf|Lecture 4 Course Notes]].

By corollary, the following statements are logically equivalent:

Language \(\Gamma\) is regular
There exists a DFA the recognizes \(\Gamma\)
There exists an NFA that recognizes \(\Gamma\)

Regular Expressions (cont.)

By establishing an equivalence between DFAs and NFAs, we can use the new primitives of DFAs to prove statements NFAs/regular languages. In particular, we use \(\varepsilon\)-transitions to show that regular languages are closed under union, concatenation, and the Kleene star.

Theorem 3

Theorem

The class of regular languages is closed under union \(\Gamma\cup \Lambda\).

Proof sketch: Let \(M_\Gamma\) recognize \(\Gamma\) and \(M_\Lambda\) recognize \(\Lambda\). Define \(M\) such that \(Q=Q_\Gamma\cup Q_\Lambda \cup \set{q_0}\), \(q_0\) is a new state, \(F=F_\Gamma\cup F_\Lambda\), and define \(\delta\) to relate \(q_0\) to both initial states \(q_\Gamma\) and \(q_\Lambda\) and defer to the corresponding \(\Gamma\) or \(\Lambda\) in the other cases. Then, \(M\) recognizes \(\Gamma\cup\Lambda\) since it has the capacity to recognize both \(\Gamma\) and \(\Lambda\) "in parallel", as two branches in the execution tree.

Theorem 4

Theorem

The class of regular languages is closed under concatenation \(\Gamma\Lambda\)

Proof sketch: Let \(M_\Gamma\) recognize \(\Gamma\) and \(M_\Lambda\) recognize \(\Lambda\). Define \(M\) such that all of the accepting states \(F_{\Gamma}\) of \(M_\Gamma\) are connected to the initial state of \(M_\Lambda\) via \(\varepsilon\)-transitions, and the accepting states of \(M\) are the the accepting states of \(M_\Lambda\). Clearly \(M\) recognizes \(\Gamma\) then \(\Lambda\) "sequentially", so it recognizes \(\Gamma\Lambda\).

Theorem 5

Theorem

The class of regular languages is closed under the Kleene star \(\Gamma^{*}\)

Proof sketch: let \(M_\Gamma\) recognize \(\Gamma\). Define \(M\) with a new initial state \(q_0\), a set of states \(Q:=Q_\Gamma\cup\set{q_0}\), a set of accepting states \(F:=F_\Gamma\cup\set{q_0}\), and a transition function that adds \(\varepsilon\)-transitions from all the accepting states of \(M_\Gamma\) back to the initial state of \(M_\Gamma\) and adds an \(\varepsilon\)-transition from the new \(q_0\) to the initial state of \(M_\Gamma\). Then, we can choose to "skip" execution of \(M_\Gamma\) by terminating at \(q_0\) (since it is accepting), or choose to execute \(M_\Gamma\) and "return" to it an arbitrary number of times using the \(\varepsilon\)-transitions from its accepting states to its initial states. Thus, \(M\) recognizes \(\Gamma^{\star}\) by construction.

05 - Regular Expressions

Regular Expressions

A regular expression is an expression formed of regular operators operating on regular languages. Thus, the language of regular expressions is itself a regular language; we see regular expressions as a meta-language that can construct regular languages syntactically, as opposed to the "semantic" way (listing the strings in the language).

Aside: this is an instance of the fundamental "types vs. sets" dichotomy: specifying a language by listing strings is the "set theory" approach, whereas constructing languages syntactically from other languages is the "type theory" approach.

We recursively (in order of precedence) define \(R\) as a regular expression if

\(R\) is \(\varepsilon\) (the empty string), \(\varnothing\) (the empty expression), or a symbol \(a \in \Sigma\) in the alphabet (base cases)
\(R=(R_1)\) for regular expression \(R_1\) (precedence/grouping)
\(R=R_1^{*}\) for regular expression \(R_1\) (Kleene star)
\(R=R_1 R_2\) for regular expressions \(R_1, R_2\) (concatenation)
\(R=R_1\cup R_2\) for regular expressions \(R_1, R_2\) (union)

We define \(\mathcal{L}(R)\) as the language specified by regular expression \(R\), implicitly with alphabet \(\Sigma\). We also define the shorthand \(R=\Sigma\) for shorthand \(R=\sigma_1 \cup\dots\cup \sigma_{|\Sigma|}\), i.e. the language formed by the symbols of \(\Sigma\).

Examples of regular expressions (for \(\Sigma=\set{0,1}\))

\(\mathcal{L}((0\cup 1)^{*})\) is the language of strings all strings \(\set{0,1}^\mathbb{N}\) over \(\set{0,1}\). We also have \(\mathcal{L}((0\cup 1)^{*})=\mathcal{L}(\Sigma)\)
\(\mathcal{L}((0\cup 1)(0\cup 1)(0\cup 1))\) is the languages of strings of length \(3\) (\(\set{0,1}^3\)), \(\cong\) the set of vertices of a \(3\)-dimensional cube
\(\mathcal{L}(0^{*}\cup 1^{*})\) is the language of strings starting with a string of zeroes, followed by a final string of \(1\)s
\(\mathcal{L}((\Sigma \Sigma)^*)\) is the language of strings of even length

Identities for Regular Expressions

Lemma

\(R\cup \varnothing=R\) (note: \(R \varnothing\ne R\) in general)
\(R\varepsilon=R\) (note: \(R \cup \varepsilon\ne R\) in general)

Regular Expressions and Regular Languages are Equivalent

Theorem

A language \(\Gamma\) is regular if and only if there exists regular expression \(R\) such that \(\mathcal{L}(R)=\Gamma\).

Any problem specified by a regular expression can be implemented by a finite automaton
Any decision problem that can be implemented by a finite automaton can be specified as a regular expression

So, we now have a four-way equivalence between regular languages, regular expressions, DFAs, and NFAs.

Proof sketch \(\impliedby\): We simply construct and compose NFAs as we did when we defined the regular operators; we have more or less proven everything we need already. For the base cases, we have:

\(R=a\) for \(a\in \Sigma\): The initial state leads directly to the accepting state; the one transition consumes \(a\)
\(R=\varepsilon\): our initial state is the accepting state. There are no transitions.
\(R =\varnothing\): the initial state is the only state, but it is not accepting (since the empty regular expression specifies a language that doesn't accept any string)

We must define more concepts to prove \(\implies\).

Generalized NFAs

Generalized NFA

Definition

A generalized NFA (GNFA) \(N\) is a \(5\)-tuple \(N=(Q,\mathcal{R},\delta,q_0,q_a)\) where

A finite set \(Q\) of states, with initial state \(q_0\in Q\) and single accepting state \(q_a\in Q\).
A finite set \(\mathcal{R}\) of regular expressions, implicitly over alphabet \(\Sigma\)
A transition function \(\delta:(Q\setminus\set{q_a})\times(Q\setminus\set{q_0})\to \mathcal{R}\) that maps appropriate pairs of states to a regular expression in \(\mathcal{R}\).

A GNFA accepts input \(w\) iff there exists a sequence \(q_0, c_1,\dots,c_{k-1},q_a\) of configurations where \({w_{i+1}\in \mathcal{L}(\delta(c_i, c_{i+1}))}\) for \(i\in\set{0, \dots, k-1}\). So, moving to a new state requires that the regular expression defined by the pair of states (in \(\delta\)) recognizes the input that it consumes.

DFAs are a subset of GNFAs

Lemma

For any DFA, we can construct a GNFA that recognizes the same language.

Proof sketch: any DFA corresponds to a regular expression, so we can construct a GNFA with two states (one initial state and one final state). These states are connected via a transition that corresponds to the (a) regular expression defined by the DFA. Full proof in [[Notes 5.pdf|course notes]].

Reduction Procedure for a DNFA

Algorithm

If the DNFA has two states, we are done.

Otherwise, choose a state (that isn't the initial or final one) and contract (in the graph theory sense) the DNFA with respect to that state. This requires coalescing the regular expression (where \(q_\times\) is the state to be remove and \(q_i,q_j\) any pair of REs on "either side" of \(q_\times\)):

\(\delta'(q_i,q_j):=\delta(q_i, q_\times)\delta(q_\times,q_\times) ^* \delta(q_\times,q_j)\cup \delta(q_i,q_j)\), i.e. we append \(\delta(q_i, q_\times)\) and \(\delta(q_\times, q_j)\) together while accounting for possible self-referring \(\delta(q_\times, q_\times)\) and existing paths \(\delta(q_i,q_j)\).'

We keep contracting until there are only two states left.

\(\mathcal{L}\) is Invariant under Reduction

Lemma

For any GNFA \(N\), \(\mathcal{L}(\text{Reduce}(N))=\mathcal{L}(N)\). Furthermore, if \(N\) has two states, then \(\mathcal{L}(N)=\mathcal{L}(\delta(q_0, q_a))\)

Proof sketch: we proceed on induction; the IH this that the part of the DNFA we replace is equivalent to the regular expression we replace it with.
Full proof in [[05 - Regular Expressions|course notes]].
Aside: this property can be roughly stated as "repeatedly applying reduction yields a canonical form with respect to \(\mathcal{L}\)"

diagrams

TODO: add a table with all the NFA diagrams to the appendix in a "constructing NFAs from regular expressions section

06 - Limits of Finite State Computation

Not Everything Can Be Finite-state Computed

The language \(\Gamma_1:=\set{0^{n}1^{n}\mid n\geq 0}\) (i.e. language formed by a substring of \(0\)s followed by a substring of \(1\)s of the same length) is cannot be regular because as \(n\to \infty\), we would need an infinite amount of states (or memory) to "remember" \(n\).

Note that \(\Gamma_1 :\set{0^{n}1^{n}}\) for some fixed \(n\in \mathbb{N}\) is regular since \(n\) is finite in this case

Clearly, any finite language is regular because we can "hard-code" each string into the FA directly, so all non-regular languages must be infinite. Furthermore, if a language is infinite, it has no bound on the length of strings, so its regular expression representation must use a Kleene star. Specifically:

Pumping Lemma for Regular Languages

Theorem

For regular language \(\Gamma\), there exists length \(p\) such that for any string \(w\in \Gamma\) longer than \(p\) (i.e. \(|w| > p\), we can partition \(w\) as \(w=xyz\) where

\(|y|>0\), i.e. \(y\) is not \(\varepsilon\)
\(|xy|\leq p\), implying the lengths of \(x\) and \(y\) are less than \(p\) \(\implies\) \(z\) is not empty either
\(xy^{i}z\in \Gamma\) for all \(i\geq 0\). So, \(\Gamma(xy^{*}z)\subseteq \Gamma\)

I.e. for sufficiently long strings in \(\Gamma\), there is always a substring that can be repeated arbitrarily.

Here, \(x\) is a prefix, \(y\) is "looped" with \(^*\), and \(z\) is a suffix
This is trivially true for the finite case: we simply find \(x=w\) and \(y,z=\varepsilon\)
"Pumping down" refers to not taking the loop at all, i.e. \(i=0\). "Pumping up" refers to traversing the loop many times, i.e. \(i \geq 2\).
Aside: this seems related to block decomposition from MATH 239

Proof sketch (general idea): Let \(\Gamma\) be regular and recognized by DFA \(M\). Pick \(p = |Q_M|\). If we have some string \(w\) that is longer than \(p\), the computation of \(M\) with \(w\) as input must pass through at least one state (denoted \(q_!\)) twice by the pigeonhole principle, and thus a loop exists. Let \(\sigma_!\) be the input symbol when the loop is first encountered. If we were traversing the DFA "for fun", we could choose to follow the loop as many times as we wanted. Thus, any modification of \(m\) from arbitrarily duplicating the substring corresponding to the loop will also be recognized by \(M\).

Note that the pumping lemma is necessary for regularity, but not sufficient: there are irregular languages that conform to the pumping lemma.

So, it cannot be used (alone) to prove regularity, just irregularity

Proving a Language Isn't Regular

Using the Pumping Lemma

If a language is regular, it must adhere to the pumping lemma. Thus, by contrapositive, if a language \(\Gamma\) has a string \(w\) such that all \(w=xyz\) decompositions have some \(i\) such that \(xy^{i}z\not\in \Gamma\), then \(\Gamma\) can't be a regular language.

Since input symbols are consumed in lockstep with state transitions in a DFA, a counterexample DFA should have \(p\) states for a proof with string \(w\) where \(|w|\geq p\).
Often, the constraints \(|y|>0\), \(|xy|\leq p\) coincide with the definition of the language to make \(y\) easier to deal with

Example: Consider \(\Gamma_1:=\set{0^{n}1^{n}\mid n\geq 0}\); assume towards contradiction that it is regular. Then a DFA \(M_{\Gamma_1}\) with some \(p\in \mathbb{N}\) states that recognizes \(\Gamma_1\) exists. Consider the string \(w_\times:=0^{p}1^{p}\in \Gamma_1\). Any decomposition \(w_\times = xyz\) where \(|y|>0\) and \(|xy|\leq p\) implies \(y=0^{k}\) for some \(k\leq p\), since \(xy\) itself must be a string of \(0\)s (since there are \(p\) \(0\)s). But, since all the \(1\)s in the string are in \(z\), the string \(xy^{2p}z\) must have more \(0\)s than \(1\)s, and thus \(xy^{2p}z\not\in \Gamma_1\). If \(\Gamma_1\) were regular, we would have \(xy^{2p}z\in \Gamma_1\) by the pumping lemma. So, \(\Gamma_1\) is not regular.

Using the Recursive Nature of Regular Expressions

Closure!

If \(\Box\) is a regular operator and \(\Gamma_{\text{Reg}}, \Lambda_{\text{Reg}}\) are regular languages, then we know \(\Xi:=\Gamma_{\text{Reg}} \Box \Lambda_{\text{Reg}}\) is also regular. Thus, by contrapositive, if \(\Xi_{\lnot \text{Reg}}:=\Gamma\Box\Lambda\) is not a regular language, then at least one of \(\Gamma\) and \(\Lambda\) isn't a regular language as well. More specifically, id \(\Xi_{\lnot \text{Reg}}:=\Gamma_\text{Reg}\Box\Lambda\) for regular \(\Delta_\text{Reg}\), then \(\Lambda\) must not be regular.

This all follows from the fact that regular languages are closed under regular operations

So, to prove some \(\Gamma\) is not regular, we define a language known not to be regular as the product of a regular operation on a regular language and \(\Gamma\).

If we know a specific language is not regular and we are trying to prove a more general superset of that language is irregular, then we try to define this subset as an intersection between the superset and a known regular language.

Example: Consider \(\Gamma_2\) as the language of strings with alphabet \(\Sigma=\set{0,1}\) that have the same number of \(0\)s as \(1\)s. Note that \(\Gamma_1=\Gamma_2 \cap L(0^{*} 1^{*})\). Clearly \(L(0^{*} 1^{*})\) is regular (it is defined by a regular expression), and we know that \(\Gamma_1\) is not regular. So, \(\Gamma_2\) must be irregular as well.

Example: Consider \(\mathsf{R}\) as the language of regular expressions (specifically, where \(\varepsilon\) is a valid regular expression) and \(\mathsf{P}\) as the language consisting of strings of parens, i.e. \(\mathsf{P}:=\set{(,)}^{*}\). Clearly \(\mathsf{P}\) is regular. We note that \(\mathsf{R}\cap\mathsf{P}\) is the set of balanced parenthesizations; it is common knowledge that this language is not regular. Thus, since \(\mathsf{R}\cap\mathsf{P}\) is not regular but \(\mathsf{P}\) is, it must be that \(\mathsf{R}\) is not regular.

Note: closure arguments are recursive; proving a language is regular is achieved by proving some other (presumably simpler) language is regular. Using the pumping lemma is the only "base case" that we have other than direct proof, so all roads will lead to it eventually.

07 - Pushdown Automata

Finite-state computation can be generalized into a more powerful model of computation by removing the requirement that the set of states is finite. Of course, the program specification must still be finite, so we can't just use an infinite lookup table. But, if the state transition function/relation has structure, it may have a finite amount of information, and thus could be expressed finitely.

Aside: information theory link and helpful, look at this again
Aside: We've already seen how thinking about things in terms of "compression" can be helpful. This is true of information theory in general; information and entropy come to mind. If the transition function has low (enough) entropy, it carries a finite amount of information, and thus can be expressed finitely.

We extend our model by adding finitely-addressable memory (i.e. countably infinite memory, where addresses are in \(\mathbb{N}\)) that can be used to store the state of the computation.

A natural data structure to store in memory is a stack, where a stack counter stores the address of the top of the stack. Thus, although the stack can grow without bound, changes are local, i.e. always occur at the stack pointer, which moves "continuously in \(\mathbb{N}\)".

This locality lets us put useful bounds on a program's behavior
Aside: I'm defining "continuously in \(\mathbb{N}\)" as stepwise motion by \(1\) unit. ==Any useful definition of continuity isn't defined over \(\mathbb{N}\) because limits can't really be defined without rationals==.

At each step of the computation, a pushdown automaton will advance \(1\) step on the input string, read the character, pop from the stack, and use this information to advance to a new state and push another symbol onto the stack.

Deterministic Pushdown Automata (DPDA)

Definition

A deterministic pushdown automaton (DPDA) is a \(6\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, F)\) with

A finite set \(Q\) of states, which includes the initial state \(q_0\)
A finite input alphabet \(\Sigma\), extended to \(\Sigma_\varepsilon:=\Sigma\cup \set{\varepsilon}\)
A finite stack alphabet \(\Gamma\), extended to \(\Gamma_\varepsilon:=\Gamma\cup\set{\varepsilon}\)
A transition function \(\delta : Q \times\Sigma_\varepsilon \times \Gamma_\varepsilon \to \set{Q \times \Gamma_\varepsilon}\cup \set{\varnothing}\) that maps the current overall state of the computation (over the state \(\in Q\), input symbol \(\in \Sigma_\varepsilon\), and stack \(\Gamma_\varepsilon\) to a new state and stack action)
A set of accepting states \(F \subseteq Q\)

I.e. it is an extension of a DFA to include a stack, which has its own alphabet and factors is a dimension of the transition function.

More on the transition function \(\delta\):

Just pushing is achieved by popping \(\varepsilon\), and just popping is achieved by pushing \(\varepsilon\).
If \(\delta(q, \sigma, \gamma)=(q', \gamma')\), then at state \(q\) with input \(\sigma\) and \(\gamma\) popped from the stack, the DPDA will advance to state \(q'\) and push \(\gamma'\) onto the stack
For each possible input triple \((q, \sigma, \gamma)\), exactly one of \(\delta(q, \sigma, \gamma)\), \(\delta(q, \sigma, \varepsilon)\), \(\delta(q, \varepsilon, \gamma)\), and \(\delta(q, \varepsilon, \varepsilon)\) is not \(\varnothing\); so, every input triple will meaningfully interact with the stack
- We could define an equivalence relation over \(Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon\) where classes comprise of these \(4\) corresponding triples; the canonical form of each class is the non-\(\varnothing\) triple.

As with before, a configuration is a triple \(c=(q \in Q, u\in \Sigma^{*}, s\in \Gamma^{*})\) and an execution trace of string \(w\in \Sigma^{*}\) is the (unique) sequence of configurations generated by the computation on \(w\). \(M\) accepts \(w\) if it reaches a configuration \(c_\ell=(q_k , w, s)\) where \(q_k\in F\).

Clearly, the set of FAs is isomorphic to the subset of DPDAs that simply don't interact with the stack, i.e. given by \(M_\text{FA}:=(Q, \Sigma, \varnothing, \delta, q_0, F)\).

What's New?

Fundamentally, stacks let us traverse trees; in the degenerate cases of a tree being a path (i.e. a linked list, for the data structures brainrot enjoyers among us), a stack lets us count whatever patterns we want. So, languages like \(\set{0^{n}1^{n}\mid n\geq 0}\) can be recognized by DPDAs:

I gotta make my own illustration

Nondeterministic Pushdown Automata (PDA)

Deterministic Pushdown Automata (PDA)

Definition

A deterministic pushdown automaton (PDA) is a \(6\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, F)\) with

A finite set \(Q\) of states, which includes the initial state \(q_0\)
A finite input alphabet \(\Sigma\), extended to \(\Sigma_\varepsilon:=\Sigma\cup \set{\varepsilon}\)
A finite stack alphabet \(\Gamma\), extended to \(\Gamma_\varepsilon:=\Gamma\cup\set{\varepsilon}\)
A transition relation \(\delta \subseteq: Q\times \Sigma_\varepsilon\times \Gamma_\varepsilon \times Q \times \Gamma_\varepsilon\), which can be thought of as a function \(\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\)
A set of accepting states \(F \subseteq Q\)

Like before for DFA → NFA, \(\delta\) is extended such that a given configuration could lead to multiple other configurations, or none at all. So, computations are not deterministic and the execution structure is a tree, not a sequence.

Aside: just a like a sequence over domain \(D\) is a function \(f : \mathbb{N}\to D\), a tree over domain \(D\) can be thought of as a function \(f : \mathbb{N}^{*}\to D\).

Also like before, since the the set of functions \(A \to B\) is a subset of of the set of relations over \(A\times B\), the set of DPDAs is a subset of the set of PDAs.

However, unlike before, PDAs are strictly more powerful than DPDAs; they are not equivalent like DFAs and NFAs are! So, adding a stack augments the power of NFAs more than the power of DFAs.

Asides:

I'm guessing some of this will get covered later remove if it is

With the memory model, we can use a lot more than just one stack:

\(k\) stacks: we keep track of \(k\) tops of stacks at the start of memory. Then, each stack occupies the slice of memory mod \(k\), i.e. stack \(m\) has addresses \([m]_k\), namely \(\mathbb{N}k+m\).
Arbitrary number of stacks: we pick some \(k\) and store it and the heads of each stack. If we need more than \(k\) stacks, we copy all the memory into the first suitable region where it won't be overwritten and copy the stacks back into memory mod \(k+1\). Since the amount of memory accessed/used will always be finite (since computation is finite), this can always be performed.
- Note: having \(k\) → heads of \(k\) stacks → actual stacks is like the superblock/node structure of file systems a la CMPUT 379.

08 - Context-Free Grammars

Reese lecture :))

So far, we've used specifications to generate target languages and defined programs that can mechanically recognize strings against a specification. For a given "language class", bijective pairs of specification and program structures exist, i.e. where every specification can be implemented as a program and every program implements a specification.

E.g. Regular languages are described by regular expressions and can be recognized by finite automata.

This pattern will continue: context-free languages are described by context-free grammars and can be recognized by pushdown automata (more on this in this lecture)

Context-free Grammars

Context-Free Grammar

Definition

A context-free grammar (CFG) \(C\) is a \(4\)-tuple \(G:=(V, \Sigma, E,S)\) where:

A is a finite set \(V\) of variables/non-terminals
A finite alphabet \(\Sigma\) of terminal symbols; \(\Sigma\) is disjoint from \(V\), i.e. \(V\cap \Sigma=\varnothing\)
A finite set \(R\) of rules/productions of the form \(A\to w\), for variable \(A\in V\) and string \(w\) formed of variables and terminals, i.e. \(w\in (V\cup \Sigma)^*\). So, \(R\subseteq V\times (V\cup \Sigma)^*\)
A start variable \(S\in V\)

Aside: structurally, we see that the set \(R\) of rules is a binary relation over \(V\) and \((V\cup \Sigma)^*\)
Also, since rules must start with variables, there can be at most as many rules as there are variables. We have \(|R| \leq |V|\).

A derivation in a CFG \(G\) is performed by picking a start variable and iteratively substituting variable occurrences using production rules. Each derivation implies a parse tree whose nodes are strings (possibly containing variables) and whose edges are production rules.

A leftmost derivation is a sequence of substitutions where the leftmost variable is repeatedly substituted; a rightmost derivation is defined analogously.
These concepts are important for parsing theory; top-down parsers attempt to find leftmost derivations and bottom-up parsers attempt to find rightmost derivations.

A string \(w\) is generated by \(G\) (denoted \(S_G \overset{*}{\Rightarrow} w\)) if it can be derived from \(G\) and only contains terminals. The language \(\Gamma_G\) of grammar \(G\) is set set of strings generated by \(G\), i.e. \(\set{w\in \Sigma \mid S \overset{*}{\Rightarrow} w}\); if \(G\) is a context-free grammar, then \(\Gamma_G\) is a context-free language. A context-sensitive language is a language that isn't context-free.

Context-free grammars generalize regular expressions; CFGs are to pushdown automata as regular expressions are to finite automata. The extra power of CFGs comes from self-reference; we are allowed to bind a name to a syntactic construction (rule) and reference it later, which opens the door to recursion and the computational oomph that it entails.

Re: self-reference is interesting
So, all regular languages are also context-free. But, not all context-free languages are regular.

Our regular (lol) question persists: given a language, can we write a ~~regular expression~~ grammar for it?

Since CFGs can be expressed as strings, there are countably many CFGs, and thus countably many context-free languages. Thus, most languages aren't context-free.

Examples and Applications

Canonical examples of context-free grammars that aren't regular include the language of palindromes (over some alphabet \(\Sigma\)), the language of well-balanced parenthesizations (the Dyck language), and languages like \(\set{a^{n}b^{n}\mid n\in \mathbb{N}}\).

Recognizing these languages requires "keeping track" what has been seen before, like the number of previous characters or the actual string of characters themselves.
Aside: the number of well-balanced parenthesizations of length \(2n\) is given by the Catalan numbers \(C_n\)

Grammars are also notable because they can be used to describe programming languages (take CMPUT 415 for more on this in practice). Most languages are mostly context-free, although many have context-sensitive quirks.

E.g. C++'s template system is Turing-complete. So, the C++ syntax isn't just context-sensitive, it's undecidable (!!!); a C++ parser is a universal program.

CFGs also let us create decent models of natural language:

Notation

For larger grammars, it makes sense to write them over multiple lines, and adopt "code" typesetting (as opposed to mathematical typesetting). Backus-Naur form is standard, but I prefer to use ANTLR's syntax for its brevity and applicability. Note that ANTLR4 has some syntactic sugar, like regular expressions as nonterminals.

As a general convention, terminals are in ALL_CAPS and rules are lowercase
CMPUT 415 notes contain an overview of ANTLR4's syntax

Example: Grammar for common LISP
Adapted from https://github.com/antlr/grammars-v4/blob/master/lisp/lisp.g4


lisp_ : s_expression+ EOF;

s_expression
    : ATOMIC_SYMBOL
    | '(' s_expression '.' s_expression ')'
    | list
    ;

list : '(' s_expression+ ')';

ATOMIC_SYMBOL: LETTER ATOM_PART?;

ATOM_PART : (LETTER | NUMBER) ATOM_PART;
LETTER: [a-z];
NUMBER: [1-9];
WS: [ \r\n\t]+ -> skip;

If we must use math notation, we can coalesce rules starting with the same variable by writing them as alternatives, delimited by \(\mid\). So, \(\set{S \to \varepsilon, S \to (S), S\to SS}\) can be written succinctly as \(S\to \varepsilon\mid (S) \mid SS\).

Elementary Results

CFGs are a superset of regular expressions

Theorem

Given a regular expression \(R\) that recognizes language \(\Gamma\), a context-free grammar \(G\) can be written that generates \(\Gamma\).

By corollary, the class of regular languages is a subset of the class of context-free languages.

Closures of CFGs

Theorem

The class of context-free languages is closed under union, concatenation, and the Kleene star.

Proof: for each operator \(\square\), we consider the language formed by combining the sets of variables and rules of the two languages \(\Gamma\) and \(\Lambda\); we define \(\Gamma\, \square \, \Lambda\) to have start variable \(s\).

Union: Include the rule \(S \to S_\Gamma \mid S_\Lambda\) in the set of rules for \(\Gamma \cup \Lambda\). Then, we can construct any string in \(\Gamma\) or \(\Lambda\) by picking the corresponding alternative; thus, we can construct anything in \(\Gamma\cup \Lambda\).
Concatenation: Include the rule \(S \to S_\Gamma S_\Lambda\) (same reasoning)
Kleene star (unary): Include the rule \(S \to S_\Gamma S\)

Each time, just like with regular expressions, we defer the operations to the corresponding constructs for building grammars.

Whoopsie, made a oopsie

Warning

Context-free languages are not closed under intersection or complement.

This is unlike regular languages, which are closed under both.
Note that increasing the generality of a specification decreases the strength of results; specificity is strength and generality is weakness, in terms of results.

Ambiguity

A grammar \(G\) is ambiguous if the same terminal string can be derived in different ways, i.e. if a given string in the language \(\Gamma_G\) has multiple different parse trees.

E.g. if we don't define an order of operations, an expression like \(a \, \square \, b \, \triangle \, c\) is ambiguous; our parse tree be \((a \, \square \, b) \, \triangle \, c\) or \(a \, \square \, (b \, \triangle \, c)\), depending on what order we pick
In some applications (e.g. ANTLR4), the collection of rules is not a set, but an ordered set. Thus, the order of the rules matter; they imply an order of operations.

Context-free languages can be ambiguous; a context-free language is inherently ambiguous if every grammar that describes it is ambiguous.

E.g. \(\set{a^{i}b^{j}c^{k}\mid i=j \lor j=k}\) is inherently ambiguous

Chomsky Normal Form

Canonical form time 😎

A context-free grammar is in Chomsky normal form when every rule is of the form \(A \to BC\), \(A \to a\), or \(S \to \varepsilon\) for variables \(A,B,C\) (where \(B,C \ne S\)) and terminal \(a\).

Chomsky normal form is a normal form

Theorem

Any context-free language can be generated by a context-free grammar in Chomsky normal form.

We prove by construction; the following is a complete list of language-preserving transformations over grammatical rules:

We can always add a new start variable: \(S \to S_1\)
Form \(A \to \varepsilon\) (for \(A \ne S\)): remove the rule, change each other rule that produces a string with \(A\) to a version with \(A\) removed
Form \(A \to B\): desugar \(B\), i.e. replace \(A\to B\) with \(A\to b_1, \dots, A\to b_n\) where \(b_1, \dots, b_n\) terminals and \(B\to b_1, \dots, B\to b_n\) are rules
Form \(A \to aB\): Replace \(A \to aB\) with the rules \(A\to A_1 B\) and \(A_1 \to a\) (\(A_1\) is a new variable); we are wrapping the terminal \(a\) in the rule \(A_1 \to a\)
- Analogous procedure for form \(A \to Ba\)
Form \(A \to a_1\dots a_k\) for \(k \geq 3\): remove the rule and add rules \(A\to a_1 A_1, \dots, A_1 \to a_2 A_2, A_{k-2} \to a_{k-1}a_k\); we are chaining rules already in Chomsky normal form to construct the string
- Aside: this is similar to how 3SAT is SAT (CMPUT 304)

We can convert a grammar to Chomsky normal form by repeatedly applying these rules until no replacements can be made; the number of such replacements is finite. So, we are performing fixpoint iteration.

Proof: clearly, the number of possible rules \(|R|\) in a CFG is bounded above by \(|V| \displaystyle\sum\limits_{n=0}^{N}{{N}\choose n}n!\) (by combinatorial identity, \(= \displaystyle |V| \sum\limits_{n:=0}^{N}\sum\limits_{k:=N-n}^{N}k\)) where \(N:=|V|+|\Sigma|\), accounting for the possible RHS and strings in the disjoint union \(V\cup \Sigma\). The first \(4\) CNF rules are "terminal", and the last rule isn't recursive, so eventually we will run out of rules to replace (termination). correctness!!
Aside: since the CNF process is a set of rewrite rules, it acts like a grammar in some ways. But it really isn't a grammar (although it's similar) because it transforms an existing language instead of generating a new one.
Aside: the language of strings that describe CFGs in CNF isn't regular (because we need to check dependencies between rules). Could be a fun problem to figure out if it's context-free as well.
Aside: can we characterize part of this problem as a lattice?

Chomsky normal form is useful because it requires parse trees to be binary. Thus, there is a strong relation between the length of a string and the height and/or number of nodes in its parse tree.

09 - Equivalence of PDAs and CFGs

Our central theorem:

PDAs and CFGs are Equivalent

Theorem

A language can be recognized by a pushdown automaton if and only if it can be generated by a context-free grammar.

Recall that PDAs aren't deterministic; this lines up with CFGs. With a PDA, we nondeterministically choose a transition; with a CFG, we nondeterministically choose a substitution.

For this proof, we use the notion of a generalized pushdown automaton (GPDA), which can push entire strings to the stack instead of just characters.

So, a configuration update looks like \((q, w, s\gamma)\to (q', w \sigma, ss')\) where \(\gamma\in \Gamma\) and \(s'\) is a string in \(\Gamma^{*}\)

Equivalence of GPDAs and PDAs

Algorithm

GPDAs are equivalent to PDAs. All PDAs are trivially also GPDAs, so we prove by defining a procedure to construct an equivalent PDA given a GPDA:

Leave \(\Sigma\), \(\Gamma\), and \(F\) unchanged
Split members of \(\delta\) as follows to form \(\tilde{\delta}\). For transition rule \((q, \sigma, \gamma)\to (q', s')\) in \(\delta\):
- If \(s\) is of length \(\leq 1\), simply leave it as is (i.e. include it in \(\delta\))
- If \(s := \gamma_1 \dots \gamma_k\) has length \(k>1\), add new states \(\tilde{q}_1, \dots, \tilde{q}_{k-1}\) and split the rule in \(\delta\) into \((q, \sigma, \gamma)\to (\tilde{q}_1, \gamma_1), (\tilde{q}_1, \varepsilon, \varepsilon)\to (\tilde{q}_2, \gamma_2), \dots, (\tilde{q}_{k-1}, \varepsilon, \varepsilon)\to (q', \gamma_k)\), i.e. we split the state transition \(q\to q'\) into a longer transition with intermediate states that push each character in \(s\) to the stack one-by-one.

For the \(|s|>1\) case, the clause in \(\delta\) is split and chained in \(\tilde{\delta}\) in the same way that \(A\to a_1 a_2\dots\) rules are split and chained when converting to Chomsky normal form.

We also define the restricted pushdown automaton (RPDA), which is a PDA with a single accepting state, empty stack upon termination, and where every stack action is either a single push or single pop.

Equivalence of RPDAs and PDAs

Theorem

RPDAs are equivalent to PDAs. All RPDAs are trivially also PDAs, so we prove by defining a procedure to construct an equivalent RPDA given a PDA. Specifically, we provide rewrite rules for "undesirable" constructs in a PDA:

Multiple accepting states: make all the states non-accepting; make a new accepting states and link all the previous ones to them (pop \(\perp\) from stack)
Accepting state doesn't leave stack empty: insert state before accepting state that pops everything except \(\perp\) from the stack. Then, pop \(\perp\) from the stack when transitioning to the accepting state
Transition pushes and pops: insert intermediate state; pop in the first transition and push in the second one
Transition neither pushes nor pops: insert intermediate state; transitions push, then pop a dummy symbol from the stack

Lemma I: CFG \(\implies\) PDA

We can draw an equivalence between the steps in a leftmost derivation performed in \(G\) and a sequence of configurations in some PDA \(M\). Then, we push build a string by pushing and popping strings on \(M\)'s stack.

Aside: this an example of how evolving stacks and trees are two projections of the same structure. Trees have two spatial dimensions (down via children and sideways via siblings), whereas evolving stacks have one spatial dimension (measured by the height of the stack) and one time dimension.

We define a GPDA that traverses through a string and pushes the corresponding rule bodies from variables to the stack. In this way, it performs the leftmost derivation. So, an accepting sequence of configurations exists in \(M\)'s computation of \(w\) if and only if a leftmost derivation of \(w\) exists in \(G\).

We know that GPDAs are equivalent to PDAs, so can construct a PDA with the same behavior as the above GPDA. So, we've shown that given a CFG, we can construct a PDA that recognizes a string if and only if it can be generated by the CFG.

Lemma II: PDA \(\implies\) CFG

We prove again by construction. Given an RPDA, we can construct a CFG as follows:

For every pair of transitions (\(\delta(q_{1,a}, \sigma_1, \varepsilon)\to (q_{1,b}, \gamma), \delta(q_{2,a}, \sigma_2, \gamma)\to (q_{2,b}, \varepsilon)\)) that push and pop the same symbol \(\gamma\), we add a rule \(A_{q_{1,a},q_{2,a}}\to \sigma_1 A_{q_{1,b},q_{2,b}} \sigma_2\)
- Since these stack actions cancel each other out, consuming \(\sigma_1\) and \(\sigma_2\) on either end has a cancelling effect as well
For every triple of states \(q_a, q_b, q_c\in Q\), we add a rule \(A_{q_a,q_b}\to A_{q_a,q_b}A_{q_b,q_c}\)
- Chaining
For every state \(q\in Q\), we add a rule \(A_{qq}\to \varepsilon\) to the CFG
We define the start variable \(S:=A_{q_0 q_F}\), where \(\set{q_F}=F\) is the unique accepting state

review this and add more intuitive explanation!!

First, we claim that if \((q_a, w, \varepsilon) \overset{*}{\to} (q_b, wx, \varepsilon)\), in the RPDA/PDA, then \(A_{q_a,q_b}\overset{*}{\Rightarrow} x\) in the CFG, then the converse. Both can be proven by induction; our base case is when both configurations are actually the same (i.e. where \(x=\varepsilon\)).

add proofs!!

10 - Limits of stack augmented computation

Although they are more powerful than FAs, PDAs are not universal computers either; there exist algorithms that can't be adapted for PDAs. Since PDAs are equivalent to CFGs, we will reason about CFGs below.

Pumping Lemma for CFGs

Theorem

For context-free language \(\Gamma\) described by CFG \(G = (V, \Sigma, R,S)\), there exists length \(p\) such that for any string \(w\in \Gamma\) longer than \(p\) (i.e. \(|w| > p\)), we can partition \(w=uvxyz\) where

\(|vy| > 0\), i.e. at least one of \(v,y\) is not \(\varepsilon\)
\(|vxy| \leq p\), implying the lengths \(v,x,y\) are less than \(p\) \(\implies\) \(yz\) is not empty either
\(uv^{i}xy^{i}z \in \Gamma\) for all \(i \geq 0\). So, \((uv^{i}xy^{i}z)\subseteq \Gamma\)

I.e. for sufficiently long strings in \(\Gamma\), there is always a substring that can be repeated arbitrarily.

The pumping lemma for regular languages is a subcase of this, namely where \(u,v=\varepsilon\).

Since \(G\) has a finite number of variables and each substitution increases the size of a generated string by a finite amount, there is an upper bound (here, \(p\)) on how long a string can get when each rule substitutes a given variable at most once. Thus, if a string is longer than this \(p\), there must be variable \(A\in V\) that gets substituted more than once over the course of a derivation, i.e. a rule \(R\ni \mathcal{R} : A \to \dots\) is applied more than once.

This double-substitution is equivalent to a loop in an FA, since we could apply this rule arbitrarily many times if we can reach it again after applying it

Proof

We consider the possible height of a parse tree for a string in language \(\Gamma(G)\) for CFG \(G\).

In this context, the branching factor \(b\) of the parse tree is the maximum length of an RHS rule in \(G\); a tree of height \(h\) can have at most \(b^{h}\) leaves. Also note that a string of length \(\ell\) needs ==height at least \(b/\ell\), and thus height of at least \(\log_b{\ell}\) to generate.==

If a rule isn't repeated, a tree of height at most \(|V|\) can exist. So, if \(h \geq \log_b{\ell} \geq |V| + 1\) \(\iff b^{h}\geq \ell \geq b^{|V| + 1}\), then some path \(S \to \text{leaf}\) exists where some variable \(A\) repeats.

Pick \(p:=b^{|V| + 1}\) and string \(w\in \Gamma(G)\) of length \(\ell\) where \(\ell \geq p \implies \ell \geq b^{|V|+1} \geq b^{|V|}+1\). The smallest (by number of nodes) parse tree for \(w\) must have height at least \(|V|+1\), and thus some path \(T\) of length \(\geq |V|+1\). So, by pigeonhole principle, some variable \(A\) exists that repeats in the path; denote the instances \(A^1\) and \(A^2\).

We can partition \(w\) into \(uvxyz\) where \(A^{1}\) produces \(vxy\) and \(A^{2}\) produces \(x\).

Part I: If \(|vy|\) were to be \(0\), then we'd have \(v,y=\varepsilon\) and thus we'd have both \(A^{1}\overset{*}{\Rightarrow} x\) and \(A^{2}\overset{*}{\Rightarrow} x\). Thus, we could make a smaller parse tree by simply replacing \(A^{1}\) with \(A^2\). We assumed the smallest possible parse tree → a contradiction.
Part II: The height of \(A^{1}\overset{*}{\Rightarrow} vxy\) can be at most \(|V|+1\) (since the entire tree can be at most \(|V|+1\), we're assuming \(A^{1}\) is \(S\), the root), so the generated string can have length at most \(b^{|V|+1}=:p\).
Part III: We can generate \(wv^{i}xy^{i}z\) by simply picking \(A^2\) instead of \(A^{1}\) initially (pumping down), or \(A^{1}\) instead of \(A^{2}\) as many times as we like (pumping up)

Proving a Language Isn't Context-free

The strategy for proving a language isn't context-free is more or less the same as proving it isn't regular using the pumping lemma: we

Assume for contraction that language \(\Gamma\) is context-free, and thus that a CFG \(G\) exists for it
Define \(p_G := b^{|V|+1}\), then pick a sufficiently long string \(w\in \Gamma\), i.e. where \(|w| \geq p_G\) such that all splits have some \(i\) where \(uv^{i}xy^{i}z\not\in \Gamma(G)\).

E.g. it can ergonomically be proven that \(\Gamma:=\set{ww \mid w\in \set{0,1}^* }\) is not context free by choosing string \(w:=0^{p}1^{p}0^{p}1^{p}\) (where \(p:=p_G\)).

Proving via closure under operations, while strictly possible, isn't as useful for context-free languages as it is for regular languages because context-free languages aren't closed under intersection. The most useful closure arguments involve intersections of regular languages.

11 - Turing Machines

q_cross and q_checkmark for accept/reject, better underline syntax, pick suitable font for left/right

Models of Computation

Earlier, we generalized from FAs to PDAs by equipping them with a stack. Now, we can further generalize PDAs to a Turing machine by replacing the stack with a more general data structure, i.e. one that can simulate a stack. Specifically, TMs have a tape: an infinite, finitely addressable tract of memory cells that an automaton can read from, write to, and move along in either direction.

Note that the tape is one-sided: its memory addressing starts at \(0\).
Memory access is still local

Just like for PDAs, although their memory is unbounded, Turing machines allow infinite decision problems to be defined finitely. In fact, Turing machine are the strongest such automata:

Church-Turing Thesis (Lite)

Theorem

A Turing machine can simulate any algorithm, mechanical procedure, mechanical derivation, and program.

Coding an algorithm as a Turing machine is the theoretical CS equivalent of writing a computer program in assembly (or by pushing transistors around). In both cases, you are reaching into the entrails of a machine and building your program bit by bit by bit. Designing Turing machines is hard for the same reason that writing assembly is hard.

Turing machines are provably the most low level "programming environments"

We also notice a pattern while designing Turing machines: most programs consists of a search/seek operation, then a write operation, then search/seek, then write, etc.

This is like how all strings in \(\set{0,1}^{*}\) can be broken down into their blocks of \(0\)s and \(1\)s: \(\set{0,1}^{*}=0^{*}(1^{*} 10^{*}0)^{*}1^{*}=1^{*}(0^{*} 01^{*}1)^{*}0^{*}\)
This is seen in physical computers too: interacting with a hard drive involves repeated seek/write operations (CMPUT 379 time)

Turing Machines

Turing Machine (TM)

Definition

A Turing Machine \(M\) is a \(7\)-tuple \(M=(Q, \Sigma, \Gamma, \delta, q_0, q_\text{accept}, q_\text{reject})\) with

A finite set \(Q\) of states, which includes the initial state \(q_0\), the accept state \(q_\text{accept}\), and the reject state \(q_\text{reject}\).
- The accept and reject states are the terminal/halting states
A finite input alphabet \(\Sigma\), which excludes the blank symbol \(\underline{\,}\)
A finite tape alphabet \(\Gamma\subseteq \Sigma\cup\set{\underline{\,}}\)
A transition function \(\delta : (Q \setminus\set{q_\text{accept}, q_\text{reject}})\times \Gamma\to Q\times \Gamma\times \set{\mathsf{L}, \mathsf{R}}\)

Note that we are now reading and writing to/from the same place; this is why \(\Gamma\subseteq \Sigma\). This is a departure from our previous models of computations, which had read-only input.
What separates Turing machines is that they can write to their own input; another instance of self-reference is interesting.

The state of a TM at a moment in time (a configuration) can be described as \(aqb\) where \(a,b\in \Gamma^{*}\) are the strings that list the characters on either side of the read/write head and \(q\in Q\) is the current state. A computation is a "physically valid" sequence of configurations.

Computations begin in state \(q_0\), where input \(w:=\sigma_1\dots\sigma_n\) is written on the tape (the rest of the tape is filled with blanks \(\underline{\,}\)). The read-write head is initially located at \(\sigma_1\).

Turing machine \(M\) accepts string \(w\) if \(M\) can reach configuration \(c_k=uq_\text{accept}v\), for \(k\in \mathbb{N}\) (i.e. \(k < \infty\)) and strings \(u,v\in \Gamma^{*}\). \(M\) rejects if it can reach a configuration \(c_k=uq_\text{reject}v\). If neither of these happen, the computation does not halt on \(w\).

So, instead of dealing with two-valued accept/reject logic, we are now dealing with three-valued accept/reject/did-not-halt logic

Example: \(\Lambda:=\set{w\# w\mid w\in \set{0,1}^{*}}\)

Recognizing and Deciding

Again, for Turing machine \(M\), we define \(\mathcal{L}(M)\) as the language of string accepted by \(M\). So, \(\overline{\mathcal{L}(M)}\) is the language of strings that are either rejected by \(M\) or for which \(M\) doesn't halt.

Turing machine \(M\) decides \(\mathcal{L}(M)\) iff \(M\) halts for all possible input, i.e. for all \(w\in \Sigma^{*}\).

Language \(\Lambda\) is Turing-recognizable if \(\Lambda=\mathcal{L}(M)\) for some Turing machine \(M\). \(\Lambda\) is Turing-decidable if it is recognizes by a Turing machine \(M\) that halts on every \(w\in \Sigma^{*}\supseteq \Lambda\).

We will see that Turing-decidability is stronger than Turing recognizability; there exist language that are Turing-recognizable but not Turing-decidable

Non-deterministic Turing Machines

As with every model of computation we've seen so far, non-determinism is achieved by generalizing the transition structure \(\delta\) from a function to a relation. Indeed, a non-deterministic Turing machine is defined the same way:

Non-deterministic Turing Machine

Definition

A Non-deterministic Turing machine is similar to a Turing Machine, but where the transition function \(\delta\) is replaced with a transition relation \(\delta \subseteq Q \setminus \set{q_\text{accept},q_\text{reject}} \times \Gamma\times Q\times \Gamma\times\set{\mathsf{L},\mathsf{R}}\), i.e. a function \(\delta : Q \setminus \set{q_\text{accept},q_\text{reject}} \times \Gamma\to \mathcal{P}(Q\times \Gamma\times\set{\mathsf{L},\mathsf{R}})\)

Again, like before, computations for nondeterministic Turing machines take the form of trees.

12 - Equivalence of Turing machine models

lots of this wording kinda sucks/isn't concise

The "basic" Turing machine described earlier can simulate any extended Turing machine below. So, when we reason about basic Turing machines, we can augment them with any of these extensions without weakening our statement:

Read-write head can stay at the same place on the tape instead of having to move left or right
Insert and Delete operations on the tape
Unbounded tape in both directions
2D tape (and presumably an \(n\)-D tape by bootstrapping the 1D → 2D bijection)
Non-deterministic Turing machine

Moreover, we can further restrict the basic model without giving up computational power

A Turing machine that can only read and advance, with a working tape ??
A Turing machine with only two tape symbols, i.e. where \(|\Gamma|=2\)

Extensions and Their Models

Adding a \(\text{STAY}\) Action

Every time we see a stay action, we simulate it with a left \(\mathsf{L}\) movement, then a right \(\mathsf{R}\) movement. The initial left movement writes the new symbol to the initial space; the right movement doesn't draw a new symbol.

Adding Insert/Delete

At any given point, we have passed some finite number \(n\) of configurations; our input string \(w\) has some finite length \(|w|=m\) as well. So, at every point in a computation, there is some \(M \leq w + m\) such that every symbol past address \(M\) on the tape is a blank.

To insert, we loop from our current position. Each iteration, we keep track of the current symbol and overwrite it with the previous one, moving back then forward to keep track of the symbols. Eventually, we reach \(M\); we loop back to our original position without changing everything.

To delete, we follow a similar process, but we replace the current symbol with the following one. This also requires taking two steps forward and one step back each iteration.

Adding \(k\) Different Tapes

There are a few ways we could go about this.

Idea \(1\): we define some delimiter symbol \(\#\) to separate the different tape models on our single tape. We also define a symbol \(\nabla\) that denotes the current position of our simulated read/write head. Each time we want to make a change to a tape, we

Navigate to the position in the corresponding tape where our read/write head is (i.e. recover the initial config)
Operate on the tape. If we reach the end of the tape, we use our insert process (as defined earlier) to add another symbol

Idea \(2\): Define tape \(1\) as the cells of the tape equal to \(1\) \(\mod k\), tape \(2\) as the cells equal to \(2\) \(\mod k\), etc. Then, for a given tape, travel to the correct offset and navigate left and right by moving \(k\) spaces.

Strictly speaking, idea \(1\) is stronger because it can simulate an unbounded/arbitrary/changing number of tapes, whereas idea \(2\) can only support a fixed \(k\).

Nondeterministic Turing Machine

As we have seen, non-deterministic Turing machines have configuration trees. Briefly put, we can traverse this configuration tree with breadth-first search. Although this tree may have infinite depth, it has finite width (is this the term?), so BFS will reach every possible configuration eventually.

We can simulate a non-deterministic Turing machine \(\tilde{M}\) with \(3\) tapes:

A copy of the input tape (never gets mutated; along for the ride)
A copy of \(\tilde{M}\)'s tape in the current branch
An encoding of the current location in \(\tilde{M}\)'s tree

Each loop, we increment the location in the tree according to BFS. Then, we simulate \(\tilde{M}\) from the point where we are currently located. If our simulation accepts, we return \(\text{accept}\); if our simulation rejects, we move on to the next location in the tree.

13 - Models of Computation

We've seen that a basic Turing machine can simulate any program with access to infinite, finitely addressable external storage, which is a reasonable abstraction for familiar kinds of computing systems. We've also seen that a basic Turing machine can simulate any algorithm, i.e. any mechanical procedure.

Now, we want to know:

What are algorithms capable of?
What aren't algorithms capable of?

This will help us understand the corresponding limits of Turing machines via the equivalence established by the Church-Turing thesis.

Levels of Abstraction

Describing Turing machines is the theoretical CS equivalent of writing machine code from scratch; it is the lowest possible level to describe a procedure. But, just like programmers (generally) choose not to express their ideas in assembly, we may opt to use higher-level descriptions of algorithms instead, as allowed by the Church-Turing thesis.

Low-level: describing a Turing machine directly as an "algebraic structure"¹.
- E.g. "consider the Turing machine \(M:=(Q,\Sigma,\Gamma, \delta, q_0,q_\text{accept},q_\text{reject})\) where we define \(M\) as …".
- Analogy: machine code
Intermediate-level: describing how a Turing machine might be implemented and what it does.
- E.g. "associate each state with each natural number, then add \(k\) auxiliary states to …"
- Analogous to low-level programming
- Aside: this is often written as a list of imperatives, much like low-level programs are
High-level: describing an algorithm textually
- No analogy here, this is pseudocode/high-level programming

We assume that all inputs are strings and that any sort of object we might wish to use as input (e.g. a number, another program, etc) can be described as a string. Given object \(O\), the string description of \(O\) is denoted \(\langle O \rangle\).

We can also describe a sequence of objects: \(\langle O_1, O_2, \dots, O_n \rangle\).
Note: descriptions \(\langle O \rangle\) are used to formalize the idea that infinite behavior can be described finitely. We are using this finite description to "stand for" a (possibly infinite) language.

Concentric Models of Computation

As we've seen each new family of models of computation (first finite automata, then pushdown automata, then Turing machines), we've noticed that each new model can simulate the previous one. This leads concentric sets of corresponding languages; a widely-accepted (sub)-taxonomy is known as the Chomsky Hierarchy.

better disambiguation about chomsky hierarchy

14 - Decidable Problems

Recall: a problem is decidable iff a halting algorithm exists that will accept or reject it.

Meta-Language Classes

Earlier, we constructed languages as a structure to collect and reason about sets of strings (sometimes with respect to a global alphabet \(\Sigma^{*}\)). A striking feature of theoretical CS is how the same structural patterns are repurposed over and over: equivalence, abstraction, reduction. So, reasoning about language classes and meta language classes are natural progressions.

A language class (or model of computation) \(\mathcal{M}\) is a set of languages. Language classes are an effective way to refer to sets of all possible types of structures.

E.g. \(\mathcal{M}=\text{DFA}\) is the set of all descriptions of DFAs

A meta language class (with respect to \(\mathcal{M}\)) is a language classes of descriptions of languages that have some property with respect to \(\mathcal{M}\).

Symbol	Name	Set construction
\(\text{A}_\mathcal{M}\)	Acceptance problem for \(\mathcal{M}\)	\(\set{\langle {M},w \rangle\mid M\in \mathcal{M} \text{ accepts or generates }w}\)
\(\text{E}_\mathcal{M}\)	Emptiness testing problem for \(\mathcal{M}\)	\(\set{\langle {M}\rangle\mid M\in \mathcal{M} \text{ has } L(M) = \varnothing}\)
\(\text{EQ}_\mathcal{M}\)	Equivalence testing problem for \(\mathcal{M}\)	\(\set{\langle {A},B \rangle\mid A,B\in \mathcal{M} \text{ such that } L(A)=L(B)}\)
\(\text{ALL}_\mathcal{M}\)	All string testing problem for \(\mathcal{M}\)	\(\set{\langle {M}\rangle\mid M\in \mathcal{M} \text{ has } L(M) = \Sigma^*}\)

We kind of assume that all the languages in a language class have the same alphabet \(\Sigma\). This is a reasonable assumption for the kinds of languages we will be dealing with (i.e. descriptions of structures).

Decidability of Language Classes

We cover decidability results for languages classes pertaining to structures we've already seen. Below, we assume \(M\) is an arbitrary "member" of the language class. We also assume that we've checked if \(M\) is well-formed; trivially, we can use a TM to check this before the principal computation

Theorems: Finite Automata and Regular Expressions

\(\text{A}_\text{DFA}\) is decidable

Theorem

The acceptance problem for DFAs is decidable, i.e. given a (description of a) DFA and a string, we can decide whether it accepts or rejects the string.

Since DFAs must halt (i.e. when they have consumed all their input), we can simply use a TM to simulate \(M\) on \(w\) given a description \(\langle M,w \rangle\). If this simulation ends on an accept state, the TM accepts. Otherwise, it rejects.

We simulate by building a model of \(M\) from \(\langle M \rangle\) in our TM, then keeping track of \(M\)'s current state and the current input position in \(w\).

\(\text{A}_\text{NFA}\) is decidable

Theorem

The acceptance problem for NFAs is decidable, i.e. given a (description of a) NFA and a string, we can decide whether it accepts or rejects the string.

We construct a TM that converts the NFA to a DFA (\(O(2^{n})\)), then inherit from the fact that \(\text{A}_\text{DFA}\) is decidable.

\(\text{A}_\text{RE}\) is decidable

Theorem

The acceptance problem for regular expressions is decidable, i.e. given a regular expression and a string, we can decide whether it matches to the string.

Again, we construct a TM that converts the regular expression to an equivalent DFA, then inherit from the fact that \(\text{A}_\text{DFA}\) is decidable.

\(\text{E}_\text{DFA}\) is decidable

Theorem

The emptiness problem for DFAs is decidable, i.e. given a (description of a) DFA, we can decide whether or not the language it describes is \(\varnothing\).

Essentially, we want to search all the possible paths from the initial state \(q_0\) and see if any end on accepting states. This is done by marking \(q_0\), then iteratively marking every state that can be reached from \(q_0\) by consuming input until a fixpoint is reached. If (at some point) we reach an accepting state, we reject. Otherwise, if we reach a fixpoint without accepting states, we accept.

\(\text{EQ}_\text{DFA}\) is decidable

Theorem

The equivalence problem for DFAs is decidable, i.e. given a (descriptions of) two DFAs, we can decide whether or not they describe the same language.

Given \(\langle A,B \rangle\), we construct new DFA \(C\) that decides \(L(A)\,\triangle\, L(B)\), where \(\triangle\) is the symmetric difference operator (set-theoretic XOR) defined by \(S \,\triangle\, T := (S\cup T) \setminus (S\cap T)\); such a \(C\) must exist because regular languages are closed under union, intersection, and complement. Then, since \(\text{E}_\text{DFA}\) is decidable, we decide whether \(L(C)=\varnothing\). Since \(L(C)=\varnothing\iff L(A)=L(B)\) by construction, we have decided either \(L(A)=L(B)\).

Theorems: Pushdown Automata and Context-free Grammars

Bound on derivation size for a CFGs in CNF

Lemma

Let \(\tilde{G}\) be a context-free grammar in CNF. Then, for any \(w\in L(\tilde{G})\) where \(|w| =:n > 0\), exactly \(2n-1\) substitutions are required to derive \(w\) using \(\tilde{G}\).

Intuition: Ignore the trivial \(S\to \varepsilon\) case for brevity. Grammars in CNF have rules of the form \(A\to BC\), \(A\to a\). So, to derive a string of length \(n\), we need \(n-1\) rules of the form \(A\to BC\) to "allocate" the length of the string, and \(n\) rules of the form \(A\to a\) to "populate" that string with terminals.
Aside: I think we proved this as part of an assignment question

\(\text{A}_\text{CFG}\) is decidable

Theorem

The acceptance problem for CFGs is decidable, i.e. given a CFG and a string, we can decide whether it string can be derived in the CFG.

Note that we can't just try every derivation, since there could be infinitely many. If that were the case, our procedure wouldn't halt for \(w\not\in L(G)\).

Instead, we derive \(G\) from description \(\langle G \rangle\), then convert it into its Chomsky normal form (denote with \(\tilde{G}\)). From the bound introduced above, we know our derivation has exactly \(2n-1\) steps, so we essentially search a tree of bounded width (since the CFG is finite) of (bounded) height \(2n-1\). If we find a path that generates our input string, we accept; otherwise, we reject.

\(\text{A}_\text{PDA}\) is decidable

Theorem

The acceptance problem for PDAs is decidable, i.e. given a description of a PDA and a string, we can decide whether it accepts or rejects the string.

Again, we can't just simulate the PDA directly, since PDAs don't always terminate. Instead, we construct the PDA's corresponding CFG, then inherit from the fact that \(\text{A}_\text{CFG}\) is decidable.

\(\text{E}_\text{CFG}\) is decidable

Theorem

The emptiness problem for CFGs is decidable, i.e. given a CFG, we can decide whether or not the language it describes is \(\varnothing\).

Again, we can't just search through strings, since we wouldn't halt if the language is indeed empty.

Instead, we proceed similarly to how we determined \(\text{E}_\text{DFA}\). We mark all of the terminals, then iteratively mark any rules whose bodies consists entirely of marked atoms, repeating until we reach a fixpoint. If the initial rule \(S\) has been marked, we reject; otherwise, we accept.

Aside: performing iterative "marking" procedures ??

Note: We can't use our union/intersection/complement proof strategy to determine if \(\text{EQ}_\text{CFG}\) is decidable, since context-free languages aren't generally closed under intersection and complement. In fact, \(\text{EQ}_\text{CFG}\) is unrecognizable.

15 - Undecidable and Unrecognizable Languages

So far, we've proven that meta language classes are decidable. We will see examples and proofs of languages that aren't decidable.

Undecidability of Language Classes

Theorems: Turing Machines

\(\text{A}_\text{TM}\) is not decidable (i.e. is undecidable)

Theorem

The acceptance problem for Turing machines is undecidable, i.e. given a description of a Turing machine and a string, it is not generally possible to decide whether the Turing machine accepts or rejects the string.

This is implied by the Halting problem; more on that below

Proof: assume towards a contradiction that \(\text{A}_\text{TM}\) is decidable. Then, there exists TM \(D\) that decides whether \(\langle M,w \rangle\in \text{A}_\text{TM}\). Define TM \(\text{Evil} : \langle M \rangle\mapsto \lnot D(\langle M, \langle M \rangle \rangle)\), which returns the opposite of output of \(D\) when run with some \(\langle M, \langle M \rangle \rangle\) as input. Consider \(\text{Evil}(\langle \text{Evil} \rangle)\), which reduces to \(\lnot D(\langle \text{Evil}, \langle \text{Evil} \rangle \rangle)\).

If \(\text{Evil}\) accepts input \(\langle \text{Evil} \rangle\), then \(D(\langle \text{Evil}, \langle \text{Evil} \rangle \rangle)\) accepts, and thus \(\text{Evil}\) rejects \(\langle \text{Evil} \rangle\) by construction, a contradiction
Elsewise, if \(\text{Evil}\) rejects input \(\langle \text{Evil} \rangle\), then \(D(\langle \text{Evil}, \langle \text{Evil} \rangle \rangle)\) accepts and thus \(\text{Evil}\) rejects \(\langle \text{Evil} \rangle\) by construction, a contradiction

Thus, by contradiction, no such \(D\) can exist. Thus, \(\text{A}_\text{TM}\) is undecidable.

We can view this proof as a diagonalization argument. We enumerate all (countably many) machine descriptions \(\langle M_1 \rangle, \langle M_2 \rangle, \dots, \langle\text{Evil} \rangle, \dots\) as the inputs (columns), and machines \(M_1, M_2, \dots, \text{Evil}, \dots\) as the columns. As we will in the columns, we see the row corresponding to \(\text{Evil}\) has entries \(\lnot D_{k,1}, \lnot D_{k,2}, \dots\). We will be unable to fill in the entry for \([\langle \text{Evil} \rangle, \text{Evil}]\).

\(\text{A}_\text{TM}\) is recognizable

Theorem

The acceptance problem for Turing machines can be recognized, i.e. given a description of a Turing machine and a string, it is possible to describe an algorithm that accepts the string if and only if the TM accepts the string.

I.e. it is possible to construct a universal Turing machine.

Proof: trivially, given the description \(\langle M \rangle\), we construct a TM that simulates \(M\) on input \(w\). If this simulation hits an accept state we accept, and if it hits a reject state, we reject. By the definition of "recognizability", we don't need to worry about the case where the simulation of \(M\) doesn't terminate.

We say Turing machines are universal.

Co-recognizability

A language \(L\) is co-recognizable if its complement \(\overline{L}\) is recognizable.

Decidable iff recognizable and co-recognizable

Theorem

Language \(L\) is decidable if and only if \(L\) is recognizable and co-recognizable

\(\implies\): Clearly, if \(L\) is decidable, we can recognize it. We can also co-recognize it by construction, since decidable languages are closed under complement.
\(\impliedby\): If \(L\) is recognizable and co-recognizable, there exist TMs that simulate both \(L\) and \(\overline{L}\). Thus, we can construct a TM that simulates both in parallel. By definition of recognizability, one will eventually accept, so our machine can accept or reject accordingly.

As a corollary:

\(\overline{\text{A}_\text{TM}}\) is not recognizable (i.e. not recognizable)

Theorem

It is not possible to recognize whether a TM does not accept a given string.

Proof: if \(\overline{\text{A}_\text{TM}}\) were recognizable, it would also be decidable since it is co-recognizable. This would imply that \(\text{A}_\text{TM}\) is also decidable, clearly a contradiction.

Proving Undecidability Using Reduction

If language \(\Lambda\) is known to be undecidable, and a decider for language \(\Gamma\) can be adapted to decide \(\Lambda\), then we can conclude (by contradiction) that \(\Gamma\) must also be undecidable. This is a proof by reduction.

A notable use of this structure of argument is the definition for NP-completeness: a problem is NP complete iff it is both NP and any NP problem can be reduced to it in polynomial time. We will learn more about this later.

We often reduce to the Halting problem since it is the canonical undecidable problem.

Theorems: Turing Machines

We define the halting problem over a meta-language of machine descriptions \(\star{\mathcal{M}}\) as the problem \(\text{HALT}_{\star{\mathcal{M}}} :=\set{\langle M,w \rangle \mid M\in \star{\mathcal{M}} \text{ and } M \text{ halts on } w}\).

\(\text{HALT}_\text{TM}\) is undecidable

Theorem

It is not possible to decide the halting problem for Turing machines, i.e. given a description of a Turing machine and an input, it is not possible to decide whether the TM halts on the input.

Proof: we reduce to \(\text{A}_\text{TM}\), which we know is undecidable. Assume towards a contradiction that Turing machine \(H\) decides the halting problem. Then, we construct \(D\) that takes input \(\langle M,w \rangle\) as follows:

Call \(H\) with input \(\langle M,w \rangle\). By assumption, \(H\) will eventually accept or reject.
If \(H\) accepts, we simulate \(M\) on \(w\), which much halt by definition. If \(M\) accepts, our TM accepts, and if \(M\) rejects, our TM rejects.
If \(H\) rejects, we reject.

We have just constructed a machine that decides \(\text{A}_\text{TM}\), a contradiction. Thus, \(H\) cannot exist as described. Thus, the halting problem is undecidable.

\(\text{E}_\text{TM}\) is undecidable

Theorem

The emptiness problem for Turing machines is undecidable, i.e. we cannot generally decide whether the language described by a Turing machine is empty.

Proof: we reduce \(\text{A}_\text{TM}\) to \(\text{E}_\text{TM}\). Towards a contradiction, let \(E\) be a decider for \(\text{E}_\text{TM}\) that takes input \(\langle M,w \rangle\). Construct TM \(D\) also taking input \(\langle M,w \rangle\) that:

Constructs a temporary TM \(M_w(x)\) that rejects if \(x\ne w\) and returns \(M(w)\) otherwise. Then, run \(E\) on \(\langle M_w \rangle\).
Run \(E\) on \(\langle M_w \rangle\); this must halt by assumption
If \(E\) accepts \(\langle M_w \rangle\), then \(L(M_w)=\varnothing\), so it must be by construction that \(M\) doesn't accept \(w\). So, \(D\) will reject.
Otherwise, if \(E\) rejects \(\langle M_w \rangle\), then \(w\in L(M_w)\). So, by construction, \(M\) must accept \(w\), and thus \(D\) will accept.

This decides \(\text{A}_\text{TM}\), a contradiction. So, \(\text{E}_\text{TM}\) is undecidable.

\(\text{EQ}_\text{TM}\) is undecidable

Theorem

The equality problem for Turing machines is undecidable, i.e. we cannot generally decide whether two Turing machines describe the same language.

Proof: we reduce \(\text{E}_\text{TM}\) to \(\text{EQ}_\text{TM}\). Given decider \(D_\text{EQ}\) for \(\text{EQ}_\text{TM}\), we construct a decider for \(\text{E}_\text{TM}\) by first constructing a null Turing machine \(N(x)\) that rejects all input (i.e. such that \(L(N)=\varnothing\)). Then, we use \(D_\text{EQ}\) to test equality between our input \(M\) and \(N\) (i.e. we give \(D\) input \(\langle M,N \rangle\)). Clearly, we have built a decider for \(\text{E}_\text{TM}\), which we know to be undecidable. So, \(\text{EQ}_\text{TM}\) is undecidable by reduction.

Rice's Theorem

Notice how it seems like any meta language of Turing machine descriptions falls apart under self-reference. This isn't a coincidence, but a general, provable fact about semantics.

A language \(P\) is a nontrivial language over Turing machine descriptions if the following apply:

Non-triviality: \(\varnothing\subset P \subset \set{\text{the set of all Turing machine descriptions}}\) (this are strict subsets)
Correctness: If string \(w\) is not a valid description of a Turing machine, then \(w \not\in P\)
Semantic invariance: If \(L(M_1)=L(M_2)\), then \(\langle M_1 \rangle\in P\) if and only if \(\langle M_2 \rangle\in P\).

Rice's Theorem

Theorem

Any nontrivial language over Turing machine descriptions is undecidable.

Proof

Proof: Towards a contradiction, let \(D_P\) be a decider for \(P\). By non-triviality, there exist Turing machines \(I\) and \(E\) where \(\langle I \rangle\in P\), \(L(I)\ne \varnothing\) and \(\langle E \rangle\not\in P\), \(L(E)=\varnothing\) (if \(E\in P\), then assign \(E\) to be a decider for \(\overline{P}\)). We construct a decider \(D_\text{A}\) for \(\text{A}_\text{TM}\) that takes input \(\langle M,w \rangle\) as follows:

Construct a temporary TM \(M_w(x)\) that simulates \(M\) on \(w\), rejecting if \(M\) rejects \(w\) and returning \(I(x)\) (i.e. calling \(I\) on \(x\) and returning the result) if \(M\) accepts \(w\).
Computes the description \(\langle M_w \rangle\) from \(M_w\) and simulates \(D_P\) with \(\langle M_w \rangle\) as input, returning \(D_P(\langle M_w \rangle)\).

We see that:

If \(M\) accepts \(w\), then \(M_w\) accepts \(x\) by construction, and thus \(I\) must have accepted \(x\). By generality of \(x\), we have \(L(M_w)=L(I)\). Since \(\langle I \rangle \in P\), \(D_P\) accepts \(\langle I \rangle\), and thus \(\langle M_w \rangle\) by semantic invariance. So, by construction, \(D_\text{A}\) will accept \(\langle M,w \rangle\).
If \(M\) does not accept \(w\), then \(M_w\) doesn't accept any \(x\) since it rejects immediately. Thus, by construction, \(L(M_w)=\varnothing=L(E)\). So, by semantic invariance, since \(E\not\in P\), \(D_P\) must reject \(\langle M_w \rangle\), and thus \(D_\text{A}\) rejects \(\langle M,w \rangle\).

So, \(D_\text{A}\) decides \(\text{A}_\text{TM}\), a contradiction of the fact that \(\text{A}_\text{TM}\) is not decidable. Thus, \(D_P\) cannot exist as described, and thus \(P\) must be undecidable. Finally, by generality, any nontrivial language over Turing machine descriptions is undecidable.

Interpretation

Rice's theorem shows that every nontrivial property of a program is undecidable. This generalizes all the undecidability results we've seen (including the halting problem) since meta language classes are defined by non-trivial properties. In doing so, Rice's theorem establishes a boundary between the syntax and semantics of a program: syntactic properties are decidable, while semantic ones aren't.

So, static analysis will never be able to determine whether a program is correct, since that is a semantic property
However, static analysis can handle syntax. We'd be able to learn how if CMPUT 416 were still offered :(

Rice's theorem Wikipedia page

16 - Recursive Functions

What if we did Turing machines, but worse?

Our models of computation so far have been based on Turing machines and strings; this was Turing's innovation that cleaned everything up. But, we will also explore the earlier model based on the naturals \(\mathbb{N}\) and recursive functions instead.

Primitive Recursive Functions

A function is computable if there exists an algorithm to calculate that calculates the output given an input. In this sense, all algorithms are computable functions. But, since there are countably infinitely many algorithms and uncountably many functions \(f : \mathbb{N}\to \mathbb{N}\), there exist uncomputable functions.

Every computable function \(f : \mathbb{N}^{d}\to \mathbb{N}\) can be build from the following atomic operations:

Name	Definition
Zero function	\(0(x):=0\) for all \(x\)
Successor function	\(S(x) := x+1\) for all \(x\)
Identity function	\(\text{Id}(x)=x\) for all \(x\)
Projection/indexing	\(\pi_i(x_1, \dots, x_n):=x_i\) for all \(\vec{x}\)
Composition	\((f\circ g)(x)=f(g(x))\) for all \(x\)

A primitive recursive function \(h\) is any function \(h : \mathbb{N}^{a}\to \mathbb{N}^{b}\) that can be described as a base case of the form \(h(x,0) :=f(x)\) and a recursive case of the form \(h(x,S(y)) := g(x,y,h(x,y))\) for some functions \(f,g\).

\(g\) simply handles whatever combination of \(x\), \(y\), and \(h(x,y)\) form the recursive case
So many connections here! Peano axioms, lambda calculus, combinatory logic etc
Aside: we see some quality pattern matching here with the \(h(x,S(y))\) definition. This actually gets done in functional and logic programming. Take CMPUT 325 :)
Implementations of arithmetic functions with primitive recursion are in the [[Appendix]]
Functions from multiple variables can be seen as single-variable functions of vectors

Primitive recursion is equivalent to a programming language with for loops, which are sometimes easier to work with. We see \(\implies\) by implementing the recursion above in imperative pseudocode:

// what is this syntax. python is rotting my brain
function h(x, i):
    z := f(x)
    if (i > 0):
        for y := 0 to i - 1
            z := g(x, y, z)
    return f(x)

Here, we treat for loops as loops with bounds where the incremented variable behaves monotonically. As a result, for loops always terminate.
Again, there's a lot to work with here in terms of recursion vs. loops

Limits of Primitive Recursion

The fact that we only use (terminating) for loops should raise a red flag. We've learned that any universal model of computation will have a non-halting program. So, by contrapositive, since we know any for loop program (and by equivalence primitively recursive program) must terminate, primitive recursion cannot be a universal model of computation. No for loop program can simulate all for loop programs.

Symbolically, if we have universal program \(U\) such that \(U(f,x)=f(x)\) for all \(x\in \mathbb{N}\) and primitive-recursive \(f : \mathbb{N}\to \mathbb{N}\), then \(U\) cannot be primitive-recursive.

Aside: the functional equation \(U(f,x)=f(x)\) is a nice characterization of what a universal simulator is

Non-primitive Recursion

The Ackermann Function

Are all computable, halting functions primitive? Turns out, nope!

We define the Ackermann function(s) as \(A_1(x,y)=x+y=x+1+\dots + 1\), \(A_2(x,y)=x\times y=x+ \dots + x\), \(A_3(x,y)=x^{y}=x\times \dots\times x\), etc. More generally, we have the recursive definition:

Index base case: \(A_0(x,y)=y+1\)
Argument base case: \(A_n(x,0)=\) \(x\) (for \(n=1\)), \(= 0\) (for \(n=2\)), and \(1\) otherwise
Recursive case: \(A_n(x,y)=A_{n-1}(x, A_n(x,y-1))\)

Clearly, the Ackermann function is primitive recursive for constant \(n\); we just defined it (primitively) recursively. But, this is not the case if we let \(n\) be an argument.

Theorem

Theorem

The function \(f :\mathbb{N}^{2}\to \mathbb{N}\) given by \(f(x,y):=A_y(x,y)\) is not primitive recursive, even though it halts and is computable for all inputs.

Proof sketch: It is possible to show that for any primitive recursive \(f\), there exists an \(n\) such that \(f(y)<A_n(2,y)\) for all \(y\geq 3\) (exercise left to the reader). Then, if \(A_y(2,y)\) were primitive recursive, \(n\) would exist such that \(A_y(2,y)<A_n(2,y)\) for all \(y\geq 3\). So, \(y=n\) yields a contradiction (???).
I think I'm missing something here
So, this function essentially grows too fast to be primitively recursive

Partial/\(\mu\) Recursion

As we remove constraints, the expressivity of our model increases.

We define a new constructor \(\mu(x):=\min\set{y\mid f(x,y)=0}\) given function \(f(x,y)\). This may be a partial function because an algorithm to find this minimum may not halt, i.e. when no such \(f(x,y)=0\) exists. A partially recursive function is a function with the same structure as \(\mu(x)\) (formalize).

Clearly, we can't implement \(\mu\) with for loops, but we can implement them with while loops, namely one with the condition f(x,y) =/= 0.

Any program with while loops can be expressible in terms of \(\mu\); \(\mu(x)\) itself is the canonical while-loop program

Hierarchy of Recursion

We observe the following hierarchy of strict inclusion for types of recursive functions:

Primitive recursive functions, equivalent to for loop programs
Total recursive functions, equivalent to halting while loop programs
- Strict inclusion suggests an interesting result: there are programs that can be expressed with halting while loops, but not for loops.
Partial recursive functions, equivalent to while loop programs

Importantly, partial recursion is computationally equivalent to a Turing machine. By corollary, all algorithms can be expressed as while loop programs.

Aside: The "while loops are equivalent to recursion" becomes concrete in the study of computer architecture. Notably, tail-call optimization in functional programming languages essentially re-writes assembly code for tail-recursive functions into while loops.

Function	Loop type	Automaton	Language	Description
Primitive recursive	For loops	Finite automaton	Regular language	Regular expression
Total recursive	Halting while loops	Pushdown automaton	Context-free language	Context-free grammar
Partial recursive	While loops	Turing machine	Turing-recognizable / recursively enumerable	Turing machine description

Universal Partial Recursion and Gödel Numbering

It seems feasible that partial recursive functions could be universal. However, we first need to arithmetize the syntax of functions (i.e. describe a way to encode functions as numbers) so that we can pass them to the universal function as input. It is much less trivial to do this for natural numbers than strings.

We define the Gödel numbering function \(g : (\mathbb{N}\to \mathbb{N})\to \mathbb{N}\) that maps a function \(f\) to its Gödel number \(\bar{f}\). These numbers are encoded as powers of prime numbers, i.e. along the prime factor \(\mathbb{N}\to \text{multisets}\) bijection. \(\bar{f}\) is defined as:

\(f=0\): \(7\)
\(f=S\): \(11\)
\(f=\text{Id}\): \(13\)
\(f=g\circ h\): \(2\times 3^{\bar{g}} \times 5^{\bar{h}}\)
\(f\) is derivable from \(g\) and \(h\) by primitive recursion: \(2^{2}\times 3^\bar{g}\times 5^\bar{h}\)
\(f=\min\set{y\mid g(x,y)=0}\): \(2^{3}\times 3^\bar{g}\)
We encode sequences \(w=(x_1, \dots, x_n)\) as \(\bar{w}=p_1^{x_1}\times \dots \times p_n^{x_n}\) for new primes \(p_1, \dots, p_n\).
We encode natural numbers as themselves

I.e. the encoding recursively follows the structures of primitive and partial recursion. Since prime factorizations are unique (fundamental theorem of arithmetic), we can always recover the original structure.

We claim the following:

The configuration of \(f\) on \(x\) can be encoded as a unique number
The next configuration of \(f\) on \(x\) can be computed with a primitive recursive function
\(f\) on \(x\) can be simulated with a recursive function \(U\)

==What is a configuration of \(f\) on \(x\)? Also, proofs?==

Kleene Normal Form Theorem

Theorem

There exist primitive recursive function \(f,g\) such that every partial recursive function \(h\) can be expressed as \(h(x)=g(x,\min\set{y\mid f(p,x,y)=0})\) for some \(p\in \mathbb{N}\).

By corollary, any while loop program can be expressed as a single while loop.

It turns out that for and while loops together form a Turing-complete model computation (1936). So, all programs can be written with just for and while loops (i.e. just with while loops, since a while loop can simulate a for loop).

17 - Undecidability and Incompleteness

Logical Systems

Mathematical logic employs mechanically checkable reasoning to determine and prove mathematical statements. Different logical systems may be used (commonly, first-order logic), but all logical systems require:

A language for expressing statements and formulae
Rules for reasoning
Method to determine the truth or falsity of a statement

Language

The language of FOL statements can be expressed as a context-free grammar with variables (e.g. \(x\)) and constants (e.g. \(\pi\)) as terminals, and boolean operators (e.g. \(\lnot\)), functions (e.g. \(+\)), relations (e.g. \(=\)), parentheses, and quantifiers (\(\forall, \exists\)) as nonterminal rules and/or parts of nonterminal rules. Thus, every formula is a tree.

Reasoning

A proof in FOL is a finite, directed, tree (maybe a DAG actually?) where vertices are sentences (statements without free variables) and (directed) edges correspond to reasoning rules. An edge joins two sentences when one is derivable from another via a valid reasoning rule. Thus, the leaves of the tree are premises and the root is the conclusion.

For set of premises \(\Gamma=\set{\gamma_1,\dots,\gamma_n}\), we write \(\Gamma\vdash \sigma\) iff a valid proof tree exists with \(\sigma\) as a root and leaves in \(\Gamma\).
Roughly, \(\vdash\) represents derivability

Since there are countably many sentences and proof trees, we can enumerate them mechanically to generate a set of provable consequences. This, the language \(C(\Gamma) :=\set{\sigma\mid \Gamma\vdash \sigma}\) is recognizable.

Aside: what are these rules exactly? link CS 245 notes.

Rules for reasoning are syntactic.

Semantics

We need to define a way to interpret the strings in our language.

Model

Definition

A model \(M\) is a universe \(U\) such that all constants are in \(U\), all function symbols are bound to functions on \(U\), and all relation symbols are bound to relations on \(U\) (i.e. subsets of \(U^n\) for an \(n\)-ary relation).

Given a model, the truth or falsity of a FOL sentence is well-defined.

If sentence \(\sigma\) is true under some model \(M\), we denote \(\vDash_M \sigma\). We define the theory \(\text{Th}(M)\) of \(M\) as the language of all statements made true by \(M\), i.e. \(\text{Th}(M):=\set{\sigma\mid \, \vDash_M \sigma}\). Finally, we write \(\Gamma\vDash \sigma\) if we have \(\vDash_M \Gamma\) implies \(\vDash_M \sigma\) under any model \(M\).

Roughly, \(\vDash\) represents (semantic) correctness

With \(\vdash\) and \(\vDash\), we can now describe two properties of logical systems:

Soundness: Every statement that can be derived is correct, i.e. \(\Gamma\vdash \sigma\implies \Gamma\vDash \sigma\)
Completeness: Every correct statement can be derived, i.e. \(\Gamma\vDash \sigma\implies \Gamma\vdash \sigma\)

The Big Questions

Is it possible to find proofs mechanically?

Of course, once we find one, checking it mechanically is easy
If \(\sigma\in C(\Gamma)\), then we find it eventually by enumerating consequences of \(\Gamma\). But what if \(\sigma\not\in C(\Gamma)\)?
We are speaking formally here. Of course, the heuristics that go into actually generating proofs is an interesting AI search problem

Is there a formal procedure for deciding truth or falsity?

I.e, given \(M\), can we mechanically decide \(\text{Th}(M)\)?
If we axiomatize \(M\) by formulating \(\Gamma\) such that \(C(\Gamma)=\text{Th}(M)\), we decide \(\text{Th}(M)\) by checking (in parallel) if \(\sigma\) or \(\lnot \sigma\) is in \(C(\Gamma)\) (exactly one must be).
However, in the general case, it doesn't look good since truth is a semantic property, and so Rice's theorem likely applies

Results about Arithmetics

Define model \(A_+\) to be the arithmetic with \(+\), \(\mathbb{N}\) and relations \(<, \leq, =, \geq, >\). Define \(A_{+,\times}\) as \(A_+ \cup \set{\times}\).

\(\text{Th}(A_+)\) is decidable

Theorem

Given an expression in \(A_+\), it is possible to decide whether it is true or false.

Proof sketch: no self-reference is allowed, each time we evaluate a \(+\), we monotonically reduce the number of terms. So, eventually we will a point where our expression is a single inequality, which we can decide.

\(\text{Th}(A_{+,\times})\) is not decidable, i.e. undecidable

Theorem

Given an expression in \(A_{+,\times}\), it is not generally possible to decide whether it is true or false.

Proof sketch: we can reduce deciding \(\text{A}_\text{TM}\) to deciding \(\text{Th}(A_{+,\times})\) by constructing a formula \(\phi\) (based on some Turing machine description \(\langle M \rangle\) and \(w\)) that is in \(\text{Th}(A_{+,\times})\) if and only if \(M\) accepts \(w\). Specifically, \(\phi\) checks if \(x\) specifies a sequence of configurations of \(M\) on \(w\) that ends with an accepting state, using the Gödel numbering scheme.

This indicates the general result that any system with the capacity to implement/simulate Gödel numbering must inherit undecidability.

Model \(M\) is axiomatizable if there exists set of premises \(\Gamma\) such that \(C(\Gamma)=\text{Th}(M)\), i.e. if we can describe \(M\) as a set of axioms \(\Gamma\).

\(\text{Th}(A_{+,\times})\) is not axiomatizable

Theorem

It is not possible to construct a set of axioms for \(A_{+,\times}\).

Proof: if some \(\Gamma\) existed such that \(C(\Gamma)=\text{Th}(A_{+,\times})\), then we could decide \(\text{Th}(A_{+,\times})\) by checking in parallel, for a given \(\sigma\), if \(\sigma\in C(\Gamma)\) or if \(\lnot\sigma\in C(\Gamma)\). By assumption that \(C(\Gamma)=\text{Th}(A_{+,\times})\), we'd have \(\sigma\in \text{th}(A_{+,\times})\iff \sigma\in C(\Gamma))\).

As a corollary, if we have sound axioms \(\Gamma\subseteq \text{Th}(A_{+,\times})\), then there must exist some \(\sigma\in \text{Th}(A_{+,\times})\) such that \(\sigma\not\in C(\Gamma)\).

18 - Asymptotics and Algorithm Efficiency

Thus far, all terminating problems have been labelled "decidable", but we can reason further about "how long" a decidable program might take to terminate and "how much space" it would use. We've neglected such measures when designing our algorithms, and now we must pay for our sins.

Complexity

We define the worst-case runtime for Turing machine \(M\) deciding language \(L\) as a function \(t:\mathbb{N}\to \mathbb{N}\) where \(t(n)\) is defined as the maximum number of steps it takes \(M\) to halt on an input string \(w\) of length \(n\).

This definition is concrete, but unnecessarily jagged: the precise definition of \(t(n)\) might be chaotic and categorically difficult to compute. To smooth things out, we partition worst-case runtime functions into classes by their asymptotic efficiency, i.e. their behavior as \(n\to\infty\).

Common Asymptotics Definitions

Definition

We have \(f(n)\in O(g(n))\) when there exists \(n_0<\infty\) such that for all \(n\geq n_0\), we have \(f(n)\leq c\cdot g(n)\) for some constant \(c > 0\).

We have \(f(n)\in \Omega(g(n))\) iff \(g(n)\in O(f(n))\)
We have \(f(n)\in \Theta(g(n))\) iff \(f(n)\in O(g(n))\cap \Omega(g(n))\), i.e. iff \(f(n)\in O(g(n))\) and \(g(n)\in O(f(n))\)

We have \(f(n)\in o(g(n))\) when for all \(\varepsilon > 0\), there exists some \(n_0<\infty\) such that for all \(n\geq n_0\), we have \(f(n)<\varepsilon\cdot g(n)\), implying \(\displaystyle\lim_{n\to \infty}\dfrac{f(n)}{g(n)}=0\).

We have \(f(n)\in \omega(g(n))\) iff \(g(n)\in o(f(n))\), implying \(\displaystyle\lim_{n\to \infty}\dfrac{g(n)}{f(n)}=0\).

The partitioning by the classes \(O\), \(\Omega\), \(o\), \(\omega\) defines an order relations over functions, acting respectively like \(\leq, \geq, <, >\). Partitioning based on \(\Theta\) defines an equivalence relation over functions.

Important Examples

Equivalence of logs: \(O(\log_k{n})=O(\ln{n})\)

Polynomials of a given degree: \(\displaystyle\sum\limits_{i=0}^{d}a_i n^{i}=O(n^{d})\)

Class of polynomial functions: \(2^{O(\log{n})}=n^{O(1)}=\displaystyle\bigcup_{k\in \mathbb{N}}O(n^{k})\). So, \(O(n^k)\subseteq n^{O(1)}\).

Complexity vs. Efficiency

Although complexity and efficiently are measured with the same units, they are different things: complexity is a property of a problem, whereas efficiency is a property (namely the worst-case runtime) of an algorithm.

Thus, complexity is global while efficiency is local

Since complexity is global, we can construct language classes based on complexity. We denote with \(\text{TIME}(t(n))\) the class of languages decidable by a multi-tape (standard) TM in \(O(t(n))\).

E.g. \(\set{0^{k}1^{k}\mid k\geq 0}\) is in \(\text{TIME}(n)\).

Relating Worst Cases of Different Models

Single-tape \(\leftrightarrow\) multi-tape TM

Theorem

A multi-tape TM with worst-case runtime \(t(n)\) has an equivalent single-tape TM with worst-case runtime \(O(t(n)^{2})\).

Proof: recall that to simulate multiple tapes, we separate them with sentinels. Each tape uses at most \(t(n)\) squares, so the tape itself has at most \(k\cdot t(n)+2k+1\) squares. Each step requires at most \(k\) inserts, so at most \(k\cdot t(n)+2k+1\) shifts are required. Every other step is \(O(1)\) to simulate, so our overall worst-case is \(t(n)\cdot k\cdot (k\cdot t(n)+2k+1)\in O(t(n)^2)\).

To reason about non-deterministic TMs, we amend our definition of \(t\): for ND TMs, \(t(n)\) is the maximum number of steps taken on any computational branch given an input of length \(n\).

Deterministic \(\leftrightarrow\) non-deterministic TM

Theorem

A non-deterministic TM with worst-case runtime \(t(n)\) has an equivalent deterministic TM with worst-case runtime \(2^{O(t(n))}\).

Proof sketch: follows again from ND → D conversion procedure. The execution tree has \(O(b^{t(n)})\) nodes and travelling from root to left is \(O(t(n))\); we have \(O(t(n)b^{t(n)})\subseteq 2^{O(t(n))}\).

19 - The Classes P and NP

The Class \(\mathbf{P}\)

The class \(\mathbf{P}\)

Definition

We define polynomial time \(P\) as the class of all languages that can be decided in polynomial time, i.e. \(P:=\displaystyle\bigcup_{k\in \mathbb{N}}\mathbf{TIME}(n^{k})\).

We consider polynomial time to be "reasonable" when designing algorithms, at least compared to exponential time.

When reasoning about problems, by social contract, we assume that structures are encoded in a "reasonable" way, e.g. graphs are adjacency matrices or adjacency lists. We can make a problem trivially easy by picking an "adversarial" encoding. E.g. primality testing is \(O(1)\) (loose bound) if we store numbers as products of prime powers.

Aside: I wonder what the average case deriving this encoding is. We know \(\pi(n)\sim \ln{n}\) (where \(\pi\) is the prime counting function), so most of the time it will be less. Hm.

Context-free languages are polynomial

Theorem

All context-free languages can be decided in polynomial time. So, \(L\in \text{CF}\implies L\in P\).

Proof: we know that a derivation of length \(|w|\) in a CFG in CNF (a reasonable assumption, since this conversion is also polynomial) requires exactly \(2|w|-1\) steps. Although there are exponentially many such paths, they overlap. Thus, they exhibit optimal substructure and dynamic programming can be used: for each node bottom up, we compute the set of possible non-terminals that can generate children. Reject if the top node doesn't contain \(S\). This is \(O(|w|^{2}|\langle G' \rangle|)\), so is polynomial.

Examples

We prove by describing a polytime algorithm.

Define the problem \(\text{PATH}\) as the language \(\set{\langle G,s,t \rangle\mid G \text{ is a directed graph with path from } s \text{ to } t}\).

\(\text{PATH}\) is in \(P\)

Theorem

\(\text{PATH}\) can be solved in polynomial time.

Proof: we mark every node that can be reached from \(s\), stopping when we either reach \(t\) (accept) or a fixpoint without \(t\) (reject). A simple graph with \(n\) vertices has at most \(n(n-1)\in O(n^{2})\) edges. At most \(n\in O(n)\) passes are done (since each pass is either a fixpoint or adds another vertex). So, the worst case runtime is \(O(n^{3})\), which is polynomial.

Define the problem \(\text{RELPRIME}\) as the language \(\set{\langle x,y \rangle \mid x,y\in \mathbb{N}, \text{GCD}(x,y)=1}\).

\(\text{RELPTIME}\) is in \(P\).

Theorem

\(\text{RELPTIME}\) can be solved in polynomial time.

Proof: the Euclidian algorithm determines the GCD of two numbers in polynomial time, and checking if this GCD is \(1\) is \(O(1)\).

The Class \(\mathbf{NP}\)

A verifier \(V\) for language \(L\) is an algorithm (TM) \(V\) such that \(L\) can be defined \(L:=\set{w\mid V\text{ accepts} \langle w,c \rangle \text{ for some } c}\), where \(c\) is a certificate string. \(L\) is polynomially verifiable if \(V\) runs in polynomial time with respect to \(|w|\) (in which case \(V\) is a polynomial time verifier).

Note: \(c\) is not constant with respect to \(w\); we just need some \(c\) to exist given some \(w\)
\(c\) is often an instance that is checked; more on that later

The class \(\mathbf{NP}\)

Definition

We define \(\mathbf{NP}\) as the class of languages with polynomial-time verifiers. Every language \(L\in \mathbf{NP}\) can be decided by a non-deterministic TM in polynomial time

Proof \(\implies\): For \(V\in O(n^{k})\), we construct non-deterministic TM that non-deterministically guesses \(c\) (where \(|c|\leq n^{k}\)) and calls \(V(\langle w,c \rangle)\), accepting if \(V\) accepts and rejecting if \(V\) rejects.
Proof \(\impliedby\): Given nondeterministic TM \(M\) that decides \(L\) in polytime, construct \(V(w,c)\) that simulates \(M\) on \(w\) using \(c\) to decide the path in the computation tree. If a branch accepts, \(V\) accepts. Otherwise, \(V\) rejects.

We define the class \(\mathbf{coNP}\) as the class \(\set{L \mid \overline{L}\in\mathbf{NP}}\).

==Note that being in NP is analogous (though not equivalent) to being recognizable. For recognizable \(L\), the recognizer may never halt when \(w\not\in L\) is used as input. However, for \(L\in \mathbf{NP}\), no certificate will exist for \(w\not\in L\), so it can't be verified in polynomial time==.

We define \(\mathbf{NTIME}(t(n))\) as the set of languages decidable in \(O(t(n))\) time by some non-deterministic TM. We can then characterize NP as \(\mathbf{NP}:=\displaystyle\bigcup_{k\in \mathbb{N}}\mathbf{NTIME}(n^{k})\).

Examples

Define the problem \(\text{HAMPATH}\) as the language \(\set{\langle G,s,t \rangle\mid G \text{ is a directed graph with a hamiltonian path from } s \text{ to } t}\). Recall that a hamiltonian path visits every vertex exactly once.

Define the problem \(\text{COMPOSITES}\) as the language \(\set{\langle x \rangle\mid x=pq \text{ for } p,q\in \mathbb{N} \text{ where } p,q\ne 1 \text{ and } p,q \ne x}\), i.e. the language consisting of composite numbers.

It turns out that \(\text{COMPOSITES}\) is actually in \(\mathbf{P}\), since we proved that \(\text{PRIMES}\) is in \(\mathbf{P}\) (2002) (\(\text{COMPOSITES}\) is the complement of \(\text{PRIMES}\))

Define the problem \(\text{CLIQUE}\) as the language \(\left\{ \langle G, k \rangle \mid G \text{ is an undirected graph that contains a clique of size } k \right\}\)

Proof: our certificate \(c\) is a \(k\)-clique. Our verifier \(V\) certifies that \(c\) has \(k\) nodes (\(O(k)\)) and that edges between each pair of nodes in \(c\) exists (\(O(k^{2})\)). If both pass, we accept, otherwise we reject.
Aside: there is an interesting reduction from \(\text{CLIQUE}\) to \(\text{3-SAT}\), more on that later

Define the problem \(\text{SUBSET-SUM}\) as the language \(\left\{ \langle S, t \rangle \mid S \text{ is a finite set of integers and some subset of } S \text{ sums to } t \right\}\)

Proof: Our certificate is a subset \(S'\) of \(S\). We simply add all the numbers in the subset together (\(O(n\lg{n})\)) and check if the result is equal to \(t\) (\(O(1)\), since we can compare to (constant) \(t\) directly).

Interpretation

In most of these examples, the corresponding decision problem can't be done in polynomial time because of the search aspect. For example, there are \(2^{|S|}\in O(2^{|S|})\) subsets of a set \(S\), so any decision problem expressible as finding a subset is likely exponential.

Many algorithms proceed by brute-forcing all the possible solutions, then checking them one by one until a valid one is found; it's usually the brute-force step that is super-exponential.

Improving algorithm efficiency is often done by pruning this search, either by leveraging some structure of the problem itself (e.g. optimal substructure → DP, greedy algorithms) or using a heuristic to trade completeness for efficiency

\(\mathbf{P}\) Vs. \(\mathbf{NP}\)

Clearly, \(\mathbf{P}\subseteq\mathbf{NP}\); if a problem can be decided in polynomial time, clearly an instance can be verified in polynomial time. On the other hand, we've yet to find a problem in \(\mathbf{NP}\) that is provably not in \(\mathbf{P}\). The fundamental question \(\mathbf{P}\overset{?}{=} \mathbf{NP}\) remains famously unsolved.

We believe that the search step can't always be avoided, i.e. that \(\mathbf{P}\ne \mathbf{NP}\)

20 - NP-Completeness

Polytime Reductions

A function \(f : \Sigma^{*}\to \Sigma^{*}\) is polytime-computable if there exists a TM that takes input \(w\), halts with \(f(w)\) written the the tape, and runs in polynomial time with respect to \(|w|\).

Language \(A\) is polytime-reducible to language \(B\), denoted \(A \leq_p B\) iff there exists polytime-computable \(f:A\to B\) such that \(w\in A\) if and only if \(f(w)\in B\).

Here, \(f\) is called a polytime reduction from \(A\) to \(B\).

\(\mathbf{P}\) is closed under polytime reduction

Theorem

For languages \(A,B\), if we have \(A\leq_p B\) and \(B\in \mathbf{P}\), then we will also hae \(A\in \mathbf{P}\).

Proof: we simply use the polytime reduction to convert the input from \(w\) to \(f(w)\) and call the polytime TM that recognizes \(B\) with input \(f(w)\). By construction, \(B\) accepts \(f(w)\) if and only if \(A\) accepts \(w\); thus, \(w\in A\) is decided in polytime.

Satisfiability Problems

Problems of satisfiability were among the first problems to be analyzed for complexity, and play a role in defining complexity classes. They are the canonical hard problem.

Define the problem \(\text{SAT}\) as the language \(\set{\langle \varphi \rangle \mid \varphi\text{ is a satisfiable boolean formula}}\), and the problem \(\text{3-SAT}\) as the language \(\set{\langle \varphi \rangle \mid \varphi\text{ is a satisfiable boolean formula in 3-CNF}}\), where \(\text{3-CNF}\) means "in conjunctive normal form, with each clause having \(3\) atoms". It is fairly trivial to see that both problems are in \(\mathbf{NP}\), since booleans formulas can be evaluated in polytime.

Note that converting a boolean formula to CNF is not a polytime reduction in general

\(\text{SAT} \leq_p \text{3-SAT}\)

Theorem

\(\text{SAT}\) is reducible to \(\text{3-SAT}\)

We describe a procedure to convert a boolean formula \(\varphi\) to 3-CNF formula \(\psi\) such that \(\varphi\) is satisfiable if and only if \(\psi\) is satisfiable. For every internal node in \(\varphi\), we add to \(\psi\):

Term \(a := b\land c\): \((b\lor \lnot a)\land (c\lor \lnot a)\land (\lnot b \lor \lnot c \lor a)\)
Term \(a := b\lor c\): \((\lnot b\lor a)\land (\lnot c\lor a)\land ( b \lor c \lor \lnot a)\)
Term \(a := \lnot b\): \((\lnot b \lor \lnot a)\land (b \lor a)\)

Each time, we encode the operator into multiple clauses by using \(a\) to "bind" the 3CNF clauses together. E.g. for \(\land\), we either have \(a\) or \(\lnot a\), so we either have \(b\land c\) (from clauses \(1\) and \(2\)) for \(a\) or \(\lnot b \lor \lnot c \equiv\lnot(b\land c)\) for \(\lnot a\), as expected. So, \(\psi\) is satisfiable if and only if \(\varphi\) is satisfiable.

We also have \(|\langle \psi \rangle|\in O(|\langle \varphi \rangle|)\)

\(\text{3-SAT}\leq_p \text{CLIQUE}\)

Theorem

\(\text{3-SAT}\) is reducible to \(\text{CLIQUE}\)

We construct a graph \(G\) from a boolean formula \(\varphi\) as follows:

For each 3-SAT clause in \(\varphi\), add three vertices to \(G\) such that each vertex in \(G\) corresponds to a literal in \(\varphi\).
Connect every pair of vertices except those in the same triple (clause) and those who contradict each other directly (i.e. no edges of the form \((a,\lnot a)\)).
Set \(k\) to be the number of clauses in \(\varphi\).

Then, \(G\) has a \(k\)-clique if and only if \(\varphi\) is satisfiable, since a \(k\)-clique in \(G\) corresponds to a satisfying assignment of \(\varphi\).

By not joining vertices in the same triple and requiring a clique of size \(k\), exactly one variable from each triple is chosen by a clique
By construction, no nodes in the clique contradict each other, so there exists a valuation such that each literal in the clique is true
Thus, by the structure of CNF, this is a satisfying assignment

This construction is also given in CMPUT 304.

NP-Completeness

\(\mathbf{NP}\)-Completeness

Definition

Language/problem \(L\) is \(\mathbf{NP}\)-complete if \(L\) is in \(\mathbf{NP}\) and every other problem in \(\mathbf{NP}\) can be polytime-reduced to \(L\)

Symbolically, if \(L\in \mathbf{NP}\) and \(\forall A\in \mathbf{NP}, A \leq_\mathbf{P} L\)

\(\mathbf{NP}\)-completeness is closed under polytime reduction

Theorem

If language \(A\) is \(\mathbf{NP}\)-complete, language \(B\) is in \(\mathbf{NP}\), and \(A \leq_\mathbf{P} B\), then \(B\) is \(\mathbf{NP}\)-complete.

Proof: given some \(\mathbf{NP}\)-complete \(L\), we know by definition that \(L \leq_\mathbf{P} A\) and \(A\leq_\mathbf{P} B\). Thus, by transitivity, \(L\leq_\mathbf{P}B\). Since we assumed \(B\in \mathbf{NP}\), \(B\) is \(\mathbf{NP}\)-complete.

We also fund that if there exists from \(\mathbf{NP}\)-complete language \(L\) that is also in \(\mathbf{P}\), then we have \(\mathbf{P}=\mathbf{NP}\), since we could reduce any \(\mathbf{NP}\)-complete \(A\) to \(L\in P\).

Cook-Levin Theorem (1971)

Theorem

\(\text{SAT}\) is \(\mathbf{NP}\)-complete.

Proof sketch: we can use a formula \(\varphi\) to enforce constraints that encode legal config sequences for \(M\) when executing on \(w\). Full proof in notes 20

21 - Complexity Hierarchy

Space Complexity

Recall the definition of worst-case time complexity. There is an analogous definition for space: A TM has space complexity \(s(n)\) it never visits more than \(s(n)\) tape cells on input \(|w|<n\). Again, for a nondeterministic TM, no sequence of configurations can visit more than \(s(n)\) spaces.

For \(s(n)\) smaller than \(n\) (i.e. in \(o(n)\)), we assume that the entire input has been read, but the working area of the TM is smaller

We have corresponding space complexity classes:

\(\text{SPACE}(s(n))\) is the set of languages that can be decided by an \(O(s(n))\) space TM
\(\text{NSPACE}(s(n))\) is the set of languages that can be decided by an \(O(s(n))\) space nondeterministic TM

Relating Space and time Complexity

Trivially, \(\text{TIME}(t(n))\subseteq \text{NTIME}(t(n))\) and \(\text{SPACE}(s(n))\subseteq \text{NSPACE}(s(n))\).

Time complexity bound by space complexity

Theorem

\(\text{TIME}(f(n))\subseteq \text{SPACE}(f(n))\) and \(\text{NTIME}(f(n))\subseteq \text{NSPACE}(f(n))\).

Proof sketch: anything that uses \(O(f(n))\) space needs \(O(f(n))\) time to actually access that space

Space complexity bound by time complexity

Theorem

\(\text{SPACE}(f(n))\subseteq \text{TIME}(2^{O(f(n))})\) and \(\text{NSPACE}(f(n))\subseteq \text{NTIME}(2^{O(f(n))})\)

Proof sketch: There are at most \(c^{f(n)}\) possible configurations that can be explored in \(f(n)\) space. If we explore more, there is a repetition → a cycle → an infinite loop → the algorithm might not terminate and thus is no longer a decider.

Basically, space can be reused whereas time cannot.

Savitch's Theorem

Theorem

For \(s(n)\geq n\), we have \(\text{NSPACE}(s(n))\subseteq \text{SPACE}(s(n)^2)\)

Proof sketch: We can recursively check whether \(c_0\) can reach \(c_\text{accept}\) in time \(t:=2^{d\times s(n)}\) steps. Each time, we look at half time and recursively check if both pairs of configuration "pairs" can be reached (like a binary search). Each check uses \(s(n)\) space, and we must keep track of a stack for recursion, which has depth \(\log_2{t}=\log_2{2^{d\times s(n)}}=d\times s(n)\in O(s(n))\). So, the final complexity is \(O(s(n)^2)\)

Constructibility

Function \(t(n) :\mathbb{N}\to \mathbb{N}\) is time-constructible if \(t(n)\geq n\log{n}\) and if the function that maps \(1^{n}\mapsto \text{binary representation of } t(n)\) can be computed in \(O(t(n))\).

Function \(s(n) :\mathbb{N}\to \mathbb{N}\) is space-constructible if \(s(n)\geq \log{n}\) and if the function that maps \(1^{n}\mapsto \text{binary representation of } s(n)\) can be computed in \(O(s(n))\).

These rule out slow-growing functions and uncomputable functions.

Space Hierarchy Theorem

Theorem

For any space-constructible \(s(n)\), there exists language \(L\) decidable in \(O(s(n))\) space but not \(o(s(n))\) space.

Proof is by diagonalization, full proof in Notes 21

Time Hierarchy Theorem

Theorem

For any time-constructible \(t(n)\), there exists language \(L\) decidable in \(O(t(n))\) time but not \(o\left(\dfrac{t(n)}{\log{t(n)}}\right)\) time.

Proof is similar to the space hierarchy theorem

Important Classes

Concentrically, in order of increasing generality

Name	Expression	Example
\(L\)	\(\text{SPACE}(\log{n})\)	\(0^k 1^k\)
\(\mathbf{NL}=\mathbf{coNL}\)	\(\text{NSPACE}(\log{n})\)	\(\text{PATH}\)
\(P\)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{TIME}(n^k)\)	\(A_\text{CFG}\), \(\text{RELPRIME}\), \(\text{CIRCUIT-VALUE}\)
\(NP\) or \(coNP\) (not equal!)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{NTIME}(n^k)\)	\(\text{SAT}, \text{3-SAT}, \text{CLIQUE}\)
\(PSPACE=NPSPACE\)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{SPACE}(n^k)\)	\(\text{TQBF}\)
\(EXPTIME\)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{TIME}(2^{n^k})\)	\(\text{HALT IN }k\), when coded in \(\log{k}\) size
\(NEXPTIME\)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{NTIME}(2^{n^k})\)
\(EXPSPACE\)	\(\displaystyle\bigcup_{k\in \mathbb{N}}\text{SPACE}(2^{n^k})\)	\(\text{EQ}_{\text{REX}\uparrow}\), i.e. equality for regular expressions with exponents

Reductions

The space complexity equivalent to a polytime reduction is a log space reduction

Log space reduction

Definition

A log space reduction \(A\leq_{L}B\) exists between languages \(A\) and \(B\) if there exists log-space computable function \(f\) such that \(w\in A\iff f(w)\in B\).

A log space reduction implies a polytime reduction since \(\text{SPACE}(f(n))\subseteq \text{TIME}(2^{O(f(n))})\)

In turn, a function \(f\) is log space computable if there exists TM with read-only input, write-only output, and working memory that uses \(O(\log{|w|})\) working space to produce \(f(w)\) from \(w\).

Completeness

We can define completeness for each class we've seen in the same way we defined \(\mathbf{NP}\)-completeness earlier.

Also, it is worth noting which separations we know, i.e. which classes we know are not the same:

X22 - Kolmogorov Complexity

All this talk of "compressibility" in the first lecture and we don't cover Kolmogorov complexity? This shall not stand.

prefix vs. prefix-free code

strings are x or w?

In these notes, we reason about the Kolmogorov complexity of strings, but the Kolmogorov complexity of any object with a description language can be reasoned about. E.g. if \(\mathbb{T}\) is the set of all Turing machines, then \(\set{T\in \mathbb{T} \mid \langle T \rangle}\) is the description language of Turing machines (with respect to \(\langle \dots \rangle\)).

random access notes

- depends on a model of computation
- see also: code golf
- prefix-free has to do with nondeterminism? → can nondeterministically compute (?)
- diagonal argument
- bound by actual length of the string → $| \langle M,w \rangle |$ 
- entropy

Our definitions of time and space complexity describe the behavior of an algorithm while it is running (i.e. at runtime). There is also a notion of complexity the describes how compressible a problem is by considering bounds on the size of the description of an algorithm that solves it.

E.g. the string 10101010101010101010101010101010 can be described as "repeat 10 \(16\) times"
E.g. the pseudo-random string 1001011101101011010001 doesn't have a succinct description like the string above, even though it has more characters

Plain Kolmogorov Complexity

Given a target string \(x\), consider combinations of Turing machines \(M\) and input strings \(w\) such that \(M\) leaves \(x\) on the tape when run with input \(w\). Thus, \(d:=\langle M,w \rangle\) is a description of \(x\). Let \(\star{d}\) be a minimal description of \(x\) with respect to length.

Then, the plain Kolmogorov complexity \(C(x)\) of string \(x\) is the minimal length \(|\star{d}|\) of a description of \(x\).

A.k.a plain Kolmogorov Complexity

By the Church-Turing thesis (and theoretical CS common sense), we can reason about Kolmogorov complexity at a higher level than Turing machines. E.g. K-complexity puts a bound on how succinctly ideas can be expressed with natural language.

Sneaky Sneaky Shenanigans

There's almost a law of the universe that the first solution you try is almost always the worst one

CGP Grey

Plain Kolmogorov complexity makes an assumption that it shouldn't: it assumes we know the lengths of strings. This seems reasonable, but becomes a problem when we reason about concatenation. Namely, we can't relate \(C(xy)\) and \(C(x)+C(y)\) without knowing "where to divide \(xy\) into \(x\) and \(y\)". We store the length of strings or use a delimiter, but both of these solutions effectively add characters to the input string.

Specifying the length of a string that lets us separate the concatenation \(xy\) takes \(O(\min\set{\ln{x},\ln{y}})\), since a string representing length is \(O(\lg{n})\)

With plain Kolmogorov complexity, \(C(xy) < C(x)+C(y) + O(1)\) is not generally true as would be expected. We would need a \(O(\min\set{\ln{x},\ln{y}})\) term on the right to balance the inequality.

Prefix-free Kolmogorov complexity solves this problem.

Prefix-free Kolmogorov Complexity

A prefix-free code is a code (in the coding theory sense; here, a subset of \(\set{0,1}^*\)) such that no codeword is a prefix of another. In such a code, we can always determine where a codeword ends, so we don't need to use delimiters or length encoding. A prefix-free Turing machine writes prefix-free codewords (using alphabet \(\set{0,1}\)) to its tape; it is required to read its input exactly once, left to right.

Huffman coding is used to find optimal prefix-free codes

We formally describe the prefix-free Kolmogorov complexity as follows:

Construct prefix-free Turing machine \(M\) with three tapes: a read tape infinite in one direction, a write tape infinite in one direction, and a work tape infinite in both directions
- We have already seen how multi-tape Turing machines are computationally equivalent to basic ones
No backtracking is allowed on the read and write tapes
The work tape and write tape are initialized with \(0\)s. The read tape is initialized with the input, encoded with a prefix-free code \(S\)

The prefix-free Kolmogorov complexity \(K(x)\) of string \(x\) is the shortest prefix-free codephrase that makes Turing machine \(M\) output \(x\), i.e. \(K(x):=\min\set{|w| \mid w\in S^{*}, U(c)=x}\).

check this, it seems fucky

Invariance Theorem

A optimal description language is a description language such that any description of an object in an alternate description language can be adapted to the constant language with at most constant overhead. Namely, this constant overhead is a description of the alternate description language.

I.e. if \(L_D\) is an optimal description language and object \(O\) has description \(w_O'\) in some other description language \(L_D'\), the description \(w_O\) of \(O\) in \(L_D\) obeys \(|w_O|\in O(|w_O'|)\).

Invariance Theorem

Theorem

Let \(L_1, L_2\) be Turing-complete description languages, and let \(K_{L_1}\) and \(K_{L_2}\) be prefix-free complexity functions for \(L_1\) and \(L_2\) respectively. Then, for all strings \(w\), we have \({-c \leq K_{L_1}(w) - K_{L_2}(w) \leq c}\) for some constant \(c\).

Proof sketch: Let program \(L_1 \ni U : \Sigma_{L_2}^{*} \to \Gamma_{L_2}^{*}\) be an interpreter of \(L_2\) written in \(L_1\) (this must exist by Turing-completeness). Specifically, it takes a program string (member of \(L_2\)) and returns the result. Let program \(W\) in \(L_2\) be a minimal description of the string \(w\). Then, \(U(W)\) must return \(w\) by definition. So, the description of \(w\) in \(L_1\) is \(\langle \langle U \rangle_{L_1}, \langle W \rangle_{L_2}\rangle \geq \langle U \rangle_{L_1} + \langle W \rangle _{L_2} \in O(|\langle W \rangle_{L_2}|)\), as desired.
Main idea: the length of minimal descriptions of strings in two languages differs by the size of a minimal interpreter for one language written in the other

Basic Results

Relating plain and prefix-free complexity

Theorem

\(K(x)\leq C(x) + 2\lg{C(x)} + O(1)\).

Proof: We define the following procedure for converting a basic Turing machine program to a prefix-free Turing machine program:

Determine the length of the basic Turing machine program (\(O(n)\))
Encode the length to a prefix-free encoding, e.g. by doubling each digit and adding a termination code (e.g. \(9_{10}\to 1001_2 \to (11)(00)(00)(11)\underline{(01)}\)).

The description for the prefix-free Turing machine program must include the following:

A description of how to simulate a basic Turing machine with a prefix-free one (\(O(1)\))
The coded length of the program (\(O(\lg(C(x))\))
The program itself \(O(C(x))\)

So, \(K(x)\in O(C(x))\). The precise inequality comes from the encoding scheme we picked for the length of the prefix-free encoding.

Bound on plain complexity relative to length of string

Theorem

There exists constant \(c\) such that for all strings \(x\), we have \(C(x) \leq |x|+c\).

Proof sketch: to output string \(x\), we could write a program \(\text{print }x\). Thus, the \(+c\) comes from the amount of description required for a "print" statement.

As a corollary, we have \(K(x)\leq |x|+2\lg{|x|}\) (from the above-above theorem).

We define the conditional Kolmogorov complexity \(K(x|y)\) of the string \(x\) given \(y\) as the length of the shortest program (in some fixed description language) that outputs \(x\) when given \(y\) as additional input.

For convenience, we will \(K((x,y))\) (i.e. the complexity of the tuple of strings \((x,y)\)) as \(K(x,y)\).

Extra information bound

Theorem

\(K(x| y) \leq K(x)\leq K(x,y) \leq \max\set{K(x|y) + K(y), K(y|x) + K(x)}\leq K(x)+K(y)\)

Proof sketches:

Conditioning a complexity on another string can't increase it since we are providing additional information that we could just ignore
Describing one string cannot be more complex than describing that string and an additional string
(WLOG) If we describe \(x\), \(y|x\) should be easier to describe than \(y\) by itself because we have additional information about \(x\) already

Sub-additivity

Theorem

\(K(xy)\leq K(x,y)\)

Proof sketch: describing \(x,y\) requires delimiter information where \(xy\) doesn't.

Note that there isn't a way to compare \(K(xy)\) to \(K(x|y)\), \(K(x)\), etc.

Symmetry of Information

Theorem

\(K(x,y)=K(y,x)=K(x|y, K(y)) + K(y)\)

Here, the last equality is proved with a counting argument.

Information non-increase

Theorem

For any computable function \(f\), we have \(K(f(x))\leq K(x)+K(f)\)

Proof: clearly, if we can describe a computable function and an input, we can compute the result of the function application with a universal program (constant overhead).

Kolmogorov Complexity is Uncomputable

Note that we can't compute Kolmogorov complexity by simply enumerating all valid programs and simulating them because not all programs can halt.

Proposition 1: Kolmogorov complexity can get arbitrarily large

Proposition

For every \(n\in \mathbb{N}\), there exists string \(x\) such that \(K(x)\geq n\)

Proof: if this weren't true, there would be some \(n\) such that every possible string can be produced by a program expressible with less than \(n\) bits. There are finitely many such programs but infinitely many strings, a contradiction.

Uncomputability of Kolmogorov complexity

Theorem

The function \(K(x)\) is not computable

Interpretations

Current LLMs make real an interesting thought experiment: code can now be described with natural language. There is an attitude that LLMs make programming "easier" because the programmer must know less to implement their program. However, the results of programs have some (categorically minimal) Kolmogorov complexity, which puts a bound on just how "easy" such programs could be to write. Describing the behavior of a program (as would need to be done to prompt an LLM to write one) cannot be unboundedly easy.

X23 - Randomized Algorithms and their Complexity Classes

Randomized algorithms and their consequences have been a disaster for my GPA Or they would be, if the science faculty would let me take the damn 499 Shut up and take my money!

Appendix

implementations of functions with ptimive recursion notes 16

In my (admittedly basic) experience, algebraic structures are described as a tuple of sets and operators defined over those sets, as well as a collection of axioms that describe the behavior of the operators, e.g. a ring is a \(3\)-tuple \((R, +, \cdot)\) where \(+,\cdot\) are binary operators over \(R\) such that [ring axioms]. I've found this structure of description to be a useful constraint for understanding because it forces me to reduce a structure to its essence when describing it this way, much how implementing a program in assembly requires absolute understanding of each internal mechanism. Anyway, this kind of description probably exists in every field of math, but I associate it with algebra the most.↩︎

Formal Languages Automata and Computability CMPUT474

01 - Introduction

Behavior of Simple Mechanisms

Cellular Automata

Simple Rules → Complex Behavior

Math is Hard

Finiteness Vs Infiniteness

Decision Problems

Cantor's Theorem

Self-Reference is Interesting

It is Always Possible to Write Non-halting Programs

Semantics is Hard Too

02 - Number Systems and History of Computation

Real Numbers

Construction

\(\mathbb{R}\)? In My Computations? It's Less Likely than You Think

Emergence of Computation

03 - Finite State Computation

Finite State Computation

Languages

Cardinality Questions

Regular Expressions

04 - Nondeterministic Computation

DFAs and NFAs Have the Same Computational Power

Regular Expressions (cont.)

05 - Regular Expressions

Regular Expressions

Generalized NFAs

06 - Limits of Finite State Computation

Not Everything Can Be Finite-state Computed

Proving a Language Isn't Regular

Using the Pumping Lemma

Using the Recursive Nature of Regular Expressions

07 - Pushdown Automata

Deterministic Pushdown Automata (DPDA)

What's New?

Nondeterministic Pushdown Automata (PDA)

Asides:

08 - Context-Free Grammars

Context-free Grammars

Examples and Applications

Notation

Elementary Results

Ambiguity

Chomsky Normal Form

09 - Equivalence of PDAs and CFGs

Lemma I: CFG \(\implies\) PDA

Lemma II: PDA \(\implies\) CFG

10 - Limits of stack augmented computation

Pumping Lemma for CFGs

Proof

Proving a Language Isn't Context-free

11 - Turing Machines

Models of Computation

Turing Machines

Example: \(\Lambda:=\set{w\# w\mid w\in \set{0,1}^{*}}\)

Recognizing and Deciding

Non-deterministic Turing Machines

12 - Equivalence of Turing machine models

Extensions and Their Models

Adding a \(\text{STAY}\) Action

Adding Insert/Delete

Adding \(k\) Different Tapes

Nondeterministic Turing Machine

13 - Models of Computation

Levels of Abstraction

Concentric Models of Computation

14 - Decidable Problems

Meta-Language Classes

Decidability of Language Classes

Theorems: Finite Automata and Regular Expressions

Theorems: Pushdown Automata and Context-free Grammars

15 - Undecidable and Unrecognizable Languages

Undecidability of Language Classes

Theorems: Turing Machines

Co-recognizability

Proving Undecidability Using Reduction

Theorems: Turing Machines

Rice's Theorem

Proof

Formal Languages Automata and Computability `CMPUT474`