Proof Systems
Formalizing deduction — natural deduction, sequent calculus, and the relationship between provability and truth.
A proof system is a purely syntactic machine for deriving theorems: given a set of axioms and inference rules, it produces new formulas without ever appealing to truth or meaning. Where semantics asks “is this sentence true in every model?”, a proof system asks “can this sentence be derived from the rules?” The remarkable fact — established by completeness theorems — is that these two questions have the same answer for classical logic. But the syntactic perspective stands on its own, and different proof systems illuminate different aspects of deduction. Natural deduction mirrors how mathematicians actually reason; Hilbert-style systems strip inference down to a single rule; sequent calculus exposes the deep symmetries of logical connectives. Understanding why these systems exist, and how they relate, is the starting point for proof theory, automated reasoning, and the foundations of mathematics.
Natural Deduction
In 1934, Gerhard Gentzen introduced natural deduction with the explicit goal of capturing how mathematicians actually think. Rather than deriving everything from a fixed set of axioms, natural deduction provides paired introduction and elimination rules for each logical connective — rules that say how to build a formula up and how to take it apart. The system was a departure from earlier formalisms: proofs look like trees of reasoning steps, not chains of axiom substitutions, and the rules are designed so that each connective is fully characterized by the way it is used.
The introduction and elimination rules for propositional connectives are as follows. For conjunction: -intro derives from separate proofs of and ; -elim extracts either conjunct from a proof of . For disjunction: -intro derives from a proof of either disjunct; -elim (sometimes called “proof by cases”) derives from together with proofs that and . For implication: -intro derives by assuming and deriving , then discharging the assumption; -elim is the familiar modus ponens, deriving from and . For negation: -intro derives by assuming and deriving a contradiction (); -elim (or ex falso quodlibet) derives any formula from .
Classical logic adds a further rule: reductio ad absurdum (proof by contradiction). To prove , assume and derive ; then discharge the assumption and conclude . This rule is what separates classical natural deduction from its intuitionistic counterpart, which omits it and thereby restricts itself to constructive reasoning. Beyond the logical rules, natural deduction also employs structural rules that govern the management of assumptions: weakening allows unused assumptions to be carried along in a proof, contraction permits using the same assumption more than once, and exchange says the order in which assumptions are listed does not matter.
The system extends naturally to first-order logic by adding quantifier rules. Universal introduction (-intro) derives from a proof of where is an arbitrary parameter not appearing in any undischarged assumption — this captures the “let be arbitrary” move in mathematical proofs. Universal elimination (-elim) derives from for any term . Existential introduction (-intro) derives from for a specific witness . Existential elimination (-elim) derives from and a subproof showing follows from , where is fresh.
As an example, consider deriving . We assume (assumption 1), then assume (assumption 2). From assumption 1 we have immediately. By -intro, discharging assumption 2, we get . By -intro again, discharging assumption 1, we get . The proof is a nested discharge of assumptions — exactly the pattern a working mathematician would use when told “suppose ; now suppose also ; clearly still holds, so .”
The key insight of natural deduction is this correspondence between formal derivations and informal mathematical reasoning. Gentzen designed the system so that a proof of is literally a procedure that transforms a proof of into a proof of . This idea — that proofs have computational content — later became the foundation of the Curry-Howard correspondence, linking proof theory to type theory and lambda calculus.
Hilbert-Style Axiom Systems
Hilbert-style systems take the opposite approach to natural deduction: instead of many inference rules and no axioms, they use many axiom schemas and a single inference rule. The design reflects David Hilbert’s formalist program of the early 1900s, which sought to reduce all of mathematics to a minimal, mechanically verifiable foundation. The result is a system of extreme economy — easy to reason about but painful to reason in.
A standard axiomatization of classical propositional logic uses three axiom schemas and one rule. The axioms are:
The sole inference rule is modus ponens: from and , derive . Every instance of these schemas — obtained by substituting any formulas for , , — is an axiom. Axiom A1 says that any true proposition is implied by anything; A2 distributes implication over itself; A3 encodes the contrapositive, which gives the system its classical character.
The central metatheorem of Hilbert-style systems is the deduction theorem: if (i.e., is derivable from together with ), then . This is not a rule of the system but a theorem about the system, and its proof is a constructive procedure that transforms any derivation using an extra assumption into a longer derivation that avoids the assumption by wrapping everything under . The deduction theorem bridges the gap between Hilbert-style proofs and the more intuitive assumption-based reasoning of natural deduction.
This axiomatization is complete for classical propositional logic: every tautology is derivable. The proof of completeness, combined with the soundness of modus ponens, shows that provability in the Hilbert system coincides exactly with semantic validity. For first-order logic, the axiom set is extended with quantifier axioms — such as (universal instantiation) — and a generalization rule that derives from when is not free in any undischarged assumption.
The trade-off of Hilbert systems is one of economy versus usability. The system has a beautifully small set of primitives, which makes it ideal for metamathematical analysis — proofs of completeness, consistency, and independence are cleanest in this setting. But actual derivations are notoriously long. Proving something as elementary as (the identity) requires several non-obvious steps using A1 and A2, whereas in natural deduction it is a one-step assumption discharge. This verbosity is inherent: Hilbert-style proofs are often exponentially longer than their natural deduction counterparts for the same theorem. In practice, logicians typically work in richer systems and invoke the equivalence to Hilbert systems only when they need the metamathematical leverage that minimality provides.
Sequent Calculus and Cut Elimination
Gentzen introduced the sequent calculus alongside natural deduction in his landmark 1934 paper, but it was designed for a different purpose: not to mirror mathematical reasoning, but to make the structure of proofs amenable to mathematical analysis. The fundamental object is the sequent, written , where and are finite multisets of formulas. The intended reading is “if all formulas in hold, then at least one formula in holds.” The two-sided presentation — with formulas on both left and right of the turnstile — reveals a duality between assumptions and conclusions that is invisible in other proof systems.
Each connective receives left rules and right rules. For instance, the right rule for conjunction (-R) derives from and — to conclude a conjunction on the right, prove both conjuncts. The left rule for conjunction (-L) derives from (or from ) — to use a conjunction on the left, extract a conjunct. Similarly, -R introduces an implication on the right by moving the antecedent to the left of the turnstile, and -L splits: to use on the left, one must prove (on the right) and use (on the left). The structural rules — weakening, contraction, and exchange — operate on both sides of the sequent.
The most important rule in the sequent calculus is the cut rule:
The cut rule says: if can be proved, and can be used to prove something else, then can be bypassed — the “middle man” is eliminated. Cut corresponds to the use of lemmas in ordinary mathematics: you prove a lemma, then use it. The cut formula need not appear in the conclusion, so the cut rule can introduce formulas that are not subformulas of the final theorem.
Gentzen’s Hauptsatz (main theorem), also from 1934, states that the cut rule is admissible: any sequent provable with cuts is also provable without cuts. This is the cut-elimination theorem, and it is the single most important result in structural proof theory. The proof is a complex double induction — on the complexity of the cut formula and the depth of the derivation — that systematically pushes cuts upward through the proof tree until they disappear.
The consequences of cut elimination are profound. First, it yields the subformula property: in a cut-free proof of , every formula appearing in the derivation is a subformula of some formula in or . This means proofs cannot go “outside” the vocabulary of the theorem — they are analytic. The subformula property immediately implies consistency: there is no cut-free proof of the empty sequent (with nothing on either side), since there are no subformulas to work with. Gentzen used this idea in 1936 to give the first constructive consistency proof of Peano arithmetic, using transfinite induction up to the ordinal .
Cut elimination also implies Craig’s interpolation theorem: if is provable, then there exists an interpolant — a formula whose non-logical vocabulary appears in both and — such that and are both provable. The interpolant can be read off from a cut-free proof by examining which formulas mediate between the left and right sides. Interpolation has far-reaching applications in model theory, database theory, and software verification.
Proof Complexity and Automated Reasoning
The study of proof complexity asks a deceptively simple question: how long must a proof be? Different proof systems can produce dramatically different proof lengths for the same theorem, and understanding these separations connects proof theory directly to computational complexity.
The foundational framework was laid by Stephen Cook and Robert Reckhow in 1979. They defined a proof system (in the general sense) as a polynomial-time computable function such that the range of is exactly the set of tautologies. A string is a proof of if — that is, proofs are efficiently verifiable certificates of validity. Under this definition, the question “does every tautology have a polynomial-length proof in some proof system?” is equivalent to the question . If , then no single proof system can have short proofs for all tautologies.
The most striking concrete results involve resolution, a proof system based on a single rule: from clauses and , derive . J. Alan Robinson introduced the resolution principle in 1965 as a foundation for automated theorem proving. Resolution is refutation-complete for propositional logic: a set of clauses is unsatisfiable if and only if the empty clause can be derived. However, resolution proofs can be exponentially long. The classic example is the pigeonhole principle , which states that pigeons cannot be placed into holes without a collision. Armin Haken proved in 1985 that every resolution proof of requires exponentially many steps in , even though the principle is an elementary combinatorial tautology. This exponential lower bound demonstrates a fundamental limitation of resolution-based reasoning.
In practice, resolution is the backbone of modern SAT solvers, which have become remarkably effective despite worst-case exponential bounds. The DPLL algorithm (Davis-Putnam-Logemann-Loveland, 1962) searches for satisfying assignments using unit propagation and backtracking — essentially performing resolution implicitly. Modern conflict-driven clause learning (CDCL) solvers extend DPLL with learned clauses derived from conflicts, enabling them to solve industrial-scale instances with millions of variables. The gap between worst-case exponential bounds and practical polynomial-time behavior on structured instances is one of the central mysteries of computational complexity.
Beyond propositional reasoning, automated deduction for first-order logic builds on Robinson’s resolution with unification — an algorithm that finds substitutions making two terms identical. First-order resolution, combined with strategies like set-of-support and hyperresolution, powered early automated theorem provers. The modern landscape includes both resolution-based systems and more sophisticated approaches: superposition calculus for equational reasoning, tableaux methods that search for counterexamples, and SMT solvers (satisfiability modulo theories) that combine SAT solving with domain-specific decision procedures for arithmetic, arrays, and other theories.
The connection between proof theory and computation reaches its fullest expression in interactive theorem provers such as Coq, Lean, and Isabelle. These systems implement formal proof calculi — typically based on dependent type theory or higher-order logic — and allow mathematicians to construct machine-verified proofs. The Curry-Howard correspondence ensures that every proof in these systems is also a program, and every program is a proof. Major mathematical results have been formalized in these systems, including the four-color theorem (in Coq, 2005) and the Kepler conjecture (in Isabelle and HOL Light, 2014). As the technology matures, the boundary between proof theory and software engineering continues to dissolve, and the question of what constitutes a rigorous proof is being reshaped by the tools that Gentzen’s work made possible.