Proof systems

Author

Affiliation

Clayton Cafiero

University of Vermont

Published

2026-05-14

What is a proof system? A proof system allows us to derive new statements from an existing set of facts and rules. This is done by querying (asking questions of) the proof system.

We refer to a set of facts and rules as a knowledge base, and we often abbreviate this as KB. Given some knowledge base, we can ask questions like “Is such-and-such true?” or “What are the substitutions which make some proposition true?” More generally, we ask: does the knowledge base, KB, entail Q for some query Q.

Earlier, we introduced entailment, where we write KB \vDash Q. If when KB is true for all interpretations Q is also true, we say KB entails Q.

The Deduction Theorem

The Deduction Theorem, which has been known since the time of Aristotle (and likely earlier), states

\begin{gather*} A \vDash B \quad \Longleftrightarrow \quad \vDash A \implies B. \end{gather*}

This is connects entailment, which is a matter of semantics, with implication, which is a matter of syntax.

For clarity’s sake, it’s best to reinforce the difference between entailment and implication. Implication is a relation between propositions (a.k.a., sentences), specifically

\begin{gather*} A \implies B \quad \Longleftrightarrow \quad \neg B \lor A, \end{gather*}

whereas entailment is a relation between a set of propositions (which here we call a knowledge base) and a single proposition. Thus KB \vDash Q is true if and only if every interpretation that makes all elements of KB true, also makes Q true.

A naïve proof system

Recall that a tautology is a proposition whose negation cannot be satisfied. This means that a tautology is necessarily true for all interpretations. Accordingly, we write \vDash A to indicate that A is a tautology.

If we wish to demonstrate that KB \vDash Q, one approach is to show that KB \implies Q is a tautology. This is a brute force approach, called model checking. Recall that

\begin{gather*} A \vDash B \quad \Longleftrightarrow \quad M(A) \subseteq M(B). \end{gather*}

This approach seeks to prove that A \vDash B by verifying that M(A) \subseteq M(B).

However, there’s a problem with this approach. Consider that each propositional variable can be true or false. Therefore, for any given knowledge base, KB, if there are n propositional variables, then there are 2^n possible interpretations for these variables. In the worst case, we have asymptotic complexity of \mathcal{O}(2^n).

Unfortunately, this is the worst case for any problem of entailment.¹ Clearly, our naïve approach is only useful for tiny problem instances and cannot be used for knowledge bases of any appreciable size.

What is to be done, then? Since we cannot escape the worst case of \mathcal{O}(2^n) we need an approach with better performance in the average case (hopefully, a majority of cases). Accordingly, we take recourse to the rules of logic to assist.

Instead of enumerating the set of all possible interpretations, we use rules of logical inference to simplify problem instances. First, we will need some inference rules.

Inference rules

Modus ponens²

Whenever we are given any proposition of the form A \implies B, and a proposition A, then we may infer B. One way of writing this is with a proof tree in sequent notation:

\dfrac{A \Rightarrow B \qquad A}{B}

The antecedents are written above the line, and the consequent (that which we may infer) is below the line.

For example, if we are given “if the sidewalk is wet then it is raining” and “the sidewalk is wet”, we may infer (in this system, setting aside matters of lawn sprinklers, leaky hoses, and gushing fire hydrants) that “It is raining.”

\dfrac{\text{Sidewalk is wet.} \Rightarrow \text{It is raining.} \qquad \text{Sidewalk is wet.}}{\text{It is raining.}}

That’s modus ponens in a nutshell.

Modus tollens³

Modus tollens is often referred to as “denying the consequent.” If we are given that A entails B, and we are also given \neg B, we may infer \neg A.

\dfrac{A \Rightarrow B \qquad \neg B}{\neg A}

Conjunction elimination

Another rule is conjunction elimination (a.k.a. and-elimination). If we have a conjunction which is true, we may infer either of the conjuncts.

\dfrac{A \land B}{A}

This is easily proven by recourse to truth table.

Other rules of inference

Earlier, we presented a number of equivalences in propositional logic, e.g., biconditional elimination, distribution, use of De Morgan’s Laws, etc. We may use these as inference rules. For example,

\dfrac{(A \Rightarrow B) \land (B \Rightarrow A)}{A \Longleftrightarrow B}

\dfrac{\neg B \Rightarrow \neg A}{A \Rightarrow B}

It is important to note that these rules are sound. That is, we cannot derive anything using this approach that is not actually entailed by the facts and rules in our knowledge base.

Another way of writing such derivations is with the use of the symbol \vdash. If we can derive one proposition from another we write A \vdash B and say that B can be derived from A, or B is derivable from A.

An improved proof system

Equipped with rules of inference, we may proceed with an improved proof system—one which, instead of enumerating models, can search for proofs using these rules of inference. While we’re stuck with the worst case described earlier, searching for proofs can be much more efficient than enumerating models. One reason for this is that this approach can ignore irrelevant propositions—no matter how many there are! So searching for proofs—by eliminating irrelevant propositions—has an average case complexity that is much better than the model enumeration approach.

We will see other examples of techniques to reduce the search space, which, in some cases, make the difference between whether a given problem instance is tractable or not. This is a common theme in artificial intelligence programming.

Proof by contradiction

One common proof approach in mathematics is called proof by contradiction. The idea is simple: we assume, for the sake of contradiction, the negation of what we wish to prove, and then we use this to derive a contradiction.⁴ We may use this approach when searching for proofs. Consider:

\begin{gather*} KB \Rightarrow Q \quad \equiv \quad \neg KB \lor Q \\ \neg(KB \Rightarrow Q) \quad \equiv \quad \neg(\neg KB \lor Q) \end{gather*}

Here, the first line is the definition of implication, and the second we get by negating both sides.

Taking the right hand side, distributing the negation, and cancelling double-negation, we have

\begin{gather*} \neg(KB \Rightarrow Q) \quad \equiv \quad KB \land \neg Q \end{gather*}

So if our knowledge base KB entails Q then KB \land \neg Q is unsatisfiable.

\begin{gather*} KB \vDash Q \quad \text{if and only if} \quad KB \land \neg Q \text{ is unsatisfiable.} \end{gather*}

Soundness and completeness

Important properties we wish our proof system to have are soundness and completeness.

Soundness means that we cannot derive any proposition using the syntactic rules of our proof system which is semantically invalid. More succinctly, we should only be able to prove things which are true.

Completeness means that our proof system should be able to prove anything that is indeed true—that is, there are no truths that we cannot derive in our proof system. More generally, in logic, a proof system—any formal system—is called complete with respect to a particular property if every formula having the property can be derived using that system.

Note that soundness and completeness are two sides of the same coin. One is the converse of the other.

\begin{gather*} \text{if } KB \vdash Q \quad \text{then} \quad KB \vDash Q \\ \text{if } KB \vDash Q \quad \text{then} \quad KB \vdash Q \end{gather*}

Again, the symbol \vdash signifies derivation using syntactic rules, e.g., A \vdash B means that B can be derived from A.

Syntactic derivation and semantic entailment

So, we have two levels to consider. The syntactic level, which deals with the symbols and rules of propositional logic, and a semantic level, which deals with interpretations and models. We may represent this with what is called a “commutative diagram.” A commutative diagram is a diagram such that all directed paths in the diagram with the same start and endpoint yield the same result.

\begin{array}{ccc} & \textit{syntax} & \\[6pt] KB & \xrightarrow{\quad\text{derivation } \vdash\quad} & Q \\[10pt] \downarrow\,{\scriptstyle\text{interpretation}} & & \downarrow\,{\scriptstyle\text{interpretation}} \\[10pt] \text{M}(KB) & \xrightarrow{\quad\text{entailment } \vDash\quad} & \text{M}(Q) \\[6pt] & \textit{semantics} & \end{array}

This shows that if our system is sound and complete, then we can manipulate symbols at the syntactic level to arrive at true statements at the semantic level. That is, as noted earlier, we could start from a knowledge base, interpret it, enumerate all models, and perform model checking to see if KB \vDash Q, or instead, through symbolic manipulation, derive the result we wish (if possible) at the syntactic level, and then interpret the result.

Refutation completeness

A formal system S is refutation-complete if it is able to derive false from every unsatisfiable set of formulas. That is,

\begin{gather*} \Gamma \models_{S} \bot \to \Gamma \vdash_{S} \bot. \end{gather*}

The system used by Prolog—specifically, selective linear definite clause resolution (SLD resolution) on Horn clauses—is refutation-complete.

Resolution

We require a little more machinery to make our proof system complete. The machinery we need is a proof technique called resolution.

The term resolution in logic refers to a mechanical method for proving statements in first-order logic. It is applied to two clauses in a sentence, and by unification, it eliminates a literal that occurs positive in one clause and negative in the other. A literal stated in the antecedent of an implication is negative because an implication P \Rightarrow Q is equivalent to \neg P \lor Q.

\begin{gather} (\text{human}(X) \Rightarrow \text{mortal}(X)) \land \text{human}(\text{socrates}) \\ (\text{human}(\text{socrates}) \Rightarrow \text{mortal}(\text{socrates})) \land \text{human}(\text{socrates}) \\ (\neg \text{human}(\text{socrates}) \lor \text{mortal}(\text{socrates})) \land \text{human}(\text{socrates}) \\ (\text{human}(\text{socrates}) \land \neg \text{human}(\text{socrates})) \lor (\text{human}(\text{socrates}) \land \text{mortal}(\text{socrates})) \\ \text{false} \lor (\text{human}(\text{socrates}) \land \text{mortal}(\text{socrates})) \\ \text{human}(\text{socrates}) \land \text{mortal}(\text{socrates}) \end{gather}

Is our starting point. We have (2) by unification of X and \text{socrates}. We have (3) by A \Rightarrow B \Longleftrightarrow \neg A \lor B. Using distribution, that is (A \lor B) \land C \Longleftrightarrow (A \lor C) \land (A \lor B), we arrive at (4). Since (A \lor \neg A) is false, we have (5).
follows trivially.

Through this resolution process, we proved \text{mortal}(\text{socrates}).

Let’s see how to generalize this approach. Earlier we saw modus ponens. Now we will consider another rule of logic, the resolution rule.

\dfrac{A \lor B \qquad \neg B \lor C}{A \lor C}

Clearly if A \lor B is true, and \neg B \lor C is true, then we must have A \lor C. We call the derived clause the resolvent. Now, recall from the definition of implication that we have

\begin{gather*} \neg B \lor C \quad \Longleftrightarrow \quad B \Rightarrow C \end{gather*}

Thus, by substitution we have

\dfrac{A \lor B \qquad B \Rightarrow C}{A \lor C}

This is a generalization of modus ponens.

We can generalize this further by adding any number of literals to the clauses in the resolution rule. Why can we do this? First, because any WFF can be rewritten in what is called conjunctive normal form (abbreviated CNF). A WFF is in conjunctive normal form if it consists of one or more clauses joined by conjunction (logical AND) where each clause itself is a disjunction of literals. For example,

\begin{gather*} (A_1 \lor A_2 \lor \ldots \lor A_n) \land (B_1 \lor B_2 \lor \ldots \lor B_n) \end{gather*}

In the example above, there are two clauses, each within parentheses, which are disjunctions of literals, and these clauses are connected by conjunction. We are not limited to just two clauses—we may have some arbitrary number of clauses—the only constraint of CNF is that each clause within parentheses is disjunctive, and that these clauses are connected by conjunction.

This gives us a more general form of the resolution rule (keep in mind that the clauses above the line are conjoined with logical AND):

\dfrac{(A_1 \lor \ldots \lor A_m \lor B) \qquad (\neg B \lor C_1 \lor \ldots \lor C_n)}{A_1 \lor \ldots \lor A_m \lor C_1 \lor \ldots \lor C_n}

The clause produced by a resolution rule is sometimes called a resolvent.

While the resolution rule can be traced to Davis and Putnam (1960), it was Robinson who made application of the resolution rule computationally efficient. What made this possible was the development of the syntactical unification algorithm. It is this algorithm that prevents combinatorial explosion during computation.

We’ll also see how computation is facilitated by the use of Horn clauses.

The general algorithm for a resolution proof:

Negate what it is you wish to prove (the original theorem).
Add that negation to the knowledge base (or theory).
Convert the knowledge base into conjunctive normal form.
Repeat until there is no resolvable pair of clauses:
1. Find resolvable clauses and resolve them.
2. Add the results of resolution to the knowledge base.
3. If an empty clause is produced, halt and report true.
Report that the original theorem is false.

No generative AI was used in producing this material. This was written the old-fashioned way.

Footnotes

In fact, the problem of entailment is co-NP-complete, which means that it is in co-NP and every other problem in co-NP can be reduced to a problem of entailment in polynomial time. co-NP is the set of all decision problems whose complement is in NP—that is, only “no”-instances have a certificate which can be verified in polynomial time.↩︎
This is Latin for “the method of putting by placing.”↩︎
Latin for “the method of removal.”↩︎
An interesting side-note: adherents of intuitionism in mathematics and philosophy don’t generally accept proof by contradiction, since it hinges on the Law of the Excluded Middle—that a proposition must be either true or false, and there is no other “middle” value. Without the Law of the Excluded Middle, most proofs by contradiction fail.↩︎

Reuse

CC BY-NC-SA 4.0