Artificial Intelligence

The broad field of creating intelligent agents — search, planning, knowledge representation, reasoning, and modern AI systems.


Artificial intelligence is the study of how to build systems that perceive their environment, reason about it, and act to achieve goals. It encompasses the classical problems of search, planning, and knowledge representation as well as the modern synthesis of learning-based and symbolic methods. Where Machine Learning focuses on algorithms that improve through data, AI asks the broader question of how to create agents that behave intelligently in complex, uncertain, and adversarial worlds.

Foundations and Historical Arc

The intellectual roots of artificial intelligence stretch back centuries — to Ramon Llull’s combinatorial logic machines in the thirteenth century, to Leibniz’s vision of a calculus ratiocinator, and to Charles Babbage and Ada Lovelace’s nineteenth-century explorations of mechanical computation. But the field as a scientific discipline was born at the Dartmouth workshop of 1956, where John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon proposed that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” The name “artificial intelligence” was McCarthy’s coinage.

The early decades were dominated by symbolic AI — programs that manipulated formal representations of knowledge. Allen Newell and Herbert Simon built the Logic Theorist (1956) and the General Problem Solver (1957), demonstrating that machines could prove mathematical theorems and solve puzzles by searching through symbolic state spaces. The 1960s and 1970s saw rapid progress in game playing, theorem proving, and natural language understanding, but also growing recognition of the difficulty of common-sense reasoning and the brittleness of hand-crafted rules.

The field experienced two major AI winters — periods of reduced funding and diminished expectations. The first, in the mid-1970s, followed the Lighthill Report in the United Kingdom and the failure of early machine translation efforts. The second, in the late 1980s, came after the collapse of the expert-systems market. These winters were not failures of the underlying ideas but corrections of inflated expectations. Each was followed by a resurgence driven by new methods: probabilistic reasoning in the 1990s, statistical machine learning in the 2000s, and deep learning from 2012 onward.

Alan Turing’s 1950 paper Computing Machinery and Intelligence introduced the Turing test — a criterion for machine intelligence based on whether a human interrogator can distinguish a machine from a person in a text conversation. The test remains a cultural touchstone, though the field has largely moved beyond it as a serious benchmark. The distinction between strong AI (also called artificial general intelligence, or AGI — a system with human-level understanding across all domains) and weak AI (also called narrow AI — a system that excels at a specific task) frames much of the contemporary debate about the field’s trajectory and risks.

Search and Problem Solving

Many AI problems can be cast as search problems: given an initial state, a set of actions, a transition model, and a goal test, find a sequence of actions that leads from the initial state to a goal state. The formulation is powerful because it abstracts away domain-specific details, allowing general-purpose algorithms to be applied across chess, route planning, robot navigation, and theorem proving alike.

Uninformed search methods explore the state space without domain knowledge. Breadth-first search (BFS) expands nodes level by level, guaranteeing the shortest solution in terms of number of steps but consuming memory proportional to the branching factor raised to the solution depth. Depth-first search (DFS) follows each branch to its end before backtracking, using memory linear in the depth but offering no optimality guarantee. Iterative deepening depth-first search combines the strengths of both: it runs DFS with increasing depth limits, achieving BFS’s completeness and optimality with DFS’s linear memory. Uniform-cost search generalizes BFS to weighted graphs, expanding the node with the lowest path cost — it is equivalent to Dijkstra’s algorithm.
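A minimal sketch of iterative deepening over an adjacency-list graph (the graph, node names, and depth cap below are invented for illustration):

```python
def depth_limited(graph, node, goal, limit, path):
    """DFS from `node`, refusing to descend more than `limit` edges."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:  # avoid cycles along the current path
            result = depth_limited(graph, child, goal, limit - 1, path + [child])
            if result is not None:
                return result
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    """Run DFS with increasing depth limits: BFS-like optimality, DFS-like memory."""
    for limit in range(max_depth + 1):
        result = depth_limited(graph, start, goal, limit, [start])
        if result is not None:
            return result
    return None

# Tiny unweighted graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"]}
print(iterative_deepening(graph, "A", "E"))  # shortest path: ['A', 'C', 'E']
```

Because each depth limit restarts from scratch, the repeated work is dominated by the deepest iteration, which is why the method inherits DFS's memory profile without giving up BFS's step-count optimality.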

Informed (heuristic) search uses a heuristic function h(n) — an estimate of the cost from node n to the nearest goal — to guide exploration. Greedy best-first search expands the node with the smallest h(n), which can be fast but is neither complete nor optimal. The A* algorithm, introduced by Peter Hart, Nils Nilsson, and Bertram Raphael in 1968, evaluates nodes by f(n) = g(n) + h(n), where g(n) is the actual cost from the start. A* is both complete and optimal provided the heuristic is admissible (never overestimates the true cost) and consistent (satisfies the triangle inequality h(n) ≤ c(n, n′) + h(n′)). The algorithm’s efficiency depends on the quality of the heuristic — a perfect heuristic would lead A* directly to the goal without exploring any unnecessary nodes, while a trivial heuristic h = 0 reduces it to uniform-cost search.
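As a sketch of A* on a small grid, using Manhattan distance as an admissible heuristic (the grid size, wall position, and function names below are illustrative assumptions, not anything from the text):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: expand nodes in order of f(n) = g(n) + h(n).

    `neighbors(n)` yields (successor, step_cost) pairs; `h` must be
    admissible for the returned path to be optimal.
    """
    frontier = [(h(start), 0, start, [start])]  # entries: (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for succ, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):  # found a cheaper route
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

# 5x5 grid pathfinding with one blocked cell and unit step costs.
def neighbors(p):
    x, y = p
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5 and (nx, ny) != (2, 2):  # (2,2) is a wall
            yield (nx, ny), 1

goal = (4, 4)
h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
path, cost = a_star((0, 0), goal, neighbors, h)
print(cost)  # 8: the Manhattan distance, since the single wall can be skirted
```

Swapping in h = 0 recovers uniform-cost search, and an inadmissible heuristic would still find a path here, just not necessarily the cheapest one.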

Heuristic design is both an art and a science. Relaxation — simplifying the problem by removing constraints — is a systematic way to derive admissible heuristics. For instance, in the 15-puzzle, the Manhattan distance heuristic is derived by relaxing the constraint that tiles cannot pass through each other. Pattern databases precompute exact costs for subproblems and combine them, yielding much stronger heuristics at the cost of precomputation time and memory.

Adversarial Search and Game Playing

Games have been a proving ground for AI since the field’s inception. Two-player, zero-sum, perfect-information games — chess, checkers, Go — are naturally modeled as search through a game tree, where alternating layers represent the two players’ moves.

The minimax algorithm evaluates game positions by assuming both players play optimally: the maximizing player chooses the move with the highest value, and the minimizing player chooses the move with the lowest. In theory, minimax computes the game-theoretic value of every position, but in practice game trees are far too large to search exhaustively. Alpha-beta pruning, described by various researchers in the late 1950s and formalized by Donald Knuth and Ronald Moore in 1975, dramatically reduces the number of nodes evaluated by maintaining lower and upper bounds (α and β) and cutting off branches that cannot influence the final decision. With perfect move ordering, alpha-beta reduces the effective branching factor from b to approximately √b, roughly doubling the depth that can be searched in the same time.
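The cutoff logic can be sketched generically; the toy game tree below is invented for illustration, and `moves`/`evaluate` are assumed callbacks rather than any standard API:

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves, evaluate):
    """Minimax with alpha-beta pruning over a generic game tree.

    `moves(state)` lists (move, next_state) pairs; `evaluate` scores leaves
    from the maximizing player's point of view.
    """
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for _, child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, moves, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: MIN would never allow this branch
                break
        return value
    else:
        value = float("inf")
        for _, child in children:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, moves, evaluate))
            beta = min(beta, value)
            if alpha >= beta:  # alpha cutoff: MAX already has a better option
                break
        return value

# A toy two-ply tree: internal nodes are dicts, leaves are numeric scores.
tree = {"L": {"a": 3, "b": 5}, "R": {"c": 2, "d": 9}}
moves = lambda s: list(s.items()) if isinstance(s, dict) else []
evaluate = lambda s: s  # leaves are already scores
print(alphabeta(tree, 4, float("-inf"), float("inf"), True, moves, evaluate))  # 3
```

In this tree the left subtree establishes α = 3, so when the right subtree's first leaf evaluates to 2 the remaining leaf (9) is pruned without being examined.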

For games where exhaustive search is impossible even with pruning, Monte Carlo tree search (MCTS) offers an alternative. MCTS repeatedly simulates random playouts from the current position, uses the results to build a search tree, and directs future simulations toward the most promising branches using the Upper Confidence Bound for Trees (UCT) formula. MCTS does not require a heuristic evaluation function — it relies entirely on simulation outcomes — and scales gracefully to enormous game trees. It was the key ingredient in AlphaGo (DeepMind, 2016), which combined MCTS with deep neural networks trained by reinforcement learning to defeat the world Go champion Lee Sedol, a milestone that many had predicted was decades away.
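The UCT selection rule scores each child by its average value plus an exploration bonus proportional to sqrt(ln(parent visits) / child visits); a minimal sketch follows, where the statistics dictionary is an assumed representation rather than a standard data structure:

```python
import math

def uct_select(node_children, c=1.4):
    """Pick the child maximizing the UCT score: mean value + exploration bonus.

    `node_children` maps child -> (total_value, visit_count); the constant `c`
    trades exploration against exploitation (sqrt(2) is a common default).
    """
    parent_visits = sum(n for _, n in node_children.values())
    def uct(child):
        w, n = node_children[child]
        if n == 0:
            return float("inf")  # always try unvisited children first
        return w / n + c * math.sqrt(math.log(parent_visits) / n)
    return max(node_children, key=uct)

children = {"a": (5.0, 10), "b": (3.0, 4), "c": (0.0, 0)}
print(uct_select(children))  # 'c': unvisited children take priority
```

A full MCTS loop wraps this rule in four phases — selection, expansion, simulation, and backpropagation — with simulation supplying the values that this rule then exploits.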

The history of AI game playing is a sequence of landmarks: Arthur Samuel’s checkers program in the 1950s; Deep Blue’s defeat of Garry Kasparov at chess in 1997 (relying on alpha-beta search with a handcrafted evaluation function and special-purpose hardware); and the AlphaGo lineage, culminating in AlphaZero (2017), which learned chess, shogi, and Go from scratch through self-play alone, with no domain knowledge beyond the rules of the game. Games with imperfect information, such as poker, require different techniques — Libratus and Pluribus used counterfactual regret minimization to achieve superhuman play in no-limit Texas Hold’em.

Constraint Satisfaction and Planning

Many real-world problems are more naturally expressed as constraint satisfaction problems (CSPs) than as path-search problems. A CSP is defined by a set of variables, each with a domain of possible values, and a set of constraints that restrict which combinations of values are allowed. Map coloring, scheduling, circuit layout, and Sudoku are all CSPs.

Backtracking search is the standard approach: it assigns values to variables one at a time and backtracks when a constraint is violated. Its efficiency depends critically on the order in which variables and values are tried. The minimum remaining values (MRV) heuristic selects the variable with the fewest legal values remaining — the one most likely to cause a failure, which allows the search to prune dead ends early. The least constraining value heuristic chooses the value that rules out the fewest options for neighboring variables, maximizing flexibility for future assignments.
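A minimal sketch of backtracking with the MRV heuristic, using the classic map-coloring formulation (the Australian-states instance below is a standard textbook example; the constraint is simply that adjacent variables differ):

```python
def backtrack(assignment, variables, domains, neighbors):
    """Backtracking search with the minimum-remaining-values (MRV) heuristic."""
    if len(assignment) == len(variables):
        return assignment
    # MRV: choose the unassigned variable with the fewest legal values left.
    unassigned = [v for v in variables if v not in assignment]
    var = min(unassigned,
              key=lambda v: sum(all(assignment.get(n) != x for n in neighbors[v])
                                for x in domains[v]))
    for value in domains[var]:
        if all(assignment.get(n) != value for n in neighbors[var]):  # consistent?
            result = backtrack({**assignment, var: value}, variables, domains, neighbors)
            if result is not None:
                return result
    return None

# 3-coloring of the mainland Australian states (adjacent states must differ).
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
             "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"]}
variables = list(neighbors)
domains = {v: ["red", "green", "blue"] for v in variables}
solution = backtrack({}, variables, domains, neighbors)
print(all(solution[a] != solution[b] for a in neighbors for b in neighbors[a]))  # True
```

On this instance MRV steers the search so effectively that a solution is found without any backtracking at all.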

Constraint propagation techniques reduce the search space before and during backtracking. Arc consistency (AC-3) enforces that for every value in a variable’s domain, there exists a consistent value in every neighboring variable’s domain. Forward checking is a simpler version that prunes domains only for variables connected to the most recently assigned variable. More powerful forms of consistency — path consistency, k-consistency — provide stronger pruning at greater computational cost.
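The AC-3 loop can be sketched for the not-equal constraint used in map coloring (the two-variable instance below is invented to show a single propagation step):

```python
from collections import deque

def ac3(domains, neighbors):
    """AC-3: make every arc (X, Y) consistent by pruning unsupported values.

    Constraint assumed here: neighbouring variables must take different values
    (as in map coloring). Returns False if some domain is wiped out.
    """
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Keep only values of x that have some consistent value in y's domain.
        revised = [v for v in domains[x] if any(v != w for w in domains[y])]
        if len(revised) < len(domains[x]):
            domains[x] = revised
            if not revised:
                return False  # empty domain: the CSP is unsatisfiable
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))  # re-check arcs into the pruned variable
    return True

# Two adjacent variables: once A is fixed to red, AC-3 prunes red from B.
domains = {"A": ["red"], "B": ["red", "green"]}
neighbors = {"A": ["B"], "B": ["A"]}
ac3(domains, neighbors)
print(domains["B"])  # ['green']
```

Re-enqueueing the incoming arcs after each pruning is what distinguishes full arc consistency from one-shot forward checking.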

Classical planning addresses problems where an agent must find a sequence of actions to achieve a goal from a known initial state, with deterministic actions. The STRIPS language (Richard Fikes and Nils Nilsson, 1971) represents actions by their preconditions and effects — lists of predicates that must hold before the action and that are added or deleted by it. PDDL (Planning Domain Definition Language) generalizes STRIPS and has become the standard formalism for planning competitions. Planning can be done as forward search from the initial state, backward search (goal regression) from the goal, or through specialized algorithms like GraphPlan (Avrim Blum and Merrick Furst, 1997), which builds a layered graph of propositions and actions and extracts a plan by backward search through it. Hierarchical task network (HTN) planning decomposes high-level tasks into subtasks recursively, capturing the kind of hierarchical structure that characterizes real-world plans.
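The STRIPS representation of an action — preconditions, an add list, and a delete list over a set of ground predicates — can be sketched as follows (the domain, predicate strings, and class name are hypothetical):

```python
# A minimal STRIPS-style action: preconditions, add list, delete list.
class Action:
    def __init__(self, name, pre, add, delete):
        self.name, self.pre, self.add, self.delete = name, set(pre), set(add), set(delete)

    def applicable(self, state):
        return self.pre <= state  # every precondition holds in the state

    def apply(self, state):
        return (state - self.delete) | self.add  # delete, then add effects

move_a_b = Action("move(A,B)", pre={"at(A)", "path(A,B)"},
                  add={"at(B)"}, delete={"at(A)"})
state = {"at(A)", "path(A,B)"}
if move_a_b.applicable(state):
    state = move_a_b.apply(state)
print(sorted(state))  # ['at(B)', 'path(A,B)']
```

Forward-search planners explore exactly this state space, applying every applicable action at each step; goal regression runs the same machinery in reverse from the goal predicates.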

Knowledge Representation and Reasoning

Intelligent behavior requires not just the ability to search and plan but also the ability to represent knowledge about the world and reason with it. Knowledge representation is the study of how to encode facts, rules, and relationships in a form that a computer can use for inference.

First-order logic (FOL) extends propositional logic with variables, quantifiers (∀, ∃), predicates, and functions, making it powerful enough to express most mathematical and common-sense knowledge. Inference rules — most importantly modus ponens (from P and P → Q, derive Q) and resolution (from A ∨ P and B ∨ ¬P, derive A ∨ B) — allow new facts to be derived from known ones. Resolution-based theorem provers, building on the work of J. Alan Robinson (1965), can in principle answer any first-order query, though the process is in general only semi-decidable.
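A single resolution step is easy to sketch in the propositional case (full first-order resolution additionally requires unification, which is omitted here; the clause encoding is an assumed representation):

```python
def resolve(c1, c2):
    """One propositional resolution step.

    Clauses are frozensets of literals, and a literal is a (name, polarity)
    pair. Returns a resolvent for each complementary pair of literals.
    """
    out = []
    for name, pol in c1:
        if (name, not pol) in c2:  # complementary literals: P and not-P
            out.append((c1 - {(name, pol)}) | (c2 - {(name, not pol)}))
    return out

# From (A or P) and (B or not-P), derive (A or B).
c1 = frozenset([("A", True), ("P", True)])
c2 = frozenset([("B", True), ("P", False)])
resolvents = resolve(c1, c2)  # one resolvent: the clause (A or B)
```

A refutation prover iterates this step on the negated query until it derives the empty clause or can derive nothing new.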

Horn clauses — a restricted form of FOL where each clause contains at most one positive literal — form the basis of logic programming and the language Prolog. Horn clause inference is decidable and efficient, making it practical for expert systems, deductive databases, and rule-based AI systems that were central to AI in the 1970s and 1980s.

Knowledge graphs and semantic networks represent knowledge as directed graphs where nodes are entities and edges are relationships. Modern knowledge graphs — Wikidata, Google’s Knowledge Graph, DBpedia — contain billions of facts and power search engines, recommendation systems, and question-answering applications. Ontologies, formalized using description logics, provide a rigorous vocabulary for a domain, defining concepts, their properties, and the relationships among them. The Web Ontology Language (OWL) is the standard for encoding ontologies on the Semantic Web.

Non-monotonic reasoning addresses a limitation of classical logic: in classical logic, adding new information can never invalidate previously derived conclusions. But common-sense reasoning frequently requires defaults that can be overridden — “birds typically fly, but penguins do not.” Default logic (Raymond Reiter, 1980), circumscription (McCarthy, 1980), and negation as failure (used in Prolog) provide formal frameworks for reasoning with defaults, exceptions, and incomplete information.

Probabilistic Reasoning and Decision Making

The real world is pervasively uncertain, and deterministic logic alone is insufficient for intelligent behavior. Probabilistic reasoning extends AI systems to handle noise, ambiguity, and partial observability.

Bayesian networks, introduced by Judea Pearl in the 1980s, represent joint probability distributions over sets of variables using directed acyclic graphs. Each node stores a conditional probability table specifying the probability of its value given its parents. D-separation provides a graphical criterion for determining conditional independence, and the structure of the network encodes all the independence assumptions of the model compactly. Exact inference — computing posterior probabilities given evidence — can be performed by variable elimination or junction tree algorithms, but is NP-hard in general. Approximate inference methods, including likelihood weighting, Gibbs sampling, and particle filtering, trade exactness for tractability and are essential for large networks.
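Exact inference by enumeration is transparent on a tiny network; the two-node Rain → WetGrass model below, with illustrative probabilities, is an invented example rather than anything from the text:

```python
# Exact inference by enumeration on a two-node network Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}                # prior P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}      # CPT: P(WetGrass=true | Rain)

def posterior_rain_given_wet():
    """P(Rain=true | WetGrass=true) via Bayes' rule over the joint."""
    joint = {r: p_rain[r] * p_wet_given_rain[r] for r in (True, False)}
    evidence = sum(joint.values())  # P(WetGrass=true), the normalizer
    return joint[True] / evidence

print(round(posterior_rain_given_wet(), 3))  # 0.692
```

Variable elimination generalizes this computation by summing out hidden variables one at a time, caching intermediate factors so the work does not grow exponentially with every query.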

Hidden Markov models (HMMs) are a special case of Bayesian networks for temporal sequences, where a hidden state evolves according to Markov dynamics and each state emits an observation. The forward algorithm computes the likelihood of an observation sequence, the Viterbi algorithm finds the most likely state sequence, and the Baum-Welch algorithm (a special case of expectation-maximization) learns the model parameters from data. HMMs were the backbone of speech recognition for decades before being supplanted by deep learning.
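The Viterbi algorithm can be sketched as dynamic programming over a table of best-path probabilities; the weather/activity model below is a standard toy example with illustrative numbers:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi: most likely hidden state sequence for an observation sequence."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = []  # back-pointers: best predecessor of each state at each step
    for o in obs[1:]:
        prev = best[-1]
        col, ptr = {}, {}
        for s in states:
            p_star = max(states, key=lambda p: prev[p] * trans_p[p][s])
            col[s] = prev[p_star] * trans_p[p_star][s] * emit_p[s][o]
            ptr[s] = p_star
        best.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy HMM: hidden weather, observed activity. Probabilities are illustrative.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
```

Replacing the max with a sum in the same recurrence yields the forward algorithm, which computes the likelihood of the observation sequence instead of the best path.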

Markov decision processes (MDPs) extend the probabilistic framework to sequential decision-making. At each time step, the agent observes a state, takes an action, receives a reward, and transitions to a new state according to a known probability distribution. Value iteration and policy iteration compute optimal policies when the model is known. Partially observable MDPs (POMDPs) generalize MDPs to settings where the agent cannot directly observe the state and must maintain a belief state — a probability distribution over possible states. POMDPs are vastly harder to solve but are essential for modeling realistic scenarios in robotics, dialogue systems, and medical decision-making. Utility theory provides the normative foundation, asserting that a rational agent should act to maximize expected utility — a principle that connects AI decision-making to economics and game theory.
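Value iteration reduces to repeatedly applying the Bellman optimality backup until the values stop changing; a minimal sketch on an invented two-state MDP (state names, rewards, and discount are assumptions for illustration):

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, eps=1e-6):
    """Value iteration: apply the Bellman optimality backup to convergence.

    `transition(s, a)` returns (probability, next_state) pairs; values are
    updated in place (Gauss-Seidel style), which still converges for gamma < 1.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))
                 for a in actions(s)]
            new_v = max(q) if q else 0.0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            return V

# Two-state chain: every action moves to B; only being in B earns reward 1.
states = ["A", "B"]
actions = lambda s: ["move"]
transition = lambda s, a: [(1.0, "B")]
reward = lambda s, a: 1.0 if s == "B" else 0.0
V = value_iteration(states, actions, transition, reward)
print(round(V["B"], 2))  # 10.0, i.e. 1 / (1 - gamma)
```

Policy iteration reaches the same fixed point by alternating full policy evaluation with greedy policy improvement, typically in far fewer (but more expensive) sweeps.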

Ethics, Safety, and the Path to General Intelligence

As AI systems grow more capable and pervasive, questions of ethics, fairness, transparency, and safety have moved from the margins to the center of the field. Algorithmic bias — the tendency of machine learning systems to perpetuate or amplify societal biases present in training data — has been documented in criminal justice, hiring, lending, and healthcare. Algorithmic fairness research formalizes competing definitions of fairness (demographic parity, equalized odds, individual fairness) and develops methods for measuring and mitigating bias.

Explainability addresses the “black box” problem: complex models, especially deep neural networks, can make highly accurate predictions without offering any human-understandable justification. Methods like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) provide post-hoc explanations for individual predictions, while attention visualization and concept-based explanations offer windows into what models have learned. The tension between accuracy and interpretability remains one of the field’s central tradeoffs.

AI safety and alignment research addresses the challenge of ensuring that increasingly powerful AI systems act in accordance with human values and intentions. The problem is subtle: specifying a reward function or objective that captures everything a human designer cares about is extremely difficult, and systems that optimize a misspecified objective can produce harmful outcomes — a phenomenon known as reward hacking. Reinforcement learning from human feedback (RLHF) and constitutional AI are recent approaches that attempt to align language models by training them on human preferences rather than hand-specified reward functions.

The question of artificial general intelligence (AGI) — whether and when machines will achieve human-level intelligence across all cognitive domains — remains one of the most debated topics in AI. Cognitive architectures like SOAR and ACT-R attempt to model the full breadth of human cognition in a unified framework. Neurosymbolic AI seeks to combine the pattern-recognition strengths of neural networks with the compositional reasoning of symbolic systems. And the rapid progress of large language models and multimodal systems has accelerated both the timeline estimates of optimists and the safety concerns of those who worry about the consequences of building systems more capable than their creators. Whatever the outcome, the trajectory from the Dartmouth workshop to the present makes clear that artificial intelligence is not merely a technical field but one of the defining intellectual enterprises of the modern era.