Cybersecurity

The protection of computer systems and networks — threat modeling, access control, network security, and defensive techniques.


Cybersecurity is the discipline concerned with protecting computer systems, networks, and data from unauthorized access, disruption, and destruction. It sits at the intersection of mathematics, systems engineering, and adversarial reasoning: defenders must anticipate every possible avenue of attack, while an attacker needs to find only one overlooked weakness. This asymmetry gives the field its distinctive character and makes it one of the most consequential areas of modern computer science. From the earliest multi-user time-sharing systems of the 1960s to the global internet infrastructure of today, every expansion in connectivity has brought a corresponding expansion in the attack surface that must be defended.

Security Fundamentals and Threat Models

The foundation of cybersecurity rests on three properties known collectively as the CIA triad: confidentiality, integrity, and availability. Confidentiality means that information is accessible only to those authorized to see it. Integrity means that information has not been tampered with or altered by unauthorized parties. Availability means that systems and data remain accessible to legitimate users when needed. These three properties are not always simultaneously achievable in full measure — designing a secure system often involves trading one against another — but they provide the conceptual vocabulary for reasoning about what “security” means in any given context.

Two additional properties complement the triad. Authentication is the process of verifying that an entity — a user, a device, a piece of software — is who or what it claims to be. Authorization determines what an authenticated entity is permitted to do. A system that authenticates users but fails to enforce proper authorization boundaries is vulnerable to privilege escalation attacks, where an attacker with legitimate low-level access obtains higher privileges than intended.

The concept of defense in depth holds that no single security mechanism should be trusted in isolation. Instead, multiple overlapping layers of defense should be deployed so that the failure of any one layer does not compromise the entire system. This principle, borrowed from military strategy, pervades modern security architecture: a web application might rely on input validation, a web application firewall, network segmentation, database access controls, and encrypted storage, each independently capable of stopping certain classes of attack.

To reason systematically about what can go wrong, security practitioners use threat modeling — the process of identifying assets worth protecting, enumerating the threats they face, and prioritizing defenses accordingly. The STRIDE framework, developed at Microsoft in the late 1990s, classifies threats into six categories: Spoofing (impersonating another entity), Tampering (modifying data or code), Repudiation (denying having performed an action), Information disclosure (exposing data to unauthorized parties), Denial of service (making a resource unavailable), and Elevation of privilege (gaining unauthorized access). Each category maps naturally to a security property and suggests specific countermeasures. More formal approaches use attack trees, a technique introduced by Bruce Schneier in 1999, which represent the possible paths an attacker might take as a tree structure rooted at the attacker’s goal, with branches representing alternative or sequential steps. Attack trees allow defenders to reason about the cost, difficulty, and likelihood of different attack scenarios and to allocate defensive resources where they matter most.
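The quantitative side of attack-tree reasoning can be sketched in a few lines. In the toy evaluator below (the node names and cost figures are invented for illustration), an OR node takes the cheapest child because the attacker needs only one branch, while an AND node sums its children because every step must succeed:

```python
# Toy attack-tree evaluator: OR = attacker picks the cheapest branch,
# AND = attacker must complete every child step.
# A node is ("OR"|"AND", [children]) or ("LEAF", cost).

def cheapest_attack(node):
    kind, payload = node
    if kind == "LEAF":
        return payload
    costs = [cheapest_attack(child) for child in payload]
    return min(costs) if kind == "OR" else sum(costs)

# Hypothetical tree rooted at the goal "read the customer database":
tree = ("OR", [
    ("LEAF", 90),                     # brute-force the admin password
    ("AND", [("LEAF", 10),            # phish an employee
             ("LEAF", 25)]),          # escalate to database privileges
])

print(cheapest_attack(tree))  # 35: phishing + escalation beats brute force
```

Replacing cost with probability or difficulty gives the other analyses Schneier describes; the tree structure stays the same.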

A vulnerability is a weakness in a system that can be exploited. A threat is a potential event that exploits a vulnerability. An exploit is the actual mechanism — a piece of code, a social engineering technique, a physical action — that carries out the threat. The distinction matters: a vulnerability exists whether or not anyone knows about it, a threat exists whether or not anyone acts on it, and an exploit is the realization of both. Zero-day vulnerabilities — flaws that are unknown to the vendor and for which no patch exists — are particularly dangerous because they offer attackers a window of opportunity during which no defense is possible except generic mitigations like defense in depth.

Cryptography Basics

Cryptography provides the mathematical toolkit for enforcing confidentiality, integrity, and authentication. The field divides into two broad paradigms: symmetric-key cryptography, where the same secret key is used for both encryption and decryption, and public-key cryptography, where a pair of mathematically related keys — one public, one private — serves complementary roles.

In symmetric-key systems, the central challenge is key distribution: both parties must share the same secret before they can communicate securely. Stream ciphers encrypt data one bit or byte at a time by combining the plaintext with a pseudorandom keystream generated from the key. Block ciphers encrypt fixed-size blocks of data — typically 128 bits — using a series of substitution and permutation operations controlled by the key. The Data Encryption Standard (DES), adopted by the U.S. National Bureau of Standards in 1977, was the first widely deployed block cipher, but its 56-bit key length proved insufficient against brute-force search as computing power grew. The Advanced Encryption Standard (AES), selected by NIST in 2001 after a public competition won by the Rijndael algorithm designed by Joan Daemen and Vincent Rijmen, uses 128-bit blocks and supports key lengths of 128, 192, or 256 bits. AES operates through a sequence of rounds, each consisting of four transformations — SubBytes, ShiftRows, MixColumns, and AddRoundKey — that together provide both confusion (obscuring the relationship between key and ciphertext) and diffusion (spreading plaintext statistics across the ciphertext).

A block cipher on its own encrypts only a single fixed-size block. To encrypt messages of arbitrary length, it must be combined with a mode of operation. The simplest mode, Electronic Codebook (ECB), encrypts each block independently, but this preserves patterns in the plaintext and is generally insecure. Cipher Block Chaining (CBC) addresses this by XORing each plaintext block with the previous ciphertext block before encryption, so identical plaintext blocks produce different ciphertext blocks. Counter (CTR) mode turns the block cipher into a stream cipher by encrypting successive counter values and XORing the results with the plaintext, enabling parallelization. Galois/Counter Mode (GCM) adds authentication to CTR mode, producing both a ciphertext and an authentication tag that detects any tampering — a property called authenticated encryption.
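The CTR construction above can be sketched in a few lines. In this illustration SHA-256 stands in for the block cipher purely to keep the example self-contained — a real implementation would use AES via a cryptographic library — so this demonstrates the *mode*, not a usable cipher:

```python
import hashlib

def toy_ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # CTR mode sketch: encrypt successive counter values to produce a
    # keystream, then XOR the keystream with the plaintext. SHA-256
    # stands in for the block cipher here for illustration only.
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        counter = offset // 32
        keystream = hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big")).digest()
        block = plaintext[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)

# XOR with the same keystream decrypts: the operation is its own inverse.
key, nonce = b"\x01" * 16, b"\x02" * 8
ct = toy_ctr_encrypt(key, nonce, b"attack at dawn")
assert toy_ctr_encrypt(key, nonce, ct) == b"attack at dawn"
```

Because each counter block is independent, blocks can be encrypted in parallel — the property the text attributes to CTR mode. Reusing a (key, nonce) pair, however, reuses the keystream and breaks confidentiality.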

Public-key cryptography, invented independently by Whitfield Diffie and Martin Hellman in 1976 and by Ralph Merkle around the same time, resolved the key distribution problem by allowing two parties to establish a shared secret over an insecure channel. The Diffie-Hellman key exchange relies on the computational difficulty of the discrete logarithm problem: given a prime $p$, a generator $g$, and the value $g^a \bmod p$, it is computationally infeasible to recover $a$. The RSA algorithm, published by Ron Rivest, Adi Shamir, and Leonard Adleman in 1978, relies instead on the difficulty of factoring large integers: given $n = pq$ where $p$ and $q$ are large primes, no known classical algorithm recovers $p$ and $q$ from $n$ alone in polynomial time. RSA enables both encryption and digital signatures. Elliptic Curve Cryptography (ECC), developed in the 1980s by Neal Koblitz and Victor Miller, achieves equivalent security to RSA with much shorter keys by working over the group of points on an elliptic curve rather than the integers modulo $n$.
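The Diffie-Hellman exchange can be traced end to end with toy numbers (real deployments use primes of 2048+ bits or elliptic-curve groups; these values are for illustration only):

```python
# Diffie-Hellman key exchange with deliberately tiny parameters.
p, g = 23, 5          # public: prime modulus and generator

a = 6                 # Alice's private exponent (kept secret)
b = 15                # Bob's private exponent (kept secret)

A = pow(g, a, p)      # Alice transmits g^a mod p
B = pow(g, b, p)      # Bob transmits g^b mod p

# Each side combines its own secret with the other's public value:
shared_alice = pow(B, a, p)   # (g^b)^a mod p
shared_bob   = pow(A, b, p)   # (g^a)^b mod p
assert shared_alice == shared_bob
print(shared_alice)           # 2 with these toy numbers
```

An eavesdropper sees $p$, $g$, $A$, and $B$ but must solve the discrete logarithm problem to recover $a$ or $b$ — trivial at this scale, infeasible at cryptographic sizes.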

Cryptographic hash functions complete the toolkit. A hash function $H$ maps an input of arbitrary length to a fixed-length output (the digest) and must satisfy three properties: preimage resistance (given $h$, it is hard to find $m$ such that $H(m) = h$), second preimage resistance (given $m_1$, it is hard to find $m_2 \neq m_1$ such that $H(m_1) = H(m_2)$), and collision resistance (it is hard to find any $m_1 \neq m_2$ such that $H(m_1) = H(m_2)$). The SHA-256 algorithm, part of the SHA-2 family standardized by NIST, produces 256-bit digests and remains widely used. Hash functions underpin digital signatures, message authentication codes (MACs), key derivation functions, and the integrity verification mechanisms of protocols like TLS.
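A quick demonstration with Python's standard library shows why digests work as tamper-evident fingerprints — changing a single character of the input yields an unrelated-looking digest (the avalanche effect):

```python
import hashlib

# SHA-256 digests of two messages differing in one character.
d1 = hashlib.sha256(b"transfer $100 to alice").hexdigest()
d2 = hashlib.sha256(b"transfer $900 to alice").hexdigest()

print(d1)
print(d2)
print(d1 == d2)   # False: even a tiny change produces a new digest
print(len(d1))    # 64 hex characters = 256 bits
```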

Authentication and Access Control

Authentication answers the question “who are you?” and access control answers “what are you allowed to do?” Together, they form the gatekeeping layer that stands between users and the resources they wish to access.

Password-based authentication remains the most common mechanism despite its well-documented weaknesses. The security of a password depends on its entropy — the number of bits of randomness it contains. A truly random 8-character password drawn from 95 printable ASCII characters has roughly $\log_2(95^8) \approx 52.6$ bits of entropy, but human-chosen passwords typically have far less because people favor common words, predictable patterns, and personal information. Passwords should never be stored in plaintext; instead, they are processed through a key derivation function like bcrypt or Argon2 that applies a slow, memory-hard hash together with a random salt — a unique value concatenated with the password before hashing, ensuring that two users with the same password produce different stored hashes. This combination defends against precomputed lookup tables (rainbow tables) and makes brute-force attacks computationally expensive.

Multi-factor authentication (MFA) strengthens authentication by requiring evidence from two or more independent categories: something you know (a password), something you have (a hardware token or mobile device), and something you are (a biometric measurement). The combination dramatically reduces the probability of compromise because an attacker must defeat multiple independent mechanisms. Hardware security keys using the FIDO2/WebAuthn protocol are particularly resistant to phishing because the authentication is bound to the specific origin (website) requesting it — a fake site cannot relay the challenge to the real one.

On the authorization side, several models structure how permissions are assigned. Discretionary Access Control (DAC) lets the owner of a resource decide who may access it — the Unix file permission system is a classic example. Mandatory Access Control (MAC) assigns security labels to both subjects and objects and enforces access rules that even the resource owner cannot override, as in the Bell-LaPadula model (formalized in 1973) which enforces “no read up, no write down” for confidentiality, and the Biba model which addresses integrity. Role-Based Access Control (RBAC) assigns permissions to roles rather than individuals, and users inherit permissions by being assigned to roles — a model that scales well to large organizations. Attribute-Based Access Control (ABAC) generalizes further by evaluating access decisions based on arbitrary attributes of the subject, the resource, and the environment, enabling fine-grained policies like “engineers in the London office may access staging databases during business hours.”
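An RBAC check reduces to a small lookup: permissions attach to roles, and users inherit whatever their roles grant. The role and permission names below are invented for illustration:

```python
# Minimal RBAC sketch: users -> roles -> permissions.
ROLE_PERMISSIONS = {
    "viewer":   {"report:read"},
    "engineer": {"report:read", "deploy:staging"},
    "admin":    {"report:read", "deploy:staging", "deploy:production"},
}

USER_ROLES = {
    "alice": {"engineer"},
    "bob":   {"viewer"},
}

def is_authorized(user: str, permission: str) -> bool:
    # A user is authorized if any assigned role grants the permission.
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_authorized("alice", "deploy:staging"))   # True
print(is_authorized("bob", "deploy:staging"))     # False
print(is_authorized("mallory", "report:read"))    # False: unknown user
```

The scaling advantage the text mentions is visible here: promoting a user means changing one role assignment, not editing per-resource permission lists.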

Identity federation protocols allow a single authentication event to grant access across multiple systems. Kerberos, developed at MIT in the 1980s, uses a trusted third-party ticket-granting server to authenticate users without transmitting passwords over the network. OAuth 2.0 and OpenID Connect enable delegated authorization and authentication across web services, allowing users to log in to one application using credentials managed by another. SAML (Security Assertion Markup Language) serves a similar purpose in enterprise environments.

Web and Application Security

The web application layer is among the most attacked surfaces in modern computing, because web applications are publicly accessible, handle sensitive data, and are written by developers with varying levels of security expertise. The OWASP Top 10, published by the Open Web Application Security Project, catalogs the most critical categories of web application vulnerabilities and has become a de facto standard for web security awareness.

Injection attacks occur when untrusted input is incorporated into a command or query without proper sanitization. SQL injection, first described publicly in the late 1990s, allows an attacker to manipulate database queries by inserting malicious SQL fragments through input fields. If a login form constructs a query by concatenating a username directly into a SQL string, an attacker who enters a specially crafted value can bypass authentication, extract data, or even modify the database. The defense is straightforward in principle: use parameterized queries (also called prepared statements), which separate the query structure from the data, making injection impossible regardless of what the user enters. Despite this well-known mitigation, injection vulnerabilities persist in practice because developers continue to build queries through string concatenation. Command injection, template injection, and LDAP injection follow the same fundamental pattern: wherever untrusted input meets an interpreter, injection is possible.
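The contrast between string concatenation and parameterized queries can be shown directly with Python's built-in `sqlite3` module (the table and the payload are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

attacker_input = "' OR '1'='1"   # classic injection payload

# VULNERABLE: concatenation lets the payload rewrite the query logic.
vulnerable = ("SELECT * FROM users WHERE name = '" + attacker_input
              + "' AND password = '" + attacker_input + "'")
print(conn.execute(vulnerable).fetchall())   # returns alice's row!

# SAFE: a parameterized query treats the input strictly as data.
safe = "SELECT * FROM users WHERE name = ? AND password = ?"
print(conn.execute(safe, (attacker_input, attacker_input)).fetchall())  # []
```

In the vulnerable version the trailing `OR '1'='1'` makes the WHERE clause always true; in the safe version the driver never lets the quote characters reach the SQL parser as syntax.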

Cross-Site Scripting (XSS) allows an attacker to inject malicious scripts into web pages viewed by other users. In reflected XSS, the malicious script is embedded in a URL parameter and reflected back to the user in the server’s response. In stored XSS, the script is saved on the server (in a database, a comment field, a forum post) and delivered to every user who views the affected page. In DOM-based XSS, the vulnerability exists entirely in client-side code that processes user input insecurely. XSS can be used to steal session cookies, redirect users to malicious sites, or perform actions on behalf of the victim. Defenses include output encoding (escaping special characters before inserting user-supplied data into HTML), the Content Security Policy (CSP) header (which restricts which scripts the browser will execute), and the same-origin policy (which prevents scripts from one origin from accessing resources belonging to another).
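Output encoding, the first defense listed above, is a one-line transformation. With Python's standard `html` module (the malicious comment below is illustrative):

```python
import html

# User-supplied data must be escaped before insertion into HTML so the
# browser renders it as text rather than executing it as script.
user_comment = "<script>steal(document.cookie)</script>"

unsafe_html = "<p>" + user_comment + "</p>"               # script would run
safe_html   = "<p>" + html.escape(user_comment) + "</p>"  # inert text

print(safe_html)  # the <script> tag is now &lt;script&gt;...
```

Escaping must happen at output time, in the context where the data is inserted (HTML body, attribute, URL, or JavaScript each need different encoding) — which is why template engines that auto-escape by default are preferred over manual calls.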

Cross-Site Request Forgery (CSRF) exploits the browser’s automatic inclusion of cookies in requests to a site the user is already authenticated with. An attacker crafts a request — hidden in an image tag, a form submission, or a script — that performs an action on a target site and tricks the authenticated user into triggering it. Defenses include anti-CSRF tokens (unique, unpredictable values embedded in forms and validated on the server), SameSite cookies (which restrict when cookies are sent with cross-site requests), and requiring re-authentication for sensitive operations. Secure web development also encompasses proper use of HTTP security headers: HTTP Strict Transport Security (HSTS) instructs browsers to communicate only over HTTPS, while the X-Frame-Options header prevents clickjacking by blocking the page from being embedded in an iframe.
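The anti-CSRF token mechanism can be sketched as follows. This is a simplified illustration — a real framework ties token issuance into its session machinery — and the session identifier is invented:

```python
import hmac
import secrets

# The server issues an unpredictable per-session token, embeds it in each
# form, and rejects state-changing requests that fail to echo it back.
session_tokens: dict[str, str] = {}

def issue_token(session_id: str) -> str:
    token = secrets.token_urlsafe(32)   # 256 bits of randomness
    session_tokens[session_id] = token
    return token

def validate_token(session_id: str, submitted: str) -> bool:
    expected = session_tokens.get(session_id, "")
    return hmac.compare_digest(expected, submitted)  # constant-time

token = issue_token("session-abc")
print(validate_token("session-abc", token))            # True
print(validate_token("session-abc", "forged-value"))   # False
```

The defense works because the same-origin policy prevents the attacker's page from reading the token out of the victim's form, so a forged cross-site request cannot include it.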

Network Security and Cryptographic Protocols

Network security operates at the infrastructure level, controlling the flow of traffic between systems and detecting or preventing malicious activity in transit.

Firewalls are the oldest and most fundamental network security mechanism. A stateless firewall (packet filter) examines each packet in isolation, making allow-or-deny decisions based on source and destination addresses, ports, and protocol fields. A stateful firewall tracks the state of active connections, enabling it to distinguish between a legitimate response to an outgoing request and an unsolicited inbound packet. Application-layer firewalls inspect traffic at the application level, understanding protocol-specific semantics and capable of detecting attacks that appear benign at the network layer. Web Application Firewalls (WAFs) specialize further, analyzing HTTP traffic for patterns characteristic of SQL injection, XSS, and other web-specific attacks.
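A stateless packet filter is essentially an ordered rule table with a default-deny fallback. The rules and addresses below are invented for illustration:

```python
# Stateless packet-filter sketch: first matching rule wins, default deny.
RULES = [
    # (action, source-IP prefix, destination port; None = any port)
    ("deny",  "10.0.",    None),   # block a quarantined subnet entirely
    ("allow", "",         443),    # HTTPS from anywhere
    ("allow", "192.168.", 22),     # SSH only from the internal network
]

def filter_packet(src_ip: str, dst_port: int) -> str:
    for action, prefix, port in RULES:
        if src_ip.startswith(prefix) and port in (None, dst_port):
            return action
    return "deny"   # default-deny: anything unmatched is dropped

print(filter_packet("203.0.113.7", 443))   # allow (HTTPS)
print(filter_packet("203.0.113.7", 22))    # deny  (SSH from outside)
print(filter_packet("10.0.5.9", 443))      # deny  (quarantined subnet)
```

A stateful firewall would extend this with a connection table keyed by the 5-tuple (protocol, source/destination address and port), allowing return traffic for connections it has already seen.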

Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) monitor network traffic for signs of malicious activity. Signature-based detection matches traffic against a database of known attack patterns — effective against known threats but blind to novel attacks. Anomaly-based detection builds a statistical model of normal network behavior and flags deviations — capable of detecting previously unseen attacks but prone to false positives. Modern systems often combine both approaches and integrate threat intelligence feeds that provide up-to-date indicators of compromise.

Distributed Denial of Service (DDoS) attacks attempt to overwhelm a target with traffic volume, protocol-level resource exhaustion, or application-layer request floods. Amplification attacks exploit protocols like DNS or NTP that produce large responses to small queries: an attacker sends a small request with a spoofed source address (the victim’s), and the responding server delivers a much larger payload to the victim. Defenses include rate limiting, traffic scrubbing (filtering malicious traffic at upstream providers), geographic distribution of infrastructure, and anycast routing, which distributes incoming traffic across multiple geographically dispersed servers.

TLS (Transport Layer Security) secures the vast majority of web traffic. The TLS handshake negotiates a cipher suite (specifying algorithms for key exchange, authentication, encryption, and integrity), authenticates the server via X.509 certificates issued by trusted Certificate Authorities (CAs), and derives session keys for encrypting the subsequent data exchange. TLS 1.3, finalized in 2018, simplified the handshake to a single round trip, removed support for obsolete cipher suites, and mandated perfect forward secrecy — ensuring that compromise of long-term keys does not reveal past session traffic. Historical vulnerabilities in TLS implementations, including Heartbleed (a buffer over-read in OpenSSL that leaked server memory contents, disclosed in 2014) and various padding oracle attacks, underscore that even well-designed protocols can be undermined by implementation flaws.

Network segmentation divides a network into isolated zones to contain breaches. A DMZ (demilitarized zone) places public-facing servers in a separate segment with restricted access to internal systems. Micro-segmentation enforces access controls between individual workloads. Zero trust architecture, a paradigm that gained prominence in the 2010s, rejects the traditional notion of a trusted internal network and instead requires continuous verification of every access request, regardless of the requester’s network location — embodying the principle “never trust, always verify.”

System-Level Vulnerabilities and Malware

At the system level, security confronts the consequences of how software interacts with hardware, particularly memory. Buffer overflow vulnerabilities arise when a program writes data beyond the bounds of an allocated memory region. In a stack-based buffer overflow, an attacker overwrites the return address stored on the call stack, redirecting execution to injected malicious code (shellcode) or to a chain of existing code fragments known as Return-Oriented Programming (ROP). Heap-based overflows corrupt dynamically allocated memory structures with similar effect. These vulnerabilities have been a primary vector for remote code execution exploits since the earliest internet worms — the Morris Worm of 1988 exploited a buffer overflow in the Unix finger daemon.

Modern operating systems deploy multiple countermeasures. Address Space Layout Randomization (ASLR) randomizes the base addresses of the stack, heap, and loaded libraries at each program execution, making it difficult for attackers to predict the memory locations they need to target. Data Execution Prevention (DEP), also known as the NX (No-Execute) bit, marks memory pages as non-executable, preventing injected code from running even if an attacker manages to write it into memory. Stack canaries — small random values placed between local variables and the return address, introduced in the StackGuard system by Crispin Cowan in 1998 — detect overflows by checking whether the canary value has been altered before a function returns. Control Flow Integrity (CFI) instruments programs to verify at runtime that control flow transfers target only legitimate destinations. No single mitigation is sufficient on its own — attackers have developed techniques to bypass each individually — but in combination they raise the cost and complexity of exploitation significantly.

Malware — malicious software — takes many forms. Viruses attach themselves to legitimate programs and propagate when those programs are executed. Worms spread autonomously across networks without requiring user action. Trojans disguise themselves as legitimate software while performing covert malicious actions. Ransomware encrypts the victim’s files and demands payment for the decryption key, a category that has caused billions of dollars in damage since it rose to prominence in the mid-2010s. Rootkits conceal their presence by modifying the operating system itself, hiding processes, files, and network connections from detection tools. Malware analysis combines static analysis (examining the binary without executing it, using disassemblers like Ghidra or IDA Pro) with dynamic analysis (running the malware in an isolated sandbox and observing its behavior), and increasingly leverages machine learning to identify malicious patterns that evade signature-based detection.

Privacy, Cryptanalysis, and Emerging Threats

Privacy extends beyond preventing unauthorized access to encompass the right of individuals to control how their personal information is collected, used, and shared. The principle of data minimization holds that systems should collect only the data strictly necessary for their stated purpose. Privacy by design, a framework articulated by Ann Cavoukian in the 1990s, embeds privacy protections into the architecture of systems from the outset rather than adding them as afterthoughts.

Several mathematical techniques enable useful computation on data while preserving individual privacy. K-anonymity ensures that each record in a dataset is indistinguishable from at least $k - 1$ other records with respect to certain identifying attributes. Differential privacy, formalized by Cynthia Dwork in 2006, provides a rigorous mathematical guarantee: a randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if for all datasets $D_1$ and $D_2$ differing in a single record, and for all possible outputs $S$:

$$\Pr[\mathcal{M}(D_1) \in S] \leq e^{\varepsilon} \cdot \Pr[\mathcal{M}(D_2) \in S]$$

The parameter $\varepsilon$ controls the privacy-utility tradeoff: smaller $\varepsilon$ provides stronger privacy but noisier results. Homomorphic encryption allows computation on encrypted data without decrypting it, and secure multi-party computation (SMPC) enables multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other. On the regulatory side, the General Data Protection Regulation (GDPR), enacted by the European Union in 2018, established sweeping requirements for data protection including mandatory breach notification and the right to erasure. The California Consumer Privacy Act (CCPA), HIPAA, and PCI DSS impose additional sector-specific requirements.
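The standard way to achieve $\varepsilon$-differential privacy for a counting query is the Laplace mechanism: since adding or removing one record changes a count by at most 1 (sensitivity 1), adding Laplace noise with scale $1/\varepsilon$ satisfies the definition. A minimal sketch (seeded for reproducibility; the count is invented):

```python
import math
import random

random.seed(0)  # deterministic output for the demo

def laplace_noise(scale: float) -> float:
    # Sample from Laplace(0, scale) by inverse-transform sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    # A counting query has sensitivity 1, so noise of scale 1/epsilon
    # yields epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon -> stronger privacy -> noisier answers:
true_count = 1000
for eps in (1.0, 0.1, 0.01):
    print(eps, round(private_count(true_count, eps), 1))
```

The loop makes the tradeoff in the text concrete: at $\varepsilon = 1$ the noisy count is close to 1000, while at $\varepsilon = 0.01$ it can be off by hundreds.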

Cryptanalysis — the study of breaking cryptographic systems — has a history as long as cryptography itself. Classical techniques include frequency analysis, known since the work of the Arab polymath Al-Kindi in the 9th century. Modern methods are far more sophisticated: differential cryptanalysis, developed by Eli Biham and Adi Shamir in the late 1980s, studies how differences in input pairs propagate through cipher rounds, while linear cryptanalysis, introduced by Mitsuru Matsui in 1993, approximates nonlinear operations with linear equations and exploits any resulting bias. Side-channel attacks bypass mathematical security entirely by exploiting physical implementation details — timing variations, power consumption, electromagnetic emissions, or CPU cache behavior. The Spectre and Meltdown vulnerabilities, disclosed in 2018, demonstrated that speculative execution in modern processors could leak data across security boundaries through cache timing side channels, affecting virtually every major CPU architecture.

The most profound long-term threat comes from quantum computing. Peter Shor demonstrated in 1994 that a sufficiently powerful quantum computer could factor large integers and compute discrete logarithms in polynomial time using Shor’s algorithm, which would render RSA, Diffie-Hellman, and ECC insecure. While no such computer exists at the required scale today, the threat is taken seriously because encrypted data intercepted now could be stored and decrypted in the future — a strategy known as “harvest now, decrypt later.” Post-quantum cryptography aims to develop algorithms secure against both classical and quantum computers. NIST launched a standardization process in 2016 and selected its first post-quantum algorithms in 2022, based primarily on lattice-based cryptography (whose security relies on the hardness of problems like Learning With Errors) and hash-based signatures. The transition to post-quantum cryptography is one of the most significant infrastructure challenges facing the field, alongside the proliferation of IoT devices, the rise of supply chain attacks that compromise software during development, and the emergence of adversarial machine learning that threatens the reliability of AI systems deployed in security-critical roles. Cybersecurity, fundamentally, is not a problem to be solved but a condition to be managed — an ongoing contest between attack and defense that will continue as long as information systems exist.