Computer Vision (Deep Learning)

Deep-learning approaches to vision tasks.


foundation tier

This page covers deep-learning approaches to vision tasks such as classification, detection, segmentation, and 3D reconstruction. The topic sits within AI and Machine Learning and inherits that area’s core questions about correctness, scale, and tractability. The page surveys the conceptual axes of the topic and points to the references that frame ongoing research and teaching. It is meant to serve both as an entry point for newcomers and as an index for practitioners checking their mental model against the field’s primary sources.

Work on deep-learning-based computer vision can be organised around a few interlocking concerns: the formal objects under study, the algorithms or systems that compute over them, the resource trade-offs (time, memory, communication, statistical efficiency), and the empirical or theoretical guarantees that practitioners rely on. The sources cited below approach the topic from a mix of these angles.

Foundational references

Szeliski, Computer Vision: Algorithms and Applications (2022) is a standard reference for this material and is used both as a curriculum anchor and as a long-form survey of techniques.

Historical context

Two papers mark the turn to deep learning in vision. ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012) showed that a deep convolutional network trained on GPUs could substantially outperform earlier approaches on ImageNet classification. Deep Residual Learning for Image Recognition (He et al., 2016) introduced residual (skip) connections, which made much deeper networks trainable. Revisiting both clarifies which ideas in current practice are recent and which trace back to the field’s founding texts.
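
As a rough illustration of the residual idea in He et al. (2016), a block outputs its input plus a learned correction, y = x + F(x). The sketch below is a simplified, fully connected version in plain Python. The function names are hypothetical, and it leaves out the convolutions, batch normalisation, and final activation used in the paper's networks.

```python
# Illustrative sketch only: a fully connected residual block in plain Python.
# The names matvec, relu, and residual_block are hypothetical helpers,
# not part of any library or of the original paper's code.

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def residual_block(x, w1, w2):
    """Return x + F(x), where F(x) = w2 @ relu(w1 @ x)."""
    residual = matvec(w2, relu(matvec(w1, x)))
    # The skip connection adds the unchanged input back onto the residual branch.
    return [xi + ri for xi, ri in zip(x, residual)]


x = [1.0, -2.0, 3.0]
zero = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With zero weights the residual branch contributes nothing,
# so the block behaves as the identity map.
y_zero = residual_block(x, zero, zero)

# With identity weights F(x) = relu(x), so the output is x + relu(x).
y_id = residual_block(x, identity, identity)
```

With zero weights the block reduces to the identity map. This is one commonly cited intuition for why adding such blocks should not make a deeper network harder to fit than a shallower one.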

Open methodological questions in deep-learning-based computer vision cluster around how to compose the techniques above under realistic constraints: scale, adversarial inputs, partial observability, and shifting workloads. The cited references give the precise statements, proofs, and empirical evaluations that this overview only sketches; the topic pages listed under Explore cover specific subfields.

Explore

  1. Image Classification

    Deep classifiers from AlexNet to modern ConvNets.

  2. Neural Scene Representations

    Representing 3D scenes as continuous neural functions (radiance fields, signed distance fields, and 3D Gaussian splats) that can be optimised from multi-view images and rendered from novel viewpoints.

  3. Diffusion Priors for 3D Generation

    Repurposing pretrained 2D image diffusion models as priors for 3D content creation: text-to-3D via score distillation, single-image-to-3D, and instruction-driven scene editing without large 3D training datasets.

  4. Object Detection

    Two-stage and single-stage detectors, DETR, and modern detection.

  5. Semantic Segmentation

    Per-pixel classification with FCNs, U-Net, and DeepLab.

  6. Vision Transformers

    Adapting transformer architectures to images and 3D inputs: multi-scale attention, mask-transformer decoders for dense prediction, and the ConvNet-versus-transformer debate at scale.

  7. Instance Segmentation

    Mask R-CNN and modern instance-segmentation models.

  8. Vision Foundation Models

    Vision systems trained once at scale and applied to novel tasks, objects, or domains without retraining, generalising the language-model recipe of pretrain-once-prompt-many to dense vision problems.

  9. Panoptic Segmentation

    Unified instance and semantic segmentation.

  10. Depth Estimation

    Monocular and learned multi-view depth estimation.

  11. Pose Estimation

    Human and object pose estimation in 2D and 3D.

  12. Action Recognition

    Recognizing actions in video using 3D CNNs and transformers.

  13. Video Understanding

    Temporal modeling, video captioning, and video question answering.

  14. Object Tracking

    Single- and multi-object tracking in video.

  15. 3D Reconstruction

    Learning-based 3D reconstruction from images and video.

  16. 3D Gaussian Splatting

    Explicit 3D Gaussian representations for real-time rendering.

  17. Image Restoration

    Denoising, super-resolution, and inpainting with deep models.

  18. Super-Resolution

    Learning-based image and video upscaling.

  19. Image Generation Models

    Deep generative models for image synthesis.

  20. Face Recognition

    Deep face embeddings and identity verification.

  21. Medical Image Analysis

    Deep learning for radiology, pathology, and medical imaging.

  22. Remote Sensing

    ML for satellite and aerial imagery.

  23. Document Analysis

    OCR, layout analysis, and document understanding.

  24. Self-Supervised Vision

    Pretext tasks and contrastive pretraining for vision.

  25. Adversarial Examples in Vision

    Attacks and defenses on image classifiers and detectors.

  26. Event-Based Vision

    Algorithms for event cameras and asynchronous sensors.


This page was drafted by an agent and is awaiting expert review.