Computer Vision (Deep Learning)
Deep-learning approaches to vision tasks.
Computer Vision (Deep Learning) covers neural-network methods for visual tasks such as classification, detection, segmentation, and 3D reconstruction. It sits within AI and Machine Learning and inherits that area’s core questions about correctness, scale, and tractability. This page surveys the conceptual axes of the topic and points to the references that frame ongoing research and teaching. It is meant as an entry point for newcomers and as an index for practitioners checking their mental model against the field’s primary sources.
Work in this area can be organised around four interlocking concerns: the formal objects under study, the algorithms or systems that compute over them, the resource trade-offs (time, memory, communication, statistical efficiency), and the empirical or theoretical guarantees that practitioners rely on. The sources cited below approach the topic from a mix of these angles.
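As one concrete reading of the time and memory trade-offs mentioned above, the following sketch estimates the parameter count and multiply-accumulate operations of a single 2D convolution layer. The function names and the layer dimensions are illustrative assumptions, not taken from any of the cited sources.

```python
from typing import Tuple


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_cost(c_in: int, c_out: int, kernel: int, h_in: int, w_in: int,
              stride: int = 1, padding: int = 0) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for one square-kernel conv layer."""
    h_out = conv_output_size(h_in, kernel, stride, padding)
    w_out = conv_output_size(w_in, kernel, stride, padding)
    weights = kernel * kernel * c_in * c_out
    params = weights + c_out          # one bias per output channel
    macs = weights * h_out * w_out    # each output position applies every weight once
    return params, macs


# Hypothetical layer: 11x11 kernels, 3 input channels, 96 output channels,
# applied with stride 4 to a 227x227 image.
params, macs = conv_cost(c_in=3, c_out=96, kernel=11, h_in=227, w_in=227, stride=4)
print(f"parameters: {params}, multiply-accumulates: {macs}")
```

Parameter count approximates the memory held by the layer's weights, while multiply-accumulates approximate compute time. The two can diverge sharply, because the same weights are reused at every spatial position.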
Foundational references
Szeliski, Computer Vision: Algorithms and Applications (2022) is a standard reference for this material and is used both as a curriculum anchor and as a long-form survey of techniques.
Historical context
ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012) and Deep Residual Learning for Image Recognition (He et al., 2016) situate the topic in its historical trajectory. The first showed that a deep convolutional network trained on GPUs could win the ImageNet classification challenge by a wide margin; the second introduced residual (skip) connections, which made much deeper networks trainable. Revisiting them clarifies which ideas in current practice are recent and which trace back to the field’s founding texts.
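The residual learning idea named in the He reference can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: it assumes dense matrix-vector products in place of convolutions, omits normalisation, and uses hypothetical helper names (`relu`, `matvec`, `residual_block`).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product, standing in for a convolution."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(x + F(x)) with F(x) = w2 @ relu(w1 @ x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    # The skip connection adds the input back onto the learned residual.
    return relu([a + b for a, b in zip(x, fx)])


# With all-zero weights the residual branch contributes nothing,
# so the block passes a non-negative input through unchanged.
x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zeros, zeros)
print(identity_out)
```

The design point is that the block only has to learn a correction to its input. Setting the branch weights to zero recovers an identity mapping for non-negative inputs, which is one intuition for why stacking many such blocks remains trainable.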
Open methodological questions in computer vision (deep learning) cluster around how to compose the field's techniques under realistic constraints: scale, adversarial inputs, partial observability, and shifting workloads. The cited references give the precise statements, proofs, and empirical evaluations that this overview only sketches; the topic pages listed under Explore cover specific subfields.
Sources
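- book · foundational · Szeliski, Computer Vision: Algorithms and Applications (2022)
- paper · historical · He, Deep Residual Learning for Image Recognition (2016)
- paper · historical · Krizhevsky, ImageNet Classification with Deep Convolutional Neural Networks (2012)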
Explore
- 01 Image Classification: Deep classifiers from AlexNet to modern ConvNets.
- 02 Neural Scene Representations: Representing 3D scenes as continuous neural functions (radiance fields, signed distance fields, and 3D Gaussian splats) that can be optimised from multi-view images and rendered from novel viewpoints.
- 03 Diffusion Priors for 3D Generation: Repurposing pretrained 2D image diffusion models as priors for 3D content creation, including text-to-3D via score distillation, single-image-to-3D, and instruction-driven scene editing without large 3D training datasets.
- 04 Object Detection: Two-stage and single-stage detectors, DETR, and modern detection.
- 05 Semantic Segmentation: Per-pixel classification with FCNs, U-Net, and DeepLab.
- 06 Vision Transformers: Adapting transformer architectures to images and 3D inputs, covering multi-scale attention, mask-transformer decoders for dense prediction, and the ConvNet-versus-transformer debate at scale.
- 07 Instance Segmentation: Mask R-CNN and modern instance-segmentation models.
- 08 Vision Foundation Models: Vision systems trained once at scale and applied to novel tasks, objects, or domains without retraining, generalising the language-model recipe of pretrain-once-prompt-many to dense vision problems.
- 09 Panoptic Segmentation: Unified instance and semantic segmentation.
- 10 Depth Estimation: Monocular and learned multi-view depth estimation.
- 11 Pose Estimation: Human and object pose estimation in 2D and 3D.
- 12 Action Recognition: Recognizing actions in video using 3D CNNs and transformers.
- 13 Video Understanding: Temporal modeling, video captioning, and video question answering.
- 14 Object Tracking: Single- and multi-object tracking in video.
- 15 3D Reconstruction: Learning-based 3D reconstruction from images and video.
- 16 3D Gaussian Splatting: Explicit 3D Gaussian representations for real-time rendering.
- 17 Image Restoration: Denoising, super-resolution, and inpainting with deep models.
- 18 Super-Resolution: Learning-based image and video upscaling.
- 19 Image Generation Models: Deep generative models for image synthesis.
- 20 Face Recognition: Deep face embeddings and identity verification.
- 21 Medical Image Analysis: Deep learning for radiology, pathology, and medical imaging.
- 22 Remote Sensing: ML for satellite and aerial imagery.
- 23 Document Analysis: OCR, layout analysis, and document understanding.
- 24 Self-Supervised Vision: Pretext tasks and contrastive pretraining for vision.
- 25 Adversarial Examples in Vision: Attacks and defenses on image classifiers and detectors.
- 26 Event-Based Vision: Algorithms for event cameras and asynchronous sensors.
This page was drafted by an agent and is waiting on expert review.