Introduction to Image Processing & Computer Vision

Key differences between image processing and computer vision, applications, and image file formats.

Introduction to Image Processing

What is Digital Image Processing?

An image may be defined as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point.

When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. Digital image processing is the processing of such images by a digital computer:

  • Input: An image (raw, noisy, low-contrast, etc.)
  • Processing: Mathematical operations on pixel values
  • Output: Either an improved image OR extracted features/characteristics
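The input, processing, output view above can be sketched in a few lines of plain Python. This is only an illustration: the pixel values are made up, and the helper names `intensity` and `contrast_stretch` are hypothetical, not taken from any library.

```python
# A tiny 3x4 grayscale image stored as rows of 8-bit intensities.
# image[y][x] plays the role of f(x, y); the values are invented for illustration.
image = [
    [50, 60, 70, 80],
    [60, 70, 80, 90],
    [70, 80, 90, 100],
]

def intensity(img, x, y):
    """Return f(x, y): the gray level at column x, row y."""
    return img[y][x]

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly map the image's own [min, max] range onto [out_min, out_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in img]

# Input: low-contrast image. Processing: arithmetic on pixel values.
# Output: an image that uses the full 0..255 range.
stretched = contrast_stretch(image)
print(intensity(image, 3, 0), stretched[0][0], stretched[2][3])
```

The darkest input pixel maps to 0 and the brightest to 255; everything in between is rescaled proportionally.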

Digital image processing covers:

  1. Low-level processing – noise removal, contrast enhancement, sharpening
  2. Mid-level processing – segmentation, object description
  3. High-level processing – making sense of objects (computer vision)

What is Computer Vision?

Computer vision is a field of artificial intelligence that aims to replicate and extend human visual perception in machines. It enables computers to derive meaningful high-level information from digital images, video, or other visual inputs.

  • Input: An image or video stream
  • Output: Scene interpretation, object labels, decisions, 3D models
  • Core philosophy: "See and understand" rather than "see and transform"

Key Differences: Image Processing vs Computer Vision

| Aspect | Image Processing | Computer Vision |
|---|---|---|
| Primary Goal | Improve image quality or extract low-level features | Understand scene content, make decisions |
| Input | Digital image/video | Digital image/video |
| Output | Modified image or numerical feature data | High-level description, recognition, decisions |
| Level of Analysis | Low-level (pixel operations) | High-level (semantic understanding) |
| Intelligence Required | Minimal (rule-based math) | High (ML/DL models, reasoning) |
| Examples | Noise removal, contrast stretch, histogram EQ | Object detection, face recognition, scene understanding |
| Techniques | Filters, DFT, morphological ops, histograms | CNNs, RNNs, transformers, SLAM, stereo vision |
| Human Interpretation | Not required (mathematical operations) | Goal is to replicate human understanding |
| Processing Speed | Generally fast | Can be computationally expensive |
| Training Data | Not typically needed | Often requires large labeled datasets |

Key insight: Image Processing is a tool used within Computer Vision. A typical pipeline is: Raw Image → Image Processing (enhance, denoise) → Computer Vision (detect, recognize, decide)
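A toy version of this pipeline might look like the sketch below. The median filter is a standard image-processing step; the "vision" stage here is only a rule-based stand-in (a real system would typically use a trained model), and the function names and thresholds are assumptions chosen for illustration.

```python
from statistics import median

def median_filter_3x3(img):
    """Image-processing stage: replace each interior pixel by the median of its 3x3 window."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def contains_bright_object(img, level=128, min_pixels=4):
    """Stand-in for the vision stage: decide whether enough bright pixels are present."""
    bright = sum(1 for row in img for p in row if p > level)
    return bright >= min_pixels

# Dark 5x5 scene with a single isolated "salt" noise pixel in the center.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

clean = median_filter_3x3(noisy)          # Raw image -> image processing
decision = contains_bright_object(clean)  # -> decision
print(clean[2][2], decision)
```

Denoising first keeps the isolated noise pixel from influencing the decision stage, which is the reason the two stages are usually chained in this order.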


Applications of Image Processing

1. Medical Imaging

  • X-ray enhancement for better bone/tissue visibility
  • MRI and CT scan noise reduction
  • Tumor boundary detection
  • Mammography screening
  • Retinal image analysis for diabetic retinopathy

2. Remote Sensing & Satellite Imagery

  • Land cover classification
  • Vegetation index computation (NDVI)
  • Flood/fire/drought detection
  • Urban sprawl monitoring
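The vegetation index mentioned above is computed per pixel as NDVI = (NIR - Red) / (NIR + Red). A minimal sketch, assuming the two bands are given as equally sized grids of non-negative reflectance values (the sample numbers are invented):

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 where both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def ndvi_map(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two bands of the same shape."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]

# Illustrative reflectances: the top row mimics vegetation (high NIR, low red).
nir_band = [[0.50, 0.40], [0.10, 0.00]]
red_band = [[0.10, 0.10], [0.10, 0.00]]

result = ndvi_map(nir_band, red_band)
for row in result:
    print([round(v, 3) for v in row])
```

For non-negative inputs the result lies in [-1, 1]; healthy vegetation reflects strongly in near-infrared and so tends toward higher values.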

3. Document Processing

  • OCR (Optical Character Recognition) preprocessing
  • Binarization (thresholding) of scanned documents
  • Skew correction of scanned pages
  • Barcode and QR code reading
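Thresholding for document scanning can be sketched with Otsu's method, one common way to pick a global threshold automatically. The source does not name a specific algorithm, so treat this choice, the function names, and the sample "page" as assumptions.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * c for i, c in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        m0 = sum0 / w0
        m1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(img, t):
    """Map pixels above t to 1 (paper) and the rest to 0 (ink)."""
    return [[1 if p > t else 0 for p in row] for row in img]

# Bright paper (220) with a few dark ink pixels (30); values are invented.
page = [
    [220, 220, 30, 220],
    [220, 30, 30, 220],
    [220, 220, 30, 220],
]
t = otsu_threshold([p for row in page for p in row])
binary = binarize(page, t)
print(t, binary)
```

The binary result is what an OCR or barcode stage would typically consume.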

4. Photography & Cinematography

  • HDR (High Dynamic Range) imaging
  • Noise reduction in low-light photos
  • Image stabilization
  • Depth-of-field simulation

5. Industrial Inspection

  • PCB defect detection
  • Surface scratch and crack inspection
  • Dimensional measurement using machine vision
  • Product sorting by color, shape, size

6. Security & Forensics

  • Fingerprint enhancement and matching
  • Face image restoration
  • License plate recognition
  • Surveillance video analysis

Applications of Computer Vision

1. Autonomous Vehicles

  • Lane line detection
  • Pedestrian and obstacle detection
  • Traffic sign recognition
  • 3D scene reconstruction using LiDAR + cameras

2. Facial Recognition

  • Unlock systems (iPhone Face ID)
  • Attendance management
  • Law enforcement (surveillance matching)
  • Emotion and age estimation

3. Medical Diagnosis (AI-assisted)

  • Diabetic retinopathy classification from fundus images
  • Skin lesion classification (benign vs malignant)
  • COVID-19 detection from chest X-rays
  • Polyp detection in colonoscopy

4. Augmented & Virtual Reality

  • Marker-based AR (detecting fiducial markers)
  • SLAM (Simultaneous Localization and Mapping)
  • Hand gesture recognition
  • Body pose estimation

5. Robotics

  • Object grasping using visual feedback
  • Obstacle avoidance
  • Navigation in unknown environments

6. Agriculture (Precision Farming)

  • Crop disease detection from drone imagery
  • Weed vs crop classification
  • Irrigation planning via NDVI mapping

7. Retail & E-commerce

  • Amazon Go: cashierless checkout using cameras
  • Visual product search (Google Lens)
  • Inventory shelf monitoring

Image File Formats

An image file format is a standardized way of organizing and storing digital images. Formats differ in:

  • Compression: Lossless vs Lossy
  • Color depth: 1-bit, 8-bit, 16-bit, HDR
  • Transparency support: Alpha channel
  • Color spaces supported

Lossless Formats (No quality loss on compression)

BMP (Bitmap Image File)

  • Little or no compression (Run-Length Encoding optional)
  • Simple structure: file header + info (DIB) header + optional color table + pixel array
  • Large file sizes (stores every pixel raw)
  • Native Windows format
  • Best for: Raw uncompressed data, simple applications
  • Color depth: 1, 4, 8, 16, 24, 32-bit
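To make the BMP layout concrete, here is a sketch of an encoder for the common uncompressed 24-bit case, assuming a 14-byte file header followed by a 40-byte BITMAPINFOHEADER. The function name `encode_bmp_24bit` is hypothetical, and the 2835 pixels-per-metre resolution value (about 72 DPI) is an arbitrary choice.

```python
import struct

def encode_bmp_24bit(pixels):
    """Encode rows of (R, G, B) tuples as an uncompressed 24-bit BMP byte string."""
    height = len(pixels)
    width = len(pixels[0])
    row_padding = (4 - (width * 3) % 4) % 4  # rows are padded to a multiple of 4 bytes
    image_size = (width * 3 + row_padding) * height
    pixel_offset = 14 + 40                   # file header + info header

    # File header: signature, total file size, two reserved fields, offset to pixels.
    file_header = struct.pack("<2sIHHI", b"BM", pixel_offset + image_size, 0, 0, pixel_offset)
    # Info header: header size, width, height, planes, bits per pixel, compression (0 = none),
    # image size, horizontal/vertical resolution, palette size, important colors.
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height, 1, 24, 0, image_size, 2835, 2835, 0, 0)

    body = bytearray()
    for row in reversed(pixels):             # positive height means rows are stored bottom-up
        for r, g, b in row:
            body += bytes((b, g, r))         # channels are stored in B, G, R order
        body += b"\x00" * row_padding
    return file_header + info_header + bytes(body)

# 2x2 test image: red, green on top; blue, white below.
data = encode_bmp_24bit([
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
])
print(len(data))
```

Because every pixel is stored raw (plus row padding), the file size grows linearly with width × height, which is why BMP files are large.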

PNG (Portable Network Graphics)

  • Lossless compression using DEFLATE (LZ77 + Huffman coding)
  • Supports transparency (alpha channel, 8 or 16-bit)
  • Supports: RGB, RGBA, Grayscale, Indexed color
  • Usually a better choice than GIF for still images: full color instead of a 256-color palette, and typically better compression
  • Best for: Web graphics, screenshots, logos, text overlays
  • Not ideal for: Photographs (large file size vs JPEG)
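The lossless property of DEFLATE can be checked with Python's standard `zlib` module, which implements the same algorithm. This sketch compresses raw pixel bytes only; it does not write a PNG file, and a real PNG encoder also applies a per-scanline filter before DEFLATE. The synthetic image is an assumption for illustration.

```python
import zlib

# Raw 8-bit grayscale pixels for a 64x64 image: flat background with a bright square.
width, height = 64, 64
pixels = bytearray([40] * (width * height))
for y in range(16, 32):
    for x in range(16, 32):
        pixels[y * width + x] = 200
raw = bytes(pixels)

compressed = zlib.compress(raw, 9)      # DEFLATE stream inside a zlib wrapper
restored = zlib.decompress(compressed)

print(len(raw), len(compressed), restored == raw)
```

Flat regions compress very well, which is why PNG suits screenshots and logos; noisy photographic data leaves much less redundancy for DEFLATE to exploit.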

TIFF (Tagged Image File Format)

  • Can be lossless or lossy compressed
  • Supports multiple pages (multi-page documents)
  • Rich metadata support (EXIF, IPTC, XMP)
  • Supports: 8-bit, 16-bit, 32-bit (HDR) per channel
  • Best for: Medical imaging, professional photography, archival, printing workflows
  • Color spaces: RGB, CMYK, Lab, YCbCr

GIF (Graphics Interchange Format)

  • Lossless LZW compression
  • Limited to 256 colors (8-bit palette)
  • Supports frame-based animation
  • Supports 1-bit transparency (binary, not alpha)
  • Best for: Simple animations, icons with few colors
  • Not suitable for: Photographs, gradient-rich images
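The idea behind LZW can be sketched as a dictionary that grows as repeated pixel runs are seen. This is a simplified textbook version that emits plain integer codes; GIF's actual variant adds variable-width code packing plus clear and end-of-information codes, which are omitted here.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer codes using a growing dictionary."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the current match
        else:
            codes.append(table[current])   # emit the longest known sequence
            table[candidate] = next_code   # and learn the new, longer one
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Rebuild the original bytes, reconstructing the same dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:            # code refers to the entry being defined right now
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)

# One row of palette indices with long runs, as in a flat-colored icon.
row = bytes([7] * 20 + [3] * 20)
codes = lzw_encode(row)
print(len(row), len(codes), lzw_decode(codes) == row)
```

Long runs of identical palette indices collapse into a handful of codes, which is why GIF works well for flat-colored graphics and poorly for photographs.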

WebP (by Google, Lossless mode)

  • Lossless mode uses LZ77 + Huffman coding on top of prediction and color transforms (similar in spirit to PNG; Google reports files about 26% smaller than PNG)
  • Supports alpha channel in lossless mode

Lossy Formats (Quality reduced to achieve compression)

JPEG (Joint Photographic Experts Group)

  • Compression: DCT (Discrete Cosine Transform) based
    1. Convert RGB → YCbCr, then optionally subsample the chroma channels (e.g. 4:2:0)
    2. Divide into 8×8 blocks
    3. Apply 2D DCT to each block
    4. Quantize DCT coefficients (lossy step)
    5. Entropy code (Huffman / arithmetic coding)
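  • Quality factor: typically exposed by encoders as 1–100 (higher = less compression, better quality); it scales the quantization tables and is an encoder convention rather than part of the standard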
  • Artifacts: "Blocky" regions and ringing at high compression
  • Does NOT support transparency
  • Best for: Natural photographs, complex images with gradients
  • Not ideal for: Text, sharp edges, logos (JPEG artifacts visible)
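Steps 1, 3 and 4 of the pipeline above can be sketched in plain Python. This is a teaching sketch, not a JPEG encoder: it uses the JFIF color-conversion constants, the 8×8 DCT-II, and a single uniform quantization step (real encoders use an 8×8 quantization table), and it omits level shifting, chroma subsampling, zigzag ordering and entropy coding. All function names are hypothetical.

```python
import math

N = 8  # JPEG block size

def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr conversion for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def _c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _cos(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Forward 2D DCT-II of an 8x8 block."""
    return [[0.25 * _c(u) * _c(v) * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                                        for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2D DCT of an 8x8 coefficient block."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[u][v] * _cos(x, u) * _cos(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, step):
    """The lossy step: divide by the step size and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]

# A smooth 8x8 ramp: most of its energy lands in a few low-frequency coefficients.
ramp = [[x * 8 + y for y in range(N)] for x in range(N)]
levels = quantize(dct2(ramp), step=16)
approx = idct2(dequantize(levels, 16))

nonzero = sum(1 for row in levels for q in row if q != 0)
max_error = max(abs(approx[x][y] - ramp[x][y]) for x in range(N) for y in range(N))
print(nonzero, round(max_error, 2))
```

After quantization most coefficients of a smooth block are zero, which is what the entropy coder exploits; the rounding in `quantize` is the only place information is discarded.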

WebP (by Google, Lossy mode)

  • Based on VP8 video codec's intra-frame coding
  • Combines DCT (like JPEG) with predictive coding
  • Google reports files 25–34% smaller than JPEG at equivalent quality (measured by SSIM)
  • Supports both transparency and animation
  • Best for: Web images where bandwidth matters

HEIF/HEIC (High Efficiency Image File Format)

  • Based on HEVC/H.265 video codec
  • Often cited as up to ~50% smaller than JPEG at the same perceptual quality
  • Supports 16-bit color, depth maps, HDR
  • Default camera capture format on supported iPhones (iOS 11+)
  • Limited browser support (mainly Safari)

Format Comparison Table

| Format | Compression | Transparency | Animation | Best Use |
|---|---|---|---|---|
| BMP | None/RLE | No | No | Raw pixel editing |
| PNG | Lossless (DEFLATE) | Yes (alpha) | No | Web graphics, screenshots |
| TIFF | Lossless/Lossy | Yes | Multi-page | Professional, medical |
| GIF | Lossless (LZW) | 1-bit | Yes | Simple animations |
| JPEG | Lossy (DCT) | No | No | Photographs |
| WebP | Both | Yes | Yes | Modern web |
| HEIC | Lossy (HEVC) | Yes | Yes | Mobile photos |

Color Depth and Representation

| Type | Bits/Pixel | Gray Levels / Colors | Usage |
|---|---|---|---|
| Binary | 1 | 2 (black/white) | Document scans |
| Grayscale | 8 | 256 gray levels | Medical, surveillance |
| True Color | 24 | 16.7 million colors (RGB) | Photography |
| Deep Color | 48 | 281 trillion (RGB 16-bit/ch) | Professional editing, HDR |
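The counts in the table follow from 2 raised to the number of bits per pixel, and the same bit depth determines the uncompressed size of an image. A small sketch (the helper names and the 1920×1080 example are illustrative assumptions):

```python
def color_count(bits_per_pixel):
    """Number of distinct values a pixel can take: 2 ** bits."""
    return 2 ** bits_per_pixel

def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel-data size, ignoring headers and row padding."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

for name, bits in [("Binary", 1), ("Grayscale", 8), ("True Color", 24), ("Deep Color", 48)]:
    print(f"{name}: {color_count(bits):,} values, "
          f"{raw_size_bytes(1920, 1080, bits):,} bytes for 1920x1080")
```

Doubling the bit depth doubles the raw storage but squares the number of representable values, which is why deep-color formats are reserved for editing and HDR work.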