Introduction to Image Processing & Computer Vision
Key differences between image processing and computer vision, applications, and image file formats.
Introduction to Image Processing
What is Digital Image Processing?
A digital image is a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point.
When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. Digital image processing is the manipulation of such images by a digital computer:
- Input: An image (raw, noisy, low-contrast, etc.)
- Processing: Mathematical operations on pixel values
- Output: Either an improved image OR extracted features/characteristics
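To make the input, processing, and output roles concrete, here is a minimal sketch of one pixel-level operation, linear contrast stretching, on a tiny low-contrast grayscale image. The nested-list image layout and the function name `contrast_stretch` are illustrative assumptions, not part of any particular library.

```python
# A tiny 8-bit grayscale image stored as rows of intensities: f(x, y) = image[y][x].
image = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly map the image's [min, max] intensity range onto [out_min, out_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: there is no range to stretch
        return [row[:] for row in img]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in img]

# Input: low-contrast image. Processing: arithmetic on pixel values. Output: improved image.
stretched = contrast_stretch(image)
```

After stretching, the darkest input pixel maps to 0 and the brightest to 255, so the image uses the full 8-bit range.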
Digital image processing covers:
- Low-level processing – noise removal, contrast enhancement, sharpening
- Mid-level processing – segmentation, object description
- High-level processing – making sense of objects (computer vision)
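As an illustration of the mid-level stage, the sketch below segments a small image by thresholding and then describes each object by its area and bounding box. The 4-connectivity rule, the threshold value, and the helper names are assumptions chosen for the example; a high-level stage would then assign meaning to the described objects.

```python
from collections import deque

def threshold(img, t):
    """Segmentation: mark pixels brighter than t as foreground (1), others as 0."""
    return [[1 if p > t else 0 for p in row] for row in img]

def describe_objects(mask):
    """Object description: 4-connected components with area and bounding box."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill collects every pixel of this component.
            queue = deque([(x, y)])
            seen[y][x] = True
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # bbox is (x_min, y_min, x_max, y_max), inclusive.
            objects.append({"area": len(xs), "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return objects

image = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 200, 200, 10, 180],
    [10, 10, 10, 10, 180],
]
objects = describe_objects(threshold(image, 128))
```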
What is Computer Vision?
Computer vision is a field of artificial intelligence that aims to replicate and extend human visual perception in machines. It enables computers to derive meaningful high-level information from digital images, video, or other visual inputs.
- Input: An image or video stream
- Output: Scene interpretation, object labels, decisions, 3D models
- Core philosophy: "See and understand" rather than "see and transform"
Key Differences: Image Processing vs Computer Vision
| Aspect | Image Processing | Computer Vision |
|---|---|---|
| Primary Goal | Improve image quality or extract low-level features | Understand scene content, make decisions |
| Input | Digital image/video | Digital image/video |
| Output | Modified image or numerical feature data | High-level description, recognition, decisions |
| Level of Analysis | Low-level (pixel operations) | High-level (semantic understanding) |
| Intelligence Required | Minimal (rule-based math) | High (ML/DL models, reasoning) |
| Examples | Noise removal, contrast stretching, histogram equalization | Object detection, face recognition, scene understanding |
| Techniques | Filters, DFT, morphological ops, histograms | CNNs, RNNs, transformers, SLAM, stereo vision |
| Human Interpretation | Not required (mathematical operations) | Goal is to replicate human understanding |
| Processing Speed | Generally fast | Can be computationally expensive |
| Training Data | Not typically needed | Often requires large labeled datasets |
Key insight: Image Processing is a tool used within Computer Vision. A typical pipeline is: Raw Image → Image Processing (enhance, denoise) → Computer Vision (detect, recognize, decide)
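The pipeline can be sketched in a few lines. This is a toy example under stated assumptions: the "vision" step is a hypothetical brightness rule standing in for a real detector, and the 3×3 median filter copies border pixels unchanged for simplicity.

```python
from statistics import median

def median_filter3(img):
    """Image processing step: 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def contains_bright_object(img, level=128, min_pixels=1):
    """Toy decision step: is there at least min_pixels brighter than level?"""
    count = sum(1 for row in img for p in row if p > level)
    return count >= min_pixels

# Dark frame with a single salt-noise pixel.
noisy = [[20] * 5 for _ in range(5)]
noisy[2][2] = 255

# Dark frame containing a real 3x3 bright object.
scene = [[20] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        scene[y][x] = 220

false_alarm = contains_bright_object(noisy)                    # decision on the raw image
after_denoise = contains_bright_object(median_filter3(noisy))  # decision after denoising
real_object = contains_bright_object(median_filter3(scene))
```

Deciding on the raw frame triggers on the noise pixel, while denoising first removes it and still keeps the genuine object, which is the reason the enhancement stage usually comes before recognition.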
Applications of Image Processing
1. Medical Imaging
- X-ray enhancement for better bone/tissue visibility
- MRI and CT scan noise reduction
- Tumor boundary detection
- Mammography screening
- Retinal image analysis for diabetic retinopathy
2. Remote Sensing & Satellite Imagery
- Land cover classification
- Vegetation index computation (NDVI)
- Flood/fire/drought detection
- Urban sprawl monitoring
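The vegetation index mentioned above is computed per pixel as NDVI = (NIR − Red) / (NIR + Red). A minimal sketch follows, assuming the two bands are given as equally sized nested lists of reflectance values; the band values are made up for illustration.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where both bands are zero."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total else 0.0)
        result.append(row)
    return result

# Illustrative reflectance values (not real satellite data).
nir_band = [[0.50, 0.40], [0.10, 0.0]]
red_band = [[0.10, 0.10], [0.10, 0.0]]
index = ndvi(nir_band, red_band)
```

Values lie in [−1, 1]; healthy vegetation reflects strongly in the near infrared, so it produces values toward the upper end of the range.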
3. Document Processing
- OCR (Optical Character Recognition) preprocessing
- Binary thresholding for document scanning
- Skew correction of scanned pages
- Barcode and QR code reading
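Binary thresholding for scanned documents needs a threshold value. One common way to choose it automatically is Otsu's method, sketched below for 8-bit pixels; the source does not name a specific method, so treat this as one reasonable option rather than the prescribed one.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels <= t form the background class, pixels > t the foreground class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(img, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]

# Dark ink (30-40) on light paper (200-210); values are illustrative.
page = [
    [200, 30, 210, 200],
    [210, 40, 30, 200],
]
t = otsu_threshold([p for row in page for p in row])
binary = binarize(page, t)
```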
4. Photography & Cinematography
- HDR (High Dynamic Range) imaging
- Noise reduction in low-light photos
- Image stabilization
- Depth-of-field simulation
5. Industrial Inspection
- PCB defect detection
- Surface scratch and crack inspection
- Dimensional measurement using machine vision
- Product sorting by color, shape, size
6. Security & Forensics
- Fingerprint enhancement and matching
- Face image restoration
- License plate recognition
- Surveillance video analysis
Applications of Computer Vision
1. Autonomous Vehicles
- Lane line detection
- Pedestrian and obstacle detection
- Traffic sign recognition
- 3D scene reconstruction using LiDAR + cameras
2. Facial Recognition
- Unlock systems (iPhone Face ID)
- Attendance management
- Law enforcement (surveillance matching)
- Emotion and age estimation
3. Medical Diagnosis (AI-assisted)
- Diabetic retinopathy classification from fundus images
- Skin lesion classification (benign vs malignant)
- COVID-19 detection from chest X-rays
- Polyp detection in colonoscopy
4. Augmented & Virtual Reality
- Marker-based AR (detecting fiducial markers)
- SLAM (Simultaneous Localization and Mapping)
- Hand gesture recognition
- Body pose estimation
5. Robotics
- Object grasping using visual feedback
- Obstacle avoidance
- Navigation in unknown environments
6. Agriculture (Precision Farming)
- Crop disease detection from drone imagery
- Weed vs crop classification
- Irrigation planning via NDVI mapping
7. Retail & E-commerce
- Amazon Go: cashierless checkout using cameras
- Visual product search (Google Lens)
- Inventory shelf monitoring
Image File Formats
An image file format is a standardized way of organizing and storing digital images. Formats differ in:
- Compression: Lossless vs Lossy
- Color depth: 1-bit, 8-bit, 16-bit, HDR
- Transparency support: Alpha channel
- Color spaces supported
Lossless Formats (No quality loss on compression)
BMP (Bitmap Image File)
- Little or no compression (Run-Length Encoding optional)
- Simple structure: file header + pixel array
- Large file sizes (stores every pixel raw)
- Native Windows format
- Best for: Raw uncompressed data, simple applications
- Color depth: 1, 4, 8, 16, 24, 32-bit
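The "file header + pixel array" structure is simple enough to write by hand. The sketch below builds an uncompressed 24-bit BMP in memory using the common 40-byte BITMAPINFOHEADER variant; the function name and the 2835 pixels-per-metre resolution (about 72 DPI) are choices made for this example.

```python
import struct

def encode_bmp_24bit(pixels):
    """Encode rows of (r, g, b) tuples as an uncompressed 24-bit BMP byte string."""
    height, width = len(pixels), len(pixels[0])
    row_padding = (4 - (width * 3) % 4) % 4  # each row is padded to a multiple of 4 bytes
    body = bytearray()
    for row in reversed(pixels):  # BMP stores rows bottom-up
        for r, g, b in row:
            body += bytes((b, g, r))  # channel order on disk is blue, green, red
        body += b"\x00" * row_padding
    data_offset = 14 + 40  # file header + BITMAPINFOHEADER
    # File header: signature, total file size, two reserved fields, pixel-data offset.
    file_header = struct.pack("<2sIHHI", b"BM", data_offset + len(body), 0, 0, data_offset)
    # Info header: size, width, height, planes, bits per pixel, compression (0 = none),
    # image size, x/y resolution in pixels per metre, palette size, important colors.
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height, 1, 24, 0, len(body), 2835, 2835, 0, 0,
    )
    return file_header + info_header + bytes(body)

pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
bmp = encode_bmp_24bit(pixels)
# To view the result: open("tiny.bmp", "wb").write(bmp)
```

Even this 2×2 image needs 54 bytes of headers plus padded rows, which shows why BMP files are large compared with compressed formats.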
PNG (Portable Network Graphics)
- Lossless compression using DEFLATE (LZ77 + Huffman coding)
- Supports transparency (alpha channel, 8 or 16-bit)
- Supports: RGB, RGBA, Grayscale, Indexed color
- Better than GIF for non-animated images (not limited to 256 colors, full alpha transparency)
- Best for: Web graphics, screenshots, logos, text overlays
- Not ideal for: Photographs (large file size vs JPEG)
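Python's `zlib` module implements DEFLATE, so a minimal PNG encoder fits in a few lines. The sketch below writes an 8-bit grayscale PNG using filter type 0 (no filtering) on every row; real encoders pick per-row filters to improve compression, and the function names here are illustrative.

```python
import struct
import zlib

def _chunk(kind, payload):
    """A PNG chunk: length, type, data, then CRC-32 computed over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

def encode_png_gray8(rows):
    """Encode rows of 0-255 ints as an 8-bit grayscale PNG (filter type 0 on every row)."""
    height, width = len(rows), len(rows[0])
    # Each scanline is prefixed with its filter-type byte (0 = no filter).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw, 9))
        + _chunk(b"IEND", b"")
    )

# A 64x64 horizontal gradient: 4096 pixels of highly repetitive data.
gradient = [[x * 4 for x in range(64)] for _ in range(64)]
png = encode_png_gray8(gradient)
```

Decompressing the IDAT payload returns the original scanlines byte for byte, which is what "lossless" means in practice.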
TIFF (Tagged Image File Format)
- Can be lossless or lossy compressed
- Supports multiple pages (multi-page documents)
- Rich metadata support (EXIF, IPTC, XMP)
- Supports: 8-bit, 16-bit, 32-bit (HDR) per channel
- Best for: Medical imaging, professional photography, archival, printing workflows
- Color spaces: RGB, CMYK, Lab, YCbCr
GIF (Graphics Interchange Format)
- Lossless LZW compression
- Limited to 256 colors (8-bit palette)
- Supports frame-based animation
- Supports 1-bit transparency (binary, not alpha)
- Best for: Simple animations, icons with few colors
- Not suitable for: Photographs, gradient-rich images
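The core idea of LZW is a dictionary of byte sequences that encoder and decoder build identically as they go. The sketch below is a simplified version that emits plain integer codes; actual GIF files additionally use variable-width codes, clear and end-of-information codes, and pack the result into sub-blocks, none of which is shown here.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer LZW codes (initial dictionary: 256 single bytes)."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate  # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new sequence
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Invert lzw_encode, rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):  # code refers to the entry being defined right now
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)

# Palette indices of a striped image row, repeated: very compressible.
indices = bytes([0, 0, 0, 0, 1, 1, 1, 1] * 8)
codes = lzw_encode(indices)
restored = lzw_decode(codes)
```

The compression itself is lossless. Any quality loss in a GIF comes earlier, when the image is reduced to a palette of at most 256 colors.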
WebP (by Google, Lossless mode)
- Lossless mode uses LZ77 + Huffman coding, as PNG does, plus prediction and color transforms; Google reports files ~26% smaller than PNG
- Supports alpha channel in lossless mode
Lossy Formats (Quality reduced to achieve compression)
JPEG (Joint Photographic Experts Group)
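- Compression: DCT (Discrete Cosine Transform) based, in five steps:
  1. Convert RGB → YCbCr, usually with chroma subsampling (lossy)
  2. Divide each channel into 8×8 blocks
  3. Apply the 2D DCT to each block
  4. Quantize the DCT coefficients (the main lossy step)
  5. Entropy code the result (Huffman or arithmetic coding)
- Quality factor: typically 1–100 in common encoders (higher = less compression, better quality); the scale is an encoder convention, not part of the JPEG standard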
- Artifacts: "Blocky" regions and ringing at high compression
- Does NOT support transparency
- Best for: Natural photographs, complex images with gradients
- Not ideal for: Text, sharp edges, logos (JPEG artifacts visible)
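The transform and quantization steps of JPEG can be sketched on a single 8×8 block. This example uses a single uniform quantization step for every coefficient, which is a simplifying assumption: real encoders use an 8×8 quantization table, typically scaled by the quality setting, and follow it with entropy coding.

```python
import math

N = 8

def _alpha(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    """2D DCT-II of an 8x8 block; result[u][v] has vertical frequency u, horizontal v."""
    return [
        [
            _alpha(u) * _alpha(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                for y in range(N) for x in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]

def idct2(coeffs):
    """Inverse of dct2: rebuild the 8x8 block of samples from its coefficients."""
    return [
        [
            sum(
                _alpha(u) * _alpha(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                for u in range(N) for v in range(N)
            )
            for x in range(N)
        ]
        for y in range(N)
    ]

def quantize(coeffs, step):
    """The lossy step: divide by the step size and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]

# A smooth gradient block, level-shifted by 128 as JPEG does before the DCT.
block = [[120 + 5 * x + 3 * y - 128 for x in range(N)] for y in range(N)]
levels = quantize(dct2(block), step=16)
nonzero = sum(1 for row in levels for q in row if q != 0)
restored = idct2(dequantize(levels, step=16))
max_error = max(abs(restored[y][x] - block[y][x]) for y in range(N) for x in range(N))
```

For a smooth block like this, most quantized coefficients are zero, which is what the entropy coder exploits. The nonzero `max_error` is the information the quantizer discarded.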
WebP (by Google, Lossy mode)
- Based on VP8 video codec's intra-frame coding
- Combines DCT (like JPEG) with predictive coding
- Reported by Google as 25–34% smaller than JPEG at equivalent quality (measured by SSIM)
- Supports both transparency and animation
- Best for: Web images where bandwidth matters
HEIF/HEIC (High Efficiency Image File Format)
- HEIF is the container format; HEIC files are HEIF images coded with the HEVC/H.265 video codec
- Often cited as ~50% smaller than JPEG at the same perceptual quality
- Supports 16-bit color, depth maps, HDR
- Default format on iPhone (iOS 11+)
- Limited browser support (mainly Safari)
Format Comparison Table
| Format | Compression | Transparency | Animation | Best Use |
|---|---|---|---|---|
| BMP | None/RLE | No | No | Raw pixel editing |
| PNG | Lossless (DEFLATE) | Yes (alpha) | No | Web graphics, screenshots |
| TIFF | Lossless/Lossy | Yes | Multi-page | Professional, medical |
| GIF | Lossless (LZW) | 1-bit | Yes | Simple animations |
| JPEG | Lossy (DCT) | No | No | Photographs |
| WebP | Both | Yes | Yes | Modern web |
| HEIC | Lossy (HEVC) | Yes | Yes | Mobile photos |
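Because each format has a fixed layout, most can be recognized from the first few bytes of a file, regardless of the file extension. The sketch below checks well-known signatures; the function name is hypothetical, and the HEIC check covers only two common brand codes, so it should be read as a partial heuristic.

```python
def sniff_image_format(header):
    """Guess an image format from the first bytes of a file; returns None if unknown."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"BM"):
        return "BMP"
    if header.startswith((b"II*\x00", b"MM\x00*")):  # little- and big-endian TIFF
        return "TIFF"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    # Assumption: only the 'heic' and 'heix' brands are checked here.
    if header[4:8] == b"ftyp" and header[8:12] in (b"heic", b"heix"):
        return "HEIC"
    return None

# Usage with a real file (path is a placeholder):
# with open("photo.bin", "rb") as fh:
#     print(sniff_image_format(fh.read(16)))
```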
Color Depth and Representation
| Type | Bits/Pixel | Gray Levels / Colors | Usage |
|---|---|---|---|
| Binary | 1 | 2 (black/white) | Document scans |
| Grayscale | 8 | 256 gray levels | Medical, surveillance |
| True Color | 24 | 16.7 million colors (RGB) | Photography |
| Deep Color | 48 | 281 trillion (RGB 16-bit/ch) | Professional editing, HDR |
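The counts in the table follow directly from the bit depth: n bits per pixel can represent 2^n distinct values. The sketch below reproduces them and also estimates uncompressed storage; both helper names are illustrative.

```python
def color_count(bits_per_pixel):
    """Number of distinct values representable with the given bits per pixel."""
    return 2 ** bits_per_pixel

def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel-data size, with each row rounded up to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height

for name, bits in [("Binary", 1), ("Grayscale", 8), ("True Color", 24), ("Deep Color", 48)]:
    print(f"{name}: {color_count(bits):,} values, "
          f"{raw_size_bytes(1920, 1080, bits):,} bytes for a 1920x1080 image")
```

A 1920×1080 true-color image therefore needs about 6.2 MB before compression, which is the motivation for the compressed formats described above.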