Leukemia Detection

AI-Powered Blood Cancer Diagnosis Through Deep Learning and Computer Vision

GitHub Repo

Overview / About

Leukemia Detection is a deep learning-based medical imaging system that classifies acute lymphoblastic leukemia (ALL) from peripheral blood smear images. The system uses convolutional neural networks — including EfficientNet B3, ResNet50, and a custom-built CNN — to distinguish between benign cells and three malignant ALL subtypes (Early Pre-B, Pre-B, and Pro-B). A Flask-based API serves real-time predictions, while histogram-based template matching validates that uploaded images are actually white blood cells before classification.

Problem

Diagnosing leukemia traditionally relies on manual microscopic examination of blood smears by trained hematologists — a process that is time-consuming, subjective, and prone to human error. Misclassification of ALL subtypes can lead to incorrect treatment plans, while delayed diagnosis reduces survival rates. Access to expert pathologists is also limited in many regions, creating a critical gap in early detection.

Solution

An automated classification system powered by deep learning analyzes blood smear images and identifies both the presence and the specific subtype of ALL. The system first checks that an upload is a plausible white blood cell image using color histogram comparison, then resizes it to 224x224, passes it through a trained CNN, and returns a classification with confidence scores. The goal is faster, more consistent, and more accessible diagnosis.
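As a rough illustration of the input and output ends of that flow, the sketch below decodes a base64 upload and turns a vector of class probabilities into a label with confidence scores. The function names, the response keys, and the class order in CLASS_NAMES are hypothetical; the real order depends on how the model was trained.

```python
import base64
import binascii

# Hypothetical label order; the real order depends on the trained model.
CLASS_NAMES = ["Benign", "Early Pre-B", "Pre-B", "Pro-B"]


def decode_upload(payload: str) -> bytes:
    """Decode a base64 image string, tolerating a 'data:...;base64,' prefix."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc


def format_prediction(probabilities):
    """Map per-class probabilities to a label plus a score for every class."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    scores = {name: float(p) for name, p in zip(CLASS_NAMES, probabilities)}
    label = max(scores, key=scores.get)  # class with the highest probability
    return {"prediction": label, "confidence": scores[label], "scores": scores}
```

Returning every class score alongside the top label lets a reviewer see how close the runner-up was, which matters when two subtypes look similar.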

Key Features

  • Multi-class classification: Benign, Malignant Pre-B, Malignant Pro-B, and Malignant Early Pre-B
  • Input validation through histogram-based template matching to reject non-blood-cell images
  • REST API for real-time image classification via base64-encoded uploads
  • Patient record management with MySQL database integration
  • Multiple model architectures benchmarked for optimal accuracy
  • Data augmentation pipeline (flipping, scaling) for robust generalization
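The flipping and scaling augmentations listed above could look roughly like the following NumPy sketch. It is not the project's actual pipeline: the function names, the 50% flip probability, and the 1.0 to 1.2 zoom range are assumptions chosen for illustration.

```python
import numpy as np


def flip(image: np.ndarray, horizontal: bool = True) -> np.ndarray:
    """Mirror an H x W x C image left-right (or top-bottom)."""
    return image[:, ::-1] if horizontal else image[::-1]


def zoom(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale the content by `factor` >= 1 while keeping the output size fixed.

    A central crop of size (H / factor, W / factor) is stretched back to
    H x W with nearest-neighbour sampling.
    """
    if factor < 1.0:
        raise ValueError("this sketch only zooms in (factor >= 1)")
    h, w = image.shape[:2]
    crop_h, crop_w = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    # Map every output row/column back to a source row/column inside the crop.
    rows = top + (np.arange(h) * crop_h) // h
    cols = left + (np.arange(w) * crop_w) // w
    return image[rows[:, None], cols]


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random flips and a random zoom (ranges are hypothetical)."""
    if rng.random() < 0.5:
        image = flip(image, horizontal=True)
    if rng.random() < 0.5:
        image = flip(image, horizontal=False)
    return zoom(image, factor=rng.uniform(1.0, 1.2))
```

Flips are a natural fit for blood smears because a cell has no canonical orientation, so a mirrored image is still a valid sample of the same class.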

Tech Stack

  • Python
  • TensorFlow / Keras
  • OpenCV
  • Flask
  • MySQL
  • NumPy
  • PIL / Pillow

How It Works

  1. Image Upload: A blood smear image is submitted to the Flask API as a base64-encoded string
  2. Input Validation: The image's color histograms are compared with those of a template white blood cell image across the RGB channels; images whose histogram correlation falls below the 0.6 threshold are rejected as non-blood-cell images
  3. Preprocessing: The validated image is resized to 224x224 and normalized with MobileNetV2's preprocessing, which scales pixel values to the range [-1, 1]
  4. Classification: The CNN model analyzes the processed image and outputs probability scores across all four classes
  5. Result: The system returns the predicted class (Benign or a specific ALL subtype) along with the confidence scores for all four classes
  6. Record Storage: Patient information and diagnosis results can be stored in a MySQL database for clinical record-keeping
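Steps 2 and 3 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's code: the project uses OpenCV's histogram comparison, and the correlation below follows the same Pearson formula as OpenCV's HISTCMP_CORREL metric. Averaging the three channel scores and all function names are assumptions, and resizing to 224x224 is left to an image library.

```python
import numpy as np

CORRELATION_THRESHOLD = 0.6  # threshold quoted in step 2


def channel_histogram(channel: np.ndarray, bins: int = 256) -> np.ndarray:
    """Count pixel intensities (0-255) of one colour channel."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    return hist.astype(np.float64)


def histogram_correlation(h1: np.ndarray, h2: np.ndarray) -> float:
    """Pearson correlation between two histograms."""
    d1, d2 = h1 - h1.mean(), h2 - h2.mean()
    denom = np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())
    return float((d1 * d2).sum() / denom) if denom > 0 else 0.0


def looks_like_template(image, template, threshold=CORRELATION_THRESHOLD) -> bool:
    """Compare per-channel histograms and accept if the mean score passes.

    Averaging the channels is an assumption; other combinations are possible.
    """
    scores = [
        histogram_correlation(
            channel_histogram(image[..., c]), channel_histogram(template[..., c])
        )
        for c in range(3)
    ]
    return float(np.mean(scores)) >= threshold


def normalize_mobilenet_v2(image_224: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels from [0, 255] to [-1, 1], MobileNetV2 style."""
    return image_224.astype(np.float32) / 127.5 - 1.0
```

A histogram check only compares color distributions, so it is a cheap first filter and not a guarantee that the image actually shows a white blood cell.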

Results / Outcomes

  • Custom CNN achieved the highest accuracy at 97.7% for ALL subtype classification
  • EfficientNet B3 reached 95.56% training accuracy and 93.75% validation accuracy
  • ResNet50 achieved 91.32% accuracy using residual learning
  • Results were obtained on a dataset of 3,242 blood smear images from 89 patients
  • Input validation layer rejects non-blood-cell images before they reach the classifier, so the model is not asked to classify irrelevant uploads

My Role

  • Designed and implemented the complete deep learning pipeline: data preprocessing, model training, evaluation, and deployment
  • Benchmarked three CNN architectures (EfficientNet B3, ResNet50, Custom CNN) to identify the optimal model for classification accuracy
  • Built the image validation system using OpenCV histogram comparison to ensure input quality
  • Developed the Flask REST API for real-time inference with base64 image handling
  • Integrated MySQL database for patient record management
  • Applied data augmentation and preprocessing strategies to maximize model generalization on a 3,242-image dataset