Intelligent Document Scanner

Image processing / AI scanner

Description

A document cropping and processing application that uses OpenCV for automatic document mask detection and an AI model for binarization. Features include multi-file upload, real-time image editing with rotation and filters, background processing with progress tracking, and PDF export. Built with React, TypeScript, and the Canvas API, it integrates with external binarization services.

Problem Solved

Automating document digitization and improving image quality for better OCR results and document archiving.

Technical Highlights

The frontend implements a real-time image processing pipeline on the Canvas API. Crop handles scale their radius to the image dimensions, crop edges can be dragged for precise adjustments, and a loupe component magnifies the area being edited for fine detail work. EventSource delivers real-time progress updates during background binarization, and rendered results are cached per filter combination so that repeated settings are not recomputed.
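A minimal sketch of the handle sizing and filter caching ideas, written in Python for brevity (the actual logic lives in the TypeScript frontend). Both pieces are hypothetical: the scaling rule in `handle_radius` (1.5% of the shorter image side, clamped to 6 to 40 px) and the LRU design of `FilterCache` are assumptions, since the source does not say how the radius is derived or how the cache is keyed.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Tuple


def handle_radius(width: int, height: int, fraction: float = 0.015,
                  min_px: float = 6.0, max_px: float = 40.0) -> float:
    """Scale the crop-handle radius with the shorter image side, clamped to a usable range."""
    return max(min_px, min(max_px, fraction * min(width, height)))


class FilterCache:
    """Hypothetical LRU cache of rendered previews keyed by (rotation, filter settings)."""

    def __init__(self, render: Callable[[int, tuple], object], capacity: int = 16) -> None:
        self._render = render          # expensive render function supplied by the caller
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, object]" = OrderedDict()

    @staticmethod
    def make_key(rotation: int, filters: Dict[str, float]) -> Tuple[int, tuple]:
        # Normalize so equivalent settings share an entry: 450 degrees == 90 degrees,
        # and the order in which filters were toggled does not matter.
        return rotation % 360, tuple(sorted(filters.items()))

    def get(self, rotation: int, filters: Dict[str, float]) -> object:
        key = self.make_key(rotation, filters)
        if key in self._entries:
            self._entries.move_to_end(key)       # mark as most recently used
            return self._entries[key]
        result = self._render(*key)              # cache miss: render once and store
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)    # evict the least recently used entry
        return result
```

Normalizing the key lets equivalent settings share one entry, and the capacity bound keeps memory predictable when many full-resolution previews are rendered.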

Frontend Workflow

Upload multiple document images
Automatic mask detection using OpenCV
Interactive crop area adjustment with real-time preview
Apply image filters and rotation
Background binarization processing with progress tracking
Export processed documents to PDF
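The progress-tracking step relies on the browser's EventSource, which reads a `text/event-stream` response. The server side is not described in this project, so the following Python sketch is hypothetical: `format_sse` serializes one message in the Server-Sent Events wire format, and `progress_events` shows one possible stream of per-patch `progress` messages followed by a `complete` message. The event names and payload fields are assumptions.

```python
import json
from typing import Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message, the format an EventSource client parses."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one "data:" field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message


def progress_events(job_id: str, total_patches: int) -> Iterator[str]:
    """Hypothetical progress stream: one message per processed patch, then a completion message."""
    for done in range(1, total_patches + 1):
        percent = round(100 * done / total_patches)
        payload = {"job": job_id, "done": done, "total": total_patches, "percent": percent}
        yield format_sse(payload, event="progress", event_id=str(done))
    yield format_sse({"job": job_id, "status": "complete"}, event="complete")
```

Because these messages set an `event:` field, an EventSource client must register listeners for `progress` and `complete` with `addEventListener`; the generic `onmessage` handler only receives messages without an event type.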

Backend Workflow

Docker Containerization (TensorFlow 2.15.0-gpu base)
NVIDIA Runtime Configuration
File Upload & Validation
Image Loading with TensorFlow
GPU Memory Configuration (TF_FORCE_GPU_ALLOW_GROWTH)
Image Slicing (224x224 patches)
4-Directional Augmentation
Neural Network Processing (TensorFlow SavedModel)
Sigmoid Activation + Isotonic Regression Calibration
Image Reconstruction from Slices
Post-processing & Cleanup
PNG Export & File Serving
Volume Mounting for Persistent Storage
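The slicing, augmentation, and reconstruction steps can be sketched with NumPy. This is a minimal sketch under stated assumptions: the image is edge-padded to a multiple of 224 rather than resized, patches do not overlap, "4-directional augmentation" is interpreted as the four 90-degree rotations, and `predict` is a hypothetical stand-in for the loaded SavedModel that returns per-pixel probabilities with the same spatial shape as its input.

```python
import numpy as np

PATCH = 224  # patch size from the backend workflow


def slice_into_patches(image: np.ndarray, patch: int = PATCH):
    """Edge-pad an (H, W[, C]) image to multiples of `patch` and cut it into row-major tiles."""
    h, w = image.shape[:2]
    extra = image.shape[2:]                       # trailing channel axis, if any
    pad = [(0, -h % patch), (0, -w % patch)] + [(0, 0)] * len(extra)
    padded = np.pad(image, pad, mode="edge")
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    tiles = (padded.reshape(rows, patch, cols, patch, *extra)
             .swapaxes(1, 2)                      # -> (rows, cols, patch, patch, ...)
             .reshape(rows * cols, patch, patch, *extra))
    return tiles, (h, w), (rows, cols)


def reconstruct(tiles: np.ndarray, orig_hw, grid) -> np.ndarray:
    """Inverse of slice_into_patches: reassemble the tiles and crop the padding away."""
    rows, cols = grid
    patch = tiles.shape[1]
    extra = tiles.shape[3:]
    full = (tiles.reshape(rows, cols, patch, patch, *extra)
            .swapaxes(1, 2)                       # -> (rows, patch, cols, patch, ...)
            .reshape(rows * patch, cols * patch, *extra))
    h, w = orig_hw
    return full[:h, :w]


def predict_4way(predict, tiles: np.ndarray) -> np.ndarray:
    """Average predictions over the four 90-degree rotations, undoing each rotation first."""
    outputs = []
    for k in range(4):
        rotated = np.rot90(tiles, k=k, axes=(1, 2))   # rotate each tile in its (H, W) plane
        outputs.append(np.rot90(predict(rotated), k=-k, axes=(1, 2)))
    return np.mean(outputs, axis=0)


def probability_map(image: np.ndarray, predict) -> np.ndarray:
    """Slice, run 4-way augmented prediction, and reconstruct a per-pixel probability map."""
    tiles, orig_hw, grid = slice_into_patches(image)
    return reconstruct(predict_4way(predict, tiles), orig_hw, grid)
```

Non-overlapping tiles keep reconstruction a pure reshape; pipelines that need to hide seams often use overlapping tiles with blending instead, which this sketch does not attempt.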

Model Training

The binarization model uses a modified U-Net architecture with a frozen MobileNetV2 encoder (pre-trained on ImageNet) for transfer learning. The training data consists of 224x224 pixel document images and their corresponding binary masks, stored on Google Drive.

Data augmentation includes random horizontal and vertical flips, hue adjustments, and a custom synthetic contrast-blur generator that combines 10 random sinusoidal frequencies to simulate realistic document degradation. Training also applies nondeterministic binarization (random thresholds between 0 and 255) to improve generalization.

The model is trained with the Adam optimizer (learning rate 1e-3), binary cross-entropy loss, early stopping (patience=10), and learning-rate reduction on plateau.

For inference, an 8-way test-time augmentation wrapper applies rotations and flips and averages the resulting predictions. After training, isotonic regression calibrates the model logits into probabilities, and reliability diagrams are used to assess the calibration.

The final model is exported in several formats (SavedModel, H5, and TensorFlow.js). Inputs of arbitrary size are processed as 224x224 patches with 4-directional augmentation. On the GPU, TF_FORCE_GPU_ALLOW_GROWTH makes TensorFlow allocate memory as needed instead of reserving it all up front, and patches are processed in batches for efficiency.
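Two of the inference-time steps above, sketched in NumPy. `tta_8way` averages predictions over the eight symmetries of a square patch (four rotations, each with and without a mirror), undoing each transform before averaging. `fit_isotonic` is a plain weighted pool-adjacent-violators fit and `calibrate` applies it by linear interpolation. Both are illustrative stand-ins rather than the project's actual code (scikit-learn, for example, ships an `IsotonicRegression` estimator), and `predict` is a hypothetical model call.

```python
import numpy as np


def tta_8way(predict, batch: np.ndarray) -> np.ndarray:
    """Average predictions over the 8 symmetries of a square (4 rotations x optional mirror)."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = np.rot90(batch, k=k, axes=(1, 2))
            if flip:
                x = x[:, :, ::-1]                 # mirror along the width axis
            y = predict(x)
            if flip:
                y = y[:, :, ::-1]                 # undo the mirror first...
            outputs.append(np.rot90(y, k=-k, axes=(1, 2)))  # ...then the rotation
    return np.mean(outputs, axis=0)


def fit_isotonic(scores, labels):
    """Weighted pool-adjacent-violators fit of a non-decreasing map from scores to labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Group tied scores: xs is sorted and unique, ys holds the mean label per score.
    xs, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ys = np.bincount(inverse, weights=labels) / counts
    means, weights, sizes = [], [], []
    for value, weight in zip(ys, counts):
        means.append(value)
        weights.append(float(weight))
        sizes.append(1)
        # Merge blocks while the newest block's mean falls below its predecessor's.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, n2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, n1 = means.pop(), weights.pop(), sizes.pop()
            means.append((m1 * w1 + m2 * w2) / (w1 + w2))
            weights.append(w1 + w2)
            sizes.append(n1 + n2)
    return xs, np.repeat(means, sizes)


def calibrate(x_fit, y_fit, scores):
    """Map raw scores to calibrated probabilities; values outside the fitted range are clipped."""
    return np.interp(scores, x_fit, y_fit)
```

In practice the calibrator would be fit on held-out validation pixels (model scores paired with labels from the binary masks) and then checked with a reliability diagram, which this sketch does not draw.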

Tech Stack

React, TypeScript, Tailwind, OpenCV, Canvas, PDF, Upload, Real-time, REST API