Intelligent Document Scanner

Image processing / AI scanner

Description

A document cropping and processing application that uses OpenCV for automatic mask detection and AI-powered binarization. Features include multi-file upload, real-time image editing with rotation and filtering, background processing with progress tracking, and export to PDF. It is built with React, TypeScript, and the Canvas API, and integrates with external binarization services.

Problem Solved

Automates document digitization and improves image quality, which leads to better OCR results and cleaner document archives.

Technical Highlights

The frontend implements an image processing pipeline on top of the Canvas API for real-time manipulation. The crop handle radius is calculated dynamically from the image dimensions, crop edges can be dragged for precise adjustment, and a loupe component supports detailed editing. The system uses EventSource to receive real-time progress updates during background binarization and caches results for filter combinations.
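The progress updates consumed through EventSource arrive as server-sent events. As a minimal sketch of what the backend side of that stream might emit, assuming hypothetical `progress` and `done` event names and a JSON payload with `job_id` and `percent` fields (none of these are specified in this write-up):

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one server-sent event in the text/event-stream wire format."""
    data = json.dumps(payload, separators=(",", ":"))
    # An event is a block of "field: value" lines terminated by a blank line.
    return f"event: {event}\ndata: {data}\n\n"


def progress_events(job_id: str, total_patches: int):
    """Yield one hypothetical 'progress' event per processed patch, then 'done'."""
    for done in range(1, total_patches + 1):
        percent = round(100 * done / total_patches)
        yield format_sse("progress", {"job_id": job_id, "percent": percent})
    yield format_sse("done", {"job_id": job_id})
```

Because these events are named, a browser client would subscribe with `addEventListener("progress", ...)` on the EventSource rather than relying on `onmessage`, which only receives unnamed events.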

Frontend Workflow

  • Upload multiple document images
  • Automatic mask detection using OpenCV
  • Interactive crop area adjustment with real-time preview
  • Apply image filters and rotation
  • Background binarization processing with progress tracking
  • Export processed documents to PDF

Backend Workflow

  • Docker Containerization (TensorFlow 2.15.0-gpu base)
  • NVIDIA Runtime Configuration
  • File Upload & Validation
  • Image Loading with TensorFlow
  • GPU Memory Configuration (TF_FORCE_GPU_ALLOW_GROWTH)
  • Image Slicing (224x224 patches)
  • 4-Directional Augmentation
  • Neural Network Processing (TensorFlow SavedModel)
  • Sigmoid Activation + Isotonic Regression Calibration
  • Image Reconstruction from Slices
  • Post-processing & Cleanup
  • PNG Export & File Serving
  • Volume Mounting for Persistent Storage
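
The slicing and reconstruction steps above can be sketched as follows. This is an illustration only: it assumes non-overlapping tiles in row-major order and constant padding at the right and bottom edges, and it uses plain nested lists instead of TensorFlow tensors. The actual service may pad or overlap patches differently.

```python
PATCH = 224  # patch size stated in the workflow above


def slice_image(image, patch=PATCH, pad_value=0):
    """Split a 2-D image (list of rows) into patch x patch tiles.

    Returns (tiles, (height, width)); tiles are in row-major order and
    edge tiles are padded with pad_value (an assumed padding strategy).
    """
    height, width = len(image), len(image[0])
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = []
            for y in range(r * patch, (r + 1) * patch):
                row = image[y][c * patch:(c + 1) * patch] if y < height else []
                # Pad short or missing rows up to the full patch width.
                tile.append(row + [pad_value] * (patch - len(row)))
            tiles.append(tile)
    return tiles, (height, width)


def reconstruct(tiles, shape, patch=PATCH):
    """Reassemble tiles from slice_image and crop away the padding."""
    height, width = shape
    cols = -(-width // patch)
    image = [[0] * width for _ in range(height)]
    for index, tile in enumerate(tiles):
        top, left = (index // cols) * patch, (index % cols) * patch
        for dy, row in enumerate(tile):
            y = top + dy
            if y >= height:
                break  # remaining rows of this tile are padding
            visible = row[:width - left]  # drop right-edge padding
            image[y][left:left + len(visible)] = visible
    return image
```

Keeping the original shape next to the tiles is what lets the reconstruction step discard the padding, so an input of arbitrary size comes back at exactly its original dimensions.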

Model Training

The binarization model uses a modified U-Net architecture with a frozen MobileNetV2 encoder (ImageNet pre-trained) for transfer learning. Training data consists of 224x224 pixel document images with corresponding binary masks, loaded from Google Drive. Data augmentation includes random horizontal and vertical flips, hue adjustments, and synthetic contrast blur generated from 10 random sinusoidal frequencies to simulate realistic document degradation. Training also applies nondeterministic binarization (random thresholds in the 0-255 range) for improved generalization.

Training uses the Adam optimizer with a learning rate of 1e-3, binary cross-entropy loss, early stopping (patience=10), and learning rate reduction on plateau.

For inference, an 8-way test-time augmentation wrapper applies rotations and flips and averages the predictions for more robust results. Post-training calibration uses isotonic regression to convert model logits into calibrated probabilities, accompanied by reliability diagrams. The final model is exported to multiple formats (SavedModel, H5, TensorFlow.js) and processes images as 224x224 patches with 4-directional augmentation, which lets it handle arbitrary input sizes. GPU optimization includes TF_FORCE_GPU_ALLOW_GROWTH for memory management and batch processing for efficiency.
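
As a rough illustration of the 8-way test-time augmentation described above, the sketch below averages a predictor over the four rotations of an image and of its horizontal mirror. It is not the project's wrapper: `predict` is a hypothetical callable that returns a map with the same shape as its input, and images are plain nested lists instead of tensors.

```python
def rot90(img, k=1):
    """Rotate a 2-D list of rows by k * 90 degrees counter-clockwise."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order.
        img = [list(col) for col in zip(*img)][::-1]
    return img


def flip_h(img):
    """Mirror a 2-D list of rows left to right."""
    return [row[::-1] for row in img]


def tta_predict(predict, img):
    """Average predict() over the 8 rotation/flip variants of img."""
    height, width = len(img), len(img[0])
    total = [[0.0] * width for _ in range(height)]
    for flipped in (False, True):
        for k in range(4):
            variant = rot90(flip_h(img) if flipped else img, k)
            out = predict(variant)
            # Undo the transforms in reverse order to realign with the input.
            out = rot90(out, -k)
            if flipped:
                out = flip_h(out)
            for y in range(height):
                for x in range(width):
                    total[y][x] += out[y][x] / 8
    return total
```

Each prediction is mapped back to the input orientation before averaging, so the eight outputs line up pixel for pixel. With square 224x224 patches every rotated variant keeps the shape the model expects.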

Tech Stack

React, TypeScript, Tailwind, OpenCV, Canvas, PDF, Upload, Real-time, REST API

Intelligent Document Scanner | Denis Vlas Portfolio | Denis Vlas