# Reproduction of one of the first Convolutional Neural Networks
In 1989, Yann LeCun and his team at AT&T Bell Laboratories published a landmark paper in the history of deep learning. They demonstrated that a neural network could learn to recognize handwritten digits directly from raw pixels, without any hand-crafted features. The network was trained on zip code digits taken from real U.S. Postal Service mail and was used to read zip codes automatically. Its architecture became the foundation of the Convolutional Neural Networks (CNNs) that dominate modern computer vision.
| Layer | Type | Output | Params |
|---|---|---|---|
| Input | Image | 1×16×16 | — |
| H1 | Conv 5×5, stride 2 | 12×8×8 | 1,068 |
| H2 | Sparse Conv 5×5, stride 2 | 12×4×4 | 2,592 |
| H3 | Fully Connected | 30 | 5,790 |
| Output | Fully Connected | 10 | 310 |
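To make the H1 row concrete, here is a minimal sketch (illustrative, not the repo's actual code) of one feature map: a 5×5 kernel slides over the 16×16 input with stride 2 and a tanh squashing nonlinearity. Assuming, as in the paper's setup, that pixels outside the image are treated as background (-1), padding by 2 on each side yields the 8×8 output in the table.

```javascript
// One H1-style feature map: 5x5 convolution, stride 2, tanh activation.
// The 16x16 input is padded by 2 with the background value -1 (an assumption
// matching the paper's [-1, +1] pixel encoding), giving an 8x8 output.
function convMap(input, kernel, bias, stride = 2, pad = 2) {
  const n = input.length;   // 16
  const k = kernel.length;  // 5

  // Pad with -1 (background) on all sides.
  const padded = [];
  for (let i = 0; i < n + 2 * pad; i++) {
    padded.push([]);
    for (let j = 0; j < n + 2 * pad; j++) {
      const y = i - pad, x = j - pad;
      padded[i].push(y >= 0 && y < n && x >= 0 && x < n ? input[y][x] : -1);
    }
  }

  // Slide the kernel with the given stride.
  const outSize = Math.floor((n + 2 * pad - k) / stride) + 1; // 8
  const out = [];
  for (let i = 0; i < outSize; i++) {
    out.push([]);
    for (let j = 0; j < outSize; j++) {
      let sum = bias;
      for (let ki = 0; ki < k; ki++)
        for (let kj = 0; kj < k; kj++)
          sum += kernel[ki][kj] * padded[i * stride + ki][j * stride + kj];
      out[i].push(Math.tanh(sum)); // squashing nonlinearity
    }
  }
  return out;
}
```

With a 16×16 input, any 5×5 kernel produces an 8×8 map; twelve such maps give the 12×8×8 H1 output.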
Total: only 9,760 parameters, tiny compared to modern models with millions to billions of parameters.
| Metric | Paper (1989) | Karpathy (PyTorch) | Ours (Node.js) |
|---|---|---|---|
| Train loss | 2.5e-3 | 4.07e-3 | 4.85e-3 |
| Train error | 0.14% | 0.62% | 0.86% |
| Test loss | 1.8e-2 | 2.84e-2 | 3.02e-2 |
| Test error | 5.0% | 4.09% | 4.19% |
| Test misses | 102 | 82 | 84 |
Small differences are expected due to different datasets (MNIST vs. original zip codes) and different PRNG implementations.
The network is reproduced in pure Node.js with no ML frameworks: convolution, backpropagation, and SGD are all written from scratch. After training for 23 epochs (~27 seconds), the learned weights are saved to JSON. The browser then runs the forward pass directly: your drawing is resized to 16×16 pixels, normalized to [-1, +1], and fed through the four-layer CNN exactly as described in the original paper. Prediction happens in real time in the browser, with no server round-trip.
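The browser-side normalization step can be sketched as follows (an illustrative helper, not the repo's exact code, assuming grayscale values in [0, 255] read from the resized 16×16 canvas):

```javascript
// Map grayscale pixel values [0, 255] linearly onto [-1, +1],
// the input range the 1989 network expects.
function normalize(gray) {
  // gray: Uint8ClampedArray or number[] of length 256 (16x16)
  return Array.from(gray, (v) => (v / 255) * 2 - 1);
}
```

In the browser, the `gray` array would typically come from `canvas.getContext("2d").getImageData(...)` after drawing the user's sketch scaled down to 16×16.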