See what your users
see before they do
Upload any image and get an instant AI-generated heatmap showing where human eyes will look first, predicted by a CNN trained on 10,000 images with real human fixation data.
Drop an image to analyse
or click to browse
or try an example
Process
How it works
01
Upload
Drop any image · website screenshot, ad, poster, or UI design. JPEG, PNG, or WebP up to 10 MB.
02
AI Analysis
A MobileNetV2 encoder-decoder CNN · trained on 10,000 images with human fixation data · predicts where people look.
03
Attention Map
Get a heatmap showing primary, secondary, and tertiary focus zones plus rule-based UX insights.
Model
About the model
Trained from scratch on SALICON · 10,000 natural images annotated with crowd-sourced human fixation maps.
Evaluation · SALICON validation set (5,000 images)
What do these metrics mean?
AUC-Judd
0.9613
Can the model rank fixated pixels above non-fixated ones?
0.5 = random, 1.0 = perfect. 0.96 means the model almost always assigns higher saliency to pixels humans actually looked at.
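For the curious, AUC-Judd can be sketched in a few lines of numpy · an illustrative version of the metric, not the exact evaluation code used here:

```python
import numpy as np

def auc_judd(saliency, fixations):
    # Treat saliency values at fixated pixels as positives and all other
    # pixels as negatives, sweep a threshold over the positive values,
    # and integrate the resulting ROC curve.
    s = saliency.ravel()
    f = fixations.ravel().astype(bool)
    pos = np.sort(s[f])                      # saliency at fixated pixels
    n_fix, n_pix = pos.size, s.size
    tpr, fpr = [0.0], [0.0]
    for thresh in pos[::-1]:                 # high -> low thresholds
        above = s >= thresh
        tpr.append((above & f).sum() / n_fix)
        fpr.append((above & ~f).sum() / (n_pix - n_fix))
    tpr.append(1.0)
    fpr.append(1.0)
    return np.trapz(tpr, fpr)                # area under the ROC curve
```

A map that ranks every fixated pixel above every non-fixated one scores 1.0 regardless of the absolute saliency values · AUC-Judd only cares about ranking.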
CC
0.8756
How closely does the predicted heatmap match the ground truth?
Pearson correlation: -1 = perfectly anti-correlated, 0 = no relationship, 1 = a perfect match. 0.88 is a strong match.
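In numpy, CC for two maps is just the mean product of the standardised values · an illustrative sketch:

```python
import numpy as np

def cc(pred, gt):
    # Standardise both maps to zero mean / unit std, then take the mean
    # of their element-wise product: the Pearson correlation coefficient.
    p = (pred - pred.mean()) / pred.std()
    g = (gt - gt.mean()) / gt.std()
    return float((p * g).mean())
```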
NSS
2.163
How much above average is predicted saliency at fixation points?
Map normalised to mean 0, std 1. NSS is the average value at human fixation locations. 2.16 standard deviations above average is strong.
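That description translates directly to code · an illustrative sketch, assuming a binary fixation mask:

```python
import numpy as np

def nss(pred, fixations):
    # Normalise the predicted map to zero mean and unit std, then average
    # its values at the binary human fixation locations.
    p = (pred - pred.mean()) / pred.std()
    return float(p[fixations.astype(bool)].mean())
```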
SIM
0.7649
How much do the predicted and ground truth distributions overlap?
Histogram intersection: 0 = no overlap, 1 = identical. 0.76 means 76% of the probability mass is shared.
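Histogram intersection in numpy · an illustrative sketch treating both maps as probability distributions:

```python
import numpy as np

def sim(pred, gt):
    # Normalise both maps to sum to 1, then sum the element-wise
    # minimum: the probability mass the two distributions share.
    p = pred / pred.sum()
    g = gt / gt.sum()
    return float(np.minimum(p, g).sum())
```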
KL-Div
0.2383
How different is the predicted distribution from the real one?
Lower is better. 0 = perfect. 0.24 is low, meaning the model closely matches where humans looked.
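The common saliency-research form of this metric, sketched in numpy · the epsilon placement follows the usual benchmark convention and is an assumption here:

```python
import numpy as np

def kl_div(pred, gt, eps=1e-7):
    # KL(gt || pred) over the two maps normalised to distributions;
    # eps guards against log(0). Heavily penalises predicting near-zero
    # saliency where humans actually fixated.
    p = pred / pred.sum()
    g = gt / gt.sum()
    return float((g * np.log(g / (p + eps) + eps)).sum())
```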
Architecture
MobileNetV2 encoder with U-Net-style skip connections and an upsampling decoder. 6.6M parameters.
Training data
SALICON: 10,000 train + 5,000 val images from MS COCO 2014 with crowd-sourced human fixation maps.
Loss function
KL Divergence (1.0x) + Correlation Coefficient (0.5x) + BCE (0.1x). Standard saliency research formulation.
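Putting the three weighted terms together · a numpy sketch of the scalar computation, not the actual (presumably PyTorch) training loss; in particular, the assumption that CC enters as (1 − CC), so that maximising correlation minimises the loss, is ours:

```python
import numpy as np

def combined_loss(pred, gt, eps=1e-7):
    # Weighted sum from the page: 1.0 * KL + 0.5 * CC-term + 0.1 * BCE.
    # CC is a similarity, so it is assumed to enter the loss as (1 - CC).
    p = pred / pred.sum()
    g = gt / gt.sum()
    kl = float((g * np.log(g / (p + eps) + eps)).sum())
    pz = (pred - pred.mean()) / (pred.std() + eps)
    gz = (gt - gt.mean()) / (gt.std() + eps)
    cc = float((pz * gz).mean())
    bce = float(-(gt * np.log(pred + eps)
                  + (1 - gt) * np.log(1 - pred + eps)).mean())
    return 1.0 * kl + 0.5 * (1.0 - cc) + 0.1 * bce
```

A perfect prediction drives the KL and (1 − CC) terms to zero, leaving only the BCE floor.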
Training
13 epochs on GTX 1650 (4 GB), ~113 min. Early stopping. Mixed precision (AMP). Encoder frozen for first 5 epochs.
Inference
Under 500 ms on GPU, under 2 s on CPU. MobileNetV2 keeps it fast enough for real-time interactive use.
Output
Single-channel saliency map in [0,1], post-processed into a jet-coloured heatmap with attention region analysis.
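The jet-colouring step can be approximated with a piecewise-linear colormap · an illustrative sketch; the real post-processing presumably also resizes the map and blends it over the source image:

```python
import numpy as np

def jet_heatmap(sal):
    # Piecewise-linear approximation of the jet colormap:
    # blue -> cyan -> green -> yellow -> red as saliency rises 0 -> 1.
    s = np.clip(sal, 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4.0 * s - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * s - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * s - 1.0), 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

def overlay(image_rgb, sal, alpha=0.5):
    # Alpha-blend the coloured heatmap over the original uint8 RGB image.
    heat = jet_heatmap(sal).astype(np.float64)
    return (alpha * heat + (1.0 - alpha) * image_rgb).astype(np.uint8)
```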