Histoboard - Pathology Foundation Model Benchmarks

THUNDER Benchmark
pathology
H&E

Comprehensive benchmark evaluating pathology foundation models across KNN classification, linear probing, few-shot learning, segmentation, calibration, and adversarial robustness tasks.

23 models evaluated

6 tasks

Organs: multi-organ

Detailed Results

Rank	Model	Rank sum	KNN Classification F1-score	Linear Probing F1-score	Few-shot Learning F1-score	Segmentation Dice	Calibration ↓ ECE (%)	Adversarial Attack ↓ ASR (%)
1	UNI2	19	0.833	0.857	0.798	0.690	3.9%	31.7%
2	Virchow2	30	0.829	0.848	0.739	0.693	3.9%	31.1%
3	GenBio-PathFM	30	0.834	0.853	0.794	0.682	4.2%	34.9%
4	H0-mini	41	0.797	0.838	0.750	0.691	3.8%	34.3%
5	Midnight-12k	44	0.799	0.847	0.715	0.688	2.9%	37.0%
6	UNI	46	0.808	0.835	0.781	0.678	3.8%	40.3%
7	H-optimus-1	52	0.825	0.851	0.773	0.645	3.5%	57.4%
8	KEEP	53	0.815	0.832	0.771	0.680	4.0%	44.9%
9	H-optimus-0	57	0.814	0.838	0.762	0.652	4.0%	43.9%
10	Prov-GigaPath	61	0.795	0.829	0.755	0.635	3.4%	42.1%
11	Hibou-L	66	0.786	0.837	0.738	0.686	4.7%	39.5%
12	Hibou-B	66	0.789	0.812	0.763	0.678	3.2%	52.7%
13	Virchow	69	0.774	0.828	0.718	0.692	4.5%	38.3%
14	Kaiko ViT-B/16	73	0.787	0.814	0.764	0.668	5.0%	38.8%
15	OpenMidnight	75	0.793	0.847	0.437	0.691	5.4%	38.3%
16	CONCH	78	0.788	0.819	0.734	0.683	4.1%	57.3%
17	Kaiko ViT-S/16	79	0.782	0.817	0.751	0.668	4.6%	42.5%
18	CONCH 1.5	81	0.799	0.824	0.750	0.688	4.6%	75.8%
19	Phikon	83	0.757	0.809	0.736	0.680	5.8%	33.5%
20	Phikon-v2	89	0.739	0.797	0.718	0.674	3.9%	43.8%
21	MUSK	103	0.777	0.811	0.719	0.651	4.0%	71.9%
22	PLIP	135	0.702	0.740	0.641	0.585	4.5%	60.8%
23	QuiltNet	135	0.704	0.739	0.656	0.589	5.8%	55.6%
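
The page does not say how the "Rank sum" column is computed. A plausible reading, offered here only as an assumption, is that each model is ranked on every task (1 = best, with lower values better for the columns marked ↓) and the six ranks are added up. The sketch below illustrates that idea in Python. The function name `rank_sum`, the metric keys, and the tie rule (tied models share the best rank) are all hypothetical choices. Because the table shows rounded values, recomputing from it may not reproduce the published totals exactly.

```python
from typing import Dict

# Metrics where a lower value is better (marked with ↓ in the table header).
# These key names are illustrative, not taken from the benchmark.
LOWER_IS_BETTER = {"ece", "asr"}


def rank_sum(scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Sum each model's per-metric rank (1 = best); tied models share the best rank."""
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {model: 0 for model in models}
    for metric in metrics:
        for model in models:
            value = scores[model][metric]
            # Count how many models are strictly better on this metric.
            if metric in LOWER_IS_BETTER:
                better = sum(scores[other][metric] < value for other in models)
            else:
                better = sum(scores[other][metric] > value for other in models)
            totals[model] += better + 1
    return totals


# Three rows copied from the table; ranks here are relative to this subset only,
# so the totals are not comparable with the 23-model "Rank sum" column.
subset = {
    "UNI2": {"knn": 0.833, "linear": 0.857, "few_shot": 0.798,
             "dice": 0.690, "ece": 3.9, "asr": 31.7},
    "Virchow2": {"knn": 0.829, "linear": 0.848, "few_shot": 0.739,
                 "dice": 0.693, "ece": 3.9, "asr": 31.1},
    "PLIP": {"knn": 0.702, "linear": 0.740, "few_shot": 0.641,
             "dice": 0.585, "ece": 4.5, "asr": 60.8},
}

totals = rank_sum(subset)
for model, total in sorted(totals.items(), key=lambda item: item[1]):
    print(model, total)
```

A lower total means a model places consistently near the top across tasks. A single weak task, such as the 0.437 few-shot score for OpenMidnight, can pull a model down the ordering even when its other scores are competitive.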
