
THUNDER Benchmark
pathology
H&E

Comprehensive benchmark evaluating pathology foundation models across KNN classification, linear probing, few-shot learning, segmentation, calibration, and adversarial robustness tasks.

22 models evaluated
6 tasks
Organs: multi-organ

Detailed Results

Model  Rank sum  KNN Classification  Linear Probing  Few-shot Learning  Segmentation  Calibration  Adversarial Attack
                 F1-score            F1-score        F1-score           Dice          ECE (%)      ASR (%)
-      18        0.833               0.857           0.798              0.690         3.9          31.7
-      27        0.829               0.848           0.739              0.693         3.9          31.1
-      38        0.797               0.838           0.750              0.691         3.8          34.3
-      40        0.799               0.847           0.715              0.688         2.9          37.0
-      41        0.808               0.835           0.781              0.678         3.8          40.3
-      47        0.825               0.851           0.773              0.645         3.5          57.4
-      48        0.815               0.832           0.771              0.680         4.0          44.9
-      52        0.814               0.838           0.762              0.652         4.0          43.9
-      56        0.795               0.829           0.755              0.635         3.4          42.1
-      61        0.786               0.837           0.738              0.686         4.7          39.5
-      61        0.789               0.812           0.763              0.678         3.2          52.7
-      64        0.774               0.828           0.718              0.692         4.5          38.3
-      67        0.787               0.814           0.764              0.668         5.0          38.8
-      70        0.793               0.847           0.437              0.691         5.4          38.3
-      73        0.782               0.817           0.751              0.668         4.6          42.5
-      74        0.788               0.819           0.734              0.683         4.1          57.3
-      76        0.799               0.824           0.750              0.688         4.6          75.8
-      78        0.757               0.809           0.736              0.680         5.8          33.5
-      84        0.739               0.797           0.718              0.674         3.9          43.8
-      98        0.777               0.811           0.719              0.651         4.0          71.9
-      129       0.702               0.740           0.641              0.585         4.5          60.8
-      129       0.704               0.739           0.656              0.589         5.8          55.6