THUNDER Benchmark
pathology
H&E

Comprehensive benchmark evaluating pathology foundation models across KNN classification, linear probing, few-shot learning, segmentation, calibration, and adversarial robustness tasks.

23 models evaluated
6 tasks
Organs: multi-organ

Detailed Results

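| Model | Rank sum | KNN Classification (F1-score) | Linear Probing (F1-score) | Few-shot Learning (F1-score) | Segmentation (Dice) | Calibration (ECE, %) | Adversarial Attack (ASR, %) |
|---|---|---|---|---|---|---|---|
|  | 19 | 0.833 | 0.857 | 0.798 | 0.690 | 3.9% | 31.7% |
|  | 30 | 0.829 | 0.848 | 0.739 | 0.693 | 3.9% | 31.1% |
|  | 30 | 0.834 | 0.853 | 0.794 | 0.682 | 4.2% | 34.9% |
|  | 41 | 0.797 | 0.838 | 0.750 | 0.691 | 3.8% | 34.3% |
|  | 44 | 0.799 | 0.847 | 0.715 | 0.688 | 2.9% | 37.0% |
|  | 46 | 0.808 | 0.835 | 0.781 | 0.678 | 3.8% | 40.3% |
|  | 52 | 0.825 | 0.851 | 0.773 | 0.645 | 3.5% | 57.4% |
|  | 53 | 0.815 | 0.832 | 0.771 | 0.680 | 4.0% | 44.9% |
|  | 57 | 0.814 | 0.838 | 0.762 | 0.652 | 4.0% | 43.9% |
|  | 61 | 0.795 | 0.829 | 0.755 | 0.635 | 3.4% | 42.1% |
|  | 66 | 0.786 | 0.837 | 0.738 | 0.686 | 4.7% | 39.5% |
|  | 66 | 0.789 | 0.812 | 0.763 | 0.678 | 3.2% | 52.7% |
|  | 69 | 0.774 | 0.828 | 0.718 | 0.692 | 4.5% | 38.3% |
|  | 73 | 0.787 | 0.814 | 0.764 | 0.668 | 5.0% | 38.8% |
|  | 75 | 0.793 | 0.847 | 0.437 | 0.691 | 5.4% | 38.3% |
|  | 78 | 0.788 | 0.819 | 0.734 | 0.683 | 4.1% | 57.3% |
|  | 79 | 0.782 | 0.817 | 0.751 | 0.668 | 4.6% | 42.5% |
|  | 81 | 0.799 | 0.824 | 0.750 | 0.688 | 4.6% | 75.8% |
|  | 83 | 0.757 | 0.809 | 0.736 | 0.680 | 5.8% | 33.5% |
|  | 89 | 0.739 | 0.797 | 0.718 | 0.674 | 3.9% | 43.8% |
|  | 103 | 0.777 | 0.811 | 0.719 | 0.651 | 4.0% | 71.9% |
|  | 135 | 0.702 | 0.740 | 0.641 | 0.585 | 4.5% | 60.8% |
|  | 135 | 0.704 | 0.739 | 0.656 | 0.589 | 5.8% | 55.6% |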
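
For readers unfamiliar with the last three column metrics, the sketch below shows the textbook definitions of Dice, ECE, and ASR in plain Python. It is an illustration only: this page does not state the bin count used for ECE, the attack used for ASR, or whether ASR is computed over all samples or only over those classified correctly before the attack. The function names, the 10-bin default, and the "originally correct samples only" convention are all assumptions, not the benchmark's implementation.

```python
def dice(pred, target):
    """Dice coefficient for two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins, returned as a fraction in [0, 1].

    n_bins=10 is an assumed default, not a value taken from the benchmark.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def attack_success_rate(clean_correct, adv_correct):
    """Share of originally correct predictions that the attack flips to wrong.

    Restricting to originally correct samples is one common convention (assumed here).
    """
    eligible = [adv for clean, adv in zip(clean_correct, adv_correct) if clean]
    if not eligible:
        return 0.0
    return sum(1 for adv in eligible if not adv) / len(eligible)
```

Under these definitions, higher is better for Dice, while lower is better for both ECE and ASR. Multiplying the ECE and ASR fractions by 100 gives percentages comparable in form to the table's last two columns.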