Benchmark Sources

Data is aggregated from 10 benchmark sources covering multiple evaluation categories.

...

EVA Benchmark
pathology
H&E
radiology

Comprehensive evaluation framework for pathology foundation models covering patch-level classification, slide-level analysis, and segmentation tasks.

...

HEST-Benchmark
spatial-transcriptomics
H&E

HEST-Benchmark is a spatial transcriptomics benchmark from the Mahmood Lab evaluating foundation models on gene expression prediction from H&E images. The benchmark assesses how well models can predict spatially-resolved gene expression patterns directly from histology across 9 cancer types using Pearson correlation as the primary metric.
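To make the metric concrete, here is a minimal sketch of a per-gene Pearson correlation between predicted and measured expression. It is only an illustration: the function name `per_gene_pearson` and the `(n_spots, n_genes)` array layout are assumptions, and the benchmark's own aggregation across genes, folds, and cancer types is not described here.

```python
import numpy as np


def per_gene_pearson(pred, true):
    """Pearson r between predicted and measured expression, one value per gene.

    Assumed layout (hypothetical): both arrays have shape (n_spots, n_genes).
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)

    # Center each gene (column) around its mean across spots.
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)

    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))

    # A gene with zero variance has an undefined correlation; report NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Toy data: gene 0 is predicted up to a scale factor, gene 1 is inverted.
measured = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
predicted = np.array([[2.0, 6.0], [4.0, 4.0], [6.0, 2.0]])
r_per_gene = per_gene_pearson(predicted, measured)
mean_r = float(np.nanmean(r_per_gene))
```

Averaging with `np.nanmean` skips genes whose correlation is undefined, which is one reasonable choice among several.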

...

PathBench
pathology
H&E

Large-scale benchmark for pathology foundation models across 229 tasks, including classification and prediction of overall survival (OS), disease-free survival (DFS), and disease-specific survival (DSS).

...

Patho-Bench
pathology
H&E

Comprehensive benchmark from the Mahmood Lab with 95 tasks across 33 datasets covering mutation prediction, tumor microenvironment (TME) characterization, survival, grading, and treatment response.

...

PathoROB
pathology
robustness
H&E

Robustness benchmark evaluating pathology foundation models across domain shift scenarios including TCGA 2x2 splits, Camelyon, and Tolkach ESCA datasets.

...

Plismbench
pathology
robustness
H&E

Robustness benchmark from Owkin evaluating embedding consistency across scanners and staining variations using cosine similarity and top-10 retrieval accuracy.
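The two metrics named above can be sketched as follows. This is not the benchmark's own code: it assumes, hypothetically, that row i of `emb_a` and row i of `emb_b` are embeddings of the same tissue tile under two scanners or stainings, and the function names are illustrative.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def paired_cosine_similarity(emb_a, emb_b):
    """Cosine similarity between row i of emb_a and row i of emb_b."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    return (a * b).sum(axis=1)


def top_k_retrieval_accuracy(emb_a, emb_b, k=10):
    """Fraction of queries in emb_a whose matching row in emb_b is among
    the k most similar rows of emb_b."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    sims = a @ b.T                          # shape (n_queries, n_candidates)
    k = min(k, sims.shape[1])
    top_k = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(sims.shape[0])[:, None]
    return float((top_k == targets).any(axis=1).mean())


# Synthetic example: the second "scanner" adds a small perturbation.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(50, 16))
emb_b = emb_a + 0.01 * rng.normal(size=(50, 16))
mean_cosine = float(paired_cosine_similarity(emb_a, emb_b).mean())
top10 = top_k_retrieval_accuracy(emb_a, emb_b, k=10)
```

A model that is robust to scanner and staining changes should score high on both: the paired cosine similarity measures how little an embedding moves, while retrieval accuracy checks that it still stays closer to its counterpart than to other tiles.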

...

Sinai SSL Benchmark
pathology
H&E

Comprehensive benchmark from Mount Sinai evaluating pathology foundation models on cancer detection and biomarker prediction tasks across multiple organs and institutions.

...

STAMP Benchmark
pathology
H&E

Benchmark published in Nature Biomedical Engineering that evaluates 15 foundation models as feature extractors for weakly supervised computational pathology across morphology, biomarker, and prognosis tasks.

...

Stanford PathBench
pathology
H&E

Comprehensive benchmark evaluating 31 foundation models across 41 tasks from TCGA, CPTAC, and external datasets.

...

THUNDER Benchmark
pathology
H&E

Comprehensive benchmark evaluating pathology foundation models across KNN classification, linear probing, few-shot learning, segmentation, calibration, and adversarial robustness tasks.
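As an illustration of the simplest of these protocols, the sketch below runs KNN classification on frozen embeddings. It assumes precomputed feature vectors, Euclidean distance, and a majority vote; the function name `knn_predict` and all defaults are hypothetical and not taken from the benchmark.

```python
import numpy as np


def knn_predict(train_emb, train_labels, test_emb, k=5):
    """Label each test embedding by majority vote among its k nearest
    training embeddings (Euclidean distance)."""
    train_emb = np.asarray(train_emb, dtype=float)
    test_emb = np.asarray(test_emb, dtype=float)
    train_labels = np.asarray(train_labels)

    # Squared distances between every test and train row: (n_test, n_train).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)

    k = min(k, train_emb.shape[0])
    nn_idx = np.argsort(dist2, axis=1)[:, :k]
    nn_labels = train_labels[nn_idx]

    preds = []
    for row in nn_labels:
        values, counts = np.unique(row, return_counts=True)
        # On a tie, argmax picks the smallest label value.
        preds.append(values[np.argmax(counts)])
    return np.array(preds)


# Toy embeddings forming two well-separated clusters.
train = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.5, 0.5], [10.5, 10.5]])
predictions = knn_predict(train, labels, queries, k=3)
```

Because no parameters are trained, KNN accuracy reflects the geometry of the embedding space itself, which is why it is commonly paired with linear probing when comparing feature extractors.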