Benchmark Sources

Data is aggregated from 12 benchmark sources covering multiple evaluation categories.

BC Survival Benchmark
pathology
survival
H&E

Multi-cohort benchmark by Gustafsson et al. (2026) evaluating 13 pathology foundation models on breast cancer survival prediction (RFS and PFS) across three independent Swedish clinical cohorts (N=5,434 patients). Models are assessed on all patients and on the ER+ & HER2− subgroup, using concordance index (C-index) as the primary metric.
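
As a rough illustration of the primary metric, the sketch below computes Harrell's concordance index in plain Python. The benchmark's exact estimator, tie handling, and censoring treatment are not stated here, so the function name `concordance_index` and the toy patient data are assumptions made for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (not benchmark data).
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risk_scores))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of comparable pairs.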

...

EVA Benchmark
pathology
H&E
radiology

Comprehensive evaluation framework for pathology foundation models covering patch-level classification, slide-level analysis, and segmentation tasks.

...

HEST-Benchmark
spatial-transcriptomics
H&E

HEST-Benchmark is a spatial transcriptomics benchmark from the Mahmood Lab evaluating foundation models on gene expression prediction from H&E images. The benchmark assesses how well models can predict spatially-resolved gene expression patterns directly from histology across 9 cancer types using Pearson correlation as the primary metric.
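
The sketch below shows one way a Pearson-based score could be computed for gene expression prediction: correlate predicted and measured expression across spots for each gene, then average over genes. How HEST-Benchmark actually aggregates across genes, samples, or folds is not described here, so the per-gene averaging, the function names, and the gene labels are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_gene_pearson(predicted, measured):
    """Average per-gene Pearson correlation.

    Both arguments map a gene name to a list of per-spot expression values.
    """
    scores = [pearson(predicted[gene], measured[gene]) for gene in measured]
    return sum(scores) / len(scores)


# Hypothetical expression values for two placeholder genes over four spots.
measured = {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 3, 2, 1]}
predicted = {"GENE_A": [2, 4, 6, 8], "GENE_B": [1, 2, 3, 4]}
print(mean_gene_pearson(predicted, measured))
```

In this toy case the first gene is predicted with the right trend and the second with the opposite trend, so the two correlations cancel in the average.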

...

HKUST PathBench
pathology
H&E

Large-scale benchmark for pathology foundation models across 229 tasks, including classification and prediction of overall survival (OS), disease-free survival (DFS), and disease-specific survival (DSS).

...

Patho-Bench
pathology
H&E

Comprehensive benchmark from the Mahmood Lab with 95 tasks across 33 datasets covering mutation prediction, tumor microenvironment (TME) characterization, survival, grading, and treatment response.

...

PathoROB
pathology
robustness
H&E

Robustness benchmark evaluating pathology foundation models across domain shift scenarios including TCGA 2x2 splits, Camelyon, and Tolkach ESCA datasets.

...

PFM-DenseBench
pathology
segmentation
H&E

Dense prediction benchmark evaluating pathology foundation models on 18 segmentation datasets covering nuclear, tissue, and gland segmentation tasks. Rankings are average ranks across datasets (lower is better).
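
To make the ranking rule concrete, here is a minimal sketch of averaging per-dataset ranks. It assumes a higher per-dataset score is better, that every model is scored on every dataset, and that tied scores share the mean of their positions; none of these details are confirmed here, and the model and dataset names are placeholders.

```python
def rank_models(per_model):
    """Rank models on one dataset (1 = best); tied scores share a mean rank."""
    ordered = sorted(per_model.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def average_ranks(scores):
    """scores maps dataset -> {model: score}; returns model -> mean rank."""
    per_dataset = [rank_models(per_model) for per_model in scores.values()]
    models = per_dataset[0].keys()
    return {m: sum(r[m] for r in per_dataset) / len(per_dataset) for m in models}


# Hypothetical scores (not benchmark results).
scores = {
    "dataset_1": {"model_a": 0.80, "model_b": 0.75, "model_c": 0.70},
    "dataset_2": {"model_a": 0.60, "model_b": 0.65, "model_c": 0.65},
}
print(average_ranks(scores))
# {'model_a': 2.0, 'model_b': 1.75, 'model_c': 2.25}
```

Here `model_b` obtains the lowest average rank even though it never wins a dataset outright, which is a typical property of rank aggregation.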

...

Plismbench
pathology
robustness
H&E

Robustness benchmark from Owkin evaluating embedding consistency across scanners and staining variations using cosine similarity and top-10 retrieval accuracy.
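
The two metrics named above can be sketched as follows, assuming `queries[i]` and `gallery[i]` are embeddings of the same tissue tile under two different conditions (for example, two scanners). The pairing scheme, function names, and toy vectors are assumptions for illustration and not the benchmark's actual protocol.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_matched_cosine(queries, gallery):
    """Average similarity between embeddings of the same tile."""
    sims = [cosine_similarity(q, g) for q, g in zip(queries, gallery)]
    return sum(sims) / len(sims)


def top_k_accuracy(queries, gallery, k=10):
    """Share of queries whose matching tile is among the k nearest in the gallery."""
    hits = 0
    for i, query in enumerate(queries):
        sims = [cosine_similarity(query, g) for g in gallery]
        nearest = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)[:k]
        if i in nearest:
            hits += 1
    return hits / len(queries)


# Hypothetical 2-D embeddings; real embeddings have far more dimensions.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]
print(mean_matched_cosine(queries, gallery))
print(top_k_accuracy(queries, gallery, k=1))
```

With only three gallery items, `k=10` would trivially score 1.0, so the usage line uses `k=1`; on a realistically sized gallery the default of 10 matches the top-10 setting described above.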

...

Sinai SSL Benchmark
pathology
H&E

Comprehensive benchmark from Mount Sinai evaluating pathology foundation models on cancer detection and biomarker prediction tasks across multiple organs and institutions.

...

STAMP Benchmark
pathology
H&E

Benchmark published in Nature Biomedical Engineering that evaluates 15 foundation models as feature extractors for weakly supervised computational pathology across morphology, biomarker, and prognosis tasks.

...

Stanford PathBench
pathology
H&E

Comprehensive benchmark evaluating 31 foundation models across 41 tasks from TCGA, CPTAC, and external datasets.

...

THUNDER Benchmark
pathology
H&E

Comprehensive benchmark evaluating pathology foundation models across KNN classification, linear probing, few-shot learning, segmentation, calibration, and adversarial robustness tasks.
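
As an illustration of the KNN classification setting, the sketch below labels a query embedding by majority vote among its nearest training embeddings. The choice of Euclidean distance, `k=3`, the function name `knn_predict`, and the toy data are assumptions; the benchmark's actual settings are not given here.

```python
import math
from collections import Counter


def knn_predict(train_embeddings, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training embeddings."""
    # Pair each training embedding's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(embedding, query), label)
        for embedding, label in zip(train_embeddings, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical frozen-encoder embeddings with two classes.
train_embeddings = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = ["benign", "benign", "benign", "tumor", "tumor", "tumor"]
print(knn_predict(train_embeddings, train_labels, [0.2, 0.3]))  # benign
print(knn_predict(train_embeddings, train_labels, [5.5, 5.5]))  # tumor
```

Because no parameters are trained, this kind of evaluation reflects the quality of the embedding space itself rather than that of a downstream classifier.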