
Plismbench
Tags: pathology, robustness, H&E

Robustness benchmark from Owkin that evaluates how consistent model embeddings remain across scanners and staining variations, measured with cosine similarity and top-10 retrieval accuracy.
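As a rough illustration of the two metric families named above, the sketch below computes a mean paired cosine similarity and a top-k retrieval accuracy on toy embeddings. It assumes that row i of each embedding list describes the same tissue tile under two acquisition conditions (for example, two scanners); the exact protocol used by the benchmark is not described on this page, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_paired_cosine(emb_a, emb_b):
    """Average similarity between matching tiles (assumption: row i matches row i)."""
    sims = [cosine_similarity(u, v) for u, v in zip(emb_a, emb_b)]
    return sum(sims) / len(sims)


def top_k_accuracy(emb_a, emb_b, k=10):
    """Fraction of tiles in emb_a whose matching tile in emb_b is among
    the k most similar candidates (assumed retrieval setup)."""
    hits = 0
    for i, query in enumerate(emb_a):
        sims = [cosine_similarity(query, cand) for cand in emb_b]
        ranked = sorted(range(len(emb_b)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(emb_a)


# Toy data: three tiles, two conditions, 2-dimensional embeddings.
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8]]

consistency = mean_paired_cosine(emb_a, emb_b)
top1 = top_k_accuracy(emb_a, emb_b, k=1)
```

A model whose embeddings barely change between conditions scores close to 1 on both measures. Retrieval accuracy is the stricter of the two, since the matching tile has to outrank every other candidate rather than merely stay similar.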

15 models evaluated
4 tasks
Organs: multi-organ

Detailed Results

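Columns (metric: task):
Cosine Similarity: Embedding Consistency
Top-10 Cross-Scanner: Scanner Robustness
Top-10 Cross-Staining: Stain Robustness
Top-10 Cross-Scanner/Staining: Combined Robustness

Model      Average rank  Average metric  Cosine Similarity  Top-10 Cross-Scanner  Top-10 Cross-Staining  Top-10 Cross-Scanner/Staining
(missing)  1.75          0.541           0.800              0.864                 0.318                  0.183
(missing)  3.00          0.498           0.846              0.752                 0.241                  0.155
(missing)  3.25          0.480           0.685              0.744                 0.327                  0.166
(missing)  3.50          0.464           0.777              0.609                 0.306                  0.163
(missing)  6.00          0.373           0.748              0.435                 0.200                  0.108
(missing)  7.75          0.333           0.570              0.592                 0.118                  0.054
(missing)  7.75          0.332           0.591              0.501                 0.190                  0.046
(missing)  8.00          0.325           0.764              0.346                 0.147                  0.045
(missing)  8.50          0.325           0.547              0.532                 0.169                  0.053
(missing)  9.50          0.265           0.594              0.356                 0.092                  0.017
(missing)  10.25         0.244           0.878              0.054                 0.040                  0.004
(missing)  12.00         0.193           0.622              0.125                 0.021                  0.004
(missing)  11.75         0.183           0.569              0.115                 0.041                  0.006
(missing)  13.50         0.164           0.557              0.064                 0.030                  0.003
(missing)  13.50         0.147           0.490              0.061                 0.030                  0.008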
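
The two summary columns can be recomputed from the four task columns. The sketch below assumes that the average metric is the plain mean of the four task scores and that the average rank is the mean of the per-task ranks (1 = best). Because the displayed scores are rounded to three decimals, two pairs of rows tie at that precision; breaking those ties by table order is an assumption made here, not something the page states. The names `scores`, `average_metric`, and `average_ranks` are illustrative.

```python
# Task scores copied from the table above, in table order:
# (cosine similarity, top-10 cross-scanner,
#  top-10 cross-staining, top-10 cross-scanner/staining)
scores = [
    (0.800, 0.864, 0.318, 0.183),
    (0.846, 0.752, 0.241, 0.155),
    (0.685, 0.744, 0.327, 0.166),
    (0.777, 0.609, 0.306, 0.163),
    (0.748, 0.435, 0.200, 0.108),
    (0.570, 0.592, 0.118, 0.054),
    (0.591, 0.501, 0.190, 0.046),
    (0.764, 0.346, 0.147, 0.045),
    (0.547, 0.532, 0.169, 0.053),
    (0.594, 0.356, 0.092, 0.017),
    (0.878, 0.054, 0.040, 0.004),
    (0.622, 0.125, 0.021, 0.004),
    (0.569, 0.115, 0.041, 0.006),
    (0.557, 0.064, 0.030, 0.003),
    (0.490, 0.061, 0.030, 0.008),
]


def average_metric(row):
    """Plain mean of the four task scores (assumed definition)."""
    return sum(row) / len(row)


def average_ranks(rows):
    """Mean per-task rank for each row, 1 = best score on a task.

    Python's sort is stable even with reverse=True, so rows that tie
    at the displayed precision keep their table order (assumption).
    """
    n_tasks = len(rows[0])
    rank_sums = [0] * len(rows)
    for t in range(n_tasks):
        order = sorted(range(len(rows)), key=lambda i: rows[i][t], reverse=True)
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    return [total / n_tasks for total in rank_sums]


avg_metrics = [average_metric(row) for row in scores]
avg_ranks = average_ranks(scores)
```

Under these assumptions the recomputed average ranks match the Average rank column, and the recomputed means match the Average metric column up to rounding of the displayed values. The table is ordered by average metric rather than average rank, which is why the row with rank 12.00 appears before the row with rank 11.75.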