
Plismbench
pathology · robustness · H&E

A robustness benchmark from Owkin that evaluates how consistent model embeddings remain across scanners and staining variations, measured with cosine similarity and top-10 retrieval accuracy.
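The two measures named in the description can be sketched as follows. This is a minimal, hypothetical illustration, not Owkin's code. It assumes that each acquisition condition (a scanner or a staining variant) yields one embedding per tile, that row i in both matrices is the same physical tile, and that retrieval ranks candidate tiles by cosine similarity. The helper names `cosine`, `mean_cosine_similarity` and `top_k_accuracy` are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cosine_similarity(emb_a, emb_b):
    """Average similarity between matched tiles (row i of A vs row i of B)."""
    return sum(cosine(u, v) for u, v in zip(emb_a, emb_b)) / len(emb_a)


def top_k_accuracy(emb_a, emb_b, k=10):
    """Fraction of tiles in A whose matching tile in B is among the k most similar rows of B."""
    hits = 0
    for i, query in enumerate(emb_a):
        # Rank every candidate tile in B by similarity to the query tile, best first.
        ranked = sorted(range(len(emb_b)), key=lambda j: cosine(query, emb_b[j]), reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(emb_a)


# Toy data (hypothetical): 3 tiles with 2-D embeddings; condition B is a perturbed copy of A.
scanner_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scanner_b = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8]]
print(round(mean_cosine_similarity(scanner_a, scanner_b), 3))
print(top_k_accuracy(scanner_a, scanner_b, k=1))
```

With real slides the matrices hold many tiles and a vectorized library would replace the Python loops; the quadratic loop is kept here only for readability. Note that with fewer than k tiles, top-k accuracy is trivially 1.0, which is why the toy call uses k=1.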

16 models evaluated · 4 tasks · Organs: multi-organ
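Three of the four tasks are top-10 retrieval tasks whose names differ in which acquisition factor changes between the compared slides. The sketch below is one hypothetical way to read them: it assumes every slide is labeled with a (scanner, staining) pair and sorts each unordered slide pair into cross-scanner (same staining, different scanner), cross-staining (same scanner, different staining) or cross-scanner/staining (both differ). The function `pair_task` and the condition labels are invented for illustration and are not the benchmark's actual scanner or staining names.

```python
from collections import Counter
from itertools import combinations


def pair_task(slide_a, slide_b):
    """Classify a pair of (scanner, staining) labels into a robustness task."""
    same_scanner = slide_a[0] == slide_b[0]
    same_staining = slide_a[1] == slide_b[1]
    if same_scanner and same_staining:
        return None  # identical conditions: not a cross-condition pair
    if same_staining:
        return "cross-scanner"  # only the scanner changes
    if same_scanner:
        return "cross-staining"  # only the staining changes
    return "cross-scanner/staining"  # both factors change


# Hypothetical labels: two scanners x two staining protocols.
slides = [("scanner_1", "stain_A"), ("scanner_1", "stain_B"),
          ("scanner_2", "stain_A"), ("scanner_2", "stain_B")]

counts = Counter(pair_task(a, b) for a, b in combinations(slides, 2))
print(counts)
```

Under this reading, a model that only ignores scanner differences would score well on cross-scanner pairs but could still drop on the other two groups, which is the pattern the per-task columns are meant to expose.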

Detailed Results

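| Model | Average rank | Average metric | Cosine Similarity (Embedding Consistency) | Top-10 Cross-Scanner (Scanner Robustness) | Top-10 Cross-Staining (Stain Robustness) | Top-10 Cross-Scanner/Staining (Combined Robustness) |
|---|---|---|---|---|---|---|
|  | 2.25 | 0.541 | 0.800 | 0.864 | 0.318 | 0.183 |
|  | 2.75 | 0.522 | 0.768 | 0.709 | 0.408 | 0.203 |
|  | 3.50 | 0.498 | 0.846 | 0.752 | 0.241 | 0.155 |
|  | 4.00 | 0.480 | 0.685 | 0.744 | 0.327 | 0.166 |
|  | 4.25 | 0.464 | 0.777 | 0.609 | 0.306 | 0.163 |
|  | 7.00 | 0.373 | 0.748 | 0.435 | 0.200 | 0.108 |
|  | 8.75 | 0.333 | 0.570 | 0.592 | 0.118 | 0.054 |
|  | 8.75 | 0.332 | 0.591 | 0.501 | 0.190 | 0.046 |
|  | 9.00 | 0.325 | 0.764 | 0.346 | 0.147 | 0.045 |
| UNI | 9.50 | 0.325 | 0.547 | 0.532 | 0.169 | 0.053 |
|  | 10.50 | 0.265 | 0.594 | 0.356 | 0.092 | 0.017 |
|  | 11.00 | 0.244 | 0.878 | 0.054 | 0.040 | 0.004 |
|  | 13.00 | 0.193 | 0.622 | 0.125 | 0.021 | 0.004 |
|  | 12.75 | 0.183 | 0.569 | 0.115 | 0.041 | 0.006 |
|  | 14.50 | 0.164 | 0.557 | 0.064 | 0.030 | 0.003 |
|  | 14.50 | 0.147 | 0.490 | 0.061 | 0.030 | 0.008 |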
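The two summary columns are consistent with a simple derivation from the four task columns: for every row, Average metric matches the mean of the four task scores to within 0.001, and Average rank matches the mean of the model's per-task ranks, where rank 1 is the highest score in a column. The sketch below recomputes both from the rounded values in the table. Because the unrounded scores are not shown, breaking ties by table order is an assumption; it happens to reproduce every published Average rank. The names `SCORES`, `average_metric`, `task_ranks` and `average_ranks` are hypothetical.

```python
# Task scores copied from the results table, in table order:
# (cosine similarity, top-10 cross-scanner, top-10 cross-staining, top-10 cross-scanner/staining)
SCORES = [
    (0.800, 0.864, 0.318, 0.183),
    (0.768, 0.709, 0.408, 0.203),
    (0.846, 0.752, 0.241, 0.155),
    (0.685, 0.744, 0.327, 0.166),
    (0.777, 0.609, 0.306, 0.163),
    (0.748, 0.435, 0.200, 0.108),
    (0.570, 0.592, 0.118, 0.054),
    (0.591, 0.501, 0.190, 0.046),
    (0.764, 0.346, 0.147, 0.045),
    (0.547, 0.532, 0.169, 0.053),
    (0.594, 0.356, 0.092, 0.017),
    (0.878, 0.054, 0.040, 0.004),
    (0.622, 0.125, 0.021, 0.004),
    (0.569, 0.115, 0.041, 0.006),
    (0.557, 0.064, 0.030, 0.003),
    (0.490, 0.061, 0.030, 0.008),
]


def average_metric(row):
    """Mean of the four task scores for one model."""
    return sum(row) / len(row)


def task_ranks(scores, task):
    """Rank models on one task: 1 = highest score; ties keep table order."""
    # sorted() is stable, so equal scores stay in their original (table) order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i][task])
    ranks = [0] * len(scores)
    for position, model in enumerate(order, start=1):
        ranks[model] = position
    return ranks


def average_ranks(scores):
    """Mean of each model's per-task ranks across all tasks."""
    n_tasks = len(scores[0])
    per_task = [task_ranks(scores, t) for t in range(n_tasks)]
    return [sum(per_task[t][m] for t in range(n_tasks)) / n_tasks
            for m in range(len(scores))]


for row, avg_rank in zip(SCORES, average_ranks(SCORES)):
    print(f"{avg_rank:5.2f}  {average_metric(row):.3f}")
```

Averaging ranks rewards models that are consistently good across all four tasks, while averaging raw scores lets a very high cosine similarity offset weak retrieval; this is why the 12th row (highest cosine similarity, near-zero retrieval) sits at 11.00 on rank but well down the list on Average metric.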