PRBench: Probabilistic Robustness Benchmark

A comprehensive benchmark to evaluate probabilistic robustness (PR) and adversarial robustness (AR) across training methods.

Abstract

Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the likelihood of encountering AEs under stochastic perturbations. While PR is widely regarded as a practical complement to AR, training methods specifically designed to improve PR remain underdeveloped compared to adversarial training (AT) for AR, and the few existing PR-targeted training methods exhibit several key limitations.

Thus, we introduce PRBench, the first benchmark dedicated to evaluating training methods for improving PR. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalisation error (GE). We also provide a theoretical analysis of the GE of PR performance across different training methods. Specifically, PRBench includes 222 trained models covering 7 widely adopted datasets and 10 model architectures. It uses 4 common attack methods to measure AR performance, 2 types of PR metrics, GE metrics for both AR and PR, clean accuracy, and training time. Main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in improving both AR and PR performance, while PR-targeted training methods consistently yield lower GE and higher clean accuracy.
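PR, as defined above, is the probability that a prediction survives a random perturbation drawn from a given distribution within a budget γ. A minimal Monte Carlo sketch of this estimate follows; the function name, the distribution scalings, and the clipping to [0, 1] inputs are illustrative assumptions, not PRBench's actual API:

```python
import numpy as np

def estimate_pr(predict, x, y, gamma=0.03, n_samples=1000, dist="uniform", rng=None):
    """Monte Carlo estimate of probabilistic robustness for one input.

    `predict` maps a batch of inputs to predicted labels. `dist` picks the
    perturbation distribution; Gaussian/Laplace scales and the clipping to
    the budget `gamma` are illustrative choices, not PRBench's exact setup.
    """
    rng = np.random.default_rng(rng)
    shape = (n_samples,) + x.shape
    if dist == "uniform":
        noise = rng.uniform(-gamma, gamma, size=shape)
    elif dist == "gaussian":
        noise = np.clip(rng.normal(0.0, gamma / 2, size=shape), -gamma, gamma)
    elif dist == "laplace":
        noise = np.clip(rng.laplace(0.0, gamma / 2, size=shape), -gamma, gamma)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    # Fraction of perturbed copies that keep the original label.
    preds = predict(np.clip(x + noise, 0.0, 1.0))
    return float(np.mean(preds == y))
```

With a trained classifier plugged in as `predict`, averaging this quantity over a test set gives the PR(γ) figures reported in the leaderboards below.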

Abstract Diagram

Loss functions and AE generation strategies for AT and PR-targeted training methods

Evaluation Metrics

Available Leaderboards

Comparison of standard training (ERM), corruption training with Uniform, Gaussian, and Laplace perturbations (RT), and PGD adversarial training (standard AT), in terms of clean accuracy (Acc.), PR(γ), and GEPR(γ=0.03) under Uniform, Gaussian, and Laplace perturbation distributions.

Table columns: Dataset | Model | Method | Acc. (%) | PR(γ) for γ ∈ {0.03, 0.08, 0.10, 0.12} under each of the Uniform, Gaussian, and Laplace distributions (%) | GEPR under each distribution (%)

Performance of AT and PR-targeted training methods across different datasets and models, evaluated by clean accuracy (Acc.), AR (PGD/C&W/AutoAttack), PR (PR(γ) and ProbAcc(ρ, γ=0.03)), GEAR, GEPR(γ), and training time (s/epoch).

Table columns: Dataset | Model | Method | Acc. (%) | PGD10 / PGD20 / C&W / AA (%) | PR(γ) for γ ∈ {0.03, 0.08, 0.10, 0.12} (%) | ProbAcc(ρ) for ρ ∈ {0.10, 0.05, 0.01} (%) | GEAR (PGD20, %) | GEPR(γ) for γ ∈ {0.03, 0.08, 0.10, 0.12} (%) | Time (s/epoch)
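ProbAcc(ρ, γ) complements the pointwise PR(γ) values above by counting inputs that are "robust enough": the fraction of test points whose probability of encountering an AE under the γ-bounded perturbation is at most ρ. A short sketch of this aggregation, assuming per-input PR estimates are already computed (the exact definition used by PRBench may differ):

```python
import numpy as np

def prob_acc(pr_values, rho=0.05):
    """Fraction of inputs whose estimated AE probability (1 - PR) is <= rho,
    i.e. whose PR is at least 1 - rho. `pr_values` holds one Monte Carlo
    PR estimate per test input. Illustrative, not PRBench's exact code.
    """
    return float(np.mean(np.asarray(pr_values) >= 1.0 - rho))
```

Tightening ρ (0.10 → 0.05 → 0.01) raises the bar each input must clear, which is why the three ProbAcc columns are non-increasing from left to right.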


Composite robustness scores of different training methods, aggregated over all test datasets and model architectures, with varying weight assignments for 7 metrics: clean accuracy (Acc.), AR (PGD20), PR(γ), ProbAcc(ρ=0.05), GEAR(PGD20), GEPR(γ=0.08), and training time (s/ep.).

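One plausible way to aggregate 7 heterogeneous metrics into a single score is to min-max normalise each metric across methods, invert cost-type metrics (generalisation error, training time), and take a weighted average. The sketch below is an assumption about the aggregation scheme, not PRBench's published formula, and all names in it are illustrative:

```python
import numpy as np

# Metrics where lower is better (illustrative names).
COST_METRICS = {"GE_AR", "GE_PR", "time"}

def composite_scores(metrics, weights):
    """Weighted composite score per training method.

    metrics: {metric_name: {method: value}}; weights: {metric_name: weight}.
    Each metric is min-max normalised across methods, cost-type metrics are
    inverted so that higher is always better, and the normalised values are
    combined with the given weights.
    """
    methods = sorted(next(iter(metrics.values())))
    scores = {m: 0.0 for m in methods}
    for name, per_method in metrics.items():
        vals = np.array([per_method[m] for m in methods], dtype=float)
        lo, hi = vals.min(), vals.max()
        # Tied metrics contribute a neutral 0.5 to every method.
        norm = (vals - lo) / (hi - lo) if hi > lo else np.full_like(vals, 0.5)
        if name in COST_METRICS:
            norm = 1.0 - norm  # lower cost -> higher score
        for m, v in zip(methods, norm):
            scores[m] += weights.get(name, 0.0) * float(v)
    return scores
```

Varying the weight assignments, as the leaderboard does, lets users trade off clean accuracy, AR, PR, GE, and training cost according to their deployment priorities.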
Contribution

Contributions to PRBench are welcome!
- Submit an Issue or Pull Request
- Share new leaderboard images
- Update files or examples