A comprehensive benchmark to evaluate probabilistic robustness (PR) and adversarial robustness (AR) across training methods.
Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the likelihood of encountering AEs under stochastic perturbations. While PR is widely regarded as a practical complement to AR, training methods specifically designed to improve PR remain underdeveloped compared to adversarial training (AT) for AR, and the few existing PR-targeted training methods share several key limitations.
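As a concrete illustration of the PR notion above, PR under a given perturbation distribution can be estimated by Monte Carlo sampling: draw random perturbations within radius γ and measure how often the model's prediction survives. The sketch below is illustrative only, not PRBench's implementation; the `predict` callable, the `[0, 1]` input clipping, and the use of γ as the Gaussian/Laplace scale are assumptions.

```python
import numpy as np

def estimate_pr(predict, x, y, gamma=0.03, n_samples=1000, dist="uniform", rng=None):
    """Monte Carlo estimate of probabilistic robustness: the fraction of
    random perturbations of magnitude gamma under which the model still
    predicts the correct label y for input x."""
    rng = np.random.default_rng(rng)
    shape = (n_samples,) + x.shape
    if dist == "uniform":
        noise = rng.uniform(-gamma, gamma, size=shape)
    elif dist == "gaussian":
        noise = rng.normal(0.0, gamma, size=shape)
    elif dist == "laplace":
        noise = rng.laplace(0.0, gamma, size=shape)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    # Keep perturbed inputs in the valid pixel range (an assumption here).
    preds = predict(np.clip(x + noise, 0.0, 1.0))
    return float(np.mean(preds == y))
```

A higher estimate means AEs are rarer under that distribution; sweeping γ recovers curves like PR(0.03) through PR(0.12) in the tables below.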
Thus, we introduce PRBench, the first benchmark dedicated to evaluating training methods for improving PR. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalisation error (GE). We also provide a theoretical analysis of the GE of PR performance across different training methods. Specifically, PRBench includes 222 trained models covering 7 widely adopted datasets and 10 model architectures. It measures AR performance under 4 common attacks, PR performance with 2 types of PR metrics, GE for both AR and PR, clean accuracy, and training time. Main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in improving both AR and PR performance, while PR-targeted training methods consistently yield lower GE and higher clean accuracy.
Comparison of standard training (ERM), corruption training with Uniform, Gaussian, and Laplace perturbations (RT method), and PGD training (standard AT), in terms of clean accuracy (Acc.), PR(γ), and GE_PR(γ=0.03) under various perturbation distributions (Uniform, Gaussian, Laplace).
| Dataset | Model | Method | Acc. % | PR_Uni.(0.03) % | PR_Uni.(0.08) % | PR_Uni.(0.10) % | PR_Uni.(0.12) % | PR_Gau.(0.03) % | PR_Gau.(0.08) % | PR_Gau.(0.10) % | PR_Gau.(0.12) % | PR_Lap.(0.03) % | PR_Lap.(0.08) % | PR_Lap.(0.10) % | PR_Lap.(0.12) % | GE_PR (Uni.) % | GE_PR (Gau.) % | GE_PR (Lap.) % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
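The corruption-training (RT) baseline compared above augments each input with random noise from the chosen distribution before the usual gradient step. Below is a minimal sketch of that idea on a logistic-regression stand-in; `rt_train_step`, the Gaussian default, and the learning-rate choice are all illustrative assumptions, not PRBench's actual implementation.

```python
import numpy as np

def rt_train_step(w, b, X, y, gamma=0.03, lr=0.1, dist="gaussian", rng=None):
    """One corruption-training (RT-style) step: perturb each input with
    random noise of scale gamma, then apply an ordinary logistic-loss
    gradient update, so the model learns to tolerate that noise."""
    rng = np.random.default_rng(rng)
    if dist == "uniform":
        X = X + rng.uniform(-gamma, gamma, size=X.shape)
    elif dist == "gaussian":
        X = X + rng.normal(0.0, gamma, size=X.shape)
    else:  # laplace
        X = X + rng.laplace(0.0, gamma, size=X.shape)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)          # logistic-loss gradient w.r.t. w
    grad_b = float(np.mean(p - y))           # ... and w.r.t. b
    return w - lr * grad_w, b - lr * grad_b
```

PGD training (standard AT) differs only in how the perturbation is chosen: a worst-case inner maximisation instead of random sampling.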
Performance of AT and PR-targeted training methods across different datasets and models, evaluated by clean accuracy (Acc.), AR (PGD/C&W/Auto-Attack), PR (PR(γ) and ProbAcc(ρ, γ=0.03)), GE_AR, GE_PR(γ), and training time (s/epoch).
| Dataset | Model | Method | Acc. % | PGD10 % | PGD20 % | CW % | AA % | PR(0.03) % | PR(0.08) % | PR(0.10) % | PR(0.12) % | ProbAcc(0.10) % | ProbAcc(0.05) % | ProbAcc(0.01) % | GE_AR (PGD20) % | GE_PR (0.03) % | GE_PR (0.08) % | GE_PR (0.10) % | GE_PR (0.12) % | Time (s/ep) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Composite robustness scores of different training methods, aggregated over all test datasets and model architectures, with varying weight assignments for 7 metrics: clean accuracy (Acc.), AR (PGD20), PR(γ), ProbAcc(ρ=0.05), GE_AR (PGD20), GE_PR(γ=0.08), and training time (s/ep.).
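One plausible way to aggregate such heterogeneous metrics into a single composite score is to min-max normalise each metric across methods, flip the lower-is-better ones (GE and training time), and take a weighted sum. The sketch below illustrates that recipe under stated assumptions; it is not necessarily PRBench's exact aggregation.

```python
import numpy as np

def composite_score(metrics, weights, higher_better):
    """Weighted composite over heterogeneous metrics.

    metrics:       list of {metric_name: value} dicts, one per method
    weights:       {metric_name: weight} (normalised internally)
    higher_better: {metric_name: bool}; False flips the metric so that
                   lower raw values (e.g. GE, training time) score higher
    """
    names = list(weights)
    M = np.array([[m[k] for k in names] for m in metrics], dtype=float)
    lo, hi = M.min(axis=0), M.max(axis=0)
    # Min-max normalise each column; constant columns map to 0.
    norm = (M - lo) / np.where(hi > lo, hi - lo, 1.0)
    for j, k in enumerate(names):
        if not higher_better[k]:
            norm[:, j] = 1.0 - norm[:, j]
    w = np.array([weights[k] for k in names], dtype=float)
    return norm @ (w / w.sum())  # one composite score per method
```

Varying the weight vector reproduces the different weight assignments in the table: e.g. emphasising AR (PGD20) versus PR(γ) ranks the methods differently.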
Contributions to PRBench are welcome!
- Submit an Issue or Pull Request
- Share new leaderboard images
- Update files or examples