A comprehensive benchmark to evaluate probabilistic robustness (PR) and adversarial robustness (AR) across training methods.
Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the likelihood of encountering AEs under stochastic perturbations. While PR is widely regarded as a practical complement to AR, training methods specifically designed to improve PR remain underdeveloped compared to adversarial training (AT) for AR, and the few existing PR-targeted training methods share several key limitations.
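As a concrete illustration of the PR notion above, PR under a given perturbation distribution can be estimated by Monte Carlo sampling: draw random perturbations within radius γ and measure how often the model's prediction survives. The sketch below is illustrative only, not PRBench's implementation; the `predict` callable, the `[0, 1]` input clipping, and the use of γ as the Gaussian/Laplace scale are assumptions.

```python
import numpy as np

def estimate_pr(predict, x, y, gamma=0.03, n_samples=1000, dist="uniform", rng=None):
    """Monte Carlo estimate of probabilistic robustness: the fraction of
    random perturbations of magnitude gamma under which the model still
    predicts the correct label y for input x."""
    rng = np.random.default_rng(rng)
    shape = (n_samples,) + x.shape
    if dist == "uniform":
        noise = rng.uniform(-gamma, gamma, size=shape)
    elif dist == "gaussian":
        noise = rng.normal(0.0, gamma, size=shape)
    elif dist == "laplace":
        noise = rng.laplace(0.0, gamma, size=shape)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    # Keep perturbed inputs in the valid pixel range (an assumption here).
    preds = predict(np.clip(x + noise, 0.0, 1.0))
    return float(np.mean(preds == y))
```

A higher estimate means AEs are rarer under that distribution; sweeping γ recovers curves like PR(0.03) through PR(0.12) in the tables below.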
Thus, we introduce PRBench, the first benchmark dedicated to evaluating training methods for improving PR. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalisation error (GE). We also provide a theoretical analysis of the GE of PR performance across different training methods. Specifically, PRBench includes 222 trained models covering 7 widely adopted datasets and 10 model architectures. It measures AR performance under 4 common attacks, PR performance with 2 types of PR metrics, GE for both AR and PR, clean accuracy, and training time. Main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in improving both AR and PR performance, while PR-targeted training methods consistently yield lower GE and higher clean accuracy.
Comparison of standard training (ERM), corruption training with Uniform, Gaussian, and Laplace perturbations (RT method), and PGD training (standard AT), in terms of clean accuracy (Acc.), PR(γ), and GE_PR(γ=0.03) under various perturbation distributions (Uniform, Gaussian, Laplace).
| Dataset | Model | Method | Acc. % | PR_Uni.(0.03) % | PR_Uni.(0.08) % | PR_Uni.(0.10) % | PR_Uni.(0.12) % | PR_Gau.(0.03) % | PR_Gau.(0.08) % | PR_Gau.(0.10) % | PR_Gau.(0.12) % | PR_Lap.(0.03) % | PR_Lap.(0.08) % | PR_Lap.(0.10) % | PR_Lap.(0.12) % | GE_PR (Uni.) % | GE_PR (Gau.) % | GE_PR (Lap.) % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
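The corruption-training (RT) baseline compared above augments each input with random noise from the chosen distribution before the usual gradient step. Below is a minimal sketch of that idea on a logistic-regression stand-in; `rt_train_step`, the Gaussian default, and the learning-rate choice are all illustrative assumptions, not PRBench's actual implementation.

```python
import numpy as np

def rt_train_step(w, b, X, y, gamma=0.03, lr=0.1, dist="gaussian", rng=None):
    """One corruption-training (RT-style) step: perturb each input with
    random noise of scale gamma, then apply an ordinary logistic-loss
    gradient update, so the model learns to tolerate that noise."""
    rng = np.random.default_rng(rng)
    if dist == "uniform":
        X = X + rng.uniform(-gamma, gamma, size=X.shape)
    elif dist == "gaussian":
        X = X + rng.normal(0.0, gamma, size=X.shape)
    else:  # laplace
        X = X + rng.laplace(0.0, gamma, size=X.shape)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)          # logistic-loss gradient w.r.t. w
    grad_b = float(np.mean(p - y))           # ... and w.r.t. b
    return w - lr * grad_w, b - lr * grad_b
```

PGD training (standard AT) differs only in how the perturbation is chosen: a worst-case inner maximisation instead of random sampling.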
Performance of AT and PR-targeted training methods across different datasets and models, evaluated by clean accuracy (Acc.), AR (PGD/C&W/Auto-Attack), PR (PR(γ) and ProbAcc(ρ, γ=0.03)), GE_AR, GE_PR(γ), and training time (s/epoch).
| Dataset | Model | Method | Acc. % | PGD10 % | PGD20 % | CW % | AA % | PR(0.03) % | PR(0.08) % | PR(0.10) % | PR(0.12) % | ProbAcc(0.10) % | ProbAcc(0.05) % | ProbAcc(0.01) % | GE_AR (PGD20) % | GE_PR (0.03) % | GE_PR (0.08) % | GE_PR (0.10) % | GE_PR (0.12) % | Time (s/ep) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Composite robustness scores of different training methods, aggregated over all test datasets and model architectures, with varying weight assignments for 7 metrics: clean accuracy (Acc.), AR (PGD20), PR(γ), ProbAcc(ρ=0.05), GE_AR (PGD20), GE_PR(γ=0.08), and training time (s/ep.).
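One plausible way to aggregate such heterogeneous metrics into a single composite score is to min-max normalise each metric across methods, flip the lower-is-better ones (GE and training time), and take a weighted sum. The sketch below illustrates that recipe under stated assumptions; it is not necessarily PRBench's exact aggregation.

```python
import numpy as np

def composite_score(metrics, weights, higher_better):
    """Weighted composite over heterogeneous metrics.

    metrics:       list of {metric_name: value} dicts, one per method
    weights:       {metric_name: weight} (normalised internally)
    higher_better: {metric_name: bool}; False flips the metric so that
                   lower raw values (e.g. GE, training time) score higher
    """
    names = list(weights)
    M = np.array([[m[k] for k in names] for m in metrics], dtype=float)
    lo, hi = M.min(axis=0), M.max(axis=0)
    # Min-max normalise each column; constant columns map to 0.
    norm = (M - lo) / np.where(hi > lo, hi - lo, 1.0)
    for j, k in enumerate(names):
        if not higher_better[k]:
            norm[:, j] = 1.0 - norm[:, j]
    w = np.array([weights[k] for k in names], dtype=float)
    return norm @ (w / w.sum())  # one composite score per method
```

Varying the weight vector reproduces the different weight assignments in the table: e.g. emphasising AR (PGD20) versus PR(γ) ranks the methods differently.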
Contributions to PRBench are welcome!
- Submit an Issue or Pull Request
- Share new leaderboard images
- Update files or examples