Multi-scenario integrity prototype

Monitor training and inference behavior before failures become incidents.

DefenseML Input-Weight Association (IWA) is a runnable PyTorch prototype for building integrity models that learn normal behavior signatures and flag anomalies across different attack patterns.


Attack scenarios in scope

The prototype supports multiple integrity checks across model lifecycle phases, with attack configuration exposed through CLI flags.

Training-time poisoning: label-flip and backdoor-trigger batch attacks.
Inference-time perturbation: adversarial examples via FGSM in the inference demo.
Controls: tune --attack-type, --attack-prob, --target-class, --trigger-size, and --eps.
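The two training-time attacks above can be sketched as a batch poisoner. This is an illustrative numpy sketch, not the prototype's implementation; the function name `poison_batch` and its defaults are hypothetical, though the parameters mirror the CLI flags (`--attack-prob`, `--target-class`, `--trigger-size`):

```python
import numpy as np

def poison_batch(images, labels, attack_type, attack_prob=0.10,
                 target_class=0, trigger_size=4, num_classes=10, rng=None):
    """Illustrative batch poisoner: label-flip or backdoor-trigger.

    images: (N, C, H, W) float array; labels: (N,) int array.
    Each sample is attacked independently with probability attack_prob.
    """
    rng = rng or np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    mask = rng.random(len(labels)) < attack_prob
    if attack_type == "label-flip":
        # Replace the true label with a different random class.
        shift = rng.integers(1, num_classes, size=mask.sum())
        labels[mask] = (labels[mask] + shift) % num_classes
    elif attack_type == "backdoor":
        # Stamp a bright square trigger in the corner, retarget the label.
        images[mask, :, -trigger_size:, -trigger_size:] = 1.0
        labels[mask] = target_class
    return images, labels, mask
```

The returned mask marks which samples were attacked, which is what an evaluation needs to measure detection quality.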

Core capabilities

The prototype learns conditional associations in model behavior and turns them into portable integrity artifacts.

Training-time detection

Captures batch-to-update associations and flags suspicious weight updates under poisoning attacks.
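One way to picture this capability: track each batch's induced weight update against a running clean trend and score how far a new update's direction drifts. A minimal numpy sketch, with a hypothetical `UpdateMonitor` class standing in for the prototype's richer input-weight associations:

```python
import numpy as np

class UpdateMonitor:
    """Score each flattened weight update by its drift from a running mean.

    Hypothetical sketch: a cosine-distance check against an exponential
    moving average of past updates, not the prototype's actual statistic.
    """
    def __init__(self, alpha=0.1):
        self.alpha = alpha   # EMA smoothing factor
        self.mean = None     # running mean update direction

    def score(self, update):
        u = np.asarray(update).ravel()
        if self.mean is None:
            self.mean = u.copy()
            return 0.0
        # Cosine distance between this update and the running mean.
        denom = np.linalg.norm(u) * np.linalg.norm(self.mean) + 1e-12
        dist = 1.0 - float(u @ self.mean) / denom
        self.mean = (1 - self.alpha) * self.mean + self.alpha * u
        return dist
```

High scores mark batches whose update points away from the clean trend, which is the signature a poisoned batch tends to leave.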

Inference-time detection

Checks input behavior against activation and sensitivity signatures to catch runtime anomalies.
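Capturing activation signatures at inference time is typically done with forward hooks. A minimal PyTorch sketch, where the toy model, the hooked layer, and the name "relu1" are all illustrative:

```python
import torch
import torch.nn as nn

# Toy network standing in for the instrumented model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store a per-sample activation signature (detached, on CPU).
        captured[name] = output.detach().cpu()
    return hook

# Instrument the hidden ReLU; the prototype selects hooks by configuration.
handle = model[1].register_forward_hook(make_hook("relu1"))

x = torch.randn(5, 8)
_ = model(x)       # hook fires during the forward pass
handle.remove()    # detach instrumentation when done
```

After the forward pass, `captured["relu1"]` holds a (5, 16) activation matrix ready for signature scoring.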

Portable integrity JSON

Exports selected hooks, learned conditional weights, inverse covariance, and threshold without pickle or joblib.

How the pipeline works

A simple four-step flow from instrumentation to deployment-ready integrity artifacts.

Instrument

Attach hooks and telemetry capture for training updates or inference activations.

Fit

Learn conditional associations that represent expected behavior under clean operation.
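Since the exported artifact includes an inverse covariance, the fit step plausibly reduces to estimating a mean and a regularized inverse covariance over clean feature vectors. A numpy sketch under that assumption; `fit_signature` and the ridge default are hypothetical:

```python
import numpy as np

def fit_signature(clean_feats, ridge=1e-3):
    """Fit a clean-behavior signature: mean and regularized inverse covariance.

    clean_feats: (N, D) array of feature vectors collected under clean
    operation (e.g. activation signatures from the instrumented hooks).
    """
    mu = clean_feats.mean(axis=0)
    cov = np.cov(clean_feats, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])   # ridge keeps the inverse stable
    return mu, np.linalg.inv(cov)
```

The ridge term matters in practice: with few clean samples or correlated features, the raw covariance can be near-singular.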

Score

Compute anomaly scores and calibrate thresholds on evaluation sets that mix clean and attacked samples.
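Given a mean and inverse covariance fitted on clean behavior, a natural scoring rule is the squared Mahalanobis distance with a threshold set at a clean-score quantile. This is a sketch of that pattern, not the prototype's exact scoring code; the function names and the 99th-percentile default are assumptions:

```python
import numpy as np

def mahalanobis_scores(feats, mu, inv_cov):
    """Squared Mahalanobis distance of each feature vector from the clean mean."""
    d = feats - mu
    return np.einsum("nd,de,ne->n", d, inv_cov, d)

def calibrate_threshold(clean_scores, quantile=0.99):
    """Pick a threshold so roughly 1% of clean samples would be flagged."""
    return float(np.quantile(clean_scores, quantile))
```

Samples scoring above the threshold are flagged anomalous; sweeping the threshold over a mixed clean/attacked evaluation set yields the ROC curves in the generated assets.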

Ship

Export JSON integrity models and validate them through import and verification commands.

Quickstart CLI (Multi-scenario)

Run one of the demos below from the repository root based on what you want to test.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Training-time integrity demo (CIFAR-10 + label flip)
python -m iwa_integrity train-demo \
  --warmup-steps 200 \
  --steps-after-warmup 600 \
  --attack-prob 0.10 \
  --attack-type label-flip \
  --out-dir assets

# Training-time integrity demo (CIFAR-10 + backdoor)
python -m iwa_integrity train-demo \
  --warmup-steps 200 \
  --steps-after-warmup 600 \
  --attack-prob 0.10 \
  --attack-type backdoor \
  --target-class 0 \
  --trigger-size 4 \
  --out-dir assets
CLI module: python -m iwa_integrity ...
Training attack switch: --attack-type label-flip | backdoor
Inference attack level: --eps 0.25 (FGSM strength)
JSON export: --export-json assets/integrity.json
JSON import verify: python -m iwa_integrity import assets/integrity.json
Generated assets: training and inference ROC/histogram plots plus report artifacts in assets/
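For reference, --eps is the standard FGSM perturbation budget: each input pixel moves eps in the direction of the loss gradient's sign. A minimal PyTorch sketch of the attack itself; the placeholder classifier here is illustrative, not the demo's model:

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step FGSM: perturb x by eps along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

model = nn.Linear(8, 4)           # placeholder classifier
x = torch.randn(5, 8)
y = torch.randint(0, 4, (5,))
x_adv = fgsm(model, x, y, eps=0.25)
```

Larger --eps makes the perturbation easier to detect but more damaging to the model; the inference demo's ROC plots show that trade-off.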

Ready to demo integrity coverage?

Use this page as the front door for both training-time and inference-time integrity workflows.