FRAMEWORM DEPLOY

One command from trained model to production API — with drift detection, latency monitoring, and auto-rollback built in.

Quick Start

frameworm deploy start \
  --model experiments/checkpoints/best.pt \
  --name face_generator \
  --type dcgan \
  --version v1.2 \
  --shift face_generator \
  --build-docker

That one command:

1. Exports your model to TorchScript + ONNX
2. Generates a FastAPI server with the correct input/output schema for your architecture
3. Builds a multi-stage Docker image (non-root user, HEALTHCHECK baked in)
4. Starts p50/p95/p99 latency tracking on every request
5. Auto-attaches FRAMEWORM SHIFT drift monitoring
6. Starts a background rollback controller

All Commands

# Deploy
frameworm deploy start --model experiments/checkpoints/best.pt --name my_model

# Check status of all versions
frameworm deploy status --name my_model

# Promote a version to production
frameworm deploy promote --name my_model --version v2.0 --stage production

# Manual rollback to previous version
frameworm deploy rollback --name my_model

# Stop and archive
frameworm deploy stop --name my_model

Every Deployed Model Gets

Endpoint	Method	Description
`/predict`	POST	Run inference
`/health`	GET	Liveness — always 200 while alive
`/ready`	GET	Readiness — 503 until model is loaded
`/metrics`	GET	Live p50/p95/p99 + error rate
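
As an illustration of what a `/metrics` response could be built from, here is a minimal sketch of a rolling latency window with nearest-rank percentiles. The class name `LatencyWindow`, the window size, and the keys in the returned dictionary are assumptions made for this example; the generated server may compute and name these values differently.

```python
import math
from collections import deque


class LatencyWindow:
    """Rolling window of (latency_ms, is_error) samples. Hypothetical helper."""

    def __init__(self, max_samples: int = 1000) -> None:
        # Old samples fall off automatically once the window is full.
        self.samples = deque(maxlen=max_samples)

    def record(self, latency_ms: float, is_error: bool = False) -> None:
        self.samples.append((latency_ms, is_error))

    def percentile(self, q: int) -> float:
        """Nearest-rank percentile of the recorded latencies, q in 1..100."""
        latencies = sorted(ms for ms, _ in self.samples)
        if not latencies:
            return 0.0
        rank = max(1, math.ceil(q * len(latencies) / 100))
        return latencies[rank - 1]

    def snapshot(self) -> dict:
        """Summary in the spirit of /metrics (key names are assumed)."""
        n = len(self.samples)
        errors = sum(1 for _, is_error in self.samples if is_error)
        return {
            "p50_ms": self.percentile(50),
            "p95_ms": self.percentile(95),
            "p99_ms": self.percentile(99),
            "error_rate": errors / n if n else 0.0,
        }
```

A bounded window keeps memory constant and lets the percentiles reflect recent traffic rather than the whole lifetime of the process.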

Auto-Rollback

DEPLOY watches every deployed model from a background thread and checks its metrics every 30 seconds.

Triggers rollback when:

- p95 latency exceeds threshold (default: 2000 ms) for 3 consecutive checks
- Error rate exceeds threshold (default: 10%) for 3 consecutive checks
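
The consecutive-check rule can be sketched as a pair of streak counters. This is an illustrative sketch, not the actual controller: the class `RollbackRule` and its `observe` method are hypothetical, and tracking the two metrics as independent streaks is an assumption. The default values mirror the thresholds listed above.

```python
from dataclasses import dataclass


@dataclass
class RollbackRule:
    """Hypothetical streak counter for the rollback triggers."""

    latency_threshold_ms: float = 2000.0
    error_rate_threshold: float = 0.10
    rollback_checks: int = 3
    latency_breaches: int = 0
    error_breaches: int = 0

    def observe(self, p95_ms: float, error_rate: float) -> bool:
        """Feed one periodic check; return True when a rollback should fire."""
        # A healthy reading resets the streak, so only consecutive
        # breaches count toward the trigger.
        if p95_ms > self.latency_threshold_ms:
            self.latency_breaches += 1
        else:
            self.latency_breaches = 0
        if error_rate > self.error_rate_threshold:
            self.error_breaches += 1
        else:
            self.error_breaches = 0
        return (
            self.latency_breaches >= self.rollback_checks
            or self.error_breaches >= self.rollback_checks
        )
```

If checks run exactly 30 seconds apart, the third consecutive breach lands 60 seconds after the first breached check, so a single slow reading never triggers a rollback on its own.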

On rollback, automatically:

1. Looks up the previous production version in the registry
2. Stops the current Docker container
3. Starts the previous version's container
4. Promotes the old version back to production in the registry
5. Fires a Slack alert with reason, p95 value, and timestamp
6. Writes the event to experiments/deploy_logs/ as a JSONL entry
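
The final step, appending the event as a JSONL entry, might look roughly like the sketch below. The function name `log_rollback_event`, the one-file-per-model naming, and every field name in the event are assumptions for illustration; the source only states that the reason, p95 value, and timestamp are reported and that entries land in experiments/deploy_logs/.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_rollback_event(log_dir, name, from_version, to_version, reason, p95_ms):
    """Append one rollback event as a single JSON line; return the file path."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    event = {
        "event": "rollback",
        "model": name,
        "from_version": from_version,
        "to_version": to_version,
        "reason": reason,
        "p95_ms": p95_ms,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    path = log_dir / f"{name}.jsonl"  # assumed naming: one file per model
    # Append mode keeps earlier events intact; one JSON object per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

Because each line is a complete JSON object, the log can be tailed or filtered line by line without parsing the whole file.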

No human needed.

Model-Aware Server Generation

Generic deployment tools (BentoML, TorchServe) are architecture-agnostic and treat every model the same way. FRAMEWORM DEPLOY knows all six built-in architectures and generates inference code specific to each.

Architecture	Input	Output
VAE	Image tensor (B, C, H, W)	Reconstruction + mu + log_var
DCGAN	Noise vector (B, latent_dim)	Generated images
DDPM	batch_size + num_steps	Denoised images
VQ-VAE-2	Image tensor	Reconstruction + commitment loss
ViT-GAN	Noise vector	Generated images
CFG-DDPM	batch_size + class_labels + guidance_scale	Conditional generated images
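
To make the idea of per-architecture schemas concrete, the sketch below checks a request payload against the inputs in the table. It is only an illustration under stated assumptions: the mapping `REQUEST_FIELDS`, the function `validate_request`, the field names `image` and `noise`, and all type strings other than `dcgan` are hypothetical. The fields `batch_size`, `num_steps`, `class_labels`, and `guidance_scale` come from the table.

```python
# Required request fields per architecture (hypothetical naming).
REQUEST_FIELDS = {
    "vae": {"image"},
    "dcgan": {"noise"},
    "ddpm": {"batch_size", "num_steps"},
    "vq-vae-2": {"image"},
    "vit-gan": {"noise"},
    "cfg-ddpm": {"batch_size", "class_labels", "guidance_scale"},
}


def validate_request(model_type: str, payload: dict) -> None:
    """Raise ValueError if the payload lacks a field the architecture needs."""
    try:
        required = REQUEST_FIELDS[model_type.lower()]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None
    missing = required - set(payload)
    if missing:
        raise ValueError(f"{model_type} request is missing: {sorted(missing)}")


# A DCGAN request carries a noise batch; a CFG-DDPM request carries
# sampling parameters instead of tensors.
validate_request("dcgan", {"noise": [[0.1, -0.3, 0.7]]})
validate_request(
    "cfg-ddpm",
    {"batch_size": 4, "class_labels": [0, 1, 2, 3], "guidance_scale": 7.5},
)
```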

Model Lifecycle

dev → staging → production → archived

Every version is tracked in the model registry with: git hash, config snapshot, dataset checksum, and training metrics — so you always know exactly what is running in production and where it came from.
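
As a rough picture of what one registry record could hold, the sketch below bundles the four tracked items with a lifecycle stage. The class `RegistryEntry`, the helper `checksum_bytes`, the field names, and the choice of SHA-256 are assumptions for illustration; the actual registry format is not described here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

STAGES = ("dev", "staging", "production", "archived")


def checksum_bytes(data: bytes) -> str:
    """SHA-256 hex digest; the real checksum algorithm is an assumption."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical shape of one model-registry record."""

    name: str
    version: str
    stage: str
    git_hash: str
    config_snapshot: dict
    dataset_checksum: str
    training_metrics: dict

    def __post_init__(self) -> None:
        # Reject stages outside the documented lifecycle.
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


entry = RegistryEntry(
    name="face_generator",
    version="v1.2",
    stage="staging",
    git_hash="0000000",  # placeholder, not a real commit
    config_snapshot={"latent_dim": 100},  # illustrative values
    dataset_checksum=checksum_bytes(b"example dataset bytes"),
    training_metrics={"epochs": 1},  # illustrative values
)
```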

Generated Server Structure

deploy/generated/<name>/
├── server.py            ← model-type-aware FastAPI server
├── requirements.txt     ← pinned dependencies
├── Dockerfile           ← multi-stage, non-root, HEALTHCHECK included
└── docker-compose.yml   ← one-command local deployment

How DEPLOY Reuses FRAMEWORM

Existing piece	DEPLOY usage
Model checkpoints	Exported to TorchScript + ONNX
SHIFT ShiftMonitor	Auto-attached to every deployed model
Slack integration	Rollback + degradation alerts
`experiments/` DB	Deployment history + lineage
p50/p95/p99 monitoring	Latency tracking per endpoint
Model registry	dev → staging → production → archived lifecycle

Configuration

configs/deploy_config.yaml

deploy:
  export_format:         ["torchscript", "onnx"]
  quantize:              false
  latency_threshold_ms:  2000
  error_rate_threshold:  0.10
  rollback_checks:       3
  monitor_interval_s:    30
  alert_on:              ["slack", "log"]
  log_path:              "experiments/deploy_logs"
  registry_path:         "experiments/model_registry"
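
For readers who want to validate these settings before deploying, the sketch below mirrors the keys above in a dataclass with the same defaults. It assumes the YAML has already been parsed into a dictionary by a YAML library of your choice; the class `DeployConfig`, its `from_dict` method, and the specific validation rules are hypothetical and not part of FRAMEWORM.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DeployConfig:
    """Hypothetical typed view of the `deploy:` section."""

    export_format: list = field(default_factory=lambda: ["torchscript", "onnx"])
    quantize: bool = False
    latency_threshold_ms: int = 2000
    error_rate_threshold: float = 0.10
    rollback_checks: int = 3
    monitor_interval_s: int = 30
    alert_on: list = field(default_factory=lambda: ["slack", "log"])
    log_path: str = "experiments/deploy_logs"
    registry_path: str = "experiments/model_registry"

    def __post_init__(self) -> None:
        # Basic sanity checks (assumed rules, not documented behaviour).
        if not 0.0 <= self.error_rate_threshold <= 1.0:
            raise ValueError("error_rate_threshold must be a fraction in [0, 1]")
        if self.latency_threshold_ms <= 0:
            raise ValueError("latency_threshold_ms must be positive")
        if self.rollback_checks < 1 or self.monitor_interval_s <= 0:
            raise ValueError("rollback_checks and monitor_interval_s must be positive")

    @classmethod
    def from_dict(cls, raw: dict) -> "DeployConfig":
        """Build a config from an already-parsed YAML mapping."""
        section = raw.get("deploy", {})
        unknown = set(section) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown deploy keys: {sorted(unknown)}")
        return cls(**section)
```

Rejecting unknown keys catches typos such as `rollback_check` early, instead of silently falling back to the default.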

Tests

python test_deploy_steps6_10.py   # 14 tests, no pytest needed