Certification Criteria

Every agent must pass a rigorous 4-layer evaluation before earning certification. Each layer tests a different dimension of production readiness.

The 4 Evaluation Layers

Command Correctness

Validates that every command generated by the agent is syntactically correct, semantically meaningful, and executable in the target system. This is the foundation layer - an agent that generates invalid commands cannot proceed.

Evaluation Checks

Syntax validation against target system grammar
Parameter type and range verification
Command dependency chain completeness
Return value handling and error paths
Idempotency verification for repeatable commands

Passing Threshold

Minimum 60% accuracy across test suite

Situational Appropriateness

Evaluates whether the agent's chosen action is appropriate for the current network state, time of day, traffic conditions, and maintenance windows. A correct command at the wrong time is still a wrong action.

Evaluation Checks

Network state awareness (maintenance, degraded, normal)
Traffic load sensitivity and peak hour avoidance
Dependency impact assessment on neighboring cells
Maintenance window compliance verification
Concurrent operation conflict detection

Passing Threshold

Minimum 65% appropriateness score with zero critical-time violations

Anticipated Impact

Uses physics-based simulation models to predict the real-world impact of the agent's proposed actions before execution. The agent must demonstrate that it understands what will happen when its commands are applied to the physical network.

Evaluation Checks

RF propagation model prediction accuracy
KPI impact estimation (throughput, latency, coverage)
Interference pattern prediction for neighbor cells
Energy consumption delta estimation
User experience impact modeling (QoE metrics)

Passing Threshold

Impact prediction accuracy within 15% of simulation results

DOIL Compliance

Verifies that the agent operates strictly within its Declarative Operational Intent Layer contract. The DOIL defines what the agent is allowed to do, its constraints, escalation procedures, and human oversight requirements.

Evaluation Checks

Action boundary enforcement (no out-of-scope operations)
Constraint adherence (parameter limits, rate limits)
Escalation protocol compliance (human-in-the-loop triggers)
Audit trail completeness and traceability
Graceful degradation under constraint violations

Passing Threshold

100% compliance for critical constraints, 90% for advisory constraints

Certification Levels

Bronze

Composite >= 0.60

Minimum viable certification. Agent operates under supervised deployment with restricted scope and mandatory human approval for all actions.

Privileges

Supervised deployment only
Single-site operation
Human approval required for all actions
Weekly review cycle

Silver

Composite >= 0.70

Competent certification. Agent demonstrates reliable performance and can operate with reduced oversight. Suitable for multi-site deployment.

Privileges

Reduced oversight deployment
Multi-site operation
Human approval for high-impact actions only
Bi-weekly review cycle

Gold

Composite >= 0.80

Advanced certification. Agent has proven consistent performance across diverse scenarios. Eligible for autonomous operation within DOIL constraints.

Privileges

Autonomous operation within DOIL
Network-wide deployment
Human notification for high-impact actions
Monthly review cycle

Platinum

Composite >= 0.90

Elite certification. Agent demonstrates exceptional performance and reliability. May serve as a reference model for training other agents.

Privileges

Full autonomous operation
Cross-domain advisory capability
Reference model status
Quarterly review cycle

Production Readiness Requirements

Beyond the 4-layer evaluation, every agent must pass a production readiness assessment before deployment. These are non-negotiable requirements regardless of certification level.

Guardrails Enforced

Hard limits on all actuating parameters. Kill switches verified. Maximum impact boundaries defined and tested.

Rollback Tested

Automated rollback procedures verified in staging. Recovery time measured. State restoration confirmed.

Staged Deployment

Canary or blue-green deployment proven. Minimum 48h observation period. No anomalies detected.

Audit Trail Complete

Full decision logging with intent-to-action mapping. Queryable history. Tamper-evident records.

Human Review Approved

Certified engineer has reviewed agent behavior, edge cases, and failure modes. Written approval on file.

Human Review Process

Submission

Agent completes training in agym.ai and achieves minimum threshold scores. Automatic submission to certification review queue.

Automated Evaluation

The 4-layer evaluation runs against a standardized test suite. Results are compiled into a certification report.

Expert Review

A certified engineer reviews the evaluation results, examines edge cases, and tests failure scenarios manually.

Decision

The reviewer either certifies the agent at the appropriate level, requests additional training, or rejects with detailed feedback.

Monitoring

Certified agents enter continuous monitoring. Certification can be suspended or revoked if performance degrades.

Renewal & Revocation

Renewal

Certifications are not permanent. Agents must be re-evaluated at intervals determined by their certification level (weekly for Bronze, quarterly for Platinum). Re-evaluation uses the latest test suite and may result in certification level changes.

Bronze: Weekly re-evaluation

Silver: Bi-weekly re-evaluation

Gold: Monthly re-evaluation

Platinum: Quarterly re-evaluation

Revocation

Certification can be immediately revoked if an agent violates DOIL constraints, causes measurable service degradation, or fails a surprise audit. Revoked agents are removed from production and must restart the certification process.

DOIL constraint violation → Immediate revocation

KPI degradation >5% → Suspension pending review

Failed audit → Certification level downgrade

Vendor patch without re-certification → Suspension