Installation & Configuration

Configure your AI agent with LLM providers, observability integrations, and auto-remediation tiers.

LLM Configuration

Supported LLM Providers
Choose from multiple LLM providers to power your AI operations
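| Provider  | Models                                             | Notes                                  |
|-----------|----------------------------------------------------|----------------------------------------|
| OpenAI    | gpt-4, gpt-3.5-turbo                               | Requires API key                       |
| Anthropic | claude-3-opus, claude-3-sonnet                     | Requires API key                       |
| Ollama    | Various models                                     | Self-hosted, runs locally              |
| Z.AI      | glm-4.7, glm-4, glm-4-air, glm-4-flash, glm-4-plus | Requires API key from Z.AI Coding Plan |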

Observability Integration

Prometheus
Metrics collection for anomaly detection

Integrates with Prometheus for real-time metrics collection. Custom metrics and recording rules improve incident detection accuracy.
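
Before pointing the agent at Prometheus, it can help to confirm that the endpoint answers queries. The sketch below is not part of the product; it wraps the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) in a hypothetical helper named `prom_query`. The expression `up == 0` is only an illustration; it lists scrape targets that are currently down.

```shell
# Run a PromQL instant query against the Prometheus HTTP API.
# Arguments: base URL, PromQL expression.
# prom_query is a hypothetical helper name, not a command shipped with the agent.
prom_query() {
  local base_url="${1%/}"   # drop a trailing slash, if any
  local expr="$2"
  # -G turns the --data-urlencode value into a URL query parameter (GET).
  curl -fsS -G "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}

# Usage (run from a pod or host that can reach the service):
#   prom_query http://prometheus-operated.monitoring.svc.cluster.local:9090 'up == 0'
```

The name of a recorded series can be passed as the expression in the same way.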

Loki
Log aggregation for pattern recognition

Connects to Loki for centralized log aggregation. Parses and analyzes logs to identify error patterns and correlations with system events.
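
To see the kind of log data the agent would analyze, Loki can be queried directly. This is a minimal sketch using the Loki HTTP API range-query endpoint (`/loki/api/v1/query_range`); `loki_query` is a hypothetical helper, and the `namespace` label in the usage line is an assumption about how your log streams are labeled.

```shell
# Fetch recent log lines matching a LogQL query from Loki.
# Arguments: base URL, LogQL query, optional line limit (default 100).
# loki_query is a hypothetical helper name, not a command shipped with the agent.
loki_query() {
  local base_url="${1%/}"
  local logql="$2"
  local limit="${3:-100}"
  # With no start/end given, Loki falls back to its default time range.
  curl -fsS -G "${base_url}/loki/api/v1/query_range" \
    --data-urlencode "query=${logql}" \
    --data-urlencode "limit=${limit}"
}

# Usage: lines containing "error" from one namespace (label name assumed):
#   loki_query http://loki.monitoring.svc.cluster.local:3100 '{namespace="default"} |= "error"' 50
```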

Jaeger
Distributed tracing for correlation

Integrates with Jaeger for distributed tracing. Tracks request flows across microservices to pinpoint performance bottlenecks and failure points.

OpenTelemetry
Unified observability pipeline

Supports OpenTelemetry for standardized telemetry collection. Provides a unified approach to metrics, logs, and traces across your entire infrastructure.

Auto-Remediation Tiers

Tier 1: Safe Actions
Automatic, no approval required

Low-risk operations that can safely run without human intervention.

  • Restart failing pods automatically
  • Scale replicas based on load
  • Clear stale cache entries

Tier 2: Approval Required
Medium-risk actions need human approval

Operations that require human approval before they run. The approval timeout is configurable.

  • Rollback deployments to previous versions
  • Drain nodes for maintenance
  • Restart services across namespaces

Approval Channels: Slack, Teams, Discord, PagerDuty

Tier 3: Manual Only
High-risk actions require manual execution

Critical operations that must be reviewed and executed by operators.

  • Delete namespaces or resources
  • Modify cluster-wide configurations
  • Change networking policies
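
The three tiers can be read as a lookup from action to handling. The sketch below illustrates that model only and is not the agent's implementation. The action names are hypothetical labels for the operations listed above, and treating unknown actions as Tier 3 is a conservative choice made for this example.

```shell
# Return the remediation tier (1, 2, or 3) for an action name.
# Action names are hypothetical labels, not identifiers used by the agent.
remediation_tier() {
  case "$1" in
    restart-pod|scale-replicas|clear-cache)                        echo 1 ;;
    rollback-deployment|drain-node|restart-services)               echo 2 ;;
    delete-namespace|modify-cluster-config|change-network-policy)  echo 3 ;;
    *) echo 3 ;;   # unknown actions get the most restrictive tier (assumption)
  esac
}

# Describe how an action is handled under the tier model.
handling_for() {
  case "$(remediation_tier "$1")" in
    1) echo "run automatically" ;;
    2) echo "wait for approval" ;;
    3) echo "manual only" ;;
  esac
}

# Usage:
#   handling_for drain-node
```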

Environment Variables

Required Environment Variables
Set the API key variable for the LLM provider you use

OPENAI_API_KEY

Required for OpenAI provider. Get your API key from the OpenAI dashboard.

export OPENAI_API_KEY=your-openai-api-key-here

ANTHROPIC_API_KEY

Required for Anthropic provider. Get your API key from the Anthropic console.

export ANTHROPIC_API_KEY=your-anthropic-api-key-here

ZAI_API_KEY

Required for Z.AI provider. Get your API key from Z.AI Coding Plan.

export ZAI_API_KEY=your-zai-api-key-here
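
A quick pre-flight check can catch a missing key before deployment. The following is a sketch under assumptions: the provider ids `openai` and `zai` match the `provider` values in the configuration examples, while `anthropic` and `ollama` are assumed ids, and both function names are hypothetical.

```shell
# Map a provider id to the environment variable that holds its API key.
# Prints nothing for ollama, which the provider table lists as self-hosted.
provider_key_var() {
  case "$1" in
    openai)    echo OPENAI_API_KEY ;;
    anthropic) echo ANTHROPIC_API_KEY ;;   # provider id assumed
    zai)       echo ZAI_API_KEY ;;
    ollama)    : ;;                        # provider id assumed
    *)         echo "unknown provider: $1" >&2; return 2 ;;
  esac
}

# Succeed only if the key variable for the provider is set and non-empty.
check_provider_key() {
  local provider="$1" var_name value
  var_name=$(provider_key_var "$provider") || return 2
  # No key variable for this provider, so there is nothing to check.
  [ -n "$var_name" ] || return 0
  # Indirect lookup: read the variable whose name is stored in var_name.
  eval "value=\${$var_name:-}"
  if [ -z "$value" ]; then
    echo "missing $var_name for provider $provider" >&2
    return 1
  fi
}

# Usage:
#   check_provider_key zai && echo "ZAI_API_KEY is set"
```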

Configuration Examples

Basic AgentController Configuration
Complete example with OpenAI integration
01-agentcontroller.yaml
apiVersion: ai.aik8s.io/v1alpha1
kind: AgentController
metadata:
  name: my-agent
  namespace: aik8s-system
spec:
  enablePredictiveEngine: true
  enableAutoRemediation: true
  enableKnowledgeGraph: true

  llm:
    provider: openai
    model: gpt-4
    maxTokens: 2000
    apiKeySecret:
      name: openai-key
      namespace: aik8s-system
      key: api-key

  autoRemediation:
    enableTier1: true
    enableTier2: true
    approval:
      platform: slack
      channel: "#ops-alerts"
      timeout: 5m
    rollbackTimeout: 10m

  observability:
    prometheusUrl: http://prometheus-operated.monitoring.svc.cluster.local:9090
    lokiUrl: http://loki.monitoring.svc.cluster.local:3100
    metricsInterval: 30s

  clusters:
  - name: prod-cluster
    region: us-west-2
  - name: staging-cluster
    region: us-west-2
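
The manifest above reads the API key from a Secret named `openai-key` with the key `api-key`, but this page does not show how that Secret is created. One way to create it, assuming `kubectl` access to the cluster and `OPENAI_API_KEY` exported as described under Environment Variables, is sketched below. `create_llm_secret` is a hypothetical helper name.

```shell
# Create the Secret that spec.llm.apiKeySecret points to.
# Arguments: secret name, namespace, API key value.
create_llm_secret() {
  local name="$1" namespace="$2" api_key="$3"
  if [ -z "$api_key" ]; then
    echo "refusing to create secret $name with an empty API key" >&2
    return 1
  fi
  # The data key "api-key" matches apiKeySecret.key in the example manifest.
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal="api-key=${api_key}"
}

# Usage:
#   create_llm_secret openai-key aik8s-system "$OPENAI_API_KEY"
```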

Z.AI Integration Example
Example with Z.AI GLM-4.7 as LLM provider
02-zai-agentcontroller.yaml
# Example AgentController with Z.AI Coding Plan Integration
apiVersion: ai.aik8s.io/v1alpha1
kind: AgentController
metadata:
  name: zai-agent
  namespace: aik8s-system
spec:
  enablePredictiveEngine: true
  enableAutoRemediation: true
  enableKnowledgeGraph: true

  llm:
    provider: zai
    model: glm-4.7
    maxTokens: 2000
    apiKeySecret:
      name: zai-api-key
      namespace: aik8s-system
      key: api-key

  autoRemediation:
    enableTier1: true
    enableTier2: false
    approval:
      platform: slack
      channel: "#ops-alerts"
      timeout: 5m
    rollbackTimeout: 10m

  observability:
    prometheusUrl: http://prometheus-operated.monitoring.svc.cluster.local:9090
    lokiUrl: http://loki.monitoring.svc.cluster.local:3100
    metricsInterval: 30s

  clusters:
  - name: prod-cluster
    region: us-west-2
  - name: staging-cluster
    region: us-east-1
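
Once a manifest is saved, it can be validated and applied with standard `kubectl` commands. The sketch below assumes the AgentController CRD is already installed in the cluster and that you have `kubectl` access; `apply_agent` is a hypothetical helper name. It uses `kubectl get -f` so that no resource name has to be guessed.

```shell
# Validate a manifest against the API server, apply it, then show the object.
# Stops at the first step that fails.
apply_agent() {
  local manifest="$1"
  kubectl apply --dry-run=server -f "$manifest" &&
    kubectl apply -f "$manifest" &&
    kubectl get -f "$manifest"
}

# Usage:
#   apply_agent 02-zai-agentcontroller.yaml
```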

Need More Details?

Explore the complete API Reference for detailed CRD specifications, field definitions, and advanced configuration options.
