Installation & Configuration

Configure your AI agent with LLM providers, observability integrations, and auto-remediation tiers.

LLM Configuration

Supported LLM Providers

Choose from multiple LLM providers to power your AI operations

Provider	Models	Notes
OpenAI	gpt-4, gpt-3.5-turbo	Requires API key
Anthropic	claude-3-opus, claude-3-sonnet	Requires API key
Ollama	Various models	Self-hosted, runs locally
Z.AI	glm-4.7, glm-4, glm-4-air, glm-4-flash, glm-4-plus	Requires API key from Z.AI Coding Plan

Observability Integration

Prometheus

Metrics collection for anomaly detection

Integrates with Prometheus for real-time metrics collection. Custom metrics and recording rules improve incident detection accuracy.

Loki

Log aggregation for pattern recognition

Connects to Loki for centralized log aggregation. Parses and analyzes logs to identify error patterns and correlations with system events.
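
As a quick connectivity check for the two backends above, the sketch below probes their readiness endpoints (/-/ready for Prometheus, /ready for Loki). The helper names ready_url and check_ready are hypothetical and not part of the product. The base URLs in the configuration examples further down are cluster-local, so a check like this would need to run from inside the cluster.

```shell
# Build the readiness URL for a backend from its base URL.
# "prometheus" and "loki" are illustrative labels chosen for this sketch.
ready_url() {
  kind=$1
  base=${2%/}   # drop one trailing slash, if present
  case "$kind" in
    prometheus) echo "$base/-/ready" ;;
    loki)       echo "$base/ready" ;;
    *)          echo "unsupported backend: $kind" >&2; return 2 ;;
  esac
}

# Return 0 when the backend answers its readiness probe successfully.
check_ready() {
  url=$(ready_url "$1" "$2") || return 2
  curl --silent --fail --max-time 5 --output /dev/null "$url"
}
```

For example, check_ready loki http://loki.monitoring.svc.cluster.local:3100 exits with status 0 once Loki reports ready.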

Jaeger

Distributed tracing for correlation

Integrates with Jaeger for distributed tracing. Tracks request flows across microservices to pinpoint performance bottlenecks and failure points.

OpenTelemetry

Unified observability pipeline

Supports OpenTelemetry for standardized telemetry collection. Provides a unified approach to metrics, logs, and traces across your entire infrastructure.

Auto-Remediation Tiers

Tier 1: Safe Actions

Automatic, no approval required

Low-risk operations that can safely run without human intervention.

Restart failing pods automatically
Scale replicas based on load
Clear stale cache entries

Tier 2: Approval Required

Medium-risk actions need human approval

Operations that require human approval before execution. The approval timeout is configurable (see approval.timeout in the configuration examples below).

Rollback deployments to previous versions
Drain nodes for maintenance
Restart services across namespaces

Approval Channels: Slack, Teams, Discord, PagerDuty
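
The approval timeout can be pictured as a polling loop with a deadline. The sketch below is only an illustration: the function names are hypothetical, the approval check is whatever command you pass in, and treating an unanswered request as "not approved" is an assumption rather than documented behavior. Durations use the same style as the examples on this page (for instance 5m), limited here to whole seconds, minutes, or hours.

```shell
# Convert a duration such as 30s, 5m, or 1h to seconds.
# Only whole s/m/h values are handled in this sketch.
to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *)  echo "$1" ;;
  esac
}

# Poll an approval check once per second until it succeeds or the
# timeout elapses. Usage: wait_for_approval 5m <command> [args...]
wait_for_approval() {
  limit=$(to_seconds "$1")
  shift
  waited=0
  while [ "$waited" -lt "$limit" ]; do
    if "$@"; then
      echo "approved"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Assumption: no answer within the timeout means the action is not run.
  echo "timed-out"
  return 1
}
```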

Tier 3: Manual Only

High-risk actions require manual execution

Critical operations that must be reviewed and executed by operators.

Delete namespaces or resources
Modify cluster-wide configurations
Change networking policies
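
The three tiers above amount to a lookup from action to required handling. The sketch below shows one way to express that. The action names are hypothetical labels for the operations listed in each tier, the two flags mirror the enableTier1 and enableTier2 settings in the configuration examples, and both the "skip" outcome for a disabled tier and the manual-only default for unknown actions are assumptions.

```shell
# Map a remediation action to its tier (1, 2, or 3).
tier_for() {
  case "$1" in
    restart-pod|scale-replicas|clear-cache)
      echo 1 ;;
    rollback-deployment|drain-node|restart-services)
      echo 2 ;;
    delete-namespace|modify-cluster-config|change-network-policy)
      echo 3 ;;
    *)
      echo 3 ;;   # assumption: unknown actions are treated as manual only
  esac
}

# Decide how to handle an action.
# Usage: decide <action> [tier1_enabled] [tier2_enabled]
decide() {
  action=$1
  tier1_enabled=${2:-true}
  tier2_enabled=${3:-true}
  tier=$(tier_for "$action")
  if [ "$tier" = 1 ] && [ "$tier1_enabled" = true ]; then
    echo "execute"
  elif [ "$tier" = 2 ] && [ "$tier2_enabled" = true ]; then
    echo "request-approval"
  elif [ "$tier" = 3 ]; then
    echo "manual"
  else
    echo "skip"   # the action's tier is disabled
  fi
}
```

With Tier 2 disabled, decide drain-node true false prints skip, while decide restart-pod true false still prints execute.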

Environment Variables

Required Environment Variables

Set the environment variable that matches the LLM provider you use.

OPENAI_API_KEY

Required for OpenAI provider. Get your API key from the OpenAI dashboard.

Environment Variable

export OPENAI_API_KEY=your-openai-api-key-here

ANTHROPIC_API_KEY

Required for Anthropic provider. Get your API key from the Anthropic console.

export ANTHROPIC_API_KEY=your-anthropic-api-key-here

ZAI_API_KEY

Required for the Z.AI provider. Get your API key from the Z.AI Coding Plan.

export ZAI_API_KEY=your-zai-api-key-here
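
Before starting the agent, it can help to confirm that the variable for your chosen provider is actually set. The following is a minimal sketch with hypothetical function names. The identifiers openai and zai match the provider values in the configuration examples, while anthropic and ollama are assumed by analogy. It also assumes that Ollama, being self-hosted, needs no API key.

```shell
# Print the name of the API key variable a provider needs.
# Prints an empty line for providers assumed to need no key.
key_var_for() {
  case "$1" in
    openai)    echo "OPENAI_API_KEY" ;;
    anthropic) echo "ANTHROPIC_API_KEY" ;;
    zai)       echo "ZAI_API_KEY" ;;
    ollama)    echo "" ;;   # assumption: self-hosted, no key required
    *)         echo "unknown provider: $1" >&2; return 2 ;;
  esac
}

# Return 0 if the provider's key variable is set and non-empty.
check_provider_key() {
  var_name=$(key_var_for "$1") || return 2
  if [ -z "$var_name" ]; then
    return 0
  fi
  # Read the variable whose name is stored in var_name.
  eval "key_value=\${$var_name:-}"
  if [ -z "$key_value" ]; then
    echo "$var_name is not set" >&2
    return 1
  fi
  return 0
}
```

Calling check_provider_key zai returns a non-zero status and prints a message on stderr when ZAI_API_KEY is missing.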

Configuration Examples

Basic AgentController Configuration

Complete example with OpenAI integration

01-agentcontroller.yaml

apiVersion: ai.aik8s.io/v1alpha1
kind: AgentController
metadata:
  name: my-agent
  namespace: aik8s-system
spec:
  enablePredictiveEngine: true
  enableAutoRemediation: true
  enableKnowledgeGraph: true

  llm:
    provider: openai
    model: gpt-4
    maxTokens: 2000
    apiKeySecret:
      name: openai-key
      namespace: aik8s-system
      key: api-key

  autoRemediation:
    enableTier1: true
    enableTier2: true
    approval:
      platform: slack
      channel: "#ops-alerts"
      timeout: 5m
    rollbackTimeout: 10m

  observability:
    prometheusUrl: http://prometheus-operated.monitoring.svc.cluster.local:9090
    lokiUrl: http://loki.monitoring.svc.cluster.local:3100
    metricsInterval: 30s

  clusters:
  - name: prod-cluster
    region: us-west-2
  - name: staging-cluster
    region: us-west-2
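
The apiKeySecret block above points at a Kubernetes Secret named openai-key with a key called api-key in the aik8s-system namespace. This page does not show how that Secret is created, so the sketch below is one plausible approach using the standard kubectl create secret generic command. The function name and the KUBECTL override (useful for previewing the command with echo) are hypothetical.

```shell
# Create the Secret that spec.llm.apiKeySecret refers to.
# Usage: create_llm_secret <secret-name> <api-key-value>
create_llm_secret() {
  secret_name=$1
  api_key=$2
  if [ -z "$api_key" ]; then
    echo "API key is empty; export it first" >&2
    return 1
  fi
  # The data key "api-key" matches apiKeySecret.key in the manifest.
  "${KUBECTL:-kubectl}" create secret generic "$secret_name" \
    --namespace aik8s-system \
    --from-literal=api-key="$api_key"
}
```

For the manifest above, this would be invoked as create_llm_secret openai-key "$OPENAI_API_KEY".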

Z.AI Integration Example

Example with Z.AI GLM-4.7 as the LLM provider

02-zai-agentcontroller.yaml

# Example AgentController with Z.AI Coding Plan Integration
apiVersion: ai.aik8s.io/v1alpha1
kind: AgentController
metadata:
  name: zai-agent
  namespace: aik8s-system
spec:
  enablePredictiveEngine: true
  enableAutoRemediation: true
  enableKnowledgeGraph: true

  llm:
    provider: zai
    model: glm-4.7
    maxTokens: 2000
    apiKeySecret:
      name: zai-api-key
      namespace: aik8s-system
      key: api-key

  autoRemediation:
    enableTier1: true
    enableTier2: false
    approval:
      platform: slack
      channel: "#ops-alerts"
      timeout: 5m
    rollbackTimeout: 10m

  observability:
    prometheusUrl: http://prometheus-operated.monitoring.svc.cluster.local:9090
    lokiUrl: http://loki.monitoring.svc.cluster.local:3100
    metricsInterval: 30s

  clusters:
  - name: prod-cluster
    region: us-west-2
  - name: staging-cluster
    region: us-east-1
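
To try either manifest, one cautious workflow is to validate it against the API server before applying it. The sketch below assumes the AgentController CRD is already installed in the cluster. The function name and the KUBECTL override are hypothetical, while apply, --dry-run=server, and -f are standard kubectl options.

```shell
# Validate a manifest with a server-side dry run, then apply it.
# Usage: apply_agent <manifest-file>
apply_agent() {
  manifest=$1
  kubectl_bin=${KUBECTL:-kubectl}
  # The dry run sends the object to the API server without persisting it.
  "$kubectl_bin" apply --dry-run=server -f "$manifest" || return 1
  "$kubectl_bin" apply -f "$manifest"
}
```

For example, apply_agent 02-zai-agentcontroller.yaml stops after the dry run if the API server rejects the manifest.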

Need More Details?

Explore the complete API Reference for detailed CRD specifications, field definitions, and advanced configuration options.
