Hosted Kubernetes for AI

Run AI agents
on real K8s.
Pay by the minute.

Liveness checks, TLS certs, service mesh, ingress routing. All the things Kubernetes gives you for free, without the six weeks of setup. Built for builders shipping AI agents to production.

$ lattice deploy --gpu a100 --replicas 3

Provisioning 3x A100 pods...
✓ agent-worker-01 running (12s)
✓ agent-worker-02 running (14s)
✓ agent-worker-03 running (13s)
TLS cert issued, mesh route configured
→ https://my-agent.lattice.run

The Problem

Hosting AI agents shouldn't require a platform team

You've got 5 machines sitting idle 75% of the day. Your agents restart and nobody notices. Cert renewal is a cron job and a prayer. There's a better way.

Setup

SSH keys, VPS configs, 40-page runbooks

One command. Live in seconds.

Health

Agent crashed at 3am, nobody knew

K8s liveness probes restart automatically.

Networking

Manual nginx, Let's Encrypt, iptables

TLS, ingress, DNS. Handled.

Billing

Reserved instances burning cash overnight

Per-minute billing. Scale to zero.

Scale

SSH into each box, deploy one by one

Horizontal pod autoscaler. Done.

Orchestration

Custom queues, brittle webhook chains

Service mesh routes agent-to-agent traffic.

Built for Agents

Kubernetes primitives, zero Kubernetes pain

Service Mesh for Agent Orchestration

Agents that talk to other agents need more than HTTP. Lattice's built-in service mesh handles routing, retries, circuit breaking, and observability between your agents. mTLS by default. Trace every request from ingress to response.

agent-planner → mesh → agent-researcher
agent-planner → mesh → agent-writer
agent-writer → mesh → agent-reviewer

latency p99: 12ms
mTLS: enabled
traces: 100%
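
For a feel of what retries and circuit breaking mean in practice, here is a minimal Python sketch of the pattern. It is not Lattice's implementation, and a mesh applies this at the proxy layer rather than in your agent code. The class names, thresholds, and the injectable `clock` parameter are all illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and a call is rejected without being tried."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single failure now re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result


def call_with_retries(breaker, func, attempts=3, backoff=0.0):
    """Retry func through the breaker with exponential backoff; stop early if it opens."""
    last_error = None
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # the downstream agent is considered unhealthy; do not keep hammering it
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff * (2 ** attempt))
    raise last_error
```

The point of pushing this into the mesh is that agent-planner never has to carry code like this: the same policy applies to every agent-to-agent route.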

GPU Pods On Demand

Run your own models when API latency or cost doesn't cut it. Provision A100s, H100s, or consumer GPUs by the minute. Scale to zero when idle. No reserved instances, no commitments, no waste.

$ lattice gpu list

A100 80GB   $0.018/min   available
H100 80GB   $0.042/min   available
RTX 4090    $0.008/min   available
T4 16GB     $0.004/min   available
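
To see what per-minute billing with scale to zero adds up to, here is a rough Python sketch. The rates are copied from the list above. The formula (rate × pods × active minutes) is an assumption: it ignores any minimum charge, rounding rule, or storage and egress cost that the actual bill might include.

```python
from decimal import Decimal

# USD per pod-minute, copied from the `lattice gpu list` output above.
RATES_PER_MIN = {
    "A100 80GB": Decimal("0.018"),
    "H100 80GB": Decimal("0.042"),
    "RTX 4090": Decimal("0.008"),
    "T4 16GB": Decimal("0.004"),
}

MINUTES_PER_DAY = 24 * 60


def monthly_cost(gpu, pods, active_minutes_per_day, days=30):
    """Estimated cost, assuming you pay only for minutes in which pods are running."""
    total = RATES_PER_MIN[gpu] * pods * active_minutes_per_day * days
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Three A100 pods that are busy 25% of the day and scaled to zero otherwise.
    print(monthly_cost("A100 80GB", pods=3, active_minutes_per_day=360))   # 583.20
    # The same three pods left running around the clock.
    print(monthly_cost("A100 80GB", pods=3, active_minutes_per_day=MINUTES_PER_DAY))  # 2332.80
```

Under these assumptions, pods that sit idle 75% of the day cost a quarter of what always-on pods would.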

Real Kubernetes, No Guardrails

This isn't a PaaS pretending to be K8s. You get real pods, services, ConfigMaps, Secrets, and CRDs. Bring your own Helm charts. Run kubectl. Full control with none of the provisioning headache.

$ kubectl get pods

NAME              STATUS    AGE
planner-7d4f      Running   2h
researcher-a91c   Running   2h
writer-3b8e       Running   45m
gpu-infer-f2d1    Running   12m
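
Because these are ordinary Kubernetes objects, a standard manifest should be all an agent needs. The Python sketch below builds an `apps/v1` Deployment with an HTTP liveness probe and prints it as JSON, which `kubectl apply` accepts. The image name, port, `/healthz` path, and probe timings are placeholders, and it assumes your agent exposes an HTTP health endpoint. Whether Lattice expects any extra labels or annotations is not covered here.

```python
import json


def agent_deployment(name, image, replicas=1, port=8080, health_path="/healthz"):
    """Build a Deployment manifest whose container is restarted when the probe fails."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # The kubelet restarts the container after
                            # failureThreshold consecutive failed checks.
                            "livenessProbe": {
                                "httpGet": {"path": health_path, "port": port},
                                "initialDelaySeconds": 10,
                                "periodSeconds": 15,
                                "failureThreshold": 3,
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; replace with your own registry path.
    manifest = agent_deployment("planner", "registry.example.com/planner:1.0", replicas=2)
    print(json.dumps(manifest, indent=2))
```

Saved as `deployment.py` (a hypothetical filename), the output can be piped straight to the cluster with `python deployment.py | kubectl apply -f -`.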

Pricing

Pay for minutes, not months

No reserved instances. No annual commits. Your agents scale up, you pay more. They scale down, you pay less. Scale to zero, pay nothing.

CPU Agents

$0.002/min

Perfect for agents calling LLM APIs. Most agent workloads start here.

2 vCPU, 4GB RAM per pod
Auto-scaling included
Service mesh + TLS
Scale to zero

GPU Compute

$0.008/min

Run your own models: inference, fine-tuning, embeddings. Starting from T4.

T4 through H100 available
NVIDIA GPU Operator
Per-minute billing
Multi-GPU pods

Dedicated Cluster

Custom

Your own K8s cluster. Custom networking, compliance, SLAs.

Isolated control plane
Custom node pools
VPC peering
99.9% SLA

Run AI agents on real K8s. Pay by the minute.