Can LLMs Really Replace DevOps Work?
Modern DevOps workflows are repetitive, complex, and extremely automatable. So I asked a bold question:
Can a large language model (LLM) powered agent build, test, deploy, and monitor a real production app without human assistance?
To find out, I built an agentic DevOps system using GPT-4, LangChain, and a few plugin-like tool wrappers. I let it run an end-to-end DevOps workflow on a real application hosted on Azure.
In this post, I’ll walk through:
How I designed the LLM DevOps agent
What it succeeded at (with code examples)
Where it failed (security, observability, memory)
What to do if you’re adopting AI-driven DevOps today
Let’s dive into the real findings.
Architecture: How I Built the DevOps Agent
Stack Overview
LLM core: OpenAI GPT-4 (via API, temperature 0.2)
Agent orchestration: LangChain AgentExecutor with ReAct-style prompting
Tools:
DockerTool: Analyze repo structure and output a Dockerfile
CIConfigTool: Create GitHub Actions workflows
IaCTool: Scaffold Terraform or Bicep for cloud resources
LogTool: Parse logs, match error signatures, suggest fixes
The Application Target
Language: Java (Spring Boot, Maven)
Infra target: Azure Container Apps
Database: Azure PostgreSQL
Secrets: Azure Key Vault
The repo was public, and the LLM had full read/write access to the source, shell, and terminal commands via a secure proxy. I acted only as supervisor.
What Worked Surprisingly Well
1. Dockerfile Generation
The agent examined the repo, detected the Maven wrapper, and produced a working Dockerfile on the first try.
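The original file isn't reproduced here, but a representative multi-stage build for a Maven-wrapped Spring Boot app looks like this (the Java version and image tags are my assumptions, not the agent's exact output):

```dockerfile
# Build stage: use the Maven wrapper the agent detected
FROM eclipse-temurin:17-jdk AS build
WORKDIR /app
COPY . .
RUN ./mvnw -q package -DskipTests

# Runtime stage: slim JRE-only image
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```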
It even updated it later when I added environment variables and health checks.
2. CI/CD Pipeline via GitHub Actions
I gave the agent a single prompt:
“Set up CI/CD that builds the project, runs tests, builds a Docker image, and deploys to Azure.”
It generated a complete GitHub Actions workflow.
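The generated workflow isn't preserved in this post; a representative version for this stack might look like the following, where `myapp`, `myregistry`, and `my-rg` are placeholder names:

```yaml
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"

      - name: Build and test
        run: ./mvnw -q verify

      - name: Log in to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Build and push image
        run: az acr build --registry myregistry --image myapp:${{ github.sha }} .

      - name: Deploy to Azure Container Apps
        run: |
          az containerapp update \
            --name myapp --resource-group my-rg \
            --image myregistry.azurecr.io/myapp:${{ github.sha }}
```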
It even asked if I wanted to deploy to staging vs production.
3. Terraform + Bicep Scaffolding
Using only the app description and cloud target (Azure), the agent scaffolded:
Azure Container Registry
Azure Container Apps + managed identity
PostgreSQL with secret injection
Monitoring workspace
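The run's actual snippet wasn't preserved, but a representative Terraform resource for the Container Apps piece, with all names and sizes as placeholders, is:

```hcl
resource "azurerm_container_app" "app" {
  name                         = "myapp"
  resource_group_name          = azurerm_resource_group.rg.name
  container_app_environment_id = azurerm_container_app_environment.env.id
  revision_mode                = "Single"

  # Managed identity so the app can pull from ACR and read Key Vault
  identity {
    type = "SystemAssigned"
  }

  template {
    container {
      name   = "myapp"
      image  = "myregistry.azurecr.io/myapp:latest"
      cpu    = 0.5
      memory = "1Gi"
    }
  }
}
```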
Where It Broke (Critically)
1. Security Missteps
Injected secrets in plaintext YAML (AZURE_CREDENTIALS: xyz...)
No SameSite, Secure, or HttpOnly flags for cookies
Skipped RBAC for key services (Key Vault access via public IP)
Didn't audit for OWASP Top 10 or CVEs in dependencies
It lacked an opinionated security baseline and assumed permissive defaults.
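The cookie issue, for instance, is a one-time declarative fix in Spring Boot. A sketch in `application.yml`, using Spring Boot's `server.servlet.session.cookie` properties:

```yaml
server:
  servlet:
    session:
      cookie:
        http-only: true   # not readable from JavaScript
        secure: true      # only sent over HTTPS
        same-site: strict # not sent on cross-site requests
```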
2. Lack of Observability
Despite provisioning a Log Analytics workspace, it:
Forgot to bind logs to the app runtime
Didn’t enable metrics collection or log-based alert rules
Never wrote structured logs in app code (missed context)
I had to add structured JSON logging myself and wire it into Azure Monitor by hand.
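The logging change can be illustrated with a Logback configuration that emits JSON so Log Analytics can parse it. This is a sketch, not the project's exact file, and it assumes the `logstash-logback-encoder` dependency is on the classpath:

```xml
<!-- logback-spring.xml: emit structured JSON instead of plain lines -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```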
3. Context Drift and Memory Loss
After 2-3 iterations:
It forgot previous Terraform state names
It reverted Docker optimizations made earlier
It proposed redundant changes
LangChain's memory was insufficient for stateful workflows. I eventually used a Redis-backed conversation store, but the agent still couldn’t track state transitions in pipelines over time.
4. No Error Recovery
When a deployment failed because the target Azure resource group didn't exist, the LLM only replied:
“Would you like me to try again?”
It didn’t:
Detect that the group was missing
Suggest az group create
Retry with inferred parameters
This is where true AI-powered autonomous remediation still fails.
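A minimal recovery loop of the kind the agent lacked can be sketched in Python. The error pattern, `az` invocations, and default region here are my assumptions, not the agent's code:

```python
import re
import subprocess

# The Azure CLI reports a missing group roughly as:
# (ResourceGroupNotFound) Resource group 'my-rg' could not be found.
_GROUP_RE = re.compile(r"Resource group '([^']+)' could not be found")

def missing_group(stderr: str):
    """Return the name of the missing resource group, or None."""
    m = _GROUP_RE.search(stderr)
    return m.group(1) if m else None

def deploy_with_recovery(deploy_cmd, location="eastus"):
    """Run a deployment command; if it failed only because the
    resource group is absent, create the group and retry once."""
    result = subprocess.run(deploy_cmd, capture_output=True, text=True)
    group = missing_group(result.stderr)
    if result.returncode != 0 and group:
        # Remediate: create the group, then retry the original command
        subprocess.run(["az", "group", "create", "-n", group,
                        "-l", location], check=True)
        result = subprocess.run(deploy_cmd, capture_output=True, text=True)
    result.check_returncode()  # surface any unrecoverable failure
    return result
```

Even this trivial detect-remediate-retry pattern is more than the agent attempted on its own.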
Key Lessons for DevOps Engineers
Use LLMs for:
Bootstrapping IaC and CI/CD templates
Dockerfile creation and language detection
Converting bash scripts into portable workflows
Generating docs from code
Avoid LLMs for:
Security enforcement or audit trails
Identity & access provisioning
Long-term pipeline state management
Observability architecture
Incident detection & rollback
Final Verdict: AI Helps, but Doesn’t Replace
The LLM agent helped me build faster. I spent less time writing boilerplate and more time reviewing logic. But when it broke, it broke hard.
Today, AI can generate 60–80% of your DevOps scaffold. But the remaining fraction? That’s the part where real SREs, security engineers, and ops teams earn their keep.
Don’t fear the LLM. Train it. Supervise it. And never deploy blindly.