Saturday, July 12, 2025

I Tried Replacing My DevOps Workflow with an LLM Agent — Here's What Broke

Can LLMs Really Replace DevOps Work?

Modern DevOps workflows are repetitive, complex, and highly automatable. So I asked a bold question:

Can an agent powered by a large language model (LLM) build, test, deploy, and monitor a real production app without human assistance?

To find out, I built an agentic DevOps system using GPT-4, LangChain, and a few plugin-like tool wrappers. I let it run an end-to-end DevOps workflow on a real application hosted on Azure.

In this post, I’ll walk through:

  • How I designed the LLM DevOps agent

  • What it succeeded at (with code examples)

  • Where it failed (security, observability, memory)

  • What to do if you’re adopting AI-driven DevOps today

Let’s dive into the real findings.

Architecture: How I Built the DevOps Agent

Stack Overview

  • LLM core: OpenAI GPT-4 (via API, temperature 0.2)

  • Agent orchestration: LangChain AgentExecutor with ReAct-style prompting

  • Tools:

    • DockerTool: Analyze repo structure and output Dockerfile

    • CIConfigTool: Create GitHub Actions workflows

    • IaCTool: Scaffold Terraform or Bicep for cloud resources

    • LogTool: Parse logs, match error signatures, suggest fixes
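The tool layer above can be pictured as a small registry of named callables that the agent picks from by name. The sketch below is plain Python rather than LangChain's own API. The `Tool` dataclass, `parse_action`, `dispatch`, and the stubbed tool bodies are hypothetical stand-ins, and the `Action:` / `Action Input:` layout is the conventional ReAct text format, assumed here rather than taken from the original setup.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    """A named callable the agent may invoke (hypothetical wrapper)."""
    name: str
    description: str
    run: Callable[[str], str]


# Stub bodies only: real tools would inspect a repo or shell out to a CLI.
TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        Tool("DockerTool", "Analyze repo structure and output a Dockerfile",
             lambda arg: f"[dockerfile for {arg}]"),
        Tool("CIConfigTool", "Create GitHub Actions workflows",
             lambda arg: f"[workflow for {arg}]"),
        Tool("IaCTool", "Scaffold Terraform or Bicep for cloud resources",
             lambda arg: f"[iac for {arg}]"),
        Tool("LogTool", "Parse logs, match error signatures, suggest fixes",
             lambda arg: f"[log analysis of {arg}]"),
    )
}

# ReAct-style output: a tool name, then a single-line input on the next line.
ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.*)")


def parse_action(llm_output: str) -> Optional[Tuple[str, str]]:
    """Extract (tool_name, tool_input) from the model's text, if present."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def dispatch(llm_output: str) -> str:
    """Run the requested tool and format the result as an observation."""
    parsed = parse_action(llm_output)
    if parsed is None:
        return "Observation: no action found"
    name, arg = parsed
    tool = TOOLS.get(name)
    if tool is None:
        return f"Observation: unknown tool {name!r}"
    return f"Observation: {tool.run(arg)}"
```

Returning an explicit observation for unknown tools, instead of raising, gives the model a chance to correct itself on the next step.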

The Application Target

  • Language: Java (Spring Boot, Maven)

  • Infra target: Azure Container Apps

  • Database: Azure PostgreSQL

  • Secrets: Azure Key Vault

The repo was public, and the LLM had full read/write access to the source and could run shell commands through a secure proxy. I acted only as supervisor.

What Worked Surprisingly Well

1. Dockerfile Generation

The agent examined the repo, detected the Maven wrapper, and wrote this on the first try:

FROM eclipse-temurin:17-jdk
WORKDIR /app
COPY . /app
RUN ./mvnw clean package -DskipTests
CMD ["java", "-jar", "target/app.jar"]

It even updated it later when I added environment variables and health checks.
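Detecting the Maven wrapper is the kind of check a Dockerfile tool could run before asking the model to write anything. Here is a minimal sketch of that idea, not the agent's actual logic: the marker table and both function names are assumptions made for illustration.

```python
import os
from typing import Iterable, Optional

# Marker files in priority order: a checked-in wrapper wins over a bare build file.
MARKERS = [
    ("mvnw", "./mvnw clean package -DskipTests"),
    ("pom.xml", "mvn clean package -DskipTests"),
    ("gradlew", "./gradlew build -x test"),
    ("build.gradle", "gradle build -x test"),
]


def pick_build_command(filenames: Iterable[str]) -> Optional[str]:
    """Return the build command for the first marker found, or None."""
    present = set(filenames)
    for marker, command in MARKERS:
        if marker in present:
            return command
    return None


def detect_build_command(repo_root: str) -> Optional[str]:
    """Inspect the top level of a checked-out repo (hypothetical entry point)."""
    return pick_build_command(os.listdir(repo_root))
```

Keeping the decision in `pick_build_command` makes it easy to test without touching the filesystem.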

2. CI/CD Pipeline via GitHub Actions

I gave the agent a single prompt:

“Set up CI/CD that builds the project, runs tests, builds a Docker image, and deploys to Azure.”

It generated a GitHub Actions workflow covering those stages.

It even asked whether I wanted to deploy to staging or production.

3. Terraform + Bicep Scaffolding

Using only the app description and cloud target (Azure), the agent scaffolded:

  • Azure Container Registry

  • Azure Container Apps + managed identity

  • PostgreSQL with secret injection

  • Monitoring workspace

Where It Broke (Critically)

1. Security Missteps

  • Injected secrets in plaintext YAML (AZURE_CREDENTIALS: xyz...)

  • No SameSite, Secure, or HttpOnly flags for cookies

  • Skipped RBAC for key services (Key Vault access via public IP)

  • Didn't audit for OWASP Top 10 or CVEs in dependencies

It lacked an opinionated security baseline and assumed permissive defaults.
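One cheap guardrail for the plaintext-secret problem is to lint generated workflow YAML before it is committed. The sketch below is a rough, line-based heuristic under stated assumptions: the list of sensitive key fragments and the function name are illustrative, and it only recognizes GitHub Actions' `${{ secrets.NAME }}` expression as a safe value.

```python
import re
from typing import List

# Key name fragments that suggest a credential (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(r"(SECRET|PASSWORD|TOKEN|CREDENTIALS|KEY)", re.IGNORECASE)
# A value that is exactly one reference into the GitHub Actions secrets context.
SECRET_REF = re.compile(r"^\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*\s*\}\}$")
# A simple "key: value" line, optionally starting a YAML list item.
KEY_VALUE = re.compile(r"^\s*(?:-\s*)?([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(.+?)\s*$")


def find_plaintext_secrets(yaml_text: str) -> List[str]:
    """Flag sensitive-looking keys whose value is a literal, not a secrets reference."""
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = KEY_VALUE.match(line)
        if match is None:
            continue  # not a simple key/value line (or the key has no inline value)
        key = match.group(1)
        value = match.group(2).strip("'\"")
        if SENSITIVE_KEY.search(key) and not SECRET_REF.match(value):
            findings.append(f"line {lineno}: {key} is set to a literal value")
    return findings
```

A check like this will produce false positives and misses multi-line values, so it works best as a pre-commit tripwire rather than a replacement for a proper secret scanner.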

2. Lack of Observability

Despite provisioning a Log Analytics workspace, it:

  • Forgot to bind logs to the app runtime

  • Didn’t enable metrics collection or log-based alert rules

  • Never wrote structured logs in app code (missed context)

I had to add the missing pieces by hand and hook them into Azure Monitor myself.
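To show what "structured logs with context" means in practice, here is a small sketch of a JSON log formatter. The application in this experiment was Java, so this Python version only illustrates the shape of the output. The `JsonFormatter` class and the `context` attribute convention are assumptions, not what was added to the app.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Callers attach context via logger.info(..., extra={"context": {...}}).
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload["context"] = context
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

With one JSON object per line, a log pipeline can filter on fields such as a revision or request ID instead of pattern-matching free text.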

3. Context Drift and Memory Loss

After 2-3 iterations:

  • It forgot previous Terraform state names

  • It reverted Docker optimizations made earlier

  • It proposed redundant changes

LangChain's memory was insufficient for stateful workflows. I eventually used a Redis-backed conversation store, but the agent still couldn’t track state transitions in pipelines over time.
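One way to reduce this kind of drift is to keep the facts that must not change (state names, base images, earlier decisions) outside the conversation and re-inject them into every prompt. The sketch below is a minimal illustration of that idea, not the Redis-backed store mentioned above. The `PipelineFacts` class, its optional JSON file, and the example keys are all hypothetical.

```python
import json
import os
from typing import Dict, Optional


class PipelineFacts:
    """Small key/value store for decisions the agent should treat as fixed."""

    def __init__(self, path: Optional[str] = None) -> None:
        self.path = path  # None keeps everything in memory
        self.facts: Dict[str, str] = {}
        if path is not None and os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.facts = json.load(fh)

    def remember(self, key: str, value: str) -> None:
        """Record a fact and persist it if a file path was given."""
        self.facts[key] = value
        if self.path is not None:
            with open(self.path, "w", encoding="utf-8") as fh:
                json.dump(self.facts, fh, indent=2, sort_keys=True)

    def as_prompt_preamble(self) -> str:
        """Format the facts as text to prepend to each agent prompt."""
        if not self.facts:
            return ""
        lines = [f"- {key}: {value}" for key, value in sorted(self.facts.items())]
        header = "Known pipeline facts (do not change without being asked):"
        return header + "\n" + "\n".join(lines)
```

This does not give the model real memory. It only makes the important state explicit on every turn, which is often enough to stop it from renaming resources or undoing earlier optimizations.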

4. No Error Recovery

When a deployment failed because the target resource group did not exist, the LLM only replied:

“Would you like me to try again?”

It didn’t:

  • Detect that the group was missing

  • Suggest az group create

  • Retry with inferred params

This is where true AI-powered autonomous remediation still fails.
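The remediation the agent missed can be approximated with a plain error-signature table, which is roughly what a log tool could consult before handing control back to a human. The sketch below assumes the failure surfaces as Azure's `ResourceGroupNotFound` error with the group name in single quotes. The sample message in the usage note is illustrative, not the error from this experiment, and `suggest_remediation` is a hypothetical helper.

```python
import re
import shlex
from typing import Optional

# Assumed shape of the Azure error text for a missing resource group.
MISSING_GROUP = re.compile(
    r"ResourceGroupNotFound.*?Resource group '([^']+)' could not be found",
    re.DOTALL,
)


def suggest_remediation(error_text: str, location: str) -> Optional[str]:
    """Map a known error signature to a CLI command a supervisor can approve."""
    match = MISSING_GROUP.search(error_text)
    if match is None:
        return None  # unknown failure: escalate instead of guessing
    group = shlex.quote(match.group(1))
    return f"az group create --name {group} --location {shlex.quote(location)}"
```

For example, calling `suggest_remediation("(ResourceGroupNotFound) Resource group 'rg-demo' could not be found.", "westeurope")` yields an `az group create` command for `rg-demo`. The function only suggests the command, which keeps a human in the approval loop.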

Key Lessons for DevOps Engineers

Use LLMs for:

  • Bootstrapping IaC and CI/CD templates

  • Dockerfile creation and language detection

  • Converting bash scripts into portable workflows

  • Generating docs from code

Avoid LLMs for:

  • Security enforcement or audit trails

  • Identity & access provisioning

  • Long-term pipeline state management

  • Observability architecture

  • Incident detection & rollback

Final Verdict: AI Helps, but Doesn’t Replace

The LLM agent helped me build faster. I spent less time writing boilerplate and more time reviewing logic. But when it broke, it broke hard.

Today, AI can generate 60–80% of your DevOps scaffold. But the remaining 20–40%? That’s the part where real SREs, security engineers, and ops teams earn their keep.

Don’t fear the LLM. Train it. Supervise it. And never deploy blindly.

About Cerebrix

Smarter Technology Journalism.

Explore the technology shaping tomorrow with Cerebrix — your trusted source for insightful, in-depth coverage of engineering, cloud, AI, and developer culture. We go beyond the headlines, delivering clear, authoritative analysis and feature reporting that helps you navigate an ever-evolving tech landscape.

From breaking innovations to industry-shifting trends, Cerebrix empowers you to stay ahead with accurate, relevant, and thought-provoking stories. Join us to discover the future of technology — one article at a time.

2025 © CEREBRIX. Design by FRANCK KENGNE.