Lucas Reiners

The Problem with My Previous Solution

A few days ago, I wrote about a PoC for automating our release changelog process using an AI assistant. The results were impressive - what took 8 hours of manual work was reduced to minutes. The workflow worked: fetch commits from GitHub, parse semantic commit messages, enrich with Jira context, and use an LLM to polish the output into customer-friendly language.

But there was a problem I didn’t fully acknowledge in that post: the solution wasn’t production-ready.

The AI assistant I used was convenient for prototyping, but it came with serious limitations for a system I wanted to rely on long-term:

Security concerns: While the assistant runs locally, it has broad system access with no sandboxing. A published CVE (CVE-2026-25253) revealed a one-click remote code execution vulnerability affecting 770,000+ agent installations. Even more concerning, supply chain attacks compromised roughly 20% of the skill marketplace - 800+ malicious skills were planted, with some of the most popular skills designed for remote prompt injection and code execution. Cisco’s security team called it “an absolute nightmare.”

Lack of control: The entire workflow lived inside the assistant’s environment. I couldn’t version control it properly, couldn’t run it in CI/CD, couldn’t monitor it, and couldn’t debug when things went wrong. It was a black box.

No audit trail: When the changelog generation failed or produced incorrect results, I had no way to trace what went wrong. Was it the Git parsing? The Jira API call? The LLM prompt? No logs, no observability.

The experiment proved the concept worked. Now I needed to rebuild it the right way.

Rebuilding with LangChain and AWS Bedrock

I decided to reimplement the entire workflow as a proper TypeScript application using LangChain as the orchestration framework and AWS Bedrock for the LLM capabilities. Here’s why these technologies made sense:

Why LangChain?

LangChain provides a structured framework for building LLM-powered applications. Rather than manually managing prompts and API calls, it offers:

  • Chain abstractions for composing multi-step workflows
  • Built-in retry logic and error handling
  • Standardized interfaces for different LLM providers
  • Memory and context management for multi-turn operations
  • Observability hooks for logging and monitoring

Most importantly, LangChain is just code. I can version control it, test it, deploy it, and understand exactly what it’s doing at every step.
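To make the "it's just code" point concrete, here is a plain-TypeScript sketch of what LangChain's built-in retry abstraction roughly replaces: retrying a flaky async LLM call with exponential backoff. The function name, attempt count, and delays are illustrative choices, not LangChain's actual API.

```typescript
// Sketch of retry-with-backoff, the kind of plumbing LangChain provides out of
// the box. Defaults (3 attempts, 500ms base delay) are illustrative assumptions.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ... before the next attempt.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Writing this by hand once is easy; writing it correctly around every Bedrock and Jira call is exactly the kind of repetition a framework should absorb.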

Architecture Philosophy: Structured Pipelines vs. Agentic Tools

One of the most important architectural decisions I made was how to integrate the LLM into the workflow. LangChain supports two fundamentally different approaches:

The Structured Pipeline Approach

To start with, I decided to build a deterministic pipeline where traditional code handles all data retrieval and orchestration, and the LLM is invoked only for the specific tasks it excels at.

Stage 1: Deterministic Data Collection (No LLM)

  • Fetch commits between two refs using the GitHub API
  • Parse semantic commit messages with regex to extract ticket IDs
  • Query Jira’s API in parallel batches to enrich each commit with ticket context
  • All of this is plain TypeScript with explicit error handling, retries, and logging
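The parsing step in Stage 1 might look roughly like this. The conventional-commit regex and the "PROJ-123"-style Jira ticket format are assumptions for illustration, not the original code:

```typescript
// Sketch of Stage 1 parsing: extract the semantic type, optional scope, and a
// Jira-style ticket ID from a commit message. Ticket format is an assumption.
interface ParsedCommit {
  type: string;      // feat, fix, chore, ...
  scope?: string;    // e.g. "auth" in "feat(auth): ..."
  ticketId?: string; // e.g. "PROJ-123"
  subject: string;
}

const COMMIT_RE = /^(\w+)(?:\(([^)]+)\))?!?:\s*(.+)$/;
const TICKET_RE = /\b([A-Z][A-Z0-9]+-\d+)\b/;

function parseCommitMessage(message: string): ParsedCommit | null {
  const firstLine = message.split("\n")[0];
  const match = COMMIT_RE.exec(firstLine);
  if (match === null) return null; // not a semantic commit; log and skip it
  const ticket = TICKET_RE.exec(message);
  return {
    type: match[1],
    scope: match[2],
    ticketId: ticket?.[1],
    subject: match[3],
  };
}
```

Because this is plain code, a malformed commit message fails loudly in one known place instead of silently confusing an LLM downstream.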

Stage 2: Per-Commit Summarization (LLM)

  • For each commit, invoke Bedrock once with a carefully crafted prompt
  • Input: the raw commit message + the full Jira ticket description
  • Output: a user-friendly changelog entry that extracts the business value
  • This happens in parallel batches to enhance throughput
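The parallel-batch pattern used here (and for the Jira enrichment in Stage 1) can be sketched as a small generic helper. The batch size and the shape of the worker function are illustrative, not the original implementation:

```typescript
// Sketch of bounded parallelism: items within a batch run concurrently,
// batches run sequentially, so Bedrock and Jira never see unbounded fan-out.
async function inBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

Usage would look like `await inBatches(commits, 5, summarizeCommit)`, where `summarizeCommit` is the hypothetical per-commit Bedrock call.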

Stage 3: Final Polish (LLM)

  • Take all the summarized entries and invoke Bedrock one final time
  • Ensure consistent tone, fix grammar, remove redundancies
  • Output: the final changelog ready to publish
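Assembling the input for this final call is again plain string handling; the LLM only sees one well-formed prompt. The prompt wording below is an illustrative assumption:

```typescript
// Sketch of the Stage 3 prompt assembly: combine the per-commit entries into
// a single instruction for the final Bedrock polish call.
function buildPolishPrompt(entries: string[]): string {
  return [
    "You are editing a customer-facing changelog.",
    "Ensure a consistent tone, fix grammar, and remove redundant entries.",
    "Entries:",
    ...entries.map((entry) => `- ${entry}`),
  ].join("\n");
}
```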

The Agentic Approach

The “modern” way to build LLM applications is to give the model a set of tools—functions it can call to interact with external systems—and let it reason about which tools to use and when. In my case, this would mean:

  • Providing the LLM with tools like fetch_git_commits, get_jira_ticket, parse_commit_message
  • Giving it a high-level goal: “Generate a changelog between v1.0.0 and v2.0.0”
  • Letting the LLM decide: “First I’ll fetch commits, then for each commit I’ll parse the ticket ID, then I’ll fetch that Jira ticket, then I’ll summarize…”

This is powerful and flexible. The LLM can adapt its strategy based on what it discovers. If a Jira API call fails, it might retry with different parameters or skip that entry. It’s autonomous.

But it’s also unpredictable, more expensive, and probably slower.

Every decision the LLM makes requires a round-trip to Bedrock. Every tool call is another LLM invocation. For a workflow processing lots of commits, you’re looking at several LLM calls just for orchestration, on top of the actual summarization work. The latency adds up. The costs skyrocket. And worst of all: debugging becomes way more difficult, because the execution path changes every time based on the model’s reasoning.
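For comparison, a minimal sketch of the tool surface an agentic version would expose. The tool names come from the list above; the schemas, stub handlers, and dispatch function are hypothetical:

```typescript
// Sketch of an agent's tool registry. Handlers are stubs standing in for the
// real GitHub/Jira calls; the agent loop itself is not shown.
type ToolHandler = (args: Record<string, string>) => Promise<string>;

const tools: Record<string, { description: string; handler: ToolHandler }> = {
  fetch_git_commits: {
    description: "List commits between two refs",
    handler: async ({ from, to }) => `commits ${from}..${to}`, // stub
  },
  get_jira_ticket: {
    description: "Fetch a Jira ticket by ID",
    handler: async ({ id }) => `ticket ${id}`, // stub
  },
  parse_commit_message: {
    description: "Extract the Jira ticket ID from a commit message",
    handler: async ({ message }) => /\b[A-Z]+-\d+\b/.exec(message)?.[0] ?? "",
  },
};

// The model picks a tool by name; every pick costs one more LLM round-trip.
async function callTool(name: string, args: Record<string, string>): Promise<string> {
  const tool = tools[name];
  if (tool === undefined) throw new Error(`Unknown tool: ${name}`);
  return tool.handler(args);
}
```

Note that each `callTool` decision originates from a Bedrock invocation, which is exactly where the orchestration cost and nondeterminism come from.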

What’s Next: From Structured Pipelines to Adaptive Agents

The deterministic pipeline works perfectly for the current use case: generate a changelog between two specific version tags. But it’s inflexible by design.

What a user might want to know instead:

  • “What changed in February?”
  • “Were there any authentication-related changes since the last release?”
  • “Show me all breaking changes since version [xyz]”
  • “What bug fixes are coming in the next release?”

The system only understands one command pattern because the workflow is hardcoded. To answer these natural questions, I’d need to write separate pipelines for each query type.

An agent-based system could handle these queries naturally. Instead of a fixed pipeline, I’d give the LLM tools to access Git and Jira and let it reason about how to answer: find the commits in a specific time range, or all commits of a specific semantic type, and then do the Jira research for those commits.

While it will be harder to implement and debug, it would be far more flexible and user-friendly. The LLM could adapt to new query types without any new code. It would be a true AI assistant for release management, not just a changelog generator.