How I Built an AI LinkedIn Post Generator with Autonomous Agents
A deep dive into building Narrative AI — an AI agent system that generates authentic LinkedIn posts using research, web search, and structured prompts. Learn the architecture, agent design patterns, and lessons from building a real-world AI content system.
Most AI writing tools have the same problem: they sound like AI writing tools. You ask for a LinkedIn post, you get something that starts with "In today's rapidly evolving landscape…" and you immediately close the tab. The content is fluent, technically correct, and completely hollow.
That's the problem I set out to solve with Narrative AI. Not "make LinkedIn posts faster" — that's trivial. The harder problem is: can a system produce content that carries your actual perspective, grounded in real current events, that a professional colleague would read and think that sounds like them?
The answer required rethinking the generation pipeline from scratch. Instead of a single LLM call with a long system prompt, I built a coordinated team of specialized agents — each owning a distinct phase of the content creation process. This post walks through the architecture, the design decisions that weren't obvious, what didn't work, and what I'd do differently.
Background: Why Single-Agent Approaches Fail at Content Quality
The naive approach to AI content generation is a single prompt: "Here's my topic, here's my tone, write me a LinkedIn post." Tools like this exist in abundance. They're fast, cheap, and produce text that's immediately recognizable as AI-generated — not because of specific word choices, but because of what's missing: recency, specificity, and the author's particular angle on a topic.
Three distinct failure modes drove the design of Narrative AI:
Hallucinated context. A single LLM call has no access to current events. Ask it to write about AI regulations and it will confidently synthesize something plausible-sounding but temporally unmoored. LinkedIn audiences in professional niches will notice immediately.
Undifferentiated voice. Without a specialized writing phase, the model defaults to its training distribution — which means it sounds like the median of everything it was trained on. Authoritative, but generic.
No editorial pass. First-draft LLM output tends toward over-qualification, hedging, and awkward transitions. A human writer drafts, steps back, and edits with different eyes. A single-agent pipeline can't separate these concerns.
Core constraint: The target output isn't just readable — it has to survive the "did a human write this?" test when read by a professional in the author's own industry. That's a much higher bar than grammatical correctness.
The insight was to decompose what a skilled human content creator actually does into discrete, composable agents — each optimized for one job.
The Architecture: A Three-Agent Team on Agno
Narrative AI is built on the Agno Framework v2.0 (formerly Phidata), a Python-native runtime for multi-agent systems with first-class support for agent memory, tool integration, structured output, and session persistence. Every agent runs Google Gemini as its underlying model.
The key architectural decision was to use a coordinator pattern: a Team Leader agent that orchestrates three specialists. Each specialist has a narrow responsibility and only receives the output of the previous stage plus its own instructions — not the full task context.
```
User Input (Topic · Tone · Audience)
                │
                ▼
      ┌──────────────────┐
      │   Team Leader    │ ← Orchestration, routing, final assembly
      └────────┬─────────┘
               │
     ┌─────────┼─────────┐
     ▼         ▼         ▼
 Research    Writing    Editor
  Agent       Agent      Agent
     │          │          │
  Brave      Pydantic   Format +
  Search     Structured Hashtags
  + Jina     Output     + Polish
                │
                ▼
┌──────────────────────────────────┐
│           Tools Layer            │
│  Brave API · Jina · SQLite/Neon  │
│    FastAPI Playground :7777      │
└──────────────────────────────────┘
```
The Team Leader routes between specialists sequentially. Tools are attached to specific agents rather than shared globally — this prevents the Writing Agent from attempting searches mid-generation instead of working with what Research already surfaced.
The decision to give each agent only the tools it needs was intentional. Tight tool scoping forces cleaner agent boundaries and far more predictable behavior. When debugging a bad output, you can isolate which agent in the chain introduced the problem.
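To make the routing concrete, here is a minimal sketch of the coordinator pattern in plain Python. The `Agent` dataclass and `team_leader` function are stand-ins for illustration, not Agno's actual API; the point is that each specialist carries its own scoped tool list and only ever sees the previous stage's output.

```python
# Minimal sketch of the coordinator pattern (plain Python, not Agno's API).
# Each "agent" is a callable with its own scoped tools; the leader passes
# the previous stage's output forward, never the full task context.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    run: Callable[[str], str]  # takes prior stage output, returns its own
    tools: list[str] = field(default_factory=list)  # scoped, never shared globally


def team_leader(user_input: str, specialists: list[Agent]) -> str:
    """Route sequentially: each specialist sees only the previous output."""
    payload = user_input
    for agent in specialists:
        payload = agent.run(payload)
    return payload


# Stub specialists standing in for Research / Writing / Editor
research = Agent("Research", lambda s: s + " +research", tools=["brave", "jina"])
writing = Agent("Writing", lambda s: s + " +draft", tools=["pydantic_output"])
editor = Agent("Editor", lambda s: s + " +polish", tools=["format", "hashtags"])

print(team_leader("topic", [research, writing, editor]))
# → "topic +research +draft +polish"
```

Because the tool lists live on the agents rather than the team, a misbehaving stage can only misuse its own tools, which is exactly what makes the debugging isolation described above possible.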
Deep Dive: The Research Agent
The Research Agent's job is to make the final post feel current. It receives the user's topic and target audience, runs targeted searches via the Brave Search API, then uses Jina's async scraper to fetch and extract actual article content — not just headlines.
The non-obvious part: raw web content is noisy. Full articles contain navigation, comments, ads, cookie banners. Jina's reader mode strips these down to structured markdown, but the agent still needs to synthesize across multiple sources and surface only the 3–5 most relevant data points — not dump everything into the next agent's context window.
```python
# JinaToolAsync.py
# Wraps Jina's reader endpoint for concurrent article extraction
import asyncio

import aiohttp


class JinaToolAsync:
    async def fetch_article(self, url: str) -> str:
        jina_url = f"https://r.jina.ai/{url}"
        headers = {"Accept": "application/json"}
        async with aiohttp.ClientSession() as session:
            async with session.get(jina_url, headers=headers) as resp:
                data = await resp.json()
                # Return clean markdown — no boilerplate
                return data["data"]["content"]

    async def fetch_all(self, urls: list[str]) -> list[str]:
        # Concurrent fetches — research doesn't wait serially
        tasks = [self.fetch_article(url) for url in urls]
        return await asyncio.gather(*tasks)
```
Running article fetches concurrently was important for latency. Waiting serially on 4–5 article fetches before writing begins would add 8–15 seconds to the pipeline. With asyncio.gather(), the research phase collapses to the slowest individual fetch — typically 2–3 seconds.
Deep Dive: Structured Output with Pydantic
One of the subtler architectural wins was constraining the Writing Agent's output using Pydantic models rather than free-form text. LLMs, even good ones, drift in structure if you let them. Asking for "a LinkedIn post with a hook, body, and CTA" in a system prompt gets you varying interpretations across runs.
Defining a PostDraft schema forces the model to populate specific fields, making the Editor Agent's job deterministic — it always receives the same envelope, even if the content varies.
```python
# structured_models.py
from pydantic import BaseModel, Field


class PostDraft(BaseModel):
    hook: str = Field(
        description="First 1-2 sentences. Must create tension or curiosity."
    )
    body: str = Field(
        description="3-5 paragraphs. Include at least one research-backed insight."
    )
    call_to_action: str = Field(
        description="1 sentence that invites a specific response from the reader."
    )
    suggested_hashtags: list[str] = Field(
        description="3-5 relevant hashtags. No generic ones like #AI or #Tech."
    )
    tone_note: str = Field(
        description="Internal note to Editor about intended tone/voice."
    )
```
The tone_note field is worth highlighting. It's not visible to the end user — it's a structured communication channel between the Writing Agent and the Editor. The Writing Agent can signal "this is intentionally provocative, preserve the edge" and the Editor knows not to sand it down into corporate blandness. Agent-to-agent communication doesn't have to be implicit.
Deep Dive: Session Memory with SQLite and Neon
A content tool is useless if it can't remember what you've written before. Repeating the same angles, hooks, or hashtag clusters across 10 posts destroys the illusion of authentic voice. Narrative AI's agent memory is backed by SQLite by default, with optional NeonDB or Supabase for production deployments.
```python
# narrativeai_run.py (simplified)
from agno.storage.sqlite import SqliteStorage
from agno.storage.neon import NeonStorage
import os


def get_storage():
    if os.getenv("USE_NEONDB") == "True":
        return NeonStorage(connection_string=os.getenv("NEON_CONNECTION_STRING"))
    if os.getenv("USE_SUPABASE") == "True":
        return SupabaseStorage(url=os.getenv("SUPABASE_URL"), key=os.getenv("SUPABASE_KEY"))
    # Default: local SQLite, zero config
    return SqliteStorage(db_path="narrative_sessions.db")


team = AgentTeam(
    name="NarrativeAI",
    storage=get_storage(),
    # Agents retain memory of prior sessions
    memory_mode="persistent",
)
```
The storage abstraction is clean: swap the provider via .env without touching agent code. For local development, SQLite requires no provisioning. For a deployed version with multiple users, Neon or Supabase handles concurrent sessions and persistence across restarts.
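Reduced to stubs, the mechanism is small enough to test directly. The class names below are placeholders, not Agno's storage classes; the point is that the chosen backend is a pure function of environment variables, so agent code never changes:

```python
# Stubbed version of the provider-swap pattern. Class names are placeholders,
# not real Agno storage classes; only the selection mechanism is shown.
import os


class SqliteStub:
    backend = "sqlite"


class NeonStub:
    backend = "neon"


def pick_storage() -> object:
    # Backend selection depends only on the environment, never on agent code
    if os.getenv("USE_NEONDB") == "True":
        return NeonStub()
    return SqliteStub()


os.environ.pop("USE_NEODB", None) if False else os.environ.pop("USE_NEONDB", None)
print(pick_storage().backend)  # zero-config local default: "sqlite"

os.environ["USE_NEONDB"] = "True"
print(pick_storage().backend)  # flipped via .env only: "neon"
```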
Results
The pipeline is slower than a single LLM call — that's the expected cost of coordination. But the qualitative gap is large enough that users consistently chose the slower, higher-quality output. The most important number: the tool compresses a 45-minute workflow (research → draft → edit → format) into about 9 minutes, with user time mostly spent on prompt refinement and final review.
The honest caveat: the "sounds like me" rating is self-reported by early adopters and highly sensitive to how much context the user provides in their initial prompt. The more detail in, the better the output out. Sparse prompts still produce generic results.
What We Tried That Didn't Work
A single orchestrator doing everything. The first version had one agent with all tools — search, scrape, write, edit — and a very long system prompt. Output quality was inconsistent. The model would sometimes skip research entirely and fabricate citations. Decomposing into specialists and removing tool access from agents that didn't need it fixed this completely.
Parallel agent execution. An early design ran Research, Writing, and Editor in parallel, then merged outputs. The Writing Agent — with no research results yet — would begin generating and anchor its entire draft before the research arrived. Merging conflicting drafts was worse than sequential generation. The pipeline is strictly sequential now, which costs some latency but produces coherent output.
Lesson: Multi-agent parallelism is only safe when agents operate on independent subproblems. When one agent's output is another's input, you need sequential coordination, not a race.
Using a shared memory store naively. Early on, all agents wrote to and read from the same session store. The Editor would sometimes read the Research Agent's raw scraped data and try to incorporate it directly, producing posts with awkward news-snippet insertions. Scoping memory by agent role — Research writes to its own namespace, Writing reads only the curated research summary — eliminated this immediately.
Takeaways
Decompose by concern, not by capability. The instinct is to give one powerful agent all the tools and let it figure out the workflow. This leads to unpredictable, hard-to-debug behavior. Narrower agents with single responsibilities are more reliable and far easier to iterate on.
Structured output is a first-class design primitive. Pydantic models aren't just for validation — they're a coordination mechanism between agents. If you're passing freeform text between agents and parsing it downstream, you're accumulating technical debt. Define the contract first.
The storage layer determines whether a tool is a toy or a product. Local SQLite is fine for a demo. The moment you want persistent memory across sessions, user isolation, or horizontal scaling, you need an external store. Building the abstraction from day one (environment-configurable provider) made adding NeonDB and Supabase trivial later.
Give your agents a way to communicate with each other, not just with the user. The tone_note field in PostDraft — invisible to users, read only by the Editor Agent — was a small addition that meaningfully improved voice consistency. Model agent-to-agent communication explicitly; don't rely on implicit context passing.
What's Next
The core pipeline works. What it doesn't yet do well is learn your style over time. Today, voice consistency comes from the system prompt. A better version would maintain a user-specific style profile — extracted from their past posts, refined with each generation — and inject it into the Writing Agent's context rather than asking users to re-explain their tone every session.
The other open problem is evaluation. Right now, "sounds like me" is self-reported. A more rigorous approach would fine-tune a small classifier on each user's historical LinkedIn posts and use it as an automated quality gate before output reaches the Editor.
If you're building in this space or running into similar agent coordination problems, open an issue on GitHub — I'd genuinely like to compare architectures.
The project is free for personal use under the PolyForm Noncommercial license. Commercial licensing is available for teams.
Thanks to the Agno Framework maintainers for making multi-agent memory and session persistence feel like a solved problem. And to the early adopters who gave honest feedback — especially the ones who told me when it didn't sound like them at all.