
An AI Agent in Your CI: Self-Hosted Coding Agents with GitHub Actions


A teammate opens an issue: the signup form is missing server-side validation and the existing client-side check is trivially bypassed. You could write the fix yourself, or you could assign it to an AI agent and move on. The agent reads the issue, plans the change, edits the handler and the test, runs the test suite, and opens a pull request with a clean diff and a description of what it did.

That workflow already exists, but the existing tools force an uncomfortable choice. SaaS agents like Copilot Workspace or Devin want your repository in their cloud, which is fine for open source but harder for code under NDA or regulated data residency rules. Self-hosted alternatives like OpenHands need a persistent VM, a model endpoint, a vector database, and someone to keep them running. What most teams actually have is a GitHub repository and an Actions minutes quota. That should be enough.

So I built deep-agent-action: a GitHub Action that runs an AI coding agent entirely inside your CI runner. Mention @agent on an issue or PR, and the agent plans, edits, runs your toolchain, and lands the result as a pull request — all in-process on the runner your workflow already owns. No SaaS, no extra infrastructure, no secrets leaving your boundary.

This post is about why that architecture matters, what the agent can do, and the safety model that makes it viable on a repository you actually care about.

The architecture: an agent that lives where your code already lives

The action is triggered by standard GitHub events — issue comments, PR comments, review comments, new issues, or manual workflow_dispatch runs. When someone comments @agent fix the failing test on an issue, the workflow wakes up, checks permissions, and runs the agent harness in-process on the runner.

flowchart TD
    User[👤 Developer] -->|"@agent fix the signup validation"| Event[GitHub event<br/>issue_comment]
    Event --> Workflow[GitHub Actions workflow]
    Workflow --> Auth{Authorize?}
    Auth -->|no| Refuse[Post refusal comment]
    Auth -->|yes| Ack[👀 Reaction + sticky<br/>tracking comment]
    Ack --> Agent[Deep Agents harness<br/>plans + edits + runs tests]
    Agent --> Land{Mode?}
    Land -->|implement| PR[Open PR with changes]
    Land -->|review| Review[Post inline review<br/>comments on PR]
    PR --> Finalize[Update sticky comment<br/>with PR link + cost]
    Review --> Finalize

The agent itself is powered by the Deep Agents JavaScript harness, a model-agnostic agent loop that plans tasks, reads files, makes edits, runs shell commands, and reports results. Because it runs inside the Actions job, it sees the same checked-out code, the same environment variables, and the same toolchain (npm, pytest, go test, cargo, whatever your repo uses) that your existing CI already has. There is no network hop to a remote agent service and no credential sync beyond the GITHUB_TOKEN and your model provider key.

Two modes: implement and review

The action enters implement mode by default. It reads the issue or PR context, plans the change, edits files, runs the toolchain, commits, and opens or updates a pull request. On an issue it creates a new branch and PR; on an existing PR it pushes to the PR branch.

Review mode is triggered by starting the instruction with review on a pull request. The agent reads the full diff, analyses code quality, security, and correctness, and posts inline review comments on the PR — the same kind a human reviewer would leave. This is useful for pre-reviewing a junior developer’s PR or catching issues in a large refactor before a senior has time to look at it.

| Mode | Trigger | What the agent does |
| --- | --- | --- |
| Implement | @agent implement … on an issue | Plans, edits, runs tests, opens a PR |
| Implement | @agent fix … on a PR comment | Plans, edits, runs tests, pushes to PR branch |
| Review | @agent review on a PR | Reads diff, posts inline review comments |
| Manual | workflow_dispatch with prompt | Runs the agent with an explicit prompt, no mention needed |

Both modes use the same agent harness and the same model — the only difference is the final output format. This matters because you do not need a separate tool for reviews; one action covers both the code-authoring and the code-reviewing halves of the loop.
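To make the routing concrete, here is a minimal sketch of how a comment body could be mapped to a mode. It is not the action's actual parser: the function name, the regular expression, and the rule that review mode only applies on a pull request are my assumptions based on the table above.

```typescript
// Hypothetical sketch: names and matching rules are illustrative assumptions.
type Mode = "implement" | "review";

interface ParsedMention {
  mode: Mode;
  instruction: string;
}

// Matches "@agent" at the start of the text or after whitespace,
// so "@agentic" or "me@agent.dev" do not count as mentions.
const MENTION = /(?:^|\s)@agent\b/;

// Returns null when the comment does not mention the agent at all.
function parseMention(body: string, onPullRequest: boolean): ParsedMention | null {
  const match = MENTION.exec(body);
  if (match === null) return null;

  // Everything after the mention is treated as the instruction.
  const instruction = body.slice(match.index + match[0].length).trim();

  // Review mode: the instruction starts with the word "review" and the target is a PR.
  const startsWithReview = /^review\b/i.test(instruction);
  const mode: Mode = onPullRequest && startsWithReview ? "review" : "implement";
  return { mode, instruction };
}
```

With this sketch, `parseMention("@agent review", true)` selects review mode, while the same comment on a plain issue falls back to implement mode.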

The safety model: why in-runner is not the same as unsafe

Running a model-driven agent inside your CI sounds reckless. It is a model deciding which shell commands to run in an environment that can push to your repository. The action is defensive by default because it has to be.

Permission gating. Only human collaborators with write or admin access can trigger the agent. Bot accounts are ignored entirely to prevent feedback loops where one bot triggers another. This is the first and most important line: the agent only acts when a trusted human explicitly asks it to.
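The gate described above boils down to a small predicate. The sketch below is an assumption-laden illustration, not the action's code: the `Actor` shape is hypothetical, and the permission values are modeled on the coarse levels GitHub reports for a collaborator.

```typescript
// Hypothetical shapes for illustration; not the action's real types.
type Permission = "admin" | "write" | "read" | "none";

interface Actor {
  login: string;
  type: "User" | "Bot" | "Organization";
  permission: Permission;
}

// True only for human accounts with write or admin access.
function isAuthorized(actor: Actor): boolean {
  // Bots are rejected first so one bot can never trigger another.
  if (actor.type !== "User") return false;
  return actor.permission === "write" || actor.permission === "admin";
}
```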

Fork-PR protection. Pull requests from forks are denied by default. A maintainer can opt in per-PR by applying a label, but the default is refusal. This prevents a malicious fork from getting its own untrusted code checked out and executed by the agent with your repository's permissions.

Secret-free shell. The agent’s shell runs in an allow-listed, secret-free environment. Your GITHUB_TOKEN, model provider keys, and any repository secrets are never exposed to the commands the agent runs. The agent can run npm test but it cannot read PROVIDER_API_KEY. This is an environment allow-list, not a sandbox, but it means a prompt-injected or otherwise compromised model generation cannot read credentials from the shell and exfiltrate them.
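One way to get a secret-free shell is to build the child environment from scratch rather than subtracting secrets from the parent. The sketch below shows that idea; the function name and the example allow-list are hypothetical, and the action's real list may differ.

```typescript
// Hypothetical allow-list for illustration; the action's real list may differ.
const ENV_ALLOW_LIST: readonly string[] = ["PATH", "HOME", "LANG", "CI"];

// Copies only allow-listed variables into a fresh object.
// Anything not named here (tokens, API keys, secrets) is simply never copied.
function buildShellEnv(
  parent: Record<string, string | undefined>,
  allow: readonly string[] = ENV_ALLOW_LIST,
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const name of allow) {
    const value = parent[name];
    if (value !== undefined) child[name] = value;
  }
  return child;
}
```

The result would be passed as the `env` option when spawning the shell. Starting from an empty object means a newly added secret is excluded by default instead of leaking until someone remembers to block it.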

Command guardrails. The shell has two lists: an allow-list of permitted commands and an always-on deny-list of dangerous ones. By default the agent can run git, npm, pytest, go, cargo, and other common dev tools. It is blocked from curl, wget, ssh, sudo, nc, dd, and similar commands that open arbitrary network connections or escalate privilege. You can add to the allow-list per repository, but the deny-list is immutable.
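The two-list check can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the rules in docs/security.md: the function name and verdict shape are hypothetical, the lists contain only the commands named above, and the parsing ignores quoting and environment-variable prefixes.

```typescript
// Only the commands named in this post; the real lists are longer.
const DENY_LIST: ReadonlySet<string> = new Set(["curl", "wget", "ssh", "sudo", "nc", "dd"]);
const DEFAULT_ALLOW: readonly string[] = ["git", "npm", "pytest", "go", "cargo"];

type Verdict = { allowed: true } | { allowed: false; reason: string };

function checkCommand(commandLine: string, extraAllowed: readonly string[] = []): Verdict {
  // Conservative: refuse command substitution instead of trying to parse it.
  if (/`|\$\(/.test(commandLine)) {
    return { allowed: false, reason: "command substitution is not permitted" };
  }
  const allow = new Set<string>([...DEFAULT_ALLOW, ...extraAllowed]);

  // Split on shell operators so "npm test && curl ..." is checked piece by piece.
  for (const segment of commandLine.split(/&&|\|\||[;|&\n]/)) {
    const program = segment.trim().split(/\s+/)[0] ?? "";
    if (program === "") continue;
    const name = program.slice(program.lastIndexOf("/") + 1); // /usr/bin/curl -> curl

    // The deny-list is checked first, so per-repository additions cannot override it.
    if (DENY_LIST.has(name)) return { allowed: false, reason: `${name} is on the deny-list` };
    if (!allow.has(name)) return { allowed: false, reason: `${name} is not on the allow-list` };
  }
  return { allowed: true };
}
```

Checking the deny-list before the allow-list is what makes it immutable in this sketch: `checkCommand("curl https://example.com", ["curl"])` is still refused.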

Human-approval gate. Set require_push_approval: true and the agent opens a draft PR instead of a real one, or pushes to a proposed branch and posts a compare link. A human must review and mark it ready before it becomes mergeable. This turns the agent from an autonomous committer into an autonomous drafter.

These are guardrails, not guarantees. The action is open source; you can read the exact rules in docs/security.md. The point is that safety is not an afterthought bolted onto a demo — it is the starting assumption that makes the whole thing deployable.

Cost controls that actually stop the run

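Model API calls cost money, and an agent in a loop can burn through tokens quickly if it gets stuck iterating. The action supports two spend ceilings: max_cost_usd, a cap on the run's estimated dollar cost, and max_total_tokens, a cap on the total tokens it consumes.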

Both are checked after every model call, and both count subagent spend (if the agent spawns child tasks). When a cap trips, the agent stops, lands whatever it has done so far as a draft, and reports the stop in the tracking comment. This is not a budget alert you read later — it is a hard kill switch that prevents a runaway agent from burning your monthly quota on a single issue.
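A hard stop of this kind is easy to reason about as a running total that is checked after each call. The sketch below is illustrative only: `createSpendTracker` and its types are hypothetical, and only the two cap names mirror the action's inputs.

```typescript
// Hypothetical shapes; only the two cap names mirror the action's inputs.
interface SpendCaps {
  maxCostUsd?: number;     // corresponds to max_cost_usd
  maxTotalTokens?: number; // corresponds to max_total_tokens
}

interface CallUsage {
  costUsd: number;
  totalTokens: number;
}

type TrippedCap = "max_cost_usd" | "max_total_tokens" | null;

function createSpendTracker(caps: SpendCaps) {
  let costUsd = 0;
  let totalTokens = 0;
  return {
    // Call once after every model response, including calls made by subagents.
    // Returns the name of the cap that tripped, or null if the run may continue.
    record(usage: CallUsage): TrippedCap {
      costUsd += usage.costUsd;
      totalTokens += usage.totalTokens;
      if (caps.maxCostUsd !== undefined && costUsd >= caps.maxCostUsd) return "max_cost_usd";
      if (caps.maxTotalTokens !== undefined && totalTokens >= caps.maxTotalTokens) return "max_total_tokens";
      return null;
    },
    // Running totals, e.g. for the tracking comment or job summary.
    totals(): CallUsage {
      return { costUsd, totalTokens };
    },
  };
}
```

The agent loop would call `record` after each response and break out as soon as it returns a non-null value, then land the partial work as a draft.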

Cost and token usage are also surfaced in the sticky tracking comment and the job summary, so you know what every @agent mention cost before you decide whether to run it again.

Cross-run memory without a backend

One problem with short-lived CI agents is context loss. If you mention @agent on an issue today and @agent again tomorrow with a follow-up, the second run starts from scratch unless something persists the first run’s context.

The action solves this with a compact history stored in the sticky tracking comment itself. Each @agent turn on the same issue appends a summary to the comment. On the next mention, the action reads that comment history and feeds it back as context. No database, no backend, no state machine — just the GitHub comment thread carrying its own memory forward.

sequenceDiagram
    participant U as Developer
    participant C as Sticky Comment
    participant A as Agent (run 1)
    participant A2 as Agent (run 2)

    U->>A: @agent implement signup validation
    A->>C: Post plan + progress
    A->>C: Append: opened PR #42, cost ~$0.04
    U->>A2: @agent also add the email regex test
    A2->>C: Read prior history from comment
    A2->>A2: Build on context from run 1
    A2->>C: Update: pushed to PR #42, cost ~$0.03

This is limited by GitHub comment size, so the history is compact — file list, plan summary, cost, PR link — but it is enough for the agent to understand what has already been tried and what still needs doing.
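Here is one way a comment could carry its own history: serialize the turn summaries into a hidden HTML comment at the end of the body. I am not claiming this is the format the action uses. The marker string, the `TurnSummary` fields, and the ten-turn cap are all assumptions chosen for the sketch.

```typescript
// Hypothetical format: the marker and field names are assumptions for illustration.
interface TurnSummary {
  instruction: string;
  files: string[];
  plan: string;
  costUsd: number;
  prUrl?: string;
}

const MARKER_START = "<!-- agent-history:";
const MARKER_END = "-->";

// Reads the history embedded in a comment body; returns [] when none is present.
function readHistory(body: string): TurnSummary[] {
  const start = body.lastIndexOf(MARKER_START);
  if (start === -1) return [];
  const end = body.indexOf(MARKER_END, start + MARKER_START.length);
  if (end === -1) return [];
  try {
    const parsed: unknown = JSON.parse(decodeURIComponent(body.slice(start + MARKER_START.length, end)));
    return Array.isArray(parsed) ? (parsed as TurnSummary[]) : [];
  } catch {
    return []; // A corrupted payload degrades to "no memory" rather than failing the run.
  }
}

// Returns a new body: the visible text plus the updated, size-bounded history.
function appendTurn(body: string, turn: TurnSummary, maxTurns = 10): string {
  const history = [...readHistory(body), turn].slice(-maxTurns); // keep only recent turns
  const start = body.lastIndexOf(MARKER_START);
  const visible = (start === -1 ? body : body.slice(0, start)).replace(/\s+$/, "");
  // encodeURIComponent escapes ">" so the payload can never contain the end marker.
  return `${visible}\n${MARKER_START}${encodeURIComponent(JSON.stringify(history))}${MARKER_END}`;
}
```

Capping the number of turns is what keeps the comment under GitHub's size limit in this sketch; older turns simply fall off the front.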

MCP tools: extending the agent without extending the action

The agent supports MCP servers via the mcp_config input. Pass a JSON configuration pointing at local or remote MCP servers, and the agent can call them as tools during its planning and execution. This is how you give the agent access to your internal APIs, documentation search, or test result databases without hardcoding those into the action itself.

- uses: dipjyotimetia/deep-agent-action@main
  with:
    model: "claude-sonnet-4-6"
    mcp_config: |
      {
        "mcpServers": {
          "internal-api": {
            "command": "node",
            "args": ["./mcp-servers/internal-api.js"]
          }
        }
      }

Because MCP is an open standard, you can reuse servers you already built for Claude Desktop or your IDE. The agent uses the same tool discovery and invocation protocol as any other MCP client: tools/list and tools/call, carried over stdio for locally launched servers.
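For readers who have not looked at MCP on the wire, the two methods are plain JSON-RPC 2.0 requests. The sketch below builds them and frames them for a stdio transport, where each message is a single line of JSON. The helper names are mine, and a real session would start with an initialize handshake that I have left out.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Asks the server which tools it exposes.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Invokes one tool by name with a JSON object of arguments.
function callToolRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// stdio framing: one JSON message per line, written to the server's stdin.
// JSON.stringify without indentation never emits a raw newline.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}
```

A client would write `frame(listToolsRequest(1))` to the server process and read the matching response, keyed by `id`, from its stdout.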

Eight providers, one input

The action supports Anthropic, OpenAI, Azure OpenAI, Google Gemini, OpenRouter, any OpenAI-compatible endpoint (Groq, xAI, DeepSeek, Together, Ollama, vLLM), AWS Bedrock, and GCP Vertex AI. You set the model input and optionally prefix it with a provider. The default is claude-sonnet-4-6, which infers Anthropic.

| Provider | Example model | Auth |
| --- | --- | --- |
| Anthropic | claude-sonnet-4-6 | PROVIDER_API_KEY |
| OpenAI | openai:gpt-5 | PROVIDER_API_KEY or OPENAI_API_KEY |
| Azure OpenAI | azure:<deployment> | AZURE_OPENAI_* env vars |
| Google Gemini | google:gemini-2.5-pro | PROVIDER_API_KEY or GOOGLE_API_KEY |
| OpenRouter | openrouter:openai/gpt-4o | PROVIDER_API_KEY or OPENROUTER_API_KEY |
| OpenAI-compatible | openai-compatible:llama-3.1-70b + base_url | PROVIDER_API_KEY |
| AWS Bedrock | bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0 | AWS env chain |
| GCP Vertex AI | vertexai:gemini-2.5-pro | ADC / GOOGLE_APPLICATION_CREDENTIALS |

Provider selection is one input. There is no provider-specific wiring, no SDK differences, no auth format translation. The action handles the provider abstraction so your workflow does not have to.
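The prefix convention in the table can be parsed with a split on the first colon, which matters because Bedrock model IDs contain colons of their own. This is a sketch under assumptions: the `anthropic` prefix and the rule that any unprefixed model falls back to Anthropic are my guesses, not documented behaviour.

```typescript
// Prefixes taken from the table above; "anthropic" as an explicit prefix is an assumption.
const PROVIDERS = [
  "anthropic", "openai", "azure", "google",
  "openrouter", "openai-compatible", "bedrock", "vertexai",
] as const;
type Provider = (typeof PROVIDERS)[number];

interface ModelSpec {
  provider: Provider;
  model: string;
}

function parseModel(input: string): ModelSpec {
  // Split on the FIRST colon only, so later colons stay part of the model ID.
  const colon = input.indexOf(":");
  if (colon > 0) {
    const prefix = input.slice(0, colon);
    const known = PROVIDERS.find((p) => p === prefix);
    if (known !== undefined) return { provider: known, model: input.slice(colon + 1) };
  }
  // Assumed fallback: an unprefixed model name is treated as an Anthropic model.
  return { provider: "anthropic", model: input };
}
```

Under these assumptions, `bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0` resolves to the `bedrock` provider with the full model ID, trailing `:0` included.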

Why this is not Copilot Workspace

GitHub Copilot Workspace exists and does something similar. The difference is where the agent runs and who controls it.

| | Copilot Workspace | deep-agent-action |
| --- | --- | --- |
| Where it runs | GitHub’s cloud infrastructure | Your Actions runner |
| Code access | SaaS; code leaves your boundary | In-process; code never leaves the runner |
| Trigger | UI-based or @github mentions | @agent comments or workflow_dispatch |
| Model choice | GitHub’s models | Any of 8 providers, including self-hosted |
| Shell access | None (plan-only) | Full dev toolchain on the runner |
| Cost model | Subscription | Pay per model API call; no action surcharge |
| Safety | GitHub’s policies | Your allow/deny lists, your permission gate, your spend caps |

Copilot Workspace is excellent for planning and drafting changes in the browser. Deep-agent-action is for when you want the agent to actually run pytest, compile the code, and verify the fix before opening the PR — because a plan that does not compile is not a fix.

Getting started

Add one workflow file and one secret. The workflow below listens to issue comments, PR comments, review comments, newly opened or assigned issues, and manual dispatch. For workflow_dispatch, pass prompt explicitly; for issue/PR comment triggers, the action reads the comment body automatically and you can omit prompt.

name: Deep Agent

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  workflow_dispatch:
    inputs:
      prompt:
        description: "Instruction for the agent"
        required: true

permissions:
  contents: write
  pull-requests: write
  issues: write

jobs:
  agent:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0

      - uses: dipjyotimetia/deep-agent-action@main
        with:
          model: "claude-sonnet-4-6"
          prompt: ${{ github.event.inputs.prompt }}
        env:
          PROVIDER_API_KEY: ${{ secrets.PROVIDER_API_KEY }}

Add PROVIDER_API_KEY to Settings → Secrets and variables → Actions. Then open an issue and comment:

@agent add server-side validation to the signup form and a test for it

The agent posts a tracking comment, works through a plan, and opens a pull request. That is the whole setup. For a review-only workflow, an approval-gated workflow, or a GitHub App setup so the agent’s PRs trigger your CI, see the examples directory.

What I learned building this

Two things surprised me.

The sticky comment is the UI. I started with a separate dashboard, then realised the one place a developer already looks is the GitHub issue thread. A single comment updated in place — with a live checklist, a plan, a PR link, and a cost estimate — is more useful than any external page. The comment is the product surface.

Spend caps are not optional. In early testing I watched an agent loop on a failing test for twenty minutes, burning tokens each iteration. Without a ceiling it would have kept going until the job timed out. The max_cost_usd and max_total_tokens inputs were added immediately after that run. They are not budgeting features — they are safety features.

The right place for an agent

SaaS agents are convenient until they are not. The moment your code is under NDA, subject to data residency, or simply too large to sync to a third party, the SaaS model breaks. Self-hosted agents are powerful until you realise you are now running infrastructure for a tool that was supposed to save you time.

The middle path is the infrastructure you already have. Your CI runner is already there. It already has your code, your tests, your toolchain. It already has a permission model, an audit log, and a cost ceiling (Actions minutes). Running the agent inside that boundary does not add new infrastructure — it adds intelligence to existing infrastructure.

That is what deep-agent-action is: an agent that runs where your code already lives, using the safety model you already trust, with the cost controls you already need. The only new thing is the model, and you pick that yourself.

If you have a GitHub repository and a model provider API key, the right question is not “which agent SaaS should I trust with my code?” It is “why isn’t the agent already running in my CI?”

