It's a crowded market. On the CLI-agnostic cloud agent positioning, there are only startups so far. The only incumbent is GitHub Agents, as you mentioned in another thread.
Yes, broadly. The main structural difference is that we’re agent-agnostic, so we can combine lab-native CLIs in one workflow. GitHub will likely struggle there because they have direct partnerships with Anthropic and OpenAI.
On the features themselves, we have a better UX across integrations, and more advanced features like video recording.
This seems like a weak argument. GitHub is already agent (not just model) agnostic, they have Copilot and Claude Code. I just don't see how this is a business, sorry.
On gh-aw: it looks solid for the event-driven automation shape (triage, docs sync, CI fix). We're after a slightly different shape: interactive back-and-forth, steering from Slack or Linear, persistent sandboxes with a booted dev server for live previews. Thanks for the pointer, I'll dig into it more.
On labs eating our lunch: it's definitely a risk. Our bet is that reusing lab-native CLIs is enough to position ourselves in the market.
On behind the firewall: it's something we're looking into. We open-sourced agentbox-sdk in that direction.
On computer use: Yes. Sandboxes come with a computer-use CLI for driving Linux GUI apps via X11.
On triggers: Cron, GitHub (PRs, issues, @twill mentions in review comments), Slack, Linear, Notion, Asana webhooks, plus CLI and web. For the PR-comment workflow, you tag @twill with an instruction. That said, you can also set up a daily cron on Twill that checks PRs with a specific label like Confidence Score : x/5 and tell it to auto-approve at 5/5, for example.
On setup scripts: Per-repo entrypoint script, env vars, and ports, all accessible in the UI. There is a dedicated Dev Environment agent mode that you start with to set up the infra. You can steer the agent on how to set things up if it gets stuck, so this should be smooth. The agent can also rewrite the entrypoint mid-task.
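To make the label-based cron idea concrete, here is a minimal sketch of the check such a job could run. It is not Twill's API: the function names are hypothetical, and the label format is just the "Confidence Score : x/5" example above. Fetching the labels from GitHub is left out.

```python
import re

# Hypothetical label format, taken from the example above: "Confidence Score : x/5".
LABEL_RE = re.compile(r"^Confidence Score\s*:\s*(\d+)\s*/\s*(\d+)$")


def parse_confidence(label):
    """Return (score, out_of) for a confidence label, or None for any other label."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def should_auto_approve(labels):
    """Approve only when the first confidence label found reads exactly 5/5."""
    for label in labels:
        parsed = parse_confidence(label)
        if parsed is None:
            continue  # not a confidence label, keep looking
        score, out_of = parsed
        return score == 5 and out_of == 5
    return False  # no confidence label at all: never auto-approve
```

A PR without the label falls through to a human review by default, which is the safer failure mode for an auto-approve rule.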
There is also a Twill skill you can add to your local agents to dispatch tasks to Twill. Meaning you can research and plan locally using your CLI and delegate the implementation to a sandbox on Twill.
Jules is similar to Twill with the following differences:
- Twill is CLI-agnostic, meaning you can use Claude Code, Codex or Gemini. Jules only works with Gemini.
- We focus on the delegation experience: Twill has native integrations with your typical stack like Slack or Linear. PRs come back with proofs of work, such as screenshots or videos.
Similar, but we reuse lab-native CLIs like Claude Code or Codex, which the labs perform RL on. So in the long run, we believe this approach wins over custom harnesses.
We’re focused on SWE use cases. Code is nice because there’s already a built-in verification loop: diffs, tests, CI, review, rollback. But you do quickly get to a state where the agent needs to take a risky action (a db migration or an infra operation). This is where the permissions features from the agents are handy: allowlist, automode, etc. So you only have to approve or reject the high-risk actions.
And I think this risk model is valid for both technical and non-technical use cases.
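As a rough illustration of that risk model, here is a minimal sketch of an allowlist gate. It is not how Claude Code or Codex implement permissions; the patterns and the `decide` function are hypothetical and only show the shape of "auto-run the safe stuff, ask a human for the rest".

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, for illustration only.
ALLOWLIST = ["git diff*", "git status*", "pytest*", "npm test*"]
HIGH_RISK = ["*migrate*", "terraform apply*", "kubectl delete*"]


def decide(command):
    """Return "allow" for allowlisted commands and "ask" for everything else."""
    # High-risk patterns are checked first so they always win over the allowlist.
    if any(fnmatchcase(command, pattern) for pattern in HIGH_RISK):
        return "ask"
    if any(fnmatchcase(command, pattern) for pattern in ALLOWLIST):
        return "allow"
    # Unknown commands default to human approval.
    return "ask"
```

For example, `decide("pytest -q")` returns `"allow"`, while `decide("python manage.py migrate")` returns `"ask"`. The same gate works for non-code actions if you swap shell commands for action names.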