We are building an operating system of AI agents that automate manual, repetitive work—starting with customer service.
A natural starting point for building AI agents is to think about prompting large language models (LLMs). But what else needs to happen? The software engineering practices of taking agents into production and turning their output into automated work are complex and not well trodden. And handling LLMs is only one slice of an end-to-end system that must integrate with companies operating a host of diverse systems.
The entire arena beyond LLM prompting is what we, at Gradient Labs, are affectionately calling “the rest of the owl.” Or, more simply: our backend platform.
The core of our backend platform now has five areas
The crux of our backend platform is similar to what we have used before to build banks and infrastructure automation software: Go services.
1. External-facing services. AI agents need to respond to and interact with the outside world; these services bridge between it and our platform. They connect us to a growing range of support platforms and enable companies to integrate directly with our API.
2. A growing range of resources (conversations, documents, procedures, tasks) that form the core of our platform. Each one lives in its own service, with a bounded context.
3. A finite-state machine that models conversations and is responsible for triggering our first AI agent, dispatching actions, and handling failures (a minimal sketch follows this list).
4. The agents themselves, which we currently deploy separately to enable more rapid experimentation.
5. Finally, an orchestrator over many of today’s popular language model APIs, like OpenAI’s GPTs and Anthropic’s Claude(s).
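To make the third area a little more concrete, here is a minimal sketch, in Go, of the kind of finite-state machine that can model a conversation. The states, events, and transitions below are illustrative placeholders rather than the ones our platform actually uses.

```go
package conversation

import "fmt"

// State of a conversation. These states are illustrative, not our production set.
type State string

const (
	StateNew           State = "new"
	StateAgentThinking State = "agent_thinking"
	StateActionPending State = "action_pending"
	StateAwaitingUser  State = "awaiting_user"
	StateFailed        State = "failed"
	StateResolved      State = "resolved"
)

// Event that can drive a transition between states.
type Event string

const (
	EventUserMessage      Event = "user_message"
	EventAgentResponded   Event = "agent_responded"
	EventActionDispatched Event = "action_dispatched"
	EventActionComplete   Event = "action_complete"
	EventError            Event = "error"
	EventResolve          Event = "resolve"
)

// transitions maps (current state, event) to the next state.
var transitions = map[State]map[Event]State{
	StateNew:           {EventUserMessage: StateAgentThinking},
	StateAgentThinking: {EventAgentResponded: StateAwaitingUser, EventActionDispatched: StateActionPending, EventError: StateFailed},
	StateActionPending: {EventActionComplete: StateAgentThinking, EventError: StateFailed},
	StateAwaitingUser:  {EventUserMessage: StateAgentThinking, EventResolve: StateResolved},
}

// Next returns the state that follows from applying an event, or an error if
// the transition is not allowed from the current state.
func Next(current State, event Event) (State, error) {
	if next, ok := transitions[current][event]; ok {
		return next, nil
	}
	return current, fmt.Errorf("invalid transition: %s in state %s", event, current)
}
```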
Our stack is both familiar and new
Deciding on which technology to use is an exercise in budgeting innovation tokens—we love to try new tools, but we're building AI agents, not infrastructure, so it's important to pick the ones that give us the greatest leverage. There are two that now have a cornerstone role in our backend platform:
Encore.dev is the backend engine that we use to ship our Go services, backed by Postgres databases and Pub/Sub, to our own cloud provider account. Its code-first approach means we don’t need to think about provisioning and maintaining anything under the hood; Encore manages everything from environments through to deployments. We even do our similarity search using Postgres (natively supported by Encore) and pgvector. Most serendipitously, Encore gave us a lot of convention and structure out of the box, which we would otherwise have had to create ourselves.
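As a flavour of what that looks like, here is a minimal sketch of a similarity-search endpoint in an Encore Go service backed by Postgres and pgvector. The service, schema, endpoint, and helper names are hypothetical, and the exact Encore annotations and sqldb APIs may differ between Encore versions; treat it as an illustration rather than our production code.

```go
package search

import (
	"context"
	"strconv"
	"strings"

	"encore.dev/storage/sqldb"
)

// Hypothetical database of document chunks with pgvector embeddings.
var db = sqldb.NewDatabase("documents", sqldb.DatabaseConfig{
	Migrations: "./migrations",
})

type SimilarDocsParams struct {
	Embedding []float32 // query embedding, produced elsewhere
	Limit     int
}

type SimilarDocsResponse struct {
	IDs []string
}

// SimilarDocs returns the document chunks closest to the query embedding,
// ordered by pgvector's cosine-distance operator (<=>).
//
//encore:api private method=POST path=/documents/similar
func SimilarDocs(ctx context.Context, p *SimilarDocsParams) (*SimilarDocsResponse, error) {
	rows, err := db.Query(ctx,
		`SELECT id FROM document_chunks ORDER BY embedding <=> $1::vector LIMIT $2`,
		vectorLiteral(p.Embedding), p.Limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var resp SimilarDocsResponse
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		resp.IDs = append(resp.IDs, id)
	}
	return &resp, rows.Err()
}

// vectorLiteral formats an embedding as a pgvector text literal, e.g. "[0.1,0.2]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}
```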
Temporal.io is our choice of toolkit for tackling a range of issues that plague distributed systems. Requests partially fail or time out, providers get overwhelmed and fall over, autoscalers abruptly terminate instances, and, especially for companies like us, LLMs get rate-limited or return garbage completions. We are now crafting our approach to combining Encore APIs with Temporal workflows, activities, and signals, structuring our long-running, highly parallel processes for resilience and fault tolerance.
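To show the shape of this, here is a minimal, self-contained sketch of a Temporal workflow in Go that wraps a single LLM call in an activity with timeouts and a retry policy. The workflow, activity, and type names are hypothetical; our real workflows are longer-running and involve signals and many parallel activities.

```go
package agent

import (
	"context"
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// MessageInput is an illustrative payload for handling one customer message.
type MessageInput struct {
	ConversationID string
	Text           string
}

// GenerateReply is an activity that would call an LLM provider; here it is a
// placeholder so the sketch compiles on its own.
func GenerateReply(ctx context.Context, input MessageInput) (string, error) {
	return "placeholder reply", nil
}

// HandleMessageWorkflow wraps the LLM call in an activity with a retry policy,
// so rate limits, timeouts, and failed completions are retried with backoff
// without losing the conversation's progress.
func HandleMessageWorkflow(ctx workflow.Context, input MessageInput) (string, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 2 * time.Minute,
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    time.Minute,
			MaximumAttempts:    5,
		},
	})

	var completion string
	if err := workflow.ExecuteActivity(ctx, GenerateReply, input).Get(ctx, &completion); err != nil {
		return "", err
	}
	return completion, nil
}
```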
Beyond these, we’ve also adopted Incident.io, Vercel, Google’s BigQuery, and more as we expand our platform. But this is just the start! We are 2,831 pull requests into this journey, this post is only a set of coarse brush strokes of what we’re building, and there is much more to come.
To hear from us again, please make sure that you’ve subscribed below!