API Architecture
To be fair they are both beautiful buildings! Photo by Jene Yeo on Unsplash and Shinya Suzuki.
So, what’s the architecture of your web application? It sounds like a deceptively simple question — maybe even a softball first-round interview question.
Turns out, after nearly 30 years of the Internet, the answer is far from straightforward, as page one of a Google search for the above term reveals:
We should start by clarifying what we mean by the word “architecture.” At its most basic, this is a question whose answer is just a picture and some arrows.
At its most basic, software architecture is just simple shapes with arrows — just a map and a path
The Clean Architecture looks like this:
Android app architecture, as proposed by Google, looks like this:
What about web application architectures? Surely something as straightforward as a web application — one of the foundational pieces of software we were building back in the mid 1990s — will be a well-documented victory lap of classes, hierarchies and data flows by the 2020s?
The godfather of modern, opinionated web application architecture is probably Ruby on Rails — so what happens when we go searching? (Maybe skip the results from April 2021 😬). Not a lot of helpful results:
The Internet generally understands Rails architectures to be based on MVC, though a diagram like this doesn’t offer much guidance on how to construct even a moderately complex app. The Rails Guides site isn’t a guide so much as an index of every class in the Rails universe, with little explanation of how they fit together. Here is a more recent take as of 2017:
Even then, this seems to suggest that our app requires 3 (maybe 4?) classes, dominated by Action controllers and Active Records (aka models).
We get it, Node. You’re proud of your event loop.
The remaining web results when googling nodejs web application architecture are a hodgepodge of SEOd links for new developers learning to program, with a few shady certification programs thrown in the mix. I’d normally suggest you see for yourself, but I suspect these results have a higher-than-normal likelihood of giving you spyware.
But it still feels like we are a step away from answering the meat-and-potatoes problems of software developers tasked with building web apps and APIs.
What are the classes I should expect to write when building a web application or API, and how do they fit together?
To be fair, Microsoft does have a number of smart points to make, including use of Clean architecture, dependency inversion, and the importance of testability. However, those points are buried so deep under jargon about ASP.NET and Azure deployment models that they largely pass unnoticed in the non-MSFT developer world.
As an example, “Client performs a request to a back-end service…Users can configure the internal functionalities of a client, like a request and response transformation, validation of schema, circuit breaking and retries, timeout and deadline management, and error handling”.
So…how do you write the “back-end service?” And doesn’t the service have the same considerations (validation, parameter extraction, etc) of this top-level API gateway? According to Conway’s law, your API structure is your team structure.
Most companies aren’t going to have dedicated teams for each aspect of their API, so the “Client” layer described by Uber probably won’t exist at your company — it will be a “Service” layer that does the thing the API is supposed to do, and we need to know, now, how to design that Service.
Larger applications by FANG have teams devoted to every piece of the stack, plus tooling, security, source code management, etc. But what about developers and projects who want to build testable, maintainable, scalable, large production apps? What are the classes, data flows, and hierarchies that we should expect to build?
At Perry Street Software, we have been building a large API for the past decade to power our family of dating apps. We recently have devoted time to codifying and upgrading our server API architecture in a way that actually maps quite closely to the Clean architecture, with inspiration from functional/reactive programming paradigms that have (rightly) become popular of late.
Too often the line and/or the arrows are skipped right over — we should pause to explore the nature of the arrows themselves.
In this series we are exploring patterns for web API architecture. In order to build our case, we must first take a detour to explore common patterns of code execution whenever you call a remote API endpoint.
When code is executed, it passes through a series of layers, each of which takes some amount of time, until each layer has run to completion and returned, and a value is ultimately presented to the user.
Normally, each of these steps takes a consistent amount of time, unless one of these layers happens to be I/O bound, meaning it is hitting the network, disk, or a database.
We call this the V shaped execution path. For small web APIs, modeling your execution this way works pretty well. As your application begins to scale, however, the V becomes harder to maintain for all but the most trivial of endpoints.
As an app scales, write operations to relational databases will soon become your bottleneck. Though it’s also possible that read operations can cause slowdowns, generally you can just “put a cache in front of it” and speed things up.
It should be intuitive that relational write operations are a bottleneck: the ACID guarantees provided by relational storage systems like MySQL and Postgres are costly, and as more and more users write to the same table or tables, those systems will slow. Services like AWS Aurora have raised the bar of performance for relational systems, but limits still exist. For our company, our relational database service, powered by AWS Aurora, is one of the most expensive components in our AWS bill each year, driven primarily by Database Storage and IOs.
Ultimately, web application performance starts to resemble a “U”, not a “V”, when your app waits on the results of a database operation. Even if you have multithreaded or non-blocking application servers like Node.js, your application code will still be waiting for a response from the database before providing the user with a response to their POST, PUT or DELETE operation.
At a certain point it becomes clear that, for consistent, reliable, and resilient endpoints, any state mutation requires a queue and an asynchronous confirmation. It is the job of every endpoint to get data back to the caller as soon as possible. In the event of a GET operation, once data is retrieved, that operation is concluded. In the event of a POST, PUT or DELETE operation, that endpoint should again respond to the caller as quickly as possible with a promise of an additional, later callback, likely over a websocket, with the results of that state mutation.
Indeed, this is the exact architecture of gRPC, Google’s remote procedure call framework. Clients make calls and the server responds with 1 or many messages, until the channel is closed. While our API does not (yet) use gRPC, the basic need for a communication loop is at the core of any modern, high performance web API.
As part of the response to the initial request, we additionally fire off a new asynchronous process that itself may invoke more business logic and data layer operations, with the results ultimately delivered via an asynchronous protocol like websockets.
In order to have an “architecture” to your code, you need a map and a path, a diagram and some arrows. But what, exactly, do we mean by a “path” — what is the nature of the arrows and how do they behave?
Which eventually gets mapped to some sort of HTTP response code + a response body.
Imagine we have a simple API endpoint, post ‘/user/email’. We want to update the email address of a customer and notify them that it has been changed.
You might imagine the following implementation in our quick-and-dirty v1 (implemented here in Sinatra/Ruby):
OK, but how do we handle errors here? We need to check if it is valid:
We also want to check if there is no customer record:
What if the database fails to do its write operation?
And what if our smtp service fails to notify about the change? Maybe we want to log that fact?
With imperative code, error handling is critical, and as APIs scale more errors and edge cases become possible or visible. Error handling in imperative code makes the code much harder to read.
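To make the point concrete, here is a minimal sketch of how those guard clauses pile up. It is written as plain Ruby rather than a Sinatra route so that it runs on its own. The in-memory USERS hash, the FakeMailer class, and the specific status codes are hypothetical stand-ins for the database, the SMTP service, and whatever the real endpoint would return.

```ruby
require 'logger'

# Hypothetical in-memory stand-ins for the database and the logger.
USERS = { 1 => { email: 'old@example.com' } }
LOG = Logger.new(nil) # nil log device: log calls are accepted but discarded

# Hypothetical SMTP stand-in that can be told to fail.
class FakeMailer
  def initialize(failing: false)
    @failing = failing
  end

  def notify(_email)
    raise IOError, 'smtp unavailable' if @failing

    true
  end
end

# Imperative version: every step needs its own guard clause.
# Returns an [http_status, body] pair.
def update_email(user_id, email, mailer: FakeMailer.new)
  return [400, 'invalid email'] unless email.to_s.match?(/\A[^@\s]+@[^@\s]+\z/)

  user = USERS[user_id]
  return [404, 'user not found'] if user.nil?

  begin
    user[:email] = email # a real database write could raise here
  rescue StandardError => e
    LOG.error("db write failed: #{e.message}")
    return [500, 'could not save email']
  end

  begin
    mailer.notify(email)
  rescue IOError => e
    # The change was saved, so we only log the failed notification.
    LOG.warn("notification failed: #{e.message}")
  end

  [200, 'ok']
end
```

Only two lines of this method do the actual work (the write and the notification); the rest is error handling, which is the readability problem described above.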
In 2014, a developer from Microsoft coined the term “Railway Oriented Programming”, which derived much inspiration from functional programming methods. He rightly pointed out the importance of “designing the unhappy path” and thinking about functions with more than one output.
The realization that we need functions that return multiple values, either success or error responses, is the foundation of railway oriented programming and the basis of our API design today.
A failure can occur anywhere in our processing chain, and we needed a streamlined way to exit processing and return a Failure to the client. Or, if everything is good, to return a Success.
The series of classes, or steps used to process the endpoint represents a functional unit, which always results in a Success or Failure, which contain resulting data (for a successful read operation for example) or information about the failure that occurred (such as during validation).
The construction and composition of two-track functions is based on theories in functional programming. There are number of important rules that govern how to build and compose “two-track functions”. We suggest reading through all of the slides at: https://fsharpforfunandprofit.com/rop/
To build a two-track execution path we use dry-monads. The dry-monads gem provides a Result monad, made available to a class through a Ruby mixin, with two Result types: Success and Failure
Any time we might previously have returned a value, such as an ActiveRecord model object, we will now return a Success(ActiveRecord)
Moreover, whereas we may have called raise SomeError or even halt 4XX, we now will return a Failure(SomeError)
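To show the mechanics without depending on the gem, here is a minimal hand-rolled stand-in for the two Result types. These classes are illustrative only and are not the dry-monads implementation; USER_TABLE, find_user, validate_email and save_email are hypothetical names chosen for this sketch. The key idea is a bind method that runs the next step on the success track and skips it on the failure track.

```ruby
# Minimal stand-ins for the two Result types (not the dry-monads classes).
class Success
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # On the success track, unwrap the value and run the next step.
  def bind
    yield value
  end

  def success?
    true
  end
end

class Failure
  attr_reader :error

  def initialize(error)
    @error = error
  end

  # On the failure track, skip every remaining step.
  def bind
    self
  end

  def success?
    false
  end
end

# Hypothetical in-memory stand-in for the users table.
USER_TABLE = { 1 => { email: 'old@example.com' } }

def find_user(id)
  user = USER_TABLE[id]
  user ? Success.new(user) : Failure.new(:user_not_found)
end

def validate_email(email)
  email.to_s.include?('@') ? Success.new(email) : Failure.new(:invalid_email)
end

def save_email(user, email)
  user[:email] = email
  Success.new(user)
end

# Each step only runs if the previous one returned a Success.
def update_email(id, email)
  validate_email(email).bind do |valid_email|
    find_user(id).bind do |user|
      save_email(user, valid_email)
    end
  end
end
```

Calling update_email(1, 'new@example.com') returns a Success wrapping the updated record, while update_email(2, 'new@example.com') returns a Failure carrying :user_not_found without ever reaching save_email.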
The documentation for dry-monads is described here, but the final form can be seen here:
In this implementation, you can see that find_user now returns a monad, not a user directly. The bind method unwraps success values and makes them available to the blocks. With additional metaprogramming magic thanks to ruby, we can simplify this block even further:
Thus far we have been talking about server-side architecture, but it is worth noting that the principles we are describing have a direct mapping to client-side architectural patterns as described by Reactive programming.
In Swift, we could rewrite that code block in Combine, which has the concept of a Publisher, which defines Output and Failure types similar to dry-monads above:
In Kotlin, we could rewrite that code block in RxJava using the Single type, which again defines both value and error outputs:
We know that, in the event we transition to an error state in one of those layers, we will remain on the “Error” track until we ultimately return a value to the user via a POST (or potentially, though not necessarily always, in a Websocket).
It’s time we start defining some layers! Taking inspiration from Clean architecture and using the principles of V-U-W execution flows and railway oriented programming, we are ready to define the architecture that we use today for our API endpoint design.
We have the following layers (and colors) which map to the original Clean architecture:
🔵 FRAMEWORKS & CLOUD ↓ 🟢 INTERFACE ADAPTERS ↓ 🔴 APPLICATION LOGIC ↓ 🟠 ENTITY LOGIC ↓ 🟡 DATA
None of the layers have visibility into higher layers. They may have references to their child layer, but definitely not their grandchildren.
This diagram also depicts our W-shaped execution flow we described in an earlier blog post, this time represented by arrows that start both at the HTTP layer and again at the Interface Adapter layer from a Queued asynchronous job.
Now it’s time to show you the classes of our architecture.
Any endpoint request must be routed to the appropriate code path through a Load Balancer, Web Server, Application Server, and an API / Web Framework. (We use Sinatra for this last part, but popular frameworks include Rails, Django, Spring Boot, and others.) API frameworks offer the most documentation online and are a common (the most common?) architectural structure.
Once a request reaches your API or web framework, however, patterns can diverge widely.
Consequently, everything that comes after the Frameworks + cloud layer is (mostly) novel and inspired by the Clean architecture. We will be making the case for our architecture and our decisions in the remainder of this series.
In any given layer, we will have one or more supporting classes. These are classes that are going to be single-purpose and have no references to other layers above or below them. They help with code reuse, reduce duplication, and enable us to avoid writing complex god classes in a given layer.
We include cloud services as supporting classes of our Framework layer. AWS services like EC2, SQS, RDS and ElastiCache are supporting classes, NOT an “inner” or central layer, because, as others have pointed out, the UI and the database depend on the business rules, but the business rules don’t depend on the UI or database.
Once a request comes in via our framework, a Controller orchestrates the processing of the endpoint by invoking a Request object to extract each parameter, validate its syntax, and authenticate the user making the request.
Controllers are the first place our application code is introduced. Controllers instantiate our classes and move data between classes in the 🔴 Application Logic layer.
It’s important to note that Controllers are not the only orchestration object in the Interface adapter layer. We also have Jobs, which are used in our asynchronous queue processing layer (more to come).
Controllers depend on classes including Validators, which check the syntax of incoming data, Presenters, which format outgoing data, and Response objects, which map objects and/or hashes into JSON, HAML and other formats.
Socket relay classes communicate state changes to the client over a socket communication channel, such as Websockets.
Request classes are typed data structures that bring together the necessary components for the request being made. This is different from standard HTTP requests, which (assuming CGI) are made up of key/value string pairs.
Response classes are like renderers in Rails, and enable you to return HAML, JSON or other types.
Parameter extractors extract data out of the params hash and convert it to properly typed values, such as ints, floats and strings.
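As a rough illustration of what a parameter extractor might look like, here is a minimal sketch. The ParamExtractor class and its method names are hypothetical, not the classes used in our codebase, and it assumes the params hash arrives with string keys and string values.

```ruby
# Hypothetical extractor: pulls raw string values out of a params hash
# and coerces them into typed Ruby values.
class ParamExtractor
  def initialize(params)
    @params = params
  end

  # Returns an Integer, or nil when the value is missing or not a whole number.
  def int(key)
    Integer(@params[key.to_s], 10)
  rescue ArgumentError, TypeError
    nil
  end

  # Returns a Float, or nil when the value is missing or not numeric.
  def float(key)
    Float(@params[key.to_s])
  rescue ArgumentError, TypeError
    nil
  end

  # Returns a stripped String, or nil when the key is absent.
  def string(key)
    value = @params[key.to_s]
    value.nil? ? nil : value.to_s.strip
  end
end
```

Returning nil for anything that cannot be coerced keeps the extractor free of policy; deciding whether a missing or malformed value is an error is left to the validators.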
GET requests made to read endpoints are next passed to the Application Logic layer where a Service ensures the validity of the inputs, makes sure the user is authorized to access data, and then retrieves data from the Entity Logic Layer through a Repo (for databases) and/or Adapter (for APIs). Service objects return Result objects as defined by dry-monads.
POST, DELETE and PUT requests made to write endpoints do the same thing as read endpoints, but defer processing by enqueuing Service inputs through our queue (Amazon SQS) and write the data to the Entity Logic Layer through a Job or Service.
Jobs are used to orchestrate side effects, such as sending a socket message through a Relay after a data mutation completes.
In our implementation of the Clean API architecture, the Service class itself assembles a distinct collection of Validator classes to provide an additional layer of semantic validation for a request. Thus, we have two layers of validation: syntactic, happening via the Request layer, and semantic, happening via the Service.
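The difference between the two layers of validation is easier to see side by side. The sketch below is a simplification: FavoriteRequest, FavoriteService, the blocked_pairs lookup, and the two business rules are all hypothetical, and the validation logic is inlined here rather than split into separate Validator classes.

```ruby
# Syntactic validation: is the input well-formed? (Request layer)
class FavoriteRequest
  attr_reader :creator_id, :target_id, :errors

  def initialize(params)
    @creator_id = to_id(params['creator_id'])
    @target_id = to_id(params['target_id'])
    @errors = []
    @errors << 'creator_id must be a positive integer' if @creator_id.nil?
    @errors << 'target_id must be a positive integer' if @target_id.nil?
  end

  def valid?
    errors.empty?
  end

  private

  # Accepts only strings of digits that do not start with zero.
  def to_id(raw)
    raw.to_s.match?(/\A[1-9]\d*\z/) ? raw.to_s.to_i : nil
  end
end

# Semantic validation: does the input obey our business rules? (Service layer)
class FavoriteService
  # blocked_pairs is a hypothetical list of [creator_id, target_id] pairs.
  def initialize(blocked_pairs: [])
    @blocked_pairs = blocked_pairs
  end

  # Returns a list of business-rule violations (empty when the request is allowed).
  def validate(request)
    errors = []
    errors << 'cannot favorite yourself' if request.creator_id == request.target_id
    pair = [request.creator_id, request.target_id]
    errors << 'target has blocked creator' if @blocked_pairs.include?(pair)
    errors
  end
end
```

A request with target_id set to "abc" never reaches the Service, because it fails at the syntactic layer. A request where both ids are "7" is perfectly well-formed, and is only rejected once the Service applies the business rule.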
Entity logic refers to components that are common not only to this endpoint, but others as well. These include Repository classes, which provide us access to persistent stores like MySQL or Postgres databases, and Adapter classes, which provide us access to APIs, including AWS storage APIs like S3, ElastiCache and others. We expect classes in this layer to be used over and over again; classes in the layer above are often single-purpose to an endpoint.
This is ideally a very simple layer that provides an actual interface into our different storage systems. If you use Rails you will likely be receiving ActiveRecord objects; APIs are ideally returning Ruby structs.
In other domains — such as Android or iOS development — we have created interfaces to our data storage layer. We use dependency injection so that, when we run tests, we are doing things like creating mocked, in-memory versions of SQLite data storage, rather than using real filesystem-backed data storage systems.
On our webserver, because of the dynamic nature of Ruby, we use stub_const to overwrite singleton cloud services with mocked versions, or we will point singleton cloud services to local Docker containers running Redis, Memcached, MySQL, etc.
Storage systems that power the Data layer, such as MySQL and Postgres, sit implicitly at the bottom of the diagram and are very likely going to be process-wide singletons that are ideally injected or mocked. ActiveRecord maintains its connection pool; systems like Redis and Memcached will also likely need some kind of global pool or singleton managing access.
Stateless HTTP-based APIs, such as S3, DynamoDB, and others, are typically going to be mocked by instance_doubles or overridden connection parameters that point to local mocks.
We know what you’re thinking. Isn’t this just another example of extra engineering and complexity that I don’t need, and that maybe you don’t need either?
The first line, post ‘favorite’ do, is a hook into the Sinatra 🔵 framework, as you might expect.
The last line is a form of presentation logic that is part of our 🟢 Interface Adapter layer. It is simply a 200 response with no body.
What about the other layers in our Clean API architecture? Can we see echoes of those in this 6-line snippet?
Params are passed directly without any extraction, and together the two symbols, :target_id and :creator_id, form an implied Request, comprising a target_id and creator_id.
Because of the lack of any validation or presentation customizability, there is no need for a controller so there is no equivalent in this code.
The where clause takes your request (a target_id and creator_id) and finds a domain object: a Favorite.
The first_or_create provides the Entity logic, interacting with persistent storage via a special ActiveRecord method.
The Favorite model is equivalent to your Data layer, providing a definition for the object being used.
Each of these steps we outlined in our Clean API Architecture is still happening in our simplified example; they are just happening implicitly. As a result, if we stick to our simple example, it makes things harder to test, and more importantly, harder to decompose as logic grows. How do you add more validations? Modify more classes? Add push or socket notifications? Endpoints become dozens, even hundreds of lines long, and responsibilities become scattered throughout the code. We also fail to explain what happens if we encounter errors or how we handle authentication.
The takeaway is clear: even the simplest APIs have to make these architectural choices. In the absence of explicit classes and rules, these choices are either assumed, implied or ignored, but they cannot be avoided.
We’ve hopefully convinced you that this isn’t quite as easy as you might have first thought. Before we show how to build an endpoint, let’s spend time rigorously documenting everything we expect our endpoint to do for us, and connect those responsibilities back to our Clean API Architecture.
In our series on Clean API Architecture, we have explained how there are multiple layers involved in responding to an HTTP request. Broadly speaking, there are two kinds of requests — the kind that mutate state by writing to a database (POST/DELETE/PUT), and the kind that merely return state from a database (GET).
(For brevity, when we say POST from now on, we implicitly mean POST/DELETE/PUT).
For both GET and POST endpoints, you must fulfill the following responsibilities:
🔲 RESTful Routing (Which endpoint should we execute based on verb and URL pattern?)
🔲 Control Flow (Order our execution, also handle errors)
🔲 Logging (Write out request info for debugging and analysis)
🔲 Parameter Extraction and Coercion (Read inputs and store in variables)
🔲 Syntactic Validation (Are the inputs in the correct format?)
🔲 Semantic Validation (Do the inputs align with our business rules?)
🔲 Authentication (Which profile is making this request?)
🔲 Authorization (Does this profile have access to the requested data?)
🔲 Presentation (Format the data into the correct format for clients, if data is being returned)
To prevent potential data loss and reduce the perceived user impact of each of these scenarios, in our architecture, all write operations are queued and performed by a task running on a separate machine (or in a separate cluster). We call these machines DBWriter machines.
Additional responsibilities of write endpoints, over and above what read endpoints are doing, include:
🔲 Job Configuration (Creating a hash with the necessary data to perform work in the background)
🔲 Job Enqueuing (Writing a message to a queue, like AWS SQS)
🔲 Queue Processing (DBWriter processes that pull Job Configs from the queue and process Jobs)
🔲 State mutation (Writing to one or more data sources)
🔲 Async Relay (Returning write confirmations back to clients via WebSockets or chat cluster)
We’ve identified no fewer than 14 separate responsibilities of an endpoint.
In our previous post we discussed the Clean API architecture. We will now explain how and when in our Clean API architecture we address each of the responsibilities in the checklist above.
The steps in this execution path can be summarized as:
🔵Receive→🟢Validate (Syntactic)→🔴Validate (Semantic)→🟢Enqueue→🟢Respond
The request is processed on an API server. Note the two stages of validation — one syntactic, one semantic. After validation, the job is enqueued and returns a success code to the client. In case of failure, no job is enqueued and an error code is returned to the client.
When the API server receives a POST request, it sends that request to a queue, in our case AWS SQS. Later, on a DBWriter server, the execution path is summarized as:
🟢Dequeue→🔴Mutate→🟢Notify
Let’s now connect our checklist with the two execution paths for a POST request.
Circles 🟢🔴🔵 come from our execution path; beneath each we identify which parts of our ✅ checklist they address.
On the API server
1. 🔵 Receive: Route the request to the correct Controller
RESTful routing ✅
Control flow ✅
2. 🟢 Validate: Extract the input parameters, validate their syntax, and authenticate the profile in the Request, using Params and Validators. Return the inputs required to perform the operation at the service layer.
Parameter Extraction and Coercion ✅
Authentication ✅
Syntactic validation ✅
3. 🔴 Validate: Validate the inputs against business rules using a Service
Semantic Validation ✅
Authorization ✅
4. 🟢 Enqueue: Configure the JobConfig and enqueue the job.
Job Configuration ✅
Job Enqueuing ✅
5. 🟢 Respond: Hand back a receipt confirmation in the Response, with a 200 OK status code and optional JSON data, indicating that the write operation has been successfully enqueued.
Presentation ✅
On the DBWriter server
1. 🟢 Dequeue: Receive enqueued job with Job or Service
Queue Processing ✅
2. 🔴Mutate: Change the database via operations defined in the Service
State mutation ✅
Logging ✅
3. 🟢 Notify: Send back an asynchronous Socket Message to the client that the operation has succeeded or failed.
Async Relay ✅
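The two execution paths above can be sketched end to end in a few lines of plain Ruby. Everything here is a stand-in chosen for illustration: an in-process Queue plays the role of AWS SQS, arrays play the roles of the database and the socket relay, and the 400 and 422 error codes are assumptions. The layers are also collapsed into two methods, so this shows the flow rather than the class structure.

```ruby
require 'json'

JOB_QUEUE = Queue.new   # stand-in for AWS SQS
FAVORITES = []          # stand-in for the database
SOCKET_MESSAGES = []    # stand-in for the socket relay

# API server: Receive -> Validate (syntactic) -> Validate (semantic)
# -> Enqueue -> Respond. Returns an [http_status, body] pair.
def handle_post_favorite(params)
  # Syntactic validation: both ids must parse as integers.
  creator_id = Integer(params['creator_id'], 10)
  target_id = Integer(params['target_id'], 10)

  # Semantic validation: a hypothetical business rule.
  return [422, 'cannot favorite yourself'] if creator_id == target_id

  # Job configuration and enqueuing. Nothing is written to the database here.
  job_config = { 'job' => 'create_favorite', 'creator_id' => creator_id, 'target_id' => target_id }
  JOB_QUEUE << JSON.generate(job_config)
  [200, 'enqueued']
rescue ArgumentError, TypeError
  [400, 'ids must be integers']
end

# DBWriter: Dequeue -> Mutate -> Notify.
# Returns false when there is no job to process.
def process_next_job
  return false if JOB_QUEUE.empty?

  config = JSON.parse(JOB_QUEUE.pop)
  pair = [config['creator_id'], config['target_id']]
  FAVORITES << pair unless FAVORITES.include?(pair)
  SOCKET_MESSAGES << { to: config['creator_id'], event: 'favorite_created', target_id: config['target_id'] }
  true
end
```

Note that after handle_post_favorite returns 200, FAVORITES is still empty. The write only happens when process_next_job runs, which is what lets the API server respond quickly and leaves the confirmation to the asynchronous relay.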
In the section above we defined a checklist of operations that all endpoints must fulfill. We then connected that checklist to the execution paths suggested by our Clean API architecture. Now we are going to see what this looks like in practice.
By now, hopefully you have gotten the memo about the JavaScript event loop. No? Don’t worry, because every nodejs architectural diagram is just this:
What is the architecture of your nodejs web API? Event loop.
The Java world has a bit more maturity when it comes to describing application architecture, though Java was widely ridiculed in the late 2000s and early 2010s for its perceived complexity and the overwrought nature of its main development IDE, Eclipse.
Or any IDE, really
This ancient blog about Spring Boot shows at least 4 layers with perhaps 6–7 different class types:
More recent blog posts have tried to rationalize this layered approach with the Rails MVC architectural approach:
O’Reilly predictably gives us a more disciplined discussion of Java EE architecture:
Can we look to the giants for guidance? Microsoft has plenty to say about common web application architectures. So much so, in fact, they helpfully included 14 diagrams to aid your understanding.
Also, they need more animated GIFs…Too. Many. Words.
Uber recently documented its API approach, with the following stack:
This approach comes closest, I believe, to offering practical advice, but it’s predicated on big-company patterns like microservices.
Car = Microservice
The truth is that most web applications are small, and their architectures don’t matter. They’re going to be rewritten every N years, so they emerge as a monolith maintained by one or a handful of developers…and it’s fine.
I’ll just rewrite it once Coinbase releases its JavaScript framework in 2025
We call this our Clean API Architecture. Before we begin, however, we are going to need to take a brief detour into the execution flow of typical web applications. Remember, any architecture is just a picture and some arrows, a map and a path.
The island is code; the red line is an execution path; the map is an Architecture.
Never hurts to refresh the basics
Regardless of your application architecture (1, 2, 3, or 10 layers), all “happy path” execution basically looks like a “V” shape. A call stack is built, probably culminating in a data access request from a database (or caching layer), and that data is returned, massaged, and delivered to the client via an HTTP response.
At a certain scale your execution looks more like this:
Below you can see an example of what a basic single-response operation might look like:
Put another way — as we pass data between the layers, what are we passing? In a simple architecture, it is likely that you would be passing a parameters hash down the layers:
And receive a model object up the layers
This strategy works for simple APIs, but what happens when we start to have errors?
Follow the (functional) road!
(Note: this section is adapted from a presentation by F# for Fun and Profit)
MS actually has a lot of good ideas.
When we first model endpoints, we think about them as simple functions with one input and one output:
As the endpoint increases in scope, we may think to break it apart into various steps, layers, components, etc:
But in the real world, we learn that each of these steps has the potential to fail, and we have to provide mechanisms for the interruption of control flow at each of these steps:
Errors are Responses too
In turn, each step can return a Success or Failure, which the orchestrator (the Controller) uses to stop or continue processing.
We visualize the functional processing above as two parallel railroad tracks. For a completely successful request, the train proceeds along the green (Success) track until finished. However, if a failure occurs anywhere along the way, the train switches to the red (Failure) track, and we return the error to the client.
Haskell humor. See Stack Overflow.
In a previous blog post we described patterns of web API execution flows. The final pattern we recommended, W-shaped execution, can be updated in the following way:
Two lanes on each path: one for success, one for failure
We started this 6-part series on how to build web APIs by introducing the variety of architectures that have been proposed or put into use by various languages and frameworks over the years. Among the most commonly discussed architectures online is the Clean architecture, which aspires to produce a separation of concerns by subdividing a project into layers. Each layer abides by the Single Responsibility Principle, ensuring each class is only handling one part of the process, and is more easily and thoroughly unit tested.
The Clean architecture can be used in many domains. In another blog series we describe how we applied Clean to our mobile applications. Today, we are going to talk about how we apply Clean to API endpoints. We call this the Clean API Architecture:
You had me at Clean
Frameworks like Rails employ an MVC pattern that works well for smaller projects, but have weaknesses when it comes to large, high availability APIs such as ours. Rails models and controllers get fat very quickly when applied to large APIs.
The inevitable course of a Rails model class
Let’s explore this for a minute or two. Imagine the world’s simplest API — adding a favorite for a profile. This is what it might look like in a simple endpoint:
In fact, this endpoint is implicitly relying on each of the layers we have defined above. Here is how:
The write operations we listed earlier each need to update columns in an RDBMS like MySQL. As anyone who has managed RDBMS systems at scale can tell you, they can be unreliable: bad queries can cause slowdowns of other requests, schema changes can lock tables for short or long periods of time, and disk slowdowns lead to query slowdowns which lead to thread pool exhaustion.
Your database when you have too many requests.
Yes this will be on the test
Let’s start by visually identifying the execution path of a GET request in our Clean API architecture.
The path highlighted in yellow is for GET requests
The path highlighted in yellow is for POST requests
Writing endpoints should be