Dealing with Chaos

vdramba
Level 3 Flexeran

A C4 Architecture Diagram

So you finally got to it. You kept your weekend schedule clear, woke up early and did that thing you had been putting off for far too long: dealing with the mess in your room. Look at it, it’s in perfect order now, wall to wall. Everything is in the right place, sorted by access frequency, organized to please the eye and your visual memory. Even a guest could find things here. It’s perfection.

If only there was a way to make it last. For some reason, the tech debt of your room keeps piling up with time. And that happens with only one person interacting with the room. Imagine if there were a few teams constantly adding and moving stuff. Things would go downhill fast.

Welcome to large-scale software development. My analogy, inspired by the way entropy is sometimes explained, might sound funny, but it’s not far from reality. Sure, instead of a room you might have a large hall for a monolithic design or, at the other end of the spectrum, a group of buildings, each with apartments, each with rooms. You get the idea.

At Flexera, adopting a microservices architecture with multiple levels of modularity places us in the latter group. While the architecture pattern helps, it doesn’t solve the problem by itself. Keeping things in order is a challenge, and the bigger the system, the harder it gets.

I remember, some years ago and in a different context, having to add a small feature to an old application. Part of it involved connecting to the database. There were several popular database adapters for that language, so I looked around in the code to see which one was in use, in order to follow the same pattern. I found all of them, plus some custom inventions to boot, in that single codebase.

Smart people are smart, but each in her own particular way. No amount of documentation, training or enterprise process can solve this problem completely, so we had to come up with a better solution. We have teams distributed around the globe, each with its own domain knowledge, developing an ever-growing number of microservices organized in components connected via clean, domain-centric APIs. Or at least that’s the big plan. We constantly face the force of entropy, which melts the architecture into an amorphous, highly connected graph.

Our solution is a set of tools, automation and associated practices:

  • Design First — define APIs using a ubiquitous DSL which makes it easy to list all API specs in one place
  • Diagrams as Code — author C4 architecture diagrams in a DSL that links back to the source code
  • Static code analysis via “Chaos” — check the code against the architecture and best practices
  • Service Map — a unified portal for accessing API specs, documentation, architecture and Chaos quality score

Design First

APIs are central to developing a microservices architecture. When we add a new microservice, once its abstract intent is defined, we start by designing its API. This is where we define the interface with the rest of the system, as well as the main constructs the service exposes and their semantics. We use Goa, a framework that lets us express the API in a simple DSL and generates specialized service helper code, client code and documentation. For APIs that cross team boundaries, approval is requested from the consumer side. External APIs get even more scrutiny: a dedicated team reviews them to ensure they are consistent with the rest of the Flexera public APIs.

Diagrams as Code

Documenting a distributed architecture is not an easy task. After several iterations, we settled on C4 as the common visual language. The C4 model helps teams communicate their software architecture efficiently and effectively at different levels of detail, like zooming in and out in Google Maps. Another important choice we made was to author the diagrams in code, using a DSL. This way we can not only collaborate and track changes, but also run tools that automatically check whether the implementation and the diagrams are in sync. The diagram authoring solution grew quite a bit, and we decided to open-source it as part of the goadesign project. We wanted to have it all in code, but it turns out no automatic layout is good enough for complex graphs, so I felt the need to contribute a brand new graphical editor that allows for a clean, meaningful visual layout. The full C4 authoring solution is now open-source. The final diagrams are uploaded to the Service Map website, and the full model is provided to Chaos as JSON so that it can be compared with the implementation.
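To make that last step more concrete, here is a minimal Go sketch of comparing an exported model with the services found in code. It assumes a simplified, hypothetical JSON shape (a `containers` array of objects with a `name` field); the real model format produced by the goadesign tooling is richer and is not reproduced here. The list of services found in code is passed in directly rather than discovered by scanning a repository, and `diffServices` is an illustrative name, not part of Chaos.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Model is a hypothetical, simplified view of the exported architecture
// model: only container names are kept. The real JSON schema differs.
type Model struct {
	Containers []struct {
		Name string `json:"name"`
	} `json:"containers"`
}

// diffServices reports services present in code but absent from the
// diagram (undocumented) and containers in the diagram with no matching
// code (stale). Both results are sorted for stable output.
func diffServices(modelJSON []byte, inCode []string) (undocumented, stale []string, err error) {
	var m Model
	if err = json.Unmarshal(modelJSON, &m); err != nil {
		return nil, nil, err
	}
	inModel := make(map[string]bool)
	for _, c := range m.Containers {
		inModel[c.Name] = true
	}
	codeSet := make(map[string]bool)
	for _, s := range inCode {
		if codeSet[s] {
			continue // ignore duplicate entries
		}
		codeSet[s] = true
		if !inModel[s] {
			undocumented = append(undocumented, s)
		}
	}
	for name := range inModel {
		if !codeSet[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(undocumented)
	sort.Strings(stale)
	return undocumented, stale, nil
}

func main() {
	model := []byte(`{"containers":[{"name":"billing"},{"name":"inventory"}]}`)
	undocumented, stale, err := diffServices(model, []string{"inventory", "reports"})
	if err != nil {
		panic(err)
	}
	fmt.Println("in code, not in diagram:", undocumented) // [reports]
	fmt.Println("in diagram, not in code:", stale)        // [billing]
}
```

Reporting both directions matters: one list catches services nobody has drawn yet, the other catches diagrams that have gone stale.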

Chaos

By now, you probably get the irony: the tool that fights disorder is called Chaos. It stands for Code Health Adviser Open Service, but no one cares anymore. The important part is that the tool is developed as an internal open-source project, so developers can study the code and contribute. But let’s dive in. The core of the tool is an executable that runs as a CI/CD stage once all tests have passed and does the following:

  • collects API specs and documentation from Goa DSL
  • collects README files for microservices and modules
  • parses and checks the code against a list of validation criteria including:
    - microservices code should not import code from other microservices
    - code should check the health of other services that are consumed
    - microservices should use authentication for all APIs
    - repository layouts should follow the agreed-upon directory/package scheme
  • parses and checks the code against the architecture model, and validates that:
    - a diagram representing this code repository exists
    - all microservices are represented in the diagram and vice-versa
    - relationships between microservices are correctly represented (best-effort at this point)
  • checks that README files exist for the repository and for each microservice
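The first validation rule lends itself to a short illustration. The Go sketch below uses the standard `go/parser` package to read only a file's imports and flag any that reach into another microservice's tree. The layout `example.com/platform/services/<name>` is a hypothetical convention, and `crossServiceImports` is an illustrative name, not Chaos's actual code. A real check might need exceptions, for instance for generated client packages.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// servicesPrefix is a hypothetical import-path prefix under which each
// microservice lives in its own directory, e.g. example.com/platform/services/billing.
const servicesPrefix = "example.com/platform/services/"

// crossServiceImports parses only the import declarations of one Go file
// that belongs to service owner, and returns every import path that points
// into a different service's directory.
func crossServiceImports(owner, filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var violations []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if !strings.HasPrefix(path, servicesPrefix) {
			continue // not a service package, e.g. stdlib or shared libs
		}
		// The first path element after the prefix names the service.
		rest := strings.TrimPrefix(path, servicesPrefix)
		svc := strings.SplitN(rest, "/", 2)[0]
		if svc != owner {
			violations = append(violations, path)
		}
	}
	return violations, nil
}

func main() {
	src := `package payments

import (
	"fmt"
	"example.com/platform/services/billing/internal/store"
	"example.com/platform/services/payments/gen/payments"
)
`
	v, err := crossServiceImports("payments", "main.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // [example.com/platform/services/billing/internal/store]
}
```

Parsing with `ImportsOnly` keeps the check fast and independent of whether the rest of the file compiles, which suits a tool that scans many repositories.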

Service Map

Service Map is the internal website aggregating all of the above. There are 3 main sections: architecture, APIs and Chaos.

API Browsing section in Service Map

Architecture

You can browse the entire architecture, starting from an overview (the System Landscape, in C4 terms), and zoom in to see the composition of each module, represented as a Software System in C4 language. The website supports deep linking, so you can share views in various channels. Diagrams contain metadata that links elements to code, making it possible to navigate from diagrams to the API specs, documentation, Chaos reports or directly to the source code.

API Browser

One of Goa’s important plugins is docs. It generates a transport-agnostic view of the API specification, allowing you to focus on business logic and modeling. That view is the first tab in the API browsing section. Next come the transport-specific specs (OpenAPI, gRPC proto) and README files. You can also quickly see the composition of the team that owns each microservice or module.

Chaos Results

This section displays reports for each microservice, with extensive details: what has been detected, why we think it’s not the best approach and how to fix it. There’s also a link to the code that generated each notice, and an invitation to contribute. A score is computed for each microservice, and the scores are summed up for the module. The lower the score, the better. When the score is zero, you get a green check mark, and you can brag about it.
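A minimal Go sketch of the scoring scheme described above. The post does not say how individual notices are weighted, so the `Weight` field, the rule names and the `Scores` function are hypothetical; the sketch only shows per-microservice sums rolled up into a module total, with zero marking a clean service.

```go
package main

import (
	"fmt"
	"sort"
)

// Notice is one finding reported by a check. Weight is a hypothetical
// penalty; how Chaos actually weights findings is not described.
type Notice struct {
	Service string
	Rule    string
	Weight  int
}

// Scores sums notice weights per microservice and for the whole module.
// Every listed service appears in the result, clean ones with a score of
// zero. The module total equals the sum of the per-service scores.
func Scores(services []string, notices []Notice) (map[string]int, int) {
	perService := make(map[string]int, len(services))
	for _, s := range services {
		perService[s] = 0
	}
	module := 0
	for _, n := range notices {
		perService[n.Service] += n.Weight
		module += n.Weight
	}
	return perService, module
}

func main() {
	services := []string{"billing", "inventory"}
	notices := []Notice{
		{Service: "billing", Rule: "missing-readme", Weight: 1},
		{Service: "billing", Rule: "unauthenticated-endpoint", Weight: 3},
	}
	per, module := Scores(services, notices)

	// Print services in a stable order; a zero score earns the check mark.
	names := make([]string, 0, len(per))
	for name := range per {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		mark := ""
		if per[name] == 0 {
			mark = " ✓"
		}
		fmt.Printf("%s: %d%s\n", name, per[name], mark)
	}
	fmt.Println("module:", module)
}
```

Running it prints `billing: 4`, `inventory: 0 ✓` and `module: 4`.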

Unexpected benefits

Chaos solves the problems it was designed for pretty well, but sometimes, it goes even further.

In one instance, while scanning a git repository for microservices, Chaos identified a certain executable (a Go main package) as a microservice. The team owning the repository pushed back, saying it was just a tool run as a daily cron job. This triggered a debate on what defines a microservice. The conclusion was that such a daemon should be considered one and should therefore follow best practices.
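The situation is easy to reproduce with a naive scanner. This Go sketch uses `go/parser` with `PackageClauseOnly` and treats every directory whose Go files declare `package main` as a microservice candidate. The file paths are hypothetical, with `cmd/cleanup` standing in for the cron tool, and the sketch is an assumption about the general approach, not Chaos’s actual detection logic.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"path"
	"sort"
)

// mainPackageDirs returns the sorted directories whose Go files declare
// `package main`. files maps slash-separated paths to source text.
func mainPackageDirs(files map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	seen := make(map[string]bool)
	for name, src := range files {
		if path.Ext(name) != ".go" {
			continue // skip READMEs, configs and other non-Go files
		}
		// Only the package clause is parsed; the rest of the file is ignored.
		f, err := parser.ParseFile(fset, name, src, parser.PackageClauseOnly)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		if f.Name.Name == "main" {
			seen[path.Dir(name)] = true
		}
	}
	dirs := make([]string, 0, len(seen))
	for d := range seen {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	return dirs, nil
}

func main() {
	files := map[string]string{
		"cmd/billing/main.go":  "package main\n\nfunc main() {}\n",
		"cmd/cleanup/main.go":  "package main\n\nfunc main() {}\n",
		"internal/store/db.go": "package store\n",
	}
	dirs, err := mainPackageDirs(files)
	if err != nil {
		panic(err)
	}
	fmt.Println(dirs) // [cmd/billing cmd/cleanup]
}
```

Nothing in the package clause distinguishes a long-running service from a daily job, which is why the question had to be settled by agreement rather than by the tool.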

In another surprising find, Chaos detected a subtle, hard-to-spot code dependency that had the potential to blow things up at the worst time. The import was channeled through a Goa DSL feature that makes it possible to map user types to generated types. While the feature has its place in some use cases, it creates a coupling between the generated API client and the service implementation. Again, this prompted a debate, and we decided it’s best to avoid the feature for our microservices.

These are just two of many unexpected, useful results.

To conclude…

Though our entropy-defeating strategy might seem complex, and it’s not really fail-proof, it turned out to be the right investment. We introduced this system gradually, starting with pilot programs followed by several rounds of fixes and validation. By the time passing the checks became part of the requirements, teams were already using the tools, and the added value spoke for itself.

Traditional monolithic applications counter entropy by relying on frameworks, SQL data modeling and domain partitioning, mostly done via naming conventions. It’s rare to see a clear modular architecture in such a context. A microservices architecture is modular by design, so we get some level of order by default. But as the system grows, the gap between the size of a single microservice and the size of the whole system becomes too large. To fill this gap and successfully maintain large enterprise software in the long run, an integrated and automated solution is not optional.

Service Map System Diagram
