The Rise of The Monorepo?
Conventional wisdom used to be that different projects should exist in different version control repositories. But maybe that shouldn’t necessarily be the case. Monorepos not only challenge that conventional wisdom, they turn it upside down. But is a monorepo the best direction for your organization?
Image by Noppol Mahawanjam on iStock by Getty Images
Different projects should be loosely coupled by definition. That’s what makes them, well, different projects. So, it stands to reason that they should be managed independently and have their source code isolated, right? Maybe?
What seems like a good idea can have some interesting consequences. It gets even more interesting based on where those project boundary lines are drawn. For example, a system consisting of microservices and/or microfrontends could conceivably have several projects, each with one or more independently deployable artifacts. What if one or more of those projects share a common dependency? What if two or more of those projects were tested together? What if someone worked on more than one of those projects? It’s easy to see that jockeying between multiple repositories and maintaining consistent versioned dependencies across different repositories can quickly become a significant productivity drag factor.
What is a Monorepo?
For simplicity’s sake, we’ve adopted the following definition for a monorepo.
A monorepo (mono repository) is a single repository that stores all of your code and assets for every project.
Monorepo vs Independent Repositories
Monorepo | Independent Repositories |
---|---|
One place for source code | Many different places for source code |
Builds must be segmented within the repository | Repository may be able to have a single build |
Unified versioning - dependencies can be shared within the repo | External dependency management for shared dependencies |
Shared source control management | Teams own their source control management |
Scale becomes a significant consideration as the codebase grows | Scale is not a significant issue |
Anyone who has ever worked in an organization that required them to check out multiple repositories at a time has certainly bemoaned granular source control repository demarcations. That doesn’t mean a monorepo is the answer. There are trade-offs with having either a monorepo or independent repositories.
Consider the following workflow.
Checkout -> Make Code Changes -> Check-In -> Review
That [very basic] workflow could be for trunk-based development or feature branch development (add in a step for a PR/merge) or whatever. But regardless, there are some implications that should be addressed. Presumably, at some point prior to or during review, the code changes will be built in the context of their dependencies and tested. If this were being done in a repository that only contained relevant source code for a single project build then that single build would be executed with some level of automated testing. However, with a monorepo, building everything on an incremental change to a single project is probably not realistic - it would require multiple build definitions and an understanding of the artifacts and their dependencies. This is something many widely used Continuous Integration tools do not natively support - not without jumping through a few hoops, anyway.
This problem, along with the very real problems that come with scaling a monorepo are the reasons why Google rolled their own monorepo and associated tooling to support the use of that repository.
Matt Klein also expresses an entertaining and well reasoned opposition to monorepos in his post, Monorepos: Please don’t!.
Single Project Repository
In many cases, and for the reasons above, a monorepo is not going to be the solution for an entire organization. But what about a repository for a single project that consists of multiple independently deployable projects? That’s not exactly a monorepo by our definition, but it doesn’t follow the one repository per logical project model, either. Some of the benefits are…
- One source of truth/code for the entire system
- Dependency management across projects is virtually eliminated since many project dependencies are collocated within the same repository
- Scale becomes less of an issue since it’s based on the size of the system’s codebase, and the people supporting it, not the scale of the entire organization
**Note: Some might consider this a monorepo of sorts.
Let’s take a project/code structure like this:
System Root:
-Service A (Java)
-Service B (Java)
-Service C (Python)
-Library A (Java)
-Frontend A (ReactJS)
-Frontend B (ReactJS)
The languages and the project archetype are not important. What is important is that there are different systems, with different languages and different dependencies within the System Root. To solidify this point, let’s further assume that Frontend A is dependent upon Service A, Frontend B is dependent upon Service B and both Service A and Service B are dependent upon Library A.
The Build Problem
The problem with single project repository builds is the same problem that was seen with a monorepo, albeit at a smaller scale: the entire repository does not need to be built/tested/validated for integration that is local to a single project. If someone makes a change to Service C and Service C doesn’t have any dependency relationship with Frontend B, then why should an integration only involving Service C assume the time and complexity required to build/test/validate other decoupled projects? The answer is that it shouldn’t.
Solving this problem requires the build automation to be smart enough to know that only Service C must be built/tested/validated on integrations that are localized to Service C. More than that, the build system needs to be smart enough to track metrics and trends per project and across the entire repository as a whole. This also means that at some point, not necessarily on integration (which needs to provide a very short feedback loop), the entire system does need to be built/tested/validated together.
Possible Solutions
1. Root Build Definition
It’s a pretty common practice for modern CI tools to look for a build definition file at the root of the repository using some kind of convention. Jenkins expects a Jenkinsfile to be relative to the root of the repository, TravisCI expects a .travis.yml config file at the root of the repository and CircleCI expects a .circleci/config.yml config file relative to the root of the repository. The root configuration could define the configurations for all the projects within the repository and how they relate to each other. Heck, with a Jenkinsfile, you could just write your own bespoke build program (please don’t do this).
The problem with this approach is that the different projects within the repository are supposed to be independently maintained. Coupling them all to one root build definition doesn’t decouple them; it has the opposite effect!
2. Each Project Owns Its Build Definition
This is a better solution for several reasons: changing one project’s build definition does not impact another project, build definitions are contained within the projects that they build - a project can be treated as standalone and dependencies are always defined local to their project, not at a global repository level. Of course, the obvious rub here is the CI tool must recognize and support such a configuration.
Summary and Conclusions
The idea of a monorepo, given credibility through adoption by tech giants like Google and Facebook is an interesting idea that comes with some very real technical challenges. Scaling is probably the most notable technical hurdle associated with monorepos, which will likely make it impractical for most organizations. However, at the other extreme, segmenting each independently deployable project into its own repository can have its own hurdles in the form of added complexity and management overhead.
A more reasonable approach to code organization might be to have repository demarcations drawn against project boundaries, which may contain multiple loosely coupled subprojects, each producing one or more deployable artifacts. Drawing repository boundaries using this approach is best served by decentralized build automation that is localized on a per-project basis. Such a configuration should be natively supported by build automation tooling to avoid scaffolding complexities and maintenance overhead.