Habushu: Supporting Python in Polyglot Monorepo Builds
Synchronizing Maven and Poetry to achieve enterprise-class Python delivery
Why Use a Monorepo?
Monorepos are increasingly popular for good reason. They address many problems regularly encountered on everyday software engagements, with benefits including but not limited to:
Improved build speed
Advanced local and remote caching
Repeatable build order
Intelligent partial builds that target downstream modules automatically
Project sharing with source control
Monorepos for Developer Influencers
Projects are often enticed to adopt the build tooling used by major technology companies leading the monorepo charge. However, leveraging the tooling from these companies and their “developer influencers” [1] can be a significant challenge, especially without the massive corporate investment and resource pool of these FAANG-class companies. This challenge is further complicated when projects adopt the polyglot solution stacks that are intrinsic to modern software delivery.
For instance, a common open source tooling stack for artificial intelligence (AI) applications likely includes some combination of Jupyter Notebooks, Python, and Docker, at a minimum. Throw in a user interface to front inferencing or reporting and the list expands to include technologies such as JavaScript, Java, and Flutter (for Web, Android, and iOS), as well as infrastructure as code (Kubernetes and Helm). Adopting a monorepo build tool like Bazel for one of these technologies might prove doable for a delivery team, but when that list expands to a half dozen or more, the task becomes far more daunting.
Polyglot Monorepos for the Rest of Us
In our business, Maven is the effective ground truth build system based on established expertise, excellent support for common technologies, and strong, well-documented patterns. While Gradle is well known for its task avoidance and incremental build support, Maven now also truly fits the bill of a monorepo build system since introducing first-class local and remote build cache support in early 2023.
With that said, Maven's support for Python, the lingua franca of data science, is poor. The most common solution for handling Python in Maven is to delegate to direct command calls through the exec-maven-plugin (any use of this plugin is an almost certain sign of a hack that will have poor maintenance qualities and cause much future pain). To improve on this situation, Habushu was created.
Habushu Brings Order to Python Projects
Habushu, an open source Maven plugin, extends Maven’s monorepo support to truly handle Python projects. It brings order and consistency to a language whose projects tend to be haphazard, unversioned, and unreproducible. Central to the challenge of managing Python projects are the use of virtual environments to handle project isolation and the general lack of a common lifecycle to test, package, and deploy projects. Habushu uses the standard Maven lifecycle to bind these Python activities to the build process and drastically increase the consistency and repeatability of Python delivery. The initial implementation leveraged venv, pip, and pyenv to pave the most common paths used by Python developers. However, this proved to be fragile and required too many manual interventions over time. Habushu was then updated to leverage Poetry to underpin Python standards and commands, substantially improving the stability and maintenance quality of the projects. Specifically, Habushu provides the following build lifecycle hooks [2]:
Building a Python module in a consistent, repeatable fashion makes it easier to embed Python projects into your polyglot system. But it rarely takes long before your project needs Python modules shared across multiple components to encapsulate some reusable functionality. Habushu helps here as well, providing functionality via the related poetry-monorepo-dependency-plugin to seamlessly incorporate local dependencies from your monorepo structure into other projects.
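To make the local-dependency idea concrete, here is a minimal sketch of one way it can work, assuming sibling modules are referenced as Poetry path dependencies during development and swapped for ordinary version constraints when a module is packaged. The function name, the sample packages, and the version numbers are all hypothetical; this is not the plugin's code, and its actual behavior may differ.

```python
def pin_local_dependencies(dependencies, local_versions):
    """Return a copy of `dependencies` with local path entries replaced by versions.

    `dependencies` mirrors the shape of a [tool.poetry.dependencies] table.
    `local_versions` maps a monorepo package name to the version it is built at.
    Both are hypothetical inputs used only for this illustration.
    """
    pinned = {}
    for name, spec in dependencies.items():
        is_path_dependency = isinstance(spec, dict) and "path" in spec
        if is_path_dependency and name in local_versions:
            # A sibling module referenced by relative path during development
            # becomes a plain versioned requirement in the packaged artifact.
            pinned[name] = local_versions[name]
        else:
            # Third-party requirements pass through unchanged.
            pinned[name] = spec
    return pinned


# Development-time view: shared-utils lives elsewhere in the same monorepo.
dev_dependencies = {
    "python": "^3.11",
    "requests": "^2.31",
    "shared-utils": {"path": "../shared-utils", "develop": True},
}

release_dependencies = pin_local_dependencies(
    dev_dependencies, {"shared-utils": "1.4.0"}
)
print(release_dependencies)
```

The point of the design is that developers keep an editable, in-repo reference while they work, and consumers of the published package see a normal versioned requirement.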
When completed, your polyglot system can run a traditional mvn clean install command that implicitly handles the above lifecycle, all within the same build as many other popular technologies. Typical Maven constructs, such as profiles, release builds, and settings.xml value encryption, all work with Habushu natively.
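The idea of binding Python activities to Maven phases can be sketched as follows. This is an illustration only: the phase-to-command mapping below is an assumption made for this example and is not Habushu's actual set of bindings (see the project's documentation for those). The Poetry subcommands themselves (check, install, run, build, publish) are real; the separate clean lifecycle is omitted for brevity.

```python
# An ordered subset of Maven's default lifecycle phases.
MAVEN_PHASES = ["validate", "compile", "test", "package", "verify", "install", "deploy"]

# Hypothetical bindings from a Maven phase to a Poetry command.
# Phases without an entry simply do no Python-specific work in this sketch.
ILLUSTRATIVE_BINDINGS = {
    "validate": ["poetry", "check"],        # validate pyproject.toml
    "compile": ["poetry", "install"],       # resolve and install dependencies
    "test": ["poetry", "run", "pytest"],    # run the test suite in the virtualenv
    "package": ["poetry", "build"],         # build sdist and wheel archives
    "deploy": ["poetry", "publish"],        # upload to a package repository
}


def commands_for(target_phase):
    """Return the commands for every phase up to and including `target_phase`.

    This mirrors how Maven runs all earlier phases of a lifecycle before the
    requested one. Raises ValueError if the phase is not in MAVEN_PHASES.
    """
    last_index = MAVEN_PHASES.index(target_phase)
    return [
        ILLUSTRATIVE_BINDINGS[phase]
        for phase in MAVEN_PHASES[: last_index + 1]
        if phase in ILLUSTRATIVE_BINDINGS
    ]


# What a build targeting the "install" phase would run in this sketch.
for command in commands_for("install"):
    print(" ".join(command))
```

Because every earlier phase runs first, a developer cannot package or publish a module without its checks and tests having run in the same build.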
As with all Habushu functionality, care has been taken to ensure that Python developers can also simply leverage Poetry directly. This allows Python developers to work in a manner that is natural to them, as desired, while simultaneously gaining the benefits of a consistent build process and monorepo structure.
But Python Doesn’t Need a Build, Right?
One related mental hurdle often comes up when discussing Habushu with Python developers. Python is an interpreted language, so there is a strong sense that you don’t need to build it. Builds, however, are only partially about compiling source code. The rest of the build lifecycle discussed above is arguably even more valuable to software delivery than code compilation. Once this is explained and Python developers experience the benefits of the build lifecycle across a couple of releases, we’ve found that their buy-in increases substantially.
[1] Borrowed from Jean Yang, who coined the term “Developer Influencer” and summarizes this sentiment well in her X (formerly Twitter) thread from November 2, 2021.
[2] At the time of this article, Habushu 2.10.0 was the latest release. You can always find the latest Habushu lifecycle bindings in the project’s documentation.