Incepting Artificial Intelligence (AI) Efforts with an AI Systems Engineering Mindset
Should Artificial Intelligence (AI) efforts start with data collection or with requirements? This question is easily answered when we use an AI systems engineering mindset rather than a tool-driven one.
How We Got Here
In this series, we have looked at why AI systems engineering matters and how AI roles fit into an approach for sensibly delivering AI-enabled applications. We will now dive into kicking a project off in a sensible fashion. Along the way, we’ll discuss a critical AI consideration: whether to start your effort by amassing all the data you can or by letting requirements drive your effort.
Incepting a Release
The creation of a holistic, detailed roadmap across several releases remains a common approach to starting a new initiative. While this approach yields some value, future release planning is best kept high level. This allows validated learning to drive future release details rather than assumptions made at the outset of initial planning. That is, you don’t know how things will work out until you do them; focusing on what's next provides real data with which to determine whether your current vision is supported by reality. With better data, planning a subsequent release has substantially more fidelity.
Inception activities revolve around establishing a common understanding of the project's scope - specifically focusing on primary use cases, priorities, and risks. With more clarity on goals, work on architecture and initial system realization can also begin. The following drill-down of our Rational Unified Process (RUP)-inspired Hump Chart diagram outlines specific focus areas during inception.
Please note we are changing the disciplines compared to those traditionally used in the classic RUP Hump Chart. At the end of the series, we will circle back and discuss why in more detail.
Focus on a Common Functional and Technical Understanding
AI systems engineering focuses on ensuring that everyone involved in an initiative shares a common understanding of the effort’s functional and technical goals. This means that all roles (business analysts, data scientists, and software architects) should have a solid grasp of the business objectives, risks, and processes involved in the project. This common understanding helps everyone collaborate and communicate more effectively, making the project more likely to succeed.
Establish High Level AI Requirements
A system cannot exceed the quality of its requirements. Unfortunately, many AI efforts skip requirements gathering completely. Before we jump into our AI systems engineering approach for establishing high-level requirements, we’ll discuss the most common pitfall.
Extreme Data Hoarding Does Not Pave the Road to AI Success
AI needs data - and lots of it. As a result, many projects start by accumulating data. It is not uncommon for AI efforts to start by building a massive data lake. This approach hopes to accumulate all the data so Data Scientists can then sift through it and learn the “art of the possible”. But is this the right approach?
If your organization is otherwise looking to adopt a data lake strategy, preemptively collecting data may make sense. That said, it is unlikely to be your best course of action. A better approach is to determine what you want to accomplish with AI in the first place. It is rarely advisable to build a massive piece of infrastructure before testing how well it supports your desired capabilities1. If you still want to adopt a data lake strategy, start with a “data pond” by populating only the data needed to satisfy existing requirements. Then learn from the result and refine your data lake approach requirement-by-requirement.
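To make the requirement-by-requirement idea concrete, here is a minimal sketch of a “data pond” ingestion plan traced back to requirements. The requirement IDs, source systems, and table names are hypothetical placeholders, not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class DataRequirement:
    """A dataset the 'data pond' must hold, traced back to a requirement."""
    requirement_id: str   # e.g., "REQ-12: predict churn within 30 days"
    source_system: str    # where the data currently lives
    tables: list[str]     # only the tables that requirement actually needs


# Hypothetical starting point: two requirements, a handful of tables.
DATA_POND = [
    DataRequirement("REQ-12-churn-prediction", "crm", ["customers", "subscriptions"]),
    DataRequirement("REQ-17-support-triage", "ticketing", ["tickets"]),
]


def tables_to_ingest(pond: list[DataRequirement]) -> dict[str, list[str]]:
    """Return the minimal ingestion plan implied by the current requirements."""
    plan: dict[str, list[str]] = {}
    for req in pond:
        plan.setdefault(req.source_system, []).extend(req.tables)
    return plan


if __name__ == "__main__":
    print(tables_to_ingest(DATA_POND))
    # {'crm': ['customers', 'subscriptions'], 'ticketing': ['tickets']}
```

The point is simply that every table in the pond can be justified by a requirement; anything that cannot be traced this way stays out until a requirement actually needs it.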
Systems Engineering Drives a Common Understanding
Understanding desired system capabilities requires the expertise of business analysts, data scientists, and software architects. Business analysts provide the mission context and goals. Data scientists help match applicable machine learning approaches to available data in order to address business needs. Software architects provide the technical framework to scale and consistently deliver business and data science needs. All roles help identify risk and set relative prioritization. An approach should be established to retire risks as early as possible.
Nominate a Candidate Architecture
Our candidate architecture represents the technical approach we believe will suit our solution. Establishing the major pillars and communicating this approach is critical so we can achieve a consensus path forward and retire technical risks. Many efforts confuse this step with the specific technologies they are going to implement. Using Databricks or SageMaker is fine; however, either tool requires detailed planning to map its capabilities into an initiative-specific solution. Put another way, simply using a popular tool does not obviate the need for a candidate architecture.
AI projects need a candidate architecture for both machine learning and software. For the software concepts, this is not a novel approach. For machine learning, designing a candidate approach is often met with pushback. Data scientists commonly argue that you must explore the data and the potential fit of machine learning models before developing any plan of attack. These exploratory analysis activities are part of the standard machine learning development lifecycle - and not that different from how software architectures are designed. Engineering solutions should reasonably be written down and communicated - whether for data science or software. If that plan needs to be revised, that's just part of the normal process. You wouldn’t tolerate researchers evaluating random solutions in a back room for any other part of the system - don’t allow it for machine learning either. Projects can't align on an approach or validate progress without transparency.
The evaluation of your candidate architecture should focus on your real requirements to reduce technical risk as quickly as possible. Testing it on toy examples often hides the real challenges that are lurking in your nuanced solution. Bigger risks should be tackled first. For AI efforts, this certainly includes validating data and model assumptions. It likely also includes appropriate “responsible AI” concerns (e.g., bias detection).
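One lightweight way to turn those data and model assumptions into early risk reduction is to encode them as executable checks. The sketch below assumes a pandas/pytest stack, a small training extract, and illustrative column names (churned, tenure_months, customer_segment); treat it as a pattern, not a prescription.

```python
import pandas as pd
import pytest


@pytest.fixture
def training_data() -> pd.DataFrame:
    # Hypothetical training extract; the path and columns are illustrative only.
    return pd.read_parquet("data/churn_training_sample.parquet")


def test_label_is_not_severely_imbalanced(training_data):
    """A candidate model assumption: the positive class is at least 5% of rows."""
    positive_rate = training_data["churned"].mean()
    assert positive_rate >= 0.05, f"Positive rate {positive_rate:.2%} is too low"


def test_key_features_have_acceptable_missingness(training_data):
    """The candidate architecture assumes these features are reliably populated."""
    for column in ["tenure_months", "monthly_spend"]:
        missing = training_data[column].isna().mean()
        assert missing < 0.10, f"{column} is {missing:.1%} missing"


def test_outcome_rate_is_comparable_across_groups(training_data):
    """A simple bias smoke test: outcome rates per group stay within a band."""
    rates = training_data.groupby("customer_segment")["churned"].mean()
    assert rates.max() - rates.min() < 0.20, f"Group outcome gap too large:\n{rates}"
```

Checks like these are cheap to write during inception and immediately expose whether the data can support the candidate approach at all.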
Supporting Inception Considerations
While inception is heavily focused on requirements elicitation and establishing an architectural approach, it is also important to begin work on upcoming AI systems engineering disciplines.
Start Implementing Risky Architectural Concepts
Once a solid architectural approach is defined, unleash machine learning engineers and software engineers to validate the key risks and concepts. This achieves real progress towards the initiative's final solution and creates opportunities to course correct. Buying down ML risk will almost always be appropriate, and it will typically leverage approaches like machine learning notebooks. Keep track of these notebooks as ML prototypes that need subsequent engineering to become repeatable, high-quality software.
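As a hypothetical illustration of that subsequent engineering, the sketch below shows notebook-style exploration captured as an importable, testable function. The scikit-learn choice, the churned label, and the module shape are assumptions made for the example only.

```python
from dataclasses import dataclass

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


@dataclass
class TrainingResult:
    model: LogisticRegression
    holdout_auc: float


def train_churn_model(df: pd.DataFrame, label: str = "churned",
                      random_state: int = 42) -> TrainingResult:
    """What the notebook prototyped, captured as a repeatable function."""
    features = df.drop(columns=[label])
    X_train, X_test, y_train, y_test = train_test_split(
        features, df[label], test_size=0.2,
        random_state=random_state, stratify=df[label])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return TrainingResult(model=model, holdout_auc=auc)
```

Once the prototype lives behind a function like this, it can be versioned, reviewed, and exercised by the automated tests discussed next.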
Determine Your Test Approach
How fast projects can move is ultimately enabled, or limited, by the effort's test approach. As implementation begins, it is advisable to also start testing in an automated fashion (where possible). This both validates the test approach and creates a repeatable test suite that keeps the team confident that each new change does not negatively impact prior work.
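For example, a regression-style check like the one below keeps model quality visible on every change. It assumes the hypothetical train_churn_model function from the earlier sketch, a small versioned data sample, and a recorded baseline metric; your own equivalents will differ.

```python
import json
import pathlib

import pandas as pd

from churn.training import train_churn_model  # hypothetical module from the sketch above

BASELINE_FILE = pathlib.Path("tests/baselines/churn_model.json")


def test_model_quality_does_not_regress():
    """Fail the build if a change drops holdout AUC below the recorded baseline."""
    df = pd.read_parquet("data/churn_training_sample.parquet")  # small, versioned sample
    result = train_churn_model(df)
    baseline_auc = json.loads(BASELINE_FILE.read_text())["holdout_auc"]
    # Allow a small tolerance so harmless run-to-run noise does not break CI.
    assert result.holdout_auc >= baseline_auc - 0.01, (
        f"AUC regressed: {result.holdout_auc:.3f} vs baseline {baseline_auc:.3f}")
```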
Kick Start Operations
DevOps and MLOps are often the last disciplines to start on a project. Several benefits can be realized by starting them during inception. With a candidate architecture available, work can begin on the deployment architecture. AI efforts should understand storage size assumptions and computing power needs. Overlooking these concerns is easy with common infrastructure as code (IaC) techniques, but they need to be validated from a fit-for-purpose and price perspective. Projects frequently overlook the recurring cost of AI infrastructure. Leverage inception to begin testing deployment assumptions and tune infrastructure to your bottom line (e.g., use the operator pattern vs. always-on resources). Stand up the needed Continuous Integration/Continuous Deployment (CI/CD) infrastructure to dovetail with emerging implementation and test activities.
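To make the recurring-cost point concrete, here is a back-of-envelope sketch comparing always-on and on-demand compute. The hourly rate and usage hours are made-up placeholders rather than vendor quotes; plug in your own numbers.

```python
# Back-of-envelope recurring-cost check; all prices and hours are illustrative.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, hours_in_use: float) -> float:
    return hourly_rate * hours_in_use


gpu_rate = 3.00                                        # assumed $/hour for a GPU instance
always_on = monthly_cost(gpu_rate, HOURS_PER_MONTH)    # runs 24/7
on_demand = monthly_cost(gpu_rate, 4 * 20)             # ~4 hours/day, 20 days/month

print(f"Always-on: ${always_on:,.0f}/month")
print(f"On-demand: ${on_demand:,.0f}/month")
print(f"Savings:   ${always_on - on_demand:,.0f}/month")
```

Even a rough calculation like this during inception can change deployment decisions before they become expensive habits.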
Information assurance (IA) is a crucial Operations activity to begin during inception. Many large organizations (e.g., financial, power, government agencies) have strict security postures. Start vetting your chosen products and platforms, and begin creating the required accreditation documentation for your effort. Failure to start this activity early creates a common pitfall where the system is ready to deploy but stuck in the organization’s IA approval queue.
Up Next - Elaboration
Incepting your project with an AI systems engineering approach provides the foundation for a successful implementation. Bringing key resources together to ensure a solid understanding of the effort's AI business requirements, candidate architecture, and risks drives consensus and spreads the common approach across the various teams of analysts and engineers. Getting an early start on follow-on tasks around implementation, testing, and operations sets the project up for success.
Next, we will shift focus to elaboration-phase activities. Building on the concepts in this post, we'll discuss how to drive technical risk out of our AI system by focusing on demonstrable progress.
Blog Series Links:
Incepting Artificial Intelligence (AI) Efforts with an AI Systems Engineering Mindset (this post)
1. Data lakes are often supported by vendors like Databricks or Palantir. It’s worth understanding what you have to pay to keep your data lake operational with these types of tools. There are often license and SaaS costs that can quickly multiply as your data lake fills. These costs may crowd out your ability to fund other business needs if you fill your lake with data that isn’t critical to your business operations.
This does NOT mean these tools are bad; rather, it means you should ensure you understand how the costs for any tool are likely to play out over time.