We Read Article 9 Fifteen Times. Here's What It Actually Means.

The risk management requirements explained in plain language. After weeks of mapping every sub-requirement, here's the complete breakdown of what your high-risk AI system actually needs.

Article 9 is 847 words. That's about two pages. It took me three weeks to fully understand what those 847 words actually require you to build.

The problem isn't that Article 9 is poorly written. It's actually quite logical once you see the structure. The problem is that every sentence creates obligations, and those obligations interact with each other in ways that aren't obvious on first read.

I've now read Article 9 about fifteen times. I've compared it against ISO 31000 (risk management standards), mapped it to ISO 42001 requirements, and tried to implement it for two different AI systems. Here's what I've learned.

What Article 9 actually requires

Article 9(1) opens with this: "A risk management system shall be established, implemented, documented and maintained."

Four verbs. Four obligations. Most teams focus on "established" and "documented" because those feel like one-time activities. They're not. The words "implemented" and "maintained" mean this is ongoing. Forever. As long as you operate the AI system.

The risk management system must be "a continuous iterative process run throughout the entire lifecycle of a high-risk AI system." That's a direct quote from Article 9(1). It means your risk management isn't a document you create once. It's a process you run continuously.

The four components of a risk management system

Article 9(2) defines what the risk management system must include. I've broken it down into four components:

Component 1: Risk identification and analysis

You need to "identify and analyse the known and the reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights."

Notice the scope: health, safety, and fundamental rights. Most technical teams focus on safety (will it crash? will it give wrong outputs?). But fundamental rights are equally important. That category includes discrimination, privacy violations, and impacts on human dignity.

"Reasonably foreseeable" is doing a lot of work here. You can't just document risks you've already seen. You need to anticipate risks that a reasonable person could predict. That means threat modeling, misuse analysis, and edge case documentation.

Component 2: Risk estimation and evaluation

Once you've identified risks, you need to "estimate and evaluate the risks that may emerge when the high-risk AI system is used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse."

Two scenarios matter:

  1. Intended use: What happens when the system is used correctly?
  2. Foreseeable misuse: What happens when users inevitably use it wrong?

The second one trips up most teams. "But that's not how it's supposed to be used!" doesn't help you when an auditor asks about foreseeable misuse. If a reasonable person could misuse your system in a harmful way, you need to document that risk and explain how you've addressed it.

Component 3: Risk mitigation

You must "adopt appropriate and targeted risk management measures designed to address the risks identified."

The word "targeted" is important. Generic risk measures don't count. If you've identified a specific risk, you need a specific mitigation. "We have quality processes" isn't sufficient. "We have input validation that rejects [specific problematic inputs]" is getting closer.

Component 4: Documentation and communication

Everything needs to be documented. But Article 9 also requires that risk management measures be communicated to users through instructions for use (per Article 13). Your risk management isn't just internal—it affects what you tell deployers.

The lifecycle requirement everyone misses

Article 9(1) says the risk management process must run "throughout the entire lifecycle." Most teams interpret this as: design phase, development phase, deployment.

But the lifecycle extends further. Article 9(2)(c) requires evaluation of "risks that may emerge... based on the analysis of data gathered from the post-market monitoring system."

Translation: your risk management system must continue running after deployment. When you collect data about how the system performs in production (which you're required to do under Article 72), that data must feed back into risk identification and evaluation.

This creates a feedback loop:

Phase           | Risk Activity               | Input
Design          | Initial risk identification | Intended purpose, known hazards
Development     | Risk evaluation and testing | Training data analysis, model behavior
Pre-deployment  | Mitigation verification     | Test results, validation data
Post-deployment | Continuous risk monitoring  | Production data, incident reports, user feedback

Each phase generates information that must flow back to risk identification. If you discover a new risk in production, it needs to be analyzed, evaluated, and mitigated. The loop never ends.
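
As a sketch, that loop can be modeled as a tiny state machine: a finding discovered in any phase re-enters at identification. The names here are ours, not the regulation's:

```python
# A finding discovered post-deployment goes through the same
# identify -> analyse -> evaluate -> mitigate path as one found at
# design time. That is the "continuous iterative process".

LIFECYCLE_PHASES = ["design", "development", "pre-deployment", "post-deployment"]
RISK_STATES = ["identified", "analysed", "evaluated", "mitigated"]

def ingest_finding(register: dict, risk_id: str, phase: str, description: str) -> None:
    """Enter (or re-open) a risk at the start of the pipeline."""
    assert phase in LIFECYCLE_PHASES
    register[risk_id] = {"phase_found": phase,
                         "description": description,
                         "state": "identified"}

def advance(register: dict, risk_id: str) -> str:
    """Move a risk one step along the pipeline; returns the new state."""
    current = RISK_STATES.index(register[risk_id]["state"])
    register[risk_id]["state"] = RISK_STATES[min(current + 1, len(RISK_STATES) - 1)]
    return register[risk_id]["state"]

register = {}
ingest_finding(register, "R-101", "post-deployment",
               "model degrades on records with missing postcode")
```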

Residual risk and the acceptability problem

Article 9(4) introduces a concept that caused us significant debate: residual risk.

After you've applied all your mitigation measures, some risk remains. That's residual risk. The regulation requires that this residual risk be "judged acceptable."

But acceptable to whom? And by what standard?

Article 9(4) provides some guidance: residual risks must be "communicated to the user." And when evaluating acceptability, you must consider "state of the art" and "the expectations of users."

This is where I'm still working through the implications. My current interpretation:

  • State of the art: What risk levels are achievable with current technology? If competitors can achieve lower error rates, your "acceptable" threshold should probably match.
  • User expectations: What do users reasonably expect from this type of system? A medical diagnostic AI has different expectations than a recommendation engine.

Testing requirements that will hurt

Article 9(5) through 9(7) cover testing. This is where compliance gets expensive.

Pre-deployment testing

Testing must happen "prior to placing on the market or putting into service." Your risk mitigation measures need to be verified before deployment, not after. This seems obvious, but many teams ship first and verify later.

Real-world conditions

Article 9(6) requires that testing be conducted "under conditions that are as close as possible to real-world conditions." Lab testing with clean data isn't sufficient. You need to test with messy, real-world inputs.

For us, this meant:

  • Testing with production-like data (properly anonymized)
  • Testing with adversarial inputs
  • Testing under load conditions matching expected production
  • Testing with edge cases from actual user behavior
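
Here's a hedged sketch of what the "messy inputs" part looked like in practice, assuming a hypothetical `predict` function that should refuse gracefully rather than crash:

```python
# Stand-in for the real model; assumed to return a label or None.
def predict(record: dict):
    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        return None  # graceful refusal, not a crash
    return "ok"

# Real-world-ish inputs: missing fields, wrong types, hostile strings.
MESSY_INPUTS = [
    {},                                  # missing everything
    {"name": None},                      # null field
    {"name": 42},                        # wrong type
    {"name": "'; DROP TABLE users;--"},  # adversarial string
    {"name": "   "},                     # whitespace only
]

def run_messy_suite() -> bool:
    """The system must never raise on foreseeable bad input."""
    for record in MESSY_INPUTS:
        try:
            predict(record)
        except Exception:
            return False
    return True
```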

Specific group testing

Article 9(7) adds a requirement that trips up many teams: testing must cover "specifically the persons or groups of persons on which the system is intended to be used."

If your AI will make decisions about job applicants, you need to test it with data representative of actual applicants. If your system will be used by elderly users, test with elderly users. Demographic representation in testing data is now a compliance requirement, not just a best practice.
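
A minimal sketch of per-group verification, assuming you already have labeled test outcomes. The group names, the metric, and the idea of a "gap" threshold are illustrative:

```python
def accuracy_by_group(results: list) -> dict:
    """results: list of (group, correct: bool) pairs. Returns accuracy per group."""
    totals, correct = {}, {}
    for group, ok in results:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (1 if ok else 0)
    return {g: correct[g] / totals[g] for g in totals}

def worst_group_gap(per_group: dict) -> float:
    """Gap between best- and worst-served groups; a large gap needs investigation."""
    return max(per_group.values()) - min(per_group.values())

# Toy outcomes: the system serves older users noticeably worse.
results = [("18-30", True), ("18-30", True), ("18-30", False),
           ("65+", True), ("65+", False), ("65+", False)]
per_group = accuracy_by_group(results)
```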

How to actually implement this

After mapping all of Article 9, here's the implementation approach that worked for us:

Step 1: Create your risk register

Start with a structured document listing every identified risk. We use a spreadsheet with these columns:

  • Risk ID (for traceability)
  • Risk description
  • Risk category (health, safety, fundamental rights)
  • Likelihood (before mitigation)
  • Impact (before mitigation)
  • Mitigation measures
  • Residual likelihood
  • Residual impact
  • Acceptability justification
  • Review date
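
The columns above map directly onto a small record type. Here's a sketch with one hypothetical entry; the 1-5 scoring scale is ours, not the regulation's:

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    risk_id: str
    description: str
    category: str            # "health" | "safety" | "fundamental rights"
    likelihood: int          # 1-5, before mitigation
    impact: int              # 1-5, before mitigation
    mitigation: str
    residual_likelihood: int
    residual_impact: int
    acceptability: str       # written justification
    review_date: str         # ISO date of next review

    def residual_score(self) -> int:
        """Simple likelihood x impact score after mitigation."""
        return self.residual_likelihood * self.residual_impact

# Hypothetical entry, for illustration only.
entry = RiskEntry(
    risk_id="R-007",
    description="Scoring model disadvantages applicants with career gaps",
    category="fundamental rights",
    likelihood=4, impact=4,
    mitigation="Gap-derived feature removed; fairness test added to CI",
    residual_likelihood=2, residual_impact=3,
    acceptability="Residual disparity within state-of-the-art range",
    review_date="2025-09-01",
)
```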

Step 2: Define your lifecycle process

Document when risk reviews happen. For us:

  • Major model updates: full risk review
  • Monthly: post-market monitoring data review
  • Quarterly: comprehensive risk register review
  • Annually: full system re-assessment
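
The calendar-driven part of that cadence can live as data, so a scheduler or CI job can flag overdue reviews. The intervals mirror our list; the names and the helper are ours (major model updates are event-driven and handled separately):

```python
from datetime import date, timedelta

# Days between reviews for each calendar-driven cadence.
REVIEW_CADENCE_DAYS = {
    "post-market data review": 30,    # monthly
    "risk register review": 91,       # quarterly
    "full system re-assessment": 365, # annually
}

def overdue_reviews(last_done: dict, today: date) -> list:
    """Return reviews whose cadence has lapsed. last_done maps name -> date."""
    return [name for name, days in REVIEW_CADENCE_DAYS.items()
            if today - last_done[name] > timedelta(days=days)]

last_done = {
    "post-market data review": date(2025, 1, 1),
    "risk register review": date(2025, 1, 1),
    "full system re-assessment": date(2025, 1, 1),
}
```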

Step 3: Create test protocols

For each risk mitigation measure, define how you'll verify it works. Include:

  • Test data requirements (demographics, edge cases)
  • Pass/fail criteria
  • Documentation requirements
  • Sign-off responsibility

Step 4: Connect to post-market monitoring

Establish the feedback loop. Production incidents should automatically trigger risk register updates. User complaints should be categorized against your risk taxonomy.
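
As a sketch, the complaint-to-taxonomy step can start as naive keyword matching. The categories and keywords below are illustrative; a real system would use your own risk taxonomy and a better classifier:

```python
# Map free-text complaints onto risk categories so they land in the
# risk register instead of dying in a support inbox.
RISK_TAXONOMY = {
    "discrimination": ["unfair", "biased", "discriminat"],
    "privacy": ["data", "leak", "privacy"],
    "safety": ["wrong", "crash", "error"],
}

def categorize_complaint(text: str) -> list:
    """Return matching risk categories; an empty list means manual triage."""
    lowered = text.lower()
    return [cat for cat, keywords in RISK_TAXONOMY.items()
            if any(k in lowered for k in keywords)]
```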

Key Takeaways

  • Risk management is continuous: Not a one-time document. Article 9 explicitly requires a "continuous iterative process" throughout the AI lifecycle.
  • Four components matter: Identification, evaluation, mitigation, and documentation. Miss any one and you're non-compliant.
  • Foreseeable misuse counts: You must analyze risks from misuse, not just intended use. "But that's not how it's supposed to work" isn't a defense.
  • Testing requirements are specific: Real-world conditions, representative populations, pre-deployment verification. Lab testing alone won't cut it.
  • Residual risk needs justification: Document why remaining risks are acceptable. Consider state of the art and user expectations.

What to Do Next

  1. Audit your current risk documentation: Do you have the four components? Is it treated as continuous, or a one-time document?
  2. Check your test coverage: Does your testing meet the real-world conditions and demographic representation requirements?
  3. Establish the feedback loop: How does production monitoring data get back into risk assessment? Build that connection now.

Stay compliant out there.
— The Compliantist