When the Assessors Come: FATF Mutual Evaluations, the Effectiveness Era, and What They Mean for Banks

Every few years, a small team of assessors arrives in a country and spends roughly a year judging it against a single global rulebook: the FATF 40 Recommendations.

The assessment is not a loose verdict on whether a country is ‘doing enough’ about financial crime; it is a structured measurement against an agreed international standard. The 40 recommendations are that standard: the common benchmark, adopted by more than 200 jurisdictions, for what a credible anti-financial-crime regime must contain.

They set out how a country should criminalise money laundering and terrorist financing; how banks and other firms must identify their customers, monitor activity and report what looks suspicious; how the true (beneficial) owners of companies and trusts are made transparent; how targeted financial sanctions are applied; and how authorities investigate, prosecute, confiscate proceeds and co-operate across borders.

A mutual evaluation measures how faithfully a country has written these recommendations into law and, increasingly, how well they actually work in practice. The verdict can move capital, reprice risk, and reshape the day-to-day work of every bank in the jurisdiction.

This is the FATF mutual evaluation: the most consequential exam most people in finance have never read. This paper explains what these evaluations are for, how they work, what one of them did to an entire banking sector, and which countries, including Canada, are now in the firing line.

What a Mutual Evaluation Is For

The Financial Action Task Force (FATF) is the global standard-setter behind those recommendations, and the body that polices them. It does not supervise banks directly. Instead, it defines what a credible national regime should look like and then checks whether countries actually live up to it. A mutual evaluation is that check: a peer review in which assessors from other countries examine a jurisdiction’s laws, institutions and, crucially, its results.

The purpose is threefold. Firstly, it holds governments accountable to a single global benchmark, so that a dollar laundered through one country is not simply someone else’s problem. Secondly, it surfaces concrete weaknesses and forces a remediation plan. Thirdly, it sends a signal to the market: a poor evaluation, and especially ‘grey-listing’, tells every correspondent bank, investor, and counterparty in the world to treat that jurisdiction as higher risk.

The evaluation is therefore not an academic exercise; it is the mechanism that converts an abstract international standard into pressure that ultimately lands on the desk of a bank’s money-laundering reporting officer.

How The Process Works

A mutual evaluation assesses two very different things, and understanding the distinction is the key to understanding everything that follows.

Technical compliance asks whether the laws and rules are on the books. Each of the 40 recommendations is rated compliant, largely compliant, partially compliant, or non-compliant.

Effectiveness asks whether any of it actually works. Against 11 ‘immediate outcomes’ – covering everything from supervision to confiscation to terrorist-financing investigations – a country is rated high, substantial, moderate, or low.

That second dimension is the one that changed the game. Until 2013, a country could paper its way to a good grade by passing laws it never enforced. The introduction of effectiveness assessment means a jurisdiction now has to demonstrate outcomes, such as prosecutions secured, assets recovered, and suspicious activity detected and acted upon.

| Dimension | Rating scale | What assessors look for |
| --- | --- | --- |
| Technical compliance | Compliant, Largely Compliant, Partially Compliant, Non-Compliant | Are the laws, regulations and powers in place? |
| Effectiveness | High, Substantial, Moderate, Low | Do the system's outcomes actually disrupt financial crime? |
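For readers who track ratings in a spreadsheet or internal tool, the two scales described above can be modelled as ordered types. What follows is a minimal sketch, not an official FATF schema: the identifiers such as `R.24` and `IO.7`, the sample ratings, and the `floor` parameter are all illustrative assumptions, and where to draw the line for a "weak" rating is the caller's choice rather than a FATF threshold.

```python
from collections import Counter
from enum import IntEnum


class TechnicalRating(IntEnum):
    """Scale applied to each of the 40 recommendations (worst to best)."""
    NON_COMPLIANT = 0
    PARTIALLY_COMPLIANT = 1
    LARGELY_COMPLIANT = 2
    COMPLIANT = 3


class EffectivenessRating(IntEnum):
    """Scale applied to each of the 11 immediate outcomes (worst to best)."""
    LOW = 0
    MODERATE = 1
    SUBSTANTIAL = 2
    HIGH = 3


def summarise(ratings):
    """Count how many items received each rating, keyed by rating name."""
    return dict(Counter(r.name for r in ratings.values()))


def weak_spots(ratings, floor):
    """Return identifiers rated below `floor` (a caller-chosen cut-off)."""
    return sorted(key for key, r in ratings.items() if r < floor)


# Hypothetical ratings for an imaginary jurisdiction, for illustration only.
technical = {
    "R.10": TechnicalRating.LARGELY_COMPLIANT,
    "R.15": TechnicalRating.PARTIALLY_COMPLIANT,
    "R.24": TechnicalRating.NON_COMPLIANT,
}
effectiveness = {
    "IO.3": EffectivenessRating.MODERATE,
    "IO.7": EffectivenessRating.LOW,
    "IO.8": EffectivenessRating.SUBSTANTIAL,
}

print(weak_spots(technical, TechnicalRating.LARGELY_COMPLIANT))
print(weak_spots(effectiveness, EffectivenessRating.SUBSTANTIAL))
```

Keeping the two scales as separate types makes it hard to compare a technical rating with an effectiveness rating by accident, which mirrors the distinction the methodology draws.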

The mechanics run roughly as follows. The country completes a detailed self-assessment and submits data. Assessors review it, then conduct an on-site visit: weeks of interviews with regulators, prosecutors, banks, and other private-sector firms. A draft Mutual Evaluation Report (MER) is debated and then adopted at a FATF plenary.

From there, the country enters a follow-up process. If the deficiencies are serious enough, the report can trigger review by the International Co-operation Review Group (ICRG), the pathway that leads to the ‘grey list’ of jurisdictions under increased monitoring or, in the gravest cases, the ‘black list’, which currently includes Iran, North Korea, and Myanmar.

The timeline is long and the consequences are sticky. From the start of the assessment to an adopted report is typically around a year, but the follow-up process can run for many years more.

A country that scores poorly must report back on its progress at regular intervals, often under “enhanced” follow-up, and a grey-listing ends only when the FATF is satisfied, on the strength of a fresh on-site visit, that the action plan has been completed and the reforms are sustainable.

Importantly, the private sector is not a bystander in this process. Assessors specifically test whether banks understand their risks and whether supervisors hold them to account, which is why a national evaluation rapidly becomes a set of very specific demands on individual institutions.

A mutual evaluation no longer asks only “do you have the rules?” It asks “can you prove they work?”

The New Round: Why People Argue Over ‘Fifth’ Versus ‘Sixth’

FATF began a new round of evaluations in 2024 under a revised 2022 Methodology, running on a six-year cycle to around 2030. Confusingly, this is FATF’s fifth round, but several FATF-Style Regional Bodies (MONEYVAL in Europe, for example) call their equivalent a ‘sixth round’ because they number their own cycle history differently.

The standards and methodology are the same; only the label differs. For a FATF member such as Canada, the correct description is its fifth-round assessment.

The new round sharpens the effectiveness turn in four ways that matter for banks:

  • More weight on effectiveness and on a country’s specific risks and context, so assessors focus where the money and the threats actually are, not where convictions are easiest.

  • The financial sector and the designated non-financial businesses and professions – lawyers, accountants, real estate, dealers in precious metals – are now assessed separately, so weak links can no longer hide behind strong banks.

  • A dedicated focus on the effective outcomes of countering proliferation financing, an area long under-examined.

  • Greater scrutiny of new technologies under Recommendation 15, covering both the risks they create and the tools institutions use to manage them.

The follow-up process is also more focused on fixing effectiveness gaps, with faster escalation for countries that drift. In short: the bar is higher, the timelines are tighter, and “we have a policy” is no longer an acceptable answer.

Case Study: The UAE, From Grey List to AI-led Reform

No recent evaluation illustrates the stakes better than the United Arab Emirates (UAE). In March 2022, following its mutual evaluation, the UAE was placed on the FATF grey list for strategic deficiencies in how it tackled money laundering and terrorist financing. For a global trade, gold and financial hub, the signal was severe: heightened due diligence from correspondent banks, slower cross-border flows, and a reputational cost measured in investor confidence.

The response was unusually fast and unusually deep. The UAE exited the grey list in February 2024, after roughly two years and at the quick end of the range, by demonstrating not just new rules but new results.

That turnaround translated into a wave of concrete work for banks and other financial institutions:

  • Customer due diligence and beneficial-ownership processes were tightened and re-papered across the customer base.

  • Transaction-monitoring systems were rebuilt, including real-time monitoring that linked higher-risk sectors – crypto exchanges, gold traders, real-estate firms – into a shared financial-intelligence picture.

  • Sanctions and name screening were re-tuned to raise detection while controlling the false-positive load that effectiveness-era supervision punishes.

  • The Central Bank backed expectations with enforcement, issuing substantial fines to banks and exchange houses for AML failures, a clear signal that demonstrable effectiveness, not box-ticking, was now the test.
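The screening trade-off in the list above, raising detection without flooding analysts, comes down in part to where a match threshold is set. The sketch below illustrates that with Python's standard-library `difflib`; it is an assumption-laden toy, not how any UAE institution or vendor screens names. The names, the watchlist and the threshold values are all hypothetical, and production screening relies on far richer matching (transliteration, aliases, dates of birth).

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(customers, watchlist, threshold):
    """Return (customer, listed_name) pairs scoring at or above the threshold."""
    return [
        (customer, listed)
        for customer in customers
        for listed in watchlist
        if similarity(customer, listed) >= threshold
    ]


# Hypothetical names, for illustration only.
customers = ["Ivan Petrov", "Iván Petrof", "Joan Peters", "Maria Gomez"]
watchlist = ["Ivan Petrov"]

# A strict threshold catches only near-exact spellings; a looser one
# catches more variants but also raises more alerts for analysts to clear.
for threshold in (1.0, 0.8, 0.5):
    hits = screen(customers, watchlist, threshold)
    print(threshold, [customer for customer, _ in hits])
```

Lowering the threshold can only add alerts, never remove them, so tuning is a question of how many extra reviews an institution will accept for each additional true match.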

Two things made this an AI inflection point rather than a hiring spree. Firstly, the volume and complexity of the work – more monitoring, more screening, higher-quality suspicious-activity reporting – simply could not be met by adding analysts alone. Secondly, the supervisory demand had shifted from “show me your process” to “show me your outcomes,” which rewards systems that can detect more, explain their decisions, and scale.

Leading institutions moved accordingly: Mashreq, for instance, deployed cognitive-AI technology to strengthen its AML framework, and the Central Bank of the UAE has since issued guidance on the responsible adoption of AI and machine learning by licensed financial institutions. The post-evaluation period did not just create work for banks; it pulled the entire market towards AI-led screening and monitoring.

The grey list did not just demand more compliance from UAE banks; it made AI-led screening and monitoring the only realistic way to deliver it.

Who Is Next: The Forward Look, Including Canada

Because the fifth round runs to roughly 2030, most major economies will be re-assessed against the tougher effectiveness bar over the next several years. The country watching most closely right now is Canada.

Canada was last evaluated in 2016, when assessors flagged weaknesses in beneficial-ownership transparency, real-estate oversight, and enforcement. Its fifth-round on-site visit is scheduled for November 2025, with the Mutual Evaluation Report due to be adopted at the June 2026 plenary; the assessment considers improvements made up to October 2025 and will focus heavily on whether Canada can demonstrate effectiveness, in particular its ability to investigate and prosecute financial crime.

Ottawa has moved to strengthen its position ahead of the exam with:

  • A federal beneficial-ownership registry, launched in 2024, requiring disclosure of individuals with significant control.

  • Legislative reform raising penalties and expanding AML coverage.

  • A new dedicated financial-crime agency, announced in 2025, to centralise and sharpen enforcement.

For Canadian banks, the implication is direct: supervisory expectations will rise around beneficial-ownership verification, transaction monitoring, sanctions screening, and the quality, not just the quantity, of suspicious-transaction reporting.

The UK is also preparing for its own upcoming evaluation, with industry and government bodies already standing up dedicated taskforces. The pattern is consistent: the months around a mutual evaluation are precisely when boards approve the investment in screening and monitoring capability that they had previously deferred.

| Jurisdiction | Status (5th round) | What banks should expect |
| --- | --- | --- |
| UAE | Grey-listed 2022; exited 2024 | Completed: rebuilt monitoring, AI-led screening, enforcement-backed effectiveness. |
| Canada | On-site Nov 2025; MER adopted Jun 2026 | Effectiveness scrutiny (beneficial ownership, real estate, prosecutions); higher SAR quality. |
| United Kingdom | Preparing for upcoming round | Taskforces standing up; focus on outcomes and supervisory effectiveness. |
| Others | Rolling through to ~2030 | Major economies re-assessed against the higher effectiveness bar. |

What A Mutual Evaluation Means for a Bank

Although the FATF assesses countries, the workload lands on institutions. A jurisdiction cannot demonstrate effectiveness unless its banks can, so the period around an evaluation typically converts into a concrete remediation agenda.

Five demands recur:

  1. Risk understanding you can evidence: Supervisors increasingly expect a living, data-driven enterprise risk assessment, not a static document, that demonstrably shapes where controls are focused.

  2. Beneficial-ownership and due-diligence rigour: Verifying who really owns and controls a customer, and refreshing that view, is a perennial weak point that assessors probe hard.

  3. Monitoring and screening that detect more while flooding less: Effectiveness rewards genuine detection and penalises both missed risk and the false-positive backlog that buries it.

  4. Suspicious reporting judged on quality: The test is no longer how many reports are filed, but whether they are timely, accurate, and genuinely useful to the financial-intelligence unit.

  5. Demonstrable, explainable decisions: When a supervisor asks why an alert was closed or a customer retained, the institution must be able to show its reasoning, at scale and on demand.

Each of these is easier to assert than to prove, and proving them is precisely what the effectiveness era demands. That gap, between what a regime claims and what a bank can actually demonstrate, is where the real cost, and the real opportunity, of a mutual evaluation now sits.
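One way to make the third demand measurable is to track two standard ratios from alert dispositions: precision (the share of alerts that turn out to be genuine) and recall (the share of genuine cases that raised an alert). The sketch below is illustrative only. The `AlertOutcomes` type and every figure in it are hypothetical, and in practice the false-negative count can only be estimated, for example from cases surfaced later by other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertOutcomes:
    true_positives: int   # alerts confirmed as genuinely suspicious
    false_positives: int  # alerts closed as not suspicious
    false_negatives: int  # suspicious cases that raised no alert (estimated)


def precision(o: AlertOutcomes) -> float:
    """Share of raised alerts that were genuine; 0.0 if nothing was raised."""
    raised = o.true_positives + o.false_positives
    return o.true_positives / raised if raised else 0.0


def recall(o: AlertOutcomes) -> float:
    """Share of genuine cases that were alerted; 0.0 if none are known."""
    genuine = o.true_positives + o.false_negatives
    return o.true_positives / genuine if genuine else 0.0


# Hypothetical before/after figures for a re-tuned monitoring system.
before = AlertOutcomes(true_positives=40, false_positives=960, false_negatives=20)
after = AlertOutcomes(true_positives=55, false_positives=445, false_negatives=5)

for label, outcomes in (("before", before), ("after", after)):
    print(f"{label}: precision={precision(outcomes):.2f} recall={recall(outcomes):.2f}")
```

In this made-up example the re-tuned system raises half as many alerts yet confirms more genuine cases, which is the "detect more while flooding less" outcome expressed as numbers a supervisor can inspect.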

The Effectiveness Era Is an AI Inflection Point

Put the pieces together and a clear thesis emerges. The mutual-evaluation regime has shifted decisively from technical compliance to demonstrable effectiveness. Effectiveness means detecting more genuine financial crime, acting on it faster, and proving you did so, all while controlling the false-positive burden that drowns analyst teams.

Those objectives are difficult to reconcile with rules-based systems and headcount alone, especially in jurisdictions where volumes are high and threats are sophisticated.

That is why each evaluation cycle now doubles as a technology cycle. The UAE showed the template: a grey-listing turned into a national programme that pulled banks towards AI-led screening, monitoring, and investigation.

Canada, the UK, and the rest of the fifth-round cohort face the same logic. For banks, the lesson is not to wait for the assessors to arrive. The institutions that fare best treat the mutual evaluation as a forcing function they can get ahead of, investing early in screening and monitoring that can detect more, explain its decisions to a supervisor, and scale without a linear increase in cost.

In the effectiveness era, that capability is no longer a competitive advantage. It is the price of admission.

Contributor

James Booth

Head Anti-Money Laundering, Counter Terrorism & Sanctions
