Canada's Business and Tech Newsroom

News

Big businesses are creating their own tests to find the best AI models

Faced with a bewildering array of AI performance claims, firms are building their own tools to figure out what different models can and can’t do

By Murad Hemmadi
Manulife’s tech team has built its own set of AI model tests based on use cases in areas like customer service and risk assessment. Photo: The Canadian Press/Cole Burston
Jul 29, 2025

TORONTO — Major firms including Manulife and RBC are now using their own internal tests to better evaluate the performance of new AI models in an attempt to cut through the industry hype.

While AI developers regularly tout the performance of their models on popular third-party evaluations, executives say such assessments don’t really show whether the systems are well-suited for their business needs. “The benchmarks can actually lead you astray,” said Jodie Wallis, global chief AI officer at Manulife.

Manulife’s tech team has built its own set of tests based on 47 actual use cases in areas like customer service and risk assessment, Wallis said. The program lets the firm plug the latest releases from AI labs into its existing tools, using its own data. That cuts the time to evaluate a new model down from weeks to minutes, and allows it to more quickly adapt when better technology becomes available, she claimed.

Talking Points

  • Large firms like Manulife and RBC have developed their own benchmarks for testing AI models to work out how well-suited they are for their businesses
  • Tech firms often tout how well their models perform against popular benchmarks, but mathematical or knowledge exams don’t always reflect how well they do on real business tasks

Model makers typically cite the performance of their products on popular tests like MMLU, which tests knowledge across dozens of academic and professional subjects; MATH, which assesses mathematical problem-solving; or BIRD-SQL, which checks the ability to turn plain-language questions into SQL database queries. Developers also watch AI models’ scores on university admissions exams.

Manulife doesn’t need its tools to ace the LSAT or the MCAT, Wallis said, but it does want models that deliver the most accurate results at the lowest cost for its in-house applications.
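
To make the idea concrete, here is a minimal sketch of what an in-house evaluation of this kind could look like. It is not Manulife's system, whose details are not public. The test cases, the stub models, the per-call prices and the selection rule (cheapest model that clears an accuracy floor) are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "model" is any function that maps a prompt to an answer.
Model = Callable[[str], str]


@dataclass
class TestCase:
    use_case: str  # business area the case comes from
    prompt: str
    expected: str


@dataclass
class Result:
    model_name: str
    accuracy: float       # fraction of cases answered correctly
    cost_per_call: float  # hypothetical flat price per request, in dollars


def evaluate(name: str, model: Model, cost_per_call: float,
             cases: List[TestCase]) -> Result:
    """Run one model over every test case and score exact-match accuracy."""
    correct = sum(
        1 for c in cases
        if model(c.prompt).strip().lower() == c.expected.lower()
    )
    return Result(name, correct / len(cases), cost_per_call)


def pick_model(results: List[Result], min_accuracy: float) -> Result:
    """Return the cheapest model whose accuracy meets the floor."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no model meets the accuracy floor")
    # Ties on cost are broken in favour of the more accurate model.
    return min(eligible, key=lambda r: (r.cost_per_call, -r.accuracy))


# Toy test cases, made up for this example.
CASES = [
    TestCase("customer service", "Is claim form A required for dental?", "yes"),
    TestCase("customer service", "Can a policy be cancelled online?", "no"),
    TestCase("risk assessment", "Risk tier for applicant profile 1?", "low"),
    TestCase("risk assessment", "Risk tier for applicant profile 2?", "high"),
]
ANSWER_KEY = {c.prompt: c.expected for c in CASES}


# Stub models standing in for real vendor APIs.
def strong_model(prompt: str) -> str:
    return ANSWER_KEY[prompt]  # always right


def mid_model(prompt: str) -> str:
    # Gets the "profile 2" case wrong, everything else right.
    return "low" if "profile 2" in prompt else ANSWER_KEY[prompt]


def cheap_model(prompt: str) -> str:
    return "yes"  # answers "yes" to everything


results = [
    evaluate("strong_model", strong_model, 0.03, CASES),
    evaluate("mid_model", mid_model, 0.01, CASES),
    evaluate("cheap_model", cheap_model, 0.001, CASES),
]
best = pick_model(results, min_accuracy=0.75)
print(best.model_name, best.accuracy, best.cost_per_call)
```

Because every model sits behind the same function signature, testing a new release means adding one line to the `results` list. A real harness would also need more forgiving scoring than exact string matching.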

New models outperforming their predecessors by a few percentage points isn’t that important for large firms adopting AI, said Foteini Agrafioti, senior vice-president of data and AI at RBC. Rather than raw power, the bank needs models that it can use securely and quickly scale across the company.

RBC plans to use Cohere’s technology for most of its generative tools. The bank wants to standardize and centralize its AI platform, the same way it has with software for other functions like cybersecurity or engineering, Agrafioti said. A single system makes it easier to keep data safe and transfer tools between different parts of the business, she added.  

RBC tested technology from other firms, including OpenAI, Agrafioti said, but the bank wasn’t willing to send client data to a system hosted on another company’s cloud. So, RBC bought access to North, a Cohere product that lets users launch AI agents powered by the Toronto firm’s large language models. It will run Cohere’s technology on its own servers; the bank has built what it claims is Canada’s largest cluster of the graphics processing units used to power AI. 

RBC and Cohere are co-developing a version of North that meets the bank’s security and regulatory requirements. RBC employees building new AI applications that touch sensitive data must use North, which is rolling out to developers at the company this summer. In March, Canada’s biggest bank announced it’s aiming to generate up to $1 billion in earnings using AI by 2027. “It’s graduating out of experiments,” Agrafioti said, to become “mainstream.”

Betting on Cohere as RBC’s main generative AI provider comes with risks, Agrafioti said, since other AI firms might advance the technology faster. The bank will monitor those alternatives and give staff access to the latest models, she said. Still, she’s keen to avoid the distraction of flashy new AI models. “We have something in our hands that does the job we need it to do really well, and we’re going to use that,” she said.

While big businesses may have the data, the technical staff and the use cases to test AI models, smaller firms rely on third-party benchmarks such as LMArena and MMLU, including some that researchers have accused tech firms of trying to game.

Toronto’s Vector Institute is trying to provide a more independent analysis. In April, it released its first “state of evaluation” study, which ran 11 models through 16 tests. “Large companies have built this infrastructure to do evaluations internally,” said Deval Pandya, the institute’s vice-president of AI engineering. “But most of it is not openly accessible.”

#artificial intelligence #Manulife #RBC #Tech #Vector Institute

© 2026 The Logic Inc. All Rights Reserved.
