Canada's Business and Tech Newsroom

News

Big businesses are creating their own tests to find the best AI models

Faced with a bewildering array of AI performance claims, firms are building their own tools to figure out what different models can and can’t do

By Murad Hemmadi
Manulife’s tech team has built its own set of AI model tests based on use cases in areas like customer service and risk assessment. Photo: The Canadian Press/Cole Burston
Jul 29, 2025

TORONTO — Major firms including Manulife and RBC are now using their own internal tests to better evaluate the performance of new AI models in an attempt to cut through the industry hype.

While AI developers regularly tout the performance of their models on popular third-party evaluations, executives say such assessments don’t really show whether the systems are well-suited for their business needs. “The benchmarks can actually lead you astray,” according to Jodie Wallis, global chief AI officer at Manulife. 

Manulife’s tech team has built its own set of tests based on 47 actual use cases in areas like customer service and risk assessment, Wallis said. The program lets the firm plug the latest releases from AI labs into its existing tools, using its own data. That cuts the time to evaluate a new model down from weeks to minutes, and allows it to more quickly adapt when better technology becomes available, she claimed.

Talking Points

  • Large firms like Manulife and RBC have developed their own benchmarks for testing AI models to work out how well-suited they are for their businesses
  • Tech firms often tout how well their models perform against popular benchmarks, but mathematical or knowledge exams don’t always reflect how well they do on real business tasks

Model makers typically cite the performance of their products on popular tests like MMLU, which tests knowledge across dozens of academic and professional subjects; MATH, which assesses mathematical problem-solving; or BIRD-SQL, which checks the ability to turn plain-language questions into SQL database queries. Developers also watch AI models’ scores on university admissions exams.

Manulife doesn’t need its tools to ace the LSAT or the MCAT, but it does, Wallis said, want models that deliver the most accurate results at the lowest cost for its in-house applications.

New models outperforming their predecessors by a few percentage points isn’t that important for large firms adopting AI, said Foteini Agrafioti, senior vice-president of data and AI at RBC. Rather than raw power, the bank needs models that it can use securely and quickly scale across the company.

RBC plans to use Cohere’s technology for most of its generative tools. The bank wants to standardize and centralize its AI platform, the same way it has with software for other functions like cybersecurity or engineering, Agrafioti said. A single system makes it easier to keep data safe and transfer tools between different parts of the business, she added.  

RBC tested technology from other firms, including OpenAI, Agrafioti said, but the bank wasn’t willing to send client data to a system hosted on another company’s cloud. So, RBC bought access to North, a Cohere product that lets users launch AI agents powered by the Toronto firm’s large language models. It will run Cohere’s technology on its own servers; the bank has built what it claims is Canada’s largest cluster of the graphics processing units used to power AI. 

RBC and Cohere are co-developing a version of North that meets the bank’s security and regulatory requirements. RBC employees building new AI applications that touch sensitive data must use North, which is rolling out to developers at the company this summer. In March, Canada’s biggest bank announced it’s aiming to generate up to $1 billion in earnings using AI by 2027. “It’s graduating out of experiments,” Agrafioti said, to become “mainstream.”

Betting on Cohere as RBC’s main generative AI provider comes with risks, Agrafioti said, since other AI firms might advance the technology faster. The bank will monitor those alternatives and give staff access to the latest models, she said. Still, she’s keen to avoid the distraction of flashy new AI models. “We have something in our hands that does the job we need it to do really well, and we’re going to use that,” she said.

While big businesses may have the data, the technical staff and use cases to test AI models, smaller firms rely on third-party benchmarks such as LMArena and MMLU, including some that researchers have accused tech firms of trying to game.

Toronto’s Vector Institute is trying to provide a more independent analysis. In April, it released its first “state of evaluation” study, which ran 11 models through 16 tests. “Large companies have built this infrastructure to do evaluations internally,” said Deval Pandya, the institute’s vice-president of AI engineering. “But most of it is not openly accessible.”
