Aidan Gomez has been training large language models for years, but it’s only in the last few weeks that the general public has started to grok what he’s been working on. The researcher is CEO of Toronto-based Cohere, a startup that lets businesses build AI-generated text into their products and services.
In November 2022, San Francisco-based OpenAI launched ChatGPT, and with it came a deluge of screenshots on social media of AI-generated responses to silly and philosophical questions; think pieces pondering its effect on fields from higher education to marketing; and stories about how other startups and tech giants were reacting.
Talking Points
- Toronto-based Cohere’s platform lets businesses build machine-made text and dialogue into their products and services
- While ChatGPT has increased public awareness of and interest in generative AI, CEO Aidan Gomez says there’s plenty of room in the market as his firm looks to build a model of the internet
Gomez had a hand in laying the technical foundations for the current boom at Google Brain, one of the search giant’s AI labs, helping come up with the transformer—the “T” in “GPT.” He and co-founders Nick Frosst and Ivan Zhang started Cohere in September 2019, building a natural-language-processing platform that would let other firms deploy the technology. “There were all these insane demos and super-exciting progress and then nothing—just stagnancy,” he recalls. “After a while, we got frustrated enough to say, ‘People aren’t pushing this forward. We need to build something that lets more people use this.’”
According to PitchBook data, the company has since raised US$170 million to train its models, from investors that include the New York crossover fund Tiger Global Management, Toronto-based Radical Ventures and Index Ventures of San Francisco and London.
Cohere now has 175 employees, with outposts in San Francisco and London. In December, it hired YouTube CFO Martin Kon as its new president and COO. Two months earlier, The Wall Street Journal reported that Google was in discussions to invest US$200 million or more in the firm; Cohere declined to comment on any deal. Gomez also declined to disclose customer numbers, but cited examples like AI writing assistant Hyperwrite as well as audio-streaming services and news organizations.
Other startups in the space already have bigger capital stacks than Cohere, as well as Big Tech backers that are themselves accelerating attempts to launch generative AI products. Last month, Microsoft committed US$10 billion to OpenAI, and has already rolled out integrations to Azure, Bing and Teams.
Still, Gomez says ChatGPT’s virality has been a boon for public understanding of the technology, and has driven more developers to Cohere’s doors with ideas for how to use it. And he sees room for multiple NLP platforms in the emerging market for AI-enabled text and conversation.
This interview has been edited and condensed for length and clarity.
Explain to me like I’m a smart seven-year-old what Cohere does.
The product that we’ve built is an API, a platform for building with these models that solves pretty much every problem you could have with launching one, from data privacy, to needing to have in-house ML experts, to needing supercomputers’ worth of compute.
The technology that we’re building on is the same technology that we see emerging now, both in Big Tech as well as in some other startups like ourselves—these big language models trained on the web. What Cohere does is really try to present them in a way to the world that is maximally accessible. So you don’t need to be a PhD with eight years of education in this thing; you just need to be a developer who knows how to code.
Because of ChatGPT and the popular awareness of the technology now, virtually every tech executive is asking themselves, “How can I make use of this?” because the technology is ready for primetime. At Cohere, our priority is helping them think through, “How do we see value?”
What are viable applications now, and what applications do you think will be most interesting?
One example is search. We’ve had search for a quarter-century, but—outside of Google—it’s all been keyword-based and super simplistic, not very robust. It’s still a high-friction interface for most developers and companies. One thing that Cohere does extremely, extremely well is semantic search. So it’s not about, “Is this word literally the exact same as that word that appears in a document?” It’s about the intent. If there are spelling mistakes, if they’re using a synonym [it still works].
We’re also multilingual. For anyone who’s searching over troves of documents that might appear in one of [the] 109 languages [we support], it’s now as trivial as an API call to get Google-quality search in any application.
One of the first new frontier applications that was unlocked by these big language models—the first one to really find product-market fit—was copywriting. So, the ability for you to basically talk to this model and say, “Hey, write me an email responding to so-and-so who just said this. Write it in such-and-such a tone and express x, y and z,” and it just pumps out an email for you. Or writing blog posts or essays. That’s another huge application domain—speeding up writers [or] helping people with ideation about the sort of articles they can write. Being a thought partner to that author.
The latest of these major frontiers is dialogue as an interface. The model might try to do the thing you ask, and you’ll say, “Yeah, kind of, but could you make it more like this?” And then it will get even more accurate. You can imagine dialogue-based search, where instead of going to Google and searching a query, you’re actually having a conversation with Google. You can dive deeper into topics, and the back-and-forth feels much more natural. With shopping, you can imagine instead of going to Amazon and searching for a product, actually having a conversation with an agent. It will come back with products and say, “Hey, you really like this, but I couldn’t find it. I have this one which is super similar,” in the same way that you might do with a person.
With dialogue as an interface, the surface area of basically every product we interact with changes and becomes much nicer and much more natural. Our natural modality for doing any sort of intelligent interaction is conversation. That’s like the big unlock.
How close to cost break-even is the technology itself? Sam Altman said it’s costing OpenAI single-digit cents per chat to run ChatGPT.
I can’t get into specific economics, of course. But I will say it’s very different between training time and serving time.
Training one of these models [is] extraordinarily expensive. The resources to train them—the supercomputers—they’re priced very, very high. At serving time, though, there’s so much you can do to take one of these massive models and compress it down, use fewer bits per weight, chop pieces of it off. By the end of that pipeline, once it’s in production, it’s dramatically, dramatically cheaper.
We’ve been at it for three years and we’re quite good at it. But it took us a while to actually get good at training these models. It’s really, really specialized knowledge.
Is it a one-time thing—you build the models and then you’re forever serving?
No, their utility drops off over time. So for instance, if you had a model trained before 2020, “COVID-19, what’s that?” It never heard about it, never saw it. The further you go into the future as new events emerge, new terms and topics, the model becomes less and less useful. And so you keep needing to bump it up, make it more recent.
OpenAI has Microsoft as a funder and is building on top of their Azure cloud. Cohere has a deal to use Google Cloud’s supercomputers. Is it possible for a company like yours to exist in this space without one of the tech giants as a partner or a major backer?
I certainly think so. There are these arrangements of convenience—these tech giants have supercomputers, and those of us building models need supercomputers. But that compute is also available for purchase. So it’s not strictly necessary to have any sort of special arrangement. You can just participate through the market and buy these supercomputers. There are definitely advantages to aligning yourself with one of these large compute providers. OpenAI have certainly done a great job of that. But there are disadvantages, too, about locking yourself into one of them and becoming beholden to them.
Is this space winner-takes-all, or is it going to be a diverse market?
I think as a product of these models being so diverse in their application, it has to be a diverse market. You’re never gonna get one provider which is able to win on all of those fronts simultaneously. There will be some carving up of the space in the end game—some folks are very, very good model builders at x, y and z, and another set of folks are very good at a, b, c. There is certainly enough room within the space of language for many players.
You’re just focused on language?
For now, yes.
Is that a pointed “for now”?
The most exciting project in AI is modelling the internet. Humans have spent a quarter-century accumulating these beautiful documents that have multi-modality—language and images and text and audio. We packaged it up beautifully. There’s nothing more compelling than training on that and modelling that creation. That requires being able to listen to the audio and watch the videos and read the text.
If you want to be a company that models the internet, eventually you have to get there. I do believe that within those modalities, the most compelling to me is language. Language is the medium of intelligence, or whatever. But eventually, we have to push out.
On Nov. 30, ChatGPT goes into wide release and the world changes for a lot of people who aren’t in the industry. It’s that brain-explosion GIF.
Yeah.
What does it do for Cohere to have something that looks similar to your technology going viral?
It’s been fantastic. People have an intuitive grasp of large language models now—the general public does. Which is insane to me. Cohere has been around [since] well before this whole boom, and we’ve been in lots of conversations with folks about our models and the technology. All of a sudden, it has just clicked for people.
It’s really validating and cool. They’re coming to you saying, “Hey, I want to do this with large language models.”