Privacy and Data

Measuring Canada: In conversation with Canada’s chief statistician, Anil Arora

Anil Arora, chief statistician for Statistics Canada, speaks during the Canada Summit in Montreal, Quebec. File: Sept. 7, 2017. Christinne Muschi/Bloomberg

It’s been a difficult 10 days for Anil Arora, head of Statistics Canada. His agency has come under fire after Global News reported that a pilot project requested banking records detailing how 500,000 Canadians spend their money.

The privacy commissioner has launched an investigation into the program. Conservative MPs are accusing the Trudeau government of allowing Statistics Canada, an independent government agency, to create a massive database on individual Canadians without their consent.

The pilot project and the public’s reaction to it are a bellwether for how much personal data Canadians are willing to share in the public interest.  

In an extended interview with The Logic conducted in his Ottawa office on Saturday, Arora explained why he thinks the pilot project is crucial for measuring the well-being of Canadians in the innovation economy. He also talked about how securely the data would be managed and touched on the larger policy questions surrounding the rights of citizens to their own information in both the public and private spheres.

What follows is a lightly edited transcript of Anil Arora, chief statistician of Canada, in conversation with The Logic’s Zane Schwartz.

ZS: On Thursday, Wayne Smith, who served as chief statistician from 2010 to 2016, said, “To ask for people’s information about their financial transactions to be provided to Statistics Canada is extremely sensitive and Statistics Canada should be able to say: ‘OK, here’s the purpose and here’s why it’s important enough to justify this intrusion.’ If they don’t have an answer, they should stop now.” What’s your answer to that?

AA: I think that’s a fair enough case. We don’t go fishing. We’re not creating that big database in the sky. Look at the facts on the table. We’re talking about 36 million Canadians—give or take 15 million households—and we’re talking about a request for [the data of] half a million.

I think there’s a misconception out there that we’re collecting everybody’s information on every transaction. And that’s clearly not the case. We wouldn’t have a sample size of a half a million if it was. Now you might say, “Why do you need that information?” Let’s look at individual need for it. You go and get a mortgage, your institution is looking at to what extent are you leveraged, what’s your income, what’s your expenditure base? And where do you think those data come from on which the policies and the rules are based? They come from a representative sample of data that actually tells you: what is the average house price in that area; what is the income in that area; what is household type in that area. This is the data CMHC [Canada Mortgage and Housing Corporation] and the Department of Finance [use to set] policy about who qualifies for a mortgage.

ZS: How will you be able to provide better information by collecting the banking information versus what you have now?

AA: What we do now is generally surveys. There are a number of problems with that. One is that we’re finding less and less Canadians to participate in those kinds of surveys that allow us good unbiased data. People are busy; people don’t want to answer the phones. The second problem with that is that the people that participate look very different from those that aren’t participating.

If you look at the survey of household spending, for example, those surveys involve literally giving you a pen and a diary and making you keep all receipts of every expenditure you make over a period of time and reconcile it to a very small percentage of the variance from your overall income.

And that’s how we get at what the typical household is spending their money on. And we’re getting those results two, three years later because that’s what it takes to do a survey. 60 per cent of people that we approach to keep that diary tell us, “No,” or we can’t get a hold of them. So it’s based on 40 per cent, and that number continues to decline.

And, interestingly, many of those same people are [looking at] their banking transaction data to get the records to tell us what their expenditures are. All that is to say, what I think a lot of people don’t understand is that the starting point of any statistic that we put out is a piece of information that belongs to somebody—that is a sensitive piece of information that we understand has privacy implications to it. If we do it on paper or even online, what is the difference between doing it in that mode as opposed to going to an administrative source and getting that information directly?

And I get it. If you’re trying to convince me that I don’t understand the privacy concerns, that couldn’t be further from the truth. Every single question that we ask on every single questionnaire, we worry about the privacy implications. We ask the most sensitive health data—you name it.

ZS: This is more detailed, though. You don’t necessarily have [today] that I spent $4 at a Starbucks near my doctor’s

AA: Frankly, nobody at Statistics Canada delves into an individual transaction between you and a particular vendor. One also has to understand that it’s not the first administrative file that we’re dealing with. We have procedures in place that as soon as any kind of sensitive data comes in, we immediately scrub it and take off anything that says “This is Zane’s record.”

ZS: How many people at Statistics Canada would have access to the banking information?

AA: If it’s a dozen people, I’d be surprised. So a piece of data comes in. The identifiers are kept in a separate file and an identifier is given to the characteristics of that file. So you need a two-key system, in a sense, to get at it and unlock it—to be able to link that data back to the individual.

Say you have one file for Zane or Anil, then you have another file which says “1-2-3-4-5,” or whatever it is—that “1-2-3-4-5” means something in terms of an economic family or a demographic or geographical kind of thing. Those are embedded in that number. That’s what we are interested in, is the statistical nuggets in there. We then build the household structure, we build the neighborhood characteristic, the demographic characteristics from those kinds of things.

Every single person that works in this organization, from the day that they walk into the door, they’re security cleared, they have to sign an oath; they’re told that this is a secret that they have to keep for the rest of their life. And if they violate that secret, the Statistics Act has fines and jail terms for people who violate that.
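Editor's note: the two-file arrangement Arora describes above can be illustrated with a minimal sketch. It is not Statistics Canada's actual system. The function names, field names and the random linkage key are all hypothetical, and the key here carries no embedded meaning, unlike the number Arora describes.

```python
import secrets


def split_record(record, identifier_fields):
    """Split one incoming record into an identifier row and a de-identified row.

    Both rows share a linkage key, so neither file on its own ties the
    characteristics to a named person.
    """
    # Hypothetical stand-in for the "1-2-3-4-5" number in the interview.
    linkage_key = secrets.token_hex(8)
    identifiers = {k: v for k, v in record.items() if k in identifier_fields}
    characteristics = {k: v for k, v in record.items() if k not in identifier_fields}
    return (linkage_key, identifiers), (linkage_key, characteristics)


def relink(linkage_key, identifier_file, characteristics_file):
    """Re-linking a record requires access to both files (the 'two keys')."""
    return {**identifier_file[linkage_key], **characteristics_file[linkage_key]}


# Illustrative record; all values are made up.
record = {"name": "Zane", "region": "Ottawa", "monthly_spending": 2500}
(id_key, ids), (ch_key, chars) = split_record(record, {"name"})

identifier_file = {id_key: ids}          # kept separately, restricted access
characteristics_file = {ch_key: chars}   # what analysts would work with
```

In this sketch, an analyst holding only `characteristics_file` sees the region and spending but no name; putting the record back together takes both files.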

ZS: In the 2016 census, there were 20 privacy breaches, including census forms being lost. Those losses weren’t immediately reported to the privacy commissioner. In 2017, Statistics Canada’s website was hacked and down for about 48 hours; no information was taken. So you understand why people may have concerns—you have procedures in place, which are great on paper. In practice, though…

AA: Let’s talk about the census, having managed a few in my life and redesigned a few in my life. We hire tens of thousands of people to help us do the census. While in 2006 we initiated the online transaction, about 30 per cent of Canadians are still doing it either on the phone or prefer to do a paper questionnaire. So just think about that: there are 36 million Canadians that were rated in the 2016 Census. There’s 13-point some odd-million dwellings that were occupied.

We’re talking about tens of thousands of enumerators that go door-to-door and collect and put it into boxes, and then they get shipped through Canada Post, and then they, in some cases, get carried in the back of the [census collectors’ car] trunk. So when we have multiple points of handling, you increase the risks that a record can go astray. Even one incident with one questionnaire is one too many. But when you look at the number of instances with the scale and size of the operation, it is certainly not in the hundreds or even thousands. You can do the math: 20 divided by 13.6 million.
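Editor's note: the division Arora invites can be worked out directly. The sketch below uses only the two figures quoted in this exchange, 20 breaches and 13.6 million occupied dwellings; the variable names are illustrative.

```python
breaches = 20                      # privacy breaches cited for the 2016 census
occupied_dwellings = 13_600_000    # "13.6 million" as quoted by Arora

rate = breaches / occupied_dwellings
per_million = rate * 1_000_000

print(f"{per_million:.2f} incidents per million dwellings")
# prints "1.47 incidents per million dwellings"
```

That is roughly one and a half incidents for every million occupied dwellings.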

To go back to your question about the online incident: no data was ever hacked into. We have information that we put on our web servers for dissemination and we want people to use it. Those systems and the systems that we have within Statistics Canada aren’t connected. So to say [hackers] got through the first layer and therefore everything is vulnerable—that if you can penetrate that then you’re now into the crown jewels—that’s a very simplistic stretch of logic. We’ve taken every single step, including physical separation and isolation of systems, et cetera. This is a hundred years in the making.

ZS: Is it working? Has Statistics Canada ever been hacked?

AA: No. We have never had a single piece of confidential data exposed to the outside.

ZS: Ann Cavoukian, former Ontario privacy commissioner, said, “I urge all online banks to resist providing customer sensitive financial data to Statistics Canada in identifiable form. De-identify the data first at source.” Is that something you’re willing to do?

AA: It sounds very reasonable on the surface. But let’s parse that out: many Canadians have a checking account, a savings account, accounts with various institutions; they’ve got a line of credit. In my household, there’s myself and my spouse and kids, et cetera. So there’s an economic family that we have to look at.

Then we have to know where that record is located in a particular neighborhood, because that’s the kind of detail that people now need to be able to separate what’s going on in, [say], Richmond, as opposed to downtown, on an aggregate level. It sounds really simple, but essentially you’d have to replicate a statistical agency within the institution to be able to do all that un-duplication, and then assign the code and then get back at it. How is that any less privacy-invasive?

ZS: Have you talked to the banks about it?

AA: We’ve been talking to the banks for a year.

ZS: Have they offered to de-identify?

AA: So we give the banks an oversample of the number of dwellings that we want, so that they don’t even know which sample we’re going to use out of that. Out of that 500,000, the number that we’ll use is closer to 300,000 or 350,000. So they don’t even know which ones we’re using. We get a file which says, “Here’s the identifier and the StatsCan number.” Through that transmission that came from them—which is as secure [as], if not more than, the banking information that one does online—we are getting a de-identified file. This is the design of the pilot that I’m talking about.

So the answer is a bit more complicated than “Why don’t you just get it de-identified?”—well, we are in some sense.

ZS: It’s not really, though, because you can still match it back if you needed to for statistical reasons.

AA: We have to be able to do that. If not, then the value of that file starts to diminish significantly, and then why do it? Then we’re back to: let’s get three-year-old data, and let’s get it for a province or for a city, rather than where it’s really needed today.

When you don’t give information on a timely basis and you don’t give it for the level that people need, then people fill in the gaps with whatever it is they think is right. How is that helpful as a society? Haven’t we been watching that debate going on about what is the opportunity cost of that void? If we can’t fill it, as Statistics Canada, with world-leading experts and systems, with the confidentiality and privacy assurances, well, then who else do you trust to fill it? And are you okay with using that information for your decisions, individually or as a society? Let’s have that debate.

Over the course of the last two years, our staff has been from coast-to-coast-to-coast talking to every major association. We’ve talked to Canadians; we’ve talked to different levels of government, and this is exactly what they’ve all told us: we need more granular data. We need it [to be] more timely, and we need a high-quality data set. If we say we’re a data-driven society and we’re based on evidence and fact, the theory has to match with the practice at some point.

ZS: In December 2015, StatsCan put out its policy on the use of administrative data—[data originally collected for a purpose other than statistical analysis]—that is obtained under the Statistics Act. Did you start collecting administrative data three years ago?

AA: We have had administrative data in this agency for at least half a century. And, like I said, the law prohibits us from putting out any individual data to anybody. No minister can ask for it, no government department can ask for it, no individual can ask for it. The Statistics Act gives the powers to have the data for statistical purposes, and it [prevents] us from doing anything with it other than for statistical purposes.

If another department said, “I want you to look at someone we’re concerned is evading taxes or not paying their fair share,” or whatever, that’s not something I would ever do. The Statistics Act is very clear on that. And that’s ingrained in our culture and our systems and our processes.

So coming back to your point: we have education data on individuals, we have justice data on individuals, we have financial data through other sources. This isn’t anything new for us.

I understand the sensitivity of the financial records. That’s why we’ve been working for a year with institutions and with the privacy commissioner. And we’re still in the process of designing it. There is a real educational opportunity for people to understand what it takes. When we put something out on the rate of obesity with our population, where do you think that starts? We have surveys where we physically measure people’s weight and blood types—we know how to deal with the most sensitive pieces of information.

ZS: There’s a petition on the House of Commons website, it has 9,000 signatures [Ed: 13,000 as of Sunday night]. Many Canadians are expressing concern that Statistics Canada will share their banking information…

AA: Impossible.

ZS: Because of the Statistics Act?

AA: The Act is one very, very big deterrent, given the penalties and fines. But it’s beyond that. The culture within this organization just would never allow it because we know the repercussions of that loss of credibility and trust. This is the currency in which we operate. We know the quid pro quo for individuals entrusting us with their most sensitive information is that it will never leave here.

So, do I understand people’s concerns? Remember I also have all of my financial information in an institution. I don’t have my money under a mattress. I get it—I am part of that base. Statistics Canada is like a vault: once [data] comes in here, it is kid-glove treatment all the way through.

And in this fast-paced society, where we’re seeing more and more Canadians do business online—I mean, when was the last time you sent somebody a cheque or, you know, reached into your pocket to pay for something in any kind of substantive way with cash? Eighty per cent of transactions, if not more, today are done electronically. Where should a statistical agency go to to get information? If it’s relegated to pencils and diaries and questionnaires—you can’t have it both ways.

ZS: Why not get consent? You get consent for the diaries. And you’re saying that an increasing number of people don’t necessarily want to share that information. Why couldn’t you do the exact same thing but for online transactions?

AA: That played out in the long-form census debate as well. We know that when voluntarily people give us information, there is a penalty to pay for the quality of the information—the bias that creeps into that information. The people that say “yes” to participating look very different than the ones that say “no.”

ZS: Are you surprised by how much pushback there’s been in the past week?

AA: There is no question that we could have done a better job of priming the population and allaying the concerns of Canadians. In [our] defence, we’re not even in the implementation phase; we’re still in the design phase of this pilot. But it doesn’t change my respect for people’s concerns and views. I get it. It just increases our resolve to be even more diligent. We have to do a better job of explaining what we do as a statistical agency and differentiate ourselves from other policy departments or other institutions.

ZS: What could you have done differently from a communication perspective on this project?

AA: It’s a difficult one to answer because we haven’t actually done anything yet. In the last year we put out data from the housing sector using administrative records—maybe we need to not only say, “Here are the numbers of foreign-owned housing,” but we need to also say, “This is how we got that data.”

ZS: How did you get that data?

AA: We went to the private sector and to our land registries and to real estate companies—we got a lot of that data through administrative records.

ZS: The federal Conservatives and the NDP are asking for this pilot to stop until it can be investigated. Are you open to that? Does this program need to be paused?

AA: I’ve said to the privacy commissioner, “Come in and make it more robust.” I’ve asked that the work with the banks continue and [to] see what more we can do to allay concerns. I understand all their concerns. I want the other side to also understand our needs and what we do and what we don’t do. It isn’t just a one-way conversation about the provision of the information. We have to be looking at both sides of this.

It’s an open debate, and it raises the broader issues about quality of information on which we want to make decisions, the timeliness of the information we want to make the decisions, the localization of the information that we need to make decisions and who should provide that.

Some of these are not new. You get an incident from time to time that raises questions. The long-form census was one of those periods where a lot of questions were raised, and then we brought in further legal changes to strengthen the independence of the agency. So this is a healthy debate. At the end of it, there’s an education piece, there’s a comprehension piece and then out of it there’s a better outcome, hopefully. That’s what a democratic society is all about.

Some of these other questions are far greater than me, and that’s why we have a democracy and I respect that democracy. All I can say is that in our consultations we heard overwhelmingly from Canadians, “Yes, we want you to see how we can harvest more information from administrative sources, we trust you to be able to do that in a privacy-friendly and secure way, and if that’s what it takes to provide the kind of needs for our society going forward, experiment—pilot it.”

Businesses are telling us there’s too many forms, there’s too much burden—why don’t you use the administrative data that we already provide the government?

ZS: Can you give an example?

AA: With the census we used to ask for tax data in a lot of detail. Slowly, people said, “You have it—why don’t you just use it?” You know, attitudes, they change over time. And that’s what we have done as an agency—we’ve always earned the trust of Canadians, and we will continue to do the best we can to continue that process.

ZS: You mentioned in a Global News interview that you’ve been talking to the privacy commissioner regularly over the past year—you said about once every two weeks. Are there things the privacy commissioner asked you on this that you didn’t implement?

AA: It doesn’t quite work where they tell us how and what to do and we do it—there are tradeoffs there. It’s been a very healthy conversation between the two organizations. Many of the design features are reflective of that conversation.

ZS: You’ve talked about the importance of measuring the gig economy. And you’ve mentioned in previous interviews you might be asking for data from Uber or Lyft or Airbnb. What would that allow you to measure?

AA: At the moment, all we know is that people are consuming these services. We have very little information on how prevalent it is, how much of our economy is dependent upon it, what kind of jobs are associated with it, what kind of revenues are associated with it, what should the taxation components be, what is the safety net of people in the gig economy, what the vulnerabilities are if there’s a downturn. We’re so hamstrung by the lack of information on some of these things. So right now the debate is on a smattering of people’s thoughts about this and it’s more projections and models than it is good hard statistics.

ZS: So initially the goal for banking information was January. Is that still the goal?

AA: It was always an estimated time. Nothing happens in February. If over the next few weeks there are suggestions from the privacy commissioner or from working with the banks—then we’ll take them into account, even if we have to phase institutions in. That was always the thinking: to start to put it into a pilot phase early in the new year. So we’re still continuing on.

ZS: Under the Act, you don’t need permission. Are you going to wait for it? If the banks say no…

AA: We don’t work by just taking the Act and throwing it on the table. We didn’t have to work for a whole year consulting with the banks and the commissioner if that was the case—we could have just started with that.

We don’t go throwing the law around. We earn it, we demonstrate the reasonableness. That’s why we have a record—because we work with institutions, we work with Canadians, we earn that trust with every single transaction.

ZS: This has been a main topic of debate in question period all week. Have you spoken to the prime minister? Have you spoken to Innovation Minister Navdeep Bains?

AA: We obviously keep our minister’s office in the loop and they keep the prime minister’s office in the loop, I suppose. Remember, I’m not a politician.

ZS: Last question. The NDP and the Conservatives are calling for this pause. You’ve said before that legislatively you’re within your rights. However, the law can be changed. There’s calls for a change right now. Are you concerned about that?

AA: I’m a believer in democracy. You know, up until then I have a mandate. I validated that mandate with Canadians. This is what they said they want and expect of us. We’ve been transparent about our direction in terms of where we’re going and how we’re modernizing. We’ve demonstrated the value that we can give when going in that direction.

A statistical agency has a mandate to provide good quality data and timely data that responds to individual needs.

That’s what we’ll continue to do until structures and otherwise say that Canadians want a different kind of structure in place. I mean, all I can say is I don’t see the alternative being better. You’ve got an agency with 100 years of building systems and processes to be able to do that. And we’ve got essentially an unblemished track record of doing it.

So the logic is then we live with the gaps as a country and let whatever else fill that gap? I don’t know. You’re not going to just replace a 100-year-old institution with something else that’s going to do it better.