As the country waits for more results from last year’s national head count, the U.S. Census Bureau is facing an increasingly tricky balancing act.

How will the largest public data source in the United States continue to protect people’s privacy while also sharing the detailed demographic information used for redrawing voting districts, guiding federal funding, and informing policymaking and research for the next decade?

The state of Alabama has filed a federal lawsuit to try to block the bureau from putting new privacy protections in place for 2020 census data. The case is currently before a three-judge court that is expected to rule soon on a request for an emergency court order. Whichever way it goes, the case is likely to reach the U.S. Supreme Court. The legal challenge could ultimately derail the bureau’s schedule for releasing the data many state and local redistricting officials need to prepare for upcoming elections.

Here’s what else you need to know:

Why does the Census Bureau have to protect people’s privacy?

Under current law, the federal government is not allowed to release personally identifiable information from the census until 72 years after it’s gathered for the constitutionally mandated tally. The bureau has relied on that promise of confidentiality to get many of the country’s residents to volunteer their information once a decade, especially among people of color, immigrants and other historically undercounted groups who may be unsure about how their responses could be used against them.

But it is becoming harder for the bureau to uphold that pledge and continue releasing statistics from the census. Advances in computing and access to voter registration lists and commercial data sets that can be cross-referenced have made it easier to trace purportedly anonymized information back to an individual person.


For a way out of this conundrum, the bureau has been building a new privacy protection system based on a mathematical concept known as differential privacy. Invented at Microsoft’s research arm, it has served as a framework for privacy measures in smaller Census Bureau projects, as well as at some tech companies.

“Differential privacy is in every iPhone and every iPad,” says Cynthia Dwork, a computer scientist at Microsoft Research and Harvard University who co-invented differential privacy. “That may have a larger scale than the number of respondents to the U.S. decennial census, but there’s a totality and commitment to privacy that’s different here” with the bureau’s plans for 2020 census data, Dwork adds.

How has the bureau protected people’s privacy in past census data?

For decades, the bureau has stripped away names and addresses from census records before turning them into anonymized data. That information is broken down by race, ethnicity, age and sex to levels as detailed as a neighborhood.


But even in a sea of statistics, certain households — particularly those in the minority of a community — can stick out because they live in isolated areas or have other distinctive characteristics that could make it easier to reveal who they are.

As part of additional privacy protections over the years, the agency has withheld some data tables, and sometimes particular cells within tables, from the public. The bureau has also added “noise” (data for fuzzing the census results) to certain tables before releasing them. Beginning with data from the 1990 count, it has used a technique called “swapping” to switch out data about certain households with those from different neighborhoods.
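
To make the idea of swapping concrete, here is a minimal sketch in Python. It is only an illustration: the household records, the `swap_rate` parameter and the `swap_households` function are hypothetical, and the bureau's actual rules for choosing which households to swap are not described here.

```python
import random
from collections import Counter


def swap_households(households, swap_rate, rng):
    """Return a copy of the records in which some pairs of households
    have exchanged their block (neighborhood) labels.

    Hypothetical illustration only; not the bureau's procedure.
    """
    swapped = [dict(h) for h in households]  # leave the input untouched
    n_pairs = int(len(swapped) * swap_rate / 2)
    for _ in range(n_pairs):
        a, b = rng.sample(range(len(swapped)), 2)
        # Exchange locations, so each household's details now appear
        # under the other household's block.
        swapped[a]["block"], swapped[b]["block"] = (
            swapped[b]["block"],
            swapped[a]["block"],
        )
    return swapped


# Made-up records for illustration.
households = [
    {"block": "A", "size": 1},
    {"block": "A", "size": 4},
    {"block": "B", "size": 2},
    {"block": "B", "size": 6},
]
published = swap_households(households, swap_rate=0.5, rng=random.Random(0))

# The number of households in each block is unchanged by swapping.
print(Counter(h["block"] for h in published))
```

Because two households simply trade places, the number of households in each block stays the same, while the details published for any one block may no longer belong to the people who actually live there.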

What prompted the bureau to choose differential privacy to protect 2020 census data?

In 2016, researchers at the bureau began conducting internal experiments to test the strength of the privacy protections used for 2010 census data, and based on the results, agency officials concluded they can no longer rely on data swapping.


Using a fraction of the census data the bureau released a decade ago, the researchers were able to reconstruct a complete set of records for every person included in the 2010 census numbers. Then, after cross-referencing that reconstructed data with records bought from commercial databases, they were able to re-identify 52 million people by name, according to a court filing by John Abowd, the bureau’s chief scientist. In a worst-case scenario, the bureau’s researchers estimated, attackers with access to more commercial data could unmask the identities of as many as 179 million people, or 58% of the population included in the 2010 census.
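
The cross-referencing step can be sketched in a few lines of Python. This is a simplified illustration, not the bureau's experiment: the records, the matching fields (`block`, `age`, `sex`) and the `reidentify` function are all made up for the example.

```python
def reidentify(reconstructed, commercial, keys=("block", "age", "sex")):
    """Attach a name to each reconstructed record that matches exactly
    one commercial record on the given fields.

    Hypothetical illustration of a linkage attack.
    """
    # Index the commercial records by the fields both data sets share.
    names_by_key = {}
    for person in commercial:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = []
    for record in reconstructed:
        names = names_by_key.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # only an unambiguous match unmasks someone
            matches.append((names[0], record))
    return matches


# Made-up records: "reconstructed" has no names, "commercial" has no race.
reconstructed = [
    {"block": "A1", "age": 34, "sex": "F", "race": "Asian"},
    {"block": "A1", "age": 52, "sex": "M", "race": "White"},
    {"block": "B7", "age": 29, "sex": "F", "race": "Black"},
]
commercial = [
    {"name": "Resident One", "block": "A1", "age": 34, "sex": "F"},
    {"name": "Resident Two", "block": "A1", "age": 52, "sex": "M"},
    {"name": "Resident Three", "block": "A1", "age": 52, "sex": "M"},
]

matches = reidentify(reconstructed, commercial)
```

In this toy example only the first record is re-identified. The second matches two names and stays ambiguous, and the third matches none. The more commercial data an attacker holds, the more records end up with exactly one match.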

To try to better protect people’s privacy for the 2020 census, the bureau announced in 2017 plans to create a new system, based on differential privacy, that officials say allows them to add the least amount of noise needed to preserve privacy in most of the released data and balance confidentiality and usability.
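
As a rough illustration of how noise trades confidentiality against usability, here is a sketch of the textbook Laplace mechanism, a standard building block in differential privacy. It is not the bureau's system, whose details are not described here. The function names and example numbers are assumptions, and `epsilon` is the conventional name for the privacy-loss setting: a smaller value means more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered on zero.

    The difference of two independent exponential draws with the same
    mean follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count with Laplace noise of scale sensitivity / epsilon.

    Adding or removing one person changes a simple count by at most 1,
    which is why the sensitivity defaults to 1.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2020)
true_count = 100  # made-up population of one small area

# Smaller epsilon: more privacy, less accurate published number.
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(noisy_count(true_count, epsilon, rng), 1))
```

With a large `epsilon`, the published figure stays close to the true count. With a small one, it can be off by tens of people, which matters little for a state but a great deal for a single census block.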

“Obviously, you know, it’s not the easiest thing to do,” the bureau’s acting director, Ron Jarmin, said this month at the Population Association of America’s annual meeting, adding that the bureau decided against data swapping and withholding certain tables as alternative safeguards. “To achieve a similar level of privacy protection with those sort of traditional methods, I think, would have produced a product that was even … less useful for data users than what we’re contemplating right now.”

How will differential privacy affect 2020 census data?

The bureau says no noise was added to protect people’s privacy in the new state population numbers, including those used to reallocate congressional seats and Electoral College votes, as well as numbers for Washington, D.C., and Puerto Rico. The bureau is also planning to release the total number of housing units in each census block, as well as the number of prisons, college dorms and other group-living quarters in each block, without privacy protections.


But it remains unclear how the bureau’s differential privacy plans will affect other new redistricting data that is expected out by Aug. 16, including population numbers and demographic details about counties, cities and other smaller areas.

It will depend on the amount of noise the bureau chooses to add and how it tries to smooth out the effects of adding noise. Bureau officials plan to make their decisions for the new redistricting data in early June. Separate privacy protection decisions for other 2020 data sets are expected to be made later after gathering more public feedback.

Why have the bureau’s differential privacy plans been controversial?
