Your data isn’t private, even if it’s anonymous. In 2006, AOL publicly released 20 million anonymized search queries. That same year, Netflix released 100 million anonymized movie ratings from its users. In both cases, researchers were able to re-identify individual users by name. More recently, researchers have re-identified by name 30% of anonymized public hospital records in Maine and Vermont and 94% of randomly selected Airbnb hosts in Wisconsin. The basic information you enter into a website (birthday, gender, and zip code) can identify at least 87% of US citizens in publicly available databases. This means that our online privacy is at stake, and can be (and probably has been) breached.
What could have prevented these privacy breaches? Privacy enhancing technologies.
How do Privacy Enhancing Technologies protect privacy?
Privacy enhancing technologies protect data to preserve privacy. This can be done in four ways. First, companies can minimize the data they process. Second, companies can hide the processed data from others. Third, they can separate, or isolate, the data away from their own hands. And finally, companies can aggregate data by storing it in clusters rather than individually.
- Minimize. To minimize the data processed, the system must be programmed to collect only the data needed to run the site. However, it’s unrealistic to expect companies to become data minimalists, since data is profitable. The other three approaches show how companies can still work with large data sets.
- Hide. Hiding the data keeps it out of plain view. A well-known way is to encrypt the data, which locks it so that only the sender and the receiver hold the key to unlock it. However, encryption alone is not enough to protect privacy, for two reasons. First, software developers often implement encryption incorrectly due to a lack of expertise, so privacy breaches can still happen. Second, encryption is often impractical: it is difficult to analyze encrypted data, so organizations have to decrypt data to analyze it.
One alternative to encryption is differential privacy. Instead of locking the data, differential privacy hides the data by obscuring it. Think of it as blurry glasses. You cannot identify individual faces, so individual privacy is protected. However, you can still make out your surroundings, so you can still understand the data as a whole. Differential privacy accomplishes this by injecting statistical noise into data sets. Because it promises both useful data analysis and privacy, big tech companies like Google, Microsoft, Apple, and Uber have adopted it. For example, to help public health researchers track COVID-19, Google publicly released community mobility data from Google Maps, protected with differential privacy.
However, differential privacy is no cure-all. There is a tradeoff between accuracy and privacy, which we control by adjusting the blurriness of our glasses. Clearing your vision means that your data analysis is more accurate but less private. Blurring your vision means that your analysis is less accurate but more private. Finding the right balance between accuracy and privacy is therefore extremely context dependent.
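The "blurry glasses" tradeoff can be made concrete with a small sketch. The example below is not taken from any of the companies mentioned; it assumes one common way of injecting statistical noise, the Laplace mechanism, applied to a simple counting query. The function names, the made-up age data, and the choice of epsilon values are all hypothetical. The parameter epsilon plays the role of the blurriness dial: a smaller epsilon means more noise (more privacy, less accuracy).

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records matching predicate (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: ages of 1,000 made-up users.
rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(1000)]
true_count = sum(1 for a in ages if a >= 65)

# Smaller epsilon = blurrier glasses: more privacy, less accuracy.
for epsilon in (0.01, 0.1, 1.0):
    noisy = private_count(ages, lambda a: a >= 65, epsilon, rng)
    print(f"epsilon={epsilon}: true={true_count}, noisy={noisy:.1f}")
```

On average the noisy answer is off by about 1 / epsilon, so the same question can be answered to within roughly one person at epsilon = 1 but only to within roughly a hundred people at epsilon = 0.01. Which setting is acceptable depends on the size of the data set and the sensitivity of the question.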
- Separate. We can think of your personal data as a puzzle piece. If you separate and scatter the puzzle pieces, it becomes harder to identify any individual. This is analogous to processing data sets across separate servers, or processing the data locally instead of centrally. However, separation is often unrealistic, since most tech companies (such as Facebook and Google) process data on centralized servers.
- Aggregate. The approach here is to aggregate personal data into groups of similar records. One technique, k-anonymity, generalizes or removes identifying details in a database until every record is identical to at least k - 1 others on those details. Since we cannot distinguish between individuals who now have identical data, privacy is preserved.
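The k-anonymity idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production anonymizer: the records are made up, the field names are hypothetical, and the generalization rules (keep a 3-digit zip prefix, round the birth year down to the decade) are just one arbitrary choice. Records are first coarsened, and any group that still has fewer than k members is dropped.

```python
from collections import Counter

def generalize(record):
    """Coarsen the identifying fields: 3-digit zip prefix and birth decade."""
    return (record["zip"][:3] + "**", record["birth_year"] // 10 * 10, record["gender"])

def k_anonymize(records, k):
    """Generalize every record, then drop groups with fewer than k members."""
    keys = [generalize(r) for r in records]
    group_sizes = Counter(keys)
    return [
        {"zip": z, "birth_decade": d, "gender": g, "diagnosis": r["diagnosis"]}
        for r, (z, d, g) in zip(records, keys)
        if group_sizes[(z, d, g)] >= k
    ]

# Hypothetical records; zip, birth year, and gender are the identifying fields.
records = [
    {"zip": "53703", "birth_year": 1984, "gender": "F", "diagnosis": "flu"},
    {"zip": "53706", "birth_year": 1987, "gender": "F", "diagnosis": "asthma"},
    {"zip": "53711", "birth_year": 1981, "gender": "F", "diagnosis": "flu"},
    {"zip": "53704", "birth_year": 1992, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "53715", "birth_year": 1995, "gender": "M", "diagnosis": "flu"},
    {"zip": "04101", "birth_year": 1970, "gender": "M", "diagnosis": "asthma"},
]

released = k_anonymize(records, k=2)
for row in released:
    print(row)
```

With k = 2, the lone record from zip code 04101 is dropped because no other record shares its generalized details, and the remaining five rows fall into groups of three and two. Raising k removes or blurs more data, so the choice of k is itself a tradeoff between privacy and usefulness.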
What inhibits the usage of privacy enhancing technologies?
Hiding, separating, and aggregating the data allow for large-scale data analysis that preserves privacy. Each approach has its limitations, but can be very effective if applied correctly. The question then is: what’s stopping us from using these tools? The reasons lie with academia, companies, the government, and us, the consumers.
Academia: Privacy enhancing technologies started from research and are studied extensively by researchers. The research still has some loopholes and tradeoffs, and it is written for specialists. This language barrier between academia and industry makes it difficult for companies to implement privacy technologies from research papers unless they have experts in the field.
Companies: Preserving your privacy is generally not in a company’s best interests, and so neither is exploring the use of privacy enhancing technologies. Even if companies are not selling your data, they can profit from it by sharing it with other companies. Preserving privacy is also secondary to building the product. Zoom’s CEO has admitted that in prioritizing a convenient user interface, he dismissed privacy issues.
Companies also lack accountability for privacy violations. The public often does not learn of a privacy breach until a whistleblower working for the company speaks out. This secrecy perpetuates a cycle of privacy violations. A classic case is the Facebook and Cambridge Analytica data scandal of 2018, which only garnered media attention after Cambridge Analytica whistleblower Christopher Wylie leaked it to the Guardian.
Government: Even if companies want to fully preserve privacy, the government’s national security needs can undermine a company’s commitment to privacy. In 2014, Apple began encrypting iPhones by default, so that only the phone’s owner (not even Apple) can unlock the data on it. This is analogous to throwing the key inside the locked room. As a result, Apple was unable to break into the San Bernardino terrorist’s iPhone in 2016. At the FBI’s request, a federal court ordered Apple to provide a backdoor so that the FBI could read the phone. This would have required Apple to create a new iPhone operating system, which Apple described as “hack[ing] their own users and undermin[ing] decades of security advancements that protect their customers.” Eventually the FBI was able to hack into the phone with the help of a third party.
Regulations: The US has data privacy laws but no comprehensive federal privacy law like the European Union’s General Data Protection Regulation. Instead, data privacy in the US is governed by a patchwork of federal and state laws. On the federal level, the US has privacy laws covering finance, healthcare, and children’s data. On the state level, most states have narrow privacy laws but lack a comprehensive one. Only three states (California, Maine, and Nevada) have comprehensive privacy laws in effect, and a dozen states are in the process of passing their own. With no comprehensive privacy law across the United States, companies do not have a consistent standard for privacy protection.
Consumers: Even with transparent privacy policies, consumers will never truly understand how companies and the government handle their data behind the scenes. It is no surprise that the majority of Americans feel they have no control over how the government and companies collect their data, according to the Pew Research Center. Eight in 10 Americans are concerned about their online personal information, and the majority believe that the potential risks of data collection outweigh the benefits. Yet despite these concerns, many consumers are unwilling to give up the convenience of Facebook and Zoom, even with their privacy scandals.