Do Not Track
Tuesday, 22 December 2015
It seems that we are hearing more and more about panic and paranoia that leads to really bad decisions. For example, after every publicized terrorist action is an increase in attempts to ban encryption or provide backdoors for law enforcement. This leads to three big items that lawmakers continually overlook. First, bad guys don't follow the laws, so making it illegal won't prevent them from doing it. And requiring backdoors won't impact the cryptographic solutions that the bad guys use because, again, they don't have to follow the law.
Second, any kind of backdoor that good guys can use can also be used by bad guys. By enabling law enforcement, we also enable attackers.
Of course, there's the counter argument that most bad guys use public tools. If the public tools are backdoored, then the bad guys are just as vulnerable as everyone else. However, I don't give this argument much merit. In particular, organized crime rings have their own developers. If open source solutions do not provide the security that the bad guys want, then they will develop their own secure solutions.
Finally, there's the big issue that US laws do not apply outside of the United States. When strong cryptography, such as DES, was declared "munition" and not exportable, people outside of the USA developed the code. There were years where anyone wanting strong cryptography had to download the code from outside the USA. If we want to be the technology leaders, then shouldn't other countries have to come to us?
Other Bad Ideas
Another computer concept that scares people is "metadata". As security expert Bruce Schneier wrote, "Metadata equals surveillance; it's that simple." While I agree with many things Schneier says, this is not one of them. Schneier's statement is nothing less than buying into the mass media hype around the boogeyman called metadata and promoting ignorance. Bruce should know better than that.
Fortunately, we have not (yet) seen anyone try to promote a law forbidding all metadata. (There's a programming language called Lambda, where there are no variables and every function is named lambda. Life without metadata is like that.)
A New Threat!
The other scary kid on the block is called online tracking. (Cryptography, metadata, and online tracking? It's the three horsemen of the cyber apocalypse!)
People forget that there is no such thing as online anonymity. You might be able to make it difficult to be tracked online, but it is never impossible. Every program you use, every system you connect to, and everything you do online leaves traces that can lead back to you. This is the basis of Locard's exchange principle, and it works just as well with online forensics as it works in the physical world.
The downside of being tracked online is that you're never really doing something privately. While not doing anything illegal, I may not want the world to know that, after eating a burrito, I went looking for toilet plungers on Amazon with pricing for next-day shipping.
The big concern is that, with all of this data, companies will be able to build huge profiles about you. Then again, why worry that it might happen when it is happening right now? Facebook shows ads related to your recent Home Depot searches, and Target knows you're pregnant before you know it.
The thing that I can never figure out is the cause of this fear... Are people afraid that others might see what they do on the public internet, or afraid that someone may know you better than you know yourself?
DNT
A few years ago, there was a proposal for a "DNT" HTTP header (Do Not Track). When set to "1", it meant that the user did not want their data collected and tracked. Today, most web browsers transmit a "DNT: 1" header.
The big failure with DNT is that, although the client sends it, there is zero requirement for the server to support it. In fact, many of the sponsors behind the DNT header ignore it. For example, the Electronic Frontier Foundation supports DNT. However, they also offer a service called "Panopticlick" that tells you if your browser's signature is unique. In order to check if you are unique, they must track other requests. (Otherwise, multiple retests from the same browser would appear to be duplicates.) So even though my browser says "DNT: 1", it is still being tracked by the EFF.
Similarly, even though Microsoft supports DNT in their web browsers, Microsoft's web services explicitly ignore the DNT setting. As they wrote in their privacy statement:
Browser Controls for "Do Not Track." Some browsers have incorporated "Do Not Track" (DNT) features that can send a signal to the websites you visit indicating you do not wish to be tracked. Because there is not yet a common understanding of how to interpret the DNT signal, Microsoft services do not currently respond to browser DNT signals. We continue to work with the online industry to define a common understanding of how to treat DNT signals. In the meantime, you can use the range of other tools we provide to control data collection and use, including the ability to opt out of receiving interest-based advertising from Microsoft as described above.
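To make the "zero requirement" point concrete, here is a minimal sketch of what honoring DNT would look like on the server side. The function names and the in-memory log are hypothetical, made up for illustration; no real server is claimed to work this way. The point is that respecting the header is an optional branch that the server operator has to choose to write.

```python
def wants_no_tracking(headers):
    """Return True if the request headers carry a 'DNT: 1' preference."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def handle_request(headers, access_log):
    """Hypothetical handler: it records the client only when DNT is not set."""
    if wants_no_tracking(headers):
        # Nothing forces a server to take this branch; it is purely voluntary.
        return "served without logging"
    access_log.append(headers.get("User-Agent", "unknown"))
    return "served and logged"


log = []
print(handle_request({"DNT": "1", "User-Agent": "ExampleBrowser"}, log))
print(handle_request({"User-Agent": "ExampleBrowser"}, log))
```

A service that ignores DNT simply never calls anything like `wants_no_tracking`, and the client has no way to tell the difference.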
I'm just a bill...
Not to be outdone, Congress is now considering the "Do Not Track Online Act of 2015" (PDF). This is 13 pages of the stupidest proposed law that I have ever read.
Issue #1: Readability
I've read lots of bills and laws. Some are easy to understand, while others appear to be intentionally oblique. This bill is one of the most difficult to understand documents that I have seen. It's as if someone wants to use a bunch of college level words just to show how smart they are. (I think it uses "promulgating" and "promulgated" 16 times in 13 pages.) The result is a complicated text that really requires a lawyer to interpret.
Keep in mind, unless you are an attorney (and I'm not an attorney), you'll probably be missing some of the nuance and legal issues. I find this ironic since the bill repeatedly states that instructions for users must be "simply and easily" understood. (See Section 2(a)(1) and 2(c)(3)(B).) If I cannot simply and easily read and understand the proposed law, then how can I be expected to simply and easily implement it and disclose the requirements to my users?
Issue #2: Effectiveness
The bill proposes that companies clearly describe the data that they collect and offer ways for users to request the anonymization or deletion of their data (Section 2(b)(1)). Also, it requires users to opt-in to data collection (Section 2(b)(2)) and companies must maintain opt-out lists (Section 2(c)(3)(B)).
From a technical viewpoint, this is both contradictory and ineffective. Many web services, such as linked ads, do not have any direct and easy way for users to find the online service. Even if the service does offer easy to understand text and simple opt-out methods, there is no easy way for a user to identify the service provider.
According to the bill (Section 2(b)(1)), services should include ways for users to request the anonymization or deletion of information. I always find requirements like this to be ironic. In order to delete or anonymize user data, I must first know how to identify the user -- that means tracking. This requirement demands the ability to track everyone so that some people can request removal.
Maintaining an opt-out list also leads to an ongoing explicit tracking of the user so that their data can be automatically identified and removed/anonymized. You must be tracked in order to protect your privacy. (Ha!)
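The irony is easy to show in code. The following is a small sketch of an opt-out list, assuming the service identifies visitors by some stable value such as a cookie ID (the class and identifiers are hypothetical). Even when the identifier is hashed, the list only works if the service can recognize the same user on every later visit.

```python
import hashlib


class OptOutList:
    """Hypothetical opt-out registry keyed by a stable user identifier."""

    def __init__(self):
        self._opted_out = set()

    @staticmethod
    def _key(user_identifier):
        # Hashing hides the raw value, but the digest is still a stable,
        # per-user identifier: the same user always maps to the same key.
        return hashlib.sha256(user_identifier.encode("utf-8")).hexdigest()

    def opt_out(self, user_identifier):
        self._opted_out.add(self._key(user_identifier))

    def may_collect(self, user_identifier):
        # Every request must be matched against the list, which means every
        # request must carry something that identifies the user.
        return self._key(user_identifier) not in self._opted_out


registry = OptOutList()
registry.opt_out("cookie-abc123")
print(registry.may_collect("cookie-abc123"))
print(registry.may_collect("cookie-xyz789"))
```

If the user clears the cookie, the service no longer recognizes them and the opt-out silently stops applying.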
Issue #3: False Personal Information
Section 2(d) of the bill identifies types of personal information "such as Internet Protocol (IP) addresses, media access control (MAC) addresses, and other unique device identifiers." This is another technical problem.
Let's start with: your IP address is usually not personal information. Unless you are one of the tiny minority of users with static network addresses, you are likely using a DHCP service. The "D" stands for "Dynamic" -- your computer is issued an IP address as-needed, and it can change any time. When you go to the coffee shop, hotel, or library, you are issued a network address. But the address is not unique; lots of other people have used, and will use, that same address. The same goes for proxy networks -- the proxy's address is not unique to you.
And then there's the MAC address. Under IPv4, the MAC is never seen outside of your local network. Google, Facebook, and other online services never see your MAC address. In contrast, some IPv6 addressing schemes (such as autoconfiguration with EUI-64 identifiers) embed the MAC address in the IPv6 network address. So either tracking the MAC is a non-issue, or it's part of the address that is required for routing network traffic.
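For the curious, the IPv6 case can be sketched in a few lines. This assumes the "modified EUI-64" format used by stateless address autoconfiguration: the MAC is split in half, ff:fe is inserted in the middle, and one bit in the first byte is flipped. The MAC address and the 2001:db8::/64 documentation prefix below are made-up example values.

```python
import ipaddress


def eui64_interface_id(mac):
    """Build the 8-byte modified EUI-64 interface identifier from a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-byte MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def slaac_address(prefix, mac):
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


def mac_from_address(address):
    """Recover the MAC from an EUI-64 style address, or None if not EUI-64."""
    raw = ipaddress.IPv6Address(address).packed[8:]
    if raw[3:5] != b"\xff\xfe":
        return None
    mac = bytearray(raw[:3] + raw[5:])
    mac[0] ^= 0x02  # undo the bit flip
    return ":".join(f"{byte:02x}" for byte in mac)


address = slaac_address("2001:db8::/64", "00:1a:2b:3c:4d:5e")
print(address)
print(mac_from_address(address))
```

Any remote server that receives traffic from such an address can run the reverse step, because the MAC is sitting in the source address of every packet.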
Amazingly, this law does not mention anything about web logs, email headers, Twitter comments, and Facebook postings. For example:
- Virtually every web server defaults to logging web requests; the default is "track". Even if the requests are not tracked, error logs and server logs may mention network addresses. If you cannot track, then you cannot log.
- Email headers include the list of servers that the email bounced through. These are used for both debugging and spam mitigation. Without the ability to trace email headers for email abuses, we will enable spammers. (Wouldn't it be funny if the only people who asked not to be tracked were the spammers?) Email headers also list the sender, but with anonymity, you will never know who sent you the email. (It's like receiving a gift in the mail during the holidays, and not knowing who sent it even after you open it. This year, I've received two boxes like this.)
- Let's say that you want to delete everything associated with you on Twitter or Facebook. Does this include retweets and responses? How far back does the history need to be erased? What about people who copied your comment to another forum? This reminds me of the entire debate on "right to be forgotten" and the impact of altering history through omission and deletion.
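As an illustration of the email case above, this sketch pulls the relay path out of a message's Received headers using Python's standard email module. The sample message, host names, and addresses are invented (they use reserved example ranges), and the regular expression is a simplification that only handles this common header layout; real-world Received headers vary a lot.

```python
import re
from email import message_from_string

# Hypothetical message. Each relay prepends its own Received header,
# so the newest hop is at the top.
SAMPLE = """\
Received: from mx.example.net (mx.example.net [203.0.113.7])
\tby mail.example.org with ESMTP; Tue, 22 Dec 2015 10:00:02 -0700
Received: from laptop.example.com (unknown [198.51.100.23])
\tby mx.example.net with ESMTP; Tue, 22 Dec 2015 10:00:01 -0700
From: sender@example.com
To: recipient@example.org
Subject: hello

body
"""

HOP_PATTERN = re.compile(r"from (\S+) .*?\[([0-9a-fA-F.:]+)\].*? by (\S+)")


def relay_hops(raw_message):
    """Return (sending host, sending IP, receiving host) tuples, oldest first."""
    message = message_from_string(raw_message)
    hops = []
    for header in message.get_all("Received", []):
        flat = " ".join(header.split())  # undo header line folding
        match = HOP_PATTERN.search(flat)
        if match:
            hops.append(match.groups())
    return list(reversed(hops))


for sender, ip, receiver in relay_hops(SAMPLE):
    print(f"{sender} [{ip}] -> {receiver}")
```

This is the same trail that spam filters and abuse desks follow; strip it out and there is nothing left to trace.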
Personally, I see a big incentive for being tracked online. Specifically, it provides an alibi. "So where were you when this person went missing?" "I was shopping on Amazon. And their logs will geolocate me to a Starbucks across town, where store cameras at the cash register will show me making a purchase, and street cameras show me driving my car." That's better than saying, "I can't prove I wasn't there because I'm not tracked anywhere."
Similarly, the police have acquired the mail server logs for Cock.li. This is because Cock.li was used to email terrorist threats against schools in Los Angeles and New York. (Depending on which report you read, this was either obeying a subpoena or an unexpected raid.) The server only stored 7 days worth of logs, so law enforcement had to act quickly. With online anonymity and no tracking, there is nothing to stop this kind of terrorist threat from happening again. In contrast, tracking information could make someone think twice before doing a copycat threat.
Issue #4: Big loopholes
The bill does grant companies permission to collect personal information if it is required for performing the service. Specifically, Section 2(b)(1) says (my bold for emphasis):
(b) EXCEPTION. -- The rules promulgated under paragraph (2) of subsection (a) shall allow for the collection and use of personal information on an individual described in such paragraph, notwithstanding the expressed preference of the individual via a mechanism that meets the standards promulgated under paragraph (1) of such subsection, to the extent --
(1) necessary to provide a service requested by the individual, including with respect to such service, basic functionality and effectiveness, so long as such information is anonymized or deleted upon the provision of such service; or
(2)...
And keep in mind, "requested by the individual" is vague enough to include web requests. A web page linked to an ad, and your web browser requested the link. Therefore, you requested it.
So what does this mean? If your company is in the business of collecting personal information for advertising, then you are permitted to collect. Moreover, you can keep the data indefinitely if you use it for long-term profiling. This loophole is so large that the entire bill ends up serving no purpose.
This bill also lacks any real teeth. For example, Section 2(c)(3)(B) says that users must have the option to request opting out. Yet nowhere in the bill does it say that the request must be respected or acted upon. Also, how do you verify that you are you? Companies can just say that they couldn't authenticate the removal request.
There's one other big loophole... With telephone solicitors, the "no call" laws have been a spectacular failure. Robo-calls increased 43% this year and the FTC issued another challenge for anyone who can help solve this problem. One of the big problems with robo-calls is the non-profit/political exemption. Anyone claiming to be a non-profit or acting for a political group can legally ignore the no-call list. This Do-Not-Track bill includes a similar exemption in Section 3(a)(2)(C); any group claiming to be a non-profit can ignore this requirement and track users as much as it wants.
If Facebook really wants to avoid this law, they can set up a non-profit that collects the data. The non-profit can then provide the collected information to the for-profit company. In this configuration, the for-profit company never collected anything; they legally acquired the data from a non-profit that legally collects personal information.
And keep in mind: if I, as a non-attorney, can see these loopholes, then you know that the professional legal manipulators can find even bigger problems with this bill.
Issue #5: Conflicting Laws
But let's assume that the bill becomes a law that requires (1) services must stop tracking users, and (2) users can request deletion of their personal information. This still doesn't mean that the data can be deleted.
For example, 18 U.S.C. 2251, 2252, 2258, and 1466 are laws that describe how online service providers must react to child pornography.
These laws explicitly say that providers of online services must report identified child pornography to the National Center for Missing and Exploited Children (NCMEC). Section 2252 says that a failure to report is a felony. Moreover, the law says that, after reporting, the data must be retained (in case law enforcement needs more information).
So imagine this situation: someone uploads child porn and then requests the service to delete the data. (Don't laugh -- we've had this happen already at FotoForensics.) The question then becomes: which takes precedence? The requirement to track, report, and retain, or the requirement to delete and provide anonymity? And before you answer, it is very possible that there is no good solution -- the courts may find a service is correctly following one federal law while committing a felony via another law. A lose-lose situation is very feasible.
Next Steps
As with the debate on Net Neutrality, I really think that there was not very much thought put into this bill. If the Do-Not-Track authors consulted any technical experts, then those experts were either ignored or grossly ineffective at identifying these limitations.
Many online protocols and document formats explicitly define headers and metadata that identify senders, recipients, and other tracking information. Some are provided for communication, and others are provided for debugging or attack identification. If this bill becomes law, then it will literally break the Internet (at worst) or be completely ignored (at best).
While I believe that we should have options for online privacy, I do not believe that this bill proposes a viable solution.
> People forget that there is no such thing as online anonymity. You might be able to make it difficult to be tracked online, but it is never impossible. Every program you use, every system you connect to, and everything you do online leaves traces that can lead back to you.
Don't you see the contradiction of what you are saying?
1. Either metadata tells a whole lot about users online, thus being identical to surveillance; or
2. Metadata is useless and anonymity is possible.
So, which is it? Given that you run fotoforensics, I'd say 1
http://www.hackerfactor.com/blog/index.php?/archives/571-Metadata-is-Scary.html
Surveillance is a very specific term. As the dictionary defines it, surveillance is "the act of carefully watching someone or something especially in order to prevent or detect a crime". Routing email, sorting files, or even examining metadata does not mean that it is related to a crime, and it doesn't imply 'watching'. Metadata doesn't even make you a suspect. Moreover, metadata is passive -- it doesn't prevent or detect anything. Prevention or detection may be based on what you find in metadata, but that's up to whoever does the interpretation.
In contrast to surveillance, forensics evaluates data after it was generated. Surveillance requires active monitoring and selecting what you capture. Forensics requires looking at evidence after it was collected, regardless of where it came from. While metadata is readily applicable to forensic analysis, it is usually not essential for surveillance. Since collected computer data doesn't change, it doesn't even require watching -- you just need to look once.
Were they good things that you liked? If so, can I take credit for one?
If they were not nice things, then never mind......
Thank you for the wonderful holiday gift!
-Neal