Get Well Soon
Sunday, 10 January 2016
I'm always amazed at the things we can identify just by looking for unexpected symptoms. At our local museum, they are wrapping up an exhibit on Sherlock Holmes. Arthur Conan Doyle's fictional character was inspired by a real-life person: Dr. Joseph Bell. Bell emphasized the importance of observation as part of a medical diagnosis. As sherlockian-sherlock.com noted:
Bell could tell from the tattoos of sailors where they had sailed. From having a look at a hand he told the profession of its owner. A glance at a face told him whether the person is a drinker or not. Thanks to his observations he knew a lot of information about his patients soon before they talked about themselves. When someone lied to him, the professor explained him the telling signs that revealed the truth.
I'm a strong believer in the power of observation and its applicability to computer science. When your web browser connects to a web server, it presents data that can be observed by the server. For example:
- Following links. Based on your HTTP 'Referer' string, the server can identify how you found the site. This field specifies the URL of the page that referred you to the requested URL. If someone links to my site, then I can see who linked to it. If you retrieve a picture from my site, then I can determine which page linked to the picture. In the case of links from search engines, the destination server can often see the search terms that you used.
- Type of device. The HTTP User-Agent string usually identifies the type of browser and computer. (Assuming it can be trusted...) This string identifies the operating system version, whether it is a smartphone or a tablet, and even the type of browser.
- Network address. Each HTTP connection explicitly identifies the client's network address. This can be used to geo-locate to a country, region, or city. Of course, if the user is going through a proxy, then this is the proxy's address.
- Proxies. Many HTTP fields specify the use of a proxy server. Some even identify the network address that is being relayed. In the worst case, the server can compare the client's network address to a list of known proxies in order to identify a proxied connection.
- Network stack. The hypertext transfer protocol (HTTP) sits on top of the transmission control protocol (TCP) and Internet protocol (IP). The packet headers for the TCP and IP portions can identify the type of network connection (dialup, DSL, fiber, gigabit, etc.), operating system (such as Windows, Linux, or Mac), and even approximate the client's computer uptime. (See p0f for a great overview of passive operating system fingerprinting at the network layer.)
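As a rough illustration of the header-level observations in the list above, here is a minimal Python sketch. It is not the code that runs on FotoForensics. The function name `observe_request`, the `PROXY_HEADERS` list, and the assumption that a search engine passes its terms in a `q` query parameter are all hypothetical choices made for this example. Network-stack fingerprinting (the p0f item) happens below HTTP and is not covered.

```python
from urllib.parse import urlsplit, parse_qs

# Header names that proxies commonly add (illustrative, not exhaustive).
PROXY_HEADERS = ("via", "forwarded", "x-forwarded-for")


def observe_request(remote_addr, headers):
    """Summarize what a server can see from one request's address and headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    # The Referer value is a URL; split it to get the linking host and query.
    parts = urlsplit(h.get("referer", ""))
    query = parse_qs(parts.query)

    return {
        # If the user is behind a proxy, this is the proxy's address.
        "client_address": remote_addr,
        # Self-reported by the client, so it may not be trustworthy.
        "user_agent": h.get("user-agent"),
        "referring_host": parts.hostname,
        # Assumption: the search engine puts the terms in a "q" parameter.
        "search_terms": query.get("q", [None])[0],
        "proxy_headers": sorted(n for n in PROXY_HEADERS if n in h),
    }
```

Calling `observe_request("203.0.113.7", {"Referer": "https://search.example/find?q=photo+forensics", "Via": "1.1 proxy.example"})` would report the referring host, the search terms, and the presence of a `Via` header, all without running any code on the client.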
Do you feel ill?
In 2014, I began to look closely at the HTTP headers received by the FotoForensics server. At the time, I found that some browsers were sending unexpected cookies to the server.
With HTTP, the server can issue a cookie to the client, and the client is expected to return the cookie to the server. Cookies provide a simple solution for session maintenance. (You wouldn't want to log into a site and see someone else's account information, would you?) According to the protocol, the client will only return cookies to the site that issued them. (I'm skipping over the whole issue of third-party cookies.) If you look at the cookies on your computer, you will see each of them associated with specific domains. This is how the browser knows to send the right cookie to the right server.
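To make the "right cookie to the right server" rule concrete, here is a simplified Python sketch of the domain matching a well-behaved client performs, loosely following the domain-match rule in RFC 6265. The names `domain_matches` and `cookies_to_send` and the jar layout are assumptions for this example. Real browsers also check the path, the Secure flag, expiry, and whether a cookie is host-only, all of which are omitted here.

```python
import ipaddress


def is_ip_address(host):
    """Return True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_matches(request_host, cookie_domain):
    """Simplified domain-match: exact match, or a subdomain of the cookie domain."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    if host == domain:
        return True
    # Suffix matching only applies to host names, never to IP addresses.
    return host.endswith("." + domain) and not is_ip_address(host)


def cookies_to_send(request_host, cookie_jar):
    """Select cookies for a request; cookie_jar maps (domain, name) -> value."""
    return {
        name: value
        for (domain, name), value in cookie_jar.items()
        if domain_matches(request_host, domain)
    }
```

With a jar holding one cookie for `photos.example` and one for `bank.example`, a request to `photos.example` should carry only the first. Software that skips this check and sends everything to every host produces the kind of stray cookies described next.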
However, I occasionally see browsers that submit unknown cookies to my server. These are cookies that my server never provided to the client. In many cases, I can identify the service that generated the cookie -- and it isn't me. (I've seen cookies from Google, CloudFlare, gamer networks, banks, and more!) I managed to track down the cause for most of these unexpected cookies:
- Cause #1: Infected. The web browsers are infected with one or more viruses that insert cookies incorrectly. Basically, the malware performs a man-in-the-middle attack, intercepting all network traffic. Unsurprisingly, the malware writers are lazy. Rather than associating each cookie with specific domains, the malware just sends every cookie to every domain.
Since most web servers ignore unknown cookies, these unexpected cookies are not an issue with regard to functionality. It only becomes an issue when two different web servers use the same cookie name (and are being accessed by the same browser at the same time -- extremely uncommon), or when the cookie contains personal information that leads to a privacy issue.
- Cause #2: Tagging web traffic. Some proxies and ISPs add in cookies for tracking requests. This isn't a problem for HTTPS connections, but it is an issue for HTTP.
- Cause #3: Residues from proxies. If a user previously used a proxy or ISP that added in cookies (see Cause #2) and then changed proxies or ISPs, then the browser will still send the inserted cookies to my server. (The browser has no way to know that the cookies came from a man-in-the-middle and not from my server.)
- Cause #4: Bots and apps. Some web bots or standalone applications insert their own cookies. I'm not sure what they expect from pushing their own unexpected cookies to my server, but that's what they are doing.
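Whatever the cause, the server-side check is simple in principle: compare the cookie names a client sends against the names the server actually issues. The following Python sketch shows that idea only. The cookie names in `ISSUED_COOKIE_NAMES` are hypothetical placeholders, and this is not the detection code used on FotoForensics.

```python
# Hypothetical names of the cookies this server issues.
ISSUED_COOKIE_NAMES = {"session", "prefs"}


def parse_cookie_header(header_value):
    """Parse a Cookie request header into a {name: value} dict."""
    cookies = {}
    for pair in header_value.split(";"):
        # Split on the first "=" only; cookie values may contain "=".
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def unexpected_cookies(header_value, issued=ISSUED_COOKIE_NAMES):
    """Return the sorted names of cookies this server never issued."""
    received = parse_cookie_header(header_value)
    return sorted(name for name in received if name not in issued)
```

For example, `unexpected_cookies("session=abc; tracker_id=123; prefs=dark")` returns `["tracker_id"]`. An empty result says nothing about whether the client is clean; it only means no foreign cookie names showed up in this request.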
Meanwhile, people with spyware, well, they are definitely being closely monitored. UserVoice, AddThis, Aidata, MindSpark, and Clkmon are all types of adware that alter cookies and spy on user activity. Even large companies like Alibaba have spyware that inserts cookies.
Tossing cookies
Beyond cookies, there are dozens of HTTP header elements that denote malware, adware, and spyware infections. From your single web request to my server, I know if you are infected. And amazingly, about 1% of users are readily detectable as being compromised.
Keep in mind: If the server does not detect anything, then it doesn't mean anything. You might be clean, or you can be infected with something that doesn't change the HTTP header. However, if it does detect something, then you are definitely compromised -- either by malware or by a man-in-the-middle.
Knowing that a client's computer is infected seems to provide a strong indicator of their intent. For example, if a user has detectable adware/spyware, then they are nearly 10 times more likely to upload prohibited content (porn) to FotoForensics. One plausible explanation is many porn sites host malware, or "malvertising", that installs adware and spyware. As a result, these porn connoisseurs get infected. And when people upload files to FotoForensics, they upload pictures that are related to their interests (i.e., porn). At that point, the user uploads porn and the server sees that the user's browser is infected.
Keep in mind: lots of users upload pictures to FotoForensics -- both permitted and prohibited content. You can be infected and upload workplace-safe content. You can also have no detectable malware and still upload prohibited content. If you are cautious, then you can visit porn sites without getting infected, and if you don't take protective steps, then you can get infected without ever visiting a porn site. This finding only notes that users with detectable compromises are much more likely to upload prohibited content.
Just what the doctor ordered
I'm always hesitant to inform users when they have an infected computer. In my experience, a solid third of users will blame the person who points out the infection. The gripes are usually like "I wasn't infected until I went to your site!" or "You're wrong and a liar!" This is like blaming the doctor who diagnoses you for giving you an illness. You didn't receive it from the doctor; you already had it and the doctor pointed it out.
Another third will beg and plead for assistance in removing the infection. Their belief is that I found it, so I must help them with the cure. Just because I can detect the infection doesn't mean that I have the knowledge, tools, or time to help everyone who is infected. (I had one guy a few years ago who was willing to pay me $1,000 to remove the infection. I told him he could spend much less money if he took his computer to a company that specializes in malware removal; malware removal is not my specialty.)
The final group of people usually ignore the notice. ("So what... My computer still works.") In my experience, only a small percentage actually say "thank you" or do something about it.
Still, I do want to offer some kind of assistance. Is there any interest in an online site that tests HTTP headers for indicators of adware and spyware infection? If so, then I'll make a public test page.
Comments (9)
Maybe you could ask for their experiences in administrating such a page.
I like your conflicker infection test. However, that's looking for web browser reaction and not HTTP headers. (You should also add in a little JavaScript with a delay that tests if the pictures loaded so that users can see an automated summary.)
I'm thinking of a server-side system (no client-side javascript and no manual interpretation) that identifies certain types of infection.
sounds good to me.
Does this already exist somewhere else?
From your description it sounds like it might.
Not that I know of. (That doesn't mean that it doesn't exist; only that I couldn't find one.)
This seems like a good compromise between informing people (and getting the reactions you describe) and offering a simple self-administered test, with the warning that you can't help them any further.
I guess a single server doing this kind of detection might just become one that will be picked up by malware coders. Then they will fake your responses (or manipulate the hosts file) and your user will be none the wiser.
If, however this technique would be adopted by numerous websites, it could provide some impact. So, I suggest to go public with the details and find some friends to do this together.
You might want to discuss this with Bruce Schneier or present your findings at some conference.
Sherlock would be the wrong role model for this, better look at Molly Hooper and her abominable brides...
You have some good points.
I'm going to skip the conference option. (I don't feel like waiting 6-12 months before releasing this.) But I'll make sure to detail what the code looks for.
As far as helping bad guys: I don't think this will have any impact. Most malware doesn't change the HTTP headers, but the ones that do are really obvious (as soon as you know what to look for).