Unmask Parasites - Check your web pages for hidden links, iframes, malicious scripts, unauthorized redirects and other signs of security problems.

“Cheap Vista” or Cloaked Spam on High-Profile Sites

   01 Oct 09   Filed in Website exploits

In this post, I’ll show how cybercriminals used hacked high-profile sites to drive search traffic to online stores that sell pirated copies of popular software and, presumably, steal credit card details.

I’ve been watching this sort of search spam for more than a year now. After this post in Google’s Webmaster Help forum, I decided to take a closer look at the problem.

Millions of interlinked spam pages are hosted on hacked high-profile websites. This makes them rank well on Google and occupy top positions in search results for the keywords the spammers target.

Hacked sites include:

For example, if you search for “Cheap Vista for Students” on Google, you’ll see something like this:

Cheap Vista for Students

Almost 20 million results. Impressive, isn’t it? And although Google wouldn’t show more than 350 of them (there’s too little unique content), 99% of those were spam.

Redirect to rogue site

As you can see, the first page of results contains links mainly to reputable .edu domains. However, if I click on these links, an online store that sells pirated software opens.

Soft4pcs

This redirect is not the result of trojans on my computer (I’m on Linux). The HTTP headers reveal a server-side 301 redirect from the .edu site to soft4pcs .com:

HTTP/1.1 301 Moved Permanently
Date: Mon, 28 Sep 2009 11:15:54 GMT
Server: Apache/2.2.3 (Ubuntu) PHP/5.2.1
X-Powered-By: PHP/5.2.1
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
X-ENGINE: rx-engine
Location: http://soft4pcs .com/shop/item/47/?cpn=wmtu_resnet_mtu_edu_soft2
Content-Length: 0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
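If you want to check such responses yourself, here is a minimal Python sketch that parses a raw response head like the one above and reports whether it redirects to a different host. The function names, the set of redirect codes, and the sample host names (shop.example, www.example.edu) are my own illustrative assumptions, not anything taken from the hack.

```python
from urllib.parse import urlsplit

# Status codes treated as redirects (an assumption; 301 and 302 are the usual ones).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, {lowercased name: value})."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    # Status line looks like "HTTP/1.1 301 Moved Permanently".
    _version, code, *_reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so "Location: http://..." stays intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers


def offsite_redirect(raw, own_host):
    """Return the Location target if the response redirects to another host, else None."""
    status, headers = parse_response_head(raw)
    location = headers.get("location")
    if status not in REDIRECT_CODES or not location:
        return None
    target_host = urlsplit(location).hostname
    # A relative Location (no host) or one pointing at our own host is not off-site.
    if target_host is None or target_host == own_host.lower():
        return None
    return location
```

A redirect from your own .edu page to an unrelated shop is exactly the kind of result that should never show up for pages you didn’t configure that way.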

The same site is also available under other names: soft4windows .com, download-journal .com, oem-box .com. Let’s call them all SOFT4.

Signs that this is not a legal site:

  • No information about the company and no real addresses.
  • Fake certificates with no links that could verify them.
  • 90% discounts on most products.

If you decide to buy something in this store, you’ll be redirected to a “secure” order form on bill4soft .com/order/shop.

bill4soft

The payment site is actually the same SOFT4 (you can see this if you open the home page of bill4soft .com); it’s just an alternative domain name with a “verified” security certificate. However, as you can see, the fact that the certificate is verified doesn’t make it any more trustworthy: the only information it provides about the owner is the domain name.

Fair exchange: illegal copies of software for valid credit card numbers

I’m not sure whether you actually get to download the software if you pay (on the FAQ page they mention that “sometimes” their email with download links can be “mistakenly” blocked by ISPs and deleted as spam), but I’m sure the cybercriminals will find a “creative” way to use your credit card number and all the personal details you’ve just provided to them.

Well, now you can see why they hack sites and spam Google: it’s a profitable (though illegal) business.

Now let’s talk about how they game Google and webmasters.

Spammy pages

Many high-ranked websites have been hacked to host spammy intermediary pages. Pages on established, trusted domains rank better than similar pages on unknown or little-known sites.

Where possible, hackers used legitimate web pages as templates for their spammy pages. They simply replaced the normal content with spammy keywords and links while preserving the markup. This way search engines shouldn’t be alerted, since the pages don’t look alien to the site.

Webby Awards Layout
A spammy page on a Webby Awards site – the same look&feel

Hackers usually create a few hundred to a few thousand such spammy pages that target specific keywords. To increase their ranking and make them all discoverable by search engine bots, these pages are interlinked.

Cloaking

The spammy pages are served only to Google. I can see them only if I switch my browser’s User Agent to Googlebot (I used the User Agent Switcher Firefox plugin). Otherwise, I get a standard “404 – Page not found” error. This helps hide the hack from webmasters, who may think that Google mistakenly indexed non-existent pages on their website (you can see such speculation on Google’s Webmaster Help forum quite often). This “black hat” technique is called cloaking.
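Here’s a rough Python sketch of the same check without a User Agent Switcher: request a URL once as Googlebot and once as a regular browser, without following redirects, and compare what comes back. The User-Agent strings and helper names are illustrative assumptions; real cloaking scripts may also look at IP addresses, referrers, or cookies, which this sketch doesn’t emulate.

```python
import http.client
from urllib.parse import urlsplit

# Example User-Agent strings (assumptions; any realistic values will do).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) "
              "Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14")


def fetch_once(url, user_agent, timeout=10):
    """GET url once WITHOUT following redirects or sending cookies.

    Returns (status, Location header or None, body length in bytes).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.getheader("Location"), len(body)
    finally:
        conn.close()


def compare_as_bot_and_browser(url):
    """Return (differs, bot_result, browser_result) for one URL."""
    bot = fetch_once(url, GOOGLEBOT_UA)
    browser = fetch_once(url, BROWSER_UA)
    # Cloaking usually shows up as a different status code or redirect target.
    differs = bot[:2] != browser[:2]
    return differs, bot, browser
```

Because each request is made from scratch, no cookies are sent, so a “ban” cookie can’t skew the comparison. A difference such as 200 for Googlebot versus 301 for everyone else is a strong hint of cloaking, though not proof on its own.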

As far as I can tell, the hackers’ code has to do three things: hide the illicit pages from site owners (404 error), show the spammy pages to Googlebot, and redirect all other visitors to the SOFT4 site.

Driving traffic to the SOFT4 site is the real purpose of these spam pages. They should rank well on Google (they really do) and when users click on the search results links, instead of unintelligible spam pages they’ll get redirected to real rogue online stores.

I have yet to figure out what triggers the redirect instead of the 404 error. On the same machine, I consistently get redirects from Firefox 3 under Linux, and consistently get 404 errors for the same URLs when I open them in Firefox 3.5 under WinXP.

It looks like they don’t like Firefox 3.5 for some reason. I tried a few different User Agents (Firefox 3 on Ubuntu, Firefox 3 on WinXP, IE7 on Vista, IE7 on XP, Netscape 4.8 on Vista, Opera 9.25 on Vista) and they all redirected to the soft4pcs site. However, when I switched my browser’s User Agent to “Firefox 3.5.3 on XP”, I consistently got 404 errors. Moreover, the error pages set a cookie (e.g. site_domain_edu_soft2_visit=ban) that expires in a year, so even if I switched the User Agent back to one of the “allowed” values, I would still get the 404 error page.
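To make the observed behavior easier to follow, here is a hypothetical Python model of the three-way decision described above. It is NOT the attackers’ code (I haven’t seen it): the “Firefox/3.5” trigger simply mirrors my observations and the real trigger may well be something else; only the cookie name comes from this investigation.

```python
# Hypothetical model of the observed behavior; NOT the attackers' actual code.
BAN_COOKIE = "site_domain_edu_soft2_visit"  # cookie name seen on one hacked site


def cloak_decision(user_agent, cookies):
    """Return (action, cookies_to_set); action is "spam", "redirect" or "404"."""
    if "Googlebot" in user_agent:
        return "spam", {}        # spammy page for the search engine
    if cookies.get(BAN_COOKIE) == "ban":
        return "404", {}         # visitor was "banned" earlier
    if "Firefox/3.5" in user_agent:
        # Stand-in for whatever really triggers the 404. The real cookie
        # expires in a year; expiry is not modeled here.
        return "404", {BAN_COOKIE: "ban"}
    return "redirect", {}        # everyone else goes to the rogue shop
```

The cookie check explains why switching the User Agent back didn’t help: once the ban cookie is set, every later browser request gets a 404 no matter which User Agent it claims.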

.htaccess?

At first I thought all the logic was in .htaccess files containing conditional rewrite rules based on visitors’ User-Agents and requested URLs. This seemed plausible. As you might have noticed, all spammy URLs follow the same pattern:
/promos/?7354=Microsoft-Windows-Vista-Ultimate-(32Bit).html
/software/?catalog_id-2731_download-vista.php
/usage/OEM/?9443/download-OEM-software/vista.php

It is easy to craft a regular expression that would redirect such requests to actual spammy pages.
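For example, here is one possible Python regular expression, generalized from just the three sample URLs above: a directory, then “/?”, then a query that contains a number and ends in .html or .php. It’s an illustrative assumption, not the pattern the hackers actually used, and it would need tuning before you rely on it to search your own logs.

```python
import re

# Hypothetical pattern generalized from three sample URLs:
# "/dir[/subdir]/?" + query containing a digit + ".html" or ".php" at the end.
SPAM_URL = re.compile(r"/[\w/-]+/\?\S*\d\S*\.(?:html|php)")


def looks_like_spam_url(path):
    """True if the whole request path matches the spammy URL shape."""
    return SPAM_URL.fullmatch(path) is not None
```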

However, when I found a few such sites powered by IIS, I had to dismiss the .htaccess hypothesis.

PHP scripts

The only common denominator for all the hacked sites is PHP: their HTTP headers all reveal PHP support. So most likely all the cloaking logic lives inside a PHP script. This would also explain why the 404 error pages set hacker-defined cookies. Moreover, I found a few broken scripts that reported PHP errors (they failed to include some files).

With PHP, hackers can store their spammy pages in encoded form, so when webmasters scan their servers for keywords like “vista”, “viagra”, etc., they won’t find anything. The script decodes the pages on the fly. (A new message in the aforementioned forum thread confirmed this hypothesis: they use base64 encoding.)
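That’s also why a plain keyword search misses these pages. A minimal Python sketch of a slightly smarter scan: look for the keywords in the raw file text, and also inside any long base64-looking runs after decoding them. The 40-character threshold and the function name are arbitrary assumptions for illustration.

```python
import base64
import binascii
import re

# Runs of the base64 alphabet long enough to hide a payload (threshold is arbitrary).
B64_BLOB = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_hidden_keywords(source, keywords):
    """Return {(keyword, where)}, where is "plain" or "base64".

    keywords are expected in lowercase.
    """
    found = set()
    lowered = source.lower()
    for kw in keywords:
        if kw in lowered:
            found.add((kw, "plain"))
    for blob in B64_BLOB.findall(source):
        try:
            decoded = base64.b64decode(blob, validate=True)
        except binascii.Error:
            continue  # not valid base64 (e.g. wrong length); skip it
        text = decoded.decode("utf-8", errors="ignore").lower()
        for kw in keywords:
            if kw in text:
                found.add((kw, "base64"))
    return found
```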

The same error messages revealed possible places where hackers hide their files. In that case the site was powered by WordPress, and the files were located in a special WordPress directory used for file uploads: wp-content/uploads. This directory usually has 777 permissions so that files (e.g. images, documents) can be uploaded directly from the WordPress web interface. Here are the paths to the illicit files:
wp-content/uploads/2008/10/crystals.php
wp-content/uploads/2008/10/.cache/.%D59C%49AA%73A8%63A1%9159%0441

Paths reported in the forum for another compromised WordPress installation:
/wp-link.php – (this file is not from a standard WP package)
/wp-includes/js/tinymce/themes/advanced/images/xp/.cache/
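Based on these paths, a quick Python sketch for WordPress site owners might walk the document root and flag hidden “.cache” directories, PHP files anywhere under an uploads directory, and world-writable (e.g. 777) directories. The heuristics and the function name are my assumptions; a hit is only a reason to take a closer look, not proof of a compromise.

```python
import os
import stat


def suspicious_paths(root):
    """Yield (reason, path) pairs worth a closer look under root.

    Hypothetical heuristics: hidden ".cache" directories, PHP files anywhere
    below an "uploads" directory, and world-writable directories.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if name == ".cache":
                yield "hidden .cache directory", full
            try:
                mode = os.stat(full).st_mode
            except OSError:
                continue  # e.g. a broken symlink
            if mode & stat.S_IWOTH:
                yield "world-writable directory", full
        # Uploads should hold images and documents, not executable PHP.
        if "uploads" in os.path.normpath(dirpath).split(os.sep):
            for name in filenames:
                if name.lower().endswith(".php"):
                    yield "PHP file under uploads/", os.path.join(dirpath, name)
```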

Outdated third-party scripts

Having analyzed the content of many compromised sites, I think most of them were hacked through vulnerabilities in the web software used on the sites. Outdated versions of blogs, wikis, CMSes, etc. can be found on almost every hacked site. And the “leader” is Moodle (an open-source, community-based learning tool). It is very popular with educational sites, and it is just as popular with all sorts of hackers and spammers. Just try this Google search to see what I mean. As you can see, there are many known vulnerabilities, and even a slightly outdated version or an improperly maintained installation is a backdoor for hackers.

Vulnerabilities in third-party scripts are not the only possible attack vector. Stolen FTP credentials and brute-force attacks shouldn’t be discounted either.

Boosting ranking of cloaked spam pages

Well, now that you see how cloaked spammy pages work, let’s talk about how hackers managed to make those pages rank well on Google. It’s not enough just to place them on a high authority domain. High ranking is almost impossible without external inbound links from legitimate web pages (preferably with high PR) from other sites. Links from indexed legitimate web pages are also needed to have Google discover the spammy pages.

Here comes another type of illicit content on compromised web sites: legitimate web pages with loads of cloaked spam links injected. As in the previous case, the links are only injected when the page is requested by Googlebot and there is no trace of them if it’s a normal web browser.
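If you save two copies of the same page, one fetched as Googlebot and one fetched with a normal browser User-Agent, a short Python sketch using the standard html.parser module can list the links that appear only in the Googlebot copy. The function names are hypothetical, and the comparison only sees links present in the served HTML.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html):
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def cloaked_links(html_for_googlebot, html_for_browser):
    """Links present only in the copy served to Googlebot."""
    return extract_links(html_for_googlebot) - extract_links(html_for_browser)
```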

iesalc unesco cloaked spam links
Here’s what the high-ranked home page of UNESCO’s IESALC looks like when I switch my browser’s User Agent to Googlebot.

The Unmask Parasites tool can also reveal such links.

Spam links in Unmask Parasites report

Although the pages are not marked as suspicious here (after all these are normal links to legitimate web sites), link texts like “download windows xp professional cr-rom” can easily unmask them as alien.

To make the spam links less visible to “too curious” webmasters who might also want to check how their web pages look with Googlebot user-agent, hackers enclose the link block with the following two scripts.

<script>document.writeln('<'+'d'+'i'+'v'+' '+'s'+'t'+'y'+'l'+'e'+'='+'"'+'p'+'o'+'s'+'i'+'t'+'i'+'o'+'n'+':'+'a'+'b'+'s'+'o'+'l'+'u'+'t'+'e'+';'+'t'+'o'+'p'+':'+'1'+'0'+'0'+'p'+'x'+';'+'r'+'i'+'g'+'h'+'t'+':'+'1'+'0'+'p'+'x'+';'+'w'+'i'+'d'+'t'+'h'+':'+'1'+'5'+'0'+'p'+'x'+';'+'h'+'e'+'i'+'g'+'h'+'t'+':'+'5'+'0'+'p'+'x'+';'+'o'+'v'+'e'+'r'+'f'+'l'+'o'+'w'+':'+'a'+'u'+'t'+'o'+'"'+'>')</script>

<script>document.writeln('<'+'/'+'d'+'i'+'v'+'>')</script>

These scripts display all the spam links inside a small 150×50 px block that scrolls when its content overflows. The block is not completely hidden, to avoid arousing Googlebot’s suspicion. (The screenshot of UNESCO’s IESALC page was made with scripts disabled to make the spam links prominent.)
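The concatenation trick is easy to undo. This Python sketch simply joins the single-quoted literals of such an expression; it assumes the pieces contain no escaped quotes, which holds for the two scripts above. Applied to the first script, it yields `<div style="position:absolute;top:100px;right:10px;width:150px;height:50px;overflow:auto">`, and the second gives `</div>`.

```python
import re


def decode_concat(js):
    """Join the single-quoted string literals of a '<'+'d'+'i'+'v'... expression.

    Assumes no escaped quotes inside the literals.
    """
    return "".join(re.findall(r"'([^']*)'", js))
```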

The cloaked spam links on legitimate web pages promote spammy pages on other compromised websites and sometimes on their own site. This cross-promotion from high-ranked legitimate pages makes the spammy pages rank well on Google and dominate the first pages of search results for the relevant keywords.

Here is a basic scheme of this spam campaign.

scheme: search spam cross-promotion

I hope posts like this can improve the situation with spam, so I appeal to all parties who can change things for the better.

To webmasters:

  • Keep third-party scripts up-to-date.
  • Regularly check if Google has indexed any pages on your site for unrelated keywords (e.g. viagra, mortgage, casino, “cheap vista”). Use Google’s site: command to narrow down searches to your site. E.g. site:example.com viagra
    If you use Unmask Parasites, in the “Additional Tests” section of each report you’ll find the “Reveal hidden spam links with Google site-wide searches” link. It generates a list of search strings that you can use to find pages with illicit content indexed by Google. (The searches can reveal both injected hidden spam links and cloaked spam pages.)
  • Regularly check web server logs. They may reveal requests to illicit pages and visitors that used irrelevant keywords to find your site via Google search. Pay special attention to 301 and 302 redirects.
  • If you suspect that your site or server may be affected by this attack, scan the whole directory tree for suspicious directories like “.cache” and for files that contain the base64_decode function.
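For the log check, here is a minimal Python sketch, assuming Apache’s “combined” log format: it flags 301/302 responses, and Google referrers whose search phrase contains typical spam words. The word list and function names are illustrative assumptions; adjust the regular expression if your server uses a different log format.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def search_terms(referer):
    """Return the lowercased q= phrase from a Google referrer, or None."""
    parts = urlsplit(referer)
    if parts.hostname and "google." in parts.hostname:
        q = parse_qs(parts.query).get("q")
        if q:
            return q[0].lower()
    return None


def flag_lines(lines, spam_words=("vista", "viagra", "casino", "mortgage")):
    """Yield (reason, line) for redirects and spammy search referrers."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format; skip
        if m["status"] in ("301", "302"):
            yield "redirect " + m["status"], line
        terms = search_terms(m["referer"])
        if terms and any(w in terms for w in spam_words):
            yield "spammy search: " + terms, line
```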

To IT departments:

  • Consider security risks when you are choosing software for your website.
  • Free open-source software is cool. But if it is left unattended on public web servers, it’s a serious security problem. Please take stock of every third-party script used on your servers and make sure to keep them all up-to-date. Even if you haven’t actively used or updated some section of a website for years, it still needs to be patched if you don’t want it to be used to compromise your whole site.

To Google:
I’m sure you are aware of the problem and know how to detect cloaking. So why are those spammy pages still included in your search results? Please delist the spam sections of hacked legitimate sites. This won’t affect legitimate content and at the same time will make this sort of spamming useless. Even if you have someone manually identify and block the spam sections of compromised websites used in this “pirated software” campaign, it should take just one day to complete the task: all those cloaked spam pages are hosted on only about a hundred sites, and all the illicit URLs follow the same pattern. Correct me if I’m wrong. And don’t forget to inform webmasters so that they know their sites are hacked.

Update: I’ve reported the “Cheap Vista for Students” search as spam here.

To readers:
This post reflects the situation with search spam as I see it from my computer. It is enough to make some conclusions, but without access to the compromised sites the picture is not complete. If you happen to administer one of the compromised sites, or just know more about this issue, please share your information here.

Any other comments and corrections are welcome as well.


Reader's Comments (12)

  1.

    [...] Buying dubious software = credit card skimming: “Cheap Vista” or Cloaked Spam on High-Profile Sites [...]

  2.

    [...] This post was mentioned on Twitter by InfoSec4All, Joseph Clore, Ricardo Delgado, Joe Burton and others. Joe Burton said: internetcrimes.net “Cheap Vista” or Cloaked Spam on High-Profile Sites | Unma.. http://bit.ly/2GGOd0 Training Today! [...]

  3.

    An easy way they could pull off showing different sites to real visitors is to check whether the referrer came from a search engine or the URL was typed in.

    •

      Yes. This trick is often used to redirect search traffic when hackers inject mod_rewrite rules into .htaccess files. It leaves webmasters unaware of the problem, since they rarely use search engines to open their own sites.

      However, in this case, the hack doesn’t seem to check the referrer, and you get the same results both when you click on the search results and when you type the URLs in.

      •

        They’re checking the HTTP referrer. I just tested it: when clicking from a Google results page with sending referrer info disabled in my browser I get a 404, with sending referrer info enabled, I get redirected.

        •

          Probably the logic is more complex. I get redirected even if I just paste the URL into the browser’s address bar and hit Enter.

          It would be interesting to hear from webmasters of the compromised sites and see the code used by hackers.

  4.

    [...] on how these rogue activities are affecting search results read the following blog post from Unmask Parasites where they demonstrate the extent of the issue and how users and search results are being [...]

  5.

    [...] because such sites have gamed Google to land at the top of its search results. Some of these are legit sites that have been compromised, others are total fakes from the get-go. (And Bing is even worse, FYI.) Even if they don’t [...]

  6.

    Good things for site owners and webmasters to be aware of.

  7.

    Thank you for your very informative explanation of this part of the international criminal activity in pirated software.

    I am amazed that the FBI and the victim companies don’t police this sort of activity intensively.

    And, also why Google and the other search engines don’t get more pro-active.

    Maybe with “Rewards” offered by the gov’t and software companies, more bounty hunters would do the legwork to shut down these criminal enterprises.

    And, ultimately, in the world of commerce buyers should beware, and patronize only legitimate vendors.

    Thanks for your excellent work.

  8.

    Hi,

    You implied a few times on your blog that Linux is immune to malware, if I get your point right.

    However, there are Linux trojans and viruses in the wild, at least as Wikipedia states that.

    Could you please shed some light on Linux safety or it’s just another urban myth?