Cloaking in SEO is defined as a technique in which the content presented to the search engine spider is different from that presented to the user’s browser (Wikipedia). But in the case of hacked sites, cloaking is trickier than just serving different content to search engines and to real users. It can also mean different content for different types of users. Moreover, the internal implementation is usually hidden (cloaked) from the webmasters of compromised sites.
This post is about one such site hack that involved SEO cloaking and used quite an interesting trick to alter page content.
First of all, I’d like to thank Jim Walker (Hack Repair) who shared the story of one hacked site and sent me the malicious code found there.
At first, it looked like a typical site affected by black hat SEO cloaking:
- You could see cialis/viagra keywords in Google search results for that site (site: query).
- You could see a modified title with “pharma” keywords in Unmask Parasites reports.
- At the same time nothing like that was there when you opened the site in a web browser.
But then you would notice something puzzling:
- The spammy content was only present when you requested “www.example.com” and never when you requested “www.example.com/index.php”, although it was a Joomla site, where “www.example.com” normally serves exactly the same page as “www.example.com/index.php”.
- The spammy content changed quite often. For example, your initial Unmask Parasites report could include only a suspicious title, but a couple of hours later Unmask Parasites would also report a bunch of spammy external links as well. And a few hours later the links could change or disappear.
- And inspection of the files on the server showed that there were no modifications in the files responsible for generating the HTML code for <title>, <meta description…> and the places that contained spammy links.
Of course, although that’s not typical, the cloaking code can distinguish between “/” and “index.php” requests, spammy content can be downloaded from a remote site in real time, and the code itself can use smart tricks and CMS APIs to hide itself in surprising places and modify web pages on the fly. Further analysis showed that while all these guesses were true, the actual implementation of the tricks was more interesting than we originally thought.
How does it all work?
A site scan revealed a file called “updtr” in the “images” directory inside one of the active Joomla template’s subdirectories. At the top level of the template, there was the “styles.php” file where hackers added one line of code in the middle of the file:
Looks quite innocuous, doesn’t it? But if you open the “updtr” file you’ll see this (full version on pastebin):
<?php error_reporting(0); eval(gzuncompress(base64_decode('eF59Um1rgzAQ/...skipped...HF7rzfjRDublzMyFohTO9/wX8t+Ms'))); ?>
<?php eval(gzuncompress(base64_decode('eF6FU9tum0AQ/ZU8WHKi9oGr61XEg52wGI...skipped...hP0TxcRfPrnv+jt/x5c/v7DxO+YfA='))); ?>
<?php eval(gzuncompress(base64_decode('eF6FVW1rGzEM/itXCDRmIVh+Ox/ZlXWs7E...skipped...4lg+/txWmabb8u97Pjv8BlyxTKw=='))); ?>
That’s what typical obfuscated malicious PHP code looks like. (Reminder: don’t limit your searches for malicious PHP patterns to *.php files only.)
Once decoded (pastebin), you can see pretty short code that checks various request parameters and returns different versions of web pages. What is interesting is how it does it.
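If you want to peel such a wrapper yourself without running the malicious PHP, the following Python sketch may help. It assumes the payload uses exactly the eval(gzuncompress(base64_decode('…'))) pattern shown above; the names WRAPPER and decode_layers are mine, not part of the malware. Nothing is executed: the string is only base64-decoded and inflated.

```python
import base64
import re
import zlib

# Matches the wrapper seen in the "updtr" sample:
#   eval(gzuncompress(base64_decode('...')))
WRAPPER = re.compile(
    r"eval\s*\(\s*gzuncompress\s*\(\s*base64_decode\s*\(\s*"
    r"['\"]([A-Za-z0-9+/=]+)['\"]"
)


def decode_layers(text):
    """Return the decoded source of every wrapped payload found in text.

    The payload is never evaluated, only decoded and decompressed.
    """
    decoded = []
    for match in WRAPPER.finditer(text):
        raw = base64.b64decode(match.group(1))
        # PHP's gzuncompress() reads zlib-format data,
        # which zlib.decompress() handles directly.
        decoded.append(zlib.decompress(raw).decode("utf-8", errors="replace"))
    return decoded
```

Since the pattern is matched against plain text, you can feed it the contents of any file (not only *.php files), which is how a file like “updtr” would be caught. If the decoded code contains another wrapper of the same kind, run decode_layers on the result again.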
At the top level, we see three types of supported requests:
- Requests with the “NZgroup” keyword in the User-Agent string. These requests check if the cloaking code is installed. They expect CHETKO##OK as a response.
- Requests to “/” (site root without index.php) from any browser whose User-Agent string is not “Kaspersky Internet Security”. Response HTML for such requests is downloaded from a remote server. This is the most interesting part that will be described below.
- All other requests. They go through unchanged, as if there were no malicious code at all.
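The three branches can be summarized in a few lines of Python. This is only an illustration of the decision logic described above, not the attackers’ code: the function name, the branch labels, and the exact matching rules (substring checks on the User-Agent, exact match on “/”) are my assumptions.

```python
def classify_request(user_agent, request_uri):
    """Mirror the three branches described above (matching rules are assumed)."""
    if "NZgroup" in user_agent:
        # Installation check: the script answers with CHETKO##OK.
        return "installation-check"
    if request_uri == "/" and "Kaspersky Internet Security" not in user_agent:
        # Homepage request: the HTML comes from the remote server.
        return "remote-content"
    # Everything else is served as if the site were clean.
    return "pass-through"
```

This also explains the puzzle from the beginning of the post: a request for “/index.php” falls into the last branch, so it never shows the spammy content even though it renders the same Joomla homepage.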
So what happens when someone requests a homepage, and why does the code specifically check for the “Kaspersky Internet Security” User-Agent? Well, in this branch, the malicious code collects all the data from the request’s HTTP headers (user IP address (REMOTE_ADDR and HTTP_X_FORWARDED_FOR), site domain (HTTP_HOST), REQUEST_URI, HTTP_USER_AGENT, HTTP_ACCEPT_LANGUAGE, HTTP_REFERER, SERVER_SIGNATURE, QUERY_STRING) and sends them as a POST request to hxxp://mainserverprocess .net/googleornot/dtbnz.php. The result of this POST request is returned to the user as-is (without any further modifications).
As you might have guessed, the result will be different depending on the parameters sent to dtbnz.php on mainserverprocess .net.
Regular request from a normal user
If the dtbnz.php script detects a normal user, then it needs to show them a normal homepage. How can a script on a remote server get the HTML code of a third-party website? The answer is: like everybody else, by loading it via HTTP (people use web browsers for that ;-) ). After all, the script has enough information to do it (the site domain and the exact request). Good, but we know that if this script requests a homepage from the hacked site, the cloaking code will try to contact dtbnz.php on mainserverprocess .net again. Looks like a deadlock! Not really. Do you remember the check for the “Kaspersky Internet Security” User-Agent string? That’s right, the dtbnz.php script uses this User-Agent string for its own requests to avoid deadlocks!
If we check access logs, we’ll find those requests with the “Kaspersky Internet Security” User-Agent there. Those log entries will also show that the requests don’t come directly from the server that hosts the dtbnz.php script: mainserverprocess .net’s IP address is 126.96.36.199 while the “Kaspersky Internet Security” requests come from 188.8.131.52. This means that the spammers have at least two different servers (on the Fdcservers.net network), each of which has its own role in this attack.
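To look for these requests in your own logs, something like the sketch below could be a starting point. It assumes your server writes the common “combined” access log format (with Referer and User-Agent as the last two quoted fields); LOG_LINE and find_agent_hits are illustrative names, and lines that don’t match the assumed format are silently skipped.

```python
import re

# One line of the "combined" access log format (assumed here).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_agent_hits(lines, needle="Kaspersky Internet Security"):
    """Return (ip, time, request) for every log line whose User-Agent contains needle."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent"):
            hits.append(
                (match.group("ip"), match.group("time"), match.group("request"))
            )
    return hits
```

You would typically call it as find_agent_hits(open("access.log")), where the file name depends on your hosting setup. The IP addresses it returns are the ones worth tracing through the rest of the log.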
If the dtbnz.php script detects a request from Google (based on the IP addresses sent by the cloaking script), it retrieves the HTML code of the hacked homepage just like in the above scenario, but before sending it back to the cloaking script on the hacked site, it makes some spammy modifications:
- Normal <title> is replaced with a spammy title.
- A spammy description is added to <meta name="description" content="…"/>.
- Sometimes (but not always) they also add a block of spammy links to other hacked sites. These links regularly change so that only the sites that currently bear the cloaking code (remember the “NZgroup” requests?) are being promoted.
Example of the added link block:
<div id="pharmacy"><h1>online pharmacy cialis</h1><a href="http://www.example.com/" target="_blank">cialis online canada</a> ...skipped many links....<a href="http://example.org/" title="Cialis online 20mg">Cialis online 20mg</a> ...skipped some spammy text about erectile dysfunction treatment drugs...</div>
The rest of the page content is left the same so that Google doesn’t flag it as suspicious for a radical content change.
Clicks on Google search results
So if spammers make Google index “pharma” keywords on the site, what happens when people click on such search results?
The cloaking script checks the HTTP_REFERER header of requests and if it contains specific keywords (viagra, cialis, propecia, lipitor, and nexium) then it passes the keywords as the “pill” parameter to the dtbnz.php script. In all other cases the “pill” parameter is omitted.
When the dtbnz.php script sees known pills in a request, it simply fetches the corresponding online store pages from www .your-online-pharmacy .net or www .rxprofits .com and slightly modifies them so that they work properly off of a hacked site.
If the “pill” parameter is omitted, the dtbnz.php script fetches an unmodified copy of the hacked site’s homepage.
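The Referer check described above boils down to a keyword lookup. Here is a minimal Python illustration; the keyword list comes from the decoded script, while the function name, the case-insensitive match, and the “first keyword wins” rule are my assumptions.

```python
# Keywords the cloaking script looks for in the Referer header.
PILL_KEYWORDS = ("viagra", "cialis", "propecia", "lipitor", "nexium")


def pill_from_referer(referer):
    """Return the first known keyword found in the Referer, or None."""
    lowered = referer.lower()
    for keyword in PILL_KEYWORDS:
        if keyword in lowered:
            return keyword
    # No keyword: the "pill" parameter is omitted and the normal page is shown.
    return None
```

This is also a hint for reproducing the problem: a request to “/” with a Referer that contains one of these keywords is what may trigger the online store pages, while the same request without such a Referer returns the normal homepage.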
As you can see, this particular approach has some key features:
- The actual spammy content is not stored on a compromised server.
  - This makes the malware footprint smaller and its discovery more difficult.
  - At the same time, it’s very flexible since hackers don’t have to break into compromised sites whenever they need to change anything.
- Spammy content is not directly coupled with the hacked site source code:
  - Webmasters can’t simply identify files with malicious code (e.g. if you see spammy keywords in the <title> tag, it doesn’t mean that the culprit is in the files that contain or generate the <title> tag).
  - Hackers don’t have to customize their code for every site/CMS/template/etc. They just need to upload one file and inject a single line of code into any *.php file loaded during the homepage generation. The rest will be done by a remote server.
Although this particular cloaking attack does quite smart things to stay under the radar, webmasters can still easily detect it and find the offending files. They only need to use proper tools and follow essential security practices.
High level detection
Since the goal of cloaking is to have Google index spammy content on your site, Google is the best tool to detect such issues.
- A simple [site:yourdomain.com] query can reveal strange titles and page descriptions of your web pages.
- Most black hat SEO hacks focus on the following topics: counterfeit prescription drugs, payday loans, fake luxury goods, porn, pirated software and movies. So you can use site: searches along with corresponding keywords to find any pages that may contain them, for example: [site:example.com viagra OR cialis]. Of course, this works only if your site doesn’t legitimately cover the same topics, so the choice of keywords is important. Here is a list of some keywords that may help reveal spam on your site: viagra, cialis, tadalafil, zanax, zoloft, nexium, pharmacy, “payday loan”, gucci, chanel, “cheap luxury”, porn, poker, casino, “cheap Windows”, “OEM software”.
- Instead of doing the spammy keyword site: searches manually, you can set up Google alerts and have Google notify you if it finds those keywords on your site.
- Regularly check Google Webmaster Tools reports, especially the “Traffic” section where you may spot strange “Search Queries” and “Links to Your Site”. Edit Aug 21, 2013: after changes in Webmaster Tools, this section is called “Search Traffic”. You should also check “Content Keywords” in the “Google Index” section.
- It’s always a good idea to check what Google actually sees when it crawls your site. You can use the “Fetch as Googlebot” tool for that (in the “Health” section of Webmaster Tools). Edit Aug 21, 2013: now it’s “Fetch as Google” in the “Crawl” section.
- You may also find Unmask Parasites helpful. Of course, it can’t pick up every single black hat SEO trick but it’s proven to be quite efficient in revealing signs of cloaking (such as changed titles and spammy links). And it works in real time.
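If you check several domains or a long keyword list, you may want to generate the site: queries instead of typing them. A small Python sketch follows; build_site_queries and the per_query grouping are my own choices for illustration (not a Google requirement), and multi-word keywords are quoted so they are searched as phrases.

```python
def build_site_queries(domain, keywords, per_query=4):
    """Group keywords into [site:domain a OR b ...] search strings."""
    queries = []
    for start in range(0, len(keywords), per_query):
        group = keywords[start:start + per_query]
        # Quote multi-word keywords so they are treated as exact phrases.
        terms = " OR ".join(f'"{k}"' if " " in k else k for k in group)
        queries.append(f"site:{domain} {terms}")
    return queries
```

For example, build_site_queries("example.com", ["viagra", "cialis"]) produces the same query as in the list above: site:example.com viagra OR cialis. The same strings can be pasted into Google alerts.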
Low level detection
While Google can help reveal black hat SEO hacks of your site, you can see signs of the problem only when your site has been hacked for quite some time. Moreover, Google’s reports can’t show you how exactly hackers broke into your site and how they modified it internally. That’s where you need low level techniques: they usually require more technical knowledge but their results are more timely and accurate.
- Use some integrity control tool for your site. Something that notifies you about changes in the file system so that you know what changed, when and where. In the case of the described attack, if the webmaster used some integrity monitoring tool, he would immediately know that someone created a new file (updtr) in the “images” directory and that the styles.php file was modified. He would also have information about any backdoors uploaded to his server. This would be enough to easily revert the changes and clean up the site.
- Another useful technique is log analysis. Even if your site attracts lots of visitors and the logs are huge, you can still spot suspicious activity if you look for certain patterns. For example, scan logs for all POST requests and then make a list of all requested files. Then check the list for files that should not be there or should not normally accept POST requests. Then you might want to get complete logs of the activity of the IPs that made those requests to figure out what else they did on your site. This way you may spot requests to backdoors and vulnerable files. Regular log analysis can help you detect latent security problems and identify the point of penetration so that you can properly close security holes and prevent reinfections.
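To give an idea of what integrity control means in practice, here is a minimal Python sketch of the approach: record a hash of every file, then compare a later snapshot against it. It is not a replacement for a real monitoring tool (it doesn’t store snapshots or send notifications), and the function names snapshot and compare are mine.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path under root (relative to root) to its SHA-256 digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare(old, new):
    """Return sorted lists of added, removed and modified paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified
```

In the attack described in this post, comparing a snapshot taken before the hack with a fresh one would list the new “updtr” file as added and “styles.php” as modified. Note that every file is hashed regardless of its extension, so a payload hidden in an “images” directory is not missed.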
Of course, don’t forget about the other security best practices (e.g. strong passwords, trusted, up-to-date and fully patched software, strict permissions, malware-free local computers, etc.) and you may not have to deal with recovering your site after a hacker attack.