August 15, 2010

Spam in Blogs: A Form of Spamdexing

Spam in blogs (also called simply blog spam or comment spam) is a form of spamdexing. Spammers automatically post random comments, or comments promoting commercial services, to blogs, wikis, guestbooks, or other publicly accessible online discussion boards. Any web application that accepts and displays hyperlinks submitted by visitors may be a target.

Adding links that point to the spammer's web site artificially increases the site's search engine ranking. An increased ranking often results in the spammer's commercial site being listed ahead of other sites for certain searches, increasing the number of potential visitors and paying customers.

Possible solutions

  • Disallowing multiple consecutive submissions
    It is rare for a user to reply to their own comment, yet spammers typically do. Rejecting a reply that comes from the same IP address as the comment it answers will significantly reduce flooding. This proves problematic, however, in the fairly rare case where multiple users behind the same proxy wish to comment on the same entry.

  • Blocking by keyword
    Blocking specific words from posts is one of the simplest and most effective ways to reduce spam. Much spam can be blocked simply by banning names of popular pharmaceuticals and casino games.

    This is a good long-term solution: spammers gain nothing by switching to obfuscated spellings (for example, replacing a letter with "@"), because keywords must be readable and indexable by search engine bots to be effective.

  • nofollow
    Google announced in early 2005 that hyperlinks with the rel="nofollow" attribute would not be crawled and would not influence the link target's ranking in the search engine's index. The Yahoo and MSN search engines also respect this attribute.

    Using rel="nofollow" is a much easier solution that makes the improvised techniques above irrelevant. Most weblog software now marks reader-submitted links this way by default (with no option to disable it without code modification). More sophisticated server software could omit nofollow for links submitted by trusted users, such as those who have been registered for a long time, are on a whitelist, or have high karma. Some server software adds rel="nofollow" to pages that have been recently edited but omits it from stable pages, on the theory that stable pages will have had offending links removed by human editors.
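
For comparison, the two improvised techniques from the start of this list (rejecting self-replies by IP address and blocking by keyword) are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names, the keyword list, and the idea of passing in the parent comment's IP address are all hypothetical and do not come from any particular weblog package.

```python
import re

# Hypothetical blocklist for illustration; a real site would maintain its own.
BLOCKED_KEYWORDS = {"viagra", "casino", "poker"}


def contains_blocked_keyword(text, blocked=BLOCKED_KEYWORDS):
    """Return True if any blocked keyword appears as a whole word in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in blocked for word in words)


def is_self_reply(parent_ip, new_ip):
    """Return True if the reply comes from the same IP as the comment it answers."""
    return parent_ip is not None and parent_ip == new_ip


def accept_comment(text, new_ip, parent_ip=None):
    """Apply both checks; parent_ip is None for a top-level comment."""
    if is_self_reply(parent_ip, new_ip):
        return False
    if contains_blocked_keyword(text):
        return False
    return True
```

As noted above, the IP check will also reject legitimate users who share a proxy, so a site might prefer to hold such comments for moderation instead of discarding them.
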

Some weblog authors object to the use of rel="nofollow", arguing, for example, that:

  • Link spammers will continue to spam everyone to reach the sites that do not use rel="nofollow".

  • Link spammers will continue to place links for clicking (by surfers) even if those links are ignored by search engines.

  • Google is advocating the use of rel="nofollow" in order to reduce the effect of heavy inter-blog linking on page ranking.

  • Google is advocating the use of rel="nofollow" only to minimize its own filtering efforts and to deflect from the fact that the attribute would more accurately have been called rel="nopagerank".

  • Nofollow may reduce the value of legitimate comments.

Other websites with high user participation, like Slashdot, use improvised nofollow implementations such as adding rel="nofollow" only for potentially misbehaving users. Potential spammers posting as users can be identified through various heuristics, such as the age of the registered account. Slashdot also uses the poster's karma as a determinant in attaching rel="nofollow" to user-submitted links.
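
A selective nofollow policy of this kind can be sketched as follows. This is a minimal illustration, not Slashdot's or any other site's actual implementation: the User fields, the threshold values, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical thresholds; each site would tune its own.
MIN_ACCOUNT_AGE_DAYS = 365
MIN_KARMA = 50


@dataclass
class User:
    name: str
    account_age_days: int
    karma: int
    whitelisted: bool = False


def is_trusted(user):
    """Treat a user as trusted if whitelisted, long registered, or high in karma."""
    return (
        user.whitelisted
        or user.account_age_days >= MIN_ACCOUNT_AGE_DAYS
        or user.karma >= MIN_KARMA
    )


def render_link(url, label, user):
    """Render a user-submitted link, adding rel="nofollow" for untrusted users.

    Only HTML escaping is done here; URL scheme validation is out of scope.
    """
    rel = "" if is_trusted(user) else ' rel="nofollow"'
    return '<a href="{}"{}>{}</a>'.format(escape(url, quote=True), rel, escape(label))
```

Here any single signal is enough to earn trust. A stricter site could require several signals at once, at the cost of marking more legitimate links as nofollow.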

rel="nofollow" has come to be regarded as a microformat.

Iwebslog Blog © 2015