MediaWiki, the platform that powers Wikipedia, is a familiar and easy-to-install platform. It is also an extreme example of user-generated content (UGC). Unlike platforms such as WordPress, where contributions are limited to marginal sections like comments, MediaWiki allows contributors to add or delete content in the article (i.e. the 'wiki page') itself. Additionally, while a comment can be published after review, wiki edits depend on one another, so their publication cannot be deferred. This makes the platform more vulnerable to web spam and vandalism.
One solution is to create a private wiki (i.e. to limit access to a group of approved users). However, this contradicts the philosophy behind the platform and makes it similar to other CMSs. On the other hand, for many businesses it seems necessary to restrict access to their wiki in order to protect their online reputation. After all, not every business has the financial resources, or a community like Wikipedia's, to deal with spam bots and cheap human labor. In this post I'll explore some of the tools MediaWiki offers for managing spam, so that the full potential of this platform can be utilized.
The importance of security to online reputation management cannot be exaggerated. In particular, preventing identity theft is crucial for separating legitimate users from everyone else, and it may protect your website and your users from disaster.
Using a strong admin password may prevent spammers from using your privileges to bypass safeguards. It may also save you from vandals who try to take over your account in order to block your administrative abilities.
To protect your legitimate users' identities, you can set a minimum password length in the LocalSettings.php file:
$wgMinimalPasswordLength = 8;
The SecurePasswords extension can add further restrictions, such as requiring uppercase letters or digits.
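For illustration, newer MediaWiki releases also ship a configurable password policy in core; a minimal sketch might look like this (the exact policy keys can vary between versions, so check the documentation for yours):

```php
# In LocalSettings.php — a sketch, assuming a recent MediaWiki with $wgPasswordPolicy.
$wgPasswordPolicy['policies']['default']['MinimalPasswordLength'] = 10;
$wgPasswordPolicy['policies']['default']['PasswordCannotMatchUsername'] = true;
```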
Email confirmation may also protect users by ensuring that they will be able to reset their password in case of identity theft. The following line will block edits from accounts with an unconfirmed email address.
$wgEmailConfirmToEdit = true;
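For the confirmation email itself to go out, email sending must be enabled on the wiki. A sketch of the related core settings (the addresses are placeholders):

```php
# In LocalSettings.php — core email settings, a sketch.
$wgEnableEmail = true;          # allow the wiki to send email at all
$wgEmailAuthentication = true;  # require users to confirm their address
$wgEmergencyContact = 'admin@example.com';  # placeholder "from" address for error mails
$wgPasswordSender = 'wiki@example.com';     # placeholder "from" address for password mails
```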
DNS-based Blackhole List (DNSBL)
In addition to password restrictions and email confirmation, MediaWiki supplies one more means of blocking spammers. A DNSBL is a common way to identify spammers by their IP addresses. Despite its false-positive drawback, it remains one of the main ways to deal with spam today. The following lines will make MediaWiki perform an IP lookup in three different DNSBLs whenever a user edits a page.
$wgEnableDnsBlacklist = true;
$wgDnsBlacklistUrls = array( 'all.s5h.net', 'dnsbl.tornevall.org', 'l2.apews.org' );
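Under the hood, a DNSBL lookup simply reverses the IP's octets, appends the blacklist zone, and checks whether the resulting name resolves. A small sketch (the helper function is mine for illustration, not part of MediaWiki):

```php
<?php
// Build the hostname queried against a DNSBL zone:
// reverse the IPv4 octets and append the zone name.
function dnsblQueryName( string $ip, string $zone ): string {
    return implode( '.', array_reverse( explode( '.', $ip ) ) ) . '.' . $zone;
}

echo dnsblQueryName( '127.0.0.2', 'all.s5h.net' ), "\n"; // 2.0.0.127.all.s5h.net

// A listed IP resolves to an A record (conventionally in 127.0.0.0/8):
// $isListed = checkdnsrr( dnsblQueryName( $ip, 'all.s5h.net' ), 'A' );
```

127.0.0.2 is the conventional test address that most DNSBLs list permanently, which makes it handy for checking that lookups work at all.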
I included in the list the services that, in my experience, have the highest true-positive rates (i.e. detections). However, if the rate of spam edits is still high, one may use the CheckUser extension to locate the IPs from which those edits were carried out, perform DNSBL lookups to identify which services detect most of those IPs, and add them to the list above.
Combining multiple DNSBLs may increase the detection rate at the expense of more false positives (i.e. less spam, but more blocked legitimate users). Although this may be bad practice for large companies, for small companies on a small budget it is a necessary evil.
Although the above measures can dramatically decrease the rate of spam, they are not sufficient to combat spam bots. Chasing spam bots that can create multiple accounts and masses of edits once they locate an unblocked IP can be an exhausting task. Moreover, these bots can consume a fair amount of bandwidth.
CAPTCHA puzzles are one of the most popular ways to stop bot spam today. Google's reCAPTCHA service provides a free API for websites that want to use its system. The system, which serves some of Google's digitization projects, is constantly evolving to adapt to the most recent attacks. In addition, it has an audio alternative to make it accessible to people with visual disabilities (although its accessibility and usability are disputed).
The ConfirmEdit extension uses the reCAPTCHA API to detect spam bots. It can also use Microsoft's Asirra puzzles, which are considered more difficult for computers and at the same time more user-friendly (though not accessible to blind people). In addition, the extension offers some less robust CAPTCHA systems, such as MathCaptcha and FancyCaptcha.
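As an illustration, wiring ConfirmEdit to reCAPTCHA might look roughly like this. This is a sketch based on a recent ConfirmEdit release; the keys are placeholders obtained from Google's reCAPTCHA admin console, and setting names may differ between extension versions:

```php
# In LocalSettings.php — a sketch; check the ConfirmEdit documentation for your version.
wfLoadExtension( 'ConfirmEdit' );
wfLoadExtension( 'ConfirmEdit/ReCaptchaNoCaptcha' );
$wgReCaptchaSiteKey   = 'your-site-key';    # placeholder
$wgReCaptchaSecretKey = 'your-secret-key';  # placeholder
# Trigger the CAPTCHA on the actions most abused by bots:
$wgCaptchaTriggers['edit']          = true;
$wgCaptchaTriggers['create']        = true;
$wgCaptchaTriggers['createaccount'] = true;
```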
Bad Behavior is another extension; it detects bots through their HTTP request headers. It can catch spam bots as well as other harmful bots, such as email-address harvesters. The extension's code, which can run on various PHP platforms, has proven very efficient even on high-traffic websites. Moreover, it may speed up page-load times by reducing resource usage. Bad Behavior's MediaWiki installation is straightforward, though in some rare cases extra configuration may be needed.
There are also regular-expression (regex) based extensions, such as SpamBlacklist, TitleBlacklist and Phalanx, that block edits matching one or more regexes from a predefined list. In addition, one may block future edits by setting $wgSpamRegex in the LocalSettings.php file. However, one should be careful not to block legitimate edits (see AVOID FALSE POSITIVES!).
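For example, a $wgSpamRegex rule might block a few spam keywords along with the hidden-div trick common in wiki spam. The keyword list here is only illustrative; tailor it to the spam you actually see:

```php
# In LocalSettings.php — an illustrative pattern, not a recommended blocklist.
$wgSpamRegex = '/\b(?:viagra|casino|payday\s+loans?)\b' .
    '|<div\s+[^>]*style\s*=\s*["\']?display\s*:\s*none/i';
```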
Although all these means can dramatically reduce wiki spam, they can't eliminate it completely. To discover unrecognized spam edits, one may set the email-notification (Enotif) options in the LocalSettings.php file. It is also highly recommended to add the wiki's email address to your email contacts list so that the notifications don't end up in your spam folder.
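A sketch of the relevant notification settings (the username is a placeholder):

```php
# In LocalSettings.php — email-notification (Enotif) options, a sketch.
$wgEnotifWatchlist = true;  # email users when a page on their watchlist changes
$wgEnotifUserTalk  = true;  # email users when their talk page changes
# Notify these accounts about every change on the wiki:
$wgUsersNotifiedOnAllChanges = array( 'WikiAdmin' );  # placeholder username
```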
Finally, there are times when a single user creates a lot of spam. The Nuke extension gives wiki administrators the ability to mass-delete pages created by a specific account or IP address.
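Loading the extension is a one-liner; the mass deletion itself is then performed from the Special:Nuke page:

```php
# In LocalSettings.php
wfLoadExtension( 'Nuke' );  # then visit Special:Nuke as an administrator
```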
If you liked this post, you might also enjoy 10 Most Notorious Wikipedia Editing Scandals
Yaniv is an independent writer who researches search engines in general and topical search engines in particular. He also explores methods for optimizing custom-built search engines.