I’ve been talking lately to one of the guys who works on anti-spam technology about the ways forward in dealing with spam. We’ve mostly been discussing how it might be tackled for forums, though blogs have much the same problem.
And we realised that innovation will very definitely be required.
Spam is a very big problem for forums and blog comments in particular, but any site that accepts user-submitted content of any kind is potentially at risk, including guestbooks and social sites.
What’s more interesting is that the bulk of current thinking is looking in the wrong direction, in my opinion at least.
The main method of blocking spammers from such sites is to have users register, validate that registration in some way, and maybe even vet a new user’s first few posts. Of these, registration and validation of the account do most of the work.
Once upon a time, it was mostly sufficient to validate the email address: it had to be well formed, and you’d send a code to it to activate the account. Unfortunately that hasn’t been a solution for years, because spammers today are much more capable. So the next progression is to keep shared databases of known offenders, and to block those users across participating sites from holding valid accounts or making posts.
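To make the old email-activation step concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular forum software does it: the `pending` dictionary stands in for a real user table, and the function names `start_activation` and `confirm_activation` are made up for this post.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-in for a real table of unactivated accounts.
pending = {}


def start_activation(email: str) -> str:
    """Create a one-time code for this address and store only its hash."""
    code = secrets.token_urlsafe(16)
    pending[email] = hashlib.sha256(code.encode()).hexdigest()
    return code  # this is the value that would be emailed to the user


def confirm_activation(email: str, code: str) -> bool:
    """Activate the account only if the submitted code matches the stored hash."""
    expected = pending.get(email)
    if expected is None:
        return False
    supplied = hashlib.sha256(code.encode()).hexdigest()
    # Constant-time comparison avoids leaking how much of the code matched.
    if hmac.compare_digest(expected, supplied):
        del pending[email]  # codes are single use
        return True
    return False
```

All this proves is that somebody can read mail sent to that address, which is exactly why it stopped being enough: a spammer with a working mailbox passes it just as easily as a real person.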
Such methods can be very effective – except there’s one very big problem: they’re about to be shot down in flames. Right now three elements together make up a spammer’s profile in such databases: username, email address and IP address, because invariably that’s all the information you’re going to have on a prospective spammer. Maintaining a database of bad IPs is feasible for now with IPv4, but once IPv6 is widespread, with its vastly larger address space, it’s going to be a colossal headache for all concerned.
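A rough sketch of what such a lookup might look like, and why IPv6 hurts, using Python’s standard `ipaddress` module. The `BLOCKLIST` entries, the `is_listed` and `addresses_in` names, and the choice to list IPv6 offenders by /64 prefix are all assumptions for illustration; real shared databases will differ.

```python
import ipaddress

# Hypothetical shared blocklist: (username, email, network) per known offender.
# Addresses come from the ranges reserved for documentation.
BLOCKLIST = [
    ("cheap-pills", "pills@example.com", ipaddress.ip_network("203.0.113.7/32")),
    ("seo-guru", "guru@example.net", ipaddress.ip_network("2001:db8:1234:5678::/64")),
]


def is_listed(username: str, email: str, ip: str) -> bool:
    """Return True if any of the three profile elements matches an entry."""
    addr = ipaddress.ip_address(ip)
    for bad_user, bad_email, bad_net in BLOCKLIST:
        if username.lower() == bad_user or email.lower() == bad_email:
            return True
        # Only compare addresses and networks of the same IP version.
        if addr.version == bad_net.version and addr in bad_net:
            return True
    return False


def addresses_in(cidr: str) -> int:
    """Number of individual addresses covered by one network prefix."""
    return ipaddress.ip_network(cidr).num_addresses
```

The whole of IPv4 is 2**32 addresses, whereas `addresses_in("2001:db8:1234:5678::/64")` is 2**64. A spammer sitting on a single /64 can use a fresh address for every post, so listing individual IPs stops being practical, and listing whole prefixes risks catching innocent users who share them.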
This is why, with Wedge, we’re taking the fight back the other way. We’ve done all kinds of tricks, integrating Bad Behaviour amongst others, that rely not on profiling the user but on profiling the *behaviour* of the spammers.
I’m not going to say too much about it, as I don’t want to give away too many details until we’ve publicly launched (whereupon the spammers will no doubt demonstrate that our methods are not quite as effective as we’d hoped, but with any luck, better than nothing!).
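To give a flavour of what profiling behaviour rather than identity can mean, here is a deliberately generic sketch. None of it is taken from Wedge or from Bad Behaviour: the signals (submission speed, a hidden honeypot field, a missing User-Agent, link count), the weights and the threshold are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    seconds_to_submit: float  # time between serving the form and receiving the post
    honeypot_value: str       # hidden form field that a human never sees or fills in
    user_agent: str           # raw User-Agent header, possibly empty
    link_count: int           # number of URLs in the post body


def spam_score(s: Submission) -> int:
    """Add up weighted behavioural signals; higher means more bot-like."""
    score = 0
    if s.seconds_to_submit < 3:     # faster than a person could plausibly type
        score += 2
    if s.honeypot_value:            # something filled in the hidden field
        score += 3
    if not s.user_agent.strip():    # real browsers send a User-Agent
        score += 1
    if s.link_count > 5:            # post body stuffed with links
        score += 1
    return score


def looks_automated(s: Submission, threshold: int = 3) -> bool:
    """Flag the submission once its score reaches the (arbitrary) threshold."""
    return spam_score(s) >= threshold
```

Notice that nothing here cares about the username, email or IP address. It only looks at how the submission was made, which is much harder for a spammer to change than a throwaway address.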
It all comes back to Sun Tzu and his Art of War: “Know yourself and know your enemy, and you will be victorious.” Most of the anti-spam technologies out there currently don’t know the enemy, especially on the CAPTCHA front, as I explained before. The bulk of them just take some text and mash it around in ever more convoluted ways, rather than trying to understand how CAPTCHAs are beaten in the first place. When writing CAPTCHAs, I spent time trying to understand how they were actually beaten, and started devising methods to put a stop to it. It remains to be seen how successful those methods will be, though.
There endeth the lesson for today, really. It’s back to the usual, if you think about it: don’t treat the symptoms, treat the cause. Fighting spam by hacking at its tendrils won’t stop it coming at you; it just holds it off until it has grown new ones.
And, as per that thread, it’s going to continue to be an arms race. But the more that is invested in studying the behaviour, the more can be achieved: the better you know your enemy, the more easily victory comes.