Ian Landsman is Starting From Scratch, June 9, 2006:

I Love Spam!

If you're in the market for a powerful and user-friendly Help Desk solution, please take a look at my company's flagship product, HelpSpot.
The next version of HelpSpot will include several layers of protection to keep spam out of the portal's forums and request submission page. I've been testing it this week on the UserScape support portal, which was getting loads of generic form spam. I'm happy to report that it works super fantastico! The spam protection has 3 layers.

1. Link Counts
Any submission/post that has more than X links in it is automatically classified as spam. X defaults to 4, but it can be adjusted as needed. This instantly filters out the huge link spams, even if they're brand new and have never been seen before.
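
As a rough sketch of the link-count check (in Python, and not HelpSpot's actual code), assuming that a "link" is simply any occurrence of http:// or https:// in the submitted text. The names count_links, is_link_spam, and max_links are made up for illustration.

```python
import re

# Assumption: every "http://" or "https://" in the text counts as one link.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def count_links(text: str) -> int:
    """Return the number of URL-looking substrings in the text."""
    return len(URL_PATTERN.findall(text))


def is_link_spam(text: str, max_links: int = 4) -> bool:
    """Flag a post as spam when it has more than max_links links."""
    return count_links(text) > max_links
```

With the default of 4, a post containing exactly 4 links passes and a post containing 5 is flagged. A stricter or looser portal would pass a different max_links.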

2. Timestamp Forms
Each HTML form now includes 3 hidden fields: one with a timestamp, one with the IP the form was created for, and one with a secret hash of the two. When the form is submitted, the hash is used to check that the timestamp and IP are the original values; if either has been changed, the submission is marked as spam. If both are original, the form must also be less than 2 hours old, or the submission is marked as spam.

This works well because most form spammers crawl the site once for forms and then just submit the same form over and over. Now those stored forms become invalid after 2 hours.

3. Bayesian Filtering
Finally, each post is run through a new set of Bayesian filters. These filters learn from manual deletions and also when spam caught by the above two methods is deleted.
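
To give a feel for what a Bayesian filter does, here is a small Python sketch. It is a generic naive Bayes classifier with add-one smoothing, not the HelpSpot code and not Paul Graham's exact formula. The class name, the tokenizer, and the "spam"/"ham" labels are all assumptions made for the example.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z0-9$'-]+")


class SpamFilter:
    """Tiny naive Bayes filter trained on posts labeled 'spam' or 'ham'."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> set[str]:
        # Each distinct token counts once per post.
        return set(TOKEN_PATTERN.findall(text.lower()))

    def train(self, text: str, label: str) -> None:
        """Learn from one post, e.g. when a moderator deletes it as spam."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimate the probability (0..1) that the post is spam."""
        if sum(self.doc_counts.values()) == 0:
            return 0.5  # nothing learned yet
        spam_docs = self.doc_counts["spam"]
        ham_docs = self.doc_counts["ham"]
        # Start from the smoothed prior odds, then add per-token evidence.
        log_odds = math.log((spam_docs + 1) / (ham_docs + 1))
        for token in self.tokenize(text):
            p_spam = (self.word_counts["spam"][token] + 1) / (spam_docs + 2)
            p_ham = (self.word_counts["ham"][token] + 1) / (ham_docs + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(min(log_odds, 700.0), -700.0)  # avoid exp overflow
        return 1.0 / (1.0 + math.exp(-log_odds))
```

In a setup like the one described, train(post, "spam") would be called whenever a post is deleted as spam, including posts caught by the link-count and timestamp checks, and train(post, "ham") for posts that are kept. A post would then be held back when spam_probability exceeds some chosen threshold.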

Really, just the Bayesian filter would be enough, but the above two help to keep spam from ever showing up at all. So if the spammers move over to a new set of words, the first two filter types help keep that spam from ever being displayed, rather than having some of it show up initially while the Bayesian filter catches up.

I'm back to enjoying checking the forums. It's great to see the little spam icon with the number of spams captured.

Now I just want them to keep spamming so the filters can get trained up well and I can check this feature off as real-world tested.

Oh, and if any customers are having trouble with spam, drop me an email and I can send you a link to the beta build of 1.3.5 with the spam protection.

Update: I meant to mention, but forgot, that #2 was found on Keith Devens' blog. I had been tinkering with something along the same lines, but his solution was simpler and had the added plus that he'd already verified it worked.
Created on 06.09.2006 2:06 pm · Comments (5)


Discussion

Heyy Ian,
Are you ever gonna put up that article about the niches you think can be good? I'm waiting to read that before I pick any idea lol!
Cheers,
Ali.

Created by Ali on 06.09.2006 3:06 pm

That's pretty cool. I like your ideas for the form submission.

By the way, I've always been curious. Are you using some kind of pre-built Bayesian filtering library, or did you write it yourself?

Created by Michael Sica on 06.09.2006 3:06 pm

I'm working on it, Ali, I haven't forgotten!

I wrote it myself, Michael. It took about a day to figure out Paul Graham's Lisp code and a half day to wire up the HelpSpot-specific version. Then lots of testing, etc.

Created by Ian on 06.09.2006 3:06 pm

Note that there's a small problem with my scheme and how it interacts with caching. If a page is cached by a caching server, the form post won't come from the same IP address the page was originally requested from, and the post will be rejected. In practice, however, the scheme has blocked a ton of spam with only about one false positive during the entire time I've been using it.

The alternative is to disallow caching on your pages, increasing your server and bandwidth load. As it stands, my scheme is "broken", but in practice works extremely well.

Created by Keith Devens on 06.09.2006 3:06 pm

Yeah, I thought about caching, but the downside seemed small compared to the upside of the technique. In the final build I'll have a way to disable that check if desired. I also send no-cache headers on those pages, but that seems to be hit or miss.

Thanks for the nifty technique!

Created by Ian on 06.09.2006 3:06 pm

 


Copyright © by Ian Landsman

Design by Jakob Nielsen