A Plan for Spam

Sergey Alexandrovich Kryukov

Jan 2, 2013

CPOL

How to abate the CodeProject spam crisis.

We are presently under heavy pressure from a narrow group of "TV and Media" spammers who cynically challenge our ability to resist this kind of crime. Members of CodeProject are making a remarkable effort to exterminate these unwanted parasites, but the measures taken so far do not seem quite satisfactory. My reason for writing this short article is a discussion between Chris Maunder and myself about what we can do:
http://www.codeproject.com/Messages/4462716/Re-Live-streamers.aspx,
http://www.codeproject.com/Messages/4462726/Re-Live-streamers.aspx.

Several hours later, a fresh idea came to my mind, a variant of the ideas we had already discussed. I would ask interested members to think about it, discuss it, and criticize or support it. Generally, we need some brainstorming to help Chris and others arm the site with improved protection against spam, in a way that does not threaten legitimate members and does not add too much overhead to using and maintaining the site.

I'm coming back to the idea of Bayesian filtering. I used it successfully on my e-mail a while ago but eventually replaced it with my own approach (this is not the place to discuss it, because it cannot be applied to the site). I think the Bayesian filtering approach did not become dominant in e-mail services for some natural reasons, such as the overhead for the human operator or user and the unavoidable false negatives and false positives of the method. However, I'm starting to think that if we use this idea with a special twist (which can be discussed further), we can apply it to the protection of CodeProject.

This short article is named after the article "A Plan for Spam" by Paul Graham: http://www.paulgraham.com/spam.html.

See also another article: http://www.paulgraham.com/better.html.

I think that after reading these articles the idea will be clear enough.

As to the implementation, please look at this open-source product: http://nbayes.codeplex.com.

And this is a CodeProject article: "A Naive Bayesian Spam Filter for C#".

That was just to demonstrate that the implementation won't be a big problem.

Still, the problem is: how do we decide on the cancellation of a spammer's account? Don't we face the same problems: false negatives and positives, and an excessive amount of intervention by the administrator? Remember that I pointed out the main problem with the workload put on a human administrator: the required chores are not automated, or not optimized to meet the goals.

Now, here is the main idea:

Let's invert the situation socially. Instead of making the decision on cancellation of an offender's account, let's make the potential offender apply for the "legalization" of a potentially spamming post. Hold on! Don't reject this idea from the very beginning, before I explain how it may look in practice. I'm going to demonstrate that this can be done gently enough.

First of all, let's remember the starting point. At the starting point, the filter is empty (or all available filters are empty), so, without the intervention of members who care about exterminating spam, nothing is filtered out, ever. The filters start to populate as some member spots spam and reports it as such. There should be a special reporting action for spam, which feeds the spam content into a filter. The filter starts populating and gradually acquires the ability to detect spam content automatically. Yes, with some false positives and negatives. For the details of this process, please go back to the articles by Paul Graham.
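To make the "filter starts empty and is populated by reports" idea concrete, here is a minimal sketch in Python. It is a simplified naive Bayes scorer with add-one smoothing, not the exact formula from Paul Graham's articles and not the code of the products mentioned above. The names `SpamFilter`, `report_spam`, `report_ham` and `spam_probability` are hypothetical, chosen only for this illustration.

```python
import math
import re
from collections import Counter

# Tokens: runs of letters, digits, dollar signs, apostrophes and dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split text into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class SpamFilter:
    """Hypothetical filter fed by member reports (simplified naive Bayes)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam posts containing it
        self.ham_counts = Counter()   # token -> number of legitimate posts containing it
        self.spam_posts = 0
        self.ham_posts = 0

    def report_spam(self, text):
        """Called by the special 'report as spam' action."""
        self.spam_counts.update(set(tokenize(text)))
        self.spam_posts += 1

    def report_ham(self, text):
        """Called when a post is confirmed as legitimate."""
        self.ham_counts.update(set(tokenize(text)))
        self.ham_posts += 1

    def spam_probability(self, text):
        """Return an estimate in [0, 1]; an empty filter flags nothing."""
        if self.spam_posts == 0:
            return 0.0
        log_odds = 0.0  # assumption: equal prior odds of spam and non-spam
        for token in set(tokenize(text)):
            # Add-one smoothing keeps unseen tokens from producing zeros.
            p_spam = (self.spam_counts[token] + 1) / (self.spam_posts + 2)
            p_ham = (self.ham_counts[token] + 1) / (self.ham_posts + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on very long posts.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Before any report the filter returns 0.0 for every post, which matches the starting point described above; each report shifts the token statistics a little further.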

As a first step, content that the filter detects as potential spam is not placed on the CodeProject content page (Questions & Answers, or something else). Instead, the potential offender gets a message on a page. Something like this:

CodeProject informs:
Sorry, we cannot place your post immediately. It contains some content detected by our filters as potential spam. The detection was based on previous spam reports by CodeProject members. If you believe this is not spam, you will need to post your explanation here: [URL]

The content goes to the database. At the request of the potential spammer, a page with the legalization form is generated, and the report goes to the database, where the status of the prospective post is stored. Again, this should not happen often, and legitimate members posting their messages will almost never get this message. I know this from my experience with Bayesian filtering for e-mail.
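The life cycle of a post under this scheme can be sketched as a small state machine. This is only an illustration under my own assumptions: the status names, the `HOLD_THRESHOLD` value and the functions `submit_post` and `request_legalization` are hypothetical, and the spam scorer is passed in as a plain function so the sketch does not depend on any particular filter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class PostStatus(Enum):
    PUBLISHED = "published"              # visible on the site
    HELD = "held"                        # stored, but not shown
    AWAITING_REVIEW = "awaiting_review"  # author asked for legalization


@dataclass
class Post:
    author: str
    text: str
    status: PostStatus
    explanation: Optional[str] = None


HOLD_THRESHOLD = 0.9  # hypothetical cut-off; would need tuning on real data


def submit_post(author: str, text: str,
                spam_probability: Callable[[str], float],
                threshold: float = HOLD_THRESHOLD) -> Post:
    """Store the post; publish it only if the filter does not flag it."""
    flagged = spam_probability(text) >= threshold
    status = PostStatus.HELD if flagged else PostStatus.PUBLISHED
    return Post(author, text, status)


def request_legalization(post: Post, explanation: str) -> Post:
    """The author of a held post submits the legalization form."""
    if post.status is not PostStatus.HELD:
        raise ValueError("only held posts can be submitted for legalization")
    post.explanation = explanation
    post.status = PostStatus.AWAITING_REVIEW
    return post
```

A held post whose author never fills in the form simply stays in the `HELD` state and never reaches a reviewer, which is where most of the saved effort should come from.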

Now, at the request of the administrator, all the filtered members' messages will be generated on a single page. Usually, one glance at the messages will be enough to judge whether each is spam or not. Importantly, it is quite unlikely that real spammers will plead for legalization of their content. So, I think that the most typical action will be "Yes to all" (pretty much like in the movie "Bruce Almighty", 2003; no, this is not spam, I have no interest in promoting this commercial product and cited it only to illustrate the protection method; I plead for legalization of this post :) ). Of course, this "Yes to all" is applied to the posts awaiting approval/legalization. And it will be equally easy to have a single button "Remove all offending posts and member accounts" for all checked items.
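The two bulk actions on the review page could look roughly like this. Again, this is a sketch with hypothetical names (`PendingPost`, `approve_all`, `remove_checked`), and it only computes what should be published, removed or banned; the actual database updates are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class PendingPost:
    post_id: int
    author: str
    text: str
    explanation: str  # what the author wrote in the legalization form


def approve_all(queue: List[PendingPost]) -> List[int]:
    """'Yes to all': return the ids of all pending posts, to be published."""
    return [p.post_id for p in queue]


def remove_checked(queue: List[PendingPost],
                   checked_ids: Iterable[int]) -> Tuple[List[PendingPost], Set[str]]:
    """Drop the checked posts; return the remaining queue and the authors to ban."""
    checked = set(checked_ids)
    remaining = [p for p in queue if p.post_id not in checked]
    banned = {p.author for p in queue if p.post_id in checked}
    return remaining, banned
```

A reviewer would typically check the few obvious offenders, call `remove_checked`, and then apply `approve_all` to whatever remains.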

If you imagine it clearly, you will see that this procedure will be much easier than what we have now.

Access to this approval/legalization and member extermination procedure is a matter for some discussion. This aspect is not as important. I would suggest that the right to the final extermination of offenders' accounts be left to the administration, while the right of legalization and the right to exterminate an offender's post (from this page; it is already available from the page of the question in the Questions & Answers forum) could be granted to members with some level of reputation.

Please discuss this idea and share your ideas. Maybe we can come up with some variant of my approach or something completely different.

Thank you for your attention to this rather unpleasant matter and for the effort already spent to sustain the site.

—SA