7c0h.com

Sentiments are the new Spam - Prologue

Once upon a time, you would create an e-mail account and use it for a long time without receiving spam. In fact, whenever you received your first spam message, you'd know exactly who to blame: that one cousin of yours who'd send you every single motivational PowerPoint she came across, along with a list of 1500 other e-mail addresses. We could argue about who's the spammer in this situation, but that discussion will have to wait.

That kind of control over your account is no longer possible: even if you never share your address with anyone, you will at some point get spam. It's just the way things are, the "background radiation" of the internet. Luckily for us, things got so bad that a lot of smart people sat down to think really hard about the problem and came up with Bayesian filtering, a technique so effective that most of us don't even bother checking our Spam folders anymore.

So we succeeded once. It's a good thing to remember, because we have a much harder battle to fight now: trolling, and its ugly cousin, online harassment.

Let's say you post a message on an online board. These are some of the things that could happen, in no particular order:

  • You could get an interesting, well-thought-out reply (note that "well-thought-out" doesn't mean "agrees with you"). It happens.
  • You could be modded down by people who disagree with what you just posted, even if the rules say they shouldn't.
  • You could be flooded by negative messages, because a certain group decided to impose their point of view. This is called brigading, by the way, and it's usually not personal: they oppose your point of view, not you.
  • You could be flooded by negative messages, because a group has decided to target you online for something you said, or did, or are.
  • You could be posting on behalf of a company, speaking in favor of its products while posing as anyone but an employee. This is called being a shill, and most websites either pretend that it doesn't happen or don't care.
  • You could be trying to derail a discussion, in order to make sure a certain point is not brought to light, or is drowned in the noise. This usually means that you work for a government agency. It's being done right now, and it works.

We used to believe that everyone on the internet would eventually behave nicely, and that we could build our services based on trusting the 95% of users who have no hidden agenda. Sadly, this is not so, because

  1. ... people have not behaved nicely on the Internet since September 1993.
  2. ... a very loud 5% of users is a lot more noticeable than the quiet 95%. A post-mortem of a DARPA Challenge showed that a single person can sabotage the work of thousands of well-meaning volunteers.

In the follow-up articles I'm going to comment on what I perceive to be the three main fronts on which this issue could be attacked. They are:

  • Anonymity: there's no way of taking measures against a person, only against a user. This is by design, and I'm not arguing that we should get rid of anonymity. We should instead focus on identifying toxic users, which I think can be done by implementing user groups.
  • Flamewars: derailing discussions in order to kill them. This may be a job for pattern matching, identifying when the shape of a discussion is tending towards known anti-patterns. We might also want to add clustering, in order to identify brigades.
  • Harassment: perhaps the hardest one, it requires sentiment analysis techniques to identify negative comments and kill them before they reach their destination.

In the follow-up essays I'll present some papers about how one would go about attacking each point. I have no reason to believe that these techniques are unknown (some of them are already implemented), but I post them hoping that, much like with Bayesian filtering, someone will read them and have an "oh, wait" moment.

Coming up next: anonymous users and user groups.