Today’s sender active anti-spam – Email

Posted 06 Mar 2007 by Dean Harding

Last time, I talked about what I call “sender passive” anti-spam techniques, where the sender doesn’t have to do anything special outside the SMTP protocol to get messages through. These techniques are OK, except for one major problem: because the sender doesn’t have to do anything, it’s no skin off the spammer’s nose if his message is blocked (or not).

Today I’m going to talk about what I call “sender active” anti-spam techniques, where the sender has to do something in addition to the normal SMTP stuff to get a message through. These kinds of techniques can be quite effective, but they’re not all that common these days. I’ll explain why later.

Sender-ID

Sender-ID is a Microsoft technology that basically tries to “authenticate” a sender via records in the DNS. The concept is quite simple. When someone connects to your SMTP server to send you email, you look at the domain specified in the MAIL FROM command. You do a simple DNS query to see if the IP of the connecting client is listed in the MX record for that domain name, indicating it’s the actual “mail exchanger”, or alternatively if the IP is listed in a special TXT record, indicating it is a designated “mailer” for that domain. If the IP is not in the DNS records for the domain, you don’t let the mail through.
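As a rough sketch of the check described above (and only of the description above, not of the actual Sender-ID specification), the logic might look something like this. The `FAKE_DNS` dictionary, the `sender_is_authorized` name and the example addresses are all made up for illustration; a real server would do DNS lookups instead of reading a dict.

```python
import ipaddress

# Hypothetical stand-in for DNS: domain -> addresses published in its MX and
# TXT records. A real check would query DNS instead of reading a dict.
FAKE_DNS = {
    "example.com": {
        "mx": ["192.0.2.10"],
        "txt": ["192.0.2.20", "198.51.100.0/24"],
    },
}

def sender_is_authorized(client_ip, mail_from, dns=FAKE_DNS):
    """Return True if client_ip is listed for the MAIL FROM domain."""
    domain = mail_from.rsplit("@", 1)[-1].lower()
    records = dns.get(domain)
    if records is None:
        # Assumption: a domain that publishes nothing cannot match.
        return False
    ip = ipaddress.ip_address(client_ip)
    for entry in records["mx"] + records["txt"]:
        # ip_network accepts both single addresses and CIDR ranges.
        if ip in ipaddress.ip_network(entry):
            return True
    return False
```

Calling `sender_is_authorized("203.0.113.5", "bob@example.com")` returns False here, because that address appears in neither list for the domain.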

Now, the reason Microsoft is interested in something like Sender-ID is that it means people can’t pretend to be sending mail from “@hotmail.com” if they’re not actually a Hotmail server. But it also means that botnets become unworkable – because you can’t list the IP address of every computer in your botnet in your domain (since, apart from the performance problems of having 10,000 TXT records, you’ve also given the authorities a wonderful list of all your infected computers!)

While I personally think Sender-ID was a good idea, it never really took off. It suffered a bit from confusing licence agreements, along with (possibly) a more general “this is Microsoft, so it can’t be good” feeling.

Another problem with Sender-ID is that it needs to reach “critical mass” before it becomes effective. Until everyone uses Sender-ID, it can only be used as a “spam indicator” – meaning you give a mail that failed the Sender-ID check a higher “spam score” than one that passed. You’ll still need to do other spam filtering, and you’ll still possibly have to accept mails that fail Sender-ID.
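To make the “spam indicator” idea a bit more concrete, here is a minimal sketch of a Sender-ID failure feeding into a score rather than causing an outright rejection. The penalty, the threshold and the function names are assumptions picked for the example, not values from any real filter.

```python
SENDER_ID_FAIL_PENALTY = 3.0  # assumed weight for a failed Sender-ID check
SPAM_THRESHOLD = 5.0          # assumed cut-off above which mail is treated as spam

def spam_score(content_score, sender_id_passed):
    """Combine the score from other filters with the Sender-ID result."""
    score = content_score
    if not sender_id_passed:
        score += SENDER_ID_FAIL_PENALTY
    return score

def is_spam(content_score, sender_id_passed):
    return spam_score(content_score, sender_id_passed) >= SPAM_THRESHOLD
```

With these numbers, a message that fails Sender-ID but otherwise looks clean (say a content score of 1.0) still gets through, which is the “you’ll still have to accept some mails that fail” situation.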

“Click here if you are human”

I’ve not seen this one much, and I don’t know the “official” name of the method. But basically the way this one works is, the first time you send a mail to someone, their email server queues up the message and sends you an immediate reply. The reply message will have text like “in order to verify that you are a human, please click this link” or perhaps “please reply with the words ‘I am human’ in the subject.” Subsequent emails (once you’ve “passed” the test) don’t do this.
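The flow above could be sketched roughly as follows. This is only an illustration: the `ChallengeResponseGate` class and its method names are hypothetical, and sending the actual “please click this link” reply is left out.

```python
import secrets

class ChallengeResponseGate:
    def __init__(self):
        self.verified = set()  # senders who have already passed the test
        self.pending = {}      # token -> (sender, held message)

    def receive(self, sender, message):
        """Return ('deliver', message) or ('challenge', token)."""
        if sender in self.verified:
            return ("deliver", message)
        # First contact: hold the message and issue a one-off token that
        # would be embedded in the "click here" link of the reply.
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Called when the link is clicked; releases the held message."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None  # unknown or already-used token
        sender, message = entry
        self.verified.add(sender)
        return message
```

After one successful `confirm`, later calls to `receive` for the same sender return `("deliver", ...)` straight away, matching the “subsequent emails don’t do this” behaviour.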

This one is very effective at picking the humans from the mass-mailers. But again, it has some problems.

First of all, it’s quite annoying to send someone an email only to get an immediate “please verify yourself” response. But I guess it only happens once...

Secondly, you also immediately lose out on any legitimate mass emails that you might want (unless you “pre-whitelist” the sender). Some people might see this as a feature, though.

And finally, if this ever did become popular, spammers would quickly learn what actions they need to perform in order to “verify” their humanity. For example, if the message was a “click here” they have to simulate clicking the link. If it was a “reply to me” they just need to send a new message. This would make it hard for the spammer to run a botnet, of course (because where do the replies go to? Generally a botnet spam’s return address is a black hole.)

Conclusion

I’ve only listed two methods here because there really aren’t all that many “sender active” anti-spam techniques in wide use today. I think that’s a shame, because these are in fact some of the most effective anti-spam techniques available – not only because they effectively block spam, but also because the actual work required by the receiver is fairly minimal.

But at the end of the day, even sender active anti-spam is really only effective because of its relative obscurity. For a “final” solution, you need something that will withstand being ubiquitous. It’s no good having a 100% (or even 95%) effective anti-spam method if it’s only effective because the spammers can ignore it – if it really is that good, you expect more and more people to use it over time, thus bringing it to the attention of the spammers!

Next time, I’m going to get into some of the “requirements” I believe a “complete” or “holistic” solution to spam should have.

Dynamically generating a [DllImport]

Posted 05 Mar 2007 by Dean Harding

If you’ve ever worked with P/Invoke before, you’ll no doubt know that the typical usage pattern is something like this:


// Needs: using System; using System.Runtime.InteropServices;
[DllImport("user32", CharSet=CharSet.Auto)]
static extern int MessageBox(IntPtr hWnd, string lpText, string lpCaption, uint uType);

This is great if you’re wrapping system calls (like user32.dll, kernel32.dll, etc). But it’s not so good if you’re trying to wrap a custom native library. The reason is simple: the name of the DLL is hard-coded at compile-time, and you don’t get a choice as to where the system looks for it.

I use Firebird as an embedded database quite a bit, and this limitation is fairly obvious there. You need to give the embedded DLL a specific name, and it has to be in your application’s bin folder. The biggest problem is that the name of the DLL has changed a number of times over its lifetime (gds32.dll, fbclient.dll, fbembed.dll, etc), so it’s hard to know which one to use without compiling the source for the provider yourself.

Anyway, I’ve submitted some code to the developers that will hopefully fix this in future versions (not sure if or when they’ll include it, but we’ll see...). Basically, I use Reflection.Emit to generate the “interop” class at runtime, which means I can put whatever I like as the DLL name. This means you’ll be able to specify the name (and even the path) of the DLL to use in your Firebird connection string.

You can download the code at the bottom of this post, but basically the way it works is, instead of having a static class that contains all your P/Invoke declarations, I’ve converted that class into an interface and replaced all the [DllImport] attributes with my own custom [DynDllImport] attributes. The DynDllImportAttribute is just like DllImportAttribute, except it doesn’t take the DLL name in the constructor. All the other properties are basically just copied over to the generated DllImportAttribute at runtime.

The “meat” of my solution is the NativeLibraryFactory class. It takes the name of a DLL (for example, “fbembed”) and the Type of the interface, and it generates a class that implements the interface. Each interface method simply calls a P/Invoke declaration with the same signature and the passed-in DLL name for the DllImport.

This means that you can call something like this:


IFbClient fbClient = (IFbClient) NativeLibraryFactory.GetNativeLibrary(
    typeof(IFbClient), "fbembed");

And the returned IFbClient has the same methods as the P/Invoke declarations.

Now, I’ll admit that for a lot of projects, this may be overkill. I also realise that you could use Marshal.GetDelegateForFunctionPointer along with LoadLibrary and GetProcAddress to do a similar thing. However, that is only available from .NET 2.0 onwards and it’s also not cross-platform – you can’t use LoadLibrary/GetProcAddress on Mono (though with some work, you might be able to write it cross-platform using dlopen, etc).

Anyway, you can download the code here: native-library-generator-1.0.zip

The Price is Right (or not)

Posted 26 Feb 2007 by Dean Harding

Apparently, the PS3 is going to go on sale in Australia at a retail price of $999.95!

It sells for US$ 599 in the U.S. (= AU$ 770, so the Australian price is 130% of that) and ¥60,000 (= AU$ 640) in Japan. So what gives with the high price for us Aussies?

If you want to go with the cheaper option and just import a version from Japan, you’ll be able to play games, but you won’t be able to play Blu-Ray movies (but who wants to do that anyway, I mean really?) nor will you be able to use the Aussie PlayStation Store. The article doesn’t mention anything about their “live” service for online games, though (or is that what PlayStation Store is? I hope not; that would be a silly name if it was.)

Remember, however, that when the Xbox 360 launched it went on sale for AU$ 499.95 for the Core version, against a U.S. price of US$ 299.99 (= AU$ 385). Again, the Australian price was 130% of the U.S. one. So I guess us poor Aussies just have to suck it up.

Besides, nobody wants a PS3 anyway... do they? I certainly don’t want to pay $1,000 for one!

Today's sender passive anti-spam - Email

Posted 24 Feb 2007 by Dean Harding

I mentioned before how there are currently two basic forms of combating the email spam problem. I call these two methods “sender passive” and “sender active,” depending on whether the sender has to do something other than what is specified in the basic SMTP protocol. Today I’ll talk about the common “sender passive” methods. Later I’ll mention a few of the “sender active” methods.

Content Filtering

This is by far the most common anti-spam technique in use today. The basic premise is you scan the content of the message and if it contains “spammy” keywords, you drop it. So if someone sends me a message with the word “viagra” in it, I can be pretty sure it’s spam.

The main benefits of this solution are its simplicity and its flexibility. There are lots of different kinds of filters, from simple keyword-based ones to more complex Bayesian filtering.

The problem is that spammers can tailor their messages to get past the filters, so it’s basically a cat-and-mouse game, and it’s always going to be a catch-up game. This is especially true for the “big four” providers (that is, Hotmail, Yahoo, AOL and Gmail) – spammers will specifically target addresses from them, but before sending out their batch, they actually sign up for an account and use it for “testing” – once their message gets into their test account, bam! they hit everybody.

The other drawback I somewhat hinted at above. Say I’m a doctor: many of my messages might then legitimately include the word “viagra” – I can’t just blindly filter that word out in that case.

Finally, there’s basically an infinite number of ways to write “viagra” – “v1agra,” “vi@gra”, etc. It’s just a matter of coming up with new ways to get past the filter.
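Here is a small sketch of why the keyword approach turns into cat-and-mouse. The word list, the look-alike table and the function names are all assumptions made up for the example; real filters are far more elaborate.

```python
import re

SPAMMY_WORDS = {"viagra"}  # assumed keyword list

# Assumed look-alike substitutions; real spam uses many more.
LOOKALIKES = str.maketrans({"1": "i", "@": "a", "0": "o", "3": "e", "$": "s"})

def _words(text):
    # Split into runs of letters, digits and the look-alike symbols.
    return re.findall(r"[a-z0-9@$]+", text.lower())

def naive_filter(text):
    """Flag the message only if a keyword appears exactly."""
    return any(w in SPAMMY_WORDS for w in _words(text))

def normalizing_filter(text):
    """Undo the known look-alike substitutions before matching."""
    return any(w.translate(LOOKALIKES) in SPAMMY_WORDS for w in _words(text))
```

The naive filter misses “v1agra”, and the normalizing one catches both “v1agra” and “vi@gra”, but something like “v.i.a.g.r.a” still slips past both. Each new trick needs a new rule, which is the catch-up game described above.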

IP Blacklisting

This method uses the one piece of “reliable” information we get about the source – their IP address. The basic idea is simple: if the sender’s IP address is on the blacklist, don’t accept mail from them.

There are a few ways to get an IP address on the blacklist. First of all, there are reliable blocks which can always stay on there. For example, if an ISP assigns the block 123.123.123.* to their dial-up lines, then you can be pretty sure there’s no mail server on there. Any mail originating from there is probably from a botnet.

Another way you might get on there is from user feedback. You can use a service like NJABL, which accepts feedback from users – if enough people report spam from an IP address, it gets added to the blacklist.

The good thing about blacklists is that they pretty much automatically kill botnets: since most botnets are in dial-up or residential IP blocks, the blacklists have those listed by default. Also, there is not very much extra processing required on the client – it’s usually just a matter of doing a specially-formed DNS query.
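The “specially-formed DNS query” is, by the usual DNS blacklist convention, the sender’s IPv4 address with its octets reversed, followed by the blacklist’s zone name. A minimal sketch follows; the zone `dnsbl.example.org` is a placeholder, not a real blacklist, and the function names are made up.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the name to look up, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone="dnsbl.example.org"):
    """Treat any successful lookup as 'listed'; a failed lookup as 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

So checking 192.0.2.99 means looking up `99.2.0.192.dnsbl.example.org`. Note that `is_listed` needs a working resolver and a real zone to be of any use.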

The problem is that, again, it’s a game of catch-up. You have to have a certain amount of spam from an IP before it gets blacklisted. There is also a problem of false-positives – your own IP might be blacklisted by mistake, which isn’t very fun when it happens.

Greylisting

This post is starting to get rather long, so I’ll finish up with a relatively unknown technique, called “greylisting.” Basically the way this works is, the first time you see an email with a new “From” address, you add that address to the greylist, and send back an SMTP “temporarily unavailable” error. A conformant SMTP server will then queue the message itself and send it again after a short while. On the second attempt, you simply let the message through.

Once a message has successfully got through the greylist, the server will usually add the sender to a temporary whitelist so that the next few messages will get through without going through the greylist.
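A bare-bones sketch of that logic might look like the following. The `Greylist` class, the minimum retry delay and the whitelist lifetime are assumptions for illustration (the description above simply accepts the second attempt), and times are plain seconds passed in by the caller.

```python
class Greylist:
    def __init__(self, min_delay=300, whitelist_ttl=86400):
        self.min_delay = min_delay          # assumed: retries sooner than this are refused
        self.whitelist_ttl = whitelist_ttl  # assumed lifetime of the temporary whitelist
        self.first_seen = {}                # sender -> time of first attempt
        self.whitelisted_until = {}         # sender -> expiry of whitelist entry

    def check(self, sender, now):
        """Return 'accept' or 'tempfail' for a delivery attempt at time `now`."""
        if self.whitelisted_until.get(sender, 0) > now:
            return "accept"
        first = self.first_seen.setdefault(sender, now)
        if now - first >= self.min_delay:
            # The sender queued the message and retried: let it through
            # and whitelist the sender for a while.
            self.whitelisted_until[sender] = now + self.whitelist_ttl
            del self.first_seen[sender]
            return "accept"
        return "tempfail"
```

A sender that never retries (a typical zombie) only ever sees “tempfail”, while a proper SMTP server gets through on its retry and then skips the greylist until the whitelist entry expires.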

The reason this works is that almost all legitimate SMTP servers will handle the “temporarily unavailable” message correctly, while almost all spammers’ servers do not (the main reason being that the spam is being sent from zombie computers, and the queuing is going to be a little suspicious!)

Of the three techniques that I’ve mentioned today, this one has had the most success for me (that doesn’t mean it works well for everybody). However, it does have a couple of problems. First of all, it means legitimate email takes a little longer to reach me initially (depending on the sender’s server, it could be from 10 minutes to a couple of hours). The other problem is that the reason the technique is actually successful is its relative obscurity – if everybody did greylisting, the spammers would catch on pretty quick.

Another problem is that many large organisations have many SMTP servers for the same domain (using MX record priorities) – each server must know what the others have seen, since the sender may send the first attempt to one server and the second to another. This needs to be taken care of.

By the way, I’ve listed this under “sender passive” even though, technically, the sender has to do something “extra” – but since that something “extra” is already part of the SMTP standard, it doesn’t really count.

Conclusion

In the end, all of the above techniques fail for the same reason: there is no cost to the sender for failed messages. Even if 90% of messages are blocked in this way, a batch of 1,000,000 messages will still reach 100,000 people. It’s also always a catch-up game: the spammers are necessarily one step ahead – after all, we can only block stuff once we know it’s being sent.
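The arithmetic behind that is trivial, but it shows how little a better block rate costs the spammer. A quick sketch (the function name is made up, and percentages are whole numbers to keep the maths exact):

```python
def messages_delivered(batch_size, percent_blocked):
    """How many messages of a batch still get through the filters."""
    return batch_size * (100 - percent_blocked) // 100

delivered_at_90 = messages_delivered(1_000_000, 90)  # 100000
delivered_at_99 = messages_delivered(1_000_000, 99)  # 10000
```

Even at a 99% block rate, 10,000 messages from that batch arrive, and since sending is free the spammer can just send a bigger batch.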

Next time, I’ll describe a couple of what I call “sender active” techniques, which require extra effort (outside of the basic SMTP protocol) to get a message through.

Why we have a problem – Email

Posted 22 Feb 2007 by Dean Harding

It’s clear to anyone who has been using email for any length of time that spam is a problem. But why is it a problem? After all, snail mail has “spam” as well, but nobody seems to complain too much about that!

It turns out that despite the intentions of the creators of SMTP^†, there are in fact a lot of differences between email and snail mail that make the “spam” problem less relevant to the latter than the former.

First of all, and most obviously, is the issue of cost. It costs money to send snail mail. In Australia, it’s 50c per letter. That’s only very nominal for one person sending one message, but it’s quite prohibitive if you want to send thousands. And in fact, that’s why most snail mail “spam” is hand-delivered by school kids – it’s cheaper to pay a kid $6 an hour, than it is to pay 50c per message. But even if you hand-deliver your snail mail spam, you’ve still got to pay for the paper and printing costs, the designers, etc. Email spam is free – all you pay for is the bandwidth, and since most spam is sent from botnets, it’s not even your own bandwidth that you’re using.

The second difference is that snail mail spam does not get sent to business addresses. The kid hand-delivering the messages knows that nobody will read it if it’s the letterbox of a shop, so they don’t even bother putting it in the mailbox. But even if they did, that’s only one person in the whole company that has to deal with it. Email spam is indiscriminate – it goes to everybody’s mailbox. So if each person has to spend 30 seconds a day sorting email spam from the real stuff, multiply that by the number of people in your company, and that becomes a lot of wasted time!

A third difference that is somewhat related to the first is that if you’re going to spend several thousand dollars to send out snail mail spam, you need to know that you’re actually going to get a pretty good return. So you hire designers and artists to build you a nice catalogue (or whatever), you figure out your demographics and the locations they live in, and you target them. This means that many people actually do read your message. Spam, being free, is usually just typed up by some guy with only a basic grasp of the English language and no design skills, and is just sent to as many addresses as possible. So 99.999% of spam messages are not read.

Fourthly, there are many laws governing the sending of snail mail spam. Because the delivery is necessarily limited to just a single country, laws can be made and enforced easily. Email spam is sent to all countries, indiscriminately, so it’s not possible to actually enforce any laws (because which country’s law do you enforce? Remember it’s also quite difficult to pinpoint the origin of the email).

Finally, and I don’t consider this as big a threat as you might, email spam can contain viruses, while snail mail spam cannot.

So you can see that all these differences boil down to basically one of two things: 1. spam is free, and 2. spam is indiscriminate. Any “real” solution to the problem would have to address both of these, I believe – make them a non-issue and email spam would be as big a deal as snail mail spam (that is, not much).

^† See, SMTP as a protocol works a lot like snail mail. You write the address of the person you’re sending to on the envelope (which is equivalent to the RCPT TO command) and scrawl your return address on the back of the envelope (which is equivalent to the MAIL FROM command). The return address is not required: if there’s a problem and you didn’t give one, or you gave a bogus one, you just don’t get notified of the failure. What you write in the actual letter (i.e. the DATA command) is irrelevant to the successful delivery of the message (so the To: and From: headers can say something totally different, for example).
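To illustrate the envelope-versus-letter point, here is a simplified sketch that builds the sequence of SMTP commands for one message without sending anything. The `smtp_commands` helper and all the addresses are hypothetical, and the real protocol has more to it (greetings, server replies, CRLF line endings, dot-stuffing) than is shown here.

```python
from email.message import EmailMessage

def smtp_commands(envelope_from, envelope_to, message):
    """List the commands a client would send for one message (simplified)."""
    return [
        f"MAIL FROM:<{envelope_from}>",  # the return address on the envelope
        f"RCPT TO:<{envelope_to}>",      # the delivery address on the envelope
        "DATA",
        message.as_string(),             # the letter itself, headers included
        ".",
    ]

msg = EmailMessage()
msg["From"] = "someone.else@example.org"  # need not match MAIL FROM
msg["To"] = "nobody@example.net"          # need not match RCPT TO
msg["Subject"] = "Hello"
msg.set_content("The letter inside the envelope.")

commands = smtp_commands("bounces@example.com", "dean@example.com", msg)
```

The message goes to whoever is named in RCPT TO, and failures go to the MAIL FROM address, regardless of what the From: and To: headers inside the DATA section claim.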