Validating Email, Part 1 - Email

Posted 26 Mar 2007 by Dean Harding

The posts have started to slow down recently, but not for lack of ideas on my part! I’ve been moving house over the last week or two, and it’s tough to find 30 minutes free to type all this out what with the packing, unpacking, transporting and so on. But I’m right now waiting for the carpet cleaners to come and clean the old place out, so what better time than now to talk about my Validating Email protocol!

The protocol, which I’ve called Validating Email for now, is basically just a method for the receiver to validate the identity of the sender. I’m going to give a brief overview of the structure of the protocol in this post, along with a few pros and cons. I’ll get into the meat of things in a later post.

Basically, the idea in this protocol is that we want to verify that the person named in the “From:” field of the email is an “actual” entity and that emails sent to the “From:” address will actually arrive in a valid mailbox (whether an actual human will read them is another matter).

Now, I could implement this as extensions to the SMTP protocol. But I’m not going to. I believe that in order to encourage people to make a whole-hearted switch to this new protocol, “existing” SMTP should be completely disabled. The easiest way to do that is with a whole new protocol.

The basic steps in the protocol are as follows:

1. Sender “signs” the message with two certificates: one for the domain itself, and one for the mailbox that the message will be coming “from”. The domain certificate is the same for all messages originating from that domain; the mailbox certificate is different for each mailbox, but must stay the same for the lifetime of the mailbox.
2. Sender connects to mail exchange of recipient’s domain.
3. Sender sends message “envelope” (including “From” and “To” and so on) along with the actual message itself. Like SMTP, the message itself is separate from the envelope data, with the exception that the From: in the message must be exactly the same as the From in the envelope. In fact, you can leave out the From in the message if you like.
4. Recipient looks up the mail exchange listed in the sender’s From address, connects to it and asks the server for the certificates for the domain and mailbox. Recipient verifies the signatures in the received message.
5. Recipient responds to the sender with an “OK” if the signatures verified.

Those are the basic steps. Step 4 could be done “offline”, in which case an “OK” is always sent. So when there is a lot of traffic to your server, you might do step 4 as a separate step and batch up the requests to the same domain, for example.

Also, the responses can be cached for long periods of time, because they’re not going to change (unless a certificate is revoked, or something like that).
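
To make steps 1 and 4 a little more concrete, here is a minimal sketch of the double signature and its verification. It is only an illustration under several assumptions: the class and method names are made up, bare RSA key pairs stand in for the two “certificates”, and it uses the RSA API of a current .NET runtime. Fetching the public keys from the sender’s mail exchange is left out entirely.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: these names are illustrative, not part of any defined protocol.
public static class ValidatingEmailSketch
{
    // Step 1: the sender signs the message once per key (domain key, mailbox key).
    public static byte[] Sign(RSA key, string message)
    {
        return key.SignData(Encoding.UTF8.GetBytes(message),
                            HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    // Step 4: the recipient checks both signatures against the keys it obtained
    // from the mail exchange of the "From" domain (the lookup is not shown here).
    public static bool Verify(RSA domainKey, RSA mailboxKey, string message,
                              byte[] domainSignature, byte[] mailboxSignature)
    {
        byte[] data = Encoding.UTF8.GetBytes(message);
        return domainKey.VerifyData(data, domainSignature,
                                    HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1)
            && mailboxKey.VerifyData(data, mailboxSignature,
                                     HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    public static void Main()
    {
        using (RSA domainKey = RSA.Create())
        using (RSA mailboxKey = RSA.Create())
        {
            string message = "From: alice@example.com\r\n\r\nHello";
            byte[] domainSignature = Sign(domainKey, message);
            byte[] mailboxSignature = Sign(mailboxKey, message);

            // A real recipient would only ever hold the public halves of these keys.
            bool ok = Verify(domainKey, mailboxKey, message, domainSignature, mailboxSignature);
            Console.WriteLine(ok ? "OK" : "rejected");
        }
    }
}
```

Because the mailbox key must stay the same for the lifetime of the mailbox, a recipient could keep the fetched public keys around and skip the lookup for later messages from the same address.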

Note that when I say “certificate,” I’m not suggesting something like what VeriSign provides – self-signed certificates would be fine. All we are doing with this step is ensuring that the person sending the email is able to receive mail on the domain specified. This system is actually a lot like the Sender-ID system that Microsoft invented. However, I believe that the problem with Sender-ID was that it was implemented on top of SMTP, meaning that it was too easy to ignore. And as long as people were ignoring it, it could not be used as a 100% effective indicator of spam. With Validating Email, if you fail the validation test, the email can be immediately dropped as spam – no further testing is required.

One problem with this system is that it doesn’t quite have the flexibility of regular SMTP. For example, if you look at the headers of any email message, you’ll see lots of “Received:” headers, which indicate that the message went through multiple “hops” to get to you. With Validating Email, there are no “hops” – just one message sent from sender to receiver.

However, I don’t think that’s a huge problem. Validating Email is only meant to be used between “untrusted” hosts. So if you’re simply routing mail through the servers on example.com, you don’t need to use Validating Email. You only need to use it on the hop from example.com to hotmail.com (or whatever). Once it’s in hotmail.com, you can again use whatever protocol you like between the hosts there.

You also don’t need to use Validating Email between the client (e.g. Outlook) and the server – regular SMTP (with passwords, SSL, etc) or a webmail interface would still suffice. This is only a server-server protocol.

Anyway, that should give a basic idea of the protocol. On its own it is not going to stop all spam, but on top of it we can build further “defences” that will – but I’ll get into that another day.

Requirements for “solution” to spam – Email

Posted 19 Mar 2007 by Dean Harding

The “solution” to the problem of spam has a number of requirements on it. Here I’ll list my basic requirements, numbered for easy reference, along with a brief description of why I think the requirement is important. So, in no particular order:

1. Compatible with existing email clients. This is a no-brainer: nobody will use the new protocol if they have to upgrade all their email clients. Notice that I’m only saying email clients – obviously we’re going to have to do something with our servers, but as long as the clients don’t need to change, I think we’ll be OK.
2. Email remains “free.” People like the fact that they can send an email anywhere in the world for free. There have been suggestions in the past that charging for email would stop spam. While it might actually work, it has problems associated with it. If you’re not charging actual money (which would be silly anyway: where does the money go?) you may charge “time” – i.e. force the sender to solve some puzzle that takes a few seconds. It’d slow down the spammer who sends a million messages (though probably not enough that they’d care – what spammer would care if their run took 1 hour or 48 hours?); but what about legitimate bulk-senders (such as mailing lists)? I guess you could work out exceptions to the rule, but the exceptions would be managed on the client and you can’t expect everyone who subscribes to a mailing list to exclude that server from their “solve-the-puzzle” requirement.
3. Email should remain anonymous. This one is tricky, but when I say “anonymous” what I mean is that you shouldn’t have to provide a real name, address, telephone number, etc. I believe that identity is important – that is, being able to tell one sender from another – but actually linking that identity to a real person is not so important.
4. Bulk email should still be possible. As I mentioned in point #2, there are legitimate reasons for sending bulk email. Mailing lists are one. “Legitimate” marketing material is another (for example, I drive a Peugeot, and they send me an email every month with various bits of Peugeot news – I want to keep getting those).
5. Unsolicited email should still be possible. That is, I should be able to click on the email address of a web page’s webmaster and send them an email. I wrote an article for GameDev.net back in 2003, and to this day I still get questions from people reading it (not so much anymore, maybe one every couple of months or so). I don’t want to lose that ability.
6. Unsolicited bulk email should not be possible. That is what spam is.

Implementing these requirements is no small order, so let’s see how we go :)

“Guidelines”?

Posted 16 Mar 2007 by Dean Harding

I saw this list of “Basic C# coding guidelines” and I started off just replying to it in a comment, but there are so many that I disagree with, that I’m going to do it in a whole separate post.

I’ll answer all the ones I disagree with. If I didn’t include one (and there are only 2 that I don’t include!) then it means I agree with it. So let’s go!

Please do NOT use + sign to concatenate strings, because it creates three instances of string (try doing that in a long or infinite loop, your program will die with OutOfMemory exception in no time), Rather use string.Concat it keeps one instance (and hey no OOM). If concatenating many strings in a loop etc, then use StringBuilder. Alternative to string.Concat is string.Format (which uses StringBuilder inside)
This completely unqualified statement is a bit misleading, first of all. If you’re just going to do a:


    string a = "hello " + "world";

Then string.Concat will not have any advantage, and also suffers from the disadvantage of being harder to read. If you’re concatenating more than one string, then perhaps string.Concat would be better (but usually string.Format is better in that particular case).

In any case, a blanket “do not use + operator to concatenate strings” statement doesn’t really tell us much.
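
As a rough illustration of where the distinction actually matters, the sketch below (the class and method names are made up for the example) uses + for a fixed handful of pieces, which the C# compiler turns into a single string.Concat call anyway, and StringBuilder for a loop where the number of pieces is not known up front.

```csharp
using System;
using System.Text;

// Hypothetical example type, for illustration only.
public static class ConcatDemo
{
    // A fixed, small number of pieces: + is readable and compiles
    // down to one string.Concat call.
    public static string Greeting(string name)
    {
        return "hello " + name + "!";
    }

    // Many pieces in a loop: += would allocate a new string on every
    // iteration, so a StringBuilder is the better fit here.
    public static string Repeat(string piece, int count)
    {
        StringBuilder builder = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
        {
            builder.Append(piece);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Greeting("world")); // prints "hello world!"
        Console.WriteLine(Repeat("ab", 3));   // prints "ababab"
    }
}
```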

If using Generic types, then refrain from using foreach loop on the collection. Rather use ForEach method to loop through via an anonymous method predicate (much faster because doesn’t create the Iterator).
The ForEach method, while it doesn’t allocate an iterator, obviously does allocate a delegate. Also, calling a delegate for each object in the collection, as opposed to advancing the iterator, is probably slower anyway (given that advancing the iterator – for most collection classes – would be inlined).

Again, this also comes back to readability. And that’s the whole purpose of foreach – it’s 10 times more readable to use a foreach loop than to pass a delegate (even an anonymous one) to the ForEach method. I mean, come on – if allocating the iterator is any more than an insignificant dot on your performance radar, I’d be very surprised.
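
For comparison, here are the two forms side by side in a small sketch (the names are invented for the example). Both compute the same sum; the second passes an anonymous method to List<T>.ForEach, which is the style the guideline recommends.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example type, for illustration only.
public static class LoopDemo
{
    // Plain foreach: the loop body reads like any other block of code.
    public static int SumWithForeach(List<int> values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    // List<T>.ForEach: no enumerator, but a delegate is allocated and
    // invoked once per element, and "total" becomes a captured variable.
    public static int SumWithForEachMethod(List<int> values)
    {
        int total = 0;
        values.ForEach(delegate(int value) { total += value; });
        return total;
    }

    public static void Main()
    {
        List<int> values = new List<int>(new int[] { 1, 2, 3, 4 });
        Console.WriteLine(SumWithForeach(values));       // prints 10
        Console.WriteLine(SumWithForEachMethod(values)); // prints 10
    }
}
```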

Nullify unused objects (doesn’t collect, but marks for collection and ceases from getting promoted into next generation)
Again, this is a blanket statement that if used in all contexts can result in worse performance. While it is true that the jitter cannot determine if member variables will be used again in the scope of a class, the jitter does an excellent job of determining when local variables are no longer used, even if their value is not set to null.

In fact, setting a local variable to null will artificially increase its lifetime, since the variable is still being “used” when you set its value to null.

In certain, limited scenarios setting a member variable to null may cause it to be collected earlier. However, it again comes down to readability vs. impact on performance. Because garbage collection is unpredictable, you don’t know whether the GC will kick in anyway and an extra “var = null” just makes things worse.

Besides, if you come in later to modify the code, and you want to access the variable later on in the function, you’d better remember that you’ve already set it to null earlier on!

IF conditions having just one item in if and else, should be used as ternary operator (? Sign)
This is a total syntactic-sugar change. There is no difference in the generated code between an if...else block and a ?: expression. The only difference is that ?: is an expression that yields a value (an r-value), so it might save you an extra assignment instruction.

Personally, I believe the exact opposite is a better approach – replace instances of ?: with the corresponding if...else block. It increases readability no end – especially for complex conditions.
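
A small sketch of the two equivalent forms, with invented names, just to show what is being traded: the ?: version is shorter, while the if...else version leaves room for the condition or the branches to grow.

```csharp
using System;

// Hypothetical example type, for illustration only.
public static class ConditionalDemo
{
    // The form the guideline asks for.
    public static string ParityWithTernary(int number)
    {
        return number % 2 == 0 ? "even" : "odd";
    }

    // The same logic spelled out as if...else.
    public static string ParityWithIfElse(int number)
    {
        string parity;
        if (number % 2 == 0)
        {
            parity = "even";
        }
        else
        {
            parity = "odd";
        }
        return parity;
    }

    public static void Main()
    {
        Console.WriteLine(ParityWithTernary(4)); // prints "even"
        Console.WriteLine(ParityWithIfElse(7));  // prints "odd"
    }
}
```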

Use ‘as’ operator instead of direct typecast using parenthesis with the exception of overloaded explicit cast operator, it saves from NullReferenceException and InvalidCastException
An explicit cast will not generate a NullReferenceException. However, there are certainly times when an InvalidCastException is desirable. This is another “blanket” statement that is simply not true in significant numbers of circumstances. Generally, the as operator is best used as a replacement for is+cast, i.e. replace:


if (foo is MyClass)
{
    MyClass myclass = (MyClass) foo;
    // do stuff
}

With:


MyClass myclass = foo as MyClass;
if (myclass != null)
{
    // do stuff
}

This is better because there’s really only one typecast going on. But again, if this is affecting your actual performance (outside a tight loop, anyway) I’d be surprised.
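
To show the difference in behaviour rather than performance, here is a short sketch (the names are made up for the example) using string as the target type. The as operator yields null on a type mismatch, while the explicit cast throws InvalidCastException; neither throws on a null input.

```csharp
using System;

// Hypothetical example type, for illustration only.
public static class CastDemo
{
    // "as" followed by a null check on the converted variable.
    public static string DescribeWithAs(object foo)
    {
        string text = foo as string; // null if foo is null or not a string
        if (text != null)
        {
            return "string of length " + text.Length;
        }
        return "not a string";
    }

    // Explicit cast: a mismatch raises InvalidCastException, a null passes through.
    public static string DescribeWithCast(object foo)
    {
        try
        {
            string text = (string)foo;
            return text == null ? "null" : "string of length " + text.Length;
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeWithAs("abc"));  // prints "string of length 3"
        Console.WriteLine(DescribeWithAs(42));     // prints "not a string"
        Console.WriteLine(DescribeWithCast(null)); // prints "null"
        Console.WriteLine(DescribeWithCast(42));   // prints "InvalidCastException"
    }
}
```

When a wrong type would be a bug in the caller, the exception from the explicit cast is arguably the more useful outcome, since it fails at the point of the mistake.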

Refrain from XmlDocument usage for navigational purpose, please either use XmlTextReader for sequential access or XPathDocument for XPath based data retrieval
This was true in early betas of version 2.0 – XmlDocument was not modified much for performance. Much work was done later on to increase its performance, though, and today it is quite acceptable to use XmlDocument where appropriate (i.e. don’t use it on 100MB XML documents, but that should be obvious).

For server side XSL transformation, Use XslCompiledTransform instead of XslTransform (Please check http://blogs.msdn.com/antosha/archive/2006/07/24/677560.aspx).
This is another blanket statement that really should come with a number of disclaimers. First of all, using XslCompiledTransform generates a .NET assembly and loads it into your process. If you’re going to be doing thousands of different XSL transforms, then use XslTransform (which may be quite likely in a server-side process). Also, because it’s compiled, it has significant first-use overhead (to compile the transform). If you’re only going to use the XSL file once or twice, use XslTransform.

I don’t know much about doing XSL in a browser, so I won’t comment on the second half of that statement.
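
Where XslCompiledTransform does pay off is when the same stylesheet is applied many times, because the compile cost is paid once. The sketch below assumes exactly that situation; the stylesheet, class and method names are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical example type, for illustration only.
public static class XslDemo
{
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@name'/>!</xsl:template>" +
        "</xsl:stylesheet>";

    // Compiled once, on first use of the class, then shared by every call to Apply.
    private static readonly XslCompiledTransform Compiled = LoadOnce();

    private static XslCompiledTransform LoadOnce()
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(reader); // the expensive step
        }
        return transform;
    }

    // Reuses the already-compiled transform for each input document.
    public static string Apply(string xml)
    {
        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            Compiled.Transform(input, null, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Apply("<greeting name='Dean'/>"));
        Console.WriteLine(Apply("<greeting name='world'/>"));
    }
}
```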

Always join your threads in a web page, if used (otherwise the page will be rendered and workers will continue operating at the server cost, and ofcourse the results will be unpredictable)
I would qualify this with “don’t use threads in a web page.” Certainly not in any way that would require that you join with them before ending the request!

Always do NULL checking before operating on an object, NullReferenceException is widely occurring exception in most applications and only a preventive programming can help us get the real error
This is rather nonsensical. If you do null-checking before using a parameter, what are you going to do? All you can really do, if the null is unexpected, is throw a NullReferenceException anyway! This should probably also be qualified with an “in a public interface method.” There’s no point (and plenty of harm) in doing null-checks in internal and private methods.
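
A minimal sketch of that qualification, with invented names: the check lives only at the public boundary, and the private helper trusts its caller. The sketch throws ArgumentNullException, which is the exception type the .NET libraries themselves use for null arguments; the caller still gets an exception either way, just one that names the offending parameter.

```csharp
using System;

// Hypothetical example type, for illustration only.
public class RecipientCounter
{
    // Public entry point: validate once, here.
    public int Count(string toHeader)
    {
        if (toHeader == null)
        {
            throw new ArgumentNullException("toHeader");
        }
        return CountCore(toHeader);
    }

    // Private helper: no null check, the public method has already done it.
    private static int CountCore(string toHeader)
    {
        return toHeader.Split(',').Length;
    }

    public static void Main()
    {
        RecipientCounter counter = new RecipientCounter();
        Console.WriteLine(counter.Count("a@example.com,b@example.com")); // prints 2

        try
        {
            counter.Count(null);
        }
        catch (ArgumentNullException e)
        {
            Console.WriteLine(e.ParamName); // prints "toHeader"
        }
    }
}
```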

Conclusion

The problem I had with this list, and with most “best practices” lists in general, is that it’s filled with blanket statements that should really come with plenty of qualifications. After all, if the + operator was so bad on string objects, it wouldn’t have been added in the first place. If XslCompiledTransform was so much better than XslTransform, then XslTransform would have been marked “deprecated.” And so on.

I especially have problems with “performance” guidelines, because there are no hard-and-fast rules about getting code to perform well. If replacing all instances of foreach over arrays with for-and-an-index would have any effect on the performance of your application as a whole, then you couldn’t be doing anything of worth inside the loop!

Are media companies really this dumb?

Posted 15 Mar 2007 by Dean Harding

Apparently, Google is in hot water over copyrighted material that is available on YouTube. I am not completely shocked by this, but some of the things the media companies are saying are really dumb. For example:

Viacom, which owns popular programs like "The Daily Show With Jon Stewart" and "South Park," said that clips of its programming had been viewed "an astounding 1.5 billion times" on YouTube.

So, 1.5 billion hits and what do Viacom want to do? They want to shut it down! Um, am I the only one who thinks that 1.5 billion hits would have brought in significant advertising revenue?

The whole reason Google bought YouTube is so that they can advertise on it, but instead of trying to capitalize on the potential benefit, Viacom are trying to kill the site altogether.

Personally, I can’t wait for the current crop of media companies to simply die out; it seems inevitable – after all, they just don’t seem to understand any technology created after 1980. And what they don’t understand, they try to kill! At least with them out of the way, someone can finally produce an HD-streaming movie service and I can watch any movie I want, on-demand. Man, that’d be sweet!

Side-by-Side assemblies

Posted 06 Mar 2007 by Dean Harding

I was getting the following error recently:


Error: The Side-by-Side configuration information in "<mydll>"
   contains errors. This application has failed to start because the application
   configuration is incorrect. Reinstalling the application may fix this
   problem (14001).

And, while I had downloaded vcredist_x64.exe (this was an AMD64 box) from microsoft.com somewhere, it was still giving the error. The solution? Run the version of vcredist_x64.exe from the Visual Studio directory:

%ProgramFiles%\Microsoft Visual Studio 8\SDK\v2.0\BootStrapper\Packages

This is very important! It seems that the CRT has been updated in VC++ SP1, and you’ve got to make sure you install exactly the right version!