Unicode support in MySQL is ... 👎

Posted by

For the last few days, I've been getting some strange error reports from the War Worlds server. Messages like this:

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\xB8. ...' for column 'message' at row 1
   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4120)
   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2815)
   at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2458)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2375)
   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359)
   at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:203)
   at au.com.codeka.warworlds.server.data.SqlStmt.update(SqlStmt.java:117)
   at au.com.codeka.warworlds.server.ctrl.ChatController.postMessage(ChatController.java:120)
   . . .

Now, that string which MySQL complains is "incorrect" is actually the Unicode codepoint U+1F638 GRINNING CAT FACE WITH SMILING EYES, aka 😸 -- a perfectly valid Emoji character. Why was MySQL rejecting it? All my columns are defined to accept UTF-8, so there should not be a problem, right?

When is UTF-8 not UTF-8?

When it's used in MySQL, apparently.

For reasons that completely escape me, MySQL 5.x limits UTF-8 strings to U+FFFF and smaller. That is, the "BMP". Why they call this encoding "UTF-8" is beyond me, it most definitely is not UTF-8.

The trick, apparently, is to use a slightly different encoding which MySQL calls "utf8mb4" which supports up to 4-byte UTF-8 characters.

So the "fix" was simple, just run:

ALTER TABLE chat_messages
   MODIFY message TEXT CHARACTER SET utf8mb4
                       COLLATE utf8mb4_unicode_ci NOT NULL;

And so on, on basically every column in the database which could possibly include characters outside the BMP. But that's not enough! You also need to tell the server to use "utf8mb4" internally as well, by including the following line in your my.cnf:

[mysqld] ​character-set-server = utf8mb4

Now presumably there is some drawback from doing this, otherwise "utf8mb4" would be the default (right?) but I'll be damned if I can figure out what the drawback is. I guess will just moniter things and see where it takes us. But as of now, War Worlds support Emoji emoticons in chat messages, yay!


If you're just seeing squares for the emoji characters in this post, you'll need a font that supports the various unicode emoji blocks. I've used the Symbola font (which you can get here) with good results.

Why you should use a reputable DNS registrar

Posted by

I've had a bit of a crazy weekend this weekend. It all started while I was checking out some of my websites and I discovered their DNS was not resolving! That's always scary, and after a few minutes I realised that the DNS servers were unreachable from some (but not all) networks. For example, this was a traceroute to one of the DNS servers from my home computer:

$ traceroute
traceroute to (, 30 hops max, 60 byte packets
 1  home.gateway.home.gateway (  0.745 ms  1.041 ms  1.334 ms
 2  lns20.syd7.on.ii.net (  18.276 ms  19.705 ms  20.402 ms
 3  te3-1-120.cor2.syd6.on.ii.net (  22.902 ms  23.762 ms  24.736 ms
 4  ae5.br1.syd7.on.ii.net (  138.154 ms  139.785 ms ae0.cr1.syd4.on.ii.net (  29.454 ms
 5  ae5.br1.syd4.on.ii.net (  30.446 ms ae0.br1.syd4.on.ii.net (  36.895 ms ae5.br1.syd4.on.ii.net (  37.306 ms
 6  te0-2-0.bdr1.hkg2.on.ii.net (  145.732 ms  139.403 ms  140.845 ms
 7  hostvirtual-RGE.hkix.net (  159.664 ms  132.746 ms  133.402 ms
 8  vhk.vr.org (  134.369 ms  135.393 ms  136.574 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
. . .

It was getting stuck at vhk.vr.org, which seems to be some transit provider or other. But from other networks it was OK, here's the traceroute from my Google Compute Engine instance:

$ traceroute
traceroute to (, 30 hops max, 60 byte packets
 1 (  1.168 ms (  1.429 ms (  1.105 ms
 2 (  1.383 ms  1.072 ms (  1.335 ms
 3 (  1.477 ms (  1.367 ms (  1.287 ms
 4 (  1.656 ms (  1.434 ms (  1.914 ms
 5 (  1.910 ms  1.886 ms  1.868 ms
 6 (  1.841 ms  1.257 ms (  1.470 ms
 7 (  1.196 ms (  1.289 ms (  1.496 ms
 8 (  1.643 ms (  1.457 ms (  1.586 ms
 9 (  11.936 ms (  11.761 ms (  14.151 ms
10 (  15.746 ms (  11.702 ms (  14.216 ms
11 (  14.355 ms (  14.187 ms  13.759 ms
12  * * *
13  db-transit.Gi9-12.br01.rst01.pccwbtn.net (  39.572 ms  37.646 ms  39.849 ms
14  viad-vc.as36236.net (  39.499 ms  37.372 ms  39.243 ms
15  pdns1.terrificdns.com (  40.454 ms  39.940 ms  41.335 ms

So that seems kind of scary! I quickly composed an email to the support address with what I'd seen. An hour later and no response. At this point I was worried, because I had no idea how many people were unable to contact my websites. I tried logging in to the management console, but that was also a no-go: the DNS for the management console was hosted on the same DNS servers that were not responding! I managed to log in by hard-coding the IP address (which I got from my GCE server that was able to connect to the DNS servers) in my /etc/hosts file.

Just trying to get something up & running again, I exported all my DNS records and imported them into a friend's DNS server. Then I was able to change the nameservers configured in the management console to point to my friend's DNS server. For now, we were back up again. But a few hours later and still no response from my support request.

Who are NameTerrific?

I first heard about NameTerrific in this post on HackerNews. The website was well-done, the interface was easy to use. So I decided to give it a go with some of my more throwaway domains. I had a minor issue early on, and the support was quite good and over the next year or so, I started to move a couple more domains over.

Then, I stopped thinking about it. It worked well for the next year or so, and DNS tends to be one of those things you don't really think about. Until it stops working.

It was probably my fault. I didn't put much effort into researching NameTerrific's founder, Ryan Zhou. He seems to be a serial entrepreneur who dropped out of school to persue his dream. That's all well and good, but when you're hosting a service for people, it doesn't do much for your reputation as a reliable investment when you abandon your previous business for who-knows-what-reason.

What do I think happened?

 I think he discovered his website had bugs and people's domains were being transferred to the wrong people. Why do I suspect that? Because it's happening to me right now. Check this out. I go into my control panel for one of my domains, war-worlds.com:

I click on "Transfer Away", it prompts me to confirm, I click OK and I receive the following email:

It's an authorization code for a domain I don't even own. What's worse, if I do a whois on mitchortenburg.com, I find myself listed in the contact information! I seem to be the owner of a domain I never purchased (and, to be honest, don't really want) because of some bizarre mixup with the management website.

Even worse still: I have no way to generate an auth code for war-worlds.com (the domain I do care about) and I'm terrified that some other customer of NameTerrific's is somehow able to do what I've managed to do and gain ownership of my domain!

I'm not the only one having problems. Their facebook page is full of people who have also apparently realized they've been abandoned:

This is not how you run a busines. If you find bugs in your management console, you don't abandon your business and leave your customer in the lurch.

Now it seems Ryan has at least learnt from this failure to not let people post on his business' facebook wall. His "CoinJar" business has disabbled wall posts entirely.

What are my options?

So I currently have a ticket open with eNom (NameTerrific were an eNom reseller) and I hope I can get control of my domain back again. And if Mitch Ortenburg is reading this, I'm also quite happy to return your domain to you, if I only knew how to contact you...

And of course, I have already transferred every domain I can out of NameTerrific and into a reputable registrar. Lesson learnt!

I don't think Samsung Mob!lers threatened to strand bloggers at Berlin

Posted by

So I was reading this post on Hacker News wherein some bloggers were apparently threatened with cancelled flights home after they refused to man Samsung booths at a trade show.

Now, to me, this seemed rather fishy. Why would Samsung do something so obviously malicious? I had a look at the Samsung Mob!lers website, and it's pretty clear how the site works. You write articles on topics Samsung wants you to write about (there's currently one up there now asking bloggers "to showcase the extent of its [S-Voice] functionality in a fun, adventurous and creative way.") In return for writing blog posts on behalf of Samsung, you earn "points" which you can then redeem for "rewards".

Here's the page that lists a trip to the IFA consumer show as a "reward" in the Mob!lers program. So let's be clear here: the trip to the IFA consumer show was a reward for writing blog posts praising Samsung products. This is not a case of Samsung inviting bloggers to trade shows to provide coverage, it's a prize.

The TNW article goes to great pains to point out that sending bloggers to trade shows is normal fare for large companies. But that's a red herring -- these are not bloggers invited to the event to provide coverage for Samsung. They were invited to the event as a reward for writing blog posts about Samsung products. The article also goes to pains to point out that programs like Samsung's Mob!lers are common. But it doesn't go into any details about what the program entails (except to say that the Mob!lers program expects you to become a "shill" for the company).

So here's what I think happened: I think these bloggers earned their points on Samsung's Mob!lers website. In return, Samsung rewarded them with a free trip to Berlin to the IFA trade show. My guess is that Samsung made it pretty clear they would be manning Samsung booths at the show (after all, that was the "warning signs" the article talks about: fitting for uniforms and such). The bloggers probably told Samsung that they wanted to review other products while they were there and Samsung said that's fine.

Now we get to the "smoking gun" email that TNW posts. In light of the above, all the email says is that they refused to go the Samsung event and just stayed in their hotel, and now Samsung wants to send them home. It doesn't say they weren't told they would have to man the booths. It doesn't say that flights had been cancelled. Just that they want to send them home early.

And if they're not doing what they were asked to do, then why wouldn't you want to send them home early? All we have on the "cancel flights" and "stranding" is the word of the bloggers who clearly weren't interested in doing their part.

Debugging an issue with high CPU load

Posted by

For the last few days, one of the websites I work on was having a strange issue. Every now and then, the whole server would grind to a halt and stop serving traffic. This would last for a couple of minutes, then suddenly everything would "magically" pick up and we'd be back to normal. For a while...

Simple website monitoring

The first part to figuring out what was going on was coming up with a way to be notified when the issue was occuring. It's hard to debug these kind of issues after the fact. So I wanted to set up something that would ping the server and email me whenever it was taking too long to respond. I knocked up the following script file (I might've inserted backslashes at invalid locations, sorry, that's just to ease reading on the blog):


TIME=$( { /usr/bin/time -f %e wget -q -O /dev/null \
          http://www.example.com/; } 2>&1 )
TOOSLOW=$(awk "BEGIN{ print ($TIME>2.5) }")

if [ $TOOSLOW -eq 1 ]; then
  echo "The time for this request, $TIME, was greater than 2.5 seconds!" \
    | mail -s "Server ping ($TIME sec)" "me@me.com"

I set this up as a cron job on my media centre PC (high-tech, I know) to run every 5 minutes. It would email me whenever the website took longer than 2.5 seconds to respond (a "normal" response time is < 0.5 seconds, so I figured 5 times longer was enough).

It didn't take long for the emails to start coming through. Then it was a matter of jumping on the server and trying to figure out what the problem was.

First steps

Once the problem was happening, there's a couple of "obvious" first things to try out. The first thing I always do is run top and see what's happening:

top - 08:51:03 up 73 days,  7:45,  1 user,  load average: 69.00, 58.31, 46.89
Tasks: 316 total,   2 running, 314 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.0%us,  1.3%sy,  0.0%ni, 15.2%id, 72.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   8299364k total,  7998520k used,   300844k free,    15480k buffers
Swap: 16779884k total,     4788k used, 16775096k free,  6547860k cached

Check out that load! 69.00 in the last five minutes, that's massive! Also of concern is 75% next to "wa", which means 75% of the CPU time was spend in an uninterruptable wait. There's not many things that run in uninterruptable wait (usually kernal threads), and it's usually some I/O sort of thing. So lets see what iotop (which is like top but for I/O) reports:

Total DISK READ: 77.37 K/s | Total DISK WRITE: 15.81 M/s
25647 be/4 apache     73.50 K/s    0.00 B/s  0.00 % 99.99 % httpd
24387 be/4 root        0.00 B/s    0.00 B/s 99.99 % 99.99 % [pdflush]
23813 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [pdflush]
25094 be/4 root        0.00 B/s    0.00 B/s 96.72 % 99.99 % [pdflush]
25093 be/4 root        0.00 B/s    0.00 B/s 99.99 % 99.99 % [pdflush]
25095 be/4 root        0.00 B/s    0.00 B/s 99.99 % 99.99 % [pdflush]
25091 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [pdflush]
24389 be/4 root        0.00 B/s    0.00 B/s 99.99 % 99.99 % [pdflush]
24563 be/4 root        0.00 B/s    0.00 B/s 99.99 % 99.99 % [pdflush]
24390 be/4 apache      0.00 B/s   23.21 K/s 96.71 % 99.99 % httpd
24148 be/4 apache      0.00 B/s    0.00 B/s 96.71 % 99.99 % httpd
24699 be/4 apache      0.00 B/s    0.00 B/s 99.99 % 99.99 % httpd
23973 be/4 apache      0.00 B/s    0.00 B/s 99.99 % 99.99 % httpd
24270 be/4 apache      0.00 B/s    0.00 B/s 99.99 % 99.99 % httpd
24298 be/4 apache      0.00 B/s 1918.82 K/s 96.71 % 99.02 % httpd
  628 be/3 root        0.00 B/s    0.00 B/s  0.00 % 97.51 % [kjournald]
25092 be/4 root        0.00 B/s    0.00 B/s  0.00 % 96.72 % [pdflush]
24258 be/4 root        0.00 B/s    0.00 B/s 99.99 % 96.71 % [pdflush]
23814 be/4 root        0.00 B/s    0.00 B/s  0.00 % 96.71 % [pdflush]
24388 be/4 root        0.00 B/s    0.00 B/s 99.02 % 96.71 % [pdflush]
25545 be/4 apache      0.00 B/s    0.00 B/s  0.19 % 92.73 % httpd
25274 be/4 apache      0.00 B/s    0.00 B/s  0.00 % 92.38 % httpd
24801 be/4 apache      0.00 B/s    5.84 M/s 99.99 % 91.63 % httpd
25281 be/4 apache      0.00 B/s    5.75 M/s  0.00 % 91.33 % httpd
26115 be/4 apache      0.00 B/s    0.00 B/s  9.60 % 19.26 % httpd
25561 be/4 apache      0.00 B/s    3.87 K/s  0.00 %  9.66 % httpd
26035 be/4 apache      0.00 B/s    0.00 B/s  0.00 %  9.63 % httpd

So all those pdflush commands looked suspicious to me. pdflush is a kernel thread that's responsible for writing out dirty pages of memory to disk in order to free up memory.

It was at this point that I was suspecting some kind of hardware failure. Checking the output of sar -d 5 0 I could see this:

Linux 2.6.18-308.1.1.el5PAE (XXX)  23/08/12

08:55:45          DEV       tps  ...     await     svctm     %util
08:55:50       dev8-0    877.25  ...    179.28      1.14     99.84
08:55:50       dev8-1      0.00  ...      0.00      0.00      0.00
08:55:50       dev8-2      0.00  ...      0.00      0.00      0.00
08:55:50       dev8-3    877.25  ...    179.28      1.14     99.84

Check out that utilization column! 99.84% is really bad (more than 70% or so is when you'd start to have problems)

I was at a bit of a loss, because I'm not too familar with the hardware that's running this server (it's not mine) but I knew the disks were in hardware RAID and smartctl wasn't being helpful at all, so I posted a question on Server Fault. At this point, I was thinking it was a hardware problem, but I wasn't sure where to go from there.

My first hint

My first hint was a comment by Mark Wagner:

Apache PIDs 24801 and 25281 are doing by far the most I/O: 5.84 M/s and 5.75 M/s, respectively. I use iotop -o to exclude processes not doing I/O

What were those two processes doing? I opened one of them up with strace:

# strace -p 24801
[sudo] password for dean: 
Process 24801 attached - interrupt to quit
write(26, "F\0\0\1\215\242\2\0\0\0\0@\10\0\0\0\0\0\0\0"..., 4194304) = 4194304
write(26, "F\0\0\1\215\242\2\0\0\0\0@\10\0\0\0\0\0\0\0"..., 4194304) = 4194304
write(26, "F\0\0\1\215\242\2\0\0\0\0@\10\0\0\0\0\0\0\0"..., 4194304) = 4194304
write(26, "F\0\0\1\215\242\2\0\0\0\0@\10\0\0\0\0\0\0\0"..., 4194304) = 4194304
write(26, "F\0\0\1\215\242\2\0\0\0\0@\10\0\0\0\0\0\0\0"..., 4194304) = 4194304

It was just spewing those lines out, writing huge amount of data to file number "26". But what is file number 26? For that, we use the handy-dandy lsof:

# lsof -p 24801
httpd   24801 apache  cwd    DIR    8,3     4096         2 /
httpd   24801 apache  rtd    DIR    8,3     4096         2 /
httpd   24801 apache  txt    REG    8,3   319380   6339228 /usr/sbin/httpd
. . .
httpd   24801 apache   26w  0000    8,3        0  41713666 /tmp/magick-XXgvRhzG

So at the end of the output from lsof, we see under the "FD" (for "file descriptor") column, 26w (the "w" means it's open for writing) and the file is actually /tmp/magick-XXgvRhzG.

A quick ls on the /tmp directory, and I'm shocked:

-rw------- 1 apache apache 1854881318400 Aug 20 04:26 /tmp/magick-XXrQahSe
-rw------- 1 apache apache 1854881318400 Aug 20 04:26 /tmp/magick-XXTaXatz
-rw------- 1 apache apache 1854881318400 Aug 20 04:26 /tmp/magick-XXtf25pe

These files are 1.6 terrabytes! Luckily(?) they're sparse files so they don't actually take up that much physical space (the disks aren't even that big), but that's definitely not good.

The last piece of the puzzle was figuring out what images were being worked on to use those enormous temporary files. I was thinking maybe there was a corrupt .png file or something that was causing ImageMagick to freak out, but firing up Apache's wonderful mod_status and I could immediately see that the problem was my own dumb self: the URL it was requesting was:


Now, if you don't spot it right away, it's the "size" parameter: size:320200, it's supposed to be size:320,200 - what my code does if you leave off the "height" is it assumes you want to display the image at the specified width, but with the same aspect ratio. So it was trying to generate an image that was 320200x200125, rather than 320x200!

The Solution

The solution was, as is often the case, extremely simple once I'd figured out the problem. I just made sure the image resizer never resized an image to be larger than the original (our originals are generally always bigger than what's displayed on the website).

The only remaining question was where was this request coming from? The output of mod_status showed an IP address that belonged to Google, so it must've been the Googlebot crawling the site. But a quick search through the database showed no links to an image with the invalid size:320200 parameter.

At this stage, it's still an open mystery where that link was coming from. The image in question was from an article written in 2010, so it's not something current. In any case, where the link was coming from is of less concern than the fact that it no longer causes the whole server to grind to a halt whenever someone requests it.

So I'm happy to leave that one open for now.

App Engine task failed for no apparent reason (on the development server)

Posted by

So I ran into an interesting issue with App Engine task queues today. They would fail for no apparent reason, without even executing any of my code (but on the development server only). In the log, I would see the following:

WARNING  2012-05-07 11:56:25,644 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.100 seconds
WARNING  2012-05-07 11:56:25,746 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.200 seconds
WARNING  2012-05-07 11:56:25,947 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.400 seconds
WARNING  2012-05-07 11:56:25,644 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.100 seconds
WARNING  2012-05-07 11:56:25,746 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.200 seconds
WARNING  2012-05-07 11:56:25,947 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.400 seconds

I was a little confused, so I had a look in taskqueue_stub.py around line 1936. I could see basically all it was doing was connecting to my URL and issuing a GET just like I had configured it to. On a whim, I added an extra line of logging so that it now output the following:

WARNING  2012-05-07 11:56:25,643 taskqueue_stub.py:1828] Connecting to: (default host is
WARNING  2012-05-07 11:56:25,644 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.100 seconds
WARNING  2012-05-07 11:56:25,745 taskqueue_stub.py:1828] Connecting to: (default host is
WARNING  2012-05-07 11:56:25,746 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.200 seconds
WARNING  2012-05-07 11:56:25,946 taskqueue_stub.py:1828] Connecting to: (default host is
WARNING  2012-05-07 11:56:25,947 taskqueue_stub.py:1936] Task task1 failed to execute. This task will retry in 0.400 seconds

Here, the "default host" is what I specified in the command-line for the development server to listen on (all IPs, port 8271). However, is my machine's IP address -- but note there's no port specified. It would try to connect to port 80!

I couldn't figure out why until I looked at the code I was calling that was enqueueing the task to begin with. It's an Android app, and it uses HttpCore to manage the HTTP requests it makes to my App Engine app (I'm thinking of switching to the built-in URLConnection after all, but we'll see...) and the code has the following lines:

// This is paraphrasing slightly
String requestUrl = uri.getPath();
if (uri.getQuery() != null && uri.getQuery() != "") {
    requestUrl += "?"+uri.getQuery();

BasicHttpRequest request = new BasicHttpRequest(method, requestUrl)
request.addHeader("Host", uri.getHost());
// ...

You might see the problem already...

The problem is, we were setting the "Host" header to (no port number!) and the task library was then using the Host header we supplied to know which host to connect to in order to execute the task.

The solution is actually simple:

String host = uri.getHost();
if ((uri.getScheme().equals("http") && uri.getPort() != 80) ||
        (uri.getScheme().equals("https") && uri.getPort() != 443)) {
    host += ":"+uri.getPort();
request.addHeader("Host", host);

Basically, if we're using the default HTTP or HTTPS port, don't include it in the Host field. Otherwise, make sure the port number is included.

This would never be a problem if you were executing tasks from a browser (or using a more high-level library that auto-populated the Host header for me!) so it's understandable that I couldn't really find any other people having this problem. Maybe I should just switch to using URLConnection