So War Worlds has, for quite some time, been plagued with various memory issues. One of the problems is that Android in general is pretty stingy with the heap memory it gives to programs, but the other is that it's actually very easy to "leak" large chunks of memory by accidentally leaving references to Activity objects in static variables.
There's not much that can be done about the former, but the latter can be fairly easily debugged using a couple of tools provided by Android and Java.
Detecting leaked Activties
The first step is figuring out when Activties are actually leaked. In API Level 11 (Honeycomb), Android added the detectActivityLeaks() method to StrictMode.VmPolicy.Builder class. All this method does is cause Android to keep an internal count of the number of instances of each Activity class that exist on the heap, and let you know whenever it hits 2.
When you get a StrictMode violation, there's a few things you can do, you can have Android kill your process, you can log a message, or you can do both. To be honest, neither of those things is particularly helpful because when you've got a leak what you really want is a dump of the heap so that you can analyse it later.
Luckily for us, the StrictMode class writes a message to System.err just before it kills the process, so we can do this little hack to hook in just before that happens and get a HPROF dump just in time:
private static void enableStrictMode() {
try {
StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
.detectActivityLeaks().detectLeakedClosableObjects()
.detectLeakedRegistrationObjects().detectLeakedSqlLiteObjects()
.penaltyLog().penaltyDeath().build());
// Replace System.err with one that'll monitor for StrictMode
// killing us and perform a hprof heap dump just before it does.
System.setErr (new PrintStreamThatDumpsHprofWhenStrictModeKillsUs(
System.err));
} catch(Exception e) {
// ignore errors
}
}
private static class PrintStreamThatDumpsHprofWhenStrictModeKillsUs
extends PrintStream {
public PrintStreamThatDumpsHprofWhenStrictModeKillsUs(OutputStream outs) {
super (outs);
}
@Override
public synchronized void println(String str) {
super.println(str);
if (str.startsWith("StrictMode VmPolicy violation with POLICY_DEATH;")) {
// StrictMode is about to terminate us... do a heap dump!
try {
File dir = Environment.getExternalStorageDirectory();
File file = new File(dir, "strictmode-violation.hprof");
super.println("Dumping HPROF to: " + file);
Debug.dumpHprofData(file.getAbsolutePath());
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
Essentially, what we do is, we set up StrictMode to kill us, the replace System.err with a hack subclass of PrintSteam that checks for the message StrictMode prints and generates a heap dump right then and there.
It's ugly, but we only have to do it in debug mode anyway, so I think it's OK.
Analysing the dump
Next, we have to copy the .hprof file off the device (you could use adb pull for that, but I like ES File Explorer). One important step is converting the .hprof file from Dalvik format to normal Java format. There's a handy-dandy tool called hprof-conv in the Android SDK which does the conversion for you.
Once you've got a HPROF file a format that can be read by the standard Java tools, the next step is actually analysing it. I was using the Eclipse Memory Analyzer tool, but anything will do, really. The first thing we need to do is find out which objects are leaking. Luckily for us, StrictMode prints out which activity is leaking, so we can start there:
01-15 17:24:23.248: E/StrictMode(13867): class au.com.codeka.warworlds.game.EmpireActivity; instances=2; limit=1
01-15 17:24:23.248: E/StrictMode(13867): android.os.StrictMode$InstanceCountViolation: class au.com.codeka.warworlds.game.EmpireActivity; instances=2; limit=1
So it's detected two copies of the EmpireActivity class. We fire up the Memory Analysis tool in Eclipse, and go to "List objects ... with incoming references" and enter our offending class, "au.com.codeka.warworlds.game.EmpireActivity". And just as StrictMode said, there's our two instances:
Now, how do we determine where they're being held? Right-click one of them and select "Path to GC Roots ... with all references", which will bring up something like this:
If we go with the assumption that the leak happens in our code and not in the framework (usually a pretty good assumption) the only object holding a reference to this activity is our FleetList class. If we expand that out a few times, we can follow the references all the way back to the mSpriteGeneratedListeners member of StarImageManager.
What the StarImageManager class does is it generates (or loads from disk) the bitmaps used to display stars. It has a list of objects that are interested in knowing when the star images are updated (generating them can take a while). What was happening here is that the FleetListAdapter class was adding itself to the list of classes interested in receiving updates from StarImageManager, but never removing itself from that list.
By fixing that bug, I managed to squash one the biggest memory leaks in the game.
Further work
One thing this has highlighted to me is that my system of highly manual event pub/sub is probably not a great idea. Scattered all throughout the code are lots of these "lists of classes who want to subscribe to my event". Management of those lists is all manual, firing the events is manual, and it's actually been a bit of a source of problems in the past (e.g. with events firing on random threads and whatnot).
So the next big chunk of work will be replacing all of these with some kind of unified EventBus pattern. I quite like the implimentation of code.google.com/p/simpleeventbus, in particular I like how it uses WeakReferences to hold the event subscribers (which would make the above memory leak non-existent).