Thursday, June 4, 2009

MyISAM vs. InnoDB via Twitter

So I saw the following tweet come across TweetDeck (using the search functionality as a custom feed no less):

ok #mysql / #database geeks on twitter (twittabase tweeks?) .. which is better: myisam or innodb, and where and why.
from @narendranag

140 characters is nowhere near enough space to answer that question. I'm going to put my thoughts here from the perspective of someone who's had to deal with large databases (500+ GB) in both MySQL and PostgreSQL doing both OLTP and OLAP work. Here's probably the best summation:

MyISAM
- Fast

InnoDB
- Reliable

That's not to say that MyISAM can't be reliable or that InnoDB can't be fast but that's the best way to look at it. But you can't balance which table type you choose based on those two criteria. There are reasons you might not need full ACID compliance that would make you still want to use InnoDB over MyISAM. Hell, MEMORY is faster than MyISAM but you don't want to use it for all your tables. Often times, the right answer is somewhere in the middle.

I tend to default to using InnoDB across the board. It's always defined as my default table type in my.cnf. I do this because often times the developers are coding around the assumption that the database supports things like foreign keys, transactions and the like. Admittedly, this is often hidden behind the ORM they use such as Hibernate but it's still important.

But what about speed? Why is InnoDB "slower" than MyISAM. It's basically because it's doing "more" than MyISAM. It's managing key constraints, logging transactions and all the other things that make it ACID compliant. Much of that "slowness", however, can be mitigated by getting away from the default InnoDB configurations and profiling the system over time to size bufferpools and the like:

  • Don't use a single tablespace file for InnoDB (innodb_file_per_table). The global tablespace will still need to be used but it's much slower than if you were to use a tablespace per table. This also gets you the benefit of being able to recover disk space after dropping an InnoDB table. If InnoDB is using the global tablespace file, the ONLY way to recover that space from dropping an InnoDB table or whole schema of InnoDB tables is to fully backup ALL schemas, remove the ibdata/log files, restart the database and then reload from backup. Not pretty.
  • Use the smallest possible primary key where you can. At one company, we were using GUIDs for the primary key. InnoDB prefaces every index with a subset of the first bytes of the Primary Key for that record. I can't remember the exact number off-hand but it was only a subset. Considering the first X bytes it was using could potentially be the same across multiple records, it took more work than if we had used ints. Additionally, not only were our primary key indexes unneccesarilly large, every subsequent index was as well. This wasn't as much a big deal on columns with smaller datatypes but indexes on columns of datatypes like varchar and lob were pretty ugly
  • Consider using a larger log file size. This has a trade-off in recovery time though. Your call there.
  • Get fancy with disks. If you have multiple volumes of different raid types, you can not only tell InnoDB to put the global tablespace and log files on a different path but you can also "move" the database to a different volume as well. This involves creating the database, shutting mysql down, moving the database directory from the mysql data directory to a new location and then symlinking it. Until MySQL or InnoDB gets me the ability to define where I want a given tablespace, this is the next best thing.

One area where InnoDB is faster than MyISAM natively is in concurrent CRUD operations. That's because InnoDB uses row-level locking. I'm not as clear on the specifics of the locking as I am with say DB2 (DB2 will actually lock the row before and after the one you're modifying) but it's better on multiple concurrent operations than table level locking.

So when would you want to use MyISAM then? One area we really found that using MyISAM made sense was on non-relational tables within a schema that normally had InnoDB tables. In one case, we had a table that had two columns - an id and a blob. That table was MyISAM. Conceivably anywhere you are denormalizing the data quite a bit, it can make sense to use a MyISAM table especially if it's a frequently rebuilt table (like a summary table updated from a cron job). Of course we've also used MEMORY tables for that same purpose. Just be careful how you intermix these in the code (the aforementioned Java transactions for instance).

So here's my recommendation:
OLTP tables - InnoDB with a few caveats
OLAP tables - MyISAM with a few caveats

No comments: