Replication Vs DRBD

Published in

MySQL Replication vs DRBD

I personally think DRBD has its place but
there are far more cases when other techniques would work much better
for variety of reasons.

Some reasons you can have systems to run
out of sync and have conflicting updates applied to them. With DRBD you
have the choice of killing one of the nodes and re-syncing from another
one while with MySQL Replication you can use Maatkit to merge the
changes after all, also you can review binary logs to see which updates
were applied to different nodes.

Transaction log replay, yes. But
fsck? These days this amounts to running a journal replay. Takes under
a second in most circumstance

I would put it 10 seconds but it
does not matter. The transactional log reply is likely to take much
longer than that. This is very bad property of DRBD - besides the well
understood overhead of committing on both nodes instead of one you also
meat the tough choice - you've got to pick either long recovery time or
further degraded performance. In the large databases I run in
production relaying on MySQL Replication for HA I often have 15+
minutes reply of Innodb Transactional log which would be a huge bummer
with DRBD.

The side question about it is of course the fact you
have to be picky about storage engines you're using - DRBD does not
work with MyISAM (check required) so you need to have processes to
ensure your application does not uses this storage engine which may be
hard to guaranty in many environments when development has too much
autonomy.
I must note in this aspect however DRBD goes on par with
MySQL Statement Based Replication - it is also well too easy to use
MySQL features which break replication.
hot vs cold. If you're
using decent hardware and care about performance you use O_DIRECT with
Innodb which makes it to bypass file cache. If you have it DRBD slave
will be fully cold. But lets assume you're ready to pay for yet another
penalty DRBD introduce and do not use this option wast memory and CPU
cycles on double caching. Even in this case DRBD slave node can't be
called hot because write load often does not touches the same data as
read load. Here is simple example - assume you're inserting the data at
the same time as running reporting queries on last month. All last
month will be hot on the slave which is doing reads but only last few
hours will be hot on the standby box.
Running two instances on the
server allows to reduce hardware waste with DRBD, though not eliminate
because you get some disks which you can't really use for anything else
rather than HA. Two instances also complicate things - depending on
infrastructure it can be seen as almost no complication or quite
serious complication.
"Cannot do maintenance on cold standby database."
But you can do anything you want with a database that you run off a DRBD LVM snapshot. Works on a Secondary node too.
So when I would recommend to use DRBD with MySQL ?
First
it is good choice for organizations which are got use to SAN based high
availability solutions, Quite often these guys would have be familiar
with such HA concept and it would be very natural for them to use same
approach for MySQL as they use for PostgreSQL for example instead of
investing time to learn about MySQL Replication or just looking to keep
MySQL infrastructure to be as close to one for other databases in use
as possible.
Second - it is often inevitable choice when you can't
avoid loosing any transaction - period. Some people would rather stand
longer fallback time (as with DRBD) but would not like to have lost
transactions which may happen with async replication. Another similar
case is when you're looking to ensured consistency - MySQL Replication
can out of sync - and there is bunch of tips in documentation of how to
do it. With DRBD the chance of nodes running out of sync is minimal and
can be caused by software and hardware bugs rather than known
limitations.
It is worth to note DRBD also allows building very
nice mixed environments with MySQL - for example you can replicate
binary logs using DRBD so if master node fails you have not lost
transactions - when you can use such logs to do point in time recovery
or to catch up to last few not committed transactions on the slave. We
should spend some time implementing such script sometime which could be
helping to get best of both worlds.
MySQL 5 High Availability with DRBD 8 and Heartbeat 2
we're
going to go through the entire process of installing, configuring and
testing DRBD, Heartbeat and MySQL running in a 2 node cluster
environment. This will be a bare bones configuration, best practices
and security will take a back seat while we learn how all the different
pieces work together.

Create Test Servers
Configure eth0 with fixed IP, node0 system is 192.168.12.21 and node1 will be 192.168.12.22.

No firewall
SELinux disabled
Minimal package group selection.

Once
the OS is up and running, be sure to yum update. You'll have to enable
centosplus, and the contrib repos. These test systems are running
kernel 2.6.9-55.0.2.EL. I used heartbeat-2.0.7-1, and drbd-8.0.4 for
this article.

Note: At the time of this article, a DRBD v8 RPM
didn't exist in the repos, so we'll create our own. I used a different
CentOS 4.5 machine that had all the development tools already
installed. Before creating your RPMs, you'll need to install the
development tools, if you haven't already done so with:

yum groupinstall 'Development Tools'
Building drbd-8.0.4 RPM
Download
the latest DRBD source from http://oss.linbit.com/drbd/. Just follow
these steps, and you should have an RPM ready to go.

[root@dev ~] tar -xvzf drbd-8.0.4.tar.gz
[root@dev ~] cd drbd-8.0.4 && make && make rpm
[root@dev ~] cd dist/RPMS/i386
[root@dev ~] ls

drbd-8.0.4-3.i386.rpm
drbd-debuginfo-8.0.4-3.i386.rpm
drbd-km-2.6.9_55.0.2-8.0.4-3.i386.rpm

Installation

[root@node0 ~]# rpm -Uvh drbd-8.0.4-3.i386.rpm
[root@node0 ~]# rpm -Uvh drbd-km-2.6.9_55.0.2-8.0.4-3.i386.rpm
[root@node0 ~]# yum install heartbeat
[root@node0 ~]# yum install mysql-server.i386

This will install mysql 5.0.42, which will work just fine for learning DRBD and Heartbeat.

Now
that we have all our software installed, we'll make node1 and this is
the easy part. Since we're working with VMware virtual server, all we
need to do is copy the node0 directory to node1! Follow these steps:
1. Shutdown node0.
2. Copy node0 to node1..
3.
Node1 is going to detect that hardware has changed, one of those will
be the network card. Remove the NIC, then kudzu will find it again,
allowing you to change the IP address. Node1 IP address is
192.168.12.22/255.255.255.0, same gateway/DNS as node0.
4. Edit
/etc/hosts and /etc/sysconfig/networking files and replace node0 with
node1. In hosts file, also add the IP addresses for all nodes, and
192.168.12.30 virtual IP. Call that virtual host cluster.
Configuring DRBD

1. Create partitions on both nodes.
2. Create drbd.conf
o Configure global drbd options.
3. Configure resource which consists of:
o Disk partitions on node0 and node1.
o Network connection between nodes.
o Error handling
o Synchronization
drbd.conf

global {
minor-count 1;
}

resource mysql {
protocol C; # There are A, B and C protocols. Stick with C.
# incon-degr-cmd "echo 'DRBD Degraded!' | wall; sleep 60 ; halt -f";
# If a cluster starts up in degraded mode, it will echo a message to all
# users. It'll wait 60 seconds then halt the system.

on node0.ka6wke.net {
device /dev/drbd0; # The name of our drbd device.
disk /dev/sdb1; # Partition we wish drbd to use.
address 192.168.12.21:7788; # node0 IP address and port number.
meta-disk internal; # Stores meta-data in lower portion of sdb1.
}

on node1.ka6wke.net {
device /dev/drbd0; # Our drbd device, must match node0.
disk /dev/sdb1; # Partition drbd should use.
address 192.168.12.22:7788; # IP address of node1, and port number.
meta-disk internal; #Stores meta-data in lower portion of sdb1.
}

disk {
on-io-error detach; # What to do when the lower level device errors.
}

net {
max-buffers 2048; #datablock buffers used before writing to disk.
ko-count 4; # Peer is dead if this count is exceeded.
#on-disconnect reconnect; # Peer disconnected, try to reconnect.
}

syncer {
rate 10M; # Synchronization rate, in megebytes. Good for 100Mb network.
#group 1; # Used for grouping resources, parallel sync.
al-extents 257; # Must be prime, number of active sets.
}

startup {
wfc-timeout 0; # drbd init script will wait infinitely on resources.
degr-wfc-timeout 120; # 2 minutes.
}
} # End of resource mysql

Bringing up DRBD

All software, drbd.conf, and devices have been created, make sure only node0 is running. Login as root, then:

drbdadm create-md mysql

cat /proc/drbd


Note
that node0 is in a secondary state, in a minute, we'll fix this by
promoting it to the primary node. Start up node1 then you'll have to:
drbdadm create-md mysql

You'll see we now have two Secondaries. Now, we need to promote node0 to Primary with this:

drbdadm -- --overwrite-data-of-peer primary mysql
watch cat /proc/drbd

Testing DRBD

To
have a working system, we need to create a filesystem on node0. We do
that just like normal, the difference is we use /dev/drbd0 device
instead of /dev/sdb1

mkfs.ext3 -L mysql /dev/drbd0

mkfs.ext3 /dev/drbd0

mke2fs 1.35 (28-Feb-2004)
mkfs.ext3: Wrong medium type while trying to determine filesystem size

You're on node1, which is secondary and /dev/drbd0 is read only! Switch to node0.

Once
that's done, we'll do some simple tests. On node0, mount /dev/drbd0 on
/mnt/mysql. Change to that directory, then touch a few test files,
create a directory. In order to check to see if our files have been
replicated, we need to unmount /mnt/mysql, make node0 secondary,
promote node1 to primary, remount /mnt/mysql then check to see if your
files are on node1. These steps are:

umount /mnt/mysql
drbdadm secondary mysql


Switch to node1, then:

drbdadm primary mysql
mount /dev/drbd0 /mnt/mysql

Configuring MySQL to use DRBD

MySQL
is installed, but we need to configure it to use the DRBD device. We do
that by simply putting all the databases and logs in /mnt/mysql/data.
In a production environment, you'd probably break out logs, database
and index files onto different devices. Since this is an experimental
system, we'll just put everything into one resource.

Edit my.cnf
so datadir=/mnt/mysql/data. After that, run mysql_install_db. Check
/mnt/mysql/data to see if the mysql database directory has been
created, and all the files are there. Start mysql:

mysqld_safe &
[1] 4013
Starting mysqld daemon with databases from /mnt/mysql/data

Lets
do a manual switch to node1, we have to add shutting down mysqld first
before unmounting /mnt/mysql, and starting mysqld on node1 as the last
step. MySQL should start up on node1 without any errors. Login to
MySQL, change to world, run a select. Your data is all on node1. Now we
know DRBD is working, and MySQL will work over DRBD. Next is to
configure Heartbeat to make this all automatic.

Configuring Heartbeat to Manage DRBD & MySQL
Heartbeat
is the brains used to automate all the steps needed to determine if a
node is down, and run all the steps needed to bring the spare node up
as primary. We'll be using heartbeat v2.x for this experimental
cluster. Heartbeat uses three config files, all found in /etc/ha.d:

ha.cf - the main heartbeat config file.
haresources - resources that are managed by heartbeat.
authkeys - used for node authenitacation.

Node0
will be our primary node. The first configuration file is ha.cf, and
this file is what configures all the nodes, communication, and which
features are enabled. Our basic ha.cf file will look like this

logfacility daemon # This is deprecated
keepalive 1 # Interval between heartbeat (HB) packets.
deadtime 10 # How quickly HB determines a dead node.
warntime 5 # Time HB will issue a late HB.
initdead 120 # Time delay needed by HB to report a dead node.
udpport 694 # UDP port HB uses to communicate between nodes.
ping 192.168.12.10 # Ping VMware Server host to simulate network resource.
bcast eth0 # Which interface to use for HB packets.
auto_failback off # Auto promotion of primary node upon return to cluster.
node node0.ka6wke.net # Node name must be same as uname -r.
node node1.ka6wke.net # Node name must be same as uname -r.

respawn hacluster /usr/lib/heartbeat/ipfail
# Specifies which programs to run at startup

use_logd yes # Use system logging.
logfile /var/log/hb.log # Heartbeat logfile.
debugfile /var/log/heartbeat-debug.log # Debugging logfile.

These
are the directions Heartbeat needs to manage our resources. On startup,
it's read from left to right. On node0, first thing Heartbeat will do
is create the virtual IP, then it'll bring up our DRBD resource. Once
that's finished, it'll mount /dev/drbd0 on /mnt/mysql, then finally
startup mysqld. When shutting down, Heartbeat reads this file from
right to left. It'll stop mysqld, unmount /mnt/mysql, and demote node0
to secondary. The last thing Heartbeat will do is stop the virtual IP
from working on node0. Heartbeat on node1 will notice node0 is missing,
and take over all the resources in the same fashion as node0 does upon
startup.

Our last file is authkeys. Since our virtual machines
are running on a private network for experimental purposes, we'll
dispose of any network security. We'll use the unsecure crc directive,
our authkeys file:

auth 2
2 crc


The permissions on
the authkeys file must be 0600, if you don't heartbeat won't start, and
in the system log, you see: node0 heartbeat: One last edit to make is
MySQL configuration file, my.cnf so mysqld will bind to the virtual IP
address. Add bind-address=192.168.12.30 in my.cnf, and copy it over to
node1.

We're ready to start heartbeat up on both nodes.
service heartbeat start
service heartbeat start

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.12.%'

Testing MySQL High Availability

service heartbeat stop

On node1, DRBD has mounted the MySQL datadir, and started mysqld You can also see Heartbeat running too: