I'm running MongoDB on an AWS EC2 instance. The data, log, and journal files are stored on a separate volume, formatted as XFS. Currently we stop the MongoDB instance to take a snapshot, but according to https://docs.mongodb.com/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/ there is apparently no need to stop the service during the snapshot, since journaling is enabled. Am I correct? Can I create a consistent snapshot while the service is running?
In general, do not trust any backup procedure until you have confirmed the integrity of a restore from long term media.
You already have the capability to take an online backup at the storage layer; in this case, with EBS snapshots or Linux LVM snapshots. The problem is getting the database into a consistent state.
An online backup is possible with or without a journal. In either case, MongoDB's way to suspend database writes is fsync and lock (db.fsyncLock()), as described in that tutorial.
Without a journal, it is difficult to tell which data is durable on disk and which is still buffered and not yet committed. Fsync and lock flushes buffered writes to disk, establishing a point in time, and blocks any further writes until the backup is done.
The lock is also needed with multiple disks, where (on this storage system) the snapshots are not consistent with each other. Suspending writes for the duration of the backup means that disk /dev/sdf will not be at a slightly different point in time than /dev/sdg.
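As a rough illustration of the lock-then-snapshot sequence, here is a minimal Python sketch, not the tutorial's own procedure. It assumes `admin_db` behaves like a pymongo `Database` for `admin` (its `command()` method sends the `fsync` and `fsyncUnlock` server commands) and `ec2` behaves like a boto3 EC2 client (`create_snapshot(VolumeId=..., Description=...)`). The function name and volume IDs are hypothetical. The point is that every volume is snapshotted while the same lock is held, and the lock is released even if a snapshot call fails.

```python
from typing import Any, List


def snapshot_volumes_locked(admin_db: Any, ec2: Any, volume_ids: List[str],
                            description: str = "mongod backup") -> List[str]:
    """Hypothetical helper: one EBS snapshot per volume while mongod is fsync-locked.

    Assumes admin_db is like pymongo's client.admin and ec2 is like boto3.client("ec2").
    """
    # Flush pending writes to disk and block new writes until unlocked.
    admin_db.command("fsync", lock=True)
    snapshot_ids: List[str] = []
    try:
        for volume_id in volume_ids:
            # EBS fixes the snapshot's point in time when the request is accepted;
            # the data copy then completes asynchronously while the snapshot is "pending".
            resp = ec2.create_snapshot(VolumeId=volume_id, Description=description)
            snapshot_ids.append(resp["SnapshotId"])
    finally:
        # Always release the lock, even if one of the snapshot calls raised.
        admin_db.command("fsyncUnlock")
    return snapshot_ids
```

In real use this would be called as something like `snapshot_volumes_locked(MongoClient().admin, boto3.client("ec2"), ["vol-aaa", "vol-bbb"])`, with the volume IDs replaced by the ones actually attached to the instance. The `fsyncUnlock` command requires MongoDB 3.2 or later.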
Mongo claims that if you only have a single disk, and have a journal, you don't need to fsync and lock. Presumably, the EBS snapshot is a good enough crash-consistent point in time, and journal forward recovery can fix up any incomplete writes.
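Under that claim, the journaled single-volume case reduces to a plain snapshot call with no locking. The sketch below is hypothetical: it assumes a boto3-style EC2 client and simply refuses to run unless journaling is on and exactly one volume holds the deployment, falling back to the locked approach otherwise.

```python
from typing import Any, Sequence


def snapshot_journaled_single_volume(ec2: Any, volume_ids: Sequence[str], journaled: bool,
                                     description: str = "mongod crash-consistent backup") -> str:
    """Hypothetical helper: crash-consistent snapshot with no fsync lock.

    Only intended for a journaled mongod whose data and journal share one volume;
    recovery then relies on journal replay when the snapshot is restored.
    """
    if not journaled:
        raise ValueError("journaling is off: use fsync and lock instead")
    if len(volume_ids) != 1:
        raise ValueError("deployment spans %d volumes: use fsync and lock instead"
                         % len(volume_ids))
    resp = ec2.create_snapshot(VolumeId=volume_ids[0], Description=description)
    return resp["SnapshotId"]
```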
Agree completely with this, particularly that you need to do regular restore tests. I personally would do (and actually do) a little more: 1) a monthly process to stop, snapshot, and start the database; 2) a process that periodically (nightly for me) exports the data, compresses it, and stores the backup somewhere outside of AWS. I trust AWS plenty, but human error is infinite. I also store other assets using an incremental backup system. – Tim, Sep 4, 2018 at 9:10
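The nightly export-and-compress step could look roughly like the following Python sketch. The comment does not say which export tool is used; this assumes `mongodump` with its `--uri`, `--archive=<file>` and `--gzip` options, and the archive naming scheme and function names are hypothetical. Copying the archive somewhere outside of AWS is left out, since that depends entirely on the destination.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List


def archive_path(backup_dir: Path, day: datetime.date) -> Path:
    # Hypothetical naming scheme: one date-stamped gzip archive per night.
    return backup_dir / f"mongo-{day.isoformat()}.archive.gz"


def build_dump_command(uri: str, archive: Path) -> List[str]:
    # mongodump writes a single compressed archive file instead of a directory tree.
    return ["mongodump", f"--uri={uri}", f"--archive={archive}", "--gzip"]


def nightly_dump(uri: str, backup_dir: Path) -> Path:
    archive = archive_path(backup_dir, datetime.date.today())
    # check=True raises CalledProcessError if mongodump exits non-zero.
    subprocess.run(build_dump_command(uri, archive), check=True)
    return archive
```

A matching restore test would feed the same archive back through `mongorestore --archive=<file> --gzip` into a scratch instance.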
You don't need to take offline backups if you have sufficiently verified online backups, but offline backups are simpler. Also, "restore test from long term media" is open to interpretation: it could mean restoring to another storage account, region, or cloud provider to demonstrate that you can export to a different failure domain. – Commented Sep 4, 2018 at 12:42
Snapshotting with the journal is only possible if the journal resides on the same volume as the data files, so that one snapshot operation captures the journal state and the data file state atomically. If your data and journal are stored on separate volumes, you cannot take a consistent snapshot of an active deployment, and would have to fsyncLock or stop the mongod process.
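One quick way to check the same-volume condition on the host is to compare the `st_dev` value that `os.stat` reports for the data directory and the journal directory (by default the journal lives in `<dbPath>/journal`). This is a hypothetical sketch: a matching `st_dev` only proves both paths are on the same filesystem, so an LVM or RAID volume striped across several EBS volumes would still report a single device and would still need the lock.

```python
import os


def on_same_device(data_path: str, journal_path: str) -> bool:
    # st_dev identifies the device holding each path; different values mean
    # different filesystems (for example, two separately mounted EBS volumes).
    return os.stat(data_path).st_dev == os.stat(journal_path).st_dev


def requires_fsync_lock(data_path: str, journal_path: str, journaled: bool) -> bool:
    # Hypothetical policy: skip the lock only for a journaled mongod whose journal
    # shares a filesystem with the data files (and that filesystem sits on one volume).
    return not journaled or not on_same_device(data_path, journal_path)
```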