Sunday, January 16, 2011

How do you synchronise huge sparse files (VM disk images) between machines?

Is there a command, such as rsync, which can synchronise huge sparse files from one Linux server to another?

It is very important that the destination file remains sparse. It may be longer (but not bigger) than the drive which contains it. Only changed blocks should be sent across the wire.
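To illustrate "longer (but not bigger)": a sparse file's length (apparent size) can far exceed the disk space actually allocated to it. The following is a minimal sketch, assuming GNU coreutils (`truncate`, `stat -c`) and a filesystem that supports sparse files; the 100M size and the temporary file are arbitrary choices for the demonstration.

```shell
#!/bin/sh
# Create an empty file and extend it to 100 MiB without writing any data.
img=$(mktemp)
truncate -s 100M "$img"

# Length of the file in bytes (what ls -l reports).
apparent=$(stat -c %s "$img")

# Space actually allocated: number of blocks times the size of each block.
blocks=$(stat -c %b "$img")
blocksize=$(stat -c %B "$img")
allocated=$((blocks * blocksize))

echo "apparent size:  $apparent bytes"
echo "allocated size: $allocated bytes"

rm -f "$img"
```

A synchronisation tool that writes out every zero byte at the destination makes the allocated size grow to match the apparent size, which is exactly what has to be avoided here.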

I have tried rsync, but got no joy. groups.google.com/group/mailing.unix.rsync/browse_thread/thread/94f39271980513d3

If I write a programme to do this, am I just reinventing the wheel? http://www.finalcog.com/synchronise-block-devices

Thanks,

Chris.

  • I'm not aware of such a utility, only of the system calls that can handle it, so if you write such a utility, it might be rather helpful.

    What you can actually do is use qemu-img convert to copy the files, but it will only work if the destination FS supports sparse files.

    From dyasny
  • rsync only transfers the changes to each file, and with --inplace it should rewrite only the blocks that changed, without recreating the file. From their features page:

    rsync is a file transfer program for Unix systems. rsync uses the "rsync algorithm" which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.

    Using --inplace should work for you. The command below shows progress, compresses the transfer (at the default compression level), transfers the contents of the local storage directory recursively (the trailing slash on the source path matters), makes the changes to the files in place and uses ssh for the transport.

    rsync -v -z -r --inplace --progress -e ssh /path/to/local/storage/ \
    user@remote.machine:/path/to/remote/storage/
    

    I often use the -a flag as well, which does a few more things. It's equivalent to -rlptgoD; I'll leave the exact behavior for you to look up in the man page.

    : The '-S' is for sparse files, not 'chops long lines'. From man page: -S, --sparse handle sparse files efficiently. I'll give this a try, thanks.
    wizard : Thanks I fixed that - I Was going off of something that was said in the link you gave.
    : No, unfortunately this does not solve the problem. It *does* sync the file, but it turns the sparse file at the far end into a non-sparse file. I am using ssh/rsync which comes with Ubuntu 9.04.
    : My above comment was incorrect. The problem was that rsync creates non-sparse files on its first copy. The --inplace rsync does work correctly, provided that the destination file already exists and is as long (not big) as the origin file. I now have a solution, but it requires me to check whether each file already exists on the target server. If it does, I do an --inplace, if it doesn't, I use --sparse. This is not ideal, but it works.
    : Solved - http://www.finalcog.com/rsync-vm-sparse-inplace-kvm-vmware
    From wizard
  • Take a look at the Zumastor Linux Storage Project. It implements "snapshot" backup using binary "rsync" via the ddsnap tool.

    From the man-page:

    ddsnap provides block device replication given a block level snapshot facility capable of holding multiple simultaneous snapshots efficiently. ddsnap can generate a list of snapshot chunks that differ between two snapshots, then send that difference over the wire. On a downstream server, write the updated data to a snapshotted block device.

    From rkthkr
  • Could replicating the whole file system be a solution? DRBD? http://www.drbd.org/

    : I don't think drbd is a good solution here, but the idea of rsyncing --inplace the whole fs, rather than the disk-image-files, is interesting. I'm not sure whether rsync allows this - I'll give it a try and report back...
    From James C
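The workaround described in the comments on the rsync answer (use --sparse for the first copy, and --inplace once the destination file already exists) could be sketched roughly as below. This is not the script from the linked solution; the function names `choose_flag`, `remote_exists` and `sync_image` are hypothetical, and the sketch assumes passwordless ssh to the target and remote paths without spaces or shell metacharacters.

```shell
#!/bin/sh
# choose_flag DEST_EXISTS
# Print the rsync option to use: --inplace if the destination file
# already exists ("yes"), --sparse for the first copy (anything else).
choose_flag() {
    if [ "$1" = "yes" ]; then
        echo "--inplace"
    else
        echo "--sparse"
    fi
}

# remote_exists HOST PATH
# Print "yes" if PATH is a regular file on HOST, otherwise "no".
# Assumption: ssh to HOST works non-interactively.
remote_exists() {
    if ssh "$1" test -f "$2" </dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}

# sync_image SRC HOST DEST
# Copy one disk image, choosing the flag according to whether DEST exists.
sync_image() {
    flag=$(choose_flag "$(remote_exists "$2" "$3")")
    rsync -v --progress "$flag" -e ssh "$1" "$2:$3"
}
```

Usage would be along the lines of `sync_image /path/to/local/storage/vm.img user@remote.machine /path/to/remote/storage/vm.img`, where `vm.img` is a placeholder file name. As noted in the comments, --inplace only keeps the destination sparse if the existing destination file is already as long as the source.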
