waqas bhatti / notes / some rsync tricks

Some rsync tips and usage patterns I've come to love over the last few years:

Get the total size of files that will be transferred for a session

rsync -aurvhn --stats [source] [dest]

then look for "Total transferred file size".

the -n flag means it's a dry-run, so nothing will actually be transferred
(except for file lists).
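
If you only want the number, the stats line can be pulled out with a small filter. This is a minimal sketch: the function names transfer_size and rsync_size are made up for illustration, and it assumes the stats output labels the line exactly as quoted above.

```shell
# Print only the value of the "Total transferred file size" line.
# Reads `rsync --stats` output on stdin.
transfer_size() {
    awk -F': ' '/^Total transferred file size/ { print $2 }'
}

# Hypothetical wrapper: dry-run the transfer and report just the size.
# All arguments are passed to rsync unchanged.
rsync_size() {
    rsync -aurhn --stats "$@" | transfer_size
}
```

Call it the same way as the command above, e.g. rsync_size [source] [dest].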

Using a weaker cipher and no compression over a local network to speed things up

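rsync -aurvh --stats --rsh="ssh -c arcfour -o Compression=no" [source] [dest]

arcfour was removed in OpenSSH 7.6, so on newer systems substitute a fast
cipher from the list that ssh -Q cipher prints (aes128-ctr, for example).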
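
Not every ssh build supports every cipher (recent OpenSSH no longer ships arcfour), so it can help to pick the first cipher from a preference list that the local ssh actually offers. A rough sketch: pick_cipher is a hypothetical helper and the preference order in the usage comment is just an assumption, not a benchmark result.

```shell
# Print the first preferred cipher that appears in the list of available ones.
# Usage: pick_cipher "AVAILABLE (one per line)" PREFERRED...
# Returns non-zero if none of the preferred ciphers is available.
pick_cipher() {
    available=$1
    shift
    for cipher in "$@"; do
        if printf '%s\n' "$available" | grep -qxF -- "$cipher"; then
            printf '%s\n' "$cipher"
            return 0
        fi
    done
    return 1
}

# Example use (not run here):
#   cipher=$(pick_cipher "$(ssh -Q cipher)" arcfour aes128-gcm@openssh.com aes128-ctr)
#   rsync -aurvh --stats --rsh="ssh -c $cipher -o Compression=no" [source] [dest]
```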

Exclude everything except for a certain wildcard glob when transferring things

rsync -aurvh --include="*/" --include="[glob to include]" --exclude="*" [source] [dest]

this basically says: include all subdirectories and the glob, but exclude
everything else.
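
The include/exclude pattern above can be wrapped in a small function. This is only a sketch: rsync_only and the RSYNC override variable are hypothetical names, and --prune-empty-dirs is an extra I added so that directories containing no matching files are not created at the destination.

```shell
# Copy only files matching one glob, keeping the directory layout.
# Usage: rsync_only GLOB SOURCE DEST
# Set RSYNC=echo to print the rsync arguments instead of running rsync.
rsync_only() {
    glob=$1 src=$2 dest=$3
    "${RSYNC:-rsync}" -aurvh --prune-empty-dirs \
        --include='*/' --include="$glob" --exclude='*' \
        "$src" "$dest"
}
```

Quote the glob when calling it so the shell does not expand it first, e.g. rsync_only '*.txt' [source] [dest].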

Check if all source files are present at the destination

rsync -aurvhnc --stats [source] [dest]

the -c flag calculates 128-bit checksums for all remote and local files,
which is a more robust comparison than the default modified-time and file-size
comparison.

Rsyncing a large file over an unreliable connection

nohup rsync --partial --append-verify [remotefile] [localdest] &

the nohup makes the rsync process immune to SSH glitches killing the terminal

the partial and append-verify flags make sure rsync transfers files in a way
such that it can resume partially completed file fragments
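
Since the transfer can resume, it is safe to rerun the same command until it finally exits cleanly. One way to automate that is a generic retry loop. This is a sketch under my own assumptions: retry is a hypothetical helper, and the try count and delay in the usage comment are arbitrary.

```shell
# Rerun a command until it exits 0, sleeping between attempts.
# Usage: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 1 if the command still fails on the last allowed attempt.
retry() {
    max=$1 delay=$2
    shift 2
    try=1
    until "$@"; do
        if [ "$try" -ge "$max" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Example use (not run here):
#   retry 20 30 rsync --partial --append-verify [remotefile] [localdest]
```

Each new attempt picks up from the partial file left by the previous one, so the retries do not start from scratch.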
 

Parallelized rsync over fast (10-gigE) NFS connections

find [source] -type d -name '[desired glob]' -print0 | xargs -0 -P [nprocesses] -I% rsync -aW % [dest]

To get filenames from a file instead:

cat [file with source paths] | xargs -P [nprocesses] -I% rsync -aW % [dest]
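
xargs treats -n and -I as mutually exclusive, so with -I% every rsync gets exactly one path. To hand several paths to each rsync while keeping the destination last, one option is to go through a tiny sh -c wrapper. A sketch only: parallel_rsync and the RSYNC override are hypothetical names, and plain xargs splits its input on whitespace, so this assumes the paths contain no spaces or quotes.

```shell
# Run rsync in parallel, giving each process a batch of source paths.
# Usage: parallel_rsync NPROCESSES NFILES_PER_PROCESS DEST < file-with-source-paths
# Export RSYNC=echo to print the rsync arguments instead of running rsync.
parallel_rsync() {
    procs=$1 nfiles=$2 dest=$3
    # xargs appends each batch of paths after "$dest", so inside the wrapper
    # $1 is the destination and the remaining arguments are the sources.
    xargs -P "$procs" -n "$nfiles" sh -c '
        dest=$1
        shift
        "${RSYNC:-rsync}" -aW "$@" "$dest"
    ' _ "$dest"
}
```

For example: parallel_rsync 4 50 [dest] < [file with source paths].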

Try to avoid rsync over NFS: it's absolutely terrible and will incur huge NFS overhead when it transfers filelists back and forth, and when it checks if a file exists on the destination. If you have to run it over NFS, use the -W (transfer whole files) flag to force it to skip the file-delta calculations. Also try to pull the source from the destination (run rsync on the destination machine) instead of pushing to the destination from the source. For some reason, rsync is way faster when pulling.