Why Source and Target Are Not the Same Size When Using rsync

Question: While using rsync the source and the target are not of the same size. What could be the reason?

The source and the target can differ in size after an rsync run. This can be due to any of the following reasons:

1. Check whether you excluded anything while using rsync. If you did, the sizes will obviously never be the same.

2. If your target is slightly smaller than your source, the likely cause is a difference in directory sizes. This is simply due to how directories allocate disk space and can't really be helped. I have devised a quick shell command to add up all of the file sizes under the current directory without including the directory sizes:

# echo $(find . -type f -ls | awk '{print $7 "+"}')0 | bc

Run it in both the source and the target to confirm that the files themselves are the same size even if the directories are not.
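
The same check can be wrapped in a pair of small functions so that both trees are compared in one step. This is only a sketch: the function names total_file_bytes and compare_file_bytes are made up for illustration, and it assumes, like the one-liner above, that column 7 of "find -ls" is the file size in bytes.

```shell
# Sum the sizes (in bytes) of all regular files under a directory,
# ignoring the space taken by the directories themselves.
# Assumption: column 7 of "find -ls" output is the size in bytes.
total_file_bytes() {
    find "$1" -type f -ls | awk '{ sum += $7 } END { printf "%.0f\n", sum }'
}

# Print both totals; return non-zero if they differ.
compare_file_bytes() {
    src_total=$(total_file_bytes "$1")
    dst_total=$(total_file_bytes "$2")
    echo "source: $src_total bytes, target: $dst_total bytes"
    [ "$src_total" = "$dst_total" ]
}
```

For example, compare_file_bytes /data /backup/data (hypothetical paths) prints both totals and returns success only when they match.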

3. If your target is 1-10% larger than your source, the most likely cause is hard links. Rsync does not preserve hard links by default, not even with -a/--archive, so if you rsync two hard links to the same file they will end up as two separate files on the target, taking up twice the disk space. If you want to preserve hard links, add the -H/--hard-links option.
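
To see whether hard links are involved at all, you can list the files in the source that have more than one link. The following is a rough sketch; the function names are invented for illustration, and the second function simply adds -H to an archive-mode copy.

```shell
# List regular files under a directory that have more than one hard link.
list_hard_linked_files() {
    find "$1" -type f -links +1
}

# Copy the contents of a source directory into a target directory,
# preserving hard links (-H) in addition to archive mode (-a).
sync_preserving_hard_links() {
    rsync -aH "$1"/ "$2"/
}
```

If list_hard_linked_files prints nothing for your source, hard links are not the reason for the size difference.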

4. If your target is >10% larger than your source, the most likely cause is sparse files. By default rsync does not write any files as sparse files, even if they are sparse on the source (it can't actually tell).

If you have sparse files (most commonly used as virtual machine images and incomplete p2p downloads), you will want to use the -S/--sparse option. Note that this can turn things around and make the target smaller than the source: with --sparse, rsync will not allocate disk space for any long run of null bytes, possibly making files on the target sparse when they were not on the source.
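
A file is usually sparse when it occupies less disk space than its apparent size. The sketch below checks a single file that way. It assumes GNU stat (the -c format with %s, %b and %B), and the function names are hypothetical.

```shell
# Succeed if the file occupies less disk space than its apparent size,
# which is the usual sign of a sparse file.
# Assumes GNU stat: %s = apparent size in bytes, %b = allocated blocks,
# %B = size in bytes of each block reported by %b.
is_sparse() {
    sizes=$(stat -c '%s %b %B' "$1") || return 2
    set -- $sizes
    [ $(( $2 * $3 )) -lt "$1" ]
}

# Copy the contents of a source directory into a target directory,
# writing runs of null bytes as holes (-S) on the target.
sync_sparse() {
    rsync -aS "$1"/ "$2"/
}
```

Keep in mind that some filesystems (for example ones with compression) can also report fewer allocated bytes than the apparent size, so treat the result as a hint rather than proof.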

5. Differences in filesystem type, block size, file slack overhead, etc. can also cause the reported sizes to differ.
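
To see how much of a difference comes from allocation overhead rather than file contents, you can compare the apparent size of a tree with the space it actually occupies. This is a minimal sketch that assumes GNU du (for --apparent-size); the function name tree_usage is made up for illustration.

```shell
# Print "apparent_kb allocated_kb" for a directory tree.
# apparent  = sum of the sizes the files claim to have
# allocated = disk space actually used, which depends on block size,
#             slack, holes and other filesystem details
# Assumes GNU du, which provides --apparent-size.
tree_usage() {
    apparent=$(du -sk --apparent-size "$1" | awk '{ print $1 }')
    allocated=$(du -sk "$1" | awk '{ print $1 }')
    echo "$apparent $allocated"
}
```

If the apparent sizes of source and target match but the allocated sizes do not, the difference comes from how each filesystem stores the data, not from missing or extra files.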