Okay - I've done this using two scripts. The first, `copy_files`, does the `find` and `md5sum` work:
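```shell
#!/bin/sh
# copy_files: hash every regular file under the source directory and hand
# each (hash, path) pair to copy_new_file.
if [ $# -lt 2 ]; then
    echo "Usage: copy_files [source directory] [destination directory]"
    exit 1
fi
if [ ! -d "$2/hashes" ]; then
    echo "Error: Did not find $2/hashes directory (create it and re-run if $2 is the correct destination)."
    exit 1
fi
# md5sum prints "<hash>  <path>"; read splits each line into the two fields,
# which also keeps paths containing spaces in one piece.
find "$1" -type f -exec md5sum {} + | while read -r hash file; do
    ./copy_new_file "$2" "$hash" "$file"
done
```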
The second, `copy_new_file`, copies a file to the destination unless a file with the same hash has already been copied:
```shell
#!/bin/sh
# copy_new_file: copy the source file into the destination unless a file
# with the same hash was copied before. A marker file named after each hash
# is kept in <destination>/hashes to remember what has been copied.
if [ $# -lt 3 ]; then
    echo "Usage: copy_new_file [destination directory] [hash value] [source file]"
    exit 1
fi
if [ ! -d "$1/hashes" ]; then
    echo "Error: Did not find $1/hashes directory (create it and re-run if $1 is the correct destination)."
    exit 1
fi
if [ ! -f "$1/hashes/$2" ]; then
    # Record the hash (and where it came from), then copy the file.
    echo "$2 $3" > "$1/hashes/$2"
    cp "$3" "$1"
fi
```
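To use them, first make sure both scripts are executable and that the destination has a `hashes` subdirectory (both scripts refuse to run without it):
```shell
chmod a+x copy_files copy_new_file
mkdir -p dest_dir/hashes
```
Now it is as simple as:
```shell
./copy_files source_dir dest_dir
```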
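If you want to see which files will be skipped before copying anything, the same `find`/`md5sum` idea can list every file whose content is duplicated elsewhere in the source tree. This is a sketch assuming GNU coreutils (`uniq -w32` compares only the first 32 characters of each line, i.e. the MD5 hash, and `-D` prints every member of a repeated group); the function name `list_duplicate_content` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every file under $1 whose MD5 hash is shared
# with at least one other file. copy_files would copy only one file from
# each such group and skip the rest. Requires GNU uniq for -w and -D.
list_duplicate_content() {
    # Sort so identical hashes are adjacent, then keep only the lines whose
    # first 32 characters (the hash) occur more than once.
    find "$1" -type f -exec md5sum {} + | sort | uniq -w32 -D
}
```

Usage: `list_duplicate_content source_dir`. Files that appear in the output with the same hash are the ones that collapse into a single copy.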
This does have one problem: if two (or more) files have the same name but different hashes, each later file will overwrite the earlier one in the destination. If that is going to be an issue for you, I'll work out a way to prefix the filename with the hash.
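A minimal sketch of the hash-prefix idea, assuming the same `hashes` bookkeeping as `copy_new_file` but naming each copy `<hash>_<original name>` so that same-named files with different content no longer collide. It is written as a shell function so it can be tried in isolation; the name `copy_new_file_prefixed` is hypothetical, and it has not been tested against real data.

```shell
#!/bin/sh
# Hypothetical variant of copy_new_file: same arguments
# (destination, hash, source file), but the copy is named
# "<hash>_<original basename>" so two different files called, say,
# x.txt end up as two separate files in the destination.
copy_new_file_prefixed() {
    dest=$1 hash=$2 src=$3
    if [ ! -d "$dest/hashes" ]; then
        echo "Error: Did not find $dest/hashes directory." >&2
        return 1
    fi
    # Skip content (hash) that has already been copied once.
    if [ ! -f "$dest/hashes/$hash" ]; then
        printf '%s %s\n' "$hash" "$src" > "$dest/hashes/$hash"
        cp "$src" "$dest/${hash}_$(basename "$src")"
    fi
}
```

Because every copy carries its hash in the name, a name collision in the destination can only happen for identical content, which the `hashes` check already skips.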