Hola, techtalkers. I have an interesting problem to enliven your
Sunday. My photo archives are a mess, because instead of setting up a
sensible organization from the start I let them devolve into a
jumble of random duplicate backup directories, with as many as five
copies of the same photo scattered all over the place. It's 9000+
images, so I'm not real eager to hunt down and delete the duplicates
manually. So I came up with this:

First make a list of the duplicates:

find Pictures/ -type f -exec md5sum '{}' ';' | sort | \
    uniq --all-repeated=separate -w 32 > dupes.txt

This makes a nice text file listing every duplicated photo, one
"md5sum  filename" line per file, with a blank line between each group
of identical files.

I like using md5sums to find the duplicates because they compare file
contents, so they catch dupes with different filenames.

Awk counts all the filenames without counting the blank lines:

awk 'NF != 0 {++count} END {print count}' dupes.txt

Then I run a couple more awk incantations to count how many unique
images there are, and I get 4301. Whee fun, eh?
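For anyone who wants to try the counting step, here is a rough sketch of one way to do it, not necessarily the incantation I used. It assumes dupes.txt is in the md5sum format above (hash in the first field) and still sorted by hash. The function name dupe_summary is made up for this example. Note that it only sees images that have duplicates, because files with no copies never land in dupes.txt.

```shell
# Sketch: summarize a dupes.txt produced by
#   ... | sort | uniq --all-repeated=separate -w 32
# Assumes the first field of each non-blank line is the md5 hash and
# that lines with the same hash are adjacent (true after sort).
dupe_summary() {
    awk '
        NF == 0 { next }                    # skip the blank separator lines
        { files++ }                         # each remaining line is one file
        $1 != prev { groups++; prev = $1 }  # a new hash starts a new group
        END {
            printf "%d files, %d distinct images, %d redundant copies\n",
                   files, groups, files - groups
        }
    ' "$1"
}
```

Run it as: dupe_summary dupes.txt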

So I can generate a list of unique files with awk, and then
use cp, rsync or mv to copy the list to a new directory. ExifTool is
really slick for manipulating big batches of image files, but I'm still
figuring out how to use it.
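The copy-the-uniques idea might look something like the sketch below. Assumptions, all mine: the input is the full sorted md5sum listing saved to a file (I call it sums.txt here, which is hypothetical), not dupes.txt, so that photos with no duplicates get copied too; the function names are invented; and cp is GNU cp, since --parents is a GNU option. It will not cope with filenames containing newlines or backslashes, because md5sum escapes those in its output.

```shell
# Print one path per distinct md5 hash.
# Input lines look like "<32-hex hash>  <path>", so the path starts at
# column 35 (32 hash characters plus a two-character separator).
unique_files() {
    awk 'NF && !seen[$1]++ { print substr($0, 35) }' "$1"
}

# Copy each of those paths into the directory given as $2.
# --parents (GNU cp) recreates the source directory structure under the
# destination, so two different photos that share a basename cannot
# overwrite each other.
copy_unique() {
    mkdir -p "$2" || return 1
    unique_files "$1" | while IFS= read -r f; do
        cp --parents -- "$f" "$2"
    done
}
```

Usage would be along the lines of: find Pictures/ -type f -exec md5sum '{}' + | sort > sums.txt, then copy_unique sums.txt Sorted/. Which copy of each photo gets picked is simply whichever comes first in the listing.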

Another option is to delete the duplicates and leave the rest in place,
but I haven't figured out how to do that.
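One possible approach to the delete-in-place option, offered as an untested-on-real-archives sketch: keep the first file in each group of dupes.txt and list all the others for removal. The function names and the delete-me.txt file are invented for illustration, and it assumes the same "hash, two-character separator, path" layout as above with no newlines or backslashes in filenames. Writing the list to a file first gives you a chance to read it before anything is removed.

```shell
# Print every path in dupes.txt except the first one of each group,
# i.e. the redundant copies. A line is "extra" when its hash matches
# the hash of the previous non-blank line.
extra_copies() {
    awk '
        NF == 0 { next }                         # skip group separators
        $1 == prev { print substr($0, 35) }      # same hash as before: extra
        { prev = $1 }
    ' "$1"
}

# Delete every path listed (one per line) in the file given as $1.
# The "--" keeps rm from treating odd filenames as options.
delete_listed() {
    while IFS= read -r f; do
        rm -- "$f"
    done < "$1"
}
```

The idea is to run extra_copies dupes.txt > delete-me.txt, look the list over, and only then run delete_listed delete-me.txt. The copy that survives is just the one that sorted first, so if you care which directory keeps the photo, edit the list by hand first.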

Thoughts? Brainstorms? I also looked at the Organize command included
in Exiv2, but I couldn't get it to build on my system.


