[Techtalk] interesting photo management problem-- weeding out duplicates

Sun Apr 27 22:40:19 UTC 2014

Hola, techtalkers. I have an interesting problem to enliven your
Sunday. My photo archives are a mess: instead of setting up a
sensible organization from the start, I let them devolve into a
jumble of random duplicate backup directories, with as many as five
copies of the same photo scattered all over the place. There are 9000+
images, so I'm not real eager to hunt down and delete the duplicates
manually. So I came up with this:

First make a list of the duplicates, comparing only the 32-character
md5sum at the start of each line:

find Pictures/ -type f -exec md5sum '{}' ';' | sort |
  uniq --all-repeated=separate -w 32 > dupes.txt

This makes a nice text file with a blank line between each group of
identical photos:

5374b0c445690e735e5e10ba248f5ed0  Pictures/Pictures-realhome/insurance/122005/14194820.JPG
5374b0c445690e735e5e10ba248f5ed0  Pictures/Pictures-realhome/jdfd-xmas-2005/P1000048.JPG

5374e5d9486b223c508f81175cdf551a  Pictures/pictures/canon-30d/IMG_0084.JPG
5374e5d9486b223c508f81175cdf551a  Pictures/Pictures-realhome/random-pictures/canon-30d/IMG_0084.JPG

537b0f0dd3ea8e35465c6cb86d2faa67  Pictures/pictures/105_PANA/P1050778.JPG
537b0f0dd3ea8e35465c6cb86d2faa67  Pictures/Pictures-realhome/random-pictures/105_PANA/P1050778.JPG

I like using md5sums to find the duplicates because they compare file
contents, so they catch dupes with different filenames.

Awk counts all the filenames without counting the blank lines:

awk 'NF != 0 {++count} END {print count}' dupes.txt
9855

Then I run a couple more awk incantations to count how many unique
images there are, and I get 4301. Whee fun, eh?
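
The counting step might look something like the sketch below. It is not
necessarily the incantation used above. It assumes every non-blank line
of dupes.txt starts with the md5sum, the way md5sum prints it, and the
function name count_dupe_groups is made up for illustration. It counts
each checksum once, no matter how many files share it:

```shell
# Count the distinct checksums in a dupes.txt-style file, that is,
# one per set of identical images. Blank separator lines are skipped.
# (count_dupe_groups is a hypothetical helper name.)
count_dupe_groups() {
    awk 'NF != 0 && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_dupe_groups dupes.txt
```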

So I can generate a list of unique files with awk, and then
use cp, rsync or mv to copy the listed files to a new directory.
ExifTool is really slick for manipulating big batches of image files,
but I'm still figuring out how to use it.
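
A minimal sketch of the list-then-copy idea, assuming one "checksum
path" pair per line as md5sum prints it, and ordinary filenames
(md5sum escapes names containing backslashes or newlines, which this
does not handle). The helper name list_keepers and the destination
path are hypothetical. It prints the first path seen for each
checksum and strips the checksum, so paths with spaces survive:

```shell
# Print one path per checksum: the first copy listed in each group.
# (list_keepers is a hypothetical helper name.)
list_keepers() {
    awk 'NF != 0 && !seen[$1]++ {
        sub(/^[0-9a-f]+ [ *]/, "")   # drop the hash and its separator
        print
    }' "$1"
}

# Usage (destination directory is a placeholder):
#   list_keepers dupes.txt > keep.txt
#   rsync -a --files-from=keep.txt . /path/to/new-archive/
```

Keep in mind that dupes.txt only lists files that have at least one
duplicate. To pick up the images that exist only once as well, the same
function should work on the full sorted md5sum listing, before uniq
filters it.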

Another option is to delete the duplicates and leave the rest in place,
but I haven't figured out how to do that.
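
One possible approach, as a cautious sketch rather than a tested recipe:
list every path except the first in each group, look the list over, and
only then delete. It assumes the same one-line "checksum path" format
and ordinary filenames. The names list_extras and remove_listed, and
the --really switch, are invented for this example:

```shell
# Print every path except the first one in each group of duplicates.
# (list_extras is a hypothetical helper name.)
list_extras() {
    awk 'NF != 0 && seen[$1]++ {
        sub(/^[0-9a-f]+ [ *]/, "")   # drop the hash and its separator
        print
    }' "$1"
}

# Read paths from a file, one per line. By default this is a dry run
# that only reports; pass --really as the second argument to delete.
remove_listed() {
    while IFS= read -r f; do
        if [ "${2:-}" = "--really" ]; then
            rm -- "$f"
        else
            printf 'would remove: %s\n' "$f"
        fi
    done < "$1"
}

# Usage:
#   list_extras dupes.txt > to-delete.txt
#   remove_listed to-delete.txt            # dry run, review the output
#   remove_listed to-delete.txt --really   # actually delete
```

Which copy counts as "first" depends only on the sort order of
dupes.txt, so it is worth checking to-delete.txt by eye and having a
backup before running the real deletion.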

Thoughts? Brainstorms? I also looked at the Organize command included
in Exiv2, but I couldn't get it to build on my system.

Carla

-- 
++++++++++++++++++++++++++++++++++++++++
Ace Linux guru                         +
carlaschroder.com                      +
Everything mortal has moments immortal +
++++++++++++++++++++++++++++++++++++++++