Solution
If you are on a Linux / *nix system, you can use a SHA tool such as sha512sum, because MD5 is known to be broken (collisions can be constructed).
    find /path -type f -print0 | xargs -0 sha512sum | awk '{ h = $1; f = substr($0, length(h) + 3) } (h in seen) { print "duplicate: " f " and " seen[h]; next } { seen[h] = f }'
If you want to use Python, here is a simple implementation:
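    import hashlib
    import os

    def sha(filename):
        """Return the SHA-512 hex digest of a file, or None if it cannot be read."""
        d = hashlib.sha512()
        try:
            # Read in binary mode and in chunks so large files do not fill memory.
            with open(filename, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    d.update(chunk)
        except OSError as e:
            print(e)
            return None
        return d.hexdigest()

    seen = {}  # digest -> first file seen with that digest
    path = os.path.join("/home", "path1")
    for root, dirs, files in os.walk(path):
        for name in files:
            filename = os.path.join(root, name)
            digest = sha(filename)
            if digest is None:
                continue  # unreadable file, skip it
            if digest not in seen:
                seen[digest] = filename
            else:
                print("Duplicates: %s <==> %s" % (filename, seen[digest]))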
If you think sha512sum is not enough, you can compare the candidate files byte by byte with a Unix tool such as diff, or with filecmp in Python.
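
As a rough sketch of the filecmp route: once two files share a digest, a byte-by-byte comparison can confirm that they really are identical. The helper name `confirm_duplicate` is made up for illustration and is not part of any library; `filecmp.cmp` with `shallow=False` is the standard-library call that does the actual content comparison.

```python
import filecmp
import os
import tempfile

def confirm_duplicate(path_a, path_b):
    """Return True only if both files have identical contents, byte for byte.

    Hypothetical helper name, used here for illustration only.
    """
    # Files of different sizes can never be duplicates; this skips reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(path_a, path_b, shallow=False)

if __name__ == "__main__":
    # Small self-contained demo using throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        for p in (first, second):
            with open(p, "wb") as fh:
                fh.write(b"same contents")
        print(confirm_duplicate(first, second))
```

In the duplicate-finding loop, you would call this on `filename` and the previously seen file before reporting the pair, so that a report depends on the actual bytes and not only on the digest.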