Thursday, October 28, 2010

Finding sensitive information on a drive or in a folder

A quick script for finding email addresses on a massive scale (for instance, on a drive).


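#!/bin/sh
# Scans every regular file under the path given as the first argument
# for strings that look like email addresses.

SEARCHPATH="$1"

# IFS= and -r keep leading whitespace and backslashes in file names intact.
find "$SEARCHPATH" -type f -print | while IFS= read -r file
do
    printf '\nSearching through %s...\n' "$file"

    # strings extracts printable text, so binary files get searched too.
    MATCHES=$(strings "$file" | grep -E '[[:alnum:]_.-]{1,64}@[[:alnum:]_.-]{2,255}\.[[:alpha:].]{2,4}')

    if [ -n "$MATCHES" ]
    then
        printf '%s\n' '---------------------------' 'Found matches, beware false positives:'
        printf '%s\n' "$MATCHES"
    fi
done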


False positives are pretty much guaranteed as long as there are binary files on the file system. Most sensitive data follows a pattern, so the regex can be swapped for one that matches SSNs or anything else you need to find.
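
As a rough sketch of what swapping the pattern could look like, here is a filter for SSN-shaped strings. The names SSN_PATTERN and match_ssn are made up for this example, and the pattern only assumes the common NNN-NN-NNNN layout. It does not check whether a number is a valid SSN, so expect false positives here as well.

```sh
# Illustrative pattern (an assumption): three digits, two digits, four digits,
# separated by hyphens and not embedded in a longer run of digits.
SSN_PATTERN='(^|[^0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}([^0-9]|$)'

# Hypothetical helper: reads lines on stdin and prints only those
# that contain something shaped like an SSN.
match_ssn() {
    grep -E "$SSN_PATTERN"
}
```

Inside the loop, the email filter would then be replaced by something like MATCHES=$(strings "$file" | match_ssn). Keeping the pattern in a variable means the rest of the script stays the same whichever kind of data you are hunting for.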

Some example output from running the script on /usr/src/...


Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/sched.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2005, Keir Fraser <keir@xensource.com>

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/version.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2005, Nguyen Anh Quynh <aquynh@gmail.com>
* Copyright (c) 2005, Keir Fraser <keir@xensource.com>

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/physdev.h...

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/event_channel.h...

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/vcpu.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2005, Keir Fraser <keir@xensource.com>

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/memory.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2005, Keir Fraser <keir@xensource.com>

Searching through /usr/src/linux-headers-2.6.35-22/include/xen/interface/elfnote.h...

Searching through /usr/src/linux-headers-2.6.35-22/include/crypto/skcipher.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2007 Herbert Xu <herbert@gondor.apana.org.au>

Searching through /usr/src/linux-headers-2.6.35-22/include/crypto/ctr.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2007 Herbert Xu <herbert@gondor.apana.org.au>

Searching through /usr/src/linux-headers-2.6.35-22/include/crypto/compress.h...

Searching through /usr/src/linux-headers-2.6.35-22/include/crypto/algapi.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2006 Herbert <herbert@gondor.apana.org.au>

Searching through /usr/src/linux-headers-2.6.35-22/include/crypto/hash.h...
---------------------------
Found matches, beware false positives:
* Copyright (c) 2008 Herbert Xu <herbert@gondor.apana.org.au>
