Senf: the mustardy sensitive number finder

Senf is a portable tool for finding sensitive numbers. Use this tool to identify files on your system that may have Social Security Numbers (SSNs) or Credit Card Numbers (CCNs). The latest version can always be found at the Senf site.

Warning!

Warning: Do not let running this program give you a false sense of security.

It is important to understand what Senf is, and what Senf is not.

What Senf is

  • A program written in Java by humans
  • A program which can quickly and conveniently point the operator of a computer to files containing strings of text that resemble SSNs or CCNs
  • A program which tries to find files with large quantities of SSNs/CCNs; you can tell it to report a single occurrence of a pattern, but that will yield many false positives.

What Senf is not

  • An infallible oracle which will detect all SSNs/CCNs and only SSNs/CCNs.
  • A prophet which will enlighten a computer operator as to the exact location of an SSN/CCN within a file (although the GUI version is somewhat sibylline).
  • A tool designed to detect sensitive data in encoded or encrypted binary files. In fact, by default it skips many file extensions.

What all this means

  • There will be false positives
  • There will be false negatives

Which reduces to (even though it's longer)!

This tool is not to be regarded as the end-all in your effort to ensure your computer is free of SSN/CCN records. It simply reports files that contain numbers that could pose a security threat. Remember, it looks for strings of numbers, and the typical computer has lots of these.


Requirements

Java 1.5 JRE

No matter what system you're running on, you need the Java 1.5 runtime (or later). You do not need the whole Java 1.5 SDK (which also includes the runtime).

System path

We assume that the Java interpreter is in your environment path (meaning that the java command runs no matter which folder you run it from). The JRE installer should modify your system path to include the Java interpreter.

If you get a strange error message saying something along the lines of "If you see this, Senf did not run!" then chances are your path is not set up correctly. Unfortunately, the solution to this is beyond the scope of this text.


Installation

Once the JRE is installed, all you need to do is copy senf.jar and the seeds folder to some folder on the computer that is going to run the scan. You might also want to copy the configuration files to the same folder.


Running

Brief Note

On some Operating Systems (typically Windows and Mac OS X), simply double-clicking on the senf.jar file will launch the program automatically. If this works, you can skip the rest of this section.

Windows

  1. Open a command prompt
  2. Navigate to the folder in which Senf is installed.
  3. Run java -jar senf.jar (with optional arguments)

Linux and Mac OS X

  1. Open a command shell
  2. Navigate to the folder in which Senf is installed.
  3. Run java -jar senf.jar (with optional arguments)

Using Senf

Usage: senf [OPTIONS]

Option            Default          Effect
------------------------------------------------------------------------
-q                off              quiet mode (display no output)
-v                off              verbose mode (display everything)
-e                off              print error messages to the screen
-p <scan path>    working dir      Set the path to start scanning from
-l <yyyyMMdd>     off              Set modified-date check; files last modified before this date are skipped
-f <filesize>     infinite         Set the max file size to scan; end size (no spaces) with 'g' for gigs, 'm' for megs, 'k' for kilobytes, and nothing for bytes
-m <number>       15               Set minimum number of times to match a CCN/SSN pattern before reporting a file
-o <log file>     senf_DATE.txt    Set the name of the file (including path, if you like) where log information will be saved
-al               on               Append the current log to the end of the file if it already exists
-ac               off              Append configuration information to the end of the output log
-nl               off              Do not use a log file
-g                off              Hide the GUI
-as               off              Auto-start scanning (ignored when -g is specified)
-h                n/a              Display this help and exit
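To make the -f and -l values concrete, here is a minimal Java sketch of how they could be interpreted. It is not Senf's actual option parser: the class OptionValues and its methods are hypothetical, and the sketch assumes that 'k', 'm' and 'g' are 1024-based units and that the -l date is read in the local time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helpers for interpreting -f and -l values; not Senf's real parser.
final class OptionValues {

    /**
     * Converts a -f value such as "100m", "2g", "512k" or "4096" to bytes.
     * Assumes 1024-based units; Senf itself may use different multipliers.
     */
    static long parseFileSize(String arg) {
        if (arg == null || arg.length() == 0) {
            throw new IllegalArgumentException("empty file size");
        }
        char unit = arg.charAt(arg.length() - 1);
        long multiplier;
        switch (unit) {
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'k': multiplier = 1024L; break;
            default:  multiplier = 1L; // no suffix: the value is already in bytes
        }
        String number = (multiplier == 1L) ? arg : arg.substring(0, arg.length() - 1);
        return Long.parseLong(number) * multiplier; // NumberFormatException on bad input
    }

    /**
     * Returns true when a file last modified at lastModifiedMillis falls before the
     * -l cutoff (the start of that day; local time zone assumed).
     */
    static boolean modifiedBefore(long lastModifiedMillis, String yyyyMMdd) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setLenient(false); // reject impossible dates such as 20230231
        try {
            Date cutoff = fmt.parse(yyyyMMdd);
            return lastModifiedMillis < cutoff.getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("expected yyyyMMdd, got " + yyyyMMdd, e);
        }
    }
}
```

Under these assumptions, -f 100m would skip any file whose File.length() exceeds parseFileSize("100m"), which is 104857600 bytes, and -l 20230101 would skip a file when modifiedBefore(file.lastModified(), "20230101") returns true.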

By default, Senf prints only the matched files to the screen; not all output is shown.

Examples

  • To search all files in your home directory in Linux/Mac OS X
    • java -jar senf.jar -p ~/
  • To search all files in your home directory in Windows XP
    • java -jar senf.jar -p "C:\Documents and Settings\<yourname>"
  • To scan only files <= 100MB, report only files with at least 12 matches, display error messages, and start in a folder called C:\mustard\gruga
    • java -jar senf.jar -f 100m -m 12 -e -p "C:\mustard\gruga"

Also, note that this program may take a while to complete. Again, by default the only things it prints to the screen are possible matches (i.e., no errors), so it may go a long time without printing anything and look frozen, but it (probably) isn't.

As of the Sasuke.188 release, Senf provides a GUI for ease of use. The GUI offers a results viewer to help the user quickly identify what was flagged by Senf as being sensitive. Results appear in the central pane of the Senf window as they are found; if an entry is clicked on, the Senf Analyzer will pop up, showing the applicable matches in the file.


Configuration files

Configuration

Senf uses the file senf.conf to load default settings.

Extensions

As of the Haku version, Senf uses an ACL in place of the old whitelist/blacklist system. The ACL is contained in the file senf.acl, and can be modified either by editing the file, or through the Senf GUI.

ACL entries have three columns. The first column denotes whether to allow or deny matches, the second dictates what type of match to look for, and the third contains the expression to search for. The possible values for each column are listed below.

COLUMN 1    COLUMN 2      COLUMN 3
----------------------------------------
ALLOW       BEGINSWITH    <user_defined_expression>
DENY        CONTAINS
            ENDSWITH
            EXACTLY
            REGEX

An example "senf.acl" file with common entries is included. If two entries conflict, the one listed first overrides the later one.
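The first-entry-wins rule can be sketched in Java. This is not Senf's implementation: the class Acl is hypothetical, and it assumes that each line of senf.acl holds the three columns separated by whitespace, that entries are tested against a file's name, that ALLOW means "scan" and DENY means "skip", and that a name matching no entry is allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of first-match-wins ACL evaluation; not Senf's implementation.
final class Acl {
    enum Action { ALLOW, DENY }
    enum MatchType { BEGINSWITH, CONTAINS, ENDSWITH, EXACTLY, REGEX }

    static final class Entry {
        final Action action;
        final MatchType type;
        final String expression;

        Entry(Action action, MatchType type, String expression) {
            this.action = action;
            this.type = type;
            this.expression = expression;
        }

        boolean matches(String name) {
            switch (type) {
                case BEGINSWITH: return name.startsWith(expression);
                case CONTAINS:   return name.contains(expression);
                case ENDSWITH:   return name.endsWith(expression);
                case EXACTLY:    return name.equals(expression);
                case REGEX:      return Pattern.matches(expression, name); // whole name must match
                default:         throw new AssertionError(type);
            }
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    /** Adds one line such as "DENY ENDSWITH .mp3" (whitespace-separated columns assumed). */
    void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.length() == 0) return; // ignore blank lines
        String[] cols = trimmed.split("\\s+", 3); // the third column keeps any inner spaces
        if (cols.length != 3) throw new IllegalArgumentException("bad ACL line: " + line);
        entries.add(new Entry(Action.valueOf(cols[0]), MatchType.valueOf(cols[1]), cols[2]));
    }

    /** The first entry that matches decides; names matching nothing are allowed (assumed default). */
    boolean allows(String fileName) {
        for (Entry e : entries) {
            if (e.matches(fileName)) return e.action == Action.ALLOW;
        }
        return true;
    }
}
```

With the entries ALLOW EXACTLY notes.mp3 followed by DENY ENDSWITH .mp3, notes.mp3 is allowed because its entry comes first, while song.mp3 is denied.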


Libraries

Senf is now able to use external JAR libraries to scan more file types than before; the most notable example is PDF files. Senf uses PDFBox to parse PDF files properly. Note that the PDFBox library depends on the FontBox library.

These libraries should be placed in the "libs" folder in order for Senf to function properly. The JAR files may be given any name and will still work.
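One common way for a Java program to pick up every JAR in a folder, whatever the files are named, is to list the folder and hand the JARs to a URLClassLoader. The sketch below illustrates that general technique; it is an assumption, not Senf's actual loading code, and LibsLoader is a hypothetical name.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: load every *.jar in a folder, regardless of its file name.
final class LibsLoader {

    /** Returns URLs for all .jar files directly inside libsDir (none if the folder is missing). */
    static URL[] jarUrls(File libsDir) {
        List<URL> urls = new ArrayList<URL>();
        File[] files = libsDir.listFiles();
        if (files == null) return new URL[0]; // folder missing or unreadable
        for (File f : files) {
            if (f.isFile() && f.getName().toLowerCase().endsWith(".jar")) {
                try {
                    urls.add(f.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return urls.toArray(new URL[urls.size()]);
    }

    /** A class loader that sees the JARs in libsDir in addition to the application's own classes. */
    static ClassLoader loaderFor(File libsDir) {
        return new URLClassLoader(jarUrls(libsDir), LibsLoader.class.getClassLoader());
    }
}
```

A class from one of those JARs could then be loaded with Class.forName(className, true, LibsLoader.loaderFor(new File("libs"))); because the folder is scanned by extension, the file names themselves do not matter.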


How Senf Works

The way Senf scans changed drastically with the release of Haku. There are four important parts to Senf Haku: Parsers, Seeds, Streams, and Stream Sources.

Streams

A Stream is something that Senf can scan, and implements the class SenfStream. An example of a "Stream" is a text file.

Stream Sources

A Stream Source is something that contains streams, and implements the class SenfStreamSource. An example of a Stream Source is a directory or a zip file.

Seeds

A Seed is something that Senf will look for in a Stream. Seeds implement the class Seed. As of Senf Sasuke.188, Seeds are modular. This means that seeds may be added/removed from the "seeds" directory to modify what Senf will or will not search for within a Stream. At the moment, Senf includes a Seed for both Social Security Numbers and Credit Card numbers.

Parsers

Parsers are the objects that tell the Senf engine what each object to be scanned should be scanned as. That is, the Parser tells Senf what type of Stream or StreamSource each SenfObject should be cast as. Parsers implement the class SenfParser, and are modular.
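To show how the four parts could fit together, here is a Java sketch. Only the type names SenfStream, SenfStreamSource, Seed and SenfParser come from this README; every method signature, plus the helper classes StringStream and MiniEngine, is hypothetical and will differ from Senf's real code. The sketch also handles only one level of Stream Source, where a real scanner would recurse into nested sources (a zip file inside a directory, for example).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: something Senf can scan (for example, a text file). */
interface SenfStream {
    String name();
    Reader open() throws IOException;
}

/** Hypothetical: something that contains streams (for example, a directory or zip file). */
interface SenfStreamSource {
    List<SenfStream> streams() throws IOException;
}

/** Hypothetical: a pattern to look for; returns how many times it occurs in the text. */
interface Seed {
    int countMatches(String text);
}

/** Hypothetical: returns a SenfStream, a SenfStreamSource, or null to skip the object. */
interface SenfParser {
    Object parse(Object senfObject);
}

/** An in-memory stream, handy for trying the pieces out. */
final class StringStream implements SenfStream {
    private final String name;
    private final String text;
    StringStream(String name, String text) { this.name = name; this.text = text; }
    public String name() { return name; }
    public Reader open() { return new StringReader(text); }
}

final class MiniEngine {
    /** Returns names of streams whose total seed matches reach minMatches (compare the -m option). */
    static List<String> scan(SenfParser parser, Object root, List<Seed> seeds, int minMatches) {
        List<SenfStream> toScan = new ArrayList<SenfStream>();
        Object parsed = parser.parse(root); // the Parser decides what the object is scanned as
        try {
            if (parsed instanceof SenfStreamSource) {
                toScan.addAll(((SenfStreamSource) parsed).streams());
            } else if (parsed instanceof SenfStream) {
                toScan.add((SenfStream) parsed);
            } // null or anything else: the object is skipped
        } catch (IOException e) {
            return new ArrayList<String>(); // unreadable source: nothing to report
        }
        List<String> flagged = new ArrayList<String>();
        for (SenfStream stream : toScan) {
            String text;
            try {
                text = readAll(stream.open());
            } catch (IOException e) {
                continue; // skip streams that cannot be read
            }
            int total = 0;
            for (Seed seed : seeds) total += seed.countMatches(text);
            if (total >= minMatches) flagged.add(stream.name());
        }
        return flagged;
    }

    private static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) sb.append(buf, 0, n);
        } finally {
            reader.close();
        }
        return sb.toString();
    }
}
```

Keeping Seeds behind a small interface is what lets them be added or removed independently of the engine, which matches the modular design described above.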


Algorithms

Senf looks for certain patterns to reduce false positives. Those patterns are described here. These patterns cannot be used to find every conceivable incarnation of the numbers Senf searches for. However, if you have suggestions for improving the algorithms (and, better, known false negatives to back up your suggestions) please let us know.

Credit card numbers

Formats

There are a number of valid credit card formats. Senf supports only the 16-digit formats. This includes MasterCard, some (but evidently not all) VISA, and Discover. It does NOT include, for example, American Express.

Separators

Credit card numbers may be one long string of digits (nnnnnnnnnnnnnnnn), or may be separated into groups of four digits (nnnn-nnnn-nnnn-nnnn). There are, of course, as many ways to delimit groups of digits as can be imagined; Senf only counts matches that use either no separator, or only one of:

  • dash ("-")
  • space (" ")
  • dot (".")
  • pipe ("|")
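A regular expression is one way to express the 16-digit format and separator rules above. The Java sketch below is an illustration, not Senf's own pattern: it reads "only one of" as "the same separator between every group", and the class name CardPattern is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the 16-digit format and separator rules; not Senf's own pattern.
final class CardPattern {
    // Four groups of four digits. Group 1 captures the first separator (dash, space, dot,
    // pipe, or nothing), and each \1 backreference requires that same separator again.
    // The lookarounds keep a match from starting or ending inside a longer run of digits.
    static final Pattern CCN =
            Pattern.compile("(?<!\\d)\\d{4}([-. |]?)\\d{4}\\1\\d{4}\\1\\d{4}(?!\\d)");

    /** Returns every candidate found in text, with separators stripped (16 bare digits each). */
    static List<String> candidates(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = CCN.matcher(text);
        while (m.find()) {
            found.add(m.group().replaceAll("[-. |]", ""));
        }
        return found;
    }
}
```

With this reading, "4111-1111-1111-1111" is a candidate but "4111 1111-1111 1111" is not, because its separators are mixed. Each candidate would still have to pass the Luhn check before counting as a credit card number.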

Luhn check

Credit card numbers must pass a Luhn mod 10 check to be considered valid.
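The Luhn mod 10 check works from right to left: every second digit, starting with the second-to-last, is doubled, 9 is subtracted from any doubled value above 9, and the number passes if the total is a multiple of 10. A minimal Java sketch follows; the class Luhn is hypothetical, not part of Senf.

```java
// Hypothetical sketch of the standard Luhn mod 10 check.
final class Luhn {
    /** Returns true when the string of digits has a valid Luhn checksum. */
    static boolean passes(String digits) {
        if (digits == null || digits.length() == 0) return false;
        int sum = 0;
        boolean doubleIt = false; // the rightmost (check) digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') return false; // only bare digits are accepted
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of the product
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

For instance, the widely used test number 4111111111111111 passes (its weighted sum is 30), while changing its last digit to 2 makes the sum 31 and the check fails.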

Social Security Numbers

Formats and separators

Socials are detected in both single string (nnnnnnnnn) and grouped (nnn-nn-nnnn) formats; permitted separators are the same as credit card numbers.
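The same approach as for credit cards can express the SSN layout. This Java sketch is illustrative only: SsnPattern is a hypothetical name, and it again assumes that the one permitted separator must be used consistently (or not at all).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the SSN format and separator rules; not Senf's own pattern.
final class SsnPattern {
    // Area (3 digits), group (2 digits), serial (4 digits). Group 2 captures the first
    // separator (or nothing), and \2 requires the same separator before the serial.
    // The lookarounds reject matches embedded in longer runs of digits.
    static final Pattern SSN =
            Pattern.compile("(?<!\\d)(\\d{3})([-. |]?)(\\d{2})\\2(\\d{4})(?!\\d)");

    /** Returns every candidate found in text as nine bare digits. */
    static List<String> candidates(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = SSN.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + m.group(3) + m.group(4));
        }
        return found;
    }
}
```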

Validity checking

Socials are verified against their area (the first three digits), according to the Social Security Administration's current list of valid high groups. In addition, group and serial numbers may not be all zeroes.
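The validity rules can be sketched as follows. The SSA high-group data is not reproduced here, so this hypothetical SsnCheck class takes the set of acceptable area numbers as a parameter, and it does not attempt the finer-grained high-group comparison; it only illustrates the area lookup and the rejection of all-zero group and serial numbers.

```java
import java.util.Set;

// Hypothetical sketch of the SSN validity rules; the set of valid areas must come from SSA data.
final class SsnCheck {
    /**
     * digits: nine bare digits (area + group + serial).
     * validAreas: three-digit area numbers treated as valid, supplied by the caller.
     */
    static boolean isPlausible(String digits, Set<String> validAreas) {
        if (digits == null || !digits.matches("\\d{9}")) return false;
        String area = digits.substring(0, 3);
        String group = digits.substring(3, 5);
        String serial = digits.substring(5, 9);
        if (!validAreas.contains(area)) return false; // area not on the supplied list
        return !group.equals("00") && !serial.equals("0000"); // neither may be all zeros
    }
}
```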


Okay, that's all

Thank you for using Senf! Feedback and questions are welcome; email security@utexas.edu

This version of the Senf README was updated for Senf.haku.335.