User Guide

This guide takes an in-depth look at Pattern Analyzer's features and interfaces: the command-line tool, configuration files, the web UI, and the JSON report format.

1. Command-Line Interface (CLI)

The patternanalyzer CLI is the primary tool for automated analysis. The main command is analyze.

Basic Usage

patternanalyzer analyze <input_file> [options]

Key Options

-o, --out <path>: Specifies the path for the output JSON report. Defaults to report.json.
-c, --config <path>: Path to a JSON or YAML configuration file to customize the analysis pipeline.
--profile <name>: Use a built-in analysis profile. Available profiles: quick, nist, crypto, full. This is an easy way to run a focused set of tests.
--xor-value <0-255>: Applies a single-byte XOR transformation to the data before running tests.
--html-report <path>: Generates a standalone HTML report in addition to the JSON output.
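
To make the --xor-value option concrete, here is a minimal Python sketch of a single-byte XOR applied to a byte string. The function name xor_single_byte is hypothetical and this is not Pattern Analyzer's own code; it only illustrates the transformation the option describes.

```python
def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with the same one-byte key (illustrative helper)."""
    if not 0 <= key <= 255:
        raise ValueError("key must be in the range 0-255")
    return bytes(b ^ key for b in data)


sample = b"ABC"
transformed = xor_single_byte(sample, 0x55)
print(transformed.hex())

# XOR is its own inverse: applying the same key again restores the input.
restored = xor_single_byte(transformed, 0x55)
print(restored == sample)
```

Because the operation is self-inverse, running the analysis with the key that was used to obscure the data exposes the original bytes to the tests.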

Example with a profile and HTML report:

patternanalyzer analyze suspicious.dat --profile crypto --html-report report.html
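
For automated use from a script, the same command can be assembled programmatically. The sketch below is one possible approach in Python, using only the options listed above; build_analyze_command is a hypothetical helper, not part of Pattern Analyzer.

```python
import shlex


def build_analyze_command(input_file, out=None, config=None, profile=None,
                          xor_value=None, html_report=None):
    """Build the argument list for `patternanalyzer analyze` (hypothetical helper)."""
    cmd = ["patternanalyzer", "analyze", input_file]
    if out is not None:
        cmd += ["--out", out]
    if config is not None:
        cmd += ["--config", config]
    if profile is not None:
        cmd += ["--profile", profile]
    if xor_value is not None:
        if not 0 <= xor_value <= 255:
            raise ValueError("xor_value must be in the range 0-255")
        cmd += ["--xor-value", str(xor_value)]
    if html_report is not None:
        cmd += ["--html-report", html_report]
    return cmd


cmd = build_analyze_command("suspicious.dat", profile="crypto",
                            html_report="report.html")
print(shlex.join(cmd))

# To actually run it (requires patternanalyzer on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when file names contain spaces.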

2. Configuration Files

For full control over the analysis, you can use a configuration file in YAML or JSON format.

Structure

A configuration file can specify transforms, tests, and global settings.

Example (config.yml):

# 1. A list of transformations to apply in sequence
transforms:
  - name: xor_const
    params:
      xor_value: 85 # 0x55 in decimal

# 2. A list of tests to run on the transformed data
tests:
  - name: monobit
  - name: runs
    params:
      min_bits: 100 # Custom parameter for the runs test
  - name: ecb_detector
    params:
      block_size: 16

# 3. Global settings
fdr_q: 0.05 # False Discovery Rate significance level (q-value)

To use this file:

patternanalyzer analyze my_file.bin --config config.yml
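
Since configuration files may also be JSON, the sketch below generates a JSON counterpart of the YAML example using Python's standard json module. It assumes the JSON form uses the same keys and nesting as the YAML form, which this guide does not state explicitly.

```python
import json

# Same settings as the YAML example, expressed as a Python dict.
# Assumption: the JSON config mirrors the YAML keys one-to-one.
config = {
    "transforms": [
        {"name": "xor_const", "params": {"xor_value": 85}},  # 85 == 0x55
    ],
    "tests": [
        {"name": "monobit"},
        {"name": "runs", "params": {"min_bits": 100}},
        {"name": "ecb_detector", "params": {"block_size": 16}},
    ],
    "fdr_q": 0.05,
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

Saving the printed text as config.json and passing it with --config config.json should then be equivalent to using config.yml, under the assumption above.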

3. Web User Interface (Streamlit)

The web UI provides an interactive way to upload files, select tests and transforms, and view results, including visualizations.

To launch the Web UI:

streamlit run app.py

Or, if the [ui] extras are installed:

patternanalyzer serve-ui

Navigate to the URL shown in your terminal to access the interface.

4. Interpreting the Results

The JSON output from an analysis contains three main sections: results, scorecard, and meta.

results: A list where each item is the detailed output of a single test plugin. Key fields include:
  test_name: The name of the test.
  status: completed, skipped, or error.
  p_value: The p-value from the statistical test. A low p-value (e.g., < 0.01) suggests the data is not random according to this test. A value of null means the test is diagnostic and doesn't produce a p-value.
  fdr_rejected: true if the test's p-value was deemed significant after correcting for multiple comparisons (False Discovery Rate). This is the primary indicator of a "failed" test.
  metrics: A dictionary of test-specific measurements and statistics.
scorecard: A high-level summary of the entire analysis.
  failed_tests: The number of tests where fdr_rejected was true. This is the most important summary metric.
  total_tests: Total number of tests that were run.
  p_value_distribution: Statistics on the distribution of p-values from all statistical tests.
meta: Information about the analysis environment, including Python version, library versions, and a hash of the input data for reproducibility.
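
The fields described above can be read with a few lines of Python. The sketch below parses a small, hypothetical report (the values are made up for illustration), lists the tests flagged by fdr_rejected, and separates out diagnostic tests whose p_value is null. It also recomputes the FDR decision with the Benjamini-Hochberg procedure; that choice is an assumption, since this guide does not say which FDR method Pattern Analyzer uses.

```python
import json

# Hypothetical report contents for illustration; real reports will differ.
SAMPLE_REPORT = """
{
  "results": [
    {"test_name": "monobit", "status": "completed", "p_value": 0.001,
     "fdr_rejected": true, "metrics": {}},
    {"test_name": "runs", "status": "completed", "p_value": 0.40,
     "fdr_rejected": false, "metrics": {}},
    {"test_name": "ecb_detector", "status": "completed", "p_value": null,
     "fdr_rejected": false, "metrics": {}}
  ],
  "scorecard": {"failed_tests": 1, "total_tests": 3},
  "meta": {}
}
"""


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at FDR level q.

    Assumption: Benjamini-Hochberg is used here only as a common FDR method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # largest rank that meets its threshold
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def summarize(report):
    """Return (failed test names, diagnostic test names) from a report dict."""
    failed, diagnostic = [], []
    for result in report.get("results", []):
        if result.get("status") != "completed":
            continue  # skipped or errored tests carry no verdict
        if result.get("p_value") is None:
            diagnostic.append(result["test_name"])
        elif result.get("fdr_rejected"):
            failed.append(result["test_name"])
    return failed, diagnostic


report = json.loads(SAMPLE_REPORT)
failed, diagnostic = summarize(report)
print("failed:", failed)
print("diagnostic (no p-value):", diagnostic)

# Cross-check the reported flags against a Benjamini-Hochberg recomputation.
statistical = [r for r in report["results"] if r.get("p_value") is not None]
recomputed = benjamini_hochberg([r["p_value"] for r in statistical], q=0.05)
for r, flag in zip(statistical, recomputed):
    print(r["test_name"], "reported:", r["fdr_rejected"], "recomputed:", flag)
```

For a real analysis, replace the sample string with the contents of the report file, for example json.load(open("report.json")). If the recomputed flags differ from the reported ones, the tool may be using a different correction method or a different fdr_q.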