rockfish hunt

Run graph-based behavioral threat detection on Parquet flow data.

Overview

The hunt engine builds a communication graph from network flow data and applies configurable detection algorithms to identify threats beyond signature matching — C2 beaconing, lateral movement, data exfiltration, and more.

Findings are scored with anomaly detection models (HBOS or Isolation Forest), assigned severity levels, and mapped to MITRE ATT&CK tactics.

Usage

rockfish hunt [OPTIONS]

Detection Types

| Detection | Description | MITRE Tactic |
|---|---|---|
| beaconing | C2 callbacks via inter-connection timing regularity | Command and Control |
| lateral | Multi-hop internal attack chains (A -> B -> C) | Lateral Movement |
| fanout | Single external IP contacted by many internal hosts | Command and Control |
| portscan | Hosts probing many unique ports on a target | Discovery |
| community | Botnet-like clusters via graph components | Command and Control |
| exfiltration | Asymmetric flows with disproportionate outbound volume | Exfiltration |
| dns_tunneling | DNS queries with long or encoded subdomains | Command and Control |
| new_connection | Source-destination pairs absent from 7-day baseline | Initial Access |
| polling_disruption | Interruption of periodic communication | Impact |
| baseline_deviation | Volume or pattern shifts vs. historical norms | Discovery |

Select Specific Detections

rockfish hunt -d /data --sensor my-sensor --hive \
  --detections beaconing,lateral,fanout

Output

Parquet (default)

Findings are written to {data-dir}/{sensor}/hunt/*.parquet for ingestion into the report.

Stdout

# JSON output
rockfish hunt -d /data --sensor my-sensor --stdout

# Pretty-printed JSON
rockfish hunt -d /data --sensor my-sensor --stdout --pretty

# Table format
rockfish hunt -d /data --sensor my-sensor --stdout --format table

Severity Filtering

# Only high and critical findings
rockfish hunt -d /data --sensor my-sensor --min-severity high

Tuning

| Option | Default | Description |
|---|---|---|
| --min-beacon-connections | auto | Minimum connections for beacon detection |
| --max-beacon-cv | auto | Maximum coefficient of variation |
| --min-fanout-sources | auto | Minimum internal sources for C2 fanout |
| --min-portscan-ports | auto | Minimum unique ports for port scan |
| --min-community-size | auto | Minimum nodes for community detection |
| --internal-networks | RFC 1918 | Internal network CIDRs |
| --scoring-method | hbos | Anomaly scoring: hbos or iforest |

Anomaly Scoring Algorithms

All detections produce findings that are scored using one of two anomaly detection methods. When a detection has 5 or more candidate groups, statistical scoring is applied; otherwise, fixed severity thresholds are used.

HBOS (Histogram-Based Outlier Scoring)

The default scoring method. HBOS builds equal-width histograms for each feature dimension, then scores each observation based on how rare its bin is.

How it works:

  1. For each feature (e.g., coefficient of variation, connection count, byte ratio), divide the observed range into 10 equal-width bins
  2. Count the number of observations in each bin to compute density: density = count_in_bin / total_observations
  3. Apply a floor to prevent log(0): density = max(density, 0.5 / total)
  4. Compute per-feature score: score = -log10(density) — rarer bins produce higher scores
  5. Sum all feature scores for the final anomaly score
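The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the function name `hbos_scores`, the row layout (one list of feature values per candidate), and the handling of constant features are all hypothetical.

```python
import math

def hbos_scores(rows, n_bins=10):
    """Return one HBOS anomaly score per row (a list of numeric features)."""
    n = len(rows)
    scores = [0.0] * n
    for f in range(len(rows[0])):
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        # Step 1: equal-width bins; fall back to width 1.0 for constant features.
        width = (hi - lo) / n_bins or 1.0
        bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
        # Step 2: count observations per bin.
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        for i, b in enumerate(bins):
            # Step 3: density with the 0.5 / total floor.
            density = max(counts[b] / n, 0.5 / n)
            # Steps 4 and 5: per-feature score, summed across features.
            scores[i] += -math.log10(density)
    return scores

# Hypothetical candidates: (beacon CV, connection count); the last row is the outlier.
rows = [[0.50, 12], [0.48, 14], [0.52, 11], [0.49, 13], [0.51, 12], [0.01, 400]]
scores = hbos_scores(rows)
```

The last row lands alone in its bin for both features, so it receives the highest summed score.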

Properties:

  • Assumes feature independence (no cross-feature correlations)
  • O(n) time complexity — fast, single-pass histogram construction
  • Supports feature inversion for metrics where lower values are more suspicious (e.g., beacon CV where 0.01 is more suspicious than 0.5)

Isolation Forest (iForest)

An ensemble method that isolates anomalies using random partitioning trees.

How it works:

  1. Build 100 random isolation trees, each sampling 256 data points
  2. Each tree recursively partitions data by randomly selecting a feature and split value until each point is isolated (or max depth is reached)
  3. For each observation, compute the average path length across all trees — anomalies are isolated in fewer splits
  4. Convert to anomaly score: score = 2^(-avg_path_length / c(n)), where c(n) is the average path length of an unsuccessful search in a binary search tree built from n points
  5. Transform to match HBOS scale: final_score = -log10(1 - raw_score)
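The score conversion in steps 4 and 5 can be sketched as follows. The normalizer `c(n)` uses the standard Isolation Forest definition with the harmonic number approximated as ln(i) + 0.5772. The function names and the clamp that avoids log(0) are assumptions made for this sketch; the source does not say how the engine handles a raw score of exactly 1.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def raw_score(avg_path_length, n=256):
    """Step 4: shorter average paths give scores closer to 1."""
    return 2.0 ** (-avg_path_length / c(n))

def final_score(avg_path_length, n=256):
    """Step 5: map onto a -log10 scale comparable to HBOS."""
    raw = min(raw_score(avg_path_length, n), 1.0 - 1e-12)  # assumed clamp
    return -math.log10(1.0 - raw)

# A point isolated after 2 splits scores higher than one needing 10 splits.
easy, hard = final_score(2.0), final_score(10.0)
```

A point whose average path length equals c(n) gets a raw score of exactly 0.5, which is the usual dividing line between normal and anomalous observations.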

Properties:

  • Captures cross-feature interactions (unlike HBOS)
  • More robust to feature scaling
  • Higher computational cost than HBOS
  • Deterministic (seed = 42 for reproducibility)

Select scoring method:

rockfish hunt -d /data --sensor my-sensor --scoring-method iforest

Severity Mapping

Anomaly scores are mapped to severity levels using percentile-based thresholds across the entire finding population:

| Percentile | Severity |
|---|---|
| ≥ 95th | Critical |
| ≥ 85th | High |
| ≥ 70th | Medium |
| < 70th | Low |

If the maximum score across all findings is below 2.0, severity is capped at Medium to suppress false positives in benign environments.
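A minimal sketch of this mapping, including the cap, is shown below. The percentile-rank definition (share of findings with a score less than or equal to the current one) and the function name `assign_severities` are assumptions; the source does not state how ties or ranks are computed.

```python
import bisect

def assign_severities(scores):
    """Map anomaly scores to severities by percentile rank in the population."""
    ordered = sorted(scores)
    n = len(scores)
    cap_at_medium = max(scores) < 2.0  # benign-environment cap
    result = []
    for s in scores:
        # Assumed rank definition: percentage of findings scoring <= s.
        rank = 100.0 * bisect.bisect_right(ordered, s) / n
        if rank >= 95:
            severity = "critical"
        elif rank >= 85:
            severity = "high"
        elif rank >= 70:
            severity = "medium"
        else:
            severity = "low"
        if cap_at_medium and severity in ("critical", "high"):
            severity = "medium"
        result.append(severity)
    return result

noisy = assign_severities([0.5 * i for i in range(1, 21)])   # max score 10.0
quiet = assign_severities([0.1 * i for i in range(1, 11)])   # max score 1.0
```

In the `quiet` population no score reaches 2.0, so nothing is rated above Medium even though some findings sit above the 95th percentile.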

Detection Algorithm Details

Beaconing

Detects C2 callbacks by measuring the regularity of connection intervals.

  1. Group connections by (src_ip, dest_ip, dest_port)
  2. Compute inter-arrival time intervals between consecutive connections
  3. Calculate the coefficient of variation: CV = stddev / mean
  4. A perfect beacon has CV ≈ 0; memoryless (Poisson-like) traffic has CV ≈ 1.0

Scoring features: CV (inverted), connection count, mean interval, byte consistency (CV of payload sizes)

| Threshold | Severity |
|---|---|
| CV < 0.05, connections > 50 | Critical |
| CV < 0.1 | High |
| CV ≤ 0.2 (max threshold) | Medium |
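The grouping, interval, and CV steps, together with the fixed thresholds, can be sketched as below. The function names, the tuple layout of `flows`, and the use of population standard deviation are assumptions for illustration only.

```python
import statistics
from collections import defaultdict

def beacon_candidates(flows):
    """flows: iterable of (src_ip, dest_ip, dest_port, timestamp_seconds)."""
    groups = defaultdict(list)
    for src, dest, port, ts in flows:
        groups[(src, dest, port)].append(ts)
    results = {}
    for key, stamps in groups.items():
        stamps.sort()
        intervals = [b - a for a, b in zip(stamps, stamps[1:])]
        if len(intervals) < 2:
            continue  # too few connections to measure regularity
        mean = statistics.fmean(intervals)
        if mean <= 0:
            continue
        cv = statistics.pstdev(intervals) / mean  # CV = stddev / mean
        results[key] = (cv, len(stamps))
    return results

def beacon_severity(cv, connections):
    """Fixed thresholds from the table; None means below the 0.2 cutoff."""
    if cv < 0.05 and connections > 50:
        return "critical"
    if cv < 0.1:
        return "high"
    if cv <= 0.2:
        return "medium"
    return None

# Hypothetical data: one host calling out every 60 s, one with irregular timing.
flows = [("10.0.0.5", "203.0.113.9", 443, 60.0 * i) for i in range(60)]
flows += [("10.0.0.6", "198.51.100.7", 80, t) for t in (0, 10, 100, 130, 400)]
found = beacon_candidates(flows)
```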

Lateral Movement

Detects multi-hop attack chains where internal hosts are progressively compromised.

  1. Build a temporal adjacency graph: src → [(dest, timestamp)]
  2. Identify pivot hosts (both source and destination)
  3. For each pivot, look for inbound → outbound sequences within a 1-hour window
  4. Extend chains recursively (up to 10 hops)

| Chain Length | Severity |
|---|---|
| ≥ 5 hops | Critical |
| ≥ 4 hops | High |
| ≥ 3 hops (minimum) | Medium |
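A rough sketch of the chain search follows. It assumes each edge counts as one hop and that every onward connection must start after the inbound one and within the 1-hour window; the source does not spell out either detail, and the function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600
MAX_HOPS = 10

def build_adjacency(flows):
    """flows: iterable of (src, dest, timestamp_seconds) between internal hosts."""
    adjacency = defaultdict(list)
    for src, dest, ts in flows:
        adjacency[src].append((dest, ts))
    return adjacency

def longest_chain(adjacency, host, arrived_at=None, visited=None, hops=0):
    """Return the longest host chain starting at `host` as a list."""
    visited = visited or {host}
    best = [host]
    if hops >= MAX_HOPS:
        return best
    for dest, ts in adjacency.get(host, []):
        if dest in visited:
            continue  # avoid revisiting hosts already in the chain
        if arrived_at is not None and not (arrived_at < ts <= arrived_at + WINDOW_SECONDS):
            continue  # outbound flow must follow the inbound one within 1 hour
        tail = longest_chain(adjacency, dest, ts, visited | {dest}, hops + 1)
        if len(tail) + 1 > len(best):
            best = [host] + tail
    return best

flows = [("A", "B", 0), ("B", "C", 600), ("C", "D", 1200), ("B", "E", 10000)]
chain = longest_chain(build_adjacency(flows), "A")
```

The B to E flow is ignored because it starts more than an hour after A reached B, so the reported chain has three hops.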

C2 Fanout

Detects a single external IP receiving connections from many internal hosts (botnet controller pattern).

| Unique Sources | Severity |
|---|---|
| ≥ 20 internal hosts | Critical |
| ≥ 10 | High |
| ≥ 5 (minimum) | Medium |
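Counting distinct internal sources per external destination is enough to illustrate the idea. The sketch below uses Python's standard `ipaddress` module with the RFC 1918 ranges as the internal networks; the function names and the flow tuple layout are assumptions.

```python
import ipaddress
from collections import defaultdict

INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

def fanout(flows, min_sources=5):
    """flows: iterable of (src_ip, dest_ip). Returns {external_ip: source_count}."""
    sources = defaultdict(set)
    for src, dest in flows:
        if is_internal(src) and not is_internal(dest):
            sources[dest].add(src)
    return {dest: len(s) for dest, s in sources.items() if len(s) >= min_sources}

# Hypothetical flows: six internal hosts contact the same external address.
flows = [(f"10.0.0.{i}", "203.0.113.9") for i in range(1, 7)]
flows += [("10.0.0.1", "10.0.0.2"), ("10.0.0.1", "198.51.100.7"), ("10.0.0.2", "198.51.100.7")]
hits = fanout(flows)
```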

Port Scan

Detects hosts probing many ports on a target.

  1. Count distinct destination ports per (src_ip, dest_ip) pair
  2. Detect sequential port runs (e.g., 80-84) and compute sequential_ratio
  3. Compute scan rate (ports per second)

Scoring features: unique ports, flow count, sequential ratio, scan rate

| Unique Ports | Severity |
|---|---|
| ≥ 100 | Critical |
| ≥ 50 | High |
| ≥ 25 (minimum) | Medium |
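The scoring features can be derived from the events of a single (src_ip, dest_ip) pair as sketched below. The exact definition of sequential_ratio is not given in the source; this sketch assumes it is the share of unique ports that are adjacent to another observed port. The function name and input layout are also hypothetical.

```python
def portscan_features(events):
    """events: list of (timestamp_seconds, dest_port) for one (src_ip, dest_ip) pair."""
    ports = sorted({port for _, port in events})
    port_set = set(ports)
    # Assumed definition: a port is "sequential" if a neighbouring port was also probed.
    in_run = sum(1 for p in ports if p - 1 in port_set or p + 1 in port_set)
    times = [ts for ts, _ in events]
    duration = max(times) - min(times)
    return {
        "unique_ports": len(ports),
        "flow_count": len(events),
        "sequential_ratio": in_run / len(ports),
        # Ports per second; a zero-length window falls back to the port count.
        "scan_rate": len(ports) / duration if duration > 0 else float(len(ports)),
    }

# Hypothetical probe of ports 80-84 plus two unrelated ports over four seconds.
events = [(0.5 * i, 80 + i) for i in range(5)] + [(3.0, 443), (4.0, 8080)]
features = portscan_features(events)
```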

Community Detection

Identifies botnet-like clusters using Kosaraju’s Strongly Connected Components algorithm.

  1. Build a directed graph from flow data
  2. Find SCCs where every node can reach every other node
  3. Compute density: edges / (n × (n-1))

| Community Size | Severity |
|---|---|
| ≥ 10 hosts | Critical |
| ≥ 5 | High |
| ≥ 3 (minimum) | Medium |
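For readers unfamiliar with Kosaraju's algorithm, the sketch below shows the two depth-first passes and the density formula on a toy graph. It is a recursive textbook version with hypothetical function names, suitable only for small graphs, and is not taken from the engine.

```python
from collections import defaultdict

def kosaraju_scc(edges):
    """Return (list of strongly connected components, forward adjacency map)."""
    graph, reverse, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        graph[u].add(v)
        reverse[v].add(u)
        nodes.update((u, v))

    # Pass 1: depth-first search on the graph, recording finish order.
    order, seen = [], set()
    def visit(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)
        order.append(u)
    for u in sorted(nodes):
        if u not in seen:
            visit(u)

    # Pass 2: search the reversed graph in decreasing finish order.
    components, assigned = [], set()
    def collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)
    for u in reversed(order):
        if u not in assigned:
            component = set()
            collect(u, component)
            components.append(component)
    return components, graph

def density(component, graph):
    """edges / (n * (n - 1)), counting only edges inside the component."""
    n = len(component)
    if n < 2:
        return 0.0
    internal = sum(1 for u in component for v in graph[u] if v in component and v != u)
    return internal / (n * (n - 1))

components, graph = kosaraju_scc([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
```

Hosts A, B, and C form a cycle and therefore one component; D is reachable from the cycle but cannot reach back, so it stays on its own.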

DNS Tunneling

Detects data exfiltration encoded in DNS subdomain labels.

  1. Pre-filter: average subdomain length must exceed 15 characters
  2. Analyze unique subdomain count, TXT record ratio, and query rate per base domain

Scoring features: unique subdomains, avg label length, TXT ratio, query rate

| Condition | Severity |
|---|---|
| ≥ 500 subdomains AND avg length ≥ 25 | Critical |
| ≥ 200 subdomains OR TXT ratio ≥ 0.5 | High |
| Meets pre-filter thresholds | Medium |
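A simplified sketch of the per-domain features and fixed thresholds is shown below. It assumes the base domain is the last two labels of the query name (a real implementation would likely need a public-suffix list) and measures length over the whole subdomain string. Both choices, and all function names, are assumptions.

```python
from collections import defaultdict

def dns_features(queries, window_seconds):
    """queries: list of (fqdn, record_type) observed during the window."""
    groups = defaultdict(list)
    for fqdn, rtype in queries:
        labels = fqdn.rstrip(".").lower().split(".")
        base, sub = ".".join(labels[-2:]), ".".join(labels[:-2])
        if sub:
            groups[base].append((sub, rtype))
    features = {}
    for base, items in groups.items():
        subs = {sub for sub, _ in items}
        avg_length = sum(len(s) for s in subs) / len(subs)
        if avg_length <= 15:
            continue  # pre-filter: average subdomain length must exceed 15
        features[base] = {
            "unique_subdomains": len(subs),
            "avg_length": avg_length,
            "txt_ratio": sum(1 for _, t in items if t == "TXT") / len(items),
            "query_rate": len(items) / window_seconds,
        }
    return features

def dns_severity(f):
    if f["unique_subdomains"] >= 500 and f["avg_length"] >= 25:
        return "critical"
    if f["unique_subdomains"] >= 200 or f["txt_ratio"] >= 0.5:
        return "high"
    return "medium"

# Hypothetical queries: long encoded labels under one domain, plus a normal lookup.
queries = [("a" * 30 + str(i) + ".evil.example", "TXT") for i in range(3)]
queries.append(("www.example.com", "A"))
features = dns_features(queries, window_seconds=60)
```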

Data Exfiltration

Detects internal hosts uploading disproportionate data volumes to external hosts.

  1. Compute byte ratio: bytes_out / (bytes_out + bytes_in) — ratio ≥ 0.8 is suspicious
  2. Filter: minimum 10 MB outbound

Scoring features: total bytes out, byte ratio, flow count

| Condition | Severity |
|---|---|
| ≥ 1 GB AND ratio ≥ 0.95 | Critical |
| ≥ 100 MB | High |
| ≥ 10 MB, ratio ≥ 0.8 | Medium |
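The byte-ratio check and fixed thresholds can be sketched as follows. Two details are assumed here because the source leaves them open: sizes use binary units (1 MB = 2^20 bytes), and the 0.8 ratio acts as a filter for every severity level.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def exfil_severity(bytes_out, bytes_in):
    """Return a severity for one internal-to-external pair, or None if not flagged."""
    total = bytes_out + bytes_in
    ratio = bytes_out / total if total else 0.0
    if bytes_out < 10 * MB or ratio < 0.8:
        return None  # below the minimum volume or not asymmetric enough
    if bytes_out >= 1 * GB and ratio >= 0.95:
        return "critical"
    if bytes_out >= 100 * MB:
        return "high"
    return "medium"

bulk = exfil_severity(2 * GB, 10 * MB)       # ratio of roughly 0.995
moderate = exfil_severity(200 * MB, 20 * MB)  # ratio of roughly 0.91
small = exfil_severity(20 * MB, 1 * MB)
tiny = exfil_severity(5 * MB, 0)
```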

New Connection Pair

Detects (src_ip, dest_ip, dest_port) tuples never seen in the 7-day baseline window. Particularly important for OT/IoT networks where traffic is highly deterministic.

Known OT ports (Modbus 502, DNP3 20000, MQTT 1883/8883, BACnet 47808, EtherNet/IP 44818, S7comm 102, OPC UA 4840, IEC 104 2404) trigger elevated severity.

| Condition | Severity |
|---|---|
| OT port, flows ≥ 5 | Critical |
| OT port, any flows | High |
| Regular port, flows ≥ 10 | High |
| Otherwise | Medium |
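The detection amounts to a set difference against the baseline followed by the severity table. In the sketch below the port list is taken from the text above, while the function name and the input shapes (a set of baseline tuples and a dict of recent flow counts) are assumptions.

```python
# OT ports listed above: Modbus, DNP3, MQTT, BACnet, EtherNet/IP, S7comm, OPC UA, IEC 104.
OT_PORTS = {502, 20000, 1883, 8883, 47808, 44818, 102, 4840, 2404}

def new_connection_findings(baseline, recent):
    """baseline: set of (src_ip, dest_ip, dest_port) seen in the last 7 days.
    recent: dict mapping the same tuple shape to a recent flow count."""
    findings = {}
    for key, flows in recent.items():
        if key in baseline:
            continue  # already known, not a new pair
        port = key[2]
        if port in OT_PORTS:
            findings[key] = "critical" if flows >= 5 else "high"
        else:
            findings[key] = "high" if flows >= 10 else "medium"
    return findings

baseline = {("10.0.0.5", "10.0.1.10", 502)}
recent = {
    ("10.0.0.5", "10.0.1.10", 502): 40,   # known polling, ignored
    ("10.0.0.9", "10.0.1.10", 502): 7,    # new Modbus client
    ("10.0.0.9", "10.0.1.11", 4840): 1,   # new OPC UA connection
    ("10.0.0.9", "10.0.2.20", 443): 3,    # new regular-port connection
}
findings = new_connection_findings(baseline, recent)
```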

Polling Disruption

Detects when previously periodic communication becomes irregular or stops entirely. Designed for SCADA/OT environments.

  1. Identify connections periodic in baseline (CV ≤ 0.3)
  2. Detect disruption: either stopped (0 recent flows) or irregular (recent CV > 0.8)

| Condition | Severity |
|---|---|
| Stopped, baseline > 100 flows | Critical |
| Stopped | High |
| Irregular, CV > 2.0 | High |
| Irregular | Medium |
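Expressed as a decision function, the two steps and the severity table look roughly like this. The function name and parameters are hypothetical, and the sketch assumes the baseline and recent CV values have already been computed per connection.

```python
def polling_disruption(baseline_cv, baseline_flows, recent_flows, recent_cv=None):
    """Return a severity for one connection, or None if no disruption is detected."""
    if baseline_cv > 0.3:
        return None  # connection was not periodic in the baseline
    if recent_flows == 0:
        # Stopped: severity depends on how established the polling was.
        return "critical" if baseline_flows > 100 else "high"
    if recent_cv is not None and recent_cv > 0.8:
        # Irregular: timing no longer resembles the baseline cadence.
        return "high" if recent_cv > 2.0 else "medium"
    return None

stopped_busy = polling_disruption(0.05, baseline_flows=2000, recent_flows=0)
stopped_quiet = polling_disruption(0.05, baseline_flows=40, recent_flows=0)
erratic = polling_disruption(0.10, baseline_flows=500, recent_flows=30, recent_cv=2.5)
jittery = polling_disruption(0.10, baseline_flows=500, recent_flows=30, recent_cv=1.1)
healthy = polling_disruption(0.10, baseline_flows=500, recent_flows=60, recent_cv=0.1)
```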

Baseline Deviation

Detects significant deviations from historical traffic patterns.

  1. Compare recent (1 hour) vs baseline (7 days) for same connection tuples
  2. Compute ratios: flow_ratio = recent_flows / baseline_flows, bytes_ratio = recent_bytes / baseline_bytes
  3. Flag new protocols not seen in baseline

Scoring features: flow count ratio, bytes ratio, new protocol count

| Condition | Severity |
|---|---|
| Flow ratio > 10 OR ≥ 3 new protocols | Critical |
| Flow ratio > 5 OR bytes ratio > 5 OR ≥ 1 new protocol | High |
| Ratio > 2.0 | Medium |
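The ratio computation and thresholds can be sketched as below. The source does not say how the 7-day baseline is made comparable to the 1-hour recent window, so this sketch assumes the baseline figures passed in are already averaged per hour. It also assumes the Medium row applies to either ratio. All names are hypothetical.

```python
def deviation_severity(recent_flows, baseline_flows, recent_bytes, baseline_bytes,
                       new_protocols=0):
    """Baseline values are assumed to be per-hour averages over the 7-day window."""
    flow_ratio = recent_flows / baseline_flows if baseline_flows else float("inf")
    bytes_ratio = recent_bytes / baseline_bytes if baseline_bytes else float("inf")
    if flow_ratio > 10 or new_protocols >= 3:
        return "critical"
    if flow_ratio > 5 or bytes_ratio > 5 or new_protocols >= 1:
        return "high"
    if max(flow_ratio, bytes_ratio) > 2.0:
        return "medium"
    return None

surge = deviation_severity(1200, 100, 5_000_000, 4_000_000)
heavy = deviation_severity(110, 100, 30_000_000, 4_000_000)
raised = deviation_severity(250, 100, 4_500_000, 4_000_000)
normal = deviation_severity(105, 100, 4_100_000, 4_000_000)
```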

Continuous Mode

# Run every hour (default)
rockfish hunt -d /data --sensor my-sensor --hive --continuous

# Run every 15 minutes
rockfish hunt -d /data --sensor my-sensor --hive \
  --continuous --interval-minutes 15

Time Window

rockfish hunt -d /data --sensor my-sensor -t "24 hours"  # default
rockfish hunt -d /data --sensor my-sensor -t "7 days"
rockfish hunt -d /data --sensor my-sensor -t "1 hour"

Examples

# Standard 24-hour threat hunt
rockfish hunt -d /data/rockfish --sensor prod-01 --hive -t "24 hours"

# Continuous with high severity filter
rockfish hunt -d /data --sensor prod-01 --hive \
  --continuous --interval-minutes 30 --min-severity high

# Beaconing with custom thresholds
rockfish hunt -d /data --sensor my-sensor \
  --detections beaconing --min-beacon-connections 50 --max-beacon-cv 0.15