rockfish hunt

Run graph-based behavioral threat detection on Parquet flow data.

Overview

The hunt engine builds a communication graph from network flow data and applies configurable detection algorithms to identify threats beyond signature matching — C2 beaconing, lateral movement, data exfiltration, and more.

Findings are scored with anomaly detection models (HBOS or Isolation Forest), assigned severity levels, and mapped to MITRE ATT&CK tactics.

Usage

rockfish hunt [OPTIONS]

Detection Types

Detection	Description	MITRE Tactic
beaconing	C2 callbacks via inter-connection timing regularity	Command and Control
lateral	Multi-hop internal attack chains (A -> B -> C)	Lateral Movement
fanout	Single external IP contacted by many internal hosts	Command and Control
portscan	Hosts probing many unique ports on a target	Discovery
community	Botnet-like clusters via graph components	Command and Control
exfiltration	Asymmetric flows with disproportionate outbound volume	Exfiltration
dns_tunneling	DNS queries with long or encoded subdomains	Command and Control
new_connection	Source-destination pairs absent from 7-day baseline	Initial Access
polling_disruption	Interruption of periodic communication	Impact
baseline_deviation	Volume or pattern shifts vs. historical norms	Discovery

Select Specific Detections

rockfish hunt -d /data --sensor my-sensor --hive \
  --detections beaconing,lateral,fanout

Output

Parquet (default)

Findings are written to {data-dir}/{sensor}/hunt/*.parquet for ingestion into the report.

Stdout

# JSON output
rockfish hunt -d /data --sensor my-sensor --stdout

# Pretty-printed JSON
rockfish hunt -d /data --sensor my-sensor --stdout --pretty

# Table format
rockfish hunt -d /data --sensor my-sensor --stdout --format table

Severity Filtering

# Only high and critical findings
rockfish hunt -d /data --sensor my-sensor --min-severity high

Tuning

Option	Default	Description
`--min-beacon-connections`	auto	Minimum connections for beacon detection
`--max-beacon-cv`	auto	Maximum coefficient of variation
`--min-fanout-sources`	auto	Minimum internal sources for C2 fanout
`--min-portscan-ports`	auto	Minimum unique ports for port scan
`--min-community-size`	auto	Minimum nodes for community detection
`--internal-networks`	RFC 1918	Internal network CIDRs
`--scoring-method`	`hbos`	Anomaly scoring: `hbos` or `iforest`

Anomaly Scoring Algorithms

All detections produce findings that are scored using one of two anomaly detection methods. When a detection has 5 or more candidate groups, statistical scoring is applied; otherwise, fixed severity thresholds are used.

HBOS (Histogram-Based Outlier Scoring)

The default scoring method. HBOS builds equal-width histograms for each feature dimension, then scores each observation based on how rare its bin is.

How it works:

For each feature (e.g., coefficient of variation, connection count, byte ratio), divide the observed range into 10 equal-width bins
Count the number of observations in each bin to compute density: density = count_in_bin / total_observations
Apply a floor to prevent log(0): density = max(density, 0.5 / total)
Compute per-feature score: score = -log10(density) — rarer bins produce higher scores
Sum all feature scores for the final anomaly score
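
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's implementation: the function name `hbos_scores` and the row layout are assumptions, and feature inversion is left out.

```python
import math

def hbos_scores(rows, n_bins=10):
    """Return one HBOS-style anomaly score per row (each row is a list of floats)."""
    total = len(rows)
    scores = [0.0] * total
    for f in range(len(rows[0])):
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        # Equal-width bins; fall back to width 1.0 when all values are identical.
        width = (hi - lo) / n_bins or 1.0
        bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        for i, b in enumerate(bins):
            # Floor the density so the logarithm is always defined.
            density = max(counts[b] / total, 0.5 / total)
            scores[i] += -math.log10(density)
    return scores
```

With five similar rows and one outlier, the outlier falls into a sparsely populated bin for every feature and therefore receives the highest summed score.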

Properties:

Assumes feature independence (no cross-feature correlations)
O(n) time complexity — fast, single-pass histogram construction
Supports feature inversion for metrics where lower values are more suspicious (e.g., beacon CV where 0.01 is more suspicious than 0.5)

Isolation Forest (iForest)

An ensemble method that isolates anomalies using random partitioning trees.

How it works:

Build 100 random isolation trees, each sampling 256 data points
Each tree recursively partitions data by randomly selecting a feature and split value until each point is isolated (or max depth is reached)
For each observation, compute the average path length across all trees — anomalies are isolated in fewer splits
Convert to anomaly score: score = 2^(-avg_path_length / c(n)) where c(n) is the average path length of an unsuccessful search in a binary search tree built from n points, used as a normalizing constant
Transform to match HBOS scale: final_score = -log10(1 - raw_score)
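
The last two steps can be illustrated as follows. This sketch covers only the score conversion, not tree construction. The names `c` and `iforest_score` are hypothetical, the harmonic number is approximated as in the original Isolation Forest formulation, and the clamp that keeps the logarithm finite is an assumption added here.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Normalizing constant: average path length of an unsuccessful BST search."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def iforest_score(avg_path_length, n=256):
    """Return (raw_score, final_score) for an average path length."""
    raw = 2.0 ** (-avg_path_length / c(n))
    raw = min(raw, 1.0 - 1e-12)  # assumed guard: avoid log10(0) when raw == 1
    return raw, -math.log10(1.0 - raw)
```

A point whose average path length equals c(n) gets a raw score of 0.5; shorter paths push the raw score toward 1 and the transformed score upward.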

Properties:

Captures cross-feature interactions (unlike HBOS)
More robust to feature scaling
Higher computational cost than HBOS
Deterministic (seed = 42 for reproducibility)

Select scoring method:

rockfish hunt -d /data --sensor my-sensor --scoring-method iforest

Severity Mapping

Anomaly scores are mapped to severity levels using percentile-based thresholds across the entire finding population:

Percentile	Severity
≥ 95th	Critical
≥ 85th	High
≥ 70th	Medium
< 70th	Low

If the maximum score across all findings is below 2.0, severity is capped at Medium to suppress false positives in benign environments.
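
A rough sketch of this mapping is shown below. The percentile definition (share of findings with a score less than or equal to the current one) and the function name `assign_severities` are assumptions; the engine may compute percentiles differently.

```python
from bisect import bisect_right

def assign_severities(scores, cap_below=2.0):
    """Map anomaly scores to severity labels by percentile rank."""
    ordered = sorted(scores)
    n = len(ordered)
    # Cap at medium when even the strongest finding scores below the cutoff.
    capped = n > 0 and ordered[-1] < cap_below
    labels = []
    for s in scores:
        # Assumed percentile rank: share of findings scoring <= s.
        pct = 100.0 * bisect_right(ordered, s) / n
        if pct >= 95:
            label = "critical"
        elif pct >= 85:
            label = "high"
        elif pct >= 70:
            label = "medium"
        else:
            label = "low"
        if capped and label in ("critical", "high"):
            label = "medium"
        labels.append(label)
    return labels
```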

Detection Algorithm Details

Beaconing

Detects C2 callbacks by measuring the regularity of connection intervals.

Group connections by (src_ip, dest_ip, dest_port)
Compute inter-arrival time intervals between consecutive connections
Calculate the coefficient of variation: CV = stddev / mean
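A perfect beacon has CV ≈ 0; traffic with random (Poisson-like) arrivals has CV ≈ 1.0, because exponentially distributed intervals have equal mean and standard deviation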
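
The grouping and CV calculation can be sketched as below. The flow tuple layout, the function name `beacon_cvs`, and the `min_connections` default of 3 are illustrative assumptions, and the population standard deviation is used for simplicity.

```python
import statistics
from collections import defaultdict

def beacon_cvs(flows, min_connections=3):
    """flows: iterable of (src_ip, dest_ip, dest_port, timestamp_seconds)."""
    groups = defaultdict(list)
    for src, dest, port, ts in flows:
        groups[(src, dest, port)].append(ts)
    result = {}
    for key, stamps in groups.items():
        if len(stamps) < min_connections:
            continue
        stamps.sort()
        # Inter-arrival times between consecutive connections.
        intervals = [b - a for a, b in zip(stamps, stamps[1:])]
        mean = statistics.fmean(intervals)
        if mean > 0:
            result[key] = statistics.pstdev(intervals) / mean
    return result
```

A host that connects exactly every 60 seconds yields a CV of 0, while bursty traffic to another destination yields a CV well above the 0.2 maximum threshold.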

Scoring features: CV (inverted), connection count, mean interval, byte consistency (CV of payload sizes)

Threshold	Severity
CV < 0.05, connections > 50	Critical
CV < 0.1	High
CV ≤ 0.2 (max threshold)	Medium

Lateral Movement

Detects multi-hop attack chains where internal hosts are progressively compromised.

Build a temporal adjacency graph: src → [(dest, timestamp)]
Identify pivot hosts (both source and destination)
For each pivot, look for inbound → outbound sequences within a 1-hour window
Extend chains recursively (up to 10 hops)
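
One way to picture the chain search is the depth-first sketch below. Several details are assumptions rather than documented behavior: a hop is counted as one edge, each outbound flow must follow the inbound flow by at most the window, a host may appear only once per chain, and the name `find_chains` is hypothetical.

```python
from collections import defaultdict

def find_chains(flows, window=3600, max_hops=10, min_hops=3):
    """flows: iterable of (src, dest, timestamp) between internal hosts."""
    adj = defaultdict(list)
    for src, dest, ts in flows:
        adj[src].append((dest, ts))

    chains = []

    def extend(path, last_ts):
        extended = False
        if len(path) - 1 < max_hops:
            for dest, ts in adj.get(path[-1], ()):
                # The next hop must come after the previous one, within the window.
                if dest not in path and last_ts <= ts <= last_ts + window:
                    extended = True
                    extend(path + [dest], ts)
        # Record only maximal chains that reach the minimum length.
        if not extended and len(path) - 1 >= min_hops:
            chains.append(path)

    for src, edges in adj.items():
        for dest, ts in edges:
            extend([src, dest], ts)
    return chains
```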

Chain Length	Severity
≥ 5 hops	Critical
≥ 4 hops	High
≥ 3 hops (minimum)	Medium

C2 Fanout

Detects a single external IP receiving connections from many internal hosts (botnet controller pattern).

Unique Sources	Severity
≥ 20 internal hosts	Critical
≥ 10	High
≥ 5 (minimum)	Medium
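
As a sketch of the counting logic, the snippet below groups internal sources by external destination and applies the thresholds from the table. It assumes the default RFC 1918 internal ranges and a flow layout of (src_ip, dest_ip); `fanout_findings` is a hypothetical name.

```python
import ipaddress
from collections import defaultdict

RFC1918 = [ipaddress.ip_network(cidr)
           for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

def fanout_findings(flows):
    """flows: iterable of (src_ip, dest_ip). Returns {external_ip: (count, severity)}."""
    sources = defaultdict(set)
    for src, dest in flows:
        if is_internal(src) and not is_internal(dest):
            sources[dest].add(src)
    findings = {}
    for dest, srcs in sources.items():
        n = len(srcs)
        if n >= 20:
            findings[dest] = (n, "critical")
        elif n >= 10:
            findings[dest] = (n, "high")
        elif n >= 5:
            findings[dest] = (n, "medium")
    return findings
```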

Port Scan

Detects hosts probing many ports on a target.

Count distinct destination ports per (src_ip, dest_ip) pair
Detect sequential port runs (e.g., 80-84) and compute sequential_ratio
Compute scan rate (ports per second)
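
The features above might be computed along these lines. The exact definition of sequential_ratio is not given here, so this sketch assumes it is the fraction of adjacent pairs in the sorted unique port list that differ by exactly 1; the function name and flow layout are likewise assumptions.

```python
from collections import defaultdict

def portscan_features(flows):
    """flows: iterable of (src_ip, dest_ip, dest_port, timestamp_seconds)."""
    ports = defaultdict(set)
    times = defaultdict(list)
    for src, dest, port, ts in flows:
        ports[(src, dest)].add(port)
        times[(src, dest)].append(ts)
    features = {}
    for pair, seen in ports.items():
        ordered = sorted(seen)
        # Count neighbors in the sorted port list that are exactly one apart.
        runs = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
        duration = max(times[pair]) - min(times[pair])
        features[pair] = {
            "unique_ports": len(ordered),
            "flow_count": len(times[pair]),
            "sequential_ratio": runs / (len(ordered) - 1) if len(ordered) > 1 else 0.0,
            # Rate is left undefined when all flows share one timestamp.
            "scan_rate": len(ordered) / duration if duration > 0 else None,
        }
    return features
```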

Scoring features: unique ports, flow count, sequential ratio, scan rate

Unique Ports	Severity
≥ 100	Critical
≥ 50	High
≥ 25 (minimum)	Medium

Community Detection

Identifies botnet-like clusters using Kosaraju’s Strongly Connected Components algorithm.

Build a directed graph from flow data
Find SCCs where every node can reach every other node
Compute density: edges / (n × (n-1))
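
For illustration, a compact recursive version of Kosaraju's algorithm together with the density formula is shown below. It is a sketch suited to small graphs (deep graphs would need an iterative traversal), and the function names are hypothetical.

```python
from collections import defaultdict

def kosaraju_scc(edges):
    """edges: iterable of (u, v). Returns a list of strongly connected components."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    order, seen = [], set()

    def visit(u):
        # First pass: record nodes in order of DFS completion.
        seen.add(u)
        for v in graph.get(u, ()):
            if v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)

    components, assigned = [], set()

    def collect(u, comp):
        # Second pass: walk the reversed graph to gather one component.
        assigned.add(u)
        comp.add(u)
        for v in reverse.get(u, ()):
            if v not in assigned:
                collect(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = set()
            collect(u, comp)
            components.append(comp)
    return components

def density(comp, edges):
    """Directed density: internal edges / (n * (n - 1))."""
    n = len(comp)
    if n < 2:
        return 0.0
    internal = {(u, v) for u, v in edges if u in comp and v in comp and u != v}
    return len(internal) / (n * (n - 1))
```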

Community Size	Severity
≥ 10 hosts	Critical
≥ 5	High
≥ 3 (minimum)	Medium

DNS Tunneling

Detects data exfiltration encoded in DNS subdomain labels.

Pre-filter: average subdomain length must exceed 15 characters
Analyze unique subdomain count, TXT record ratio, and query rate per base domain

Scoring features: unique subdomains, avg label length, TXT ratio, query rate

Condition	Severity
≥ 500 subdomains AND avg length ≥ 25	Critical
≥ 200 subdomains OR TXT ratio ≥ 0.5	High
Meets pre-filter thresholds	Medium
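
The pre-filter and severity table can be sketched as follows. Several simplifications are assumptions: the base domain is taken naively as the last two labels (no public suffix handling), the average length is computed over unique subdomains, and query rate is omitted because it needs timestamps. `dns_tunnel_findings` is a hypothetical name.

```python
from collections import defaultdict

def dns_tunnel_findings(queries):
    """queries: iterable of (qname, qtype). Returns {base_domain: feature dict}."""
    by_base = defaultdict(list)
    for qname, qtype in queries:
        labels = qname.rstrip(".").lower().split(".")
        if len(labels) < 3:
            continue  # no subdomain part to analyze
        by_base[".".join(labels[-2:])].append((".".join(labels[:-2]), qtype))

    findings = {}
    for base, items in by_base.items():
        subs = {sub for sub, _ in items}
        avg_len = sum(len(s) for s in subs) / len(subs)
        if avg_len <= 15:
            continue  # pre-filter: average subdomain length must exceed 15
        txt_ratio = sum(1 for _, t in items if t == "TXT") / len(items)
        if len(subs) >= 500 and avg_len >= 25:
            severity = "critical"
        elif len(subs) >= 200 or txt_ratio >= 0.5:
            severity = "high"
        else:
            severity = "medium"
        findings[base] = {
            "unique_subdomains": len(subs),
            "avg_length": avg_len,
            "txt_ratio": txt_ratio,
            "severity": severity,
        }
    return findings
```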

Data Exfiltration

Detects internal hosts uploading disproportionate data volumes to external hosts.

Compute byte ratio: bytes_out / (bytes_out + bytes_in) — ratio ≥ 0.8 is suspicious
Filter: minimum 10 MB outbound

Scoring features: total bytes out, byte ratio, flow count

Condition	Severity
≥ 1 GB AND ratio ≥ 0.95	Critical
≥ 100 MB	High
≥ 10 MB, ratio ≥ 0.8	Medium
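
A small sketch of the ratio and threshold logic follows. It assumes decimal units (1 MB = 10^6 bytes) and that the 0.8 ratio gate applies to every severity level, neither of which is stated explicitly above; `exfil_severity` is a hypothetical helper.

```python
MB = 1_000_000
GB = 1_000_000_000

def exfil_severity(bytes_out, bytes_in):
    """Return a severity label, or None when the host pair is not flagged."""
    total = bytes_out + bytes_in
    if total == 0 or bytes_out < 10 * MB:
        return None  # below the minimum outbound volume
    ratio = bytes_out / total
    if ratio < 0.8:
        return None  # traffic is not asymmetric enough (assumed gate)
    if bytes_out >= GB and ratio >= 0.95:
        return "critical"
    if bytes_out >= 100 * MB:
        return "high"
    return "medium"
```

For example, 2 GB out against 10 MB in gives a ratio above 0.99 and is rated critical, while 12 MB out against 12 MB in has a ratio of 0.5 and is not flagged.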

New Connection Pair

Detects (src_ip, dest_ip, dest_port) tuples never seen in the 7-day baseline window. This is particularly important for OT/IoT networks, where traffic is highly deterministic.

Known OT ports (Modbus 502, DNP3 20000, MQTT 1883/8883, BACnet 47808, EtherNet/IP 44818, S7comm 102, OPC UA 4840, IEC 104 2404) trigger elevated severity.

Condition	Severity
OT port, flows ≥ 5	Critical
OT port, any flows	High
Regular port, flows ≥ 10	High
Otherwise	Medium
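
The comparison against the baseline amounts to a set difference followed by the table lookup. The sketch below assumes connections are given as (src_ip, dest_ip, dest_port) tuples, one per flow; `new_connection_findings` is a hypothetical name.

```python
from collections import Counter

# OT protocol ports listed above.
OT_PORTS = {102, 502, 1883, 2404, 4840, 8883, 20000, 44818, 47808}

def new_connection_findings(baseline, recent):
    """Return {tuple: (flow_count, severity)} for tuples absent from the baseline."""
    known = set(baseline)
    counts = Counter(conn for conn in recent if conn not in known)
    findings = {}
    for conn, flows in counts.items():
        if conn[2] in OT_PORTS:
            severity = "critical" if flows >= 5 else "high"
        else:
            severity = "high" if flows >= 10 else "medium"
        findings[conn] = (flows, severity)
    return findings
```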

Polling Disruption

Detects when previously periodic communication becomes irregular or stops entirely. Designed for SCADA/OT environments.

Identify connections periodic in baseline (CV ≤ 0.3)
Detect disruption: either stopped (0 recent flows) or irregular (recent CV > 0.8)

Condition	Severity
Stopped, baseline > 100 flows	Critical
Stopped	High
Irregular, CV > 2.0	High
Irregular	Medium
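
The decision logic can be illustrated with the sketch below, which takes the timestamps of one connection in the baseline and recent windows. The helper names are hypothetical, and interpreting "baseline > 100 flows" as the number of baseline timestamps is an assumption.

```python
import statistics

def interval_cv(timestamps):
    """Coefficient of variation of inter-arrival times, or None if undefined."""
    stamps = sorted(timestamps)
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    if len(intervals) < 2:
        return None
    mean = statistics.fmean(intervals)
    return statistics.pstdev(intervals) / mean if mean > 0 else None

def polling_disruption(baseline_ts, recent_ts):
    """Return (kind, severity) or None when polling looks undisturbed."""
    base_cv = interval_cv(baseline_ts)
    if base_cv is None or base_cv > 0.3:
        return None  # the connection was not periodic in the baseline
    if not recent_ts:
        return ("stopped", "critical" if len(baseline_ts) > 100 else "high")
    recent_cv = interval_cv(recent_ts)
    if recent_cv is not None and recent_cv > 0.8:
        return ("irregular", "high" if recent_cv > 2.0 else "medium")
    return None
```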

Baseline Deviation

Detects significant deviations from historical traffic patterns.

Compare recent (1 hour) vs. baseline (7 days) activity for the same connection tuples
Compute ratios: flow_ratio = recent_flows / baseline_flows, bytes_ratio = recent_bytes / baseline_bytes
Flag new protocols not seen in baseline

Scoring features: flow count ratio, bytes ratio, new protocol count

Condition	Severity
Flow ratio > 10 OR ≥ 3 new protocols	Critical
Flow ratio > 5 OR bytes ratio > 5 OR ≥ 1 new protocol	High
Ratio > 2.0	Medium
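
A sketch of the ratio comparison is given below. It assumes the recent and baseline figures have already been normalized to the same time unit, and that the Medium row applies to either ratio; both points are interpretations rather than documented behavior, and `deviation_severity` is a hypothetical name.

```python
def deviation_severity(recent_flows, baseline_flows,
                       recent_bytes, baseline_bytes, new_protocols=0):
    """Return a severity label, or None when traffic is close to the baseline."""
    if baseline_flows <= 0 or baseline_bytes <= 0:
        return None  # no usable baseline for this connection tuple
    flow_ratio = recent_flows / baseline_flows
    bytes_ratio = recent_bytes / baseline_bytes
    if flow_ratio > 10 or new_protocols >= 3:
        return "critical"
    if flow_ratio > 5 or bytes_ratio > 5 or new_protocols >= 1:
        return "high"
    # Assumed reading of the Medium row: either ratio above 2.0.
    if flow_ratio > 2.0 or bytes_ratio > 2.0:
        return "medium"
    return None
```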

Continuous Mode

# Run every hour (default)
rockfish hunt -d /data --sensor my-sensor --hive --continuous

# Run every 15 minutes
rockfish hunt -d /data --sensor my-sensor --hive \
  --continuous --interval-minutes 15

Time Window

rockfish hunt -d /data --sensor my-sensor -t "24 hours"  # default
rockfish hunt -d /data --sensor my-sensor -t "7 days"
rockfish hunt -d /data --sensor my-sensor -t "1 hour"

Examples

# Standard 24-hour threat hunt
rockfish hunt -d /data/rockfish --sensor prod-01 --hive -t "24 hours"

# Continuous with high severity filter
rockfish hunt -d /data --sensor prod-01 --hive \
  --continuous --interval-minutes 30 --min-severity high

# Beaconing with custom thresholds
rockfish hunt -d /data --sensor my-sensor \
  --detections beaconing --min-beacon-connections 50 --max-beacon-cv 0.15
