rockfish hunt
Run graph-based behavioral threat detection on Parquet flow data.
Overview
The hunt engine builds a communication graph from network flow data and applies configurable detection algorithms to identify threats beyond signature matching — C2 beaconing, lateral movement, data exfiltration, and more.
Findings are scored with anomaly detection models (HBOS or Isolation Forest), assigned severity levels, and mapped to MITRE ATT&CK tactics.
Usage
rockfish hunt [OPTIONS]
Detection Types
| Detection | Description | MITRE Tactic |
|---|---|---|
| beaconing | C2 callbacks via inter-connection timing regularity | Command and Control |
| lateral | Multi-hop internal attack chains (A -> B -> C) | Lateral Movement |
| fanout | Single external IP contacted by many internal hosts | Command and Control |
| portscan | Hosts probing many unique ports on a target | Discovery |
| community | Botnet-like clusters via graph components | Command and Control |
| exfiltration | Asymmetric flows with disproportionate outbound volume | Exfiltration |
| dns_tunneling | DNS queries with long or encoded subdomains | Command and Control |
| new_connection | Source-destination pairs absent from 7-day baseline | Initial Access |
| polling_disruption | Interruption of periodic communication | Impact |
| baseline_deviation | Volume or pattern shifts vs. historical norms | Discovery |
Select Specific Detections
rockfish hunt -d /data --sensor my-sensor --hive \
--detections beaconing,lateral,fanout
Output
Parquet (default)
Findings are written to {data-dir}/{sensor}/hunt/*.parquet for ingestion into the report.
Stdout
# JSON output
rockfish hunt -d /data --sensor my-sensor --stdout
# Pretty-printed JSON
rockfish hunt -d /data --sensor my-sensor --stdout --pretty
# Table format
rockfish hunt -d /data --sensor my-sensor --stdout --format table
Severity Filtering
# Only high and critical findings
rockfish hunt -d /data --sensor my-sensor --min-severity high
Tuning
| Option | Default | Description |
|---|---|---|
| --min-beacon-connections | auto | Minimum connections for beacon detection |
| --max-beacon-cv | auto | Maximum coefficient of variation |
| --min-fanout-sources | auto | Minimum internal sources for C2 fanout |
| --min-portscan-ports | auto | Minimum unique ports for port scan |
| --min-community-size | auto | Minimum nodes for community detection |
| --internal-networks | RFC 1918 | Internal network CIDRs |
| --scoring-method | hbos | Anomaly scoring: hbos or iforest |
Anomaly Scoring Algorithms
All detections produce findings that are scored using one of two anomaly detection methods. When a detection has 5 or more candidate groups, statistical scoring is applied; otherwise, fixed severity thresholds are used.
HBOS (Histogram-Based Outlier Scoring)
The default scoring method. HBOS builds equal-width histograms for each feature dimension, then scores each observation based on how rare its bin is.
How it works:
- For each feature (e.g., coefficient of variation, connection count, byte ratio), divide the observed range into 10 equal-width bins
- Count the number of observations in each bin to compute density: density = count_in_bin / total_observations
- Apply a floor to prevent log(0): density = max(density, 0.5 / total_observations)
- Compute per-feature score: score = -log10(density), so rarer bins produce higher scores
- Sum all feature scores for the final anomaly score
Properties:
- Assumes feature independence (no cross-feature correlations)
- O(n) time complexity — fast, single-pass histogram construction
- Supports feature inversion for metrics where lower values are more suspicious (e.g., beacon CV where 0.01 is more suspicious than 0.5)
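
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the formulas, not the engine's implementation: the function name hbos_scores and the row layout are assumptions, and feature inversion is left out because the mechanism is not described here.

```python
import math

def hbos_scores(rows, bins=10):
    """Return one anomaly score per row; each row is a list of feature values."""
    n = len(rows)
    scores = [0.0] * n
    for f in range(len(rows[0])):
        col = [row[f] for row in rows]
        lo, hi = min(col), max(col)
        # Equal-width bins; fall back to width 1.0 when every value is identical.
        width = (hi - lo) / bins or 1.0
        idx = [min(int((v - lo) / width), bins - 1) for v in col]
        counts = [0] * bins
        for i in idx:
            counts[i] += 1
        for r, i in enumerate(idx):
            # The floor only matters for empty bins; kept to mirror the formula.
            density = max(counts[i] / n, 0.5 / n)
            scores[r] += -math.log10(density)
    return scores

# Five similar values and one outlier: the outlier gets the highest score.
print(hbos_scores([[1.0], [1.1], [1.2], [1.0], [1.1], [9.0]]))
```

Because each feature is scored separately and the results are summed, a row that is unusual in two features scores higher than one that is unusual in only one.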
Isolation Forest (iForest)
An ensemble method that isolates anomalies using random partitioning trees.
How it works:
- Build 100 random isolation trees, each sampling 256 data points
- Each tree recursively partitions data by randomly selecting a feature and split value until each point is isolated (or max depth is reached)
- For each observation, compute the average path length across all trees — anomalies are isolated in fewer splits
- Convert to anomaly score: raw_score = 2^(-avg_path_length / c(n)), where c(n) is the average path length of an unsuccessful search in a binary search tree built from n points
- Transform to match HBOS scale: final_score = -log10(1 - raw_score)
Properties:
- Captures cross-feature interactions (unlike HBOS)
- More robust to feature scaling
- Higher computational cost than HBOS
- Deterministic (seed = 42 for reproducibility)
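
The score conversion can be illustrated on its own, without building the trees. The sketch below uses the standard Isolation Forest normalization term c(n) = 2H(n-1) - 2(n-1)/n with the harmonic number approximated as ln(i) + 0.5772. The function names and the clamp that keeps log10 away from zero are assumptions made for this example.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def iforest_score(avg_path_length, n=256):
    """Return (raw_score, final_score) for an average path length."""
    raw = 2.0 ** (-avg_path_length / c(n))
    raw = min(raw, 1.0 - 1e-12)  # assumed guard so that log10(0) never occurs
    return raw, -math.log10(1.0 - raw)

# A point isolated after 3 splits scores higher than one needing 10 splits.
print(iforest_score(3.0), iforest_score(10.0))
```

A path length equal to c(n) gives a raw score of exactly 0.5, which is the conventional "neither normal nor anomalous" midpoint.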
Select scoring method:
rockfish hunt -d /data --sensor my-sensor --scoring-method iforest
Severity Mapping
Anomaly scores are mapped to severity levels using percentile-based thresholds across the entire finding population:
| Percentile | Severity |
|---|---|
| ≥ 95th | Critical |
| ≥ 85th | High |
| ≥ 70th | Medium |
| < 70th | Low |
If the maximum score across all findings is below 2.0, severity is capped at Medium to suppress false positives in benign environments.
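
A minimal sketch of this mapping follows. It assumes that a finding's percentile is the share of findings with a strictly lower score; the exact percentile definition used by the engine is not stated here, and the function name is hypothetical.

```python
import bisect

def assign_severities(scores):
    """Map each anomaly score to a severity label using percentile thresholds."""
    ordered = sorted(scores)
    n = len(ordered)
    capped = ordered[-1] < 2.0  # low maximum score: cap severity at Medium
    severities = []
    for s in scores:
        # Assumed definition: percentage of findings scoring strictly below s.
        pct = 100.0 * bisect.bisect_left(ordered, s) / n
        if pct >= 95:
            sev = "Critical"
        elif pct >= 85:
            sev = "High"
        elif pct >= 70:
            sev = "Medium"
        else:
            sev = "Low"
        if capped and sev in ("Critical", "High"):
            sev = "Medium"
        severities.append(sev)
    return severities

print(assign_severities([i * 0.5 for i in range(1, 21)]))
```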
Detection Algorithm Details
Beaconing
Detects C2 callbacks by measuring the regularity of connection intervals.
- Group connections by (src_ip, dest_ip, dest_port)
- Compute inter-arrival time intervals between consecutive connections
- Calculate the coefficient of variation: CV = stddev / mean
- A perfect beacon has CV ≈ 0; random traffic has CV ≈ 1.0
Scoring features: CV (inverted), connection count, mean interval, byte consistency (CV of payload sizes)
| Threshold | Severity |
|---|---|
| CV < 0.05, connections > 50 | Critical |
| CV < 0.1 | High |
| CV ≤ 0.2 (max threshold) | Medium |
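
The CV computation can be sketched as follows. The flow tuple layout, the function name, and the min_connections default of 10 are assumptions for illustration (the CLI default is auto); max_cv mirrors the 0.2 threshold in the table.

```python
import statistics
from collections import defaultdict

def beacon_candidates(flows, max_cv=0.2, min_connections=10):
    """flows: iterable of (timestamp_seconds, src_ip, dest_ip, dest_port)."""
    groups = defaultdict(list)
    for ts, src, dst, port in flows:
        groups[(src, dst, port)].append(ts)

    results = {}
    for key, times in groups.items():
        if len(times) < min_connections:
            continue
        times.sort()
        intervals = [b - a for a, b in zip(times, times[1:])]
        mean = statistics.fmean(intervals)
        if mean == 0:
            continue  # all connections share one timestamp; CV is undefined
        cv = statistics.pstdev(intervals) / mean
        if cv <= max_cv:
            results[key] = cv
    return results

# A host calling out every 60 seconds has CV = 0 and is reported.
regular = [(i * 60, "10.0.0.5", "203.0.113.9", 443) for i in range(20)]
print(beacon_candidates(regular))
```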
Lateral Movement
Detects multi-hop attack chains where internal hosts are progressively compromised.
- Build a temporal adjacency graph: src → [(dest, timestamp)]
- Identify pivot hosts (both source and destination)
- For each pivot, look for inbound → outbound sequences within a 1-hour window
- Extend chains recursively (up to 10 hops)
| Chain Length | Severity |
|---|---|
| ≥ 5 hops | Critical |
| ≥ 4 hops | High |
| ≥ 3 hops (minimum) | Medium |
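
One way to picture the chain search is a depth-first walk over the temporal adjacency graph. The sketch below is illustrative only. It assumes that a hop is one edge (so A → B → C → D is 3 hops), that each outbound flow must follow the inbound one within the window, and that a host is never revisited within a chain; the function name and flow layout are hypothetical.

```python
from collections import defaultdict

def find_chains(flows, window=3600, max_hops=10, min_hops=3):
    """flows: iterable of (timestamp_seconds, src_ip, dest_ip) between internal hosts."""
    adjacency = defaultdict(list)
    for ts, src, dst in flows:
        adjacency[src].append((dst, ts))

    chains = []

    def extend(path, last_ts):
        extended = False
        if len(path) - 1 < max_hops:
            for nxt, ts in adjacency.get(path[-1], ()):
                # Outbound flow must start after the inbound one, within the window.
                if nxt not in path and last_ts <= ts <= last_ts + window:
                    extended = True
                    extend(path + [nxt], ts)
        # Record only chains that cannot grow further and are long enough.
        if not extended and len(path) - 1 >= min_hops:
            chains.append(path)

    for src in adjacency:
        for dst, ts in adjacency[src]:
            extend([src, dst], ts)
    return chains

flows = [(0, "10.0.0.1", "10.0.0.2"), (600, "10.0.0.2", "10.0.0.3"),
         (1200, "10.0.0.3", "10.0.0.4")]
print(find_chains(flows))
```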
C2 Fanout
Detects a single external IP receiving connections from many internal hosts (botnet controller pattern).
| Unique Sources | Severity |
|---|---|
| ≥ 20 internal hosts | Critical |
| ≥ 10 | High |
| ≥ 5 (minimum) | Medium |
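
The fanout count reduces to grouping flows by external destination and counting distinct internal sources. The sketch below uses Python's ipaddress module with the RFC 1918 ranges as the internal networks; the function names and the (src_ip, dest_ip) flow layout are assumptions.

```python
import ipaddress
from collections import defaultdict

RFC1918 = [ipaddress.ip_network(cidr)
           for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_internal(ip, networks=RFC1918):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

def fanout_findings(flows, min_sources=5):
    """flows: iterable of (src_ip, dest_ip). Returns {external_ip: (count, severity)}."""
    sources = defaultdict(set)
    for src, dst in flows:
        if is_internal(src) and not is_internal(dst):
            sources[dst].add(src)

    findings = {}
    for dst, srcs in sources.items():
        n = len(srcs)
        if n >= 20:
            findings[dst] = (n, "Critical")
        elif n >= 10:
            findings[dst] = (n, "High")
        elif n >= min_sources:
            findings[dst] = (n, "Medium")
    return findings

flows = [(f"10.0.0.{i}", "203.0.113.7") for i in range(1, 13)]
print(fanout_findings(flows))
```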
Port Scan
Detects hosts probing many ports on a target.
- Count distinct destination ports per (src_ip, dest_ip) pair
- Detect sequential port runs (e.g., 80-84) and compute sequential_ratio
- Compute scan rate (ports per second)
Scoring features: unique ports, flow count, sequential ratio, scan rate
| Unique Ports | Severity |
|---|---|
| ≥ 100 | Critical |
| ≥ 50 | High |
| ≥ 25 (minimum) | Medium |
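
The scoring features can be derived per (src_ip, dest_ip) pair as sketched below. The definition of sequential_ratio is not given here, so this example assumes it is the fraction of neighbouring ports in sorted order that differ by exactly 1. The one-second floor on the duration is also an assumption, as are the function name and flow layout.

```python
from collections import defaultdict

def portscan_features(flows, min_ports=25):
    """flows: iterable of (timestamp_seconds, src_ip, dest_ip, dest_port)."""
    ports = defaultdict(set)
    times = defaultdict(list)
    for ts, src, dst, port in flows:
        ports[(src, dst)].add(port)
        times[(src, dst)].append(ts)

    features = {}
    for pair, seen in ports.items():
        if len(seen) < min_ports:
            continue
        ordered = sorted(seen)
        # Assumed definition: share of adjacent sorted ports that differ by 1.
        adjacent = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
        # Assumed: bursts shorter than one second count as one second.
        duration = max(max(times[pair]) - min(times[pair]), 1)
        features[pair] = {
            "unique_ports": len(seen),
            "flow_count": len(times[pair]),
            "sequential_ratio": adjacent / max(len(ordered) - 1, 1),
            "scan_rate": len(seen) / duration,
        }
    return features

scan = [(t, "10.0.0.9", "10.0.0.20", t + 1) for t in range(30)]
print(portscan_features(scan))
```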
Community Detection
Identifies botnet-like clusters using Kosaraju’s Strongly Connected Components algorithm.
- Build a directed graph from flow data
- Find SCCs where every node can reach every other node
- Compute density: edges / (n × (n-1))
| Community Size | Severity |
|---|---|
| ≥ 10 hosts | Critical |
| ≥ 5 | High |
| ≥ 3 (minimum) | Medium |
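
For readers unfamiliar with Kosaraju's algorithm, the sketch below shows the two passes (a depth-first search that records finish order, then a search over the reversed graph) together with the density formula. It is a generic textbook version written iteratively; the function names and the edge-list input are assumptions.

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """edges: iterable of (src, dst). Returns a list of components (lists of nodes)."""
    graph, reverse, nodes = defaultdict(set), defaultdict(set), set()
    for a, b in edges:
        graph[a].add(b)
        reverse[b].add(a)
        nodes.update((a, b))

    # Pass 1: record nodes in order of DFS completion.
    order, visited = [], set()
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse finish order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

def community_density(component, edges):
    """Directed edges inside the component divided by n * (n - 1)."""
    members = set(component)
    n = len(members)
    internal = {(a, b) for a, b in edges if a in members and b in members and a != b}
    return len(internal) / (n * (n - 1)) if n > 1 else 0.0

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
largest = max(strongly_connected_components(edges), key=len)
print(sorted(largest), community_density(largest, edges))
```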
DNS Tunneling
Detects data exfiltration encoded in DNS subdomain labels.
- Pre-filter: average subdomain length must exceed 15 characters
- Analyze unique subdomain count, TXT record ratio, and query rate per base domain
Scoring features: unique subdomains, avg label length, TXT ratio, query rate
| Condition | Severity |
|---|---|
| ≥ 500 subdomains AND avg length ≥ 25 | Critical |
| ≥ 200 subdomains OR TXT ratio ≥ 0.5 | High |
| Meets pre-filter thresholds | Medium |
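
The pre-filter and severity table can be sketched per base domain as follows. Several simplifications are assumed: the base domain is taken to be the last two labels (a real implementation would more likely consult the Public Suffix List), length is measured on the whole subdomain string, and query rate is omitted. The function name and the (qname, qtype) input are hypothetical.

```python
from collections import defaultdict

def dns_tunnel_findings(queries, min_avg_len=15):
    """queries: iterable of (qname, qtype). Returns {base_domain: severity}."""
    stats = defaultdict(lambda: {"subs": set(), "lengths": [], "txt": 0, "total": 0})
    for qname, qtype in queries:
        labels = qname.rstrip(".").lower().split(".")
        if len(labels) < 3:
            continue  # no subdomain part to analyse
        base = ".".join(labels[-2:])  # assumed: base domain = last two labels
        sub = ".".join(labels[:-2])
        s = stats[base]
        s["subs"].add(sub)
        s["lengths"].append(len(sub))
        s["total"] += 1
        s["txt"] += qtype == "TXT"

    findings = {}
    for base, s in stats.items():
        avg_len = sum(s["lengths"]) / len(s["lengths"])
        if avg_len <= min_avg_len:
            continue  # pre-filter: average subdomain length must exceed 15
        unique = len(s["subs"])
        txt_ratio = s["txt"] / s["total"]
        if unique >= 500 and avg_len >= 25:
            findings[base] = "Critical"
        elif unique >= 200 or txt_ratio >= 0.5:
            findings[base] = "High"
        else:
            findings[base] = "Medium"
    return findings

queries = [(f"{i:030d}.evil.example", "A") for i in range(600)]
print(dns_tunnel_findings(queries))
```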
Data Exfiltration
Detects internal hosts uploading disproportionate data volumes to external hosts.
- Compute byte ratio: bytes_out / (bytes_out + bytes_in); a ratio ≥ 0.8 is suspicious
- Filter: minimum 10 MB outbound
Scoring features: total bytes out, byte ratio, flow count
| Condition | Severity |
|---|---|
| ≥ 1 GB AND ratio ≥ 0.95 | Critical |
| ≥ 100 MB | High |
| ≥ 10 MB, ratio ≥ 0.8 | Medium |
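
The ratio and the severity table translate directly into a small function. The sketch below assumes decimal units (1 MB = 1,000,000 bytes) and a per-host-pair total of bytes in each direction; the function name is hypothetical.

```python
MB = 1_000_000      # assumed decimal units
GB = 1_000_000_000

def exfil_severity(bytes_out, bytes_in):
    """Return a severity label, or None if the pair does not look like exfiltration."""
    total = bytes_out + bytes_in
    if total == 0 or bytes_out < 10 * MB:
        return None
    ratio = bytes_out / total
    if ratio < 0.8:
        return None
    if bytes_out >= GB and ratio >= 0.95:
        return "Critical"
    if bytes_out >= 100 * MB:
        return "High"
    return "Medium"

# 2 GB out against 10 MB in: ratio is about 0.995, so Critical.
print(exfil_severity(2 * GB, 10 * MB))
```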
New Connection Pair
Detects (src_ip, dest_ip, dest_port) tuples never seen in the 7-day baseline window. Particularly important for OT/IoT networks where traffic is highly deterministic.
Known OT ports (Modbus 502, DNP3 20000, MQTT 1883/8883, BACnet 47808, EtherNet/IP 44818, S7comm 102, OPC UA 4840, IEC 104 2404) trigger elevated severity.
| Condition | Severity |
|---|---|
| OT port, flows ≥ 5 | Critical |
| OT port, any flows | High |
| Regular port, flows ≥ 10 | High |
| Otherwise | Medium |
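
Conceptually this is a set difference against the baseline followed by a lookup in the severity table. A minimal sketch, assuming each flow is reduced to a (src_ip, dest_ip, dest_port) tuple and using a hypothetical function name:

```python
from collections import Counter

# OT ports listed above: S7comm, Modbus, MQTT, IEC 104, OPC UA, MQTT/TLS,
# DNP3, EtherNet/IP, BACnet.
OT_PORTS = {102, 502, 1883, 2404, 4840, 8883, 20000, 44818, 47808}

def new_connection_findings(baseline, recent):
    """baseline, recent: iterables of (src_ip, dest_ip, dest_port), one per flow."""
    known = set(baseline)
    counts = Counter(t for t in recent if t not in known)

    findings = {}
    for tup, flows in counts.items():
        port = tup[2]
        if port in OT_PORTS:
            findings[tup] = "Critical" if flows >= 5 else "High"
        else:
            findings[tup] = "High" if flows >= 10 else "Medium"
    return findings

baseline = [("10.0.0.1", "10.0.0.2", 502)]
recent = [("10.0.0.7", "10.0.0.2", 502)] * 5
print(new_connection_findings(baseline, recent))
```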
Polling Disruption
Detects when previously periodic communication becomes irregular or stops entirely. Designed for SCADA/OT environments.
- Identify connections periodic in baseline (CV ≤ 0.3)
- Detect disruption: either stopped (0 recent flows) or irregular (recent CV > 0.8)
| Condition | Severity |
|---|---|
| Stopped, baseline > 100 flows | Critical |
| Stopped | High |
| Irregular, CV > 2.0 | High |
| Irregular | Medium |
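
The decision logic can be sketched for a single connection tuple as below. It assumes the baseline and recent windows are available as lists of flow timestamps and that "baseline > 100 flows" refers to the number of baseline timestamps; both function names are hypothetical.

```python
import statistics

def interval_cv(timestamps):
    """Coefficient of variation of inter-arrival times, or None if undefined."""
    times = sorted(timestamps)
    intervals = [b - a for a, b in zip(times, times[1:])]
    if len(intervals) < 2:
        return None
    mean = statistics.fmean(intervals)
    return statistics.pstdev(intervals) / mean if mean > 0 else None

def polling_disruption(baseline_times, recent_times):
    """Return (kind, severity) for a disrupted periodic connection, else None."""
    base_cv = interval_cv(baseline_times)
    if base_cv is None or base_cv > 0.3:
        return None  # not periodic in the baseline
    if not recent_times:
        return ("stopped", "Critical" if len(baseline_times) > 100 else "High")
    recent_cv = interval_cv(recent_times)
    if recent_cv is not None and recent_cv > 0.8:
        return ("irregular", "High" if recent_cv > 2.0 else "Medium")
    return None

# A 10-second poll seen 150 times in the baseline, then silence.
print(polling_disruption(list(range(0, 1500, 10)), []))
```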
Baseline Deviation
Detects significant deviations from historical traffic patterns.
- Compare recent (1 hour) vs baseline (7 days) for same connection tuples
- Compute ratios: flow_ratio = recent / baseline, bytes_ratio = recent / baseline
- Flag new protocols not seen in baseline
Scoring features: flow count ratio, bytes ratio, new protocol count
| Condition | Severity |
|---|---|
| Flow ratio > 10 OR ≥ 3 new protocols | Critical |
| Flow ratio > 5 OR bytes ratio > 5 OR ≥ 1 new protocol | High |
| Ratio > 2.0 | Medium |
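
A sketch of the ratio checks for one connection tuple follows. It makes two assumptions that are not spelled out above: the baseline values are already normalized to the same 1-hour span as the recent window, and the Medium row applies when either ratio exceeds 2.0. The function name and parameters are hypothetical.

```python
def deviation_severity(recent_flows, baseline_flows,
                       recent_bytes, baseline_bytes, new_protocols=0):
    """Baseline values must be positive and expressed per hour (assumed)."""
    flow_ratio = recent_flows / baseline_flows
    bytes_ratio = recent_bytes / baseline_bytes
    if flow_ratio > 10 or new_protocols >= 3:
        return "Critical"
    if flow_ratio > 5 or bytes_ratio > 5 or new_protocols >= 1:
        return "High"
    if max(flow_ratio, bytes_ratio) > 2.0:  # assumed: either ratio qualifies
        return "Medium"
    return None

# 1200 flows in the last hour against a typical 100 per hour: ratio 12, Critical.
print(deviation_severity(1200, 100, 5_000, 5_000))
```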
Continuous Mode
# Run every hour (default)
rockfish hunt -d /data --sensor my-sensor --hive --continuous
# Run every 15 minutes
rockfish hunt -d /data --sensor my-sensor --hive \
--continuous --interval-minutes 15
Time Window
rockfish hunt -d /data --sensor my-sensor -t "24 hours" # default
rockfish hunt -d /data --sensor my-sensor -t "7 days"
rockfish hunt -d /data --sensor my-sensor -t "1 hour"
Examples
# Standard 24-hour threat hunt
rockfish hunt -d /data/rockfish --sensor prod-01 --hive -t "24 hours"
# Continuous with high severity filter
rockfish hunt -d /data --sensor prod-01 --hive \
--continuous --interval-minutes 30 --min-severity high
# Beaconing with custom thresholds
rockfish hunt -d /data --sensor my-sensor \
--detections beaconing --min-beacon-connections 50 --max-beacon-cv 0.15