The previous posts in this series covered collection: where data lives, how to find it, how to harvest it. This post covers making sense of it — turning a pile of identifiers and findings into a coherent picture. Link analysis is the discipline of representing entities and the relationships between them as a graph, then querying and visualizing that graph to find patterns the underlying data hides.
Why graphs, not spreadsheets
Most OSINT investigations start in spreadsheets. For small cases, that's fine. As soon as you cross a threshold — usually around 50-100 entities or after a few pivots — spreadsheets fail in characteristic ways:
- Many-to-many relationships don't fit cleanly in rows-and-columns. A single phone number belongs to multiple people across time; a single person has multiple phone numbers.
- Path queries like "is there any connection between A and Z" are answerable in seconds in a graph and take hours of manual cross-referencing in spreadsheets.
- Cluster discovery — "which subset of these entities are densely interconnected?" — is trivial in a graph layout, invisible in tabular data.
- Provenance tracking — recording where each fact came from, with what confidence — needs first-class link metadata, which is awkward in spreadsheets.
- Presentation — findings ultimately go to non-technical stakeholders, who grasp pictures faster than tables.
Link analysis tooling is built for this exact transition: when the case is bigger than a single analyst's head can hold.
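The path-query point above can be made concrete without any special tooling. A minimal sketch of "is there any connection between A and Z" as a breadth-first search over an adjacency map — every entity name here is invented for illustration:

```python
from collections import deque

# Adjacency map: each entity maps to the entities it links to.
# All names are invented for illustration.
graph = {
    "person:A": {"email:a@example.com"},
    "email:a@example.com": {"domain:example.com"},
    "domain:example.com": {"ip:203.0.113.7"},
    "ip:203.0.113.7": {"domain:other.example"},
    "domain:other.example": {"person:Z"},
    "person:Z": set(),
}

def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], ()):
            if nxt == goal:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection found

print(find_path(graph, "person:A", "person:Z"))
```

In a spreadsheet, answering the same question means manually chasing each identifier across tabs; in a graph it is one traversal.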
Core concepts: entities, links, transforms
Entities
An entity is a node in the graph. In OSINT contexts, common entity types include: Person, Organization, Phone Number, Email Address, Domain, IP Address, Username, URL, Image, Document, Geographic Location, Cryptocurrency Address, License Plate, Vessel, Aircraft, Phrase. Each entity has a type, a value, and a set of properties (e.g., a Person entity has a name, possibly a date of birth, possibly photos).
Links
A link is an edge between two entities, representing a relationship. Common types: "owns," "registered to," "communicated with," "related to," "located at," "co-occurred in document with," "shares contact with." Links carry metadata too — direction, timestamp, source, confidence.
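The metadata listed above is what separates an investigative edge from a mere line on a canvas. A minimal sketch of one link as a record — the field names and values are illustrative, not a standard schema:

```python
from datetime import date

# One link with its metadata; field names and values are illustrative.
link = {
    "source_entity": ("Domain", "phish-example.test"),
    "target_entity": ("EmailAddress", "registrant@example.test"),
    "relation": "registered_to",
    "observed": date(2026, 1, 12),   # when the relationship was seen
    "provenance": "WHOIS lookup",    # where the fact came from
    "confidence": "medium",          # analyst-assigned, not tool-assigned
}

def describe(link):
    """Render a link as a one-line, direction-preserving statement."""
    s_type, s_val = link["source_entity"]
    t_type, t_val = link["target_entity"]
    return (f"{s_type} {s_val} --[{link['relation']}]--> {t_type} {t_val} "
            f"({link['provenance']}, {link['confidence']}, {link['observed']})")

print(describe(link))
```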
Transforms
A transform is a function: given an entity, return other entities related to it. "Given an email address, find usernames registered with it." "Given a domain, find its current and historical resolved IPs." "Given a phone number, find associated names from public records." Transforms are the heart of an investigative tool — they're how the graph expands.
Modern link analysis tools ship with hundreds of transforms; you can also write custom transforms that wrap your own data sources or paid APIs.
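The transform contract is simple enough to sketch in a few lines: entity in, related entities out. In the sketch below a hard-coded lookup table stands in for a real data source — in practice an API client (or a Maltego TRX handler) would sit where the dict is, and all names and domains are invented:

```python
# Stand-in for a real reverse-WHOIS data source; sample data is invented.
FAKE_REVERSE_WHOIS = {
    "registrant@example.test": ["phish-one.test", "phish-two.test"],
}

def email_to_domains(email: str) -> list[tuple[str, str]]:
    """Transform: given an EmailAddress entity, return Domain entities
    registered with it. Each result becomes a new node linked back to
    the input entity."""
    return [("Domain", d) for d in FAKE_REVERSE_WHOIS.get(email, [])]

print(email_to_domains("registrant@example.test"))
```

Custom transforms for commercial tools follow the same shape; the framework handles the graph plumbing while your function does the lookup.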
Maltego in 2026
Maltego, originally released by Paterva and now maintained by Maltego Technologies, is the long-standing standard for OSINT link analysis. The core product is a graph editor with a marketplace of transform integrations.
What Maltego does well:
- Rich entity ontology. Hundreds of pre-defined entity types covering the OSINT, cyber, fraud, and financial-investigation domains.
- Vast transform library. Bundled "Standard Transforms" cover free OSINT sources; paid Transform Hub items integrate with commercial data sources (SecurityTrails, IntelX, Have I Been Pwned, Recorded Future, Shodan, OpenSanctions, many more).
- Layout and exploration UX — the graph canvas is genuinely usable for interactive investigation. You select an entity, right-click to run transforms, and the graph expands.
- Collaboration — commercial editions support shared workspaces.
- Scripting — Python TRX framework for custom transforms.
The trade-offs:
- Pricing. The Community Edition is functional but limited (entity-count caps per graph, fewer transforms). Commercial editions add up quickly, especially with paid transform hubs layered on.
- Learning curve. The full feature set rewards a week of focused practice. Maltego's own training materials are good; budget the time.
- Vendor lock-in. Maltego graphs are best opened in Maltego; export to other formats works but loses some fidelity.
Open-source alternatives
SpiderFoot
An automated OSINT framework with built-in graph visualization. Less interactive than Maltego — you point it at a target and it runs many transforms automatically — but excellent for fast initial reconnaissance.
Gephi
A general-purpose graph-analysis tool, originally academic. You import nodes and edges as CSV/GEXF, run layout algorithms (ForceAtlas2 is the popular one), and analyze. Less integrated into OSINT collection workflows but better at large-scale visualization and quantitative graph analysis (centrality measures, community detection).
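Getting investigation data into Gephi usually means writing the two tables its spreadsheet importer expects: a nodes file and an edges file. A minimal sketch, using the `Id`/`Label`/`Source`/`Target` column names Gephi recognizes — the entities themselves are invented:

```python
import csv

# Nodes and edges for Gephi's spreadsheet importer; values are invented.
nodes = [("d1", "phish-example.test"), ("e1", "registrant@example.test")]
edges = [("d1", "e1", "registered_to")]

with open("nodes.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Id", "Label"])   # column names Gephi expects for nodes
    w.writerows(nodes)

with open("edges.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Source", "Target", "Label"])  # edge table columns
    w.writerows(edges)
```

Import nodes first, then edges, and Gephi will stitch them into one graph ready for ForceAtlas2.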
Neo4j + Bloom
A property graph database with a visualization client (Bloom). More technical to stand up, but its query language (Cypher) is far more expressive than any canvas tool, and it scales to very large graphs. Useful when an investigation grows into a long-running data product.
Recon-ng
A modular reconnaissance framework with a database backend and workspace concept. Less graph-oriented but pairs well with Gephi or Neo4j for visualization.
Custom Python + NetworkX
For programmatic investigators, NetworkX gives you full graph-theory operations in a few dozen lines of code. Pair with Pyvis, Plotly, or D3 for visualization. Often the right answer when an investigation has unusual data sources that no off-the-shelf transform covers.
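"A few dozen lines" is not an exaggeration. A minimal sketch of the NetworkX route — building an entity graph, finding the structurally central node, and answering a path query; every entity value here is invented:

```python
import networkx as nx

# Entities as nodes, typed relationships as edge attributes; values invented.
G = nx.Graph()
G.add_edge("email:reg@example.test", "domain:phish-one.test", relation="registered")
G.add_edge("email:reg@example.test", "domain:phish-two.test", relation="registered")
G.add_edge("email:reg@example.test", "domain:phish-three.test", relation="registered")
G.add_edge("domain:phish-one.test", "ip:203.0.113.7", relation="resolves_to")
G.add_edge("domain:phish-two.test", "ip:203.0.113.7", relation="resolves_to")

# Structural importance: the shared registrant email has the most connections.
dc = nx.degree_centrality(G)
hub = max(dc, key=dc.get)
print(hub)  # email:reg@example.test

# Path query: how does the third domain connect to the shared IP?
path = nx.shortest_path(G, "domain:phish-three.test", "ip:203.0.113.7")
print(path)
```

Swap the hard-coded edges for your transform outputs and the same three calls keep working at a few thousand nodes.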
A practical investigative workflow
- Seed entities. Start with the small set of high-confidence facts that motivated the investigation — a person's name, an email, a domain, an incident IP. Place them on the canvas.
- First-pass pivots. Run the obvious transforms on each seed. Don't curate yet; let the graph grow.
- Review and prune. Most first-pass results are noise (common usernames that happen to match, generic phone numbers, etc.). Prune obvious false hits, flag uncertain ones.
- Second-pass pivots on the kept entities. The investigation deepens. Now you're following real edges.
- Verify cross-source. When a relationship appears, verify it through at least one independent source before treating it as a fact in your graph.
- Annotate with provenance. Each link should reference the source that established it, with a confidence level. Future-you will thank present-you.
- Run graph algorithms. Centrality (who's structurally important), community detection (which entities cluster), shortest paths (how A connects to B).
- Snapshot. Save graph snapshots at each major investigative milestone. You'll want them when writing the report.
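The "run graph algorithms" step above can be sketched concretely with NetworkX. The toy graph below has two tight clusters joined by a single bridge node — exactly the structure where community detection and betweenness centrality earn their keep; node names are invented placeholders:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two tight clusters plus one bridge node; names are invented placeholders.
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),   # cluster A (triangle)
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),   # cluster B (triangle)
    ("a1", "bridge"), ("b1", "bridge"),          # shared pivot between them
])

# Community detection: which entities cluster together?
communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])

# Betweenness: the bridge scores highest, since every A<->B path crosses it.
bc = nx.betweenness_centrality(G)
print(max(bc, key=bc.get))  # bridge
```

In a real case the "bridge" is often the finding — the shared registrant email or payment address connecting two otherwise separate campaigns.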
Worked example (synthetic)
Suppose an investigator is examining whether a domain registered last week is part of a coordinated phishing campaign.
- Seed: the new domain.
- Transforms: WHOIS → registrant email + registrar; passive DNS → resolving IPs; certificate transparency → subdomains; HTTP fingerprint → technology stack and favicon hash.
- Pivots:
- Favicon hash → search across Shodan for other internet-facing assets with the same favicon → 12 hits across 9 hosting providers.
- Resolving IP → reverse-DNS, then passive DNS on that IP → 47 sibling domains historically resolved.
- Registrant email → check across other registrar records → 6 additional domains registered with the same email this year.
- Cluster analysis: the 12+47+6+seed entities cluster into 3 communities by hosting provider and registration cadence. One cluster is a long-running campaign; one is a recent re-registration; one is a single decoy.
- Centrality: one specific email address sits at the intersection of all three clusters. High-value pivot for further investigation.
- Report: the graph plus narrative makes the campaign legible to a non-technical reviewer in a way a flat list of 60-odd domains and hosts never would.
Pitfalls in graph thinking
- Confusing a transform result with a fact. A transform returned "this username is registered on these 12 platforms." That's a tool finding, not a proven fact, until verified. Many false hits look identical to true hits in the graph.
- Conflation. Two different humans with the same name, two different email addresses with the same prefix — these collapse into one node by default and corrupt the analysis. Maintain identity discipline.
- Visual conviction. A dense, dramatic-looking graph feels conclusive even when most edges are weakly sourced. Centrality of a node is a function of which transforms you ran, not necessarily of objective importance.
- Stale data. Historical relationships shown in passive DNS, WHOIS, or breach data may not be current. Time-stamp your edges.
- Survivorship bias. You see the relationships your transforms surface. Absence in the graph is not absence in the world.
- Anchoring. Investigators often build the graph around their initial hypothesis and unconsciously prune contradicting paths. Periodically invert the question — what would prove me wrong?
Presenting findings
The full working graph is for you. The graph for the stakeholder is usually different:
- Simplify. Prune nodes that don't carry the narrative. A 12-node graph that tells the story is better than a 200-node graph that buries it.
- Annotate the story path. Highlight the chain of edges that constitute the finding. Use color or weight.
- Caption every claim. A graph image without explanation is unhelpful. Pair each visual with the narrative paragraph it supports.
- Maintain the full graph separately with provenance, for the inevitable follow-up questions.
- Be honest about confidence. Distinguish "verified through three independent sources" from "single passive-DNS hit." Reviewers will trust the next finding more if you're calibrated on this one.
For the data sources that feed link analysis, see Shodan, Google Dorks, and the OSINT Toolkit. For identifier-specific pivots, see Username & Email OSINT and Geolocation OSINT. For boundaries on what you should do with the resulting graph, see OSINT Legal & Ethics.
- Maltego — Official documentation
- SpiderFoot — Open-source automation
- Gephi — Graph visualization platform
- NetworkX — Python network analysis
- Neo4j — Property graph database