"Google Dorking" is the practice of using advanced search operators to find specific kinds of content that ordinary searches miss. The technique is decades old, the operators are documented by Google itself, and the use cases range from helpful (finding your own exposed data) to harmful (finding other people's exposed data to misuse). This post covers the operators, the patterns, and the ethical framing.
What Google Dorks are
Google's search syntax includes a set of structured operators that filter results by specific criteria: file type, site, URL pattern, title text, and more. Combining them produces queries far more precise than free-text search. A "dork" is just a creative combination of these operators that surfaces something interesting.
The operators themselves are public. Google publishes them. Bing and DuckDuckGo support most of the same syntax (with small differences). Running a query is not illegal in itself: it searches content that the publisher exposed to the internet, deliberately or not. What you do with the findings is what matters.
Core operators
Domain and URL filters
- site:example.com (restrict to a domain and its subdomains)
- site:*.example.com (focus on subdomains; wildcard behavior can vary)
- -site:blog.example.com (exclude a subdomain)
- inurl:admin (URL contains "admin")
- allinurl:login admin (URL contains both terms)
Content filters
- intitle:"index of" (page title contains the phrase)
- allintitle:resume john (title contains all terms)
- intext:"confidential" (body text contains the phrase)
- "exact phrase" (exact match, using quotes)
- -word (exclude a word from results)
File type filters
- filetype:pdf (PDF files only)
- filetype:xls OR filetype:xlsx (Excel files)
- filetype:env (env files, often exposed through misconfiguration)
- filetype:sql (SQL dumps)
- filetype:log (log files)
Other useful operators
- cache:url (view Google's cached copy; Google retired its cache in 2024, so this may no longer return anything)
- related:url (pages similar to a given URL)
- info:url (Google's index info for a URL)
- before:2024-01-01 / after:2023-06-01 (date range)
- numrange:100..500 (numbers in a range)
- AROUND(5) (words appearing within N words of each other; here N is 5)
Combining operators
The power of dorking is in combining operators. Examples:
- site:example.com filetype:pdf intext:"confidential" (confidential PDFs on a specific site)
- site:example.com -site:www.example.com (non-www subdomains)
- inurl:wp-admin "login" (exposed WordPress login pages)
- intitle:"index of" "parent directory" (open directory listings)
- filetype:env "DB_PASSWORD" (exposed environment files with database credentials; defenders use this to find what has leaked)
- site:pastebin.com "api_key" (leaked API keys on paste sites)
- site:github.com "BEGIN RSA PRIVATE KEY" (accidentally committed private keys)
The same techniques an attacker uses to find exposed secrets are the techniques a defender uses to audit their own posture. The operators don't care which side you're on.
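Because a dork is just operator terms joined by spaces, composing one programmatically is straightforward. The sketch below is illustrative only: `build_dork` and `search_url` are hypothetical helper names, and the base URL is an assumption about the usual search endpoint rather than a documented API.

```python
from urllib.parse import urlencode


def build_dork(*terms: str) -> str:
    """Join operator terms into one query string, skipping empty ones."""
    return " ".join(t.strip() for t in terms if t and t.strip())


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Return a search URL with the query percent-encoded (base is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_dork("site:example.com", "filetype:pdf", 'intext:"confidential"')
print(query)
print(search_url(query))
```

Keeping each operator as a separate term makes it easy to add or drop a filter without hand-editing a long query string.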
Defensive use cases
Audit your own organization
- site:yourcompany.com filetype:pdf (every PDF Google indexed on your domain)
- site:yourcompany.com -site:www.yourcompany.com (non-www subdomains, useful for shadow IT discovery)
- site:github.com "yourcompany.com" (public repos that reference your company)
- site:pastebin.com "yourcompany.com" (any data exposures referencing your domain)
- "yourcompany" filetype:env OR filetype:sql (broader hunt for leaked config files)
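If you rerun this audit regularly, it can help to generate the queries from your domain instead of retyping them. This is a minimal sketch, assuming you only want the five audit queries listed here; `audit_dorks` and its labels are hypothetical names, and the output is a set of query strings to paste into a search engine, not an automated scan.

```python
def audit_dorks(domain: str, brand: str) -> dict[str, str]:
    """Return labelled self-audit queries for a domain you own."""
    return {
        "indexed_pdfs": f"site:{domain} filetype:pdf",
        "non_www_subdomains": f"site:{domain} -site:www.{domain}",
        "github_mentions": f'site:github.com "{domain}"',
        "paste_mentions": f'site:pastebin.com "{domain}"',
        "leaked_configs": f'"{brand}" filetype:env OR filetype:sql',
    }


for label, query in audit_dorks("yourcompany.com", "yourcompany").items():
    print(f"{label}: {query}")
```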
Find old, no-longer-needed content
- site:yourcompany.com before:2018-01-01 (pre-2018 content still in Google's index)
- site:yourcompany.com intitle:"draft" OR intitle:"internal" (content marked as drafts or internal that escaped)
Look for typosquatted domains
- "yuorcompany.com" OR "yorcompany.com" (common typos of your brand)
- intitle:"yourbrand" -site:yourcompany.com (sites using your brand name that aren't yours)
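Typo lists like the one above can be generated rather than guessed. The sketch below covers only two simple typo classes, adjacent-letter swaps and single-letter omissions; real typosquatting also uses lookalike characters and alternate TLDs, which it ignores. `typo_variants` and `typo_dork` are hypothetical names.

```python
def typo_variants(name: str) -> list[str]:
    """Return simple typos of name: adjacent swaps and single-letter omissions."""
    variants = set()
    for i in range(len(name) - 1):
        # Swap neighbouring characters, e.g. "your" -> "yuor".
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for i in range(len(name)):
        # Drop one character, e.g. "your" -> "yor".
        variants.add(name[:i] + name[i + 1:])
    variants.discard(name)  # swapping a doubled letter reproduces the original
    return sorted(variants)


def typo_dork(name: str, tld: str = "com", limit: int = 5) -> str:
    """Build an OR query from the first `limit` variants (alphabetical order)."""
    quoted = [f'"{v}.{tld}"' for v in typo_variants(name)[:limit]]
    return " OR ".join(quoted)


print(typo_dork("yourcompany"))
```

The `limit` parameter exists because search engines cap query length; for a long brand name you would run several shorter queries instead of one large OR chain.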
Research patterns
For journalists and researchers, dorks help surface documents that exist but weren't easily discoverable. Some patterns:
- Government documents: site:gov filetype:pdf "topic" (federal PDFs on a topic); site:state.gov filetype:pdf (US State Department PDFs).
- Academic papers: site:edu filetype:pdf "research topic"
- Court records: each jurisdiction differs, but many have searchable filing portals; combining site: with case keywords surfaces relevant filings.
- SEC filings: site:sec.gov "company name" filetype:htm
- Corporate announcements that were "soft-launched": site:linkedin.com "company name" "joined" "as CEO" (surfaces leadership transitions that aren't in press releases yet).
Beyond Google
Different engines index different content. Always check at least two:
- Bing supports most of the same operators with some additions (linkfromdomain:, contains:). It often indexes content Google misses, especially Microsoft-hosted sites.
- DuckDuckGo is largely Bing-powered with privacy guarantees. Most of the same operators work.
- Yandex is the dominant Russian search engine and indexes Eastern European content Google doesn't prioritize. Russian-language OSINT often requires it.
- Baidu for Chinese-language content.
- The Internet Archive's Wayback Machine: not a search engine per se, but it searches across snapshots of historical websites. Many "deleted" pages remain here.
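For the Wayback Machine, the Internet Archive offers a small availability endpoint that reports the closest snapshot for a URL. The sketch below only builds the request URL and parses a response; it makes no network call. The sample JSON is hand-written to mirror the response shape as I understand it, not a real capture, and `availability_url` and `closest_snapshot` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the availability query; timestamp is optional (YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str):
    """Return the closest snapshot URL from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written illustrative response, not real data.
SAMPLE_RESPONSE = """
{"archived_snapshots": {"closest": {
    "available": true,
    "url": "http://web.archive.org/web/20180103000000/http://example.com/old-page",
    "timestamp": "20180103000000",
    "status": "200"}}}
"""

print(availability_url("example.com/old-page", "20180101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```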
Ethics
Dorking is a tool. The ethics depend on what you do with the findings:
- Finding a publicly accessible PDF marked "confidential": legal. Tells you something about the organization's security posture.
- Downloading and using that PDF for unauthorized purposes: potentially illegal.
- Reporting the exposure to the organization through their responsible-disclosure channel: ethical and often appreciated.
- Publishing the contents on social media: usually unethical, sometimes illegal depending on the document content (trade secrets, personal data, etc.).
Three rules that keep practitioners on the right side of the line:
- Run dorks against your own organization first. Most legitimate use cases never need to leave that boundary.
- If you find someone else's exposure, your default action is "report it, don't exploit it."
- Document everything. Investigations that are clean enough to share with a journalist, a judge, and the subject are clean enough to be defensible.
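One lightweight way to follow the "document everything" rule is an append-only log with one JSON record per query. This is a sketch under assumptions: the field names are my own, and the hash is only a tamper-evidence aid, not a legal chain of custody.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_entry(query: str, engine: str, result_url: str, note: str = "") -> str:
    """Return one JSON line describing a query and what it surfaced."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "engine": engine,
        "query": query,
        "result_url": result_url,
        "note": note,
    }
    # Hash the record so later edits to the log line are detectable.
    payload = json.dumps(record, sort_keys=True)
    record["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps(record, sort_keys=True)


line = log_entry(
    "site:yourcompany.com filetype:pdf",
    "google",
    "https://yourcompany.com/files/report.pdf",
    note="reported to security team",
)
print(line)
```

Appending each line to a file as you work gives you a timeline you can hand over as-is.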
Limits & gotchas
- Google de-indexes content over time. A dork that worked last year may produce no results today.
- Google rate-limits dorking traffic. Too many advanced queries from one IP trigger CAPTCHAs.
- Some operators are deprecated. link: was removed years ago. Stay current with what works.
- A robots.txt rule (which blocks crawling) or a noindex tag (which blocks indexing) can keep content out of Google's results, even if it's publicly accessible.
- Cached results may be stale. Always check the live URL too.
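When a dork returns nothing for a page you know exists, the site's robots.txt is one place to look. The sketch below uses Python's standard `urllib.robotparser` on a made-up robots.txt to test whether a crawler may fetch a path. `SAMPLE_ROBOTS` and `is_crawlable` are hypothetical, and the check says nothing about noindex tags, which live in the page itself.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /internal/
Disallow: /drafts/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules allow this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(SAMPLE_ROBOTS, "https://example.com/internal/report.pdf"))
print(is_crawlable(SAMPLE_ROBOTS, "https://example.com/blog/post"))
```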
For the broader OSINT methodology, see our OSINT introduction. For internet-device search, see Shodan.io explained. For the full set of OSINT tools, see the OSINT toolkit.
- Google: Refine searches with operators
- Microsoft Bing: Bing advanced search help
- Internet Archive: Wayback Machine
- Exploit-DB: Google Hacking Database (defensive reference)