Top 5 OSINT Sources for Attack Surface Management

Probably the most frequently asked question we get from SpiderFoot users is “with so many options available, what API keys should I get for my use case?” So, we asked hakluke and dccybersec to go on a mission and figure out the top 5 for the three most common SpiderFoot use cases: Penetration Tests / Bug Bounties, Threat Intelligence, and People Investigations. This is the fourth, bonus post in the series. It focuses on Attack Surface Management and draws upon the previous posts, for those wanting to build a comprehensive view of their attack surface beyond just hosts and open ports. We hope you find it useful!

Organizations all around the world have been actively ramping up attack surface management programs recently to keep up with attackers, who are utilising the same methods to find forgotten (and vulnerable) assets owned by organizations. Attack surface management is an excellent undertaking, but most organizations are simply not delving deep enough into their asset discovery to protect themselves against motivated attackers.

In this article, we are going to dive into why you need to incorporate more diverse OSINT and/or data sources as part of your ASM pipeline (TL;DR – because hackers are already doing it), and give some guidance on how you can do this without adding too much operational overhead by leveraging automation.

Where Attackers Go, Defenders Must Follow

For security defense to be effective, it must keep up to date with the most recent attack vectors. Take antivirus, for example: it is only effective if it is frequently updated against the latest malware in circulation. The same is true of attack surface monitoring.

The whole point of attack surface monitoring is to give defenders a fighting chance at defending their own attack surface by providing visibility. Attackers are constantly looking for new methods of discovering an organization’s attack surface – especially those parts that the organization is unaware of. As such, an ASM program is only effective if it also utilises the most up-to-date, comprehensive asset discovery techniques and data sources.

In recent years, attackers have seriously upped their game when it comes to asset discovery. There has been an explosion of open source tools and data sources that have been designed specifically to uncover the attack surfaces of organizations. As a defender, how can you keep up? If you are responsible for an ASM program in an organization, it’s time to rethink your asset discovery: there’s so much more to ASM than just enumerating subdomains and port scanning.

Utilizing Advanced OSINT for ASM

When a motivated attacker decides to profile your organization, what will they find? They certainly will not stop at subdomain names. Here are some of the data that they might find useful:

Public code repositories
Employees on LinkedIn and other social media profiles
Breach/Leak data (doesn’t have to be from your organisation!)
Third-party (e.g. cloud) services and applications
Phone numbers
Email addresses (great for phishing attempts)
Public documents (including metadata)
Web content (great for scraped content like names, email addresses and more)
Whois, including historical Whois data
Passive DNS data, including historical DNS data
IP address space / ASN
Cloud infrastructure
Malicious / Block-list data feeds

There is an extremely high chance that your current ASM program does not detect most of these data points, even though they are part of your attack surface.

If you’re like most organizations, there’s a good chance that you don’t have a security budget to blow on hiring an OSINT expert to perform OSINT on your organization full-time, so what’s the solution? There are a few things you can do to bolster your ASM and gain an advantage over attackers, so let’s look at a few.

Gathering Diverse, Comprehensive Data

There are three aspects to this. Firstly, you want to diversify your data sources, which ensures good coverage for each type of data that you collect. Secondly, you want to diversify the types of data that you gather (subdomains, emails, phone numbers, etc.). Finally, you want to correlate that data to surface low-hanging fruit and points of interest for rapid risk reduction.
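To make the correlation step concrete, here is a minimal sketch in Python. It assumes each finding has already been reduced to a (source, data type, value) tuple; the function names, the sample findings and the "seen by at least two sources" rule are illustrative assumptions, not how any particular tool does it.

```python
from collections import defaultdict


def correlate(findings):
    """Group (source, data_type, value) findings by normalised value.

    Returns a dict mapping each value to the set of sources that
    reported it and the set of data types it was reported as.
    """
    merged = defaultdict(lambda: {"sources": set(), "types": set()})
    for source, data_type, value in findings:
        # Normalise so "Alice@example.com" and "alice@example.com" match.
        key = value.strip().lower()
        merged[key]["sources"].add(source)
        merged[key]["types"].add(data_type)
    return dict(merged)


def points_of_interest(merged, min_sources=2):
    """Return (value, info) pairs seen by at least `min_sources` sources,
    most-corroborated first, ties broken alphabetically."""
    hits = [(value, info) for value, info in merged.items()
            if len(info["sources"]) >= min_sources]
    return sorted(hits, key=lambda item: (-len(item[1]["sources"]), item[0]))


# Hypothetical findings; the source labels are just strings here and
# the addresses come from documentation-only ranges.
findings = [
    ("shodan", "ip", "203.0.113.10"),
    ("securitytrails", "ip", "203.0.113.10"),
    ("otx", "ip", "203.0.113.10"),
    ("hunter", "email", "Alice@example.com"),
    ("intelx", "email", "alice@example.com"),
    ("securitytrails", "subdomain", "dev.example.com"),
]

for value, info in points_of_interest(correlate(findings)):
    print(value, sorted(info["sources"]))
```

An email address that shows up both in a public email index and in a breach archive is a good example of the low-hanging fruit this kind of grouping can surface.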

Below are some excellent examples of data sources that can be utilised to bolster your attack surface management program. Each source provides different data types, and correlation can be performed using a range of tooling out there (more on that later).

Of very high importance for inclusion in this list was that each source provides API access, since correlating the data across these sources manually would be quite the headache. Generous free tiers help too!

Shodan

Shodan is the world’s first search engine for Internet-connected devices and provides a very generous free tier, feature-rich API and a ton of data about open ports, captured port banners, web analytics IDs, IP geolocation data, OS fingerprinting and detected CVEs (vulnerabilities). To capture data about assets on your network without having to do any scanning yourself, Shodan is fantastic and still one of the best out there.
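As a rough illustration, the sketch below builds a request for Shodan's REST host lookup and boils a response down to the fields an asset inventory usually cares about. The field names (`ports`, `hostnames`, `vulns`, `data`, `product`) reflect our understanding of the response format and should be checked against Shodan's API documentation; the helper names and the sample record (including its placeholder CVE ID) are made up for the example.

```python
import json
import urllib.parse
import urllib.request


def host_url(ip, api_key):
    """Build the URL for Shodan's host lookup endpoint."""
    query = urllib.parse.urlencode({"key": api_key})
    return f"https://api.shodan.io/shodan/host/{urllib.parse.quote(ip)}?{query}"


def fetch_host(ip, api_key):
    """Fetch the record for one IP. Performs a network request."""
    with urllib.request.urlopen(host_url(ip, api_key), timeout=30) as resp:
        return json.load(resp)


def summarise_host(record):
    """Reduce a host record to ports, services, hostnames and CVE IDs."""
    services = sorted(
        {(banner["port"], banner.get("product") or "unknown")
         for banner in record.get("data", [])
         if banner.get("port") is not None}
    )
    return {
        "ip": record.get("ip_str"),
        "hostnames": sorted(record.get("hostnames", [])),
        "open_ports": sorted(record.get("ports", [])),
        "services": services,
        # Iterating works whether "vulns" arrives as a list or a dict.
        "cves": sorted(record.get("vulns", [])),
    }


# Hand-written sample in the assumed response shape (not real data).
sample = {
    "ip_str": "203.0.113.10",
    "hostnames": ["vpn.example.com"],
    "ports": [443, 22],
    "vulns": ["CVE-2099-0001"],  # placeholder ID, not a real CVE
    "data": [{"port": 22, "product": "OpenSSH"}, {"port": 443}],
}

print(summarise_host(sample))
```

In practice you would call `fetch_host` for each IP in your address space and feed the result to `summarise_host`; the official `shodan` Python package is an alternative to raw HTTP calls.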

Similar services include Censys, Spyse and BinaryEdge, and each has its own edge over Shodan. For example, BinaryEdge includes torrent data, so you could potentially search for devices within your network seeding torrents.

SecurityTrails

SecurityTrails strives to make the biggest treasure-trove of cyber intelligence data readily available in an instant. A lot of value is to be found in their passive and historical DNS and Whois data, which can provide a lot of insight in cases where historical data (since deleted) reveals useful information for attackers.

Consider the classic example of a web server placed behind Cloudflare for DDoS protection. What if the owners only update the DNS record for that server to point to Cloudflare, but never change the server’s actual IP address? The historical DNS record for the website still holds the old IP, meaning there’s a good chance the server can still be reached from the Internet directly, completely bypassing Cloudflare.
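A defender can run the same check against their own domains. The sketch below assumes you have already pulled the historical A records for a hostname from a passive DNS provider and have the CDN's published IP ranges to hand; the function name is hypothetical and the two ranges shown are illustrative only, so load the current list from your CDN provider rather than hard-coding it.

```python
import ipaddress


def exposed_origin_candidates(historical_ips, cdn_ranges):
    """Return historical IPs that fall outside every CDN range.

    These are candidates for an origin server that may still be
    reachable directly, bypassing the CDN.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in cdn_ranges]
    candidates = []
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for ip in dict.fromkeys(historical_ips):
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in networks):
            candidates.append(ip)
    return candidates


# Illustrative inputs: example CDN ranges and hypothetical history.
cdn_ranges = ["104.16.0.0/13", "172.64.0.0/13"]
history = ["198.51.100.7", "104.16.1.1", "198.51.100.7"]

print(exposed_origin_candidates(history, cdn_ranges))
```

Any address this returns is only a candidate: confirm whether it still serves your site, and if it does, restrict the origin so it accepts traffic only from the CDN.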

Historical Whois data is also very useful at a time when most Whois data is redacted. The historical data likely still contains useful information, enabling attackers to find other domains owned by your organisation because they were registered with the same email address.

Similar services in the historical/passive DNS and Whois data realm include ViewDNS.info, WhoisXMLAPI and RiskIQ.

Hunter

Hunter lets you find professional email addresses on a given domain name, and is targeted more towards sales and marketing people looking to find leads, but it’s also very useful for OSINT. Their “domain search” tool lists all the people working in a company with their name and email address found on the web. With 100+ million email addresses indexed, effective search filters and scoring, it’s a very powerful email-finding tool.

Believe it or not, your attack surface definitely includes the email addresses on your company domains. Hunter will reveal which of those email addresses are public, names associated with them and more. Especially in organisations with many employees, knowing which email addresses and identities are available in public databases like Hunter can help security teams prioritise threat intelligence and monitoring activities.

If you are looking for more information about email addresses, such as reputation and social media presence, check out Emailrep, Social Links and Seon.

Intelligence X

Intelligence X maintains a large database of data in a similar vein to Shodan, but includes breach and leak data, as well as working with a range of search selectors, i.e. specific search terms such as email addresses, domains, URLs, IPs, CIDRs, Bitcoin addresses, IPFS hashes, etc. It searches in places such as the darknet, document sharing platforms, Whois data, public data leaks/breaches and others and keeps a historical data archive of results, similar to how the Wayback Machine stores historical copies of websites.

If you want to focus purely on breach data, you may also want to check out Dehashed, which includes credentials from the breaches and lets you search by domain name alone, but offers no free API tier. HaveIBeenPwned is another good option, which doesn’t include credentials or enable searching by domain, but costs basically nothing for API access.

AlienVault OTX

AlienVault Open Threat Exchange (OTX) is the neighborhood watch of the global intelligence community. It enables private companies, independent security researchers, and government agencies to openly collaborate and share the latest information (typically IOCs, or Indicators of Compromise) about emerging threats, attack methods, and malicious actors. Aside from IOCs, OTX also provides passive DNS data, enabling you to identify other hosts/domains using the IP address you query – helpful for identifying other malicious hosts on the same infrastructure but also supporting other use cases like attack surface management. Best of all, it’s free.
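To show how passive DNS data feeds into attack surface management, here is a small sketch that takes passive DNS records for an IP and flags hostnames that do not belong to any domain you already track. The record shape (dictionaries with `hostname` and `address` keys) is an assumption made for the example, as are the function names and sample data; adapt the parsing to whatever your provider actually returns.

```python
def hostnames_on_ip(passive_dns, ip):
    """Return the sorted, normalised hostnames recorded against `ip`."""
    return sorted(
        {rec["hostname"].rstrip(".").lower()
         for rec in passive_dns
         if rec.get("address") == ip and rec.get("hostname")}
    )


def outside_known_domains(hostnames, known_domains):
    """Return hostnames that are not under any domain in `known_domains`."""
    def belongs(host):
        return any(host == d or host.endswith("." + d) for d in known_domains)
    return [h for h in hostnames if not belongs(h)]


# Hypothetical passive DNS records in the assumed shape.
records = [
    {"hostname": "www.example.com", "address": "203.0.113.10"},
    {"hostname": "Legacy-Shop.example.net.", "address": "203.0.113.10"},
    {"hostname": "mail.example.com", "address": "203.0.113.25"},
]

hosts = hostnames_on_ip(records, "203.0.113.10")
print(outside_known_domains(hosts, ["example.com"]))
```

Hostnames that come back here are either forgotten assets of your own or neighbours sharing your infrastructure, and both are worth a closer look.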

Similar services with generous free tiers include IBM X-Force Exchange, GreyNoise and VirusTotal.

Playing to Your Strengths

In many cases, attackers do have an advantage over defenders. One of the most common security tropes is “defenders need to plug all the holes, attackers need only find one!” Having said this, if you are working in a blue team capacity, you do have many operational advantages, and you should take full advantage of them.

Some examples are:

Import host records directly from your DNS to bulk out your asset inventory
Import host lists from cloud services directly into your asset inventory
Utilise the security and monitoring tools that your cloud provider offers
Monitor all company emails for breach data

All of these data points (comprehensive host lists, emails, etc.) are difficult to obtain from an attacker’s viewpoint, but trivial for a blue team.
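One simple way to put that advantage to work is to compare what you know internally with what OSINT sources can see from outside. The sketch below is a minimal, assumption-laden example: both inputs are plain lists of hostnames, and the function and key names are invented for illustration.

```python
def normalise(hosts):
    """Lower-case hostnames and strip whitespace and trailing dots."""
    return {h.strip().rstrip(".").lower() for h in hosts if h.strip()}


def compare_inventories(internal, discovered):
    """Compare the internal asset inventory with externally discovered hosts."""
    internal, discovered = normalise(internal), normalise(discovered)
    return {
        # Visible from outside but missing from your records.
        "unknown_to_inventory": sorted(discovered - internal),
        # In your records but not found by external sources.
        "not_seen_externally": sorted(internal - discovered),
        # Present in both.
        "confirmed": sorted(internal & discovered),
    }


# Hypothetical inputs: a DNS/cloud export and an OSINT scan result.
internal_hosts = ["www.example.com", "vpn.example.com.", "intranet.example.com"]
discovered_hosts = ["WWW.example.com", "vpn.example.com", "staging.example.com"]

report = compare_inventories(internal_hosts, discovered_hosts)
print(report["unknown_to_inventory"])
```

The hosts in `unknown_to_inventory` are the ones to chase first, since they are exactly the forgotten assets an attacker hopes to find before you do.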

Automate All the Things!

If you analysed and correlated all of the above data sources one by one, you would be left with a huge amount of raw data and a tired brain. How can we organize this data in a way that is actually useful and actionable?

This is exactly what automation is for and what SpiderFoot is designed to do. SpiderFoot makes use of over 200 modules, including integrations for the data sources above, to collate the data and present the OSINT in a way that is easily consumable.

If that sounds interesting, you can check out the open source version on GitHub. If you decide that you would like to perform ongoing monitoring, and utilise a fully scalable hosted solution, you can also register a free SpiderFoot HX account and check out the trial.