Top 5 OSINT Sources for Threat Intelligence
Oct 07, 2021
Probably the most frequently asked question we get from SpiderFoot users is “with so many options available, what API keys should I get for my use case?” So, we asked hakluke and dccybersec to go on a mission and figure out the top 5 for the three most common SpiderFoot use cases: Penetration Tests / Bug Bounties, Threat Intelligence, and People Investigations. This is the second post in the three part series focusing on Threat Intelligence, and we hope you find it useful!
Keep in mind that all references to the pricing of these services were valid at the time of writing but are likely to have changed, so always visit the website of the service to get the latest pricing information.
Now let’s get started…
At its core, Threat Intelligence in Cyber Security is about leveraging data, which most often includes OSINT (Open Source Intelligence), to determine what actions are needed to help detect and prevent cyber threats before they are able to impact an organisation.
This post explores the requirements and data sources that Threat Intelligence teams can leverage to bolster their operations and automate some of the more tedious data collection and analysis tasks, creating room for the human element of Threat Intelligence, i.e. analysing the data and drawing meaningful conclusions.
The types of questions that we are most concerned with answering when performing Threat Intelligence include:
- What is the sophistication/relevance of the threat?
- What infrastructure is used by the threat?
- What is the threat targeting?
- What techniques and methodologies is the threat leveraging?
Not all of this information is available through OSINT; however, OSINT can be used to supplement internal logs and other data to build a more complete picture and support these aims. For this post, we’ve also focused only on sources with an easily accessible free tier and API access.
And for the same reason you shouldn’t rely on just one news source for understanding what’s going on in the world, you shouldn’t rely on just one Threat Intelligence data source. Intelligence comes from combining multiple sources of information and synthesizing it into something meaningful for your organisation.
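As a rough illustration of combining sources, the sketch below groups per-source verdicts by indicator and reports how many sources agree. The `Observation` type, the `summarise` function and the source names are hypothetical, introduced only for this example; no real API is called.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    source: str      # name of the data source, e.g. "otx" (illustrative)
    indicator: str   # IP address, domain or file hash
    malicious: bool  # that source's verdict for the indicator


def summarise(observations):
    """Group verdicts by indicator and list which sources flagged each one."""
    seen = defaultdict(set)
    flagged = defaultdict(set)
    for obs in observations:
        seen[obs.indicator].add(obs.source)
        if obs.malicious:
            flagged[obs.indicator].add(obs.source)
    return {
        indicator: {
            "sources_checked": sorted(sources),
            "sources_flagging": sorted(flagged[indicator]),
            # Fraction of the checked sources that flagged the indicator.
            "agreement": len(flagged[indicator]) / len(sources),
        }
        for indicator, sources in seen.items()
    }
```

A low agreement score is not proof that an indicator is harmless; it is simply a prompt for an analyst to look at why the sources disagree.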
With threat intelligence at hand, organisations can better monitor, investigate and block threats from impacting their operations. In some cases it can be used to support Law Enforcement in the apprehension of the criminals behind the attacks.
Benchmarking the data sources
It was quite difficult to narrow the list down to just five sources as there are so many different data sources out there. Each of them has a unique use-case, and some of them specialise in a very specific type of data source, while others generalise. For these reasons it is very difficult to do a direct comparison of each data source. Attempts at comparison via benchmarking simply don’t make sense. As such, this blog post should not be interpreted as a comparison of the services we mention, but a good blend of services that help answer the questions outlined in the requirements section above.
AlienVault OTX
AlienVault (Alien Labs since the AT&T acquisition) OTX (Open Threat Exchange) is an open Threat Intelligence community for sharing and consuming “pulses” and IOCs, supplemented by data from Alien Labs. Pulses are their term for what are almost like news headlines about observed threats, e.g. “New macOS Malware Spreads Via Sponsored Search Results”, while IOCs (Indicators of Compromise) are file hashes, IP addresses, domains, etc. Community members can comment, tag and subscribe for updates to receive notifications as more information arrives, and much more.
The best part – it’s free!
The quality of the data available in OTX is determined by the community and supplemented by information from Alien Labs, so you are getting the best of both worlds: a reputable company providing researched and vetted content, plus a wealth of free-flowing information provided by the community. And while there is a high chance of false positives in the community-derived data, the richness of the data lets analysts decide how relevant it is to them and which community members they want to pay more attention to. For instance, some community members have high rankings through their regular submissions of honeypot data.
Aside from IOCs, OTX also provides passive DNS data, enabling you to identify other hosts/domains using the IP address you query – helpful for identifying other malicious hosts on the same infrastructure but also supporting other use cases like attack surface management.
OTX provide an SDK for Java, Python and Golang but of course you can also interact with their RESTful API directly through cURL. The documentation on how to use the SDKs and APIs is of good quality and easy to understand. You can find the API documentation on the AlienVault OTX website (https://otx.alienvault.com/api) which includes pointers to many open source tools already using their API, in case you need examples to work from.
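A minimal sketch of talking to the REST API directly from Python, rather than through an SDK, might look like the following. It assumes the `/api/v1/indicators/IPv4/<ip>/<section>` URL layout and the `X-OTX-API-KEY` header described in the OTX API documentation at the time of writing; the function names are our own, so check the documentation before relying on it.

```python
import json
import urllib.request

# Assumed base URL for version 1 of the OTX API.
OTX_BASE = "https://otx.alienvault.com/api/v1"


def build_otx_request(api_key, ip, section="general"):
    """Build (but do not send) a request for one section of an IPv4 indicator.

    The "passive_dns" section corresponds to the passive DNS data
    mentioned above; "general" returns the overview for the indicator.
    """
    url = f"{OTX_BASE}/indicators/IPv4/{ip}/{section}"
    return urllib.request.Request(url, headers={"X-OTX-API-KEY": api_key})


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(build_otx_request(key, "203.0.113.10", "passive_dns"))` would retrieve the passive DNS records for that address, given a valid API key.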
GreyNoise
GreyNoise is a unique service that aims to separate the background noise of the Internet from genuine malicious activity. Why is this useful? Well, if you’re receiving alerts in your SIEM about IPs scanning your network or doing other things that might be cause for concern, knowing whether it is a targeted attack or something everyone on the Internet is facing can be incredibly valuable in prioritising investigative efforts. This is particularly useful these days, when a number of services like Shodan, Censys and more are regularly scanning the whole Internet, including your network!
GreyNoise provide two API tiers: a free Community API and an API for Enterprise customers. The primary difference is that the Enterprise API offers additional context data and the ability to use GNQL, their more sophisticated query language, as opposed to simple IP address lookup. Pricing is unfortunately not available on their website.
The quality of GreyNoise’s data is determined largely by the visibility of their sensors across the Internet, supplemented by their own analysis and enrichment capabilities. Aside from recognizing the background noise of the Internet, they are also actively researching and identifying widespread malicious behavior. Their Twitter feed is one to keep a close eye on and gives insight into the new threats being observed. Their data also includes OS fingerprinting, company information and geo-location data.
GreyNoise’s API is very well documented (https://docs.greynoise.io/) and has various different use cases to help you with your threat intelligence investigations. The Community API is currently running on version 3 (https://docs.greynoise.io/reference/get_v3-community-ip), with examples available in Python, Ruby, Node, cURL, PHP, and many others.
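To show how a Community API lookup could feed alert prioritisation, here is a small sketch that maps a response record to a rough priority. It assumes the record carries the `noise`, `riot` and `classification` fields described in the Community API documentation; the `triage` function and its priority labels are hypothetical and would need tuning for a real SIEM workflow.

```python
def triage(record):
    """Map a Community API style record (a dict) to a rough priority label."""
    if record.get("riot"):
        # Assumed meaning: the IP belongs to a known, common business service.
        return "low: known benign service"
    if record.get("noise"):
        # The IP has been observed scanning the Internet at large.
        if record.get("classification", "unknown") == "malicious":
            return "medium: opportunistic malicious scanner"
        return "low: internet-wide background scanner"
    # Not observed by the sensors, so the activity may be specific to you.
    return "high: not seen scanning the wider internet, possibly targeted"
```

The design choice here is that an IP GreyNoise has never seen is treated as the most interesting case, since it is the one least likely to be background noise.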
VirusTotal
VirusTotal (a part of Google) is a web and API based application that allows you to upload potentially malicious files to detect malware and automatically share the results with the VirusTotal community. Its simple web interface has made it a popular product since its launch in 2004 (Google acquired it in 2012), and it has become one of the most popular community malware analysis platforms and Threat Intelligence sources on the market today.
VirusTotal is free to use, with a premium service offering also available. Pricing for the premium service is not published, as it is custom built to your needs. It gives you access to VirusTotal Intelligence, Hunting, Graph and Monitor, plus an unlimited API, with support for each service. By contrast, the free Public API comes with the following restrictions:
- The Public API is limited to 500 requests per day and a rate of 4 requests per minute.
- The Public API must not be used in commercial products or services.
- The Public API must not be used in business workflows that do not contribute new files.
- You are not allowed to register multiple accounts to overcome the limitations.
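The request limits above can be respected on the client side with a simple throttle. The sketch below tracks request timestamps in sliding windows; the `PublicApiThrottle` class is hypothetical, and treating the daily quota as a rolling 24-hour window is a conservative assumption rather than a statement of how VirusTotal counts it.

```python
import time
from collections import deque


class PublicApiThrottle:
    """Track request times so callers stay within per-minute and per-day limits."""

    def __init__(self, per_minute=4, per_day=500, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.minute_window = deque()
        self.day_window = deque()

    def _expire(self, now):
        # Drop timestamps that have aged out of each window.
        while self.minute_window and now - self.minute_window[0] >= 60:
            self.minute_window.popleft()
        while self.day_window and now - self.day_window[0] >= 86400:
            self.day_window.popleft()

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0 if allowed now)."""
        now = self.clock()
        self._expire(now)
        waits = [0.0]
        if len(self.minute_window) >= self.per_minute:
            waits.append(60 - (now - self.minute_window[0]))
        if len(self.day_window) >= self.per_day:
            waits.append(86400 - (now - self.day_window[0]))
        return max(waits)

    def record(self):
        """Note that a request has just been sent."""
        now = self.clock()
        self.minute_window.append(now)
        self.day_window.append(now)
```

A caller would typically run `time.sleep(throttle.wait_time())` before each request and `throttle.record()` immediately after sending it.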
As with AlienVault, the combination of community-derived as well as vetted data provides tremendous value. Being such a long-time player in this space with integrations into so many platforms and backed by Google, there is a high chance that if there is a malicious binary floating around on the Internet, VirusTotal has seen it, analysed it and parsed out IOCs for your use.
The VirusTotal API is extremely well documented (https://developers.virustotal.com/reference#overview), and the documentation provides great insight into how you can use the API for your Threat Intelligence operations. The API follows REST principles and uses predictable, resource-oriented URLs.
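To make "predictable, resource-oriented URLs" concrete, the sketch below builds lookup requests for three indicator types. It assumes version 3 of the API, with its `files`, `domains` and `ip_addresses` collections and the `x-apikey` header; the `build_vt_request` helper and the short `kind` names are our own.

```python
import urllib.request

# Assumed base URL for version 3 of the VirusTotal API.
VT_BASE = "https://www.virustotal.com/api/v3"

# Maps our own short indicator names to the API's collection names.
COLLECTIONS = {
    "file": "files",        # identified by an MD5, SHA-1 or SHA-256 hash
    "domain": "domains",
    "ip": "ip_addresses",
}


def build_vt_request(api_key, kind, identifier):
    """Build (but do not send) a lookup request for a file, domain or IP."""
    collection = COLLECTIONS[kind]  # raises KeyError for unsupported kinds
    url = f"{VT_BASE}/{collection}/{identifier}"
    return urllib.request.Request(url, headers={"x-apikey": api_key})
```

Because every resource follows the same `<collection>/<identifier>` pattern, supporting another indicator type is a matter of adding one entry to the mapping.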
RiskIQ
For more than a decade, RiskIQ (recently acquired by Microsoft) has been crawling the Internet and absorbing publicly available data to enable its users to discover threats across their digital attack surfaces. Through fingerprinting of IP-connected devices and other known infrastructure, you are able to see when and how an attacker may be targeting your own network infrastructure, using data captured by RiskIQ’s sensors.
Pricing for RiskIQ is not publicly available on their website, but they do offer free tier usage of their API with up to 3,500 API queries per month.
The data obtainable through RiskIQ is useful when actively searching for intelligence on threats, as it covers many different areas and integrates with other threat intelligence data sources such as VirusTotal. It includes information on Domains, URLs, Hosts, IPs, Tags, Certificates and Articles, all of which can be searched through the web search function or through the API.
The RiskIQ API follows the principles and guidelines of REST and sample code in Python, Ruby, cURL and Rust is provided.
IBM X-Force Exchange
IBM X-Force Exchange is a cloud based threat intelligence sharing platform that allows you to gather data from the latest global security events, aggregate actionable intelligence, and collaborate with other interested parties through their community offering. The X-Force research team behind it was originally created by ISS (Internet Security Systems) in 1996, and ISS was bought by IBM in 2006.
X-Force Exchange is free to use via the web interface as well as through the API for up to 5,000 records per month. For users who require more record searches, there is a commercial license available for $2,000 per user per 10,000 records per month.
X-Force Exchange offers intelligence on:
- IP and URL reputation
- web applications
Users can enhance their security insights with machine-generated intelligence and curated human-generated insights from IBM X-Force researchers, available via public case file collections on the latest malware campaigns and threats. The data is very in depth, lets you pivot to related information with ease, and can be cross-correlated against other relevant sources.
The X-Force Exchange API (https://api.xforce.ibmcloud.com/doc/) follows the principles and guidelines of REST and can be used with the free tier. The documentation includes many use cases that you can take as templates for building your own API queries.
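As a starting point for such queries, the sketch below prepares an authenticated request. It assumes the API accepts HTTP Basic authentication using an API key and API password pair, and that an IP reputation report lives under a path such as `/ipr/<ip>`; both are assumptions to confirm against the documentation, and `build_xforce_request` is a hypothetical helper.

```python
import base64
import urllib.request

XFORCE_BASE = "https://api.xforce.ibmcloud.com"


def build_xforce_request(api_key, api_password, path):
    """Build (but do not send) a request using HTTP Basic authentication."""
    credentials = f"{api_key}:{api_password}".encode()
    token = base64.b64encode(credentials).decode()
    return urllib.request.Request(
        f"{XFORCE_BASE}{path}",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )
```

For example, `build_xforce_request(key, password, "/ipr/203.0.113.10")` would prepare an IP reputation lookup under the assumed path, which can then be sent with `urllib.request.urlopen`.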
Other data sources
Below are a few data sources that did not make the list but are still well worth a look due to the richness of the data they offer.
- Farsight (DNSDB)
It isn’t easy to compare these services directly, as each focuses on different data points for Threat Intelligence. For your Threat Intelligence operations, the decision on which ones to use depends greatly on the information you want to gather and how you want to gather it. When SpiderFoot users ask us which APIs they should subscribe to for use in SpiderFoot, we generally recommend these. By using these data sources through SpiderFoot, you will be able to gather all of the information you want into one dashboard for easy navigation, analysis and reporting.