Attack Surface Documentation | Intel471

Attack Surface Documentation


What is SpiderFoot?

SpiderFoot is a reconnaissance tool that automatically queries over 100 public data sources (OSINT) to gather intelligence on IP addresses, domain names, e-mail addresses, names and more. You simply specify the target you want to investigate, pick which modules to enable and then SpiderFoot will collect data to build up an understanding of all the entities and how they relate to each other.

What is OSINT?

OSINT (Open Source Intelligence) is data available in the public domain which might reveal interesting information about your target. This includes DNS, Whois, web pages, passive DNS, spam blacklists, file metadata and threat intelligence lists, as well as services like SHODAN, HaveIBeenPwned and more.

What can I do with SpiderFoot?

The data returned from a SpiderFoot scan will reveal a lot of information about your target, providing insight into possible data leaks, vulnerabilities or other sensitive information that can be leveraged during a penetration test, red team exercise or for threat intelligence. Try it out against your own network to see what you might have exposed!

SpiderFoot HX

Additional Capabilities

In addition to the data collection capabilities of the open source version, SpiderFoot HX takes things a step further with the following features:

  • No installation or setup needed at all. Once you register, everything is ready to go. There are no Python dependencies to install, no virtual machines to spin up, and no need to ensure you have enough compute, memory or disk to run a large scan.
  • Investigations. Sometimes you don’t want full automation of your scan and would rather walk through the data collection process step by step, module by module. Investigations provide you with a visual way to take full control of the scanning process.
  • Multi-target scanning. In cases where you have multiple entities (domains, e-mail addresses, etc.) related to the same target, you can supply them all as targets of the one scan. This enables SpiderFoot to better identify relationships and find relevant information.
  • Scans are faster. Thanks to the completely overhauled backend architecture of SpiderFoot HX, scans run up to 10x faster than the open source version. This means you get the data you need, faster.
  • OSINT monitoring. Run scans automatically on a daily, weekly or monthly basis at a time of your choice and have all changes between scans automatically tracked and alerted on.
  • Email notifications. Receive email notifications when SpiderFoot scans finish, or when scheduled scans identify changes between scan runs.
  • Slack integration. Prefer your notifications over Slack? No problem; input your Slack hook URL and you’ll see notifications in Slack for scan completions and/or change notifications from scheduled scans.
  • Import scan targets. When scanning many targets, it might be easier to load them in via CSV, or as exported from Hunchly.
  • More modules. SpiderFoot HX adds additional modules for UDP port scanning, identification of languages used in content and screenshotting of certain content like social media profiles, dark web sites and security-sensitive webpages such as those that accept credentials.
  • Reporting & Visualisations. Slice and dice your scan results by data type, data family, module, module category and data source. Look at each data point in-depth to see how it was discovered, its relationships and more.
  • Team collaboration. Got a team working on OSINT and threat intelligence? With SpiderFoot HX, you can have multiple users with role-based access control, collaborating on scans and investigations.
  • Annotations. Add notes to scan results and pull them out with the API for rich integrations with internal SIEM tools, investigative platforms and ticketing systems.
  • Security. Two-factor authentication (2FA), role-based access control and a fully locked down cloud infrastructure mean you don’t need to deal with the security of your OSINT platform and investigations.
  • Anonymous. SpiderFoot HX has TOR integration out of the box and provides no way for a scanned entity to know that it’s you doing the scanning.
  • Custom Scan Profiles. Got a particular combination of modules you like to use for your scans but don’t like having to define them each time? With SpiderFoot HX, you can define scan profiles and re-use them for future scans.
  • SpiderFoot HX API. The SpiderFoot HX API is a fully documented RESTful API that supports virtually all UI functions so you can orchestrate the platform and extract data programmatically.

Seeking Help


Aside from this document, you’ll be able to get help with SpiderFoot from a number of places:


Using Docker

If you would like to side-step having to install anything to get SpiderFoot running on Linux, follow the instructions here to run SpiderFoot in a Docker container.


SpiderFoot is written in Python 3, so to run on Linux/Solaris/FreeBSD/etc. you need Python 3.7+ installed, in addition to the various module dependencies (shown below).
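
As a quick sanity check before installing the dependencies, a short script along these lines can confirm that the interpreter meets the 3.7+ requirement. This is only a sketch; the function name is illustrative and not part of SpiderFoot.

```python
import sys

# Minimum interpreter version stated in this document.
MIN_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is new enough for SpiderFoot.")
    else:
        print("Python 3.7 or later is required.")
```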


If you’re using the legacy SpiderFoot 2.12 for Windows, you’ll have a compiled executable (.EXE) file and so all dependencies are packaged with it. No third party tools/libraries need to be installed, not even Python.

After version 2.12, however, SpiderFoot no longer ships with an .EXE file for Windows, because py2exe has gone stale and some dependencies can no longer be built properly on Windows.

Fortunately, with Python for Windows you can follow the instructions below to install the SpiderFoot dependencies on Windows easily:

  1. Install Python for Windows
  2. Install PIP by downloading this file and running it with Python simply by doing: python
  3. (Optional if you want to run from the repository and not a packaged release) Install git


On macOS, use the Homebrew package manager to install Python 3.7+ and pip, then install the SpiderFoot dependencies as you would on Linux:

  1. First, make sure you have Homebrew installed. Try running brew and if that doesn’t work, install it.
  2. Install Python 3.7+ with brew install python; this also installs pip.
  3. (Optional if you want to run from the repository and not a packaged release) Install git with brew install git


SpiderFoot can be installed using git (this is the recommended approach as you’ll always have the latest version by simply doing a git pull), or by downloading a tarball of a release. The approach is the same regardless of platform:

From git

$ git clone
$ cd spiderfoot
~/spiderfoot$ pip install -r requirements.txt

As a package

$ wget
$ tar zxvf v3.5-final.tar.gz
$ cd spiderfoot
~/spiderfoot$ pip install -r requirements.txt

Running SpiderFoot

Web UI mode

To start SpiderFoot in Web UI mode, you need to tell it which IP address and port to listen on. The example below binds SpiderFoot to localhost on port 5001:

~/spiderfoot$ python3 -l

Once executed, a web server will start and listen on the address and port you specified. You can then browse to it with the web browser of your choice. Alternatively, since version 2.10 you can use the CLI, which by default connects to the server locally, or you can provide the URL of your server explicitly:

~/spiderfoot$ python3 -s

If you wish to make SpiderFoot accessible from another system, for example running it on a server and controlling it remotely with the CLI, you can specify an external IP for SpiderFoot to bind to, or have it bind to all addresses, including localhost:

~/spiderfoot$ python3 -l

Then, to use the CLI from a remote system to which the CLI file has been copied, you would run:

$ python3 -u https://<remote ip>:5001

Run python3 ./ --help to better understand how to use the client CLI.

If port 5001 is used by another application on your system, you can change the port:

~/spiderfoot$ python3 -l

Once started, you will see something similar to this, which means you are ready to go. If you instead see an error message about missing modules, please go back and ensure you’ve installed all the pre-requisites.

~/spiderfoot$ python3 ./ -l
Attempting to verify database and update if necessary...
Starting web server at ...

Use SpiderFoot by starting your web browser of choice and
browse to https://<IP of this host>:5001

[08/Jul/2019:14:40:53] ENGINE Listening for SIGHUP.
[08/Jul/2019:14:40:53] ENGINE Listening for SIGTERM.
[08/Jul/2019:14:40:53] ENGINE Listening for SIGUSR1.
[08/Jul/2019:14:40:53] ENGINE Bus STARTING
[08/Jul/2019:14:40:53] ENGINE Serving on
[08/Jul/2019:14:40:53] ENGINE Bus STARTED


By default, SpiderFoot does not authenticate users connecting to its user-interface or serve over HTTPS, so avoid running it on a server/workstation that can be accessed from untrusted devices, as they will be able to control SpiderFoot remotely and initiate scans from your devices. As of SpiderFoot 2.7, to use authentication and HTTPS, see the Security section below.
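
To make the warning above concrete, the sketch below checks whether a listen address in the IP:port form accepted by -l is limited to the local machine. It is an illustrative helper built on Python's standard ipaddress module, not part of SpiderFoot, and the function name is an assumption.

```python
import ipaddress


def is_loopback_only(listen_spec):
    """Return True if an 'IP:port' listen spec binds only to a loopback address."""
    host, _, port = listen_spec.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected IP:port, got {listen_spec!r}")
    # Allow bracketed IPv6 literals such as [::1]:5001.
    address = ipaddress.ip_address(host.strip("[]"))
    return address.is_loopback


if __name__ == "__main__":
    for spec in ("127.0.0.1:5001", "0.0.0.0:5001"):
        reachable = "local only" if is_loopback_only(spec) else "reachable from other hosts"
        print(f"{spec}: {reachable}")
```

If the check reports that the address is reachable from other hosts, authentication and HTTPS are worth enabling before starting the server.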

Scan mode

SpiderFoot 3.0 introduced the ability to run a scan entirely from the command line, without starting a web server. You can see all the available command-line arguments by using the --help flag:

~/spiderfoot$ python3 ./ --help
usage: [-h] [-d] [-l IP:port] [-m mod1,mod2,...] [-M] [-s TARGET]
       [-t type1,type2,...] [-T] [-o tab|csv|json] [-n] [-r] [-S LENGTH]
       [-D DELIMITER] [-f] [-F FILTER] [-x] [-q]

SpiderFoot 3.0: Open Source Intelligence Automation.

optional arguments:
  -h, --help          show this help message and exit
  -d, --debug         Enable debug output.
  -l IP:port          IP and port to listen on.
  -m mod1,mod2,...    Modules to enable.
  -M, --modules       List available modules.
  -s TARGET           Target for the scan.
  -t type1,type2,...  Event types to collect.
  -T, --types         List available event types.
  -o tab|csv|json     Output format. Tab is default.
  -n                  Strip newlines from data.
  -r                  Include the source data field in tab/csv output.
  -S LENGTH           Maximum data length to display. By default, all data is
  -D DELIMITER        Delimiter to use for CSV output. Default is ,.
  -f                  Filter out other event types that weren't requested with
  -F FILTER           Filter out a set of event types.
  -x                  STRICT MODE. Will only enable modules that can directly
                      consume your target, and if -t was specified only those
                      events will be consumed by modules. This overrides -t
                      and -m options.
  -q                  Disable logging.

Most of the command-line arguments are self-explanatory, but a few need some explanation. First, some simple examples.

The example below runs a scan against a single target with one very simple module enabled, sfp_dnsresolve, which performs DNS resolution of any identified IP addresses and hostnames.

~/spiderfoot$ python3 ./ -m sfp_dnsresolve -s
Attempting to verify database and update if necessary...
[*] Modules enabled (3): sfp_dnsresolve,sfp__stor_db,sfp__stor_stdout
[*] Scan [FA0D8528] initiated.
[*] Downloading configuration data from:
[*] Identifying aliases for specified target(s)
[*] Aliases identified: [{'value': '', 'type': 'IP_ADDRESS'}, {'value': '', 'type': 'INTERNET_NAME'}, {'value': b'', 'type': 'INTERNET_NAME'}]
[*] sfp_dnsresolve module loaded.
[*] sfp__stor_db module loaded.
[*] sfp__stor_stdout module loaded.
SpiderFoot UI Internet Name
sfp_dnsresolve IP Address
sfp_dnsresolve Domain Name (Parent) com
[*] Scan [FA0D8528] completed.
[*] Scan completed with status FINISHED

This output is a little noisy, so we can add the -q flag to reduce it to just the data from the scan:

~/spiderfoot$ python3 ./ -m sfp_dnsresolve -s -q
Attempting to verify database and update if necessary...
SpiderFoot UI Internet Name
sfp_dnsresolve IP Address
sfp_dnsresolve Domain Name
SpiderFoot UI Domain Name

It’s also not necessary to specify any modules; if you omit them, all modules are run for your scan:

~/spiderfoot$ python3 ./ -s -q
Attempting to verify database and update if necessary...
WARNING: You didn't specify any modules or types, so all will be enabled.
SpiderFoot UI Internet Name
sfp_hunter Email Address [email protected]
sfp_emailrep Raw Data from RIRs/APIs {'references': 5, 'email': '[email protected]', 'suspicious': False, 'reputation': 'high', 'details': {'valid_mx': True, 'data_breach': True, 'suspicious_tld': False, 'spf_strict': False, 'days_since_domain_creation': 5315, 'last_seen': '10/16/2019', 'credentials_leaked_recent': False, 'malicious_activity': False, 'spoofable': True, 'dmarc_enforced': False, 'profiles': ['pastebin', 'twitter'], 'free_provider': False, 'disposable': False, 'deliverable': True, 'spam': False, 'domain_reputation': 'high', 'credentials_leaked': True, 'accept_all': True, 'new_domain': False, 'domain_exists': True, 'first_seen': '05/05/2012', 'malicious_activity_recent': False, 'blacklisted': False}}
sfp_email Email Address [email protected]
sfp_emailrep Hacked Email Address [email protected] [Unknown]

So far this has all been fairly simple. To do something a little more advanced, such as getting every possible e-mail address on a domain name (-t EMAILADDR), using only modules that take our target directly as input (referred to as “strict mode”, -x), and returning only e-mail addresses and no other data (-f), we can do the following:

~/spiderfoot$ python3 ./ -s -t EMAILADDR -f -x -q
Attempting to verify database and update if necessary...
sfp_hunter Email Address [email protected]
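
When scans are launched from another program, it can help to assemble the argument list in one place. The sketch below uses only the flags listed in the help output above; the function name is hypothetical, and the path to the SpiderFoot entry script is deliberately left as a parameter for the caller to supply.

```python
def build_scan_args(script, target, modules=None, types=None,
                    strict=False, filter_types=False, quiet=False):
    """Assemble an argument list for a command-line scan.

    `script` is the path to the SpiderFoot entry script (supplied by the caller).
    """
    args = ["python3", script, "-s", target]
    if modules:
        args += ["-m", ",".join(modules)]   # -m mod1,mod2,...
    if types:
        args += ["-t", ",".join(types)]     # -t type1,type2,...
    if filter_types:
        args.append("-f")                   # drop event types not requested
    if strict:
        args.append("-x")                   # strict mode
    if quiet:
        args.append("-q")                   # disable logging
    return args
```

The returned list can be passed to subprocess.run() as-is, which avoids shell quoting problems with unusual target names.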



SpiderFoot will require basic digest authentication if the file $HOME/.spiderfoot/passwd exists. The format of the file is simple: create one entry per account, in the format of:


For example:


Once the file is created, restart SpiderFoot.


SpiderFoot will serve HTTPS (and only that) if it detects the existence of a public certificate and key file in SpiderFoot’s root directory. This means that whatever port you set SpiderFoot to listen on is the port on which TLS/SSL will be used. It is not possible for SpiderFoot to serve both HTTP and HTTPS simultaneously on different ports. If you need to do that, an nginx proxy in front of SpiderFoot would be a better solution.

Simply place two files in the SpiderFoot directory: spiderfoot.crt (the public certificate, in PEM format) and spiderfoot.key (the RSA private key, in PEM format). Restart SpiderFoot and you will now be serving HTTPS only.

A helper script has been provided for Linux users to generate a self-signed certificate using OpenSSL, or you can follow the instructions in this StackOverflow article.
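
The detection rule described above can be mirrored in a few lines, for example to work out which URL scheme a client should use for a given installation. This is a sketch of the documented behaviour, not SpiderFoot's own code, and the function name is an assumption.

```python
import os
import tempfile


def expected_scheme(root_dir):
    """Return 'https' if both spiderfoot.crt and spiderfoot.key exist in root_dir."""
    crt = os.path.join(root_dir, "spiderfoot.crt")
    key = os.path.join(root_dir, "spiderfoot.key")
    return "https" if os.path.isfile(crt) and os.path.isfile(key) else "http"


if __name__ == "__main__":
    # Demonstrate with an empty temporary directory standing in for the install folder.
    with tempfile.TemporaryDirectory() as demo_dir:
        print(expected_scheme(demo_dir))
```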

API Keys


Many SpiderFoot modules require API keys to function to their fullest extent (or at all), so you will need to go to each service and obtain an API key where you feel that having such a key would add value to your scans. Instructions for how to obtain each API key can be found within the Settings for the respective module:

Configuring SpiderFoot


One of the main principles behind SpiderFoot is to be highly configurable. Every setting is available in the user interface within the Settings section and should be adequately explained there. Just a few key points to note:

  • API keys can be imported and exported between SpiderFoot and SpiderFoot HX using the “Import API Keys” and “Export API Keys” functions. The format is a simple CSV, so it can also be edited outside of SpiderFoot and loaded in, if you prefer.
  • When Debugging is enabled, a lot of logs are generated, which can sometimes result in error messages about database locking. This appears to be harmless to the scan itself but can mean that log entries get dropped.
  • It is worth going through the modules you intend to rely upon heavily to ensure they are configured appropriately for your needs, most importantly the DNS-related modules, as they tend to have a knock-on impact on many other modules.

Using SpiderFoot

Running a Scan

When you run SpiderFoot in Web UI mode for the first time, there is no historical data, so you should be presented with a screen like the following:

To initiate a scan, click on the ‘New Scan’ button in the top menu bar. You will then need to define a name for your scan and a target (neither needs to be unique):

You can then define how you would like to run the scan – either by use case (the tab selected by default), by data required or by module.

Module-based scanning is for more advanced users who are familiar with the behavior and data provided by different modules, and want more control over the scan:

Beware though, there is no dependency checking when scanning by module, only for scanning by required data. This means that if you select a module that depends on event types only provided by other modules, but those modules are not selected, you will get no results.
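
The kind of check that is missing when scanning by module can be sketched as a simple reachability pass over watched and produced event types. The code below is an illustration only: the function name is hypothetical, and the module wiring in the usage note is invented for the example rather than taken from SpiderFoot.

```python
def starved_modules(selected, target_type):
    """Return the selected modules that can never receive an event.

    `selected` maps a module name to {"watched": [...], "produced": [...]}.
    `target_type` is the event type of the scan target itself.
    """
    available = {target_type}   # event types that can occur during the scan
    active = set()              # modules known to receive at least one event type
    changed = True
    while changed:
        changed = False
        for name, info in selected.items():
            if name in active:
                continue
            watched = info["watched"]
            if "*" in watched or available.intersection(watched):
                active.add(name)
                available.update(info["produced"])
                changed = True
    return sorted(set(selected) - active)
```

For instance, if a module that only watches IP_ADDRESS is selected on its own for an INTERNET_NAME target, it is reported as starved; selecting a second module that turns INTERNET_NAME into IP_ADDRESS clears the result.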

Scan Results

From the moment you click ‘Run Scan’, you will be taken to a screen for monitoring your scan in near real time:

That screen is made up of a graph showing a breakdown of the data obtained so far, plus log messages generated by SpiderFoot and its modules.

The bars of the graph are clickable, taking you to the result table for that particular data type.

Browsing Results

By clicking on the ‘Browse’ button for a scan, you can browse the data by type:

This data is exportable and searchable. Click the Search box to get a pop-up explaining how to perform searches.

By clicking on one of the data types, you will be presented with the actual data:

The fields displayed are explained as follows:

  • Checkbox field: Use this to set/unset fields as false positive. Once at least one is checked, click the orange False Positive button above to set/unset the record.
  • Data Element: The data the module was able to obtain about your target.
  • Source Data Element: The data the module received as the basis for its data collection. In the example above, the sfp_portscan_tcp module received an event about an open port, and used that to obtain the banner on that port.
  • Source Module: The module that identified this data.
  • Identified: When the data was identified by the module.

You can click the black icons to modify how this data is represented. For instance you can get a unique data representation by clicking the Unique Data View icon:

Setting False Positives

Version 2.6.0 introduced the ability to set data records as false positive. As indicated in the previous section, use the checkbox and the orange button to set/unset records as false positive.

Once you have set records as false positive, you will see an indicator next to those records, and have the ability to filter them from view, as shown below:

NOTE: Records can only be set to false positive once a scan has finished running. This is because setting a record to false positive also sets all of its child data elements to false positive, which cannot be done reliably while the scan is still running and could leave the back-end in an inconsistent state. The UI will prevent you from doing so.

The result of a record being set to false positive, aside from the indicator in the data table view and exports, is that such data will not be shown in the node graphs.
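
The propagation to child data elements described in the note can be pictured as a walk over the parent/child relationships between records. The sketch below assumes a plain dictionary of relationships; it does not reflect SpiderFoot's actual storage layer, and all names in it are hypothetical.

```python
def mark_false_positive(record_id, children, flags):
    """Flag record_id and every descendant as false positive.

    `children` maps a record id to the ids of records derived from it.
    `flags` is the set of ids marked as false positive; it is updated in place.
    """
    seen = set()
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        flags.add(current)
        # Anything discovered from this record inherits the flag.
        stack.extend(children.get(current, []))
    return flags
```

Because the walk only covers children that exist at the time it runs, it illustrates why the operation is restricted to finished scans.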

Searching Results

Results can be searched either at the whole scan level, or within individual data types. The scope of the search is determined by the screen you are on at the time.

As indicated by the pop-up box when selecting the search field, you can search as follows:

  • Exact value: Non-wildcard searching for a specific value. For example, search for 404 within the HTTP Status Code section to see all pages that were not found.
  • Pattern matching: Search for simple wildcards to find patterns. For example, search for *:22 within the Open TCP Port section to see all instances of port 22 open.
  • Regular expression searches: Encapsulate your string in ‘/’ to search by regular expression. For example, search for ‘/\d+\.\d+\.\d+\.\d+/’ to find anything looking like an IP address in your scan results.
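
The three search styles can be approximated in a few lines of Python, which may help when post-processing exported results in the same way. This is a sketch of the behaviour described in the list above using the standard fnmatch and re modules; it is not how SpiderFoot implements its search, and the function name is an assumption.

```python
import fnmatch
import re


def matches(query, value):
    """Approximate the three search styles: /regex/, wildcard, or exact value."""
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        # Regular expression search: the pattern may match anywhere in the value.
        return re.search(query[1:-1], value) is not None
    if "*" in query:
        # Wildcard search. Note that fnmatch also treats ? and [...] specially.
        return fnmatch.fnmatchcase(value, query)
    # Exact, non-wildcard search.
    return query == value
```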

Managing Scans

When you have some historical scan data accumulated, you can use the list available on the ‘Scans’ section to manage them:

You can filter the scans shown by altering the Filter drop-down selection. Except for the green refresh icon, the icons on the right apply to whichever scans you have checked.

Tor Integration

Refer to this post for more information.



SpiderFoot has all data collection modularised. When a module discovers a piece of data, that data is transmitted to all other modules that are ‘interested’ in that data type for processing. Those modules will then act on that piece of data to identify new data, and in turn generate new events for other modules which may be interested, and so on.

For example, sfp_dnsresolve may identify an IP address associated with your target, notifying all interested modules. One of those interested modules would be the sfp_ripe module, which will take that IP address and identify the netblock it is a part of, the BGP ASN and so on.

This might be best illustrated by looking at module code. For example, the sfp_names module looks for TARGET_WEB_CONTENT and EMAILADDR events for identifying human names:

# What events is this module interested in for input
# * = be notified about all events.
def watchedEvents(self):
    return ["TARGET_WEB_CONTENT", "EMAILADDR"]

# What events this module produces
# This is to support the end user in selecting modules based on events
# produced.
def producedEvents(self):
    return ["HUMAN_NAME"]

Meanwhile, as each event is generated by a module, it is also recorded in the SpiderFoot database for reporting and viewing in the UI.
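
The propagation of events between interested modules can be illustrated with a toy dispatcher. Everything in the sketch below is a stand-in: the class, the two example handlers, their fixed return values and the NETBLOCK_MEMBER type name are assumptions made for the example, and the list called log merely plays the role of the results database.

```python
from collections import deque


class ToyModule:
    """Stand-in for a module: declares watched events and a handler."""

    def __init__(self, name, watched, handler):
        self.name = name
        self.watched = watched
        self.handler = handler  # (event_type, data) -> list of (event_type, data)


def run_scan(modules, seed_event):
    """Deliver events to interested modules until no new events appear."""
    queue = deque([("seed", seed_event)])
    seen = set()
    log = []  # stands in for the results database
    while queue:
        source, event = queue.popleft()
        if event in seen:
            continue  # do not process the same data element twice
        seen.add(event)
        log.append((source, event))
        event_type, data = event
        for module in modules:
            if event_type in module.watched or "*" in module.watched:
                for new_event in module.handler(event_type, data):
                    queue.append((module.name, new_event))
    return log


def toy_resolve(event_type, data):
    # Pretend every name resolves to one documentation-range address.
    return [("IP_ADDRESS", "192.0.2.10")]


def toy_netblock(event_type, data):
    # Pretend every address belongs to one documentation-range netblock.
    return [("NETBLOCK_MEMBER", "192.0.2.0/24")]


MODULES = [
    ToyModule("toy_resolver", ["INTERNET_NAME"], toy_resolve),
    ToyModule("toy_netblock", ["IP_ADDRESS"], toy_netblock),
]

if __name__ == "__main__":
    for source, event in run_scan(MODULES, ("INTERNET_NAME", "example.com")):
        print(source, event)
```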

Module List

To see a list of all SpiderFoot modules, use the -M flag:

~/spiderfoot$ python3 ./ -M
Attempting to verify database and update if necessary...
Modules available:
sfp_abusech Check if a host/domain, IP or netblock is malicious according to
sfp_abuseipdb Check if a netblock or IP is malicious according to
sfp_accounts Look for possible associated accounts on nearly 200 websites like Ebay, Slashdot, reddit, etc.
sfp_adblock Check if linked pages would be blocked by AdBlock Plus.
sfp_ahmia Search Tor 'Ahmia' search engine for mentions of the target domain.
sfp_alienvault Obtain information from AlienVault Open Threat Exchange (OTX)
sfp_alienvaultiprep Check if an IP or netblock is malicious according to the AlienVault IP Reputation database.
sfp_apility Search Apility API for IP address and domain reputation.
sfp_archiveorg Identifies historic versions of interesting files/pages from the Wayback Machine.
sfp_arin Queries ARIN registry for contact information.

Data Elements

As mentioned above, SpiderFoot works on an “event-driven” model, whereby each module generates events about data elements which other modules listen to and consume.

The data elements are one of the following types:

  • entities like IP addresses, Internet names (hostnames, sub-domains, domains),
  • sub-entities like port numbers, URLs and software installed,
  • descriptors of those entities (malicious, physical location information, …) or
  • data, which is mostly unstructured (web page content, port banners, raw DNS records, …)

To see a full list of all the types available, use the -T flag:

~/spiderfoot$ python3 ./ -T
Attempting to verify database and update if necessary...
Types available:
ACCOUNT_EXTERNAL_OWNED Account on External Site
AFFILIATE_COMPANY_NAME Affiliate - Company Name
AFFILIATE_DESCRIPTION_ABSTRACT Affiliate Description - Abstract
AFFILIATE_DESCRIPTION_CATEGORY Affiliate Description - Category
AFFILIATE_DOMAIN Affiliate - Domain Name
AFFILIATE_DOMAIN_NAME Affiliate - Domain Name
AFFILIATE_DOMAIN_UNRESOLVED Affiliate - Domain Name - Unresolved
AFFILIATE_DOMAIN_WHOIS Affiliate - Domain Whois
AFFILIATE_EMAILADDR Affiliate - Email Address

Writing a Module

To write a SpiderFoot module, start by looking at the skeleton module file, which does nothing. Use the following steps as your guide:

  1. Create a copy of the skeleton module file, named after your new module. Try to make the name descriptive, i.e. not something like sfp_mymodule.py but something that reflects the module’s purpose, such as analysing image content.
  2. Replace XXX in the new module with the name of your module and update the descriptive information in the header and comment within the module.
  3. The comment for the class (see the skeleton module) is used by SpiderFoot in the UI to correctly categorise modules, so make it something meaningful. Look at other modules for examples.
  4. Set the events in watchedEvents() and producedEvents() accordingly, based on the data element table in the previous section. If you are producing a new data element not pre-existing in SpiderFoot, you must create this in the database:
    • ~/spiderfoot$ sqlite3 spiderfoot.db
      sqlite> INSERT INTO tbl_event_types (event, event_descr, event_raw, event_type) VALUES ('NEW_DATA_ELEMENT_TYPE_NAME_HERE', 'Description of your New Data Element Here', 0, 'DESCRIPTOR or DATA or ENTITY or SUBENTITY');
  5. Put the logic for the module in handleEvent(). Each call to handleEvent() is provided a SpiderFootEvent object. The most important values within this object are:
    • eventType: The data element ID (IP_ADDRESS, WEBSERVER_BANNER, etc.)
    • data: The actual data, e.g. the IP address or web server banner, etc.
    • module: The name of the module that produced the event (sfp_dnsresolve, etc.)
  6. When it is time to generate your event, create an instance of SpiderFootEvent:
    • e = SpiderFootEvent("IP_ADDRESS", ipaddr, self.__name__, event)
    • Note: the event passed as the last argument is the event that your module received. This is what builds the relationship between data elements in the SpiderFoot database.
  7. Notify all modules that may be interested in the event:
    • self.notifyListeners(e)
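
Putting steps 4 to 7 together, a module might look roughly like the sketch below. To keep it runnable on its own, SpiderFootEvent and the plugin base class are replaced by minimal stand-ins that only imitate the calls shown in the steps above; the real classes live in the SpiderFoot code base and do more. The module name sfp_emaildomain, the registerListener helper and the choice of INTERNET_NAME as the produced type are assumptions made for illustration.

```python
class SpiderFootEvent:
    """Stand-in taking the four arguments shown in step 6 (not the real class)."""

    def __init__(self, eventType, data, module, sourceEvent):
        self.eventType = eventType
        self.data = data
        self.module = module
        self.sourceEvent = sourceEvent


class PluginBase:
    """Hypothetical stand-in base class that only tracks listeners."""

    def __init__(self):
        self._listeners = []

    def registerListener(self, listener):
        self._listeners.append(listener)

    def notifyListeners(self, event):
        for listener in self._listeners:
            listener(event)


class sfp_emaildomain(PluginBase):
    """Hypothetical module: derives a host name from each e-mail address."""

    def __init__(self):
        super().__init__()
        self.__name__ = "sfp_emaildomain"

    def watchedEvents(self):
        return ["EMAILADDR"]

    def producedEvents(self):
        return ["INTERNET_NAME"]

    def handleEvent(self, event):
        if event.eventType != "EMAILADDR" or "@" not in event.data:
            return
        domain = event.data.rsplit("@", 1)[1].lower()
        # Passing the received event last links the new element to its source.
        e = SpiderFootEvent("INTERNET_NAME", domain, self.__name__, event)
        self.notifyListeners(e)
```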



All SpiderFoot data is stored in a SQLite database (spiderfoot.db in your SpiderFoot installation folder) which can be used outside of SpiderFoot for analysis of your data.

The schema is quite simple and can be viewed in the GitHub repo.

The below queries might provide some further clues:

# Total number of scans in the SpiderFoot database
sqlite> select count(*) from tbl_scan_instance;

# Obtain the ID for a particular scan
sqlite> select guid from tbl_scan_instance where seed_target = '';

# Number of results per data type
sqlite> select count(*), type from tbl_scan_results where scan_instance_id = 'b459e339523b8d06235bd06087ae6c6017aaf4ed68dccea0b65a1999a17e460a' group by type;
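
The same queries can be run from Python with the standard sqlite3 module. The sketch below runs against an in-memory stand-in database that contains only the tables and columns referenced by the queries above, with invented sample rows; the real schema has more columns, and the helper names are assumptions. To query real data, connect to spiderfoot.db instead.

```python
import sqlite3


def results_per_type(conn, scan_guid):
    """Return {type: count} for one scan, mirroring the last query above."""
    rows = conn.execute(
        "SELECT count(*), type FROM tbl_scan_results "
        "WHERE scan_instance_id = ? GROUP BY type",
        (scan_guid,),
    ).fetchall()
    return {event_type: count for count, event_type in rows}


def make_demo_db():
    """Build an in-memory stand-in with only the columns the queries use."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_scan_instance (guid TEXT, seed_target TEXT)")
    conn.execute("CREATE TABLE tbl_scan_results (scan_instance_id TEXT, type TEXT)")
    conn.execute("INSERT INTO tbl_scan_instance VALUES ('demo-guid', 'example.com')")
    conn.executemany(
        "INSERT INTO tbl_scan_results VALUES (?, ?)",
        [("demo-guid", "IP_ADDRESS"),
         ("demo-guid", "IP_ADDRESS"),
         ("demo-guid", "INTERNET_NAME")],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(results_per_type(demo, "demo-guid"))
```

Using a parameterised query (the ? placeholder) rather than string formatting keeps the scan ID from being interpreted as SQL.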