This project is only partially complete and I have yet to implement many of the features described in the following blog post I made: https://penafieljlm.com/2017/07/14/inquisitor/.
Inquisitor is a simple tool for gathering information on companies and organizations through the use of Open Source Intelligence (OSINT) sources. It is heavily inspired from how Maltego and recon-ng operates, and the tool pretty much re-implements some of the features of those tools but adds an additonal layer of opinion-based semantics on top of asset types in order to create an easy-to-use workflow.
The key features of Inquisitor include:
- The ability to cascade the ownership label of an asset (e.g. if a Registrant Name is known to belong to the target organization, then the hosts and networks registered with that name shall be marked as belonging to the target organization)
- The ability transform assets into other potentially related assets through querying open sources such as Google and Shodan
- The ability to visualize the relationships of those assets through a zoomable pack layout
The whole concept of Inquisitor revolves around the idea of extracting information from open sources based on what is already known about a target organization. In the context of Inquisitor these are called "transforms". Related information may also be immidiately retrieved from an known asset based on metadata also retrievable from open sources such as whois and internet registries.
The concepts are discussed in further detail in this blog article: https://penafieljlm.com/2017/07/14/inquisitor/
To install Inquisitor, simply clone the repository, enter it, and execute the installation script.
pip install Cython click
git clone [email protected]:penafieljlm/inquisitor.git
cd inquisitor
python setup.py install
Inquisitor has five basic commands which include scan
, status
, classify
, dump
, and visualize
.
usage: inq [-h] {scan,status,classify,dump,visualize} ...
optional arguments:
-h, --help show this help message and exit
command:
{scan,status,classify,dump,visualize}
The action to perform.
scan Search OSINT sources for intelligence based on known
assets belonging to the target.
status Prints out the current status of the specified
intelligence database.
classify Classifies an existing asset as either belonging or
not belonging to the target. Adds a new asset with the
specified classification if none is present.
dump Dumps the contents of the database into a JSON file
visualize Create a D3.js visualization based on the contents of
the specified intelligence database.
In scan mode, the tool runs all available transforms for all the assets you have in your Intelligence Database. Make sure to create API Keys for the various OSINT sources indicated below and provide it to the script lest the transforms using those sources be skipped. Also, make sure you seed your Intelligence Database with some known owned target assets using the classify
command first because if the database does not contain any owned assets, there will be nothing to transform.
usage: inq scan [-h] [--google-dev-key GOOGLE_DEV_KEY]
[--google-cse-id GOOGLE_CSE_ID]
[--google-limit GOOGLE_LIMIT]
[--shodan-api-key SHODAN_API_KEY]
[--shodan-limit SHODAN_LIMIT]
DATABASE
positional arguments:
DATABASE The path to the intelligence database to use. If
specified file does not exist, a new one will be
created.
optional arguments:
-h, --help show this help message and exit
--google-dev-key GOOGLE_DEV_KEY
Specifies the developer key to use to query Google
Custom Search. Visit the Google APIs Console
(http://code.google.com/apis/console) to get an API
key. If notspecified, the script will simply skip
asset transforms that involve Google Search.
--google-cse-id GOOGLE_CSE_ID
Specifies the custom search engine to query. Visit the
Google Custom Search Console
(https://cse.google.com/cse/all) to create your own
Google Custom Search Engine. If not specified, the
script will simply skip asset transforms that involve
Google Search.
--google-limit GOOGLE_LIMIT
The number of pages to limit Google Search to. This is
to avoid exhausting your daily quota.
--shodan-api-key SHODAN_API_KEY
Specifies the API key to use to query Shodan. Log into
your Shodan account (https://www.shodan.io/) and look
at the top right corner of the page in order to view
your API key. If not specified, the script will simply
skip asset transforms that involve Shodan.
--shodan-limit SHODAN_LIMIT
The number of pages to limit Shodan Search to. This is
to avoid exhausting your daily quota.
In status mode, the tool simply prints out a quick summary of the status of your scan database.
usage: inq status [-h] [-s] DATABASE
positional arguments:
DATABASE The path to the intelligence database to use. If specified
file does not exist, a new one will be created.
optional arguments:
-h, --help show this help message and exit
-s, --strong Indicates if the status will be based on the strong ownership
classification.
In classify mode, you will be able to manually add assets and re-classify already existing assets in the Intelligence Database. You should use this command to seed your Intelligence Database with known owned target assets.
usage: inq classify [-h] [-ar REGISTRANT [REGISTRANT ...]]
[-ur REGISTRANT [REGISTRANT ...]]
[-rr REGISTRANT [REGISTRANT ...]]
[-ab BLOCK [BLOCK ...]] [-ub BLOCK [BLOCK ...]]
[-rb BLOCK [BLOCK ...]] [-ah HOST [HOST ...]]
[-uh HOST [HOST ...]] [-rh HOST [HOST ...]]
[-ae EMAIL [EMAIL ...]] [-ue EMAIL [EMAIL ...]]
[-re EMAIL [EMAIL ...]]
[-al LINKEDIN [LINKEDIN ...]]
[-ul LINKEDIN [LINKEDIN ...]]
[-rl LINKEDIN [LINKEDIN ...]]
DATABASE
positional arguments:
DATABASE The path to the intelligence database to use. If
specified file does not exist, a new one will be
created.
optional arguments:
-h, --help show this help message and exit
-ar REGISTRANT [REGISTRANT ...], --accept-registrant REGISTRANT [REGISTRANT ...]
Specifies a registrant to classify as accepted.
-ur REGISTRANT [REGISTRANT ...], --unmark-registrant REGISTRANT [REGISTRANT ...]
Specifies a registrant to classify as unmarked.
-rr REGISTRANT [REGISTRANT ...], --reject-registrant REGISTRANT [REGISTRANT ...]
Specifies a registrant to classify as rejected.
-ab BLOCK [BLOCK ...], --accept-block BLOCK [BLOCK ...]
Specifies a block to classify as accepted.
-ub BLOCK [BLOCK ...], --unmark-block BLOCK [BLOCK ...]
Specifies a block to classify as unmarked.
-rb BLOCK [BLOCK ...], --reject-block BLOCK [BLOCK ...]
Specifies a block to classify as rejected.
-ah HOST [HOST ...], --accept-host HOST [HOST ...]
Specifies a host to classify as accepted.
-uh HOST [HOST ...], --unmark-host HOST [HOST ...]
Specifies a host to classify as unmarked.
-rh HOST [HOST ...], --reject-host HOST [HOST ...]
Specifies a host to classify as rejected.
-ae EMAIL [EMAIL ...], --accept-email EMAIL [EMAIL ...]
Specifies a email to classify as accepted.
-ue EMAIL [EMAIL ...], --unmark-email EMAIL [EMAIL ...]
Specifies a email to classify as unmarked.
-re EMAIL [EMAIL ...], --reject-email EMAIL [EMAIL ...]
Specifies a email to classify as rejected.
-al LINKEDIN [LINKEDIN ...], --accept-linkedin LINKEDIN [LINKEDIN ...]
Specifies a LinkedIn Account to classify as accepted.
-ul LINKEDIN [LINKEDIN ...], --unmark-linkedin LINKEDIN [LINKEDIN ...]
Specifies a LinkedIn Account to classify as unmarked.
-rl LINKEDIN [LINKEDIN ...], --reject-linkedin LINKEDIN [LINKEDIN ...]
Specifies a LinkedIn Account to classify as rejected.
In dump mode, you will be able to dump the contents of the Intelligence Database into a human-readable JSON file.
usage: inq dump [-h] [-j FILE] [-a] DATABASE
positional arguments:
DATABASE The path to the intelligence database to use. If
specified file does not exist, a new one will be
created.
optional arguments:
-h, --help show this help message and exit
-j FILE, --json FILE The path to dump the JSON file to. Overwrites existing
files.
-a, --all Include rejected assets in dump.
In visualize mode, you will be able to acquire a hierarchical visualization of the Intelligence Repository.
usage: inq visualize [-h] [-l] DATABASE
positional arguments:
DATABASE The path to the intelligence database to use. If specified file
does not exist, a new one will be created.
optional arguments:
-h, --help show this help message and exit
-l, --last Simply open the last visualization generated instead of creating
a new one.
Now that you know the basic features of Inquisitor, it's time you learn how to actually use it. Inquisitor has been written with the following steps in mind:
In this step, your Intelligence Database doesn't have anything in it yet. We're going to have to start somewhere so go ahead and seed the database with assets that you know belong to your target organization. You can do this using the classify
command.
Now that the database has assets that are known to belong to your target organization. You can then proceed with scanning. You can do this using the scan
command.
When you invoke the scan
command on your Intelligence Database, Inquisitor proceeds to run the transform
methods of assets that are classified as accepted
. Once scanning is finished, you're going to end up with more assets that might potentially belong to your target organization.
If you don't end up with any new assets, you can either seed your Intelligence Database with new information, or simply proceed to wrap up the process by proceeding to the Reporting step.
While Inquisitor performs automatic asset classification for you, it might end up missing some assets that do, in fact, belong to your target organization.
When this happens, you're going to have to check the database contents and manually classify the assets. Usually, you'd want to pay attention to Registrant assets as there is no way to automatically determine ownership for that asset type. Also most other asset types rely on the ownership classification of Registrant assets in order to determine whether they belong to your target or not, so it's definitelty best to pay attention to your Registrant assets. Additionally, you don't end up with a lot of Registrant assets in the first place so it's not going to be that hard sifting through them.
You can generate a visualization of the assets that belong to your target organization using the visualize
command or the dump
command.
I have video ddemonstrations of the tool running in the following link: https://drive.google.com/open?id=0B_O70BVu38TRclo5dWRBWkdTTWc
I wasn't able to fully record the run of the scan command though since my free screen recorder only records up to 10 minutes.
The the Inquisitor project is laid out in the following format:
.
|-- README.md
|-- inquisitor
| |-- __init__.py
| |-- assets
| | |-- __init__.py
| | |-- block.py
| | |-- email.py
| | |-- host.py
| | |-- linkedin.py
| | `-- registrant.py
| |-- extractors
| | |-- __init__.py
| | `-- emails.py
| `-- sources
| |-- __init__.py
| |-- google_search.py
| `-- shodan_search.py
|-- inq
|-- report
| `-- index.html
|-- setup.py
`-- tests
|-- __init__.py
`-- test_inq.py
It has three main modules named assets
, extractors
, and sources
. The main script is called inq
.
As a developer you would mostly be interested in adding new types of assets into the system so the developer guide would mostly focus on that.
Before we move on to actually implementing asset classes, we would first need to understand how to interact with the Intelligence Database as we will be interacting with it when we derive related assets from our asset classes.
The source code for the Intelligence Database is stored in the inquisitor/__init__.py
file. The actual name for the logical wrapper of the Intelligence Database is called IntelligenceRepository
.
You only need to call the IntelligenceRepository.get_asset_string
function from asset classes as appending new assets onto the Intelligence Database is the responsibility of the scan
module in the inq
script. You would mostly use this function to create instances of assets or retrieve them from the database if they exist. This function is important when returning assets from the related
and transform
functions of your asset classes as creating new asset objects is expensive since some of them use network resources during initialization.
Function
IntelligenceRepository.get_asset_string(asset_type, identifier, create=False, store=False)
Description
Retrieves the primary key and asset object for the asset with the provided
type and identifier.
Parameters
asset_type: class, required
The type of the asset to retrieve from the Intelligence Database. You
will actually have to pass the class object of the asset type you want
to retrieve.
identifier: any, required
The identifier of the asset to retrieve. Consider the identifier as the
unique attribute of an asset object. As for which attribute is to be
used to identify an asset, it depends on the contents of the OBJECT_ID
variable in the asset module.
create: bool, optional, default=False
When no matching asset object is found, a new one will be created and
returned if this parameter is set to True. The new asset will not
necessarily be stored in the Intelligence Database unless specified
using the "store" parameter. However, I suggest you do not do this as
adding assets to the Intelligence Database is the responsibility of
another module.
store: bool, optional, default=False
When a new asset is created when none is found, the new one will be
stored in the Intelligence Database. As said previously, I suggest that
you do not do this as adding assets to the Intelligence Database is the
responsibility of another module.
Returns
A two-element tuple where the first element is the database primary key of
the element returned, and the second element is the deserialized asset
object retrieved from the database.
None if the asset was not found.
If the asset was not found and the create flag was set to True, the primary
key member of the tuple will be set to None.
To create a new asset type, create a new file inside the inquisitor/assets
directory and paste the following skeleton code inside:
import inquisitor.assets
class ASSET_NAMEValidateException(Exception):
pass
def canonicalize(ASSET_IDENTIFIER):
return ASSET_IDENTIFIER
def main_classify_args(parser):
parser.add_argument(
'-aASSET_NAME_LETTER', '--accept-ASSET_NAME',
metavar='ASSET_NAME',
type=canonicalize,
nargs='+',
help='Specifies a ASSET_NAME to classify as accepted.',
dest='ASSET_NAMEs_accepted',
default=list(),
)
parser.add_argument(
'-uASSET_NAME_LETTER', '--unmark-ASSET_NAME',
metavar='ASSET_NAME',
type=canonicalize,
nargs='+',
help='Specifies a ASSET_NAME to classify as unmarked.',
dest='ASSET_NAMEs_unmarked',
default=list(),
)
parser.add_argument(
'-rASSET_NAME_LETTER', '--reject-ASSET_NAME',
metavar='ASSET_NAME',
type=canonicalize,
nargs='+',
help='Specifies a ASSET_NAME to classify as rejected.',
dest='ASSET_NAME_rejected',
default=list(),
)
def main_classify_canonicalize(args):
accepted = set(args.ASSET_NAMEs_accepted)
unmarked = set(args.ASSET_NAMEs_unmarked)
rejected = set(args.ASSET_NAME_rejected)
redundant = set.intersection(accepted, unmarked, rejected)
if redundant:
raise ValueError(
('Conflicting classifications for ASSET_NAMEs '
': {}').format(list(redundant))
)
accepted = set([canonicalize(a) for a in accepted])
unmarked = set([canonicalize(a) for a in unmarked])
rejected = set([canonicalize(a) for a in rejected])
return (accepted, unmarked, rejected)
class ASSET_NAME(inquisitor.assets.Asset):
def __init__(self, ASSET_IDENTIFIER, owned=None):
super(self.__class__, self).__init__(owned=owned)
self.ASSET_IDENTIFIER = canonicalize(ASSET_IDENTIFIER)
# TODO: Perform other initialization actions here
def __eq__(self, other):
if not isinstance(other, self.__class__):
return False
return self.ASSET_IDENTIFIER == other.ASSET_IDENTIFIER
def related(self, repo):
# Prepare the results
results = set()
# TODO: Create related assets here based on the attributes of this asset
# Return the results
return results
def transform(self, repo, sources):
# Prepare the results
assets = set()
# Google Transforms
if sources.get('google'):
subassets = self.cache_transform_get('google', repo)
if not subassets:
# Acquire API
google = sources['google']
# TODO: Perform Google queries here and the results to 'subassets'
# Cache The Transform
self.cache_transform_store('google', subassets)
assets.update(subassets)
# Shodan Transforms
if sources.get('shodan'):
subassets = self.cache_transform_get('shodan', repo)
if not subassets:
# Acquire API
shodan = sources['shodan']
# TODO: Perform Google queries here and the results to 'subassets'
# Cache The Transform
self.cache_transform_store('shodan', subassets)
assets.update(subassets)
# Return the results
return assets
def is_owned(self, repo):
if self.owned:
return True
# TODO: Automatically determine ownership based on repo contents
return False
def parent_asset(self, repo):
# TODO: Return parent asset based on repo contents
return None
REPOSITORY = 'ASSET_REPOSITORY'
ASSET_CLASS = ASSET_NAME
OBJECT_ID = 'ASSET_IDENTIFIER'
Now replace the following strings with the appropriate values
ASSET_NAME
: Proper name of your asset (e.g. Registrant, Host, etc.)ASSET_IDENTIFIER
: The name of the identifier attribute of your assetASSET_NAME_LETTER
: The first letter of your asset in lowercaseASSET_REPOSITORY
: Lower case of the plural form of your asset name
Finally, in inquisitor/__init__.py
, register your asset in the ASSET_MODULES
list. Make sure you import your new asset from the file in question.
Congratulations! By this point, you now have a new working asset type!
However, you are going to need to implement the following methods to make sure your assets get correlated with other asset types:
Function
related
Description
Returns the set of assets directly related to the asset in question (i.e.
those that can be derived without querying a search engine).
When creating asset objects, make sure you use the
IntelligenceRepository.get_asset_string method instead of instatiating a
new one your self so the asset can be returned from the repository if it
exists.
Set the create flag to True when calling the method in question in order
to return a new object when one isn't found.
Set the store flag to False as appending assets is the job of another
module.
Parameters
repo: IntelligenceRepository
The Intelligence Repository that is being used in the current context.
Returns
Set of assets directly related to the asset in question.
Function
transform
Description
Returns the set of assets potentially related to the asset in question
(i.e. those that can be derived by querying a search engine).
You may access search engine objects through the provided sources
parameter.
Each search engine object has a transform method which automatically
creates asset objects for you. You just need to provide it the repository
and your query string, and then append the objects it returns to the set
of assets to be returned by your asset's transform method.
Parameters
repo: IntelligenceRepository
The Intelligence Repository that is being used in the current context.
sources: dict
The list of search engine objects that are available for use.
Returns
Set of assets potentially related to the asset in question.
Function
is_owned
Description
Determines if there is high confidence that this asset does indeed belong
to the target. Usually checks for any "strong" classification tag first by
looking at the contents of the "owned" variable, before performing
automatic evaluation.
Automatic evaluation depends on what type of asset you're writing. For
example, for a Host asset, the secondary sources of determining ownership
would include looking if its registrant is owned by the target, if it's
parent domain is owned by the target. etc.
Parameters
repo: IntelligenceRepository
The Intelligence Repository that is being used in the current context.
Returns
True it is determined with high confidence that this asset does indeed
belong to the target.
Function
parent_asset
Description
Returns the asset object that is considered the parent of this asset
object.
Parameters
repo: IntelligenceRepository
Returns
The asset object that this asset falls under (e.g. a Block is under a
Registrant, a Host is under a Block, a Host is under another Host, an Email
is under a Host, etc. This is primarily used for visualization.
After implementing the above methods, make sure you set the REPOSITORY
, ASSET_CLASS
, and OBJECT_ID
variables on the bottom of your asset's source code.
The scan mode isn't fully tested because of quotas concerning the search engines involved. Also, this project was made in a rush as part of a week-long hackaton challenge so there might be a lot of problems lying around. Please create an issue ticket or contact me at [email protected] if you find a bug or have some questions.
This work is derived from the approaches implemented by the Maltego and recon-ng Open Source Intelligence tools. I supplemented these approaches with ideas that are either already common knowledge (e.g. whois tells you who the owner of a domain is, subdomains are owned by the same organization owning their parent - as implied by domain name bruteforcing attacks, organizations are authoritative of the domain names that they own, etc.), or are original and were conceived by me in my own personal time as part of my hobby (e.g. acceptability ratings, various transforms, classification inheritance, etc.).
No component of this work was derived from any work that I have done for any employer in the past. The whole project, including the proof-of-concept, was written from scratch and was augmented with ideas from the information security community.