Commit: Add an index
floren committed Apr 21, 2022
1 parent 1d4b67b commit 5ba8b1f
Showing 16 changed files with 138 additions and 124 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -8,4 +8,7 @@
images/*.tar.gz
images/*.tgz
dockerimages/
gravwell_training.tgz
gravwell_training.tgz
*.idx
*.ilg
*.ind
13 changes: 7 additions & 6 deletions API/api.tex
@@ -1,10 +1,10 @@
\chapter{The Gravwell REST API}

\index{API}
\section{Introduction}
Gravwell implements a REST API over HTTP. This API powers the Gravwell UI, but it can also be used to interface other systems with Gravwell. For instance, a Python script can run a Gravwell query by hitting a single API endpoint. This chapter discusses \emph{API Tokens}, which are special authentication tokens given to client applications for accessing the API, and demonstrates how to access perhaps the most useful REST endpoint: the direct search API.

\section{Tokens}

\section{API Tokens}
\index{API!tokens}\index{Tokens|see {API}}
Gravwell users can generate tokens which allow an application to act as that user, but with limited privileges. Tokens are passed to the Gravwell webserver with HTTP requests as a method of authentication. Tokens are managed through the Tokens page (Figure \ref{fig:token-page}), accessible in the Main Menu under the ``Tools and Resources'' section.

\begin{figure}
@@ -40,7 +40,7 @@ \section{Tokens}
If the secret is lost, the token can be \emph{regenerated}, creating a new secret key, but any applications using the old secret will stop working until updated.
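
As a minimal sketch of how an application might present the secret with each HTTP request (the \code{Gravwell-Token} header name and hostname here are illustrative assumptions, not taken from this section):

\begin{verbatim}
curl -H "Gravwell-Token: <token secret>" \
     https://gravwell.example.com/api/parse
\end{verbatim}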

\subsection{Token Permissions and Restrictions}

\index{API!permissions}
Token permissions are defined using specific allowances, in which the user selects exactly which functionality a given token is allowed to perform. The Gravwell user interface provides some nice features to let you select groups of permissions that might be logically related, but in the end each token must declare exactly what APIs and systems it is allowed to access. Most permissions are divided into read and write components. This means a token might be configured so it can read resources but not write them, or a token can read the state of automation scripts but not create, update, or schedule them.

Permissions on tokens are an overlay on the user's existing permissions. This means that if the current user cannot access an API or feature, then the token cannot either--tokens can only restrict access; they cannot grant access that a user does not currently have.
@@ -60,6 +60,7 @@ \section{Accessing the Gravwell API}
\end{verbatim}

\section{Direct Search API}
\index{API!direct search}
The Gravwell Direct Query API is designed to provide atomic, REST-powered access to the Gravwell query system. This API allows for simple integration with external tools and systems that do not normally know how to interact with Gravwell. The API is designed to be as flexible as possible and support any tool that knows how to interact with an HTTP API.

The Direct Query API is authenticated and requires a valid Gravwell account with access to the Gravwell query system; a Gravwell token is the best way to access the API.
@@ -73,7 +74,7 @@ \subsection{Query Endpoints}
The Direct Query API consists of two REST endpoints which can parse a search and execute a search. The parse API is useful for testing whether a query is valid and could execute, while the search API will actually execute a search and deliver the results. Both the query and parse APIs require the user and/or token to have the \code{Search} permission.

\subsubsection{Parse API}

\index{API!parse}
The parse API is used to validate the syntax of a Gravwell query before attempting to execute it. It is accessed via a POST request to \code{/api/parse} and accepts a query string delivered by header value, URL parameter, or a ParseSearchRequest\footnote{https://pkg.go.dev/github.com/gravwell/gravwell/v3/client/types\#ParseSearchRequest} JSON object. A ParseSearchResponse\footnote{https://pkg.go.dev/github.com/gravwell/gravwell/v3/client/types\#ParseSearchResponse} object and a 200 code will be returned if the query is valid.
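
As a sketch, a parse request delivering the query string by header value might look like this (the \code{Gravwell-Token} and \code{query} header names are assumptions in this example):

\begin{verbatim}
curl -X POST \
     -H "Gravwell-Token: <token secret>" \
     -H "query: tag=gravwell count" \
     https://gravwell.example.com/api/parse
\end{verbatim}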

The following curl commands are all functionally equivalent:
@@ -139,7 +140,7 @@ \subsubsection{Parse API}
\end{verbatim}

\subsubsection{Query API}

\index{API!direct search}
The query API actually runs a search and returns the results. It is accessed via a POST to \code{/api/search/direct}. The search API requires the parameters in Table \ref{table:query-parameters} be delivered by header values, URL parameters, or a JSON object.
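
As a hedged sketch of a direct query delivered by header values (the \code{query}, \code{duration}, and \code{format} header names used here are assumptions standing in for the parameters in Table \ref{table:query-parameters}):

\begin{verbatim}
curl -X POST \
     -H "Gravwell-Token: <token secret>" \
     -H "query: tag=gravwell count" \
     -H "duration: 1h" \
     -H "format: text" \
     -o results.txt \
     https://gravwell.example.com/api/search/direct
\end{verbatim}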

\begin{table}[H]
48 changes: 24 additions & 24 deletions Architecture/architecture.tex
@@ -15,52 +15,52 @@ \section{Terminology}
speak the same language.

\begin{description}[font=\sffamily\bfseries, leftmargin=1cm, style=nextline]
\item[Indexer]
\item[Indexer]\index{Indexers}
Stores data and manages wells.

\item[Webserver]
\item[Webserver]\index{Webservers}
Serves web interface, controls and coordinates indexers.

\item[Entry]
\item[Entry]\index{Entries}
A single tagged record or data item (line from a log file, Windows event, packet, etc.)

\item[Enumerated Value]
\item[Enumerated Value]\index{Enumerated Values}
Named data item that is extracted from the raw entry during a search.

\item[Tag]
\item[Tag]\index{Tags}
Human-readable name for a data group. The most basic grouping of data.

\item[Well]
\item[Well]\index{Wells}
On-disk collection of entries. Every entry ends up in exactly one well, sorted by tag.

\item[Shard]
\item[Shard]\index{Shards}
A slice of data within a well. Each shard contains about 1.5 days of entries.

\item[Ingester]
\item[Ingester]\index{Ingesters}
Program that accepts raw data and packages it as entries for transport to an indexer.

\item[Renderer]
\item[Renderer]\index{Search!renderers}\index{Renderers}
Query component that collects search output and presents results to a human.

\item[Datastore]
\item[Datastore]\index{Datastore}
Central authority of users and user-owned objects for distributed webservers.

\item[Search Agent]
\item[Search Agent]\index{Search agent}
Monitors and launches automated queries and scripts on behalf of users.

\item[Cluster Deployment]
\item[Cluster Deployment]\index{Clusters}
Multiple Indexers all participating in a single Gravwell instance.

\item[Distributed Webservers]
\item[Distributed Webservers]\index{Distributed webservers}
Multiple webservers sharing the load of GUI interactions and queries, but controlling the same set of indexers.

\item[Load Balancer]
\item[Load Balancer]\index{Load balancer}
An HTTP reverse proxy which transparently balances load across multiple webservers.
\end{description}

\section{Gravwell Entries}

Gravwell stores all data as \emph{entries}. An entry consists of a piece of data (just an array of bytes), a timestamp, a tag, and a source address. Each of these components deserves a bit of explanation, so we will cover each separately. Entries are stored in an efficient binary format on disk, but a user-friendly representation of an example entry would look something like this:
Gravwell stores all data as \emph{entries}\index{Entries}. An entry consists of a piece of data (just an array of bytes), a timestamp, a tag, and a source address. Each of these components deserves a bit of explanation, so we will cover each separately. Entries are stored in an efficient binary format on disk, but a user-friendly representation of an example entry would look something like this:

\begin{Verbatim}[breaklines=true]
{
@@ -75,23 +75,23 @@

\subsection{``Timestamp'' field}

The timestamp is meant to indicate the creation time of the \emph{data}. This is typically extracted from the data itself by the ingester, e.g. by parsing out the timestamps on syslog messages. However, some ingesters such as the packet logger will instead set the timestamp to the current time, since the packet was captured ``now''.
The timestamp\index{Timestamps} is meant to indicate the creation time of the \emph{data}. This is typically extracted from the data itself by the ingester, e.g. by parsing out the timestamps on syslog messages. However, some ingesters such as the packet logger will instead set the timestamp to the current time, since the packet was captured ``now''.

\subsection{``Data'' field}

The data field contains the actual raw data of the entry. If log files are being ingested, the data field will typically contain a single line from the log file. When ingesting network packets, the data field will contain a single binary packet.

\subsection{``Tag'' field}

The tag field categorizes the entry. Users refer to tags by strings, e.g. "default" or "pcap" or "windows-logs", but under the hood Gravwell assigns each tag string a unique numeric ID for more efficient storage.
The tag\index{Tags} field categorizes the entry. Users refer to tags by strings, e.g. "default" or "pcap" or "windows-logs", but under the hood Gravwell assigns each tag string a unique numeric ID for more efficient storage.

\subsection{``Source'' field}

The source field indicates where the entry originated. It is an IPv4 or IPv6 address. It is perhaps the most free-form field of the entry, because it can indicate the machine on which the ingester was located, the machine from which the ingester \emph{read} the entry data, or it can be an entirely arbitrary number chosen as a second layer of categorization beyond the tag. Most ingesters provide configuration options for how the source field should be set.
The source\index{SRC|see {Source}}\index{Source} field indicates where the entry originated. It is an IPv4 or IPv6 address. It is perhaps the most free-form field of the entry, because it can indicate the machine on which the ingester was located, the machine from which the ingester \emph{read} the entry data, or it can be an entirely arbitrary number chosen as a second layer of categorization beyond the tag. Most ingesters provide configuration options for how the source field should be set.
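
For example, a minimal sketch of an ingester configuration stanza that pins the source field might look like the following (the \code{Listener} block, \code{Bind-String}, and \code{Source-Override} names are illustrative assumptions in the style of Gravwell ingester config files, not taken from this chapter):

\begin{verbatim}
[Listener "syslog"]
	Bind-String = "0.0.0.0:601"
	# Record all entries from this listener with a fixed source address
	Source-Override = "192.168.1.1"
\end{verbatim}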

\section{Data Ingest}

A core function of Gravwell is data ingest; \emph{ingesters} take raw data, package it as entries,
A core function of Gravwell is data ingest\index{Data ingest}; \emph{ingesters}\index{Ingesters} take raw data, package it as entries,
and transmit those entries to a Gravwell indexer (or
indexers) for storage, indexing, and searching. The Gravwell
ingest API\footnote{https://github.com/gravwell/gravwell/tree/master/ingest}
@@ -185,7 +185,7 @@ \subsection{Single Node}
\subsection{Cluster Deployment}

Gravwell deployments that need to handle large data volumes may require
multiple indexers. A Gravwell cluster is comprised of multiple indexers
multiple indexers. A Gravwell cluster\index{Clusters} is comprised of multiple indexers
which are controlled by a single webserver as shown in Figure \ref{fig:cluster}. The webserver and
search agent are typically on a separate node and connect to the indexers
via an IPv4 or IPv6 link. When operating in a cluster topology Gravwell
@@ -208,7 +208,7 @@ \subsection{Cluster Deployment}
\subsection{Distributed Webserver Architecture}

Very large Gravwell deployments with many users may need to also employ
multiple webservers to handle the load. Gravwell supports a distributed
multiple webservers\index{Distributed webservers} to handle the load. Gravwell supports a distributed
webserver topology by coordinating and synchronizing the webservers,
shown in Figure \ref{fig:distributed}.

@@ -219,7 +219,7 @@ \subsection{Distributed Webserver Architecture}
\end{figure}

Webservers coordinate and synchronize using a component called the
\emph{datastore}. The datastore acts as a master storage system for user data,
\emph{datastore}\index{Datastore}. The datastore acts as a master storage system for user data,
search resources, scheduled queries, scripts, and any other component
that can be uploaded or modified by a user. Webservers and the
datastore will attempt to maintain a complete copy of all resources,
@@ -230,7 +230,7 @@ \subsection{Distributed Webserver Architecture}

\section{Replication}

Gravwell supports full data replication, so that in the event of
Gravwell supports full data replication\index{Replication}, so that in the event of
hardware failure, data is not lost. Replication strategies depend on
the type of deployment and general tolerance for distributed failures.
Replication is controlled entirely by the indexers and is unaffected by
@@ -290,7 +290,7 @@ \section{Scheduled Search and Orchestration}
\section{Cloud Archive}

Gravwell supports remote, offline archival of data using the Cloud
Archive service. Cloud Archive enables indexers to transmit shards to a
Archive\index{Cloud archive} service. Cloud Archive enables indexers to transmit shards to a
remote system with low-cost mass storage so that data can be held long
term. Indexers can configure individual wells to participate in the
archive process, giving fine-grained control over which entries are archived.
11 changes: 6 additions & 5 deletions Automation/automation.tex
@@ -1,5 +1,5 @@
\chapter{Automation}

\index{Automation}
Gravwell provides several utilities to enable automated operations.
At the most basic level, users can schedule searches to be executed at specific times, for
example every morning at 1:00. They can also schedule
@@ -14,7 +14,7 @@ \chapter{Automation}
the search agent, scheduled searches, flows, and scripts.

\section{Configuring User Email Settings}

\index{Email configuration}
In order to send emails from scheduled scripts, each user must input
settings for their preferred email server. This will allow Gravwell to
act as an SMTP client and send emails on the user's behalf. The email
@@ -53,7 +53,7 @@ \section{Configuring User Email Settings}


\section{The Search Agent}

\index{Search agent}
The search agent is the Gravwell component which actually handles the
execution of scheduled searches and scripts. It is a separate process
which connects to a Gravwell webserver as a client, obtains a list of
@@ -171,7 +171,7 @@ \subsubsection{Max-Script-Run-Time}


\subsection{Scheduling Searches}

\index{Automation!scheduled searches}\index{Scheduled searches|see {Automation}}
Users can schedule searches to run at regular intervals. This enables
several useful possibilities, such as automatically updating lookup
tables (e.g. MAC address to IP mappings) or executing a very detailed /
@@ -330,7 +330,8 @@ \subsection{Hands-On Lab}
\include{flows}

\section{Search Scripting and Orchestration}

\index{Automation!scripts}\index{Scheduled scripts|see {Automation}}
\index{Anko}
Scripts provide an additional layer of power beyond scheduled
searches. A script can execute multiple searches, filter and enrich the
results, then re-ingest the resulting entries under a different tag,
7 changes: 4 additions & 3 deletions Automation/flows.tex
@@ -1,5 +1,5 @@
\section{Flows}

\index{Automation!flows|see {Flows}}\index{Flows}
Flows provide a no-code method for developing advanced automations in Gravwell. A flow consists of one or more \emph{nodes}; each node performs a single action and passes the results (if any) on to the next node(s). By wiring together nodes in a drag-and-drop user interface, users can:

\begin{itemize}
@@ -26,7 +26,7 @@ \subsection{Flow Concepts}
\end{enumerate}

\subsubsection{Nodes}

\index{Flows!nodes}
A flow is a collection of \emph{nodes}, linked together to define an order of execution. Each node does a single task, such as running a query or sending an email. Figure \ref{fig:nodes} shows a simple flow of three nodes; the leftmost node runs a Gravwell query, then the middle node formats the results of that query into a PDF document, and finally the rightmost node sends that PDF document as an email attachment.

\begin{figure}
@@ -38,7 +38,7 @@ \subsubsection{Nodes}
All nodes have a single output socket. Most have only a single input socket, but some nodes which merge \emph{payloads} (see below) have multiple input sockets. One node's output socket may be connected to the \emph{inputs} of multiple other nodes, but each input socket can only take one connection.

\subsubsection{Payloads}

\index{Flows!payloads}
\emph{Payloads} are collections of data passed from node to node, representing the state of execution. For instance, the ``Run a Query'' node will insert an item named ``search'' into the payload, containing things like the query results and metadata about the search. The PDF node can \emph{read} that ``search'' item, format it into a nice PDF document, and insert the PDF file back into the payload with a name like ``gravwell.pdf''. Then the Email node can be configured to attach ``gravwell.pdf'' to the outgoing email.
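
Conceptually, the payload arriving at the Email node in this example could be sketched as follows (illustrative only; this is not the exact internal representation used by the flow system):

\begin{verbatim}
{
  "search":       <query results and metadata from the Run a Query node>,
  "gravwell.pdf": <PDF document produced by the PDF node>
}
\end{verbatim}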

Most nodes receive a single incoming payload through a single \emph{input} socket, then pass a single outgoing payload via the \emph{output} socket. In most cases, the outgoing payload will be a modified version of the incoming payload.
@@ -58,6 +58,7 @@ \subsubsection{Execution order}
Note that some nodes may block execution of downstream nodes. The \code{If} node is configured with a boolean logic expression; if that expression evaluates to \emph{false}, none of the If node's downstream nodes are executed. Nodes which can block downstream execution will always have a note to that effect in the online documentation.

\subsection{The Flow Editor}
\index{GUI!flow editor}
Flows are created using the flow editor. Although the Gravwell flow editor can be intimidating at first glance, a few minutes' worth of experimentation and exploration should be enough to get started building flows. This section will go through the various components of the UI, explaining each one.

Note: If you're not yet familiar with the basic components of a flow (nodes, sockets, payloads), refer to Section \ref{sec:flow-concepts} for an overview.
2 changes: 1 addition & 1 deletion CLI/cli.tex
@@ -1,5 +1,5 @@
\chapter{Command Line Interface}

\index{Command-line interface}
In addition to the GUI we've shown so far, Gravwell provides a
command-line client. This client can be useful for certain repetitive
tasks such as testing scripts (see the Automation chapter for more info)