diff --git a/.gitignore b/.gitignore index bb9dc60..6540bb7 100644 --- a/.gitignore +++ b/.gitignore @@ -8,4 +8,7 @@ images/*.tar.gz images/*.tgz dockerimages/ -gravwell_training.tgz \ No newline at end of file +gravwell_training.tgz +*.idx +*.ilg +*.ind diff --git a/API/api.tex b/API/api.tex index 156af07..44189f4 100644 --- a/API/api.tex +++ b/API/api.tex @@ -1,10 +1,10 @@ \chapter{The Gravwell REST API} - +\index{API} \section{Introduction} Gravwell implements a REST API over HTTP. This API powers the Gravwell UI, but it can also be used to interface other systems with Gravwell. For instance, a Python script can run a Gravwell query by hitting a single API endpoint. This chapter discusses \emph{API Tokens}, which are special authentication tokens given to client applications for accessing the API, and demonstrates how to access perhaps the most useful REST endpoint: the direct search API. -\section{Tokens} - +\section{API Tokens} +\index{API!tokens}\index{Tokens|see {API}} Gravwell users can generate tokens which allow an application to act as that user, but with limited privileges. Tokens are passed to the Gravwell webserver with HTTP requests as a method of authentication. Tokens are managed through the Tokens page (Figure \ref{fig:token-page}), accessible in the Main Menu under the ``Tools and Resources'' section. \begin{figure} @@ -40,7 +40,7 @@ \section{Tokens} If the secret is lost, the token can be \emph{regenerated}, creating a new secret key, but any applications using the old secret will stop working until updated. \subsection{Token Permissions and Restrictions} - +\index{API!permissions} Token permissions are defined using specific allowances, in which the user selects exactly which functionality a given token is allowed to perform. The Gravwell user interface provides some nice features to let you select groups of permissions that might be logically related, but in the end each token must declare exactly what APIs and systems it is allowed to access. Most permissions are divided into read and write components. This means a token might be configured so it can read resources but not write them, or a token can read the state of automation scripts but not create, update, or schedule them. Permissions on tokens are an overlay on the user's existing permissions. This means that if the current user cannot access an API or feature, then the token cannot either--tokens can only restrict access, they cannot grant access that a user does not currently have. @@ -60,6 +60,7 @@ \section{Accessing the Gravwell API} \end{verbatim} \section{Direct Search API} +\index{API!direct search} The Gravwell Direct Query API is designed to provide atomic, REST-powered access to the Gravwell query system. This API allows for simple integration with external tools and systems that do not normally know how to interact with Gravwell. The API is designed to be as flexible as possible and support any tool that knows how to interact with an HTTP API. The Direct Query API is authenticated and requires a valid Gravwell account with access to the Gravwell query system; a Gravwell token is the best way to access the API. @@ -73,7 +74,7 @@ \subsection{Query Endpoints} The Direct Query API consists of two REST endpoints: one which parses a search and one which executes it. The parse API is useful for testing whether a query is valid and could execute, while the search API will actually execute a search and deliver the results. Both the query and parse APIs require the user and/or token to have the `Search' permission.
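Before examining each endpoint individually, it may help to see the general shape of a token-authenticated request. The sketch below is illustrative only: the host and token value are placeholders, and the header names reflect the delivery options described in the following sections:

\begin{Verbatim}[breaklines=true]
curl -X POST \
  -H "Gravwell-Token: <token-secret>" \
  -H "query: tag=default limit 10" \
  http://gravwell.example.com/api/parse
\end{Verbatim}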
\subsubsection{Parse API} - +\index{API!parse} The parse API is used to validate the syntax of a Gravwell query before attempting to execute it. The parse API is accessed via a POST request to \code{/api/parse}. The parse API accepts a query string delivered by header value, URL parameter, or a ParseSearchRequest\footnote{https://pkg.go.dev/github.com/gravwell/gravwell/v3/client/types\#ParseSearchRequest} JSON object. A ParseSearchResponse\footnote{https://pkg.go.dev/github.com/gravwell/gravwell/v3/client/types\#ParseSearchResponse} object and a 200 code will be returned if the query is valid. The following curl commands are all functionally equivalent: @@ -139,7 +140,7 @@ \subsubsection{Parse API} \end{verbatim} \subsubsection{Query API} - +\index{API!direct search} The query API actually runs a search and returns the results. It is accessed via a POST to \code{/api/search/direct}. The search API requires the parameters in Table \ref{table:query-parameters} be delivered by header values, URL parameters, or a JSON object. \begin{table}[H] diff --git a/Architecture/architecture.tex b/Architecture/architecture.tex index 0b1d630..50322a3 100644 --- a/Architecture/architecture.tex +++ b/Architecture/architecture.tex @@ -15,52 +15,52 @@ \section{Terminology} speak the same language. \begin{description}[font=\sffamily\bfseries, leftmargin=1cm, style=nextline] -\item[Indexer] +\item[Indexer]\index{Indexers} Stores data and manages wells. -\item[Webserver] +\item[Webserver]\index{Webservers} Serves web interface, controls and coordinates indexers. -\item[Entry] +\item[Entry]\index{Entries} A single tagged record or data item (line from a log file, Windows event, packet, etc.) -\item[Enumerated Value] +\item[Enumerated Value]\index{Enumerated Values} Named data item that is extracted from the raw entry during a search. -\item[Tag] +\item[Tag]\index{Tags} Human-readable name for a data group. The most basic grouping of data. -\item[Well] +\item[Well]\index{Wells} On-disk collection of entries. Every entry ends up in exactly one well, sorted by tag. -\item[Shard] +\item[Shard]\index{Shards} A slice of data within a well. Each shard contains about 1.5 days of entries. -\item[Ingester] +\item[Ingester]\index{Ingesters} Program that accepts raw data and packages it as entries for transport to an indexer. -\item[Renderer] +\item[Renderer]\index{Search!renderers}\index{Renderers} Query component that collects search output and presents results to a human. -\item[Datastore] +\item[Datastore]\index{Datastore} Central authority of users and user-owned objects for distributed webservers. -\item[Search Agent] +\item[Search Agent]\index{Search agent} Monitors and launches automated queries and scripts on behalf of users. -\item[Cluster Deployment] +\item[Cluster Deployment]\index{Clusters} Multiple Indexers all participating in a single Gravwell instance. -\item[Distributed Webservers] +\item[Distributed Webservers]\index{Distributed webservers} Multiple webservers sharing the load of GUI interactions and queries, but controlling the same set of indexers. -\item[Load Balancer] +\item[Load Balancer]\index{Load balancer} An HTTP reverse proxy which transparently balances load across multiple webservers. \end{description} \section{Gravwell Entries} -Gravwell stores all data as \emph{entries}. An entry consists of a piece of data (just an array of bytes), a timestamp, a tag, and a source address. Each of these components deserves a bit of explanation, so we will cover each separately.
Entries are stored in an efficient binary format on disk, but a user-friendly representation of an example entry would look something like this: +Gravwell stores all data as \emph{entries}\index{Entries}. An entry consists of a piece of data (just an array of bytes), a timestamp, a tag, and a source address. Each of these components deserves a bit of explanation, so we will cover each separately. Entries are stored in an efficient binary format on disk, but a user-friendly representation of an example entry would look something like this: \begin{Verbatim}[breaklines=true] { @@ -75,7 +75,7 @@ \section{Gravwell Entries} \subsection{``Timestamp'' field} -The timestamp is meant to indicate the creation time of the \emph{data}. This is typically extracted from the data itself by the ingester, e.g. by parsing out the timestamps on syslog messages. However, some ingesters such as the packet logger will instead set the timestamp to the current time, since the packet was captured ``now''. +The timestamp\index{Timestamps} is meant to indicate the creation time of the \emph{data}. This is typically extracted from the data itself by the ingester, e.g. by parsing out the timestamps on syslog messages. However, some ingesters such as the packet logger will instead set the timestamp to the current time, since the packet was captured ``now''. @@ -83,15 +83,15 @@ \subsection{``Data'' field} \subsection{``Tag'' field} -The tag field categorizes the entry. Users refer to tags by strings, e.g. "default" or "pcap" or "windows-logs", but under the hood Gravwell assigns each tag string a unique numeric ID for more efficient storage. +The tag\index{Tags} field categorizes the entry. Users refer to tags by strings, e.g. ``default'' or ``pcap'' or ``windows-logs'', but under the hood Gravwell assigns each tag string a unique numeric ID for more efficient storage. \subsection{``Source'' field} -The source field indicates where the entry originated. It is an IPv4 or IPv6 address. It is perhaps the most free-form field of the entry, because it can indicate the machine on which the ingester was located, the machine from which the ingester \emph{read} the entry data, or it can be an entirely arbitrary number chosen as a second layer of categorization beyond the tag. Most ingesters provide configuration options for how the source field should be set. +The source\index{SRC|see {Source}}\index{Source} field indicates where the entry originated. It is an IPv4 or IPv6 address. It is perhaps the most free-form field of the entry, because it can indicate the machine on which the ingester was located, the machine from which the ingester \emph{read} the entry data, or it can be an entirely arbitrary number chosen as a second layer of categorization beyond the tag. Most ingesters provide configuration options for how the source field should be set. \section{Data Ingest} -A core function of Gravwell is data ingest; \emph{ingesters} take raw data, package it as entries, +A core function of Gravwell is data ingest\index{Data ingest}; \emph{ingesters}\index{Ingesters} take raw data, package it as entries, and transmit those entries to a Gravwell indexer (or indexers) for storage, indexing, and searching. The Gravwell ingest API\footnote{https://github.com/gravwell/gravwell/tree/master/ingest} @@ -185,7 +185,7 @@ \subsection{Single Node} \subsection{Cluster Deployment} Gravwell deployments that need to handle large data volumes may require -multiple indexers.
A Gravwell cluster is comprised of multiple indexers +multiple indexers. A Gravwell cluster\index{Clusters} is composed of multiple indexers which are controlled by a single webserver as shown in Figure \ref{fig:cluster}. The webserver and search agent are typically on a separate node and connect to the indexers via an IPv4 or IPv6 link. When operating in a cluster topology, Gravwell @@ -208,7 +208,7 @@ \subsection{Cluster Deployment} \subsection{Distributed Webserver Architecture} Very large Gravwell deployments with many users may need to also employ -multiple webservers to handle the load. Gravwell supports a distributed +multiple webservers\index{Distributed webservers} to handle the load. Gravwell supports a distributed webserver topology by coordinating and synchronizing the webservers, shown in Figure \ref{fig:distributed}. @@ -219,7 +219,7 @@ \subsection{Distributed Webserver Architecture} \end{figure} Webservers coordinate and synchronize using a component called the -\emph{datastore}. The datastore acts as a master storage system for user data, +\emph{datastore}\index{Datastore}. The datastore acts as a master storage system for user data, search resources, scheduled queries, scripts, and any other component that can be uploaded or modified by a user. Webservers and the datastore will attempt to maintain a complete copy of all resources, @@ -230,7 +230,7 @@ \subsection{Distributed Webserver Architecture} \section{Replication} -Gravwell supports full data replication, so that in the event of +Gravwell supports full data replication\index{Replication}, so that in the event of hardware failure, data is not lost. Replication strategies depend on the type of deployment and general tolerance for distributed failures. Replication is controlled entirely by the indexers and is unaffected by @@ -290,7 +290,7 @@ \section{Scheduled Search and Orchestration} \section{Cloud Archive} Gravwell supports remote, offline archival of data using the Cloud -Archive service. Cloud Archive enables indexers to transmit shards to a +Archive\index{Cloud archive} service. Cloud Archive enables indexers to transmit shards to a remote system with low-cost mass storage so that data can be held long term. Indexers can configure individual wells to participate in the archive process, giving fine-grained control over which entries are archived. diff --git a/Automation/automation.tex b/Automation/automation.tex index 03621c3..8aa2d78 100644 --- a/Automation/automation.tex +++ b/Automation/automation.tex @@ -1,5 +1,5 @@ \chapter{Automation} - +\index{Automation} Gravwell provides several utilities to enable automated operations. At the most basic level, users can schedule searches to be executed at specific times, for example every morning at 1:00. They can also schedule @@ -14,7 +14,7 @@ \chapter{Automation} the search agent, scheduled searches, flows, and scripts. \section{Configuring User Email Settings} - +\index{Email configuration} In order to send emails from scheduled scripts, each user must input settings for their preferred email server. This will allow Gravwell to act as an SMTP client and send emails on the user's behalf. The email @@ -53,7 +53,7 @@ \section{Configuring User Email Settings} \section{The Search Agent} - +\index{Search agent} The search agent is the Gravwell component which actually handles the execution of scheduled searches and scripts.
It is a separate process which connects to a Gravwell webserver as a client, obtains a list of @@ -171,7 +171,7 @@ \subsubsection{Max-Script-Run-Time} \subsection{Scheduling Searches} - +\index{Automation!scheduled searches}\index{Scheduled searches|see {Automation}} Users can schedule searches to run at regular intervals. This enables several useful possibilities, such as automatically updating lookup tables (e.g. MAC address to IP mappings) or executing a very detailed / @@ -330,7 +330,8 @@ \subsection{Hands-On Lab} \include{flows} \section{Search Scripting and Orchestration} - +\index{Automation!scripts}\index{Scheduled scripts|see {Automation}} +\index{Anko} Scripts provide an additional layer of power beyond scheduled searches. A script can execute multiple searches, filter and enrich the results, then re-ingest the resulting entries under a different tag, diff --git a/Automation/flows.tex b/Automation/flows.tex index 49097bd..5cbacd2 100644 --- a/Automation/flows.tex +++ b/Automation/flows.tex @@ -1,5 +1,5 @@ \section{Flows} - +\index{Automation!flows|see {Flows}}\index{Flows} Flows provide a no-code method for developing advanced automations in Gravwell. A flow consists of one or more \emph{nodes}; each node performs a single action and passes the results (if any) on to the next node(s). By wiring together nodes in a drag-and-drop user interface, users can: \begin{itemize} @@ -26,7 +26,7 @@ \subsection{Flow Concepts} \end{enumerate} \subsubsection{Nodes} - +\index{Flows!nodes} A flow is a collection of \emph{nodes}, linked together to define an order of execution. Each node does a single task, such as running a query or sending an email. Figure \ref{fig:nodes} shows a simple flow of three nodes; the leftmost node runs a Gravwell query, then the middle node formats the results of that query into a PDF document, and finally the rightmost node sends that PDF document as an email attachment. \begin{figure} @@ -38,7 +38,7 @@ \subsubsection{Nodes} All nodes have a single output socket. Most have only a single input socket, but some nodes which merge \emph{payloads} (see below) have multiple input sockets. One node's output socket may be connected to the \emph{inputs} of multiple other nodes, but each input socket can only take one connection. \subsubsection{Payloads} - +\index{Flows!payloads} \emph{Payloads} are collections of data passed from node to node, representing the state of execution. For instance, the ``Run a Query'' node will insert an item named ``search'' into the payload, containing things like the query results and metadata about the search. The PDF node can \emph{read} that ``search'' item, format it into a nice PDF document, and insert the PDF file back into the payload with a name like ``gravwell.pdf''. Then the Email node can be configured to attach ``gravwell.pdf'' to the outgoing email. Most nodes receive a single incoming payload through a single \emph{input} socket, then pass a single outgoing payload via the \emph{output} socket. In most cases, the outgoing payload will be a modified version of the incoming payload. @@ -58,6 +58,7 @@ \subsubsection{Execution order} Note that some nodes may block execution of downstream nodes. The \code{If} node is configured with a boolean logic expression; if that expression evaluates to \emph{false}, none of the If node's downstream nodes are executed. Nodes which can block downstream execution will always have a note to that effect in the online documentation.
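To make payloads a little more concrete before opening the editor, here is a sketch of the payload that might arrive at the Email node in the three-node flow described above. The item names come from the nodes themselves; the internal structure shown is purely illustrative:

\begin{Verbatim}[breaklines=true]
{
  "search":       <query results and metadata from the Run a Query node>,
  "gravwell.pdf": <PDF document produced by the PDF node>
}
\end{Verbatim}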
\subsection{The Flow Editor} +\index{GUI!flow editor} Flows are created using the flow editor. Although the Gravwell flow editor can be intimidating at first glance, a few minutes' worth of experimentation and exploration should be enough to get started building flows. This section will go through the various components of the UI, explaining each in turn. Note: If you're not yet familiar with the basic components of a flow (nodes, sockets, payloads), refer to Section \ref{sec:flow-concepts} for an overview. diff --git a/CLI/cli.tex b/CLI/cli.tex index 8e5b922..de16829 100644 --- a/CLI/cli.tex +++ b/CLI/cli.tex @@ -1,5 +1,5 @@ \chapter{Command Line Interface} - +\index{Command-line interface} In addition to the GUI we've shown so far, Gravwell provides a command-line client. This client can be useful for certain repetitive tasks such as testing scripts (see the Automation chapter for more info) diff --git a/GUI/gui.tex b/GUI/gui.tex index d98b28a..67e9786 100644 --- a/GUI/gui.tex +++ b/GUI/gui.tex @@ -2,10 +2,10 @@ \chapter{Using the GUI} \section{Introduction} -This chapter discusses the basic organization and usage of the Gravwell GUI. It begins with a general overview of the user interface, then discusses more specialized and advanced options. Feel free to skip ahead to the next chapter to begin searching with Gravwell immediately, referring back to this chapter when needed. +This chapter discusses the basic organization and usage of the Gravwell GUI\index{GUI}\index{User Interface|see {GUI}}. It begins with a general overview of the user interface, then discusses more specialized and advanced options. Feel free to skip ahead to the next chapter to begin searching with Gravwell immediately, referring back to this chapter when needed. \section{Menus} - +\index{GUI!menus} By default, users will end up on the ``New Search'' page after logging in. This page, like all pages in the Gravwell GUI, includes a bar across the top containing the main menu, notifications, the account menu, and more. These menus are described further below. \begin{figure} @@ -17,6 +17,7 @@ \section{Menus} Clicking the Gravwell logo in the upper left will always take you back to your home page (see section \ref{sec:account-menu} for configuration options). \subsection{The Main Menu} +\index{GUI!main menu} Clicking the ``hamburger'' button in the upper left will open the Main Menu, as shown in Figure \ref{fig:mainmenu}. @@ -37,7 +38,7 @@ \subsection{The Main Menu} \clearpage \subsection{Notifications} - +\index{GUI!notifications}\index{Notifications} Important notifications are accessible under the bell icon in the upper right corner of the page. Regular notifications are indicated by a small red circle containing the number of notifications. A critical notification will change the entire icon to a more attention-catching red icon; see Figure \ref{fig:notifications-icons} for examples. \begin{figure}[H] @@ -58,7 +59,7 @@ \subsection{Notifications} \end{figure} \subsection{The Account Menu} -\label{sec:account-menu} +\label{sec:account-menu}\index{GUI!account preferences} The round button in the upper-right of the page is the Account Menu button. It will display either the initials of the current user, or a profile image if set. Clicking it brings up a small drop-down menu (Figure \ref{fig:accountmenu}). @@ -101,7 +102,7 @@ \subsection{The Account Menu} The ``Advanced Preferences'' section can be ignored by most users.
Selecting ``Developer mode'' enables manual editing of JSON preferences, while toggling ``Experimental Features'' will enable the Experimental Features section in the main menu. -The final tab, ``Email Server'' (Figure \ref{fig:email-prefs-gui}, is extremely important for users who intend to do automated email alerting via scheduled scripts. It must be set up with a valid SMTP configuration before emails can be sent. +The final tab, ``Email Server''\index{GUI!email configuration}\index{Email configuration} (Figure \ref{fig:email-prefs-gui}), is extremely important for users who intend to do automated email alerting via scheduled scripts. It must be set up with a valid SMTP configuration before emails can be sent. \begin{figure} \includegraphics[width=0.7\linewidth]{images/email-prefs.png} @@ -114,7 +115,7 @@ \subsection{The Account Menu} Once the fields have been populated, click ``Update Settings'' to save them, then click ``Test Configuration'' to send a test email. \section{Labels and Filtering} - +\index{GUI!labels}\index{Labels} Objects in Gravwell such as dashboards, resources, macros, etc. can be labeled for organizational purposes. Some objects distributed in kits may be pre-labeled for convenience. The following object types can be labeled: \begin{itemize} @@ -207,7 +208,7 @@ \subsection{Special Labels} The string \code{hidden} is a special label; applying it to an object will prevent the object from being displayed by default. To see the object, toggle the ``Show hidden record'' option in the filter menu, as detailed above. \subsubsection{Kit Label Prefixes} - +\index{Kits!kit labels} Three label prefixes are used to manage Gravwell-internal information about objects which were installed as part of a kit. You should never manually apply kit labels to objects; these labels are documented to prevent users from accidentally applying a conflicting label to an object. The following are considered reserved kit label prefixes: @@ -221,7 +222,7 @@ \subsubsection{Kit Label Prefixes} Users should not create labels beginning with these strings, e.g. \code{kit/foo} or \code{kit/dependency:bar}. These labels are managed internally by Gravwell. \section{Playbooks} - +\index{Playbooks} Playbooks are hypertext documents within Gravwell which help guide users through common tasks, describe functionality, and record information about data in the system. Most Gravwell kits (see Chapter \ref{ch:kits}) include a playbook or two to help users get oriented in the kit, but regular users can also \emph{create} playbooks for themselves, documenting their data investigations with a mix of text, images, and \emph{executable queries}. The Playbooks page (Figure \ref{fig:playbooks}) lists the playbooks currently in the system and allows the creation of new ones. @@ -241,7 +242,7 @@ \section{Playbooks} \end{figure} \subsection{Playbook Markdown} - +\index{Markdown} Playbooks are written in Markdown. You can edit an existing playbook by clicking the edit button, or create your own new playbook. This will bring up an editor as shown in Figure \ref{fig:playbook-edit}. \begin{figure} diff --git a/Indexers/indexers.tex b/Indexers/indexers.tex index 6bf163c..000e601 100644 --- a/Indexers/indexers.tex +++ b/Indexers/indexers.tex @@ -1,5 +1,5 @@ \chapter{Indexers and Well Configuration} - +\index{Indexers}\index{Data ingest} The two core components of a Gravwell deployment are the Indexer and Webserver. If either of those components is not properly configured, Gravwell will not function.
The indexer is the most important @@ -27,7 +27,7 @@ \chapter{Indexers and Well Configuration} \section{Indexer Configuration} - +\index{gravwell.conf@\code{gravwell.conf}} Gravwell always ships with a functional \code{gravwell.conf} that provides a basic deployment right out of the box. However, with very large or very complex deployments, tuning the configuration can yield much better @@ -39,7 +39,7 @@ \section{Indexer Configuration} package both contain logic to inject randomly generated tokens into each required authentication parameter. The published Docker image on Docker Hub\footnote{https://hub.docker.com/} contains -environment variables preconfigured to set the authentication tokens. +environment variables preconfigured to set the authentication tokens.\index{Docker} The preconfigured environment variables for the Docker deployment are not unique and not secure. If you plan to use Gravwell in Docker, you must override these configuration values using either Docker secrets or @@ -164,7 +164,7 @@ \subsubsection{Hands-on Lab Tips and Solutions} going to use it again for the next lab. \section{Well Configuration} - +\index{Wells} Properly configuring a well can yield significant performance gains when querying data. Isolating data into different wells means that when you issue a query on a tag, the indexer can look at storage that is only @@ -301,7 +301,7 @@ \subsection{Hands-on Lab: Well Definitions} \end{figure} \section{Well Ageout} - +\index{Ageout}\index{Wells!ageout} Gravwell is designed to manage data sets with minimal user interaction; once a system is appropriately configured it will manage data sets and storage arrays on its own. Data ageout is one of the most critical @@ -591,7 +591,7 @@ \subsection{Hands-on Lab: Ageout} \end{Verbatim} \section{Replication} - +\index{Replication} Hardware failures happen, drives crash, and humans mistype commands; only a robust data backup solution can prevent catastrophic data loss. Gravwell implements a data replication system which enables transparent @@ -745,7 +745,7 @@ \subsubsection{Lab Questions} \section{Query Acceleration and Indexing} \label{sec:acceleration} - +\index{Acceleration}\index{Indexing|see {Acceleration}} Gravwell indexers provide a couple of different data management methodologies, ranging from raw storage with only temporal indexing to full feature extraction and direct data indexing. This section @@ -753,7 +753,7 @@ \section{Query Acceleration and Indexing} pros and cons of each methodology. A Gravwell well without any acceleration configuration will employ only -temporal indexing, this means that every entry is grouped according to a +temporal indexing, which means that every entry is grouped according to a timestamp that is indexed using a temporal index. The temporal index allows for specifying subsections of time without combing through data that isn't in the time region specified by the query. Wells can also be @@ -1129,7 +1129,7 @@ \subsubsection{Lab Questions} \end{enumerate} \section{Indexer Optimization} - +\index{Tuning!indexers} Gravwell prides itself on not requiring specific machine specs and being able to scale across a broad range of hardware capabilities. While the software makes a best effort at scaling without human @@ -1262,7 +1262,7 @@ \section{Indexer Optimization} \section{Docker Configuration} \label{sec:docker-config} - +\index{Docker} Throughout the training so far we have made heavy use of Docker as a simple container platform that makes it easy to configure and run Gravwell.
As you were working with Gravwell, there may have been a diff --git a/Ingesters/ingesters.tex b/Ingesters/ingesters.tex index 0005482..2d1363c 100644 --- a/Ingesters/ingesters.tex +++ b/Ingesters/ingesters.tex @@ -1,5 +1,5 @@ \chapter{Ingesters} - +\index{Ingesters} Ingesters are programs which collect entries from some data source and send them to one or more indexers. @@ -106,7 +106,7 @@ \section{Configuration} sections of this document. \subsection{Timestamp Format Overrides (Optional)} - +\index{Timestamps} Data values may contain multiple timestamps, which can cause some confusion when attempting to derive timestamps out of the data. Normally, the Listeners will grab the leftmost timestamp that can be @@ -123,7 +123,7 @@ \subsection{Timestamp Format Overrides (Optional)} Listener config. \section{Simple Relay Ingester} - +\index{Ingesters!simple relay} Simple Relay is the go-to ingester for text-based data sources that can be delivered over plaintext TCP and/or UDP network connections via either IPv4 or IPv6. @@ -465,7 +465,7 @@ \subsubsection{Lab Questions} \clearpage \section{File Follower Ingester} - +\index{Ingesters!file follow} The File Follower ingester is the best way to ingest files on the local filesystem in situations where those files may be updated on the fly. It ingests each line of the files as a single entry. @@ -823,7 +823,7 @@ \subsubsection{Lab Questions} % about the winlog module as the ingester itself %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Windows Event Ingester} - +\index{Ingesters!windows event} The Windows event ingester is designed to monitor the Windows event service and ingest raw events into Gravwell. Windows events are formatted in XML and, frankly, can get a little unruly; as a result the @@ -1034,7 +1034,7 @@ \subsubsection{Lab Questions:} \clearpage \section{Netflow and IPFIX Ingester} - +\index{Ingesters!netflow}\index{Ingesters!IPFIX} The Netflow ingester acts as a Netflow collector, gathering records created by Netflow exporters and capturing them as Gravwell entries for later analysis. These entries can then be analyzed using the netflow @@ -1142,7 +1142,7 @@ \subsection{Hands-on Lab: Netflow Ingester} \section{Packet Capture Ingester} - +\index{Ingesters!packet capture}\index{PCAP} The packet capture ingester illustrates one of Gravwell's unique strengths: its ability to ingest raw, unprocessed binary data. This ingester listens on one or more network interfaces and ingests every @@ -1260,7 +1260,7 @@ \subsection{Hands-on Lab: Packet Capture Ingester} \clearpage \section{Tag Management / Federation} \label{sec:federator} - +\index{Federator}\index{Ingesters!federator} The Federator is an entry relay: ingesters connect to the Federator and send it entries, then the Federator passes those entries to an indexer. The Federator can act as a trust boundary, securely relaying entries @@ -1424,7 +1424,7 @@ \subsection{Hands-on Lab: Federation} \end{Verbatim} \section{Ingester Caching} - +\index{Ingesters!caching} Ingester caching allows Gravwell ingesters to continue receiving data and generating entries even when there are no active connections to indexers. Entries are stored in an on-disk cache until indexer @@ -1570,7 +1570,7 @@ \subsection{Hands-on Lab: Ingester Cache} \clearpage \section{Ingest API and Source Code} - +\index{Ingest API} Gravwell's ingest library is open-sourced for public use at \href{github.com/gravwell/gravwell/tree/master/ingest}{github.com/gravwell/gravwell/ingest}.
This makes it relatively easy to write a custom ingester for a specific @@ -1639,7 +1639,7 @@ \subsection{Configuring and Starting the Ingest Muxer} \subsection{Creating and Uploading Entries} - +\index{Entries} Once the muxer has been started and \code{WaitForHot} has returned successfully, entries can be sent to the indexer(s). Note, however, that indexers use a mapping of string tag names to numeric tag IDs, and that @@ -1711,7 +1711,7 @@ \subsection{Cleaning Up/Shutting Down} \section{Permissions and Port Binding} - +\index{Troubleshooting!permissions}\index{Troubleshooting!ports} Gravwell is designed to execute with as few privileges as possible. The shell and Debian installers will create an unprivileged user and group named gravwell and create a directory structure in @@ -1815,7 +1815,7 @@ \subsection{Hands-on Lab: Permissions and Port Binding} \section{Gravwell and Systemd} - +\index{Systemd} {Gravwell installers assume the availability of Systemd. While there are other init services out there, Systemd has steadily become the init system of choice for most popular Linux distributions. Systemd @@ -1871,7 +1871,7 @@ \section{Gravwell and Systemd} \section{Gravwell and Docker} - +\index{Docker} Gravwell supports Docker as a first-class citizen (as you can tell by our heavy use of it for this training). Most components can use environment variables and Docker secrets for configuration and process control. Our diff --git a/Kits/kits.tex b/Kits/kits.tex index aeb3813..3ba8026 100644 --- a/Kits/kits.tex +++ b/Kits/kits.tex @@ -1,6 +1,6 @@ \chapter{Gravwell Kits} \label{ch:kits} - +\index{Kits} Gravwell Kits act like a sort of `app store' from which Gravwell users can install kits to gain out-of-box capabilities for common technologies. As an example, let's take an organization that uses Zeek for firewall/IDS @@ -85,11 +85,11 @@ \section{What's in a Kit?} \end{itemize} \subsection{Dependencies} - +\index{Kits!dependencies} A kit may have \emph{dependencies} defined. Dependencies are other kits which the kit requires for proper functionality. For example, many kits depend on the Network Enrichment kit, which provides some baseline resources for enriching network data, such as a GeoIP database. Dependencies are installed automatically when you deploy a kit, provided the dependency exists on the Gravwell kit server. \section{Browsing and Installing Kits} - +\index{Kits!installation} On a fresh Gravwell cluster, no kits are installed. Opening the `Kits' section will present an empty page (Figure \ref{fig:blank-kits}). If you click `Manage Kits', you will be taken directly to the list of kits on the Gravwell kit server, as shown in Figure \ref{fig:available-kits}. \begin{figure} @@ -161,7 +161,7 @@ \subsection{Exploring a Kit} \section{Managing Installed Kits} \subsection{Upgrading Kits} - +\index{Kits!upgrading} Once a kit has been installed, little administration is required. The sole point of manual intervention is \emph{upgrading} a kit when a new version comes out. Gravwell will periodically push updates to the official kit server. When one of your installed kits has an update available, an ``Upgrade'' button will appear on that kit's tile, as shown in Figure \ref{fig:upgradekit}. \begin{figure}[H] @@ -175,7 +175,7 @@ \subsection{Upgrading Kits} Be warned that upgrading a kit to a new version involves the complete deletion of the previous version's contents. Do not click the ``Deploy'' button at the end of the wizard until you are prepared for this to happen!
\subsection{Uninstalling Kits} - +\index{Kits!uninstalling} To remove an installed kit, enter kit management mode by clicking the ``Manage Kits'' button in the upper-right corner of the main kits page. Then select the trash can icon on the desired kit. A dialog will pop up for confirmation, as shown in Figure \ref{fig:uninstall-confirm}. If you then click ``Uninstall'', the kit will be removed, \emph{unless} you have manually changed any of the kit contents. If you have modified any of the kit items, you will see a second dialog warning you of this fact and allowing one last chance to abort the process, as seen in Figure \ref{fig:uninstall-warn}. \begin{figure}[H] @@ -242,7 +242,7 @@ \section{Hands-on Lab: Installing Kits} When you're done, return to the `Manage kits' page and uninstall the kit. Verify that none of the Netflow dashboards or other objects still exist. Note that there is still a kit installed--which kit is it, and why was it installed? \section{Building Kits} - +\index{Kits!building} Although Gravwell distributes pre-built official kits, any user can build a kit themselves. This is a convenient way to share objects built on one Gravwell instance with another instance. Note that kits built like this are not signed by Gravwell and therefore can only be installed by administrators. You can build a kit by clicking the `Build' button on the Manage Kits page. This launches the kit-building wizard. On the first page, seen in Figure \ref{fig:buildwizard1}, you set general options. The name should be a user-friendly short name like ``Network Enrichment'', ``Zeek Kit'', etc. The kit ID should be a namespaced and unique ID for your kit; although the field is free-form, we recommend using domain namespaces, e.g. ``io.gravwell.example''. The Version field sets a version for the kit, which is useful for upgrades if you decide to set up your own kit server. The optional Gravwell minimum and maximum versions allow you to restrict the kit's compatibility to particular versions of Gravwell. Finally, the kit icon field lets you optionally set a small image which may be used by Gravwell to help identify the kit to users. diff --git a/LabSetup/labsetup.tex b/LabSetup/labsetup.tex index cceb823..4cf8919 100644 --- a/LabSetup/labsetup.tex +++ b/LabSetup/labsetup.tex @@ -1,6 +1,6 @@ \chapter{Lab Setup and Docker Testing} -We will be making extensive use of the Docker container platform during +We will be making extensive use of the Docker\index{Docker} container platform during the hands-on exercises. The purpose of this chapter is to ensure that all training attendees have a Linux system with a functioning Docker installation. To fully participate in the training, each attendee will diff --git a/Search/search.tex b/Search/search.tex index b5a1480..42b1763 100644 --- a/Search/search.tex +++ b/Search/search.tex @@ -1,7 +1,7 @@ \chapter{Searching} \section{Search Pipeline Architectural Overview} - +\index{Search}\index{Search!pipeline} Gravwell search is designed to be an asynchronous pipeline that behaves somewhat like a stream processor. The pipeline is transparently distributed across indexers, enabling distributed search without the @@ -20,18 +20,18 @@ \section{Search Pipeline Architectural Overview} \begin{itemize} \item - Tags. A tag is a data group which informs the indexers what set of + Tags.\index{Tags} A tag is a data group which informs the indexers what set of data to feed into a pipeline.
Indexers use tags to transparently bind to the appropriate wells in order to feed data to the pipeline. \item - Data, composed of entries. These entries are stored on disk by + Data, composed of entries\index{Entries}. These entries are stored on disk by indexers until a search is run. \item - Search modules. These apply structure, extract elements, filter, and + Search modules.\index{Search!modules} These apply structure, extract elements, filter, and condense. Search modules do most of the ``heavy lifting'' in the pipeline. \item - Render module. Any given pipeline has only one render module which + Render module.\index{Search!renderers} Any given pipeline has only one render module which must be the last module. Once entries have been processed by the search modules, the render module formats them to display to the user. If no render module is specified in a search, the text module is @@ -70,7 +70,7 @@ \section{Search Pipeline Architectural Overview} \section{Query Entries and Tags} - +\index{Search!entries}\index{Entries}\index{Tags} The first part of any query is the \emph{tag set}; all queries operate on at least one tag, but a single query can operate on multiple tags. If no tag is specified, the \code{default} tag is assumed. Every entry in @@ -237,7 +237,7 @@ \subsection{Hands-on Lab: Basic Filtering} \section{Entries, Enumerated Values, and Field Extraction} - +\index{Search!enumerated values}\index{Enumerated Values} One of the most common operations any Gravwell user will perform is field extraction. Gravwell is designed to operate on unstructured data, meaning that you don't necessarily have to understand the form of your data @@ -436,7 +436,7 @@ \subsection{Hands-on Lab: Searching with Enumerated Values and Field Extraction} \section{Search Modules} - +\index{Search!modules} Gravwell supports a wide variety of search modules that are designed to perform various functions. Many of the search modules are used to perform feature extraction such as \code{json}, \code{syslog}, \code{cef}, @@ -477,7 +477,7 @@ \section{Search Modules} \section{Inline Filtering} - +\index{Search!filtering} Many search modules support what we call ``inline filtering''. Inline filtering allows the module to extract a value and filter on it using its native type without involving another module. There are some real @@ -693,7 +693,7 @@ \subsubsection{Lab Tasks} \section{Render Modules} - +\index{Search!renderers} Gravwell provides a selection of render modules to help users display their results in the most comprehensible manner possible. Render modules are in charge of receiving data from the search module pipeline and @@ -706,7 +706,7 @@ \section{Render Modules} original data is expired or purposefully deleted. \subsection{Temporal vs. Non-Temporal Rendering} - +\index{Search!temporal vs. non-temporal} In discussing renderers, a distinction should be made between temporal and non-temporal mode. Most searches will operate in \emph{temporal mode}, where the entries arrive at the renderer in order of their @@ -817,7 +817,7 @@ \subsection{Hands-on Lab: Temporal vs. Non-Temporal Rendering} \subsection{Downloading Results} - +\index{Search!downloading}\index{Search!renderers} All renderers allow the user to download results in at least one format. @@ -844,7 +844,7 @@ \subsection{Downloading Results} \subsection{Text/Raw Renderers} - +\index{Renderers!text}\index{Renderers!raw} These renderers provide only the most basic functionality but are useful when doing initial explorations on data. 
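For instance, a quick first look at incoming data needs nothing more than a tag and the renderer; a minimal illustrative query would be:

\begin{Verbatim}[breaklines=true]
tag=default text
\end{Verbatim}

Because the text module is the implied default renderer, \code{tag=default} on its own is equivalent.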
The text renderer is designed to show human-readable entries in a text format, as seen in Figure \ref{fig:text-entries}. Any @@ -897,7 +897,7 @@ \subsection{Text/Raw Renderers} \clearpage \subsection{Table Renderer} - +\index{Renderers!table} The table renderer is used to create tables. The renderer's arguments specify which columns should be shown in the table. Arguments must be enumerated values, ``DATA'', ``TAG'', ``TIMESTAMP'', or ``SRC''. @@ -984,7 +984,7 @@ \subsubsection{Using the -nt option} \clearpage \subsection{Chart Renderer} - +\index{Renderers!chart} The chart renderer is used to display aggregate results such as trends, quantities, counts, and other numerical data. Charting will plot an enumerated value with an optional ``by'' parameter. For example, if @@ -1023,7 +1023,7 @@ \subsection{Chart Renderer} \clearpage \subsection{Mapping Modules} - +\index{Renderers!heatmap}\index{Renderers!pointmap} The \code{pointmap} and \code{heatmap} renderer modules translate search results onto a map. Both place entries on the map based on locations in enumerated values. By default, the modules look for an enumerated value called @@ -1124,7 +1124,7 @@ \subsubsection{Heatmap} \clearpage \subsection{Stackgraph Renderer} - +\index{Renderers!stack graph} The \code{stackgraph} renderer is used to display horizontal bar graphs with stacked data points. A stackgraph is useful in displaying the magnitude of results that are accumulated from multiple components @@ -1198,8 +1198,8 @@ \subsubsection{Stackgraph Example: Failed SSH Logins by Country \& User} \clearpage \subsection{Force-Directed Graph Renderer} - -The force directed graph (fdg) module is used to generate a directed +\index{Renderers!force-directed graph} +The force-directed graph (fdg) module is used to generate a directed graph using node pairs and optional grouping. The fdg module accepts source and destination groups as well as a weight value for the resulting edge. @@ -1260,7 +1260,7 @@ \subsubsection{Examples} \section{Resources} \label{sec:resources} - +\index{Resources} \emph{Resources} are persistent data objects which can be used in search queries. Resources can be manually uploaded by a user or automatically created by search modules. Resources are used by the lookup module to @@ -1476,7 +1476,7 @@ \subsection{Hands-on Lab: Enriching Netflow with GeoIP} \clearpage \section{Data Fusion} - +\index{Data fusion}\index{Search!data fusion} The Gravwell query pipeline supports what we call module interleaving, which is basically the ability to specify that a given module should only process specific tags. This allows Gravwell to operate on multiple data formats @@ -1730,7 +1730,7 @@ \subsubsection{Lab Questions} \section{Query Optimization} \label{sec:query-optimization} - +\index{Search!optimization} The Gravwell query pipeline is very powerful. Searches are distributed to all nodes in the cluster, which intelligently share the load in order to use computing resources at top efficiency. It is possible, however, @@ -1742,9 +1742,9 @@ \section{Query Optimization} down into three categories: parsers, operators, and condensers. Render modules are not a consideration when it comes to optimization, as they are always the final module in a pipeline. +\index{Search!modules} \subsection{Parsing modules} - A parsing module is one that performs field extraction over a data entry.
Typically, these modules are slower than operating modules as they usually read and process the entire data entry and create @@ -1765,7 +1765,7 @@ \subsection{Parsing modules} these are run very close to the data. \subsection{Parsing modules and Accelerators} - +\index{Acceleration} Accelerators are covered in Section \ref{sec:acceleration} and in the online Gravwell documentation, but they should be mentioned when discussing query optimization. When turned on, they provide very powerful filtering speedups using our hybrid @@ -1948,7 +1948,7 @@ \subsection{Hands-on Lab: Optimizing Queries} \section{Auto-extractors} - +\index{Search!extractors}\index{Auto-extractors|see {Extractors}}\index{Extractors} Gravwell enables per-tag extraction definitions that can ease the complexity of interacting with unstructured data and data formats that are not self-describing. Unstructured data often requires complicated @@ -2018,7 +2018,7 @@ \subsection{Auto-Extractor Configuration} does not take any arguments, so args will always be empty). \subsection{Managing Auto-Extractors in the GUI} - +\index{GUI!extractors} The Gravwell GUI can be used to manage extractors. The screenshot in Figure \ref{fig:extractors-page} shows the Extractors page with four defined extractors. @@ -2519,7 +2519,7 @@ \subsection{Hands-On Lab: Extractors} \section{Backgrounded and Saved Searches} - +\index{Search!background searches}\index{Search!saved searches} Backgrounding a search allows a user to do other things while a search completes--it is conceptually similar to running a Unix command with an `\&' at the end of the command line. Searches can be launched in the @@ -2681,7 +2681,7 @@ \subsection{Hands-On Lab: Groups and Sharing} \section{Dashboards} \label{sec:dashboards} - +\index{Dashboards} Gravwell dashboards put relevant information in a heads-up format suitable for continuous monitoring and situational awareness. Dashboards are a collection of searches that are all executed in @@ -2740,7 +2740,7 @@ \section{Dashboards} Once a few tiles have been added to the dashboard, they can be rearranged and resized by clicking and dragging the tiles. Note that after making a change, you must click the ``Save changes'' popup which appears in the lower right corner. \subsection{Live Update} - +\index{Dashboards!live update} Dashboards can be configured to \emph{live update}, meaning they will re-run queries and display new results after a set period of time. To enable this, click the 3-dot menu on the dashboard and select ``Settings''. Within the settings page, pick the ``Timeframe \& Live Update'' tab, then turn the ``Enable live update'' toggle on, as seen in Figure \ref{fig:live-update}. The update interval is configurable; if the queries in the dashboard cover a long timeframe or process a lot of data, consider setting the interval to a higher value so as to reduce the load on the system. \begin{figure} @@ -2849,7 +2849,7 @@ \subsection{Hands-on Lab: Network Activity Dashboard} \end{Verbatim} \section{Templates} - +\index{Search!templates}\index{Dashboards!templates} Templates are stored Gravwell queries which require one or more variables to run. This lets you build an advanced query once, then re-run it against, for example, whichever IP address is currently of interest. They are particularly effective when inserted into a dashboard (section \ref{sec:dashboards}), or when coupled with an actionable (section \ref{sec:actionables}). Templates are managed via the templates page, accessed through the main menu under the `Tools and Resources' section.
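As a sketch of the idea, a hypothetical template for drilling into a single address might store a query like the following; the tag, module, and variable name are illustrative only:

\begin{Verbatim}[breaklines=true]
tag=netflow netflow Src==%%IPADDR%% Dst | table Dst
\end{Verbatim}

Each time the template is run, the variable is replaced with a concrete value supplied by the user.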
A template consists of a name, a description, and the query itself. Inside the query, use words wrapped inside doubled percent signs to denote variables, e.g. \code{\%\%IPADDR\%\%} as seen in Figure \ref{fig:template-editor}. @@ -2878,7 +2878,7 @@ \section{Templates} \section{Actionables} \label{sec:actionables} - +\index{Actionables}\index{Dashboards!actionables} Actionables provide a way to create custom triggers and menus that key on any text rendered in a query and take one or more actions when selected. Similar to an HTML hyperlink, actionables can be used to open external URLs that key on data, but actionables can also be leveraged to submit new Gravwell queries, launch dashboards, and execute templates. Actionables are created by specifying one or more regular expressions, along with one or more actions. Gravwell automatically parses all text rendered with the table and chart renderers, bringing up appropriate actionable context menus when the text is clicked, as seen in Figure \ref{fig:actionables-overview}. @@ -2945,7 +2945,7 @@ \subsubsection{Triggers} \section{Compound Queries} \label{sec:compoundqueries} - +\index{Compound queries}\index{Search!compound queries} Compound Queries is an extension to the query language that allows you to perform multiple in-order queries and use the output from a previous query anywhere in the pipeline of the next, similar to an SQL JOIN. You can combine diff --git a/Security/security.tex b/Security/security.tex index 8a2842e..9322dab 100644 --- a/Security/security.tex +++ b/Security/security.tex @@ -1,5 +1,5 @@ \chapter{Securing Gravwell} - +\index{Security} This chapter discusses security considerations for a Gravwell cluster. It will discuss indexer and webserver security, securing ingesters, and user authentication. It will also demonstrate some of the risks of an @@ -35,7 +35,7 @@ \chapter{Securing Gravwell} \section{TLS/HTTPS} - +\index{Security!encryption}\index{Security!TLS}\index{Webservers!security} By default, Gravwell is configured to communicate via unencrypted channels. This means that communications between the webserver and the user, as well as communication between ingesters and indexers, will be @@ -146,7 +146,7 @@ \subsection{Install a self-signed certificate} \end{Verbatim} \section{Indexer Security} - +\index{Indexers!security} Gravwell indexers communicate with two components: ingesters and webservers. Users never interact directly with indexers. @@ -183,7 +183,7 @@ \subsection{Indexer-Ingester Communications} \section{Webserver Security} - +\index{Webservers!security} Webservers communicate with user clients, indexers, the search agent, and optionally the datastore. As the gateway into the Gravwell system, they are often accessible from public networks and should therefore be @@ -252,7 +252,7 @@ \subsection{Webserver-Datastore Communication} \section{Ingester security} - +\index{Ingesters!security} Ingesters communicate only with indexers, using a shared ingest secret which can be stored via the methods described at the beginning of this chapter. As documented in the section on indexer security, ingesters may @@ -270,7 +270,7 @@ \section{Ingester security} order to segment your ingesters into different security enclaves to help ameliorate this risk. \section{User Authentication and Lockout} - +\index{Users} As mentioned in the webserver section, users are authenticated to the webserver by sending their username and password in an HTTP POST request.
If the webserver is not configured to offer HTTPS connections, diff --git a/UserMgmt/usermgmt.tex b/UserMgmt/usermgmt.tex index 3519f1f..66c3f66 100644 --- a/UserMgmt/usermgmt.tex +++ b/UserMgmt/usermgmt.tex @@ -13,7 +13,7 @@ \chapter{User and Group Management} of user logins, user searches, and more. \section{Managing Users} - +\index{Users}\index{GUI!users} The Gravwell GUI includes an admin-only page for managing users, located in the menu under the Administrator section. From this page, an administrator can create new users, modify existing users, or delete @@ -76,7 +76,7 @@ \section{Managing Users} \end{figure} \section{Managing Groups} - +\index{Groups}\index{GUI!groups} Group management is very similar to user management. The Groups page in the Administration section of the menu will initially show no groups, as in Figure \ref{fig:fresh-groups}. diff --git a/Webserver/webserver.tex b/Webserver/webserver.tex index ef618cf..cf11e65 100644 --- a/Webserver/webserver.tex +++ b/Webserver/webserver.tex @@ -1,5 +1,5 @@ \chapter{Webserver Configuration} - +\index{Webservers} Gravwell's webserver component provides users with access to Gravwell's search capabilities. The simplest system consists of one indexer and one webserver, both on the same machine. More complex variations are @@ -8,7 +8,7 @@ \chapter{Webserver Configuration} configuration options for the Gravwell webserver. \section{Basic Configuration} - +\index{gravwell.conf@\code{gravwell.conf}} The webserver is configured via \code{gravwell.conf}. The following basic options can be set in \code{gravwell.conf}, but defaults are usually functional: @@ -123,7 +123,7 @@ \subsubsection{Lab questions} \end{enumerate} \section{Configuring Multiple Webservers} - +\index{Distributed webservers}\index{Webservers!distributed}\index{Datastore} Gravwell can use multiple webservers to load-balance user requests. These webservers must coordinate with each other to manage users, searches, etc. This coordination is handled through the \textbf{datastore}, a @@ -243,7 +243,7 @@ \subsubsection{Lab questions} % TODO: Kris please update this for the Gravwell load balancer %%%%%%%%%% \section{Setting Up a Load-Balancer} - +\index{Traefik}\index{Load balancer} Multiple webservers work well when placed behind a load-balancing proxy, such as nginx, Traefik, or Gravwell's own load-balancer. When using a load-balancer, users simply access diff --git a/master.tex b/master.tex index 89e86f3..ece877c 100644 --- a/master.tex +++ b/master.tex @@ -7,6 +7,8 @@ \usepackage{float} \usepackage{xcolor} +\usepackage{makeidx} + \usepackage{import} \usepackage{fancyvrb} \usepackage{fvextra} @@ -20,6 +22,9 @@ margin=1in, } +% enable index commands +\makeindex + % always center images, tables, etc. \makeatletter \g@addto@macro\@floatboxreset\centering @@ -155,4 +160,6 @@ %\subimport{CloudArchive/}{cloudarchive} +\printindex + \end{document}