elasticsearch_collector

This agent collects data from Elasticsearch and sends it to InsightFinder.

Installing the Agent

Before Installing the Agent

Install FreeTDS (Linux/macOS):
osx: brew install freetds
linux: yum install freetds

Short Version

bash <(curl -sS https://raw.githubusercontent.com/insightfinder/InsightAgent/master/utils/fetch-agent.sh) elasticsearch_collector && cd elasticsearch_collector
vi config.ini
sudo ./setup/install.sh --create  # install on localhost
                                  ## or on multiple nodes
sudo ./offline/remote-cp-run.sh list_of_nodes

See the offline README for instructions on installing prerequisites.

Long Version

Download the agent tarball and untar it:
curl -fsSLO https://github.com/insightfinder/InsightAgent/raw/master/elasticsearch_collector/elasticsearch_collector.tar.gz
tar xvf elasticsearch_collector.tar.gz && cd elasticsearch_collector
Set up config.ini:
python configure.py

See below for a further explanation of each variable.
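Before moving on to the install, it can help to confirm that config.ini parses and to spot values that were left blank. The following is a minimal sketch using only the Python standard library. It is not part of the agent; the helper names `load_config` and `empty_keys` are hypothetical, and no particular section names are assumed.

```python
import configparser


def load_config(path="config.ini"):
    """Parse an INI file and return {section: {key: value}}."""
    # Interpolation is disabled so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(path)
    return {section: dict(parser.items(section)) for section in parser.sections()}


def empty_keys(config):
    """Return 'section.key' names whose value is blank."""
    return [
        f"{section}.{key}"
        for section, values in config.items()
        for key, value in values.items()
        if not value.strip()
    ]
```

Calling `empty_keys(load_config("config.ini"))` from the agent directory lists the settings that still have no value, which can then be checked against the variable descriptions in this README.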

Automated Install (local or remote)

Review the proposed changes from the install:
sudo ./setup/install.sh
Once satisfied, run:
sudo ./setup/install.sh --create
To deploy on multiple hosts, instead call:
sudo ./offline/remote-cp-run.sh list_of_nodes -f <nodelist_file>

Where list_of_nodes is a list of nodes that are configured in ~/.ssh/config or otherwise reachable with scp and ssh.

Manual Install (local only)

Check Python version

The agent requires a Python 3 environment.
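One way to check the interpreter is from Python itself. This is a small sketch, not a script shipped with the agent; the function name `is_python3` is hypothetical, and it only tests the major version because this README does not state a minimum minor version.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3."""
    return version_info[0] == 3


# Report the version of the interpreter running this snippet.
major, minor = sys.version_info[:2]
if is_python3():
    print(f"Python {major}.{minor} detected; suitable for the agent.")
else:
    print(f"Python 3 is required, but Python {major}.{minor} was found.")
```

Run it with the same `python` command that will be used to start the agent, since a system can have several interpreters installed.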

Set up pip and the required packages:
sudo ./setup/pip-config.sh
Test the agent:
python getmessages_elasticsearch_collector.py -t
If satisfied with the output, configure the agent to run continuously:
sudo ./setup/cron-config.sh

Config Variables

  • es_uris: A comma-delimited list of RFC-1738 formatted URLs: <scheme>://[<username>:<password>@]hostname:port
  • query_json: JSON to add to the query.
  • query_chunk_size: The maximum number of messages returned per query. Default is 5000; maximum is 10000.
  • indeces: Indices to search over (comma-separated, wildcards supported).
  • port: Port to connect to for ES. Overridden if in URI
  • http_auth: username:password used to connect to ES. Overridden if in URI
  • use_ssl: True or False if SSL should be used. Overridden if URI scheme is https
  • ssl_assert_hostname: True or False if hostname verification should be done
  • ssl_assert_fingerprint: True or False if fingerprint verification should be done
  • ssl_version: Version of SSL to use - one of SSLv23 (default), SSLv2, SSLv3, TLSv1
  • verify_certs: True or False if certificates should be verified
  • ca_certs: Path to CA bundle
  • client_cert: Path to certificate
  • client_key: Path to client key
  • his_time_range: Historical data time range, for example: 2020-04-14 00:00:00,2020-04-15 00:00:00. If this option is set, the agent will query metric values by time range.
  • data_format: The format of the data to parse: RAW, RAWTAIL, CSV, CSVTAIL, XLS, XLSX, JSON, JSONTAIL, AVRO, or XML. *TAIL formats keep track of the current file being read & the position in the file.
  • timestamp_format: Format of the timestamp, in Python arrow syntax. If the timestamp is in Unix epoch, this can be set to epoch. If the timestamp is split over multiple fields, curly braces can be used to indicate formatting, e.g.: YYYY-MM-DD HH:mm:ss ZZ; alternatively, if the timestamp can be in one of multiple fields, a priority list of field names can be given: timestamp1,timestamp2.
  • timezone: Timezone of the timestamp data stored in/returned by the DB. Note that if timezone information is not included in the data returned by the DB, then this field has to be specified.
  • timestamp_field: Field name for the timestamp. Default is timestamp.
  • target_timestamp_timezone: Timezone of the timestamp data to be sent to and stored in InsightFinder. Default is UTC. Set this field only if you want to store data in a time zone other than UTC.
  • component_field: Field name for the component name.
  • instance_field: Field name for the instance name. If no instance is given, the Elasticsearch server name will be used.
  • instance_whitelist: This field is a regex string used to define which instances will be filtered.
  • device_field: Field name for the device/container for containerized projects. A priority list of field names can also be given: device1,device2.
  • data_fields: Comma-delimited list of field names to use as data fields. If not set, all fields will be reported. Each data field can either be a field name (name) or a labeled field (<name>::<value>).
  • agent_http_proxy: HTTP proxy used to connect to the agent.
  • agent_https_proxy: As above, but HTTPS.
  • user_name: User name in InsightFinder
  • license_key: License Key from your Account Profile in the InsightFinder UI.
  • token: Token from your Account Profile in the InsightFinder UI.
  • project_name: Name of the project created in the InsightFinder UI.
  • project_type: Type of the project - one of metric, metricreplay, log, logreplay, incident, incidentreplay, alert, alertreplay, deployment, deploymentreplay.
  • sampling_interval: How frequently (in minutes) data is collected. Should match the interval used in project settings.
  • run_interval: How frequently (in minutes) the agent is run. Should match the interval used in cron.
  • chunk_size_kb: Size of chunks (in KB) to send to InsightFinder. Default is 2048.
  • if_url: URL for InsightFinder. Default is https://app.insightfinder.com.
  • if_http_proxy: HTTP proxy used to connect to InsightFinder.
  • if_https_proxy: As above, but HTTPS.
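
Several of the connection settings above are described as overridden when the same information appears in es_uris. The sketch below illustrates that precedence with the standard library's `urllib.parse`. It is an illustration under assumptions, not the agent's actual code: the function name `resolve_hosts`, the returned dictionary layout, and the example host names are all hypothetical.

```python
from urllib.parse import urlsplit


def resolve_hosts(es_uris, port="", http_auth="", use_ssl=False):
    """Combine es_uris with the fallback settings; URI values take precedence."""
    hosts = []
    for raw in es_uris.split(","):
        raw = raw.strip()
        if not raw:
            continue
        # A bare "hostname:port" has no scheme; prefix "//" so it parses as a netloc.
        parts = urlsplit(raw if "://" in raw else "//" + raw)
        host = {"host": parts.hostname}

        # port: the URI value wins, otherwise fall back to the port setting.
        if parts.port is not None:
            host["port"] = parts.port
        elif port:
            host["port"] = int(port)

        # http_auth: credentials in the URI win over the http_auth setting.
        if parts.username is not None:
            host["http_auth"] = f"{parts.username}:{parts.password or ''}"
        elif http_auth:
            host["http_auth"] = http_auth

        # use_ssl: an https scheme forces SSL on, otherwise use the setting.
        host["use_ssl"] = True if parts.scheme == "https" else bool(use_ssl)
        hosts.append(host)
    return hosts
```

For example, `resolve_hosts("https://user:pw@es1.example.com:9243, es2.example.com", port="9200", http_auth="fallback:secret")` yields one entry that takes everything from its URI and a second entry that falls back to the port and http_auth settings. Percent-encoded characters in credentials are not decoded in this sketch.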