HTTP logs track

This track is based on web server logs from the 1998 Football World Cup.

Modifications:

Applied number to IP conversion as suggested in the original readme
Removed illegal characters in "object_mappings.sort"
Transformed the source data to a bulk-friendly JSON format (ignoring all entries that contained unrecognised / problematic characters and invalid IP addresses like "0"; around 0.001% of the source data was lost due to this approach)
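The number-to-IP conversion and the JSON transformation can be sketched roughly as follows. This is only an illustration, not the script that produced the corpus: the function names `number_to_ip` and `to_json_line` are hypothetical, and it assumes the source data identifies each client by an unsigned 32-bit integer that is rendered as a dotted-quad address.

```python
import ipaddress
import json


def number_to_ip(number):
    """Render an unsigned 32-bit integer as a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))


def to_json_line(timestamp, client_number, request, status, size):
    """Build one line of newline-delimited JSON with the track's field names."""
    doc = {
        "@timestamp": timestamp,
        "clientip": number_to_ip(client_number),
        "request": request,
        "status": status,
        "size": size,
    }
    return json.dumps(doc)


# 3540715776 is the integer form of 211.11.9.0 (211*2**24 + 11*2**16 + 9*2**8).
line = to_json_line(898459201, 3540715776, "GET /english/index.html HTTP/1.0", 304, 0)
print(line)
```

One JSON object per line keeps the file easy to stream into bulk requests without loading the whole corpus into memory.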

Example Document

{
  "@timestamp": 898459201,
  "clientip": "211.11.9.0",
  "request": "GET /english/index.html HTTP/1.0",
  "status": 304,
  "size": 0
}

Alternatively, an unparsed set of documents is also provided. The unparsed data set is identical to the standard data set, except that the timestamp is in ISO 8601 format and all fields are left unparsed in a single message field. For example:

{"message" : "211.11.9.0 - - [1998-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}
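To illustrate how the unparsed form relates to the standard document, the sketch below parses the message above back into the parsed fields. It is an assumption-laden example rather than what the track's ingest pipelines or runtime fields actually do: the regular expression and the name `parse_message` are hypothetical, and the pattern only covers lines shaped like the sample.

```python
import json
import re
from datetime import datetime

# Hypothetical pattern for lines shaped like the sample message.
LOG_LINE = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)$'
)

SAMPLE = r'{"message" : "211.11.9.0 - - [1998-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}'


def parse_message(raw_json):
    """Turn an unparsed document into the field layout of the parsed one."""
    message = json.loads(raw_json)["message"]
    match = LOG_LINE.match(message)
    if match is None:
        raise ValueError(f"unrecognised log line: {message!r}")
    # The ISO 8601 timestamp carries a UTC offset, so the epoch value is unambiguous.
    timestamp = datetime.fromisoformat(match["timestamp"])
    return {
        "@timestamp": int(timestamp.timestamp()),
        "clientip": match["clientip"],
        "request": match["request"],
        "status": int(match["status"]),
        "size": int(match["size"]),
    }


print(parse_message(SAMPLE))
```

For the sample line this yields the same values as the example document, including the epoch timestamp 898459201.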

Parameters

This track allows you to override the following parameters with Rally 0.8.0+ using --track-params:

bulk_size (default: 5000)
bulk_indexing_clients (default: 8): Number of clients that issue bulk indexing requests.
ingest_percentage (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
conflicts (default: "random"): Type of id conflicts to simulate. Valid values are 'sequential' (a document id is replaced with a sequentially increasing id) and 'random' (a document id is replaced with a random other id).
conflict_probability (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge. Combining conflicts=sequential and conflict_probability=0 makes Rally generate index ids by itself, instead of relying on Elasticsearch's automatic id generation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation).
number_of_replicas (default: 0)
number_of_shards (default: 5)
source_enabled (default: true): A boolean defining whether the _source field is stored in the index.
index_settings: A list of index settings. Index settings defined elsewhere (e.g. number_of_replicas) need to be overridden explicitly.
cluster_health (default: "green"): The minimum required cluster health.
ingest_pipeline: Only applicable for --challenge=append-index-only-with-ingest-pipeline, selects which ingest node pipeline to run. Valid options are 'baseline' (default), 'grok' and 'geoip'. For example: --challenge=append-index-only-with-ingest-pipeline --track-params="ingest_pipeline:'baseline'"
runtime_fields: If defined, the challenge loads the unparsed set of documents, indexing the @timestamp field, the raw message field, and the runtime fields required for the runtime-fields challenge.
error_level (default: "non-fatal"): Available for bulk operations only. Specifies the ignore-response-error-level.
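As a rough sketch of how several of these parameters might be combined, the helper below checks the documented value ranges and renders a dictionary in the comma-separated key:value form used by the ingest_pipeline example above. The names `validate` and `format_track_params` are hypothetical and not part of Rally, and the quoting of string values in single quotes is an assumption based on that one example.

```python
VALID_CONFLICTS = {"sequential", "random"}
VALID_PIPELINES = {"baseline", "grok", "geoip"}
PERCENT_PARAMS = ("ingest_percentage", "conflict_probability")


def validate(params):
    """Reject values outside the ranges documented in the parameter list."""
    for name in PERCENT_PARAMS:
        if name in params and not 0 <= params[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    if params.get("conflicts", "random") not in VALID_CONFLICTS:
        raise ValueError("conflicts must be 'sequential' or 'random'")
    if params.get("ingest_pipeline", "baseline") not in VALID_PIPELINES:
        raise ValueError("ingest_pipeline must be 'baseline', 'grok' or 'geoip'")


def format_track_params(params):
    """Render a dict as a key:value list for --track-params."""
    validate(params)
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"  # lower-case booleans
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = f"'{value}'"  # assumption: strings quoted as in the example
        parts.append(f"{key}:{rendered}")
    return ",".join(parts)


value = format_track_params(
    {"bulk_size": 1000, "source_enabled": False, "conflicts": "sequential"}
)
print(f'--track-params="{value}"')
```

Structured values such as index_settings do not fit this flat form, so the sketch leaves them out.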

License

Original license text:

           Copyright (C) 1997, 1998, 1999 Hewlett-Packard Company
                         ALL RIGHTS RESERVED.
 
  The enclosed software and documentation includes copyrighted works
  of Hewlett-Packard Co. For as long as you comply with the following
  limitations, you are hereby authorized to (i) use, reproduce, and
  modify the software and documentation, and to (ii) distribute the
  software and documentation, including modifications, for
  non-commercial purposes only.
      
  1.  The enclosed software and documentation is made available at no
      charge in order to advance the general development of
      the Internet, the World-Wide Web, and Electronic Commerce.
 
  2.  You may not delete any copyright notices contained in the
      software or documentation. All hard copies, and copies in
      source code or object code form, of the software or
      documentation (including modifications) must contain at least
      one of the copyright notices.
 
  3.  The enclosed software and documentation has not been subjected
      to testing and quality control and is not a Hewlett-Packard Co.
      product. At a future time, Hewlett-Packard Co. may or may not
      offer a version of the software and documentation as a product.
  
  4.  THE SOFTWARE AND DOCUMENTATION IS PROVIDED "AS IS".
      HEWLETT-PACKARD COMPANY DOES NOT WARRANT THAT THE USE,
      REPRODUCTION, MODIFICATION OR DISTRIBUTION OF THE SOFTWARE OR
      DOCUMENTATION WILL NOT INFRINGE A THIRD PARTY'S INTELLECTUAL
      PROPERTY RIGHTS. HP DOES NOT WARRANT THAT THE SOFTWARE OR
      DOCUMENTATION IS ERROR FREE. HP DISCLAIMS ALL WARRANTIES,
      EXPRESS AND IMPLIED, WITH REGARD TO THE SOFTWARE AND THE
      DOCUMENTATION. HP SPECIFICALLY DISCLAIMS ALL WARRANTIES OF
      MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
  
  5.  HEWLETT-PACKARD COMPANY WILL NOT IN ANY EVENT BE LIABLE FOR ANY
      DIRECT, INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES
      (INCLUDING LOST PROFITS) RELATED TO ANY USE, REPRODUCTION,
      MODIFICATION, OR DISTRIBUTION OF THE SOFTWARE OR DOCUMENTATION.
