Indexing documents with the client is straightforward. Because associative arrays convert directly into JSON documents, indexing a document is simply a matter of building a correctly structured associative array and passing it to a method.
When indexing a document, you can either provide an ID or let Elasticsearch generate one for you.
$params = array();
$params['body'] = array('testField' => 'abc');
$params['index'] = 'my_index';
$params['type'] = 'my_type';
$params['id'] = 'my_id';
// Document will be indexed to my_index/my_type/my_id
$ret = $client->index($params);
$params = array();
$params['body'] = array('testField' => 'abc');
$params['index'] = 'my_index';
$params['type'] = 'my_type';
// Document will be indexed to my_index/my_type/<autogenerated_id>
$ret = $client->index($params);
Like most of the other APIs, there are a number of other parameters that can be specified. They are specified in the parameter array just like index or type. For example, let’s set the routing and timestamp of this new document:
$params = array();
$params['body'] = array('testField' => 'xyz');
$params['index'] = 'my_index';
$params['type'] = 'my_type';
$params['routing'] = 'company_xyz';
$params['timestamp'] = strtotime("-1 day");
$ret = $client->index($params);
Elasticsearch also supports bulk indexing of documents. The client provides an interface to bulk index too, but it is less user-friendly. In the future we will be adding "helper" methods that simplify this process.
The bulk API method expects a bulk body identical to the kind Elasticsearch expects: JSON action/metadata pairs separated by newlines. A common bulk-creation pattern is as follows:
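// Start from a fresh parameter array and set the target index/type once
$params = array();
$params['index'] = 'my_index';
$params['type'] = 'my_type';
for ($i = 0; $i < 100; $i++) {
    // Action/metadata line
    $params['body'][] = array(
        'index' => array(
            '_id' => $i
        )
    );
    // Document source line
    $params['body'][] = array(
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );
}
$responses = $client->bulk($params);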
You can of course use any of the available bulk methods. Here is an example of using upserts:
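// Start from a fresh parameter array and set the target index/type once
$params = array();
$params['index'] = 'my_index';
$params['type'] = 'my_type';
for ($i = 0; $i < 100; $i++) {
    // Action/metadata line
    $params['body'][] = array(
        'update' => array(
            '_id' => $i
        )
    );
    // Partial document; inserted as a new document if the ID does not exist
    $params['body'][] = array(
        'doc_as_upsert' => true,
        'doc' => array(
            'my_field' => 'my_value',
            'second_field' => 'some more values'
        )
    );
}
$responses = $client->bulk($params);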
If you are specifying bulk bodies manually or extracting them from an existing JSON file, Nowdocs are probably the best method. Otherwise, when you construct them algorithmically, take care that a newline ("\n") terminates every line, including the last!
$params = array();
$params['body'] = <<<'EOT'
{ "index" : { "_index" : "my_index", "_type" : "my_type", "_id" : "1" } }
{ "field1" : "value1" }

EOT;
$ret = $client->bulk($params);
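When the body is built in code rather than pasted in as a Nowdoc, a small helper can guarantee the trailing newline. The following is a minimal sketch in plain PHP that does not touch the client; buildBulkBody is a hypothetical name chosen for illustration, not a method the client provides.

```php
<?php
// Hypothetical helper (not part of the client): serialize a list of
// action/metadata and document arrays into a newline-delimited bulk body.
function buildBulkBody(array $lines)
{
    $body = '';
    foreach ($lines as $line) {
        // One JSON object per line; every line, including the last, ends in "\n".
        $body .= json_encode($line) . "\n";
    }
    return $body;
}

// Build two action/document pairs, mirroring the loop pattern shown earlier.
$lines = array();
for ($i = 0; $i < 2; $i++) {
    $lines[] = array('index' => array('_id' => $i));
    $lines[] = array('my_field' => 'my_value');
}

$body = buildBulkBody($lines);
echo $body;
```

The resulting string can be assigned to $params['body'] in the same way as the Nowdoc above.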
As with the Elasticsearch bulk API itself, if you specify the index/type in the parameters, you can omit them from the bulk request body (which often saves a lot of space and redundant data transfer):
$params = array();
$params['body'] = <<<'EOT'
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }

EOT;
$params['index'] = 'my_index';
$params['type'] = 'my_type';
$ret = $client->bulk($params);