Elasticsearch input plugin
- Plugin version: v5.2.0 (Other versions)
- Released on: 2025-06-06
- Changelog
For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.
Read from an Elasticsearch cluster, based on search query results. This is useful for replaying test logs, reindexing, etc. You can periodically schedule ingestion using a cron syntax (see the schedule setting) or run the query one time to load data into Logstash.
Example:
input {
# Read all documents from Elasticsearch matching the given query
elasticsearch {
hosts => "localhost"
query => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
}
}
This would create an Elasticsearch query with the following format:
curl 'http://localhost:9200/logstash-*/_search?scroll=1m&size=1000' -d '{
"query": {
"match": {
"statuscode": 200
}
},
"sort": [ "_doc" ]
}'
Input from this plugin can be scheduled to run periodically according to a specific schedule. This scheduling syntax is powered by rufus-scheduler. The syntax is cron-like with some extensions specific to Rufus (e.g. timezone support).
Examples:
* 5 * 1-3 * | will execute every minute of 5am every day of January through March. |
0 * * * * | will execute on the 0th minute of every hour every day. |
0 6 * * * America/Chicago | will execute at 6:00am (UTC/GMT -5) every day. |
Further documentation describing this syntax can be found here.
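As a minimal sketch, the timezone form from the examples above can be placed in the schedule setting as shown here. The host and query are placeholders carried over from the earlier example and are not recommendations.

```
input {
  elasticsearch {
    hosts    => "localhost"
    query    => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
    # Run at 6:00am Chicago time every day
    schedule => "0 6 * * * America/Chicago"
  }
}
```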
Authentication to a secure Elasticsearch cluster is possible using one of the following options:
- user and password
- cloud_auth
- api_key
Authorization to a secure Elasticsearch cluster requires read permission at index level and monitoring permissions at cluster level. The monitoring permission at cluster level is necessary to perform periodic connectivity checks.
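A minimal sketch of basic authentication follows. The host name, user name, and environment variable name are hypothetical; the user is assumed to hold the index-level read and cluster-level monitoring privileges described above.

```
input {
  elasticsearch {
    hosts    => ["https://es.example.org:9200"]   # hypothetical host
    user     => "logstash_reader"                  # hypothetical user
    password => "${ES_PASSWORD}"                   # hypothetical environment variable
  }
}
```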
When ECS compatibility is disabled, docinfo_target uses the "@metadata" field as a default. With ECS enabled, the plugin uses the naming convention "[@metadata][input][elasticsearch]" as the default target for placing document information.
The plugin logs a warning when ECS is enabled and target isn’t set.
Set the target option to avoid potential schema conflicts.
When this input plugin cannot create a structured Event from a hit result, it will instead create an Event that is tagged with _elasticsearch_input_failure and whose [event][original] is a JSON-encoded string representation of the entire hit.
Common causes are:
- When the hit result contains top-level fields that are reserved in Logstash but do not have the expected shape. Use the target directive to avoid conflicts with the top-level namespace.
- When docinfo is enabled and the docinfo fields cannot be merged into the hit result. Combine target and docinfo_target to avoid conflict.
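The following sketch combines target and docinfo_target and routes failed events separately. The target field name, the docinfo_target value, and the output file path are illustrative assumptions, not defaults of the plugin.

```
input {
  elasticsearch {
    hosts          => "localhost"
    docinfo        => true
    # Keep the hit's _source and its document information out of the top-level namespace
    target         => "[document]"
    docinfo_target => "[@metadata][doc]"
  }
}
output {
  if "_elasticsearch_input_failure" in [tags] {
    # Hypothetical destination for events that could not be structured
    file { path => "/tmp/elasticsearch_input_failures.log" }
  } else {
    stdout { codec => rubydebug }
  }
}
```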
Technical Preview: Tracking a field’s value
The feature that allows tracking a field’s value across runs is in Technical Preview. Configuration options and implementation details are subject to change in minor releases without being preceded by deprecation warnings.
Some use cases require tracking the value of a particular field between two jobs. Examples include:
- avoiding the need to re-process the entire result set of a long query after an unplanned restart
- grabbing only new data from an index instead of processing the entire set on each job.
The Elasticsearch input plugin provides the tracking_field and tracking_field_seed options. When tracking_field is set, the plugin records the value of that field for the last document retrieved in a run into a file. (The file location defaults to last_run_metadata_path.)
You can then inject this value in the query using the placeholder :last_value. The value will be injected into the query before execution, and then updated after the query completes if new data was found.
This feature works best when:
- the query sorts by the tracking field,
- the timestamp field is added by Elasticsearch, and
- the field type has enough resolution so that two events are unlikely to have the same value.
Consider using a tracking field whose type is date nanoseconds. If the tracking field is of this data type, you can use an extra placeholder called :present to inject the nanosecond-based value of "now-30s". This placeholder is useful as the right-hand side of a range filter, allowing the collection of new data but leaving partially-searchable bulk request data to the next scheduled job.
This section contains a series of steps to help you set up the "tailing" of data being written to a set of indices, using a date nanosecond field added by an Elasticsearch ingest pipeline and the tracking_field capability of this plugin.
Create an ingest pipeline that adds Elasticsearch’s _ingest.timestamp field to the documents as event.ingested:
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.putIfAbsent(\"event\", [:]); ctx.event.ingested = metadata().now.format(DateTimeFormatter.ISO_INSTANT);"
      }
    }
  ]
}
Create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
PUT /_template/my_template
{
  "index_patterns": ["test-*"],
  "settings": {
    "index.default_pipeline": "my-pipeline"
  },
  "mappings": {
    "properties": {
      "event": {
        "properties": {
          "ingested": {
            "type": "date_nanos",
            "format": "strict_date_optional_time_nanos"
          }
        }
      }
    }
  }
}
Define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter since the last value seen until present:
{
  "query": {
    "range": {
      "event.ingested": { "gt": ":last_value", "lt": ":present" }
    }
  },
  "sort": [
    {
      "event.ingested": {
        "order": "asc",
        "format": "strict_date_optional_time_nanos",
        "numeric_type": "date_nanos"
      }
    }
  ]
}
Configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the event.ingested field:
input {
  elasticsearch {
    id => tail_test_index
    hosts => [ 'https://..']
    api_key => '....'
    index => 'test-*'
    query => '{ "query": { "range": { "event.ingested": { "gt": ":last_value", "lt": ":present"}}}, "sort": [ { "event.ingested": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" } } ] }'
    tracking_field => "[event][ingested]"
    # optional use of slices to speed up data processing; should be equal to or less than the number of primary shards
    slices => 5
    # every minute
    schedule => '* * * * *'
    # don't accumulate jobs if one takes longer than 1 minute
    schedule_overlap => false
  }
}
With this sample setup, new documents are indexed into a test-* index. The next scheduled run:
- selects all new documents since the last observed value of the tracking field,
- uses Point in time (PIT) + Search after to paginate through all the data, and
- updates the value of the field at the end of the pagination.
Technical Preview
The ES|QL feature that allows using ES|QL queries with this plugin is in Technical Preview. Configuration options and implementation details are subject to change in minor releases without being preceded by deprecation warnings.
Elasticsearch Query Language (ES|QL) provides a SQL-like interface for querying your Elasticsearch data.
To use ES|QL, this plugin needs to be installed in Logstash 8.17.4 or newer, and must be connected to Elasticsearch 8.11 or newer.
To configure an ES|QL query in the plugin, set query_type to esql and provide your ES|QL query in the query parameter.
ES|QL is evolving and may still have limitations with regard to result size or supported field types. We recommend understanding ES|QL current limitations before using it in production environments.
The following is a basic scheduled ES|QL query that runs hourly:
input {
elasticsearch {
id => hourly_cron_job
hosts => [ 'https://..']
api_key => '....'
query_type => 'esql'
query => '
FROM food-index
| WHERE spicy_level == "hot" AND @timestamp > NOW() - 1 hour
| LIMIT 500
'
schedule => '0 * * * *' # every hour at minute 0
}
}
Set config.support_escapes: true in logstash.yml if you need to escape special characters in the query.
With an ES|QL query, Logstash doesn’t generate event.original.
ES|QL returns query results in a structured tabular format, where data is organized into columns (fields) and values (entries). The plugin maps each value entry to an event, populating corresponding fields. For example, a query might produce a table like:
timestamp | user_id | action | status.code | status.desc |
---|---|---|---|---|
2025-04-10T12:00:00 | 123 | login | 200 | Success |
2025-04-10T12:05:00 | 456 | purchase | 403 | Forbidden (unauthorized user) |
For this case, the plugin emits two events that look like:
[
{
"timestamp": "2025-04-10T12:00:00",
"user_id": 123,
"action": "login",
"status": {
"code": 200,
"desc": "Success"
}
},
{
"timestamp": "2025-04-10T12:05:00",
"user_id": 456,
"action": "purchase",
"status": {
"code": 403,
"desc": "Forbidden (unauthorized user)"
}
}
]
If your index has a mapping with sub-objects where status.code and status.desc are actually dotted fields, they appear in Logstash events as a nested structure.
An ES|QL query fetches all parent fields and sub-fields if your Elasticsearch index has multi-fields or subobjects. Since Logstash events cannot contain a parent field’s concrete value and sub-field values together, the plugin ignores sub-fields with a warning and includes the parent. We recommend using the RENAME keyword (or DROP, to avoid warnings) in your ES|QL query to explicitly rename the fields so that sub-fields are included in the event.
This is a common occurrence if your template or mapping follows the pattern of always indexing strings as a "text" (field) + "keyword" (field.keyword) multi-field. In this case, it’s recommended to use KEEP field if the string is identical and there is only one subfield, as the engine will optimize and retrieve the keyword. Otherwise, you can use KEEP field.keyword | RENAME field.keyword AS field.
To illustrate the situation with an example, assume your mapping has a time field with time.min and time.max sub-fields, as follows:
"properties": {
"time": { "type": "long" },
"time.min": { "type": "long" },
"time.max": { "type": "long" }
}
The ES|QL result will contain all three fields, but the plugin cannot map them into a Logstash event. To avoid this, you can use the RENAME keyword to rename the time parent field so that all three fields have unique names.
...
query => 'FROM my-index | RENAME time AS time.current'
...
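If the sub-fields are not needed in the event, a sketch of the DROP alternative mentioned earlier might look like the following. It reuses my-index and the time.min and time.max names from the mapping above; the host is a placeholder.

```
input {
  elasticsearch {
    hosts      => "localhost"
    query_type => "esql"
    # Drop the sub-fields so that only the parent "time" field reaches the event
    query      => 'FROM my-index | DROP time.min, time.max'
  }
}
```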
For comprehensive ES|QL syntax reference and best practices, see the ES|QL documentation.
This plugin supports these configuration options plus the Common options described later.
As of version 5.0.0 of this plugin, a number of previously deprecated settings related to SSL have been removed. Please check out Elasticsearch Input Obsolete Configuration Options for details.
Also see Common options for a list of options supported by all input plugins.
api_key
- Value type is password
- There is no default value for this setting.
Authenticate using an Elasticsearch API key. Note that this option also requires enabling the ssl_enabled option.
Format is id:api_key, where id and api_key are as returned by the Elasticsearch Create API key API.
ca_trusted_fingerprint
- Value type is string, and must contain exactly 64 hexadecimal characters.
- There is no default value for this setting.
- Use of this option requires Logstash 8.3+
The SHA-256 fingerprint of an SSL Certificate Authority to trust, such as the autogenerated self-signed CA for an Elasticsearch cluster.
cloud_auth
- Value type is password
- There is no default value for this setting.
Cloud authentication string ("<username>:<password>" format) is an alternative for the user/password pair.
For more info, check out the Logstash-to-Cloud documentation.
cloud_id
- Value type is string
- There is no default value for this setting.
Cloud ID, from the Elastic Cloud web console. If set, hosts should not be used.
For more info, check out the Logstash-to-Cloud documentation.
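A minimal sketch of connecting to Elastic Cloud with a Cloud ID and an API key is shown below. The environment variable names are hypothetical, and the API key is assumed to be in the id:api_key format.

```
input {
  elasticsearch {
    cloud_id => "${ES_CLOUD_ID}"   # hypothetical environment variable holding the Cloud ID
    api_key  => "${ES_API_KEY}"    # hypothetical environment variable, id:api_key format
    index    => "logstash-*"
  }
}
```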
connect_timeout_seconds
- Value type is number
- Default value is 10
The maximum amount of time, in seconds, to wait while establishing a connection to Elasticsearch. Connect timeouts tend to occur when Elasticsearch or an intermediate proxy is overloaded with requests and has exhausted its connection pool.
custom_headers
- Value type is hash
- Default value is empty
Pass a set of key value pairs as the headers sent in each request to an Elasticsearch node. The headers will be used for any kind of request. These custom headers will override any headers previously set by the plugin such as the User Agent or Authorization headers.
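For illustration, the sketch below sends one extra header with every request. The header value is a made-up example, and the host is a placeholder.

```
input {
  elasticsearch {
    hosts          => "localhost"
    # Hypothetical header value used to identify this pipeline's requests
    custom_headers => { "X-Opaque-Id" => "logstash-replay" }
  }
}
```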
docinfo
- Value type is boolean
- Default value is false
If set, include Elasticsearch document information such as index, type, and the id in the event.
If you are ingesting documents with the intent to re-index them (or just update them), note that the action option in the elasticsearch output needs to know how to handle them. It can be dynamically assigned with a field added to the metadata.
Example
input {
elasticsearch {
hosts => "es.production.mysite.org"
index => "mydata-2018.09.*"
query => '{ "query": { "query_string": { "query": "*" } } }'
size => 500
scroll => "5m"
docinfo => true
docinfo_target => "[@metadata][doc]"
}
}
output {
elasticsearch {
index => "copy-of-production.%{[@metadata][doc][_index]}"
document_type => "%{[@metadata][doc][_type]}"
document_id => "%{[@metadata][doc][_id]}"
}
}
If set, you can use metadata information in the add_field common option.
Example
input {
elasticsearch {
docinfo => true
docinfo_target => "[@metadata][doc]"
add_field => {
identifier => "%{[@metadata][doc][_index]}:%{[@metadata][doc][_type]}:%{[@metadata][doc][_id]}"
}
}
}
docinfo_fields
- Value type is array
- Default value is ["_index", "_type", "_id"]
If document metadata storage is requested by enabling the docinfo option, this option lists the metadata fields to save in the current event. See Meta-Fields in the Elasticsearch documentation for more information.
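As a small sketch, the configuration below stores only the index name and document id. The choice of fields is an assumption made for illustration.

```
input {
  elasticsearch {
    hosts          => "localhost"
    docinfo        => true
    # Save only these metadata fields instead of the default list
    docinfo_fields => ["_index", "_id"]
  }
}
```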
docinfo_target
- Value type is string
- Default value depends on whether ecs_compatibility is enabled:
  - ECS Compatibility disabled: "@metadata"
  - ECS Compatibility enabled: "[@metadata][input][elasticsearch]"
If document metadata storage is requested by enabling the docinfo option, this option names the field under which to store the metadata fields as subfields.
ecs_compatibility
- Value type is string
- Supported values are:
  - disabled: does not use ECS-compatible field names (document information is placed under "@metadata" by default)
  - v1, v8: Elastic Common Schema compliant behavior
- Default value depends on which version of Logstash is running:
  - When Logstash provides a pipeline.ecs_compatibility setting, its value is used as the default
  - Otherwise, the default value is disabled
Controls this plugin’s compatibility with the Elastic Common Schema (ECS).
hosts
- Value type is array
- There is no default value for this setting.
List of one or more Elasticsearch hosts to use for querying. Each host can be either IP, HOST, IP:port, or HOST:port. The port defaults to 9200.
index
- Value type is string
- Default value is "logstash-*"
The index or alias to search. Check out Multi Indices documentation in the Elasticsearch documentation for info on referencing multiple indices.
last_run_metadata_path
- Value type is string
- There is no default value for this setting.
The path to store the last observed value of the tracking field, when used. By default this file is stored as <path.data>/plugins/inputs/elasticsearch/<pipeline_id>/last_run_value.
This setting should point to a file, not a directory, and Logstash must have read+write access to this file.
password
- Value type is password
- There is no default value for this setting.
The password to use together with the username in the user option when authenticating to the Elasticsearch server. If set to an empty string, authentication will be disabled.
proxy
- Value type is uri
- There is no default value for this setting.
Set the address of a forward HTTP proxy. An empty string is treated as if proxy was not set. This is useful when using environment variables, e.g. proxy => '${LS_PROXY:}'.
query
- Value type is string
- Default value is '{ "sort": [ "_doc" ] }'
The query to be executed. Accepted query shape is DSL or ES|QL (when query_type => 'esql'). Read the Elasticsearch query DSL documentation or ES|QL documentation for more information.
When search_api resolves to search_after and the query does not specify sort, the default sort '{ "sort": { "_shard_doc": "asc" } }' will be added to the query. Please refer to the Elasticsearch search_after parameter to know more.
query_type
- Value can be dsl or esql
- Default value is dsl
Defines the query shape. When dsl, the query shape must be a valid Elasticsearch JSON-style string. When esql, the query shape must be a valid ES|QL string, and the index, size, slices, search_api, docinfo, docinfo_target, docinfo_fields, response_type and tracking_field parameters are not allowed.
response_type
- Value can be any of: hits, aggregations, esql
- Default value is hits
Which part of the result to transform into Logstash events when processing the response from the query.
The default hits will generate one event per returned document (i.e. "hit").
When set to aggregations, a single Logstash event will be generated with the contents of the aggregations object of the query’s response. In this case the hits object will be ignored. The parameter size will always be set to 0 regardless of the default or user-defined value set in this plugin.
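A minimal sketch of an aggregation-only query follows. The aggregation name status_codes and the statuscode field are assumptions borrowed from the earlier example query, not requirements of the plugin.

```
input {
  elasticsearch {
    hosts         => "localhost"
    index         => "logstash-*"
    # Hypothetical terms aggregation; its result becomes a single event
    query         => '{ "aggs": { "status_codes": { "terms": { "field": "statuscode" } } } }'
    response_type => "aggregations"
  }
}
```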
request_timeout_seconds
- Value type is number
- Default value is 60
The maximum amount of time, in seconds, for a single request to Elasticsearch. Request timeouts tend to occur when an individual page of data is very large, such as when it contains large-payload documents and/or the size has been specified as a large value.
retries
- Value type is number
- Default value is 0
The number of times to re-run the query after the first failure. If the query fails after all retries, it logs an error message. The default is 0 (no retry). This value should be equal to or greater than zero.
Partial failures - such as errors in a subset of all slices - can result in the entire query being retried, which can lead to duplication of data. Avoiding this would require Logstash to store the entire result set of a query in memory which is often not possible.
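As a sketch, the setting below allows two further attempts after a first failure. The value is illustrative; because a retry can re-read data, the duplication caveat above still applies.

```
input {
  elasticsearch {
    hosts   => "localhost"
    # Re-run a failed query up to two more times before logging an error
    retries => 2
  }
}
```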
schedule
- Value type is string
- There is no default value for this setting.
Schedule of when to periodically run the query, in cron format. For example: "* * * * *" (execute query every minute, on the minute).
There is no schedule by default. If no schedule is given, then the query is run exactly once.
schedule_overlap
- Value type is boolean
- Default value is true
Whether to allow queuing of a scheduled run if a run is occurring. While this is ideal for ensuring a new run happens immediately after the previous one finishes if there is a lot of work to do, the queue is unbounded, so it may lead to an out-of-memory error over long periods of time if the queue grows continuously.
When in doubt, set schedule_overlap to false (it may become the default value in the future).
scroll
- Value type is string
- Default value is "1m"
This parameter controls the keepalive time of the scrolling request, expressed as a time value such as "1m", and initiates the scrolling process. The timeout applies per round trip (i.e. between the previous scroll request and the next).
search_api
- Value can be any of: auto, search_after, scroll
- Default value is auto
With auto, the plugin uses the search_after parameter for Elasticsearch version 8.0.0 or higher; otherwise the scroll API is used instead.
search_after uses point in time and sort value to search. The query requires at least one sort field, as described in the query parameter.
scroll uses the scroll API to search, which is no longer recommended.
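The sketch below forces search_after and supplies an explicit sort. Sorting on @timestamp assumes that the documents carry that field; the host is a placeholder.

```
input {
  elasticsearch {
    hosts      => "localhost"
    search_api => "search_after"
    # Explicit sort field; assumes documents contain @timestamp
    query      => '{ "query": { "match_all": {} }, "sort": [ { "@timestamp": "asc" } ] }'
  }
}
```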
size
- Value type is number
- Default value is 1000
This allows you to set the maximum number of hits returned per scroll.
slices
- Value type is number
- There is no default value.
- Sensible values range from 2 to about 8.
In some cases, it is possible to improve overall throughput by consuming multiple distinct slices of a query simultaneously using sliced scrolls, especially if the pipeline is spending significant time waiting on Elasticsearch to provide results.
If set, the slices parameter tells the plugin how many slices to divide the work into, and will produce events from the slices in parallel until all of them are done scrolling.
The Elasticsearch manual indicates that there can be negative performance implications to both the query and the Elasticsearch cluster when a scrolling query uses more slices than shards in the index.
If the slices parameter is left unset, the plugin will not inject slice instructions into the query.
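A minimal sketch of a sliced read is shown here. The value 4 is an assumption; it should not exceed the number of shards in the target index.

```
input {
  elasticsearch {
    hosts  => "localhost"
    index  => "logstash-*"
    # Hypothetical slice count, processed in parallel
    slices => 4
  }
}
```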
ssl_certificate
- Value type is path
- There is no default value for this setting.
SSL certificate to use to authenticate the client. This certificate should be an OpenSSL-style X.509 certificate file.
This setting can be used only if ssl_key is set.
ssl_certificate_authorities
- Value type is a list of path
- There is no default value for this setting
The .cer or .pem files to validate the server’s certificate.
You cannot use this setting and ssl_truststore_path at the same time.
ssl_cipher_suites
- Value type is a list of string
- There is no default value for this setting
The list of cipher suites to use, listed by priorities. Supported cipher suites vary depending on the Java and protocol versions.
ssl_enabled
- Value type is boolean
- There is no default value for this setting.
Enable SSL/TLS secured communication to the Elasticsearch cluster. Leaving this unspecified will use whatever scheme is specified in the URLs listed in hosts or extracted from the cloud_id. If no explicit protocol is specified, plain HTTP will be used.
When not explicitly set, SSL will be automatically enabled if any of the specified hosts use HTTPS.
ssl_key
- Value type is path
- There is no default value for this setting.
OpenSSL-style RSA private key that corresponds to the ssl_certificate.
This setting can be used only if ssl_certificate is set.
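A sketch of a PEM-based TLS setup with client certificate authentication follows. The host and all file paths are hypothetical placeholders.

```
input {
  elasticsearch {
    hosts                       => ["https://es.example.org:9200"]   # hypothetical host
    ssl_enabled                 => true
    ssl_certificate_authorities => ["/path/to/ca.pem"]               # hypothetical paths
    ssl_certificate             => "/path/to/client.crt"
    ssl_key                     => "/path/to/client.key"
  }
}
```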
ssl_keystore_password
- Value type is password
- There is no default value for this setting.
Set the keystore password.
ssl_keystore_path
- Value type is path
- There is no default value for this setting.
The keystore used to present a certificate to the server. It can be either .jks or .p12.
You cannot use this setting and ssl_certificate at the same time.
ssl_keystore_type
- Value can be any of: jks, pkcs12
- If not provided, the value will be inferred from the keystore filename.
The format of the keystore file. It must be either jks or pkcs12.
ssl_supported_protocols
- Value type is string
- Allowed values are: 'TLSv1.1', 'TLSv1.2', 'TLSv1.3'
- Default depends on the JDK being used. With up-to-date Logstash, the default is ['TLSv1.2', 'TLSv1.3']. 'TLSv1.1' is not considered secure and is only provided for legacy applications.
List of allowed SSL/TLS versions to use when establishing a connection to the Elasticsearch cluster.
For Java 8, 'TLSv1.3' is supported only since 8u262 (AdoptOpenJDK), but requires that you set the LS_JAVA_OPTS="-Djdk.tls.client.protocols=TLSv1.3" system property in Logstash.
If you configure the plugin to use 'TLSv1.1' on any recent JVM, such as the one packaged with Logstash, the protocol is disabled by default and needs to be enabled manually by changing jdk.tls.disabledAlgorithms in the $JDK_HOME/conf/security/java.security configuration file. That is, TLSv1.1 needs to be removed from the list.
ssl_truststore_password
- Value type is password
- There is no default value for this setting.
Set the truststore password.
ssl_truststore_path
- Value type is path
- There is no default value for this setting.
The truststore to validate the server’s certificate. It can be either .jks or .p12.
You cannot use this setting and ssl_certificate_authorities at the same time.
ssl_truststore_type
- Value can be any of: jks, pkcs12
- If not provided, the value will be inferred from the truststore filename.
The format of the truststore file. It must be either jks or pkcs12.
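For keystore-based setups, a sketch might look like the following. The host, file paths, and environment variable names are hypothetical; the store types are left to be inferred from the filenames.

```
input {
  elasticsearch {
    hosts                   => ["https://es.example.org:9200"]   # hypothetical host
    ssl_keystore_path       => "/path/to/client.p12"             # hypothetical paths
    ssl_keystore_password   => "${KEYSTORE_PASSWORD}"            # hypothetical environment variables
    ssl_truststore_path     => "/path/to/truststore.jks"
    ssl_truststore_password => "${TRUSTSTORE_PASSWORD}"
  }
}
```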
ssl_verification_mode
- Value can be any of: full, none
- Default value is full
Defines how to verify the certificates presented by another party in the TLS connection:
full validates that the server certificate has an issue date that’s within the not_before and not_after dates, chains to a trusted Certificate Authority (CA), and has a hostname or IP address that matches the names within the certificate.
none performs no certificate validation.
Setting certificate verification to none disables many security benefits of SSL/TLS, which is very dangerous. For more information on disabling certificate verification please read https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
socket_timeout_seconds
- Value type is number
- Default value is 60
The maximum amount of time, in seconds, to wait on an incomplete response from Elasticsearch while no additional data has been appended. Socket timeouts usually occur while waiting for the first byte of a response, such as when executing a particularly complex query.
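The three timeout settings can be tuned together. The sketch below keeps the default connect timeout and raises the request and socket timeouts; the chosen values are assumptions for a slow, complex query and are not recommendations.

```
input {
  elasticsearch {
    hosts                   => "localhost"
    connect_timeout_seconds => 10    # default value, shown for completeness
    request_timeout_seconds => 120   # hypothetical value for large pages
    socket_timeout_seconds  => 120   # hypothetical value for slow first bytes
  }
}
```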
target
- Value type is field reference
- There is no default value for this setting.
Without a target, events are created from each hit’s _source at the root level. When the target is set to a field reference, the _source of the hit is placed in the target field instead.
This option can be useful to avoid populating unknown fields when a downstream schema such as ECS is enforced. It is also possible to target an entry in the event’s metadata, which will be available during event processing but not exported to your outputs (e.g., target => "[@metadata][_source]").
tracking_field
- Value type is string
- There is no default value for this setting.
Which field from the last event of a previous run will be used as a cursor value for the following run. The value of this field is injected into each query if the query uses the placeholder :last_value. For the first query after a pipeline is started, the value used is either read from the last_run_metadata_path file, or taken from the tracking_field_seed setting.
Note: The tracking value is updated after each page is read and at the end of each Point in Time. In case of a crash, the last saved value will be used, so some duplication of data can occur. For this reason, the use of unique document IDs for each event is recommended in the downstream destination.
tracking_field_seed
- Value type is string
- Default value is "1970-01-01T00:00:00.000000000Z"
The starting value for the tracking_field if there is no last_run_metadata_path already. This field defaults to the nanosecond-precision ISO8601 representation of epoch, or "1970-01-01T00:00:00.000000000Z", given that nanosecond-precision timestamps are the most reliable data format to use for this feature.
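As a sketch, the configuration below starts tracking from a chosen point in time and stores the cursor in an explicit file. The seed value and the file path are hypothetical; the index, field, and query follow the tailing example earlier in this document.

```
input {
  elasticsearch {
    hosts                  => "localhost"
    index                  => "test-*"
    query                  => '{ "query": { "range": { "event.ingested": { "gt": ":last_value" } } }, "sort": [ { "event.ingested": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ] }'
    tracking_field         => "[event][ingested]"
    # Hypothetical starting point, used only when no saved value exists yet
    tracking_field_seed    => "2025-01-01T00:00:00.000000000Z"
    # Hypothetical file path; must be a file that Logstash can read and write
    last_run_metadata_path => "/var/lib/logstash/tail_test_index_last_run_value"
    schedule               => "* * * * *"
  }
}
```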
user
- Value type is string
- There is no default value for this setting.
The username to use together with the password in the password option when authenticating to the Elasticsearch server. If set to an empty string, authentication will be disabled.
As of version 5.0.0 of this plugin, some configuration options have been replaced. The plugin will fail to start if it contains any of these obsolete options.
Setting | Replaced by |
---|---|
ca_file | ssl_certificate_authorities |
ssl | ssl_enabled |
ssl_certificate_verification | ssl_verification_mode |
These configuration options are supported by all input plugins:
Setting | Input type | Required |
---|---|---|
add_field | hash | No |
codec | codec | No |
enable_metric | boolean | No |
id | string | No |
tags | array | No |
type | string | No |
add_field
- Value type is hash
- Default value is {}
Add a field to an event.
codec
- Value type is codec
- Default value is "json"
The codec used for input data. Input codecs are a convenient method for decoding your data before it enters the input, without needing a separate filter in your Logstash pipeline.
enable_metric
- Value type is boolean
- Default value is true
Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.
id
- Value type is string
- There is no default value for this setting.
Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have 2 elasticsearch inputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.
input {
elasticsearch {
id => "my_plugin_id"
}
}
tags
- Value type is array
- There is no default value for this setting.
Add any number of arbitrary tags to your event.
This can help with processing later.
type
- Value type is string
- There is no default value for this setting.
Add a type field to all events handled by this input.
Types are used mainly for filter activation.
The type is stored as part of the event itself, so you can also use the type to search for it in Kibana.
If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.
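The common options can be combined on a single input, as in this sketch. All values (the ID, tag, type, and added field) are hypothetical and chosen only for illustration.

```
input {
  elasticsearch {
    id        => "replay_status_200"                  # hypothetical plugin ID
    hosts     => "localhost"
    tags      => ["replay"]                           # hypothetical tag
    type      => "replayed_log"                       # hypothetical type
    add_field => { "source_cluster" => "localhost" }  # hypothetical field
  }
}
```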