



[root@master mnt]# npm install elasticdump


[root@master bin]# pwd

[root@master bin]# ./elasticdump --help
elasticdump: Import and export tools for elasticsearch
version: 4.7.0

Usage: elasticdump --input SOURCE --output DESTINATION [OPTIONS]

--input
Source location (required)
--input-index
Source index and type
(default: all, example: index/type)
--output
Destination location (required)
--output-index
Destination index and type
(default: all, example: index/type)
--limit
How many objects to move in batch per operation
limit is approximate for file streams
(default: 100)
--size
How many objects to retrieve
(default: -1 -> no limit)
--debug
Display the elasticsearch commands being used
(default: false)
--quiet
Suppress all messages except for errors
(default: false)
--type
What are we exporting?
(default: data, options: [data, settings, analyzer, mapping, alias])
--delete
Delete documents one-by-one from the input as they are
moved. Will not delete the source index
(default: false)
--headers
Add custom headers to Elastisearch requests (helpful when
your Elasticsearch instance sits behind a proxy)
(default: '{"User-Agent": "elasticdump"}')
--params
Add custom parameters to Elastisearch requests uri. Helpful when you for example
want to use elasticsearch preference
(default: null)
--searchBody
Preform a partial extract based on search results
(when ES is the input, default values are
if ES > 5
`'{"query": { "match_all": {} }, "stored_fields": ["*"], "_source": true }'`
else
`'{"query": { "match_all": {} }, "fields": ["*"], "_source": true }'`
--sourceOnly
Output only the json contained within the document _source
Normal: {"_index":"","_type":"","_id":"", "_source":{SOURCE}}
sourceOnly: {SOURCE}
(default: false)
--ignore-errors
Will continue the read/write loop on write error
(default: false)
--scrollTime
Time the nodes will hold the requested search in order.
(default: 10m)
--maxSockets
How many simultaneous HTTP requests can we process make?
(default:
5 [node <= v0.10.x] / Infinity [node >= v0.11.x] )
--timeout
Integer containing the number of milliseconds to wait for
a request to respond before aborting the request. Passed
directly to the request library. Mostly used when you don't
care too much if you lose some data when importing
but rather have speed.
--offset
Integer containing the number of rows you wish to skip
ahead from the input transport. When importing a large
index, things can go wrong, be it connectivity, crashes,
someone forgetting to `screen`, etc. This allows you
to start the dump again from the last known line written
(as logged by the `offset` in the output). Please be
advised that since no sorting is specified when the
dump is initially created, there's no real way to
guarantee that the skipped rows have already been
written/parsed. This is more of an option for when
you want to get most data as possible in the index
without concern for losing some rows in the process,
similar to the `timeout` option.
(default: 0)
--noRefresh
Disable input index refresh.
Positive:
1. Much increase index speed
2. Much less hardware requirements
Negative:
1. Recently added data may not be indexed
Recommended to use with big data indexing,
where speed and system health in a higher priority
than recently added data.
--inputTransport
Provide a custom js file to use as the input transport
--outputTransport
Provide a custom js file to use as the output transport
--toLog
When using a custom outputTransport, should log lines
be appended to the output stream?
(default: true, except for `$`)
--awsChain
Use [standard](https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/) location and ordering for resolving credentials including environment variables, config files, EC2 and ECS metadata locations
_Recommended option for use with AWS_
--awsAccessKeyId
--awsSecretAccessKey
When using Amazon Elasticsearch Service protected by
AWS Identity and Access Management (IAM), provide
your Access Key ID and Secret Access Key
--awsIniFileProfile
Alternative to --awsAccessKeyId and --awsSecretAccessKey,
loads credentials from a specified profile in aws ini file.
For greater flexibility, consider using --awsChain
and setting AWS_PROFILE and AWS_CONFIG_FILE
environment variables to override defaults if needed
--transform
A javascript, which will be called to modify documents
before writing it to destination. global variable 'doc'
is available.
Example script for computing a new field 'f2' as doubled
value of field 'f1':
doc._source["f2"] = doc._source.f1 * 2;

--httpAuthFile
When using http auth provide credentials in ini file in form
`user=<username>
password=<password>`
--support-big-int
Support big integer numbers
--retryAttempts
Integer indicating the number of times a request should be automatically re-attempted before failing
when a connection fails with one of the following errors `ECONNRESET`, `ENOTFOUND`, `ESOCKETTIMEDOUT`,
`ETIMEDOUT`, `ECONNREFUSED`, `EHOSTUNREACH`, `EPIPE`, `EAI_AGAIN`
(default: 0)
--retryDelay
Integer indicating the back-off/break period between retry attempts (milliseconds)
(default : 5000)
--parseExtraFields
Comma-separated list of meta-fields to be parsed
--fileSize
supports file splitting. This value must be a string supported by the **bytes** module.
The following abbreviations must be used to signify size in terms of units
b for bytes
kb for kilobytes
mb for megabytes
gb for gigabytes
tb for terabytes
e.g. 10mb / 1gb / 1tb
Partitioning helps to alleviate overflow/out of memory exceptions by efficiently segmenting files
into smaller chunks that then be merged if needs be.

--s3AccessKeyId
AWS access key ID
--s3SecretAccessKey
AWS secret access key
--s3Region
AWS region
--s3Bucket
Name of the bucket to which the data will be uploaded
--s3RecordKey
Object key (filename) for the data to be uploaded
--s3Compress
gzip data before sending to s3
--tlsAuth
Enable TLS X509 client authentication
--cert, --input-cert, --output-cert
Client certificate file. Use --cert if source and destination are identical.
Otherwise, use the one prefixed with --input or --output as needed.
--key, --input-key, --output-key
Private key file. Use --key if source and destination are identical.
Otherwise, use the one prefixed with --input or --output as needed.
--pass, --input-pass, --output-pass
Pass phrase for the private key. Use --pass if source and destination are identical.
Otherwise, use the one prefixed with --input or --output as needed.
--ca, --input-ca, --output-ca
CA certificate. Use --ca if source and destination are identical.
Otherwise, use the one prefixed with --input or --output as needed.
--inputSocksProxy, --outputSocksProxy
Socks5 host address
--inputSocksPort, --outputSocksPort
Socks5 host port
--help
This page


Copy an index from production to staging with mappings:

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data

Backup index data to a file:

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--type=data

Backup and index to a gzip using stdout:

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz

Backup the results of a query to a file

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody '{"query":{"term":{"username": "admin"}}}'

Learn more @ https://github.com/taskrabbit/elasticsearch-dump


elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=analyzer
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data
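
The repeated copy commands above can be generated in a loop, one per export type. The following is a minimal sketch rather than anything shipped with elasticdump: the `copy_index` helper, the `RUN` variable, and the analyzer, mapping, data ordering are assumptions made for illustration (the `--type` values themselves come from the help text). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: emit one elasticdump invocation per export type.
# copy_index is a hypothetical helper, not an elasticdump feature.
# RUN defaults to "echo" (dry run); export RUN="" to really execute.
RUN="${RUN-echo}"

copy_index() {
  src="$1"
  dst="$2"
  # Assumed order: analyzer first, then mapping, then the documents.
  for dump_type in analyzer mapping data; do
    $RUN elasticdump --input="$src" --output="$dst" --type="$dump_type"
  done
}

# Collect the planned commands so they can be reviewed before running.
plan=$(copy_index \
  "http://production.es.com:9200/my_index" \
  "http://staging.es.com:9200/my_index")
printf '%s\n' "$plan"
```

Reviewing the printed plan first is a cheap way to catch a wrong host or index name before any data is moved.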



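# Back up to standard output and compress the result. (One thing to note: the index was reported as 6.4G when I queried it, backing it up this way produced a 789M compressed file, and that file decompressed to 19G.)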
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz
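
Since the compressed size says little about how much space a restore will need, it can help to measure a gzipped dump without unpacking it to disk. This is a minimal sketch, assuming the dump holds one JSON document per line (the `Normal` layout shown in the help text). The sample documents and the temporary path are made up for illustration and stand in for a real `/data/my_index.json.gz`.

```shell
#!/bin/sh
# Sketch: inspect a gzipped dump by streaming it, never writing the
# uncompressed data to disk. The sample file is a stand-in for a real dump.
workdir=$(mktemp -d)
dump="$workdir/my_index.json.gz"

# Two made-up documents in a one-document-per-line layout.
printf '%s\n' \
  '{"_index":"my_index","_type":"doc","_id":"1","_source":{"username":"admin"}}' \
  '{"_index":"my_index","_type":"doc","_id":"2","_source":{"username":"guest"}}' \
  | gzip > "$dump"

# Number of documents = number of lines in the decompressed stream.
doc_count=$(gzip -dc "$dump" | wc -l | tr -d ' ')
# Bytes the data would occupy once decompressed.
raw_bytes=$(gzip -dc "$dump" | wc -c | tr -d ' ')
# Bytes the compressed file occupies on disk.
gz_bytes=$(wc -c < "$dump" | tr -d ' ')

echo "documents: $doc_count, uncompressed: $raw_bytes bytes, gzip: $gz_bytes bytes"
rm -rf "$workdir"
```

On a multi-gigabyte dump each `gzip -dc` pass reads the whole file, so this takes time, but it needs no extra disk space.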


elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody '{"query":{"term":{"username": "admin"}}}'

# Copy a single shard data: 

elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --params='{"preference" : "_shards:0"}'

# Backup aliases to a file 

elasticdump \
  --input=http://es.com:9200/index-name/alias-filter \
  --output=alias.json \
  --type=alias

# Import aliases into ES 

elasticdump \
  --input=./alias.json \
  --output=http://es.com:9200 \
  --type=alias

# Backup templates to a file 

elasticdump \
  --input=http://es.com:9200/template-filter \
  --output=templates.json \
  --type=template

# Import templates into ES 

elasticdump \
  --input=./templates.json \
  --output=http://es.com:9200 \
  --type=template

# Split files into multiple parts 

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --fileSize=10mb

# Export ES data to S3 

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --s3Bucket "${bucket_name}" \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --s3RecordKey "${file_name}"


1. Export the data for one `company` from the ES cluster to a file

[root@master bin]# ./elasticdump --input --output /mnt/company.json
Fri, 19 Apr 2019 03:39:20 GMT | starting dump
Fri, 19 Apr 2019 03:39:20 GMT | got 2 objects from source elasticsearch (offset: 0)
Fri, 19 Apr 2019 03:39:20 GMT | sent 2 objects to destination file, wrote 2
Fri, 19 Apr 2019 03:39:20 GMT | got 0 objects from source elasticsearch (offset: 2)
Fri, 19 Apr 2019 03:39:20 GMT | Total Writes: 2
Fri, 19 Apr 2019 03:39:20 GMT | dump complete
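
A quick sanity check after a dump is to compare the `Total Writes` figure in the log with the number of lines in the output file. The sketch below assumes the log was saved to a file and that the dump holds one document per line. The file names and sample documents are placeholders, while the log lines are copied from the run above.

```shell
#!/bin/sh
# Sketch: compare "Total Writes" from a saved elasticdump log with the
# number of lines in the dump file. Paths and documents are placeholders.
workdir=$(mktemp -d)
log="$workdir/dump.log"
dump="$workdir/company.json"

# Log lines copied from the run above.
cat > "$log" <<'EOF'
Fri, 19 Apr 2019 03:39:20 GMT | starting dump
Fri, 19 Apr 2019 03:39:20 GMT | got 2 objects from source elasticsearch (offset: 0)
Fri, 19 Apr 2019 03:39:20 GMT | sent 2 objects to destination file, wrote 2
Fri, 19 Apr 2019 03:39:20 GMT | got 0 objects from source elasticsearch (offset: 2)
Fri, 19 Apr 2019 03:39:20 GMT | Total Writes: 2
Fri, 19 Apr 2019 03:39:20 GMT | dump complete
EOF

# Made-up stand-ins for the two exported documents.
printf '%s\n' \
  '{"_index":"company","_type":"doc","_id":"1","_source":{"name":"a"}}' \
  '{"_index":"company","_type":"doc","_id":"2","_source":{"name":"b"}}' \
  > "$dump"

# Pull the number after "Total Writes:" out of the log.
logged=$(sed -n 's/.*| Total Writes: \([0-9][0-9]*\)$/\1/p' "$log")
# Count the documents actually present in the file.
written=$(wc -l < "$dump" | tr -d ' ')

if [ "$logged" -eq "$written" ]; then
  status="ok"
else
  status="mismatch"
fi
echo "logged=$logged written=$written status=$status"
rm -rf "$workdir"
```

A mismatch does not say which documents are missing, only that the file and the log disagree and the dump is worth repeating.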


[root@master mnt]# curl -XDELETE ''



[root@master bin]# ./elasticdump --input /mnt/company.json --output ""
Fri, 19 Apr 2019 03:46:56 GMT | starting dump
Fri, 19 Apr 2019 03:46:56 GMT | got 2 objects from source file (offset: 0)
Fri, 19 Apr 2019 03:46:57 GMT | sent 2 objects to destination elasticsearch, wrote 2
Fri, 19 Apr 2019 03:46:57 GMT | got 0 objects from source file (offset: 2)
Fri, 19 Apr 2019 03:46:57 GMT | Total Writes: 2
Fri, 19 Apr 2019 03:46:57 GMT | dump complete



