elasticsearch: distributing indices over multiple disk volumes


Question Description

I have one index which is quite large (about 100 GB), so I had to extend the disk space on my DigitalOcean server by adding another volume (I run everything on a single node). I told elasticsearch that it now has to consider two disk locations by starting it with

/usr/share/elasticsearch/bin/elasticsearch -Epath.data=/var/lib/elasticsearch,/mnt/volume-sfo2-01/es_data
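To make this setting survive restarts, the same list can equivalently go into elasticsearch.yml instead of the command line (a sketch; the config file location depends on how elasticsearch was installed):

```yaml
# /etc/elasticsearch/elasticsearch.yml (path may differ per install)
path.data:
  - /var/lib/elasticsearch
  - /mnt/volume-sfo2-01/es_data
```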

elasticsearch does seem to have taken notice of this, since it wrote some files to the new location:

/mnt/volume-sfo2-01/es_data# cd nodes/
/mnt/volume-sfo2-01/es_data/nodes# ls
0
/mnt/volume-sfo2-01/es_data/nodes# cd 0/
/mnt/volume-sfo2-01/es_data/nodes/0# ls
indices  node.lock  _state
/mnt/volume-sfo2-01/es_data/nodes/0# cd indices
/mnt/volume-sfo2-01/es_data/nodes/0/indices# ls
DixLGLrJRXm1gSYcFzkzzw  nmZbce8wTayJC2s_eMC0-g  Qd-9ZnFIRoSM2z7AohKm-w  Sm_tyYTJTty0ImvDamFaQw
/mnt/volume-sfo2-01/es_data/nodes/0/indices# cd DixLGLrJRXm1gSYcFzkzzw/
/mnt/volume-sfo2-01/es_data/nodes/0/indices/DixLGLrJRXm1gSYcFzkzzw# ls
_state

which is identical to what I find in /var/lib/elasticsearch/data, except for the actual index data at the lowest level.

Reading the elasticsearch documentation, I got the impression that elasticsearch will distribute a new index over the two disk locations, but will not split a single shard between them. So I initialized the index with 5 shards, so that it can spread the data across the volumes.
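For reference, an index with explicit shard settings can be created along these lines (a sketch; the index name pubmed_paper is taken from the error message below, and mappings are omitted):

```shell
curl -XPUT 'localhost:9200/pubmed_paper' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}'
```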

The server does seem to have detected the two data paths, since the log file shows

[2017-06-17T19:16:57,079][INFO ][o.e.e.NodeEnvironment    ] [WU6cQ-o] using [2] data paths, mounts [[/ (/dev/vda1), /mnt/volume-sfo2-01 (/dev/sda)]], net usable_space [29.6gb], net total_space [98.1gb], spins? [possibly], types [ext4]

However, when I index into the new indices, elasticsearch constantly uses only the disk space on my original disk and eventually runs out of disk space, with the error

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'index_failed_engine_exception', u'Index failed for [pubmed_paper#25949809]')

It never shifts any of the shards to the second volume. Am I missing anything? Can I manually guide the disk space usage?

Here are the elasticsearch version details:

# curl -XGET 'localhost:9200'
{
  "name" : "WU6cQ-o",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "hKc147QfQqCefLliStLNtw",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

and here is the default path's file structure, where elasticsearch stores all the information (instead of sharing it with the second path):

/var/lib/elasticsearch/elasticsearch/nodes/0/indices/DixLGLrJRXm1gSYcFzkzzw# ls
0  1  2  3  4  _state

One remaining question is whether I can just take one of these shards and move it to the other location manually.

Practice As Follows

What you can do is add a 1 TB hard drive to your system, copy the 100 GB of data onto the new drive, and update your data directory location to point to the new drive. Don't give both paths, otherwise it will try to write data to the old path too.
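Sketched as shell commands, the migration might look like this (the service name, paths, and elasticsearch user are assumptions based on the question's setup; stop elasticsearch and back up before touching its data directory):

```shell
# stop elasticsearch before copying its data directory
sudo systemctl stop elasticsearch

# copy the existing data onto the new, larger volume
sudo rsync -a /var/lib/elasticsearch/ /mnt/volume-sfo2-01/es_data/
sudo chown -R elasticsearch:elasticsearch /mnt/volume-sfo2-01/es_data

# point path.data at the new volume only, in /etc/elasticsearch/elasticsearch.yml:
#   path.data: /mnt/volume-sfo2-01/es_data

sudo systemctl start elasticsearch
```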
