In the fourth and final part of the High Availability series we take a look at how you can use SolrCloud together with Vidispine for high availability, keeping your media assets searchable at all times.

In this fourth and final part of the high availability series we will show you how to use SolrCloud together with Vidispine. In Part #1 we set up a simple Vidispine cluster, in Part #2 we added HAProxy, and in Part #3 we added pgpool-II for a high availability database.

How does SolrCloud work?

In SolrCloud, a logical index is called a collection. A collection is split into a number of shards, and each shard is served by a number of Solr instances. One of these instances is the leader of the shard; the others are replicas. If the leader fails, one of the replicas is elected as the new leader. SolrCloud uses ZooKeeper to coordinate the distributed Solr instances.
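
As an illustration of these terms, the Collections API lets you create a collection with an explicit number of shards and replicas. The sketch below is only meant to show the terminology; the collection name is a placeholder, and in this guide we instead bootstrap the collection when starting the first Solr instance (see "Running SolrCloud" below).

# Hypothetical example: a collection with 2 shards, each shard served by 2 instances
curl "http://cluster1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=VidiConfigTest"
CODE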

Each time a Solr instance receives a document, it forwards the document to its shard leader, which calculates the hash of the document id. Based on this hash, the document is forwarded to the leader of the destination shard, which then distributes the document to its replicas. As a result, a logical index is split evenly across the shards, and all the instances in a shard contain the same index.
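
In practice this means you can send an update to any node in the cluster and it will end up on the correct shard. A minimal sketch, assuming a collection named collection1 and a node reachable at cluster1:8983 (in a Vidispine setup, Vidispine itself takes care of the indexing):

# The receiving node routes the document to the right shard based on the hash of its id
curl 'http://cluster1:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "routing-test-1"}]'
CODE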

Hence, it is very important that no shard fails completely (for example, a shard that is not replicated): if that happens, SolrCloud stops functioning correctly and index consistency can no longer be guaranteed.

Installation and configuration

ZOOKEEPER

Download ZooKeeper 3.4.6 from https://apache.mirrors.spacedump.net/zookeeper/zookeeper-3.4.6/ and unpack it:

cd zookeeper-3.4.6
mkdir data
cp conf/zoo_sample.cfg conf/zoo.cfg
CODE

Edit dataDir in conf/zoo.cfg so that it points to the data folder you just created.
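
A minimal zoo.cfg as a sketch, assuming ZooKeeper was unpacked in /opt/zookeeper-3.4.6 (adjust dataDir to your own path):

# conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper-3.4.6/data
CODE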

Start ZooKeeper:

bin/zkServer.sh start
CODE

Your ZooKeeper should now be running at localhost:2181.
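
You can verify this with the bundled status command, or with ZooKeeper's four-letter "ruok" command:

bin/zkServer.sh status
echo ruok | nc localhost 2181   # should answer "imok"
CODE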

SOLR

Download Solr 4.10.4 from https://archive.apache.org/dist/lucene/solr/4.10.4/ and unpack it:

cd solr-4.10.4
cp -r example solrInstance-1
cd solrInstance-1
CODE

Replace schema.xml and solrconfig.xml under solr/collection1/conf/ with the Vidispine schema and configuration files.
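
For example, assuming the Vidispine Solr configuration files have been copied to /path/to/vidispine-solr-conf/ (a placeholder for wherever they live in your environment):

cp /path/to/vidispine-solr-conf/schema.xml solr/collection1/conf/schema.xml
cp /path/to/vidispine-solr-conf/solrconfig.xml solr/collection1/conf/solrconfig.xml
CODE
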
Edit /etc/hosts so that the host name points to the machine's IP address instead of the loopback address, i.e.:

10.185.20.1 cluster1
CODE

instead of

127.0.0.1 cluster1
CODE

Repeat the steps above on your other Solr servers.

Running SolrCloud

# Our ZooKeeper is running at 10.185.20.100:2181
# On Instance 0:
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=VidiConfigTest -DzkHost=10.185.20.100:2181 -DnumShards=2 -jar start.jar

# On Instance 1:
java -DzkHost=10.185.20.100:2181 -jar start.jar

# On Instance 2:
java -DzkHost=10.185.20.100:2181 -jar start.jar

# On Instance 3:
java -DzkHost=10.185.20.100:2181 -jar start.jar
CODE

Please note that you only need to specify -DnumShards and -Dbootstrap_confdir when you start the first instance. You may change the number of shards according to your needs.
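
Once all instances are up you can verify the cluster layout by reading the cluster state that Solr 4.x keeps in ZooKeeper, for example with the ZooKeeper CLI (a sketch, assuming the addresses used above; run it from the ZooKeeper installation directory):

# Shows the shards, their leaders and replicas for each collection
bin/zkCli.sh -server 10.185.20.100:2181 get /clusterstate.json
CODE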

You can simply use "screen" to start Solr in the background and redirect the log to a file:

screen -S solr -d -m /bin/bash -c 'java -DzkHost=10.185.20.100:2181 -jar start.jar > solr.log 2>&1'
CODE

or follow this wiki page to set up Solr logging: https://wiki.apache.org/solr/SolrLogging.

There is a nice admin page on your Solr instances: http://localhost:8983/solr

For more info about SolrCloud, please refer to https://cwiki.apache.org/confluence/display/solr/SolrCloud

SolrCloud test with Docker

You can also test SolrCloud using this docker-compose environment, which sets up a ZooKeeper and a SolrCloud instance with numShards=2.

Download https://transfer.vidispine.com/d57/b84ad995cc314552c44775fdcaae9/solrcloud-docker.tar.gz and then run the following:

$ # install docker and docker-compose, then:
$ tar -xvzf solrcloud-docker.tar.gz
$ cd solrcloud-docker/
$ docker-compose up
$ docker ps # to get the ports
CODE
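
As a quick check once the containers are up, you can ask the Collections API for the cluster status. The port below is a placeholder; use the one reported by docker ps for the Solr container:

$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"
CODE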

A common follow-up question is which API call to use to split a shard; for that there is the SPLITSHARD action of the Collections API.
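
A sketch of such a call, where the collection and shard names are placeholders for your own:

curl "http://cluster1:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1"
CODE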

This concludes our posts on Vidispine High Availability configuration. You can find the other posts here: