Monitoring [VC 5.6 INT]
To get better insight into the operation of jobs and services, you can collect metrics and traces in your monitoring service of choice. Metrics are exposed using JMX and StatsD. Transcoders, on the other hand, only expose metrics using StatsD.
StatsD
By default, metrics are not sent to a StatsD server. To enable this, update the metrics configuration. For example, to have metrics sent to a StatsD server on localhost listening on UDP port 8125, use:
PUT API/configuration/metrics
Content-Type: application/xml
<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd/>
</MetricsConfigurationDocument>
Metrics sent to StatsD are prefixed with "vs." by default. To send them with the prefix "vs1." instead, for example when running multiple instances, use:
PUT API/configuration/metrics
Content-Type: application/xml
<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    <host>metrics.example.com</host>
    <port>6125</port>
    <prefix>vs1</prefix>
  </statsd>
</MetricsConfigurationDocument>
Here metrics are sent to an external StatsD server on the non-standard port 6125. Note that the . between the prefix and metric name is added automatically.
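The configuration request above could also be assembled from a script. The following is a minimal sketch using only the Python standard library. The base URL http://localhost:8080/API and the helper names build_statsd_config and build_put_request are assumptions made for illustration, and authentication is left out entirely.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = "http://xml.vidispine.com/schema/vidispine"


def build_statsd_config(host=None, port=None, prefix=None):
    """Return a MetricsConfigurationDocument as bytes.

    Elements are only emitted for the values that are given, so calling
    this without arguments yields an empty statsd element.
    """
    ET.register_namespace("", NS)  # serialize with a default xmlns
    root = ET.Element(f"{{{NS}}}MetricsConfigurationDocument")
    statsd = ET.SubElement(root, f"{{{NS}}}statsd")
    for tag, value in (("host", host), ("port", port), ("prefix", prefix)):
        if value is not None:
            ET.SubElement(statsd, f"{{{NS}}}{tag}").text = str(value)
    return ET.tostring(root, encoding="utf-8")


def build_put_request(base_url, body):
    """Build, but do not send, the PUT request for the metrics configuration."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/configuration/metrics",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


# Hypothetical base URL; adjust to your installation.
body = build_statsd_config(host="metrics.example.com", port=6125, prefix="vs1")
request = build_put_request("http://localhost:8080/API", body)
print(request.get_method(), request.full_url)
print(body.decode("utf-8"))
```

Sending the request is then a matter of passing it to urllib.request.urlopen, together with whatever credentials your installation requires.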
Filtering metrics
You can set inclusion and exclusion filters to restrict which metrics are sent to the StatsD server. The default is to include all and exclude none.
Inclusion/exclusion filters may have a leading or trailing wildcard. For example, to exclude all storage.fs metrics:
<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    <exclude>storage.fs.*</exclude>
  </statsd>
</MetricsConfigurationDocument>
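How a leading or trailing wildcard might be interpreted can be illustrated with a few lines of Python. This is only a sketch of the semantics described above, not the server's implementation. The function names matches and is_sent are hypothetical, and the rule that an exclusion wins over an inclusion is an assumption.

```python
def matches(pattern, name):
    """Match a metric name against a filter with one optional wildcard.

    Only a single leading or trailing '*' is handled; a pattern with a
    wildcard at both ends is outside the scope of this sketch.
    """
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def is_sent(name, include=("*",), exclude=()):
    """Decide whether a metric passes the filters.

    The defaults mirror "include all, exclude none".
    Assumption: a metric matching both lists is excluded.
    """
    included = any(matches(p, name) for p in include)
    excluded = any(matches(p, name) for p in exclude)
    return included and not excluded


print(is_sent("storage.fs.stat", exclude=["storage.fs.*"]))  # prints False
print(is_sent("storage.online", exclude=["storage.fs.*"]))   # prints True
```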
Tagged metrics
Some metrics are tagged with additional information. These are sent to StatsD in the format:
<metricname>:<value>|<type>|#<tag>+
A job.step.execution.time metric might for example be sent as:
vs.job.step.execution.time:123|ms|#type:placeholder-import,step:100,sync
If your StatsD server does not support such tags then they can be disabled by setting tags to false:
<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    ...
    <tags>false</tags>
  </statsd>
</MetricsConfigurationDocument>
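When debugging what actually arrives at the StatsD server, it can help to split a received line into its parts. The sketch below parses the format shown above and also accepts lines without a tag section, as sent when tags are disabled. The function name parse_statsd_line is hypothetical, and representing a value-less tag such as sync as True is a choice made for this example.

```python
def parse_statsd_line(line):
    """Split '<metricname>:<value>|<type>|#<tags>' into its parts.

    The tag section is optional, so untagged lines parse as well.
    """
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]
    tags = {}
    for field in fields[2:]:
        if not field.startswith("#"):
            continue  # ignore any other optional StatsD field
        for tag in field[1:].split(","):
            key, sep, tag_value = tag.partition(":")
            # A tag without a value, such as "sync", is kept as a bare flag.
            tags[key] = tag_value if sep else True
    return name, float(value), metric_type, tags


name, value, metric_type, tags = parse_statsd_line(
    "vs.job.step.execution.time:123|ms|#type:placeholder-import,step:100,sync"
)
print(name, value, metric_type, tags)
```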
JMX
Each metric is exposed as a JMX MBean in the “metrics” domain. You can view the metrics using, for example:
A JMX client such as VisualVM with the VisualVM-MBeans plugin, or JConsole.
Programmatically using the Java JMX client interface.
Over HTTP/JSON using a bridge such as Jolokia.
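For the HTTP/JSON route, a client could list the MBeans in the metrics domain with a Jolokia search request. The sketch below only builds the request and does not send it. The agent URL http://localhost:8778/jolokia/ is an assumption about how a Jolokia agent might be attached to the server, and the function name build_search_request is hypothetical.

```python
import json
import urllib.request


def build_search_request(agent_url, domain="metrics"):
    """Build, but do not send, a Jolokia search for all MBeans in a JMX domain."""
    payload = {"type": "search", "mbean": f"{domain}:*"}
    return urllib.request.Request(
        url=agent_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Assumed agent location; adjust to wherever your Jolokia agent listens.
request = build_search_request("http://localhost:8778/jolokia/")
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

The response to such a search lists the matching MBean names, whose attributes can then be fetched with Jolokia read requests.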
Metrics
Metrics are exposed as either meters, timers or gauges. The name of a metric is meant to be self-explanatory. Timers are suffixed with time and meters are named as past tense verbs, while gauges make up the rest.
The StatsD type used for each metric, and the statistics exposed over JMX for each type, are:
| Type | StatsD type | MBean attributes |
|---|---|---|
| Meter | | The count, mean and 1/5/15-minute rates. |
| Gauge | | The value. |
| Timer | | The count, min/max/mean/stdev, rates and percentiles. |
Indexing
Meters:
reindex.{index}.started
reindex.{index}.finished
indexer.solr.request.failed
indexer.elasticsearch.request.failed
Timers:
indexer.solr.update.time
indexer.solr.delete.time
indexer.solr.commit.time
indexer.elasticsearch.update.time
indexer.elasticsearch.delete.time
indexer.{index}.index.time
    With index being one of item/collection/acl/file.
indexer.library.update.time
    Time spent on updating auto-refreshing libraries in the system.
Job
Meters:
job.created
job.started
job.finished
job.failed
job.blocked
Gauges:
job.total.{state}
    Where state is the name of a job state, lowercased and with _ replaced with -. For example, finished-warning.
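The naming rule for these gauges can be written out as a one-line transformation. This is an illustrative sketch; the function name job_total_metric is hypothetical, and the upper-case spelling FINISHED_WARNING for the input state is an assumption.

```python
def job_total_metric(state):
    """Return the gauge name for a job state: lowercase, with '_' replaced by '-'."""
    return "job.total." + state.lower().replace("_", "-")


print(job_total_metric("FINISHED_WARNING"))  # prints job.total.finished-warning
```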
Timers:
job.{type}.step.{step}.{sync}.execution.time
job.step.execution.time
    Tagged with type:{type}, step:{step} and sync/async.
Solr
Meters:
solr.request.failed
Timers:
solr.query.time
solr.update.time
solr.commit.soft.time
solr.commit.hard.time
solr.optimize.time
Elasticsearch
Meters:
elasticsearch.request.failed
Timers:
elasticsearch.query.time
elasticsearch.update.time
elasticsearch.delete.time
Storage
Meters:
storage.online
    Tagged with storage:{id}.
storage.offline
    Tagged with storage:{id}.
storage.method.online
    Tagged with storage:{id}.
storage.method.offline
    Tagged with storage:{id}.
storage.file.found
    Tagged with storage:{id}.
storage.file.changed
    Tagged with storage:{id}.
storage.file.deleted
    Tagged with storage:{id}.
storage.file.hashed
storage.file.checksum.bytes.read
storage.fs.stat
    The number of stat calls made.
Gauges:
storage.total.online
storage.total.offline
storage.total.evacuating
storage.total.evacuated
    The total number of storages with a specific state.
Resource
Meters:
resource.{type}.online
    Tagged with resource:{id}.
resource.{type}.offline
    Tagged with resource:{id}.
Agent
Gauges:
agent.total.online
agent.total.offline
    The total number of agents with a specific state.
Transfer
Meters:
transfer.bytes.transferred
transfer.started
transfer.finished
transfer.finished-part
transfer.failed
transfer.blocked
Service
Meters:
service.exception
Gauges:
service.load.5
    The 5-minute load.
service.load.60
    The 60-minute load.
Transcoder
Gauges:
transcoder.{transcoder-id}.jobs.running
transcoder.{transcoder-id}.jobs.finished
transcoder.{transcoder-id}.jobs.failed
transcoder.{transcoder-id}.jobs.{transcoder-job-type}.running
transcoder.{transcoder-id}.jobs.{transcoder-job-type}.finished
transcoder.{transcoder-id}.jobs.{transcoder-job-type}.failed
Counters:
transcoder.{transcoder-id}.muxer.video.frames
transcoder.{transcoder-id}.encoder.{codec}.frames
transcoder.{transcoder-id}.decoder.{codec}.frames
transcoder.{transcoder-id}.io.{protocol}.{direction}.bytes
Broker
Gauges:
broker.queue.{queue}.size
    The size of a specific queue. Note that this metric is only present when using the embedded broker.
Cluster
Gauges:
cluster.size
    The number of members in the cluster.
APM
Vidispine supports application performance monitoring using Elastic APM. It monitors the execution of the application for easy pinpointing of performance issues.
Setup
To use Elastic APM you first need to set up an APM server. The Elastic APM integration is disabled by default, but can be enabled by adding the following configuration to the server.yaml file:
apm:
  elastic:
    urls: ["https://localhost:1234/"]
    secretToken: secret
    serviceName: vidispine
    serviceVersion: 5.0
    environment: staging
    sampleRate: 1
Note
The server must be restarted for any changes to take effect.
Please see the APM configuration reference for details.
Each trace encapsulates an event and has one of the following types:
request
    An HTTP request, either incoming or outgoing.
messaging
    A JMS message, either incoming or outgoing.
scheduled
    A single iteration of a scheduled worker.
service
    A cross-object method invocation of a service layer class.
DB
    A database query.