Visualizing your rendering with Kibana

Smedge provides a basic interface to your rendering that covers the needs of daily operation. However, it can be useful to create more specific visualizations of your rendering usage in ways that make sense for your business. There are a lot of ways to accomplish this, but today we are going to look at using ElasticSearch and the Kibana visualization interface to create useful and pretty graphs of your data.

Choice of Database

This demo is designed around using ElasticSearch as the database and Kibana as the visualization engine on top of that data. You can, of course, use any database or visualization software you want; the basic operational concepts in Smedge work identically either way.

Components

There are many ways to implement a pipeline like this. We will use the Herald component from Smedge, with Logstash from the Elastic stack as an intermediary between Smedge and the database.

Local or Cloud

You may host your database locally or in the cloud. These instructions assume a local instance, but Logstash makes it very simple to change your database target, or to add multiple targets to save your data to.

Prerequisites

You will need to get an instance of ElasticSearch running somewhere that you can easily access. ElasticSearch is an open source product, so you can download and run it on a local machine, or there are several cloud service providers that can supply a preconfigured ElasticSearch database cluster in the cloud. What you use is up to you; the only difference is the details of where the database is (and possibly the log-in info needed to access it) in your Logstash conf file.

You will also need to pick a machine to run the Herald Smedge component. You can choose any node that can establish communication with the Master host on your network; Herald will never check out a Smedge license. Both the Herald host and the machine that runs Logstash (described below) will need to be able to access a shared storage location where the log files will be saved.

Set up Herald to log the info you want in your database

Herald is a Smedge GUI component that listens for events and performs actions in response. It is similar to the Event Commands attached to a Job or an Engine, but it runs independently of any specific Job, on whatever machine you use to run Herald, which does not have to be an Engine, the Master, or any specific node. You will want to ensure that Herald is set to start up automatically, so it is always available when renders are in progress.

Once it is running, we will create the event handlers that log what you want to track. This is where you will need to do some planning to determine the best way to get the data you want. For now, we will add a handler for any time a job finishes. This lets us capture data that can be determined only after the render is complete, like the detected output filename format, or the total elapsed CPU usage for the job.

  • Start Herald (if needed)

  • Click Add Notification

  • For the Event field select “Job Finished”

  • For the Action field select “Save the Job”

  • Click the Settings… button to define where the file is and what gets saved
    Set your desired filename and path, and choose the option to append to an existing file. For what to save, choose the custom format. We will build a simple comma-separated-value line, logging data from the job that just finished using the Smedge variable substitution system:

    "$(ID)","(Name.Replace:"|')","$(Creator)","$(Created.Format:%Y-%m-%d)","$(StatusAsString)","$(ElapsedProcessTime./:1000)","$(ElapsedRealTime./:1000)","$(Note.Replace:"|')"

    You will notice that we modify some of the values before saving them. Specifically, we convert any double quote marks in the Job name or note to single quotes so they don’t break the CSV format, we format the date value, and we do some math on the elapsed times to convert them from milliseconds to seconds. Also notice that we use the StatusAsString value, not the status itself, to get a human-readable value. A sample line produced by this format appears just after this list.

  • Press OK to close the settings and OK again to add the notification

  • Make sure the master Enable Notifications check box is checked on the Herald main window
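For reference, here is roughly what a line produced by this format might look like. Every value here is made up for illustration, including the exact status text, which will depend on how the job actually finished:

"4f2a1c3e-9b7d-4e21-8c55-0d6e3f8a1b2c","shot_010_comp","jsmith","2024-03-15","Finished","5400.25","1350.5","final pass"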

Like the Smedge Master process, Herald needs to be running at all times. If Herald is ever down while your Master is up and in use, your log files will be missing data: Herald will not run notifications for events that were triggered while it was offline.

Setting up Logstash to read this CSV file

Now you have a raw data log from Smedge containing the data to send to your database. You will need to set up a conf file that tells Logstash how to parse this file and upload it to the database.

input {
	# set the path to the log file saved by Herald here
	file {
		path => "path-to-the-file"
	}
}

filter {
	# parse the CSV structure generated from the log file into fields
	csv {
		columns => [ "id", "name", "creator", "timestamp", "status", "process_time", "real_time", "note" ]
	}
	# correct the timestamp
	date {
		timezone => "US/Pacific"
		match => [ "timestamp", "yyyy-MM-dd" ]
		remove_field => [ "timestamp" ]
	}
	# convert the numeric fields to the correct number type
	mutate {
		convert => {
			"process_time" => "float"
			"real_time" => "float"
		}
	}
}

output {
	# dump the data to elasticsearch instance running on localhost
	elasticsearch {
		hosts => [ "localhost:9200" ]
	}
}

This simple conf file tells Logstash how to convert each line of the CSV log file we generated into a document to upload to the database. For the input, we provide the path to the log file Herald is creating. The filter section is where the CSV format is defined; we also apply the correct time zone to the date, and convert the elapsed time values to floating point numbers. Finally, the output section defines the ElasticSearch instance we will connect to, here assumed to be running on the local host.
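If your database is hosted in the cloud instead, the output section is the only part that needs to change. As a rough sketch (the host URL and credentials below are placeholders, and the exact options you need will depend on your provider), the elasticsearch output plugin accepts standard authentication settings:

output {
	elasticsearch {
		# placeholder address and credentials for a hosted cluster
		hosts => [ "https://my-cluster.example.com:9243" ]
		user => "logstash_writer"
		password => "changeme"
	}
}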

You will have to keep the Logstash process running on a machine that can read the log files Herald is generating in order to keep feeding the database. However, if this process does go down, you can restart it at any time: Logstash, by default, remembers how far it got in a file and will automatically pick up where it left off.
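That bookkeeping lives in a "sincedb" file managed by the file input plugin. If you want it in a predictable location, for example to make backups easier, you can set it explicitly; the path below is purely illustrative:

input {
	file {
		path => "path-to-the-file"
		# keep the read-position bookkeeping in a known place (illustrative path)
		sincedb_path => "/var/lib/logstash/smedge-sincedb"
	}
}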

Visualizing

Once data starts flowing in, you will be able to use Kibana to visualize it. As you build your preferred visualizations, expect a period of back-and-forth as you make adjustments to your Logstash conf file to automate some of the processing.

You will also want to pay attention to your index sizes. By default, Logstash writes to a new ElasticSearch index every day. However, if you are not putting a lot of data in, this can be wasteful, and you may want to have your data indexed by week or by month to avoid time-out errors in your visualizations as they parse through a lot of mostly empty indices.
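One way to do this is to set the index name pattern in your Logstash output. The elasticsearch output plugin supports date formatting in the index option, so a sketch like the following (the "smedge-" prefix is just our own naming choice) would create one index per month instead:

output {
	elasticsearch {
		hosts => [ "localhost:9200" ]
		# one index per month; %{+YYYY.MM} expands to e.g. 2024.03
		index => "smedge-%{+YYYY.MM}"
	}
}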

Additional Info

This is an extremely simplistic example, but of course there is a lot more you can do.

Herald event commands give you a basic level of filtering. If you want to log different types of work in different ways, you can use those filters to control which event is triggered by which job. Beyond that, you can choose to have Herald run a process instead of saving to a file. This gives you the ability to build any command you wish, possibly sending data straight to a database, or doing additional data acquisition or analysis before passing the data to Logstash.

Logstash itself gives you a lot of power. You can choose alternate formats besides CSV for the raw data, or do additional analysis and pre-processing in the filters. You can even run arbitrary Ruby code to generate or process data by writing it directly into the conf file.
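For example, a ruby filter can derive a value that is not in the log at all. This sketch (the field names match the columns defined above, but the "efficiency" metric itself is just an illustration) computes the ratio of CPU time to wall-clock time for each job:

filter {
	ruby {
		# derive a hypothetical "efficiency" field from the two elapsed times
		code => "
			real = event.get('real_time').to_f
			event.set('efficiency', event.get('process_time').to_f / real) if real > 0
		"
	}
}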

This example also only logs the job-finished event. You may want to try logging more events, from the job being created, to the first work unit starting, to the job being removed from the system. You can trigger events for work events, too, like work errors. Keeping track of which jobs fail, the error messages, and other information associated with failures can allow you to visualize patterns of failures and more quickly determine which issues are infrastructure related, project related, or tied to specific products or workflows.
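If you do log several event types, one approach is to have Herald write each one to its own file, then tag the files in Logstash so your filters can tell them apart. A sketch, where the file paths, type labels, and error-log columns are all hypothetical:

input {
	file {
		path => "path-to-the-job-finished-log"
		type => "job_finished"
	}
	file {
		path => "path-to-the-work-error-log"
		type => "work_error"
	}
}

filter {
	if [type] == "work_error" {
		csv {
			# hypothetical column layout for a work error log
			columns => [ "id", "name", "engine", "error_message" ]
		}
	}
}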

For more information on Herald, event commands and the Smedge variable substitution system, see the Smedge documentation, especially the Administrator Manual. For what you can do with Logstash conf files, see the Logstash documentation.
