The Veritone Engine Toolkit is designed to help you build cognitive engines. Running as the entrypoint of the engine container, the Engine Toolkit starts your engine as a subprocess and then makes HTTP calls back to the platform to register and fetch work on the engine's behalf. Each work item is POSTed to your engine's webhook for processing.
In this way, the Engine Toolkit acts as a driver-like intermediary between your code and the aiWARE platform. By acting as a go-between, the Engine Toolkit takes care of low-level aiWARE interactions, so your code can focus more on AI and less on ceremony.
The Engine Toolkit drives your engine, making it easy to abstract the input/output layer between the engine and the Veritone platform using Docker. This frees you from the nuances of getting input data (e.g., from queues) and producing the engine outputs (e.g., back to queues). The Engine Toolkit is packaged as a Docker image.
The Engine Toolkit strives to provide support to all engine types, keeping the interface consistent so that the choice of implementing chunk, stream, or batch doesn't depend on how engines get their data.
Work requests are made on behalf of the engines that the Engine Toolkit represents, including native engines such as Webstream Adapter, TV and Radio Adapter, Stream Ingestors, and Output Writer. For each work item in a work request, the Engine Toolkit:
- Retrieves the input data from the file system for the task.
- Invokes the core engine's /process webhook.
- Stores the result back to the file system for the next task.
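The per-work-item loop above can be sketched as follows. The Go types and the process/store callbacks here are hypothetical stand-ins: the real toolkit POSTs each chunk to your engine's /process webhook over HTTP and writes results back to the file system.

```go
package main

// Chunk is an illustrative representation of one unit of input data
// retrieved from the file system for a task.
type Chunk struct {
	TaskID string
	Data   []byte
}

// processFn stands in for the POST to the engine's /process webhook.
type processFn func(input []byte) ([]byte, error)

// runWorkItem reads each input chunk, invokes the engine, and stores the
// result for the next task in the pipeline.
func runWorkItem(chunks []Chunk, process processFn, store func(taskID string, out []byte) error) error {
	for _, c := range chunks {
		out, err := process(c.Data) // invoke the core engine's /process webhook
		if err != nil {
			return err
		}
		if err := store(c.TaskID, out); err != nil { // write the result back for the next task
			return err
		}
	}
	return nil
}
```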
The Engine Toolkit also has a heartbeat loop to report work-item progress back to the controller, such as the number of processed chunks, errors encountered, and so on.
There is a RunTimeTTL set by the controller directing the Engine Toolkit to terminate when the time is up. In addition, the controller may issue a terminate action as a response to the getWork request.
Note that besides the databases, the file system holds the state of jobs and tasks as work progresses: processed chunks, in-process chunks, and errored chunks. The input/output relationship between tasks is specified in the AI Processing database and represented in the file system accordingly. When engines emit input/output chunks or streams, the file-system interaction should be handled by the Engine Toolkit to ensure correctness.
The Engine Toolkit includes a number of native engines such as Webstream Adapter, TV and Radio Adapter, Stream Ingestors (various flavors), and Output Writer, which provide the flexibility to turn any engine instances into "super workers," helping to push data through the initial ingestion pipeline as well as to finalize the engine outputs.
Your engine is a Docker container that runs in the aiWARE system, and you will integrate the Engine Toolkit with your engine. When creating your Docker image, use a [multi-stage build](https://docs.docker.com/develop/develop-images/) by including the following line in your project's Dockerfile:
FROM veritone/aiware-engine-toolkit as vt-engine-toolkit
This will ensure that you always build using the most current version of the Toolkit.
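For context, a complete Dockerfile using the multi-stage build might look like the sketch below. The copied binary path (/opt/aiware/engine) and your engine's command (/app/my-engine) are illustrative assumptions; substitute the paths that apply to your project.

```dockerfile
# Stage 1: pull the Engine Toolkit binary from its published image.
FROM veritone/aiware-engine-toolkit as vt-engine-toolkit

# Stage 2: your engine's own image (Alpine-based in this sketch).
FROM alpine:latest
RUN apk add --no-cache libc6-compat ca-certificates

# Copy the toolkit binary into your image; the paths here are illustrative.
COPY --from=vt-engine-toolkit /opt/aiware/engine /opt/aiware/engine

# Your engine code (hypothetical path).
COPY ./my-engine /app/my-engine

# The toolkit runs as the entrypoint and starts your engine as a subprocess.
ENTRYPOINT [ "/opt/aiware/engine", "/app/my-engine" ]
```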
Notes:
To refresh the Engine Toolkit: Run docker pull veritone/aiware-engine-toolkit before your docker build. This is especially advisable in a local aiWARE environment.
To use the Engine Toolkit in an Alpine-based image: Make sure to include this in the Dockerfile:
RUN apk add --no-cache libc6-compat
Library support
Training libraries
An engine can be started in a job with mode=library-train in the task payload. A mutation sample that creates that job is:
mutation createTrainingJob {
  createJob(input: {
    clusterId: "____YOUR_CLUSTER_ID____________"
    tasks: [
      {
        engineId: "__ENGINE_ID__"
        payload: {
          mode: "library-train"
          libraryId: "33e7c3fd-dc76-45ca-915e-49a18fe14546"
          libraryEngineModelId: "e0fafd01-03db-4d6b-b651-526b4320b2b5"
        }
      }
    ]
  }) {
    id
    tasks {
      records {
        engineId
        payload
      }
    }
  }
}
Details of training are private to the engine. It is up to the engine to capture whatever state information it considers "training" and to store that captured state in a library engine model using the Veritone GraphQL API. (Veritone provides the state-persistence API but puts no restrictions on how an engine accomplishes or formats its training.) Your library engine model can contain whatever private data the engine might need when it is called upon, in the future, to recall its training. The format of the data is up to you.
An engine -- be it chunk, stream, or batch, training mode or not -- is invoked the same way, i.e., via the /process webhook. The engine should examine the payload field of the request for the mode. Training a library can take a while, so in training mode the engine behaves as a batch engine: it should respond to /process as soon as possible and post to the /heartbeat webhook periodically (once per minute) so the Engine Toolkit can monitor progress.
When training is finished and the library engine model is updated, the engine can then post to the /heartbeat webhook with a "complete" status.
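As a sketch, a heartbeat body might be assembled like this. The struct's field names (status, processedChunks, errorCount) are illustrative assumptions, not the toolkit's actual wire format:

```go
package main

import "encoding/json"

// Heartbeat is an illustrative payload for the /heartbeat webhook;
// the field names are assumptions for this sketch.
type Heartbeat struct {
	Status          string `json:"status"` // e.g. "running" or "complete"
	ProcessedChunks int    `json:"processedChunks"`
	ErrorCount      int    `json:"errorCount"`
}

// heartbeatBody serializes a progress report for periodic posting
// to the /heartbeat webhook.
func heartbeatBody(status string, processed, errs int) ([]byte, error) {
	return json.Marshal(Heartbeat{Status: status, ProcessedChunks: processed, ErrorCount: errs})
}
```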
For information on how to persist training data programmatically, see Train an engine.
Consuming training libraries
In normal processing, the engine is given a library model's libraryId and libraryEngineModelId in the payload field, along with mode=library-run.
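Decoding that payload in an engine might look like the following sketch. The JSON field names come from the payload described above; the TaskPayload struct itself is an illustrative decoding target, not part of the toolkit's API:

```go
package main

import "encoding/json"

// TaskPayload mirrors the payload fields described above.
type TaskPayload struct {
	Mode                 string `json:"mode"`
	LibraryID            string `json:"libraryId"`
	LibraryEngineModelID string `json:"libraryEngineModelId"`
}

// parsePayload decodes the payload field of a /process request so the
// engine can branch on mode ("library-train", "library-run", ...).
func parsePayload(raw []byte) (TaskPayload, error) {
	var p TaskPayload
	err := json.Unmarshal(raw, &p)
	return p, err
}
```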
Troubleshooting
x509: certificate signed by unknown authority
If you see an error complaining about an unknown authority, it's likely that you do not have root certificates installed inside your Docker container. Try adding the following line to your Dockerfile:
RUN apk --no-cache add ca-certificates
This will install the certificates as part of your Docker build.
[Note] This solution has only been tested when the base Docker image is FROM alpine:latest. For other base images, you might need to install the certificates with a different command.