Title	Task processing, engine allocation, and performance

URL Name	000004199

Audience	Public

Product (Internal List) aiWare - aiWare

Body

The controller node of the aiWARE processing environment manages the task load and capacity of engines and their jobs, ensuring there are enough resources to run each job. Below, we discuss:

Engine resource allocation
Engine capacity
Engine logic increase and decrease

Engine resource allocation

Task processing and performance are handled by dynamic and static engine allocation, which ensure there are enough resources to run a job. Veritone automatically sets your jobs to one of these two supported modes:

Dynamic, multi-resource load balancing
Static, a set number of engine instances

For dynamic engine allocation, when an engine agent reports its available resources (vCPU and memory) to the controller, the controller node returns the list of engines to run, based on the highest-priority tasks and the resources each engine requires. Specifically, if a task requires an engine that needs no more than the resources available on the host, that engine is assigned to that host.
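As a rough illustration of the dynamic mode, the Python sketch below walks pending tasks from highest to lowest priority and selects each task's engine only if its vCPU and memory needs still fit within what the host reported. The `Task` fields, the priority scale (higher is more urgent), and the greedy first-fit strategy are assumptions for illustration; this is not aiWARE's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A pending task and the resources its engine needs (hypothetical fields)."""
    task_id: str
    engine_id: str
    priority: int    # assumption: higher value = higher priority
    vcpu: float      # vCPUs the engine requires
    memory_mb: int   # memory the engine requires, in MB


def engines_to_run(tasks, free_vcpu, free_memory_mb):
    """Return engine IDs to start on a host that reported its free resources.

    Tasks are considered from highest to lowest priority; a task's engine is
    selected only if it needs no more than the resources still available.
    """
    selected = []
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if task.vcpu <= free_vcpu and task.memory_mb <= free_memory_mb:
            selected.append(task.engine_id)
            # Reserve the resources so lower-priority tasks see less room.
            free_vcpu -= task.vcpu
            free_memory_mb -= task.memory_mb
    return selected


pending = [
    Task("t1", "transcribe", priority=5, vcpu=2, memory_mb=4096),
    Task("t2", "ocr", priority=9, vcpu=4, memory_mb=8192),
    Task("t3", "face", priority=1, vcpu=1, memory_mb=1024),
]
print(engines_to_run(pending, free_vcpu=6, free_memory_mb=12288))
# → ['ocr', 'transcribe']
```

Here the low-priority `face` engine is skipped because the two higher-priority engines consume all of the reported resources.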

For static engine allocation, the controller specifies a minimum number of instances to run for each engine. This option is typically used to keep engines ready for work at any time. A static engine mix turns off dynamic loading for that server type.
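Under a static mix, the controller's decision is simpler: top up any engine that has fallen below its configured minimum. A minimal sketch, assuming the mix is a mapping from hypothetical engine IDs to minimum instance counts:

```python
def instances_to_launch(minimums, running):
    """Return how many more instances of each engine are needed to meet a static mix.

    minimums: engine ID -> minimum instance count configured for the server type
    running:  engine ID -> instances currently running
    """
    return {
        engine_id: minimum - running.get(engine_id, 0)
        for engine_id, minimum in minimums.items()
        if running.get(engine_id, 0) < minimum
    }


static_mix = {"transcribe": 3, "ocr": 1}  # hypothetical static engine mix
print(instances_to_launch(static_mix, {"transcribe": 1, "ocr": 1}))
# → {'transcribe': 2}
```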

Engine capacity

Engines are assigned work and report back to the controller via an HTTP heartbeat every 5 seconds, as well as when their assigned work is complete.
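The heartbeat pattern can be sketched with a background thread that reports on a fixed interval while the work runs, followed by a final completion report. In this sketch `send_report` is a hypothetical stand-in for the HTTP call to the controller (the real endpoint and payload are not documented here), and the interval is a parameter so it can be shortened for testing.

```python
import threading
import time


def run_with_heartbeat(do_work, send_report, interval_s=5.0):
    """Run do_work() while sending a heartbeat every interval_s seconds.

    send_report is a hypothetical callable standing in for the HTTP request to
    the controller. A final "complete" report is sent only if the work succeeds.
    """
    done = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout (time for a heartbeat) and
        # True as soon as the work is finished.
        while not done.wait(interval_s):
            send_report("heartbeat")

    beat = threading.Thread(target=heartbeat, daemon=True)
    beat.start()
    try:
        result = do_work()
    finally:
        done.set()
        beat.join()
    send_report("complete")
    return result


def fake_job():
    time.sleep(0.35)  # stands in for real engine work
    return "ok"


reports = []
print(run_with_heartbeat(fake_job, reports.append, interval_s=0.1))
# → ok
print(reports)
```

Using `threading.Event.wait` as the timer lets the heartbeat thread stop promptly once the work finishes instead of sleeping out a full interval.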

Increasing capacity

The controller can instruct the engine agent to launch new engines. Every 60 seconds, the controller checks in with the engine agent based on a database query of the forecast table, which returns the number of additional engines needed to meet current or forecasted service-level agreements (SLAs). The forecast table takes into account not only the SLA but also the priority of the SLA tasks.
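The deficiency lookup can be pictured as a single query. The sketch below uses an in-memory SQLite database and an invented `forecast` schema (`required`, `running`, `sla_priority`), since the real table layout is not described in this article; it returns per-engine shortfalls ordered so that higher-priority SLA work is served first.

```python
import sqlite3

# Hypothetical schema: the real forecast table's columns are not documented here.
SCHEMA = """
CREATE TABLE forecast (
    engine_id    TEXT PRIMARY KEY,
    required     INTEGER NOT NULL,  -- engines needed to meet current/forecast SLAs
    running      INTEGER NOT NULL,  -- engines currently running
    sla_priority INTEGER NOT NULL   -- higher = more important SLA tasks
);
"""

DEFICIENCY_QUERY = """
SELECT engine_id, required - running AS deficiency
FROM forecast
WHERE required > running
ORDER BY sla_priority DESC, deficiency DESC;
"""


def engine_deficiencies(conn):
    """Return (engine_id, count) pairs the controller would ask agents to launch."""
    return conn.execute(DEFICIENCY_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO forecast VALUES (?, ?, ?, ?)",
    [("transcribe", 5, 2, 1), ("ocr", 3, 3, 9), ("face", 4, 1, 7)],
)
print(engine_deficiencies(conn))
# → [('face', 3), ('transcribe', 3)]
```

In this sketch the controller would run the query on each 60-second check-in cycle; `ocr` is omitted because it already has enough engines.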

The controller sends the engine agent a request ID, which the engine agent passes to the new engine at startup. The engine agent launches the engine and then registers it with the controller.

During registration, the request ID the engine received at startup is passed back to the controller, so the controller can verify the request and track both the time it takes to request an engine and the lag until the engine registers. This information is critical for cost management as well as for performance management and forecasting.
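The request ID round trip amounts to bookkeeping keyed by ID: record when the launch was requested, then compute the lag when an engine registers with that ID. The `LaunchTracker` class below is hypothetical; the clock is injectable so the example is deterministic.

```python
import time
import uuid


class LaunchTracker:
    """Track engine launch requests by request ID (hypothetical controller-side helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._requested_at = {}  # request ID -> time the launch was requested

    def request_engine(self):
        """Create a request ID to hand to the engine agent."""
        request_id = str(uuid.uuid4())
        self._requested_at[request_id] = self._clock()
        return request_id

    def register(self, request_id):
        """Verify a registration and return the request-to-registration lag in seconds."""
        if request_id not in self._requested_at:
            raise KeyError(f"unknown request ID: {request_id}")
        return self._clock() - self._requested_at.pop(request_id)


ticks = iter([100.0, 142.5])
tracker = LaunchTracker(clock=lambda: next(ticks))
rid = tracker.request_engine()  # controller -> engine agent -> new engine
print(tracker.register(rid))    # engine registers with the ID it received
# → 42.5
```

An unknown or reused request ID raises `KeyError`, which is one way the controller could reject registrations it never requested.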

Decreasing capacity

Capacity decreases occur when engines request work from the controller by sending a GetWork request to the API. The "get work" SQL query that the controller runs against the database also returns whether the server has been marked for shutdown. The forecaster is responsible for marking servers for shutdown.

If the server is marked, the controller shuts down the engine by sending it a shutdown message in response to the API call.
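A sketch of the GetWork path, assuming hypothetical `servers` and `tasks` tables: one SQL query returns both the server's shutdown flag and the next queued task, and the controller answers with a shutdown message when the flag is set. The table layout, the `idle` response, and the response shape are illustrative assumptions, not the aiWARE API; a real controller would also mark the returned task as assigned.

```python
import sqlite3


def handle_get_work(conn, server_id):
    """Answer an engine's GetWork call (hypothetical schema and response shape)."""
    row = conn.execute(
        """
        SELECT s.marked_for_shutdown,
               (SELECT t.task_id FROM tasks t
                WHERE t.status = 'queued'
                ORDER BY t.priority DESC LIMIT 1)
        FROM servers s
        WHERE s.server_id = ?
        """,
        (server_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown server: {server_id}")
    marked, next_task = row
    if marked:
        # The forecaster flagged this server: reply with a shutdown message.
        return {"action": "shutdown"}
    if next_task is None:
        return {"action": "idle"}
    return {"action": "work", "task_id": next_task}


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (server_id TEXT PRIMARY KEY, marked_for_shutdown INTEGER NOT NULL);
CREATE TABLE tasks (task_id TEXT PRIMARY KEY, status TEXT NOT NULL, priority INTEGER NOT NULL);
INSERT INTO servers VALUES ('srv-a', 0), ('srv-b', 1);
INSERT INTO tasks VALUES ('t1', 'queued', 3), ('t2', 'queued', 8);
""")
print(handle_get_work(conn, "srv-a"))  # → {'action': 'work', 'task_id': 't2'}
print(handle_get_work(conn, "srv-b"))  # → {'action': 'shutdown'}
```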

Engine logic increase and decrease

The controller modifies engine allocation in the following circumstances.

SLA-required engine capacity increase

When a high-priority task will miss its SLA without more engine capacity, usually because of poor forecasting or ad hoc requests, the controller assigns more engines to the task. This assumes that lower-priority tasks exist, and that the engines ping the controller soon enough for the task to catch up.
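One way to picture this rule: estimate whether each task finishes before its SLA deadline at its current engine count, and if not, move engines from the lowest-priority tasks until it would. The sketch assumes work divides evenly across engines and measures it in hypothetical engine-seconds; it ignores the timing of engine pings, which also has to be short enough for the task to catch up.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical view of a task's progress toward its SLA."""
    task_id: str
    priority: int           # assumption: higher value = higher priority
    remaining_units: float  # work left, in hypothetical engine-seconds
    deadline_s: float       # seconds left before the SLA deadline
    engines: int            # engines currently assigned to the task


def will_miss_sla(task):
    """True if the task cannot finish in time at its current engine count.

    Assumes the remaining work divides evenly across assigned engines.
    """
    if task.engines == 0:
        return task.remaining_units > 0
    return task.remaining_units / task.engines > task.deadline_s


def rebalance_for_sla(tasks):
    """Move engines from lower- to higher-priority tasks until at-risk tasks can meet their SLA.

    Mutates the tasks in place and returns (from_task, to_task) pairs, one per engine moved.
    """
    moves = []
    by_priority = sorted(tasks, key=lambda t: t.priority, reverse=True)
    for urgent in by_priority:
        # Take engines from the lowest-priority tasks first.
        donors = [t for t in reversed(by_priority) if t.priority < urgent.priority]
        for donor in donors:
            while will_miss_sla(urgent) and donor.engines > 0:
                donor.engines -= 1
                urgent.engines += 1
                moves.append((donor.task_id, urgent.task_id))
    return moves


tasks = [
    TaskState("high", priority=9, remaining_units=600, deadline_s=100, engines=2),
    TaskState("low", priority=1, remaining_units=100, deadline_s=1000, engines=5),
]
print(rebalance_for_sla(tasks))
# → [('low', 'high'), ('low', 'high'), ('low', 'high'), ('low', 'high')]
```

With six engines the high-priority task's 600 engine-seconds finish in exactly 100 seconds, so the rebalancing stops there and the low-priority task keeps one engine.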

Workload is above or below current capacity

When the task forecast predicts that the workload for a given engine type is above or below current capacity, the forecasting process uses data provided by the controller to predict engine supply requirements from scheduled demand and historical ad hoc demand. When the forecast clearly identifies over-capacity and under-capacity engines, the controller kills the over-capacity engines and launches the under-capacity engines according to the SLA rankings in the database.
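The comparison step can be sketched as a per-engine-type diff between forecast demand and current capacity, with launches ordered by SLA rank. The input mappings and the rank convention (1 is most important) are assumptions for illustration; the real forecaster and SLA rankings live in the database.

```python
def plan_rebalance(forecast, current, sla_rank):
    """Compare forecast demand with current capacity per engine type.

    forecast: engine type -> engines the forecast says are needed
    current:  engine type -> engines currently running
    sla_rank: engine type -> SLA ranking (hypothetical; 1 = most important)

    Returns (to_kill, to_launch); launches are ordered by SLA rank.
    """
    to_kill = {
        engine: running - forecast.get(engine, 0)
        for engine, running in current.items()
        if running > forecast.get(engine, 0)
    }
    shortfalls = {
        engine: needed - current.get(engine, 0)
        for engine, needed in forecast.items()
        if needed > current.get(engine, 0)
    }
    # Engine types without a ranking sort last.
    to_launch = sorted(
        shortfalls.items(),
        key=lambda item: sla_rank.get(item[0], float("inf")),
    )
    return to_kill, to_launch


forecast = {"transcribe": 2, "ocr": 5, "face": 3}
current = {"transcribe": 4, "ocr": 2, "face": 1}
sla_rank = {"face": 1, "ocr": 2, "transcribe": 3}
print(plan_rebalance(forecast, current, sla_rank))
# → ({'transcribe': 2}, [('face', 2), ('ocr', 3)])
```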

Excess engine capacity

When the task forecast predicts excess engine capacity across the board, the controller marks a server for termination based on the lowest forecasted engine demand. The termination is marked in the database, and the controller then kills the engine. After the engine agent checks in, the controller sends it a message to kill the server and then marks the server as dead.
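The termination sequence can be modeled as a small state machine: mark the server with the lowest forecasted demand, tell its engine agent to kill the server on the next check-in, then record the server as dead. `Fleet` is a hypothetical in-memory stand-in for the database records, and the engine-kill step that precedes the agent check-in is not modeled.

```python
from enum import Enum


class ServerState(Enum):
    ACTIVE = "active"
    MARKED_FOR_TERMINATION = "marked_for_termination"
    DEAD = "dead"


class Fleet:
    """Hypothetical in-memory stand-in for the server records kept in the database."""

    def __init__(self, forecast_demand):
        # server ID -> forecasted engine demand for that server
        self.demand = dict(forecast_demand)
        self.state = {server: ServerState.ACTIVE for server in forecast_demand}

    def mark_lowest_demand(self):
        """Mark the active server with the lowest forecasted demand for termination."""
        active = [s for s, st in self.state.items() if st is ServerState.ACTIVE]
        if not active:
            return None
        victim = min(active, key=self.demand.__getitem__)
        self.state[victim] = ServerState.MARKED_FOR_TERMINATION
        return victim

    def on_agent_check_in(self, server):
        """Reply to an engine agent check-in; marked servers are told to shut down."""
        if self.state[server] is ServerState.MARKED_FOR_TERMINATION:
            message = "kill-server"                  # sent to the engine agent
            self.state[server] = ServerState.DEAD    # then recorded as dead
            return message
        return "ok"


fleet = Fleet({"srv-a": 12, "srv-b": 3, "srv-c": 7})
print(fleet.mark_lowest_demand())        # → srv-b
print(fleet.on_agent_check_in("srv-b"))  # → kill-server
print(fleet.state["srv-b"])              # → ServerState.DEAD
```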

Created Date	12/5/2023 9:34 PM

Last Modified Date	12/5/2023 9:40 PM

Last Published Date	12/4/2023 6:33 PM

Article Record Type	Documentation

Veritone Record Type	Documentation

Article Number	000004199
