The controller node of the aiWARE processing environment manages the task load and capacity of engines and their jobs. Controllers ensure there are enough resources to run a job. Below, we discuss:
- Engine resource allocation
- Engine capacity
- Engine logic increase and decrease
Engine resource allocation
Task processing and performance are handled through dynamic or static engine allocation, which ensures there are enough resources to run a job. Veritone automatically sets your jobs to one of these two supported modes:
- Dynamic, multi-resource load balancing
- Static, a set number of engine instances
For dynamic engine allocation, when an engine agent reports its available resources (vCPU, memory) to the controller, the controller node returns the list of engines to run, based on the highest-priority tasks and the resources their engines require. Specifically, if a task requires an engine that needs no more than the resources available on the host, that engine is assigned to that host.
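The selection step described above can be sketched as a simple greedy pass over pending tasks. This is a minimal illustration, not the aiWARE implementation: the `Task` fields, the `engines_to_run` function, the engine names, and the convention that a lower `priority` number means higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    engine_id: str   # hypothetical engine identifier
    priority: int    # assumption: lower number = higher priority
    vcpu: float      # vCPUs the engine needs
    memory_mb: int   # memory the engine needs, in MB


def engines_to_run(free_vcpu, free_memory_mb, pending_tasks):
    """Return the engines that fit in the resources a host reported as free."""
    selected = []
    # Consider the highest-priority tasks first.
    for task in sorted(pending_tasks, key=lambda t: t.priority):
        if task.vcpu <= free_vcpu and task.memory_mb <= free_memory_mb:
            selected.append(task.engine_id)
            free_vcpu -= task.vcpu
            free_memory_mb -= task.memory_mb
    return selected


pending = [
    Task("transcription", priority=2, vcpu=2, memory_mb=4096),
    Task("face-detection", priority=1, vcpu=4, memory_mb=8192),
    Task("ocr", priority=3, vcpu=4, memory_mb=2048),
]
# A host reporting 6 free vCPUs and 16 GB of free memory.
print(engines_to_run(6, 16384, pending))  # ['face-detection', 'transcription']
```

In this example the `ocr` engine is skipped because no vCPUs remain after the two higher-priority engines are placed.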
For static engine allocation, the controller specifies the minimum number of instances to run for each engine. This option is typically used to keep engines ready for work at any time. A static engine mix turns off dynamic loading for that server type.
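As a rough sketch of the static mode, the shortfall against the configured minimums could be computed as follows. The function name, the dictionary shapes, and the engine names are hypothetical and only illustrate the idea of a minimum instance count.

```python
def instances_to_launch(minimum_instances, running_instances):
    """Return how many more instances each engine needs to reach its minimum."""
    shortfall = {}
    for engine, minimum in minimum_instances.items():
        missing = minimum - running_instances.get(engine, 0)
        if missing > 0:
            shortfall[engine] = missing
    return shortfall


minimums = {"transcription": 3, "ocr": 2}   # assumed static engine mix
running = {"transcription": 1, "ocr": 2}    # instances currently running
print(instances_to_launch(minimums, running))  # {'transcription': 2}
```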
Engine capacity
Engines are assigned work and report back via an HTTP heartbeat every 5 seconds and also when the assigned work is complete.
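The reporting rule above amounts to a small decision on the engine side: report when the interval has elapsed or when work has just finished. The sketch below assumes timestamps in seconds; the function and parameter names are illustrative and not part of any aiWARE API.

```python
HEARTBEAT_INTERVAL_SECONDS = 5.0  # interval stated in the text above


def should_report(now, last_report, work_completed):
    """Decide whether an engine should send a heartbeat at time `now`."""
    interval_elapsed = (now - last_report) >= HEARTBEAT_INTERVAL_SECONDS
    return interval_elapsed or work_completed


print(should_report(now=10.0, last_report=5.0, work_completed=False))  # True
print(should_report(now=9.0, last_report=5.0, work_completed=False))   # False
print(should_report(now=6.0, last_report=5.0, work_completed=True))    # True
```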
Increasing capacity
The controller can instruct the engine agent to launch new engines. Every 60 seconds, the controller sends a check-in to the engine agent based on a database query against the forecast table, which returns the engine deficiency count: the number of additional engines needed to meet current or forecasted service-level agreements (SLAs). The forecast table takes into account not only the SLA but also the priority of the SLA tasks.
The controller sends the engine agent a request ID, which the engine agent passes to the new engine at startup. The engine agent launches the engine and then registers it with the controller.
When registering, the request ID that the engine received at startup is passed back to the controller, so the controller can verify and track the time it takes to request an engine and the lag before that engine registers. This information is critical for cost management as well as for performance management and forecasting.
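The request ID round trip can be illustrated with a small tracker that records when an engine was requested and computes the lag when it registers. This is a sketch under stated assumptions: the `LaunchTracker` class, its method names, and the use of UUIDs as request IDs are hypothetical, and timestamps are plain seconds passed in by the caller.

```python
import uuid


class LaunchTracker:
    """Tracks the time between requesting an engine and its registration."""

    def __init__(self):
        self._requested_at = {}  # request ID -> time the engine was requested

    def request_engine(self, now):
        """Issue a request ID to hand to the engine agent."""
        request_id = str(uuid.uuid4())
        self._requested_at[request_id] = now
        return request_id

    def register_engine(self, request_id, now):
        """Return the launch lag in seconds, or None for an unknown request ID."""
        requested_at = self._requested_at.pop(request_id, None)
        if requested_at is None:
            return None  # unknown or already-registered request ID
        return now - requested_at


tracker = LaunchTracker()
rid = tracker.request_engine(now=100.0)
print(tracker.register_engine(rid, now=142.5))  # 42.5
```

Because each request ID is removed once it is used, a second registration with the same ID is rejected, which gives the controller a basic verification step.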
Decreasing capacity
Capacity decreases happen when engines request work from the controller through a GetWork request to the API. The SQL query that the controller runs for this "get work" request also returns any server that has been marked for shutdown in the database. The forecaster is responsible for marking a server for shutdown.
The controller then shuts down the engine by returning a shutdown message in response to the API call.
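The shutdown path can be sketched as a branch in the handler for the work request. The response shape (`action`, `task`), the `handle_get_work` function, and the in-memory set and list standing in for the database are all assumptions for illustration; the actual GetWork message format is not described here.

```python
def handle_get_work(server_id, marked_for_shutdown, task_queue):
    """Answer a work request: shut the engine down if its server is marked."""
    if server_id in marked_for_shutdown:
        # The forecaster marked this server, so reply with a shutdown message.
        return {"action": "shutdown"}
    if task_queue:
        return {"action": "work", "task": task_queue.pop(0)}
    return {"action": "wait"}


marked = {"server-b"}          # servers the forecaster marked for shutdown
queue = ["task-1", "task-2"]   # pending tasks, highest priority first

print(handle_get_work("server-a", marked, queue))  # {'action': 'work', 'task': 'task-1'}
print(handle_get_work("server-b", marked, queue))  # {'action': 'shutdown'}
```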
Engine logic increase and decrease
The controller will modify engine allocation under the following specific circumstances.
SLA-required engine capacity increase
When a high-priority task will miss its SLA without more engine capacity, usually because of poor forecasting or ad hoc requests, the controller assigns more engines to the task. This assumes that lower-priority tasks exist and that the engine pings the controller soon enough for the task to catch up.
Workload is above or below current capacity
When the task forecast predicts that the workload for a given engine type is below or above current capacity, the forecasting process uses data provided by the controller to predict engine supply requirements from scheduled and historical ad hoc demand. Where the forecast shows clearly over-capacity and under-capacity engines, the controller kills the over-capacity engines and launches the under-capacity ones, based on the SLA rankings in the database.
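One way to picture this rebalancing is to compare forecast demand with running instances per engine type, then order the launches by SLA rank. The sketch below is illustrative only: the `rebalance` function, the dictionary inputs, the engine names, and the convention that a lower rank number means a more urgent SLA are assumptions.

```python
def rebalance(forecast, running, sla_rank):
    """Return (engines to kill, ordered launches) from forecast vs. running counts."""
    to_kill = {}
    deficits = {}
    for engine in set(forecast) | set(running):
        gap = forecast.get(engine, 0) - running.get(engine, 0)
        if gap < 0:
            to_kill[engine] = -gap    # over capacity: surplus instances
        elif gap > 0:
            deficits[engine] = gap    # under capacity: missing instances
    # Launch the most urgent SLA first (assumption: lower rank = more urgent).
    order = sorted(deficits, key=lambda e: (sla_rank.get(e, float("inf")), e))
    return to_kill, [(engine, deficits[engine]) for engine in order]


forecast = {"transcription": 5, "ocr": 1, "face-detection": 3}
running = {"transcription": 2, "ocr": 4, "face-detection": 1}
sla_rank = {"face-detection": 1, "transcription": 2, "ocr": 3}

kills, launches = rebalance(forecast, running, sla_rank)
print(kills)     # {'ocr': 3}
print(launches)  # [('face-detection', 2), ('transcription', 3)]
```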
Excess engine capacity
When the task forecast predicts excess engine capacity across the board, the controller marks a server for termination based on the lowest forecasted engine demand. The termination is recorded in the database, and the controller then kills the engine. After the engine agent checks in, the controller sends a message to the engine agent to kill the server and then marks the server as dead.
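Choosing the server to terminate can be sketched as picking the one with the lowest forecasted demand. The `pick_server_to_terminate` function, the demand figures, and the tie-breaking by server ID are assumptions made for this example.

```python
def pick_server_to_terminate(forecast_demand):
    """Return the server with the lowest forecasted engine demand, or None."""
    if not forecast_demand:
        return None
    # Iterate in sorted order so that ties are broken by server ID.
    return min(sorted(forecast_demand), key=forecast_demand.get)


demand = {"server-a": 12, "server-b": 3, "server-c": 7}  # hypothetical forecast
print(pick_server_to_terminate(demand))  # server-b
```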