The controller node is a component of the aiWARE platform that registers hosts and engine instances, manages all work, and communicates with the database layer.
The controller balances load across engines, recalculating it as needed based on existing and anticipated resources, until all tasks are assigned to engines.
The following topics are covered below:
- The roles and responsibilities of controllers
- Controller process flow
- Engine process flow
- Controller communication and authentication
- Task processing, engine allocation, and performance
Controller roles and responsibilities
Controllers are stateless API services, typically deployed behind a load balancer. Within a cluster, a single controller node is promoted to serve as primary controller and the aiWARE processing supervisor, and is responsible for specific critical functions including initial startup, core sync, usage reporting, engine loading, and removing stopped hosts.
The controller's principal function is to assign tasks to engines and adapters in an optimal manner to meet SLAs and minimize costs.
A controller manages several key functions:
- Control the starting and stopping of engines
- Provide data to the primary controller to properly scale engine hardware
- Provide stats data to the database to properly forecast engine and hardware demand
- Route data from one task to the next
- Control the assignment of engines to tasks
- Manage failures and retries
- Log data for analysis
- Expose billing metrics
- Communicate with external services
Controller process flow
Database connection
Upon launch, the controller establishes a read and write connection to the database. Engine servers (created at aiWARE processing launch) run a local aiWARE agent. This aiWARE agent connects to a controller and provides status updates on the engines running, as well as the resource capacity and current usage (memory, CPU, disk).
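The status update described above can be sketched as a simple payload builder. Note that the field names and structure here are illustrative assumptions, not the actual aiWARE agent schema:

```python
import json

# Hypothetical sketch of the status report an aiWARE agent might send to a
# controller. Field names (hostId, capacity, usage) are assumptions for
# illustration only.
def build_status_report(host_id, engines, capacity, usage):
    """Assemble a host status report covering running engines and resources."""
    return {
        "hostId": host_id,
        "engines": engines,    # names of engine instances running on this host
        "capacity": capacity,  # total resources available on the host
        "usage": usage,        # current resource consumption
    }

report = build_status_report(
    host_id="host-01",
    engines=["transcription-engine"],
    capacity={"memoryMb": 16384, "cpuCores": 8, "diskGb": 200},
    usage={"memoryMb": 4096, "cpuCores": 2, "diskGb": 35},
)
print(json.dumps(report, indent=2))
```

A report like this gives the controller the memory, CPU, and disk figures it needs to decide where new engine instances can be placed.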
Engine check and assignment
The controller checks the database to determine which engines it can run on aiWARE processing, what the base configuration of each engine should be, and how many startup engines will be available in total.
As each engine instance comes online, it makes HTTP requests to the controller. It registers itself with the controller, which in turn stores this information in the database.
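A minimal sketch of that registration request follows. The endpoint path and payload fields are assumptions made for illustration; the real controller API may differ:

```python
import json
import urllib.request

# Hedged sketch: how an engine instance might register itself with the
# controller over HTTP. "/engine/register" and the payload fields are
# hypothetical, not the documented aiWARE endpoint.
def build_registration(engine_id, instance_id, host_id):
    """Describe this engine instance so the controller can record it."""
    return {
        "engineId": engine_id,
        "instanceId": instance_id,
        "hostId": host_id,
    }

def register(controller_url, payload):
    """POST the registration; the controller stores it in the database."""
    req = urllib.request.Request(
        f"{controller_url}/engine/register",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

payload = build_registration("engine-ocr", "inst-42", "host-01")
```

Once stored, this record is what lets the controller route work to the instance in the job-processing phase.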
Job processing
Once all the startup servers have launched and registered in the database, and the startup engines are running, the engines query the controller for work. The controller receives these requests and assigns specific processing tasks to each engine as long as there is work to do. Engines report status and progress back to the controller via HTTP POST, and this data is logged in database tables. For more information about job tasks, see task processing, engine allocation, and performance.
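The request-work / report-progress cycle can be sketched as a simple loop. The `Controller` class here is an in-memory stand-in for the HTTP API, used only to show the shape of the exchange:

```python
# Minimal sketch of the engine-side work loop: poll the controller for work,
# process the assigned units, and report progress. The Controller class is a
# stand-in assumption, not the real aiWARE API.
class Controller:
    def __init__(self, tasks):
        self.tasks = list(tasks)   # queue of (task_id, units) pairs
        self.progress = []         # progress reports received from engines

    def get_work(self):
        """Hand the next task to an engine that asks for work."""
        return self.tasks.pop(0) if self.tasks else None

    def report(self, task_id, done, total):
        # In aiWARE this arrives via HTTP POST and is logged to the database.
        self.progress.append((task_id, done, total))

def run_engine(controller):
    """Keep asking for work until the controller has nothing left."""
    while (work := controller.get_work()) is not None:
        task_id, units = work
        for _ in range(units):
            pass                   # process one unit of work here
        controller.report(task_id, units, units)

ctrl = Controller([("task-a", 3), ("task-b", 2)])
run_engine(ctrl)
print(ctrl.progress)  # [('task-a', 3, 3), ('task-b', 2, 2)]
```

The pull model shown here is what the text describes: engines ask for work rather than having it pushed, so the controller can throttle and reprioritize at every request.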
Engine process flow
How engines are initiated
The controller sends a message to the engine agent running on the server, which in turn launches an engine instance with a Docker run command.
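As a rough illustration, the agent's launch step amounts to composing a `docker run` command like the one below. The image name, environment variables, and flags are assumptions, not the actual aiWARE invocation:

```python
import shlex

# Sketch of the Docker run command an agent might issue when the controller
# asks it to launch an engine instance. Image name and env vars are
# illustrative assumptions.
def docker_run_command(image, task_id, controller_url):
    """Build a detached docker run command for one engine instance."""
    return (
        "docker run -d "
        f"-e TASK_ID={shlex.quote(task_id)} "
        f"-e CONTROLLER_URL={shlex.quote(controller_url)} "
        f"{shlex.quote(image)}"
    )

cmd = docker_run_command(
    "example/engine-ocr:latest", "task-123", "http://controller:9000"
)
print(cmd)
```

Passing the task ID and controller address into the container lets the new engine instance register itself and start requesting work immediately.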
The controller assigns the engine instance a task ID and the number of units to process. A unit is a file in the task input folder. For example, the task input folder may contain 10,000 files (units), but the controller might instruct the engine instance to process only 100 of them before checking back in for more work. This lets the controller reassign the engine instance to other, higher-priority tasks rather than blocking it on this task until all 10,000 files are done.
The engine agent reads the directory of files remaining to be processed in the task ID input folder, randomizes the list, and selects a configurable number of units of work from it. If it successfully opens a file for work, it marks the file as being processed.
When the engine agent has finished work on a file, it marks the file as completed and writes the engine output to the output folder.
Engine status
When the work is complete, the engine agent notifies the controller that all assigned work is done.