Title	Training library-enabled engines

URL Name	000004077

Audience	Public

Product (Internal List) aiWare - aiWare

Body

The Veritone platform automatically triggers training jobs for all compatible engines whenever a given library is published. For example, when a user publishes a library containing people, the platform launches training jobs for all facial recognition engines.

During development, it may be helpful to manually trigger training jobs against existing, unchanged libraries.

You can do so with the following mutation. Substitute your own engine ID, library ID, and library engine model ID.

```
mutation {
  createJob(input: {
    tasks: [
      {
        engineId: "9f60a772-e8dd-480b-918e-5779d8eb02f0"
        payload: {
          mode: "library-train"
          libraryId: "5943cafd-7eea-4913-9496-2afdea89f08b"
          libraryEngineModelId: "88540e51-6418-4b05-9349-4b2f4c230eb1"
        }
      }
    ]
  }) {
    id
    tasks {
      records {
        id
        status
        payload
        engine {
          id
          name
        }
      }
    }
  }
}
```
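
If you trigger training jobs from a script, it can help to generate the mutation text rather than edit IDs by hand. The following is a minimal sketch, not an official client: the helper name `build_train_job_request` is hypothetical, the selection set is trimmed to `id` for brevity, and the code only builds the JSON body of a GraphQL POST request without sending it.

```python
import json

# Template for the createJob mutation shown above. The three IDs are
# substituted as quoted string literals.
CREATE_JOB_TEMPLATE = """
mutation {
  createJob(input: {
    tasks: [
      {
        engineId: %(engine_id)s
        payload: {
          mode: "library-train"
          libraryId: %(library_id)s
          libraryEngineModelId: %(model_id)s
        }
      }
    ]
  }) {
    id
  }
}
"""


def build_train_job_request(engine_id, library_id, model_id):
    """Return a JSON-serializable body for a GraphQL POST request.

    Hypothetical helper: json.dumps is used only to quote and escape
    each ID as a string literal.
    """
    query = CREATE_JOB_TEMPLATE % {
        "engine_id": json.dumps(engine_id),
        "library_id": json.dumps(library_id),
        "model_id": json.dumps(model_id),
    }
    return {"query": query}


body = build_train_job_request(
    "9f60a772-e8dd-480b-918e-5779d8eb02f0",
    "5943cafd-7eea-4913-9496-2afdea89f08b",
    "88540e51-6418-4b05-9349-4b2f4c230eb1",
)
```

The resulting `body` can be serialized with `json.dumps` and posted to the GraphQL endpoint with whichever HTTP client you already use.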

Steps

An engine that handles a library-train task typically works through the following steps.

Update the model's trainStatus field to running.

    mutation {
      updateLibraryEngineModel(input: {
        id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
        trainStatus: running
      }) {
        id
        trainStatus
      }
    }

Get the library, its entities, and its identifiers.
Set identifierTypeId to the value your engine requires (for example, "face" if it requires faces, or "audio-recording" if it requires audio). Use limit and offset to page through the returned values as necessary. Both entities and identifiers can carry metadata in a field called jsondata, which your engine can optionally use.
```
    query GetIdentifiers($identifierTypeId: ID, $offset: Int = 0, $limit: Int = 25) {
      library(id: "17a86304-9a82-4943-a06e-a2eb1c2506cc") {
        id
        name
        entities(identifierTypeId: $identifierTypeId, offset: $offset, limit: $limit) {
          count
          records{
            id
            name
            jsondata
            identifiers(identifierTypeId: $identifierTypeId, limit: $limit) {
              count
              records {
                id
                url
                jsondata
              }
            }
          }
        }
      }
    }
```
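
Paging with offset and limit can be sketched as a simple generator. This is an illustration under stated assumptions: `fetch_page` is a hypothetical callable that you supply, which runs the GetIdentifiers query for a given offset and limit and returns the `entities` object from the response (a dict with `count` and `records`). The loop stops when a page comes back empty or shorter than the requested page size.

```python
from typing import Callable, Dict, Iterator


def iter_entities(
    fetch_page: Callable[[int, int], Dict],
    page_size: int = 25,
) -> Iterator[Dict]:
    """Yield every entity record, requesting one page at a time.

    fetch_page(offset, limit) is a hypothetical callable that executes
    the query and returns {"count": ..., "records": [...]}.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records = page.get("records") or []
        if not records:
            break  # nothing left to read
        yield from records
        if len(records) < page_size:
            break  # a short page means this was the last one
        offset += len(records)
```

Because the generator requests the next page only after the caller has consumed the current one, a plain `for entity in iter_entities(fetch_page):` loop keeps at most one page of entities in flight.
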
When training large libraries (hundreds or thousands of entities), implement code that performs well and avoids exceeding usage limits. In particular:
- Use paging with a moderate page size, such as the default of 30.
- Limit the number of entities and identifiers that are processed concurrently. In synchronous code, a simple loop-based implementation typically does this automatically, but in asynchronous code such as JavaScript it may be necessary to use protective measures such as the async library's eachLimit() or mapLimit().
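
The concurrency limit described above can be sketched in Python with `asyncio.Semaphore`. The names `process_all`, `worker`, and `max_concurrent` are illustrative and not part of any Veritone API. `worker` stands for whatever coroutine downloads or trains on a single identifier.

```python
import asyncio


async def process_all(items, worker, max_concurrent=5):
    """Run worker(item) for every item, at most max_concurrent at a time.

    Results are returned in the same order as the input items.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Only max_concurrent coroutines can hold the semaphore at once;
        # the rest wait here until a slot is released.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))
```

A call such as `asyncio.run(process_all(identifiers, train_one, max_concurrent=3))` would then process every identifier while keeping no more than three in progress, where `identifiers` and `train_one` are your own data and coroutine.
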

Train your engine using the returned identifiers and metadata.
This step is largely engine-dependent, but the goal is to be able to map a trained item back to a specific entity or identifier in the library. This is typically done by training with either the entity ID or the entity identifier ID, so that when the engine recognizes or detects that item in later executions, it returns an ID that maps directly to library resources. Sometimes this is not possible and the engine generates its own identifiers. In such cases, generate a mapping from the engine's IDs to the library entity IDs or entity identifier IDs and save it in a format such as JSON. This mapping can then be saved with the engine model and referenced when necessary.
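
For engines that generate their own IDs, the mapping might look like the following sketch. The key names `entityId` and `identifierId`, the helper names, and the sample IDs are all illustrative assumptions; use whatever structure suits your engine.

```python
import json


def build_id_mapping(trained_items):
    """Map each engine-generated ID to the library IDs it was trained from.

    trained_items is an iterable of
    (engine_id, entity_id, identifier_id) tuples.
    """
    mapping = {}
    for engine_id, entity_id, identifier_id in trained_items:
        mapping[engine_id] = {
            "entityId": entity_id,
            "identifierId": identifier_id,
        }
    return mapping


def resolve(mapping, engine_id):
    """Return the library IDs for an engine ID, or None if it is unknown."""
    return mapping.get(engine_id)


# Hypothetical sample data: two engine IDs trained from the same entity.
mapping = build_id_mapping([
    ("engine-face-001", "entity-a", "identifier-1"),
    ("engine-face-002", "entity-a", "identifier-2"),
])

# Serialize for storage with the model, then restore it later.
serialized = json.dumps(mapping, sort_keys=True)
restored = json.loads(serialized)
```

At recognition time, `resolve(restored, "engine-face-002")` returns the entity and identifier IDs that the detection should be reported against.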

Update the trainStatus and save model data.

If training failed for any reason, update the trainStatus to failed:

    mutation {
      updateLibraryEngineModel(input: {
        id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
        trainStatus: failed
        jsondata: {
          error: "something failed"
        }
      }) {
        id
        trainStatus
        jsondata
      }
    }

If training completed successfully, update the trainStatus to complete. For web API-based engines, the trained model is usually referenced by an ID that identifies it in future invocations. In this step, save that ID (along with any other information you need) to the libraryEngineModel metadata container, called jsondata. For example, if the API uses a parameter called collectionId, the update mutation would look something like the following:

    mutation {
      updateLibraryEngineModel(input: {
        id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
        trainStatus: complete
        jsondata: {
          collectionId: "abcd-1234-efgh-5678",
          fingerprints: 3
        }
      }) {
        id
        trainStatus
        jsondata
        modifiedDateTime
      }
    }
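
The failed and complete branches above can be driven from one code path. The sketch below only builds the mutation text and does not send it. The helper names are hypothetical, the status check covers just the three values that appear in this article, and the serializer assumes that jsondata keys are valid GraphQL names (letters, digits, and underscores).

```python
import json

# Only the trainStatus values used in this article.
KNOWN_STATUSES = ("running", "complete", "failed")


def to_graphql_literal(value):
    """Render a Python value as a GraphQL input literal (unquoted keys)."""
    if isinstance(value, dict):
        fields = ", ".join(
            f"{key}: {to_graphql_literal(item)}" for key, item in value.items()
        )
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(item) for item in value) + "]"
    # Strings, numbers, booleans, and None share JSON's literal syntax.
    return json.dumps(value)


def build_status_update(model_id, train_status, jsondata=None):
    """Build the text of an updateLibraryEngineModel mutation."""
    if train_status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected trainStatus: {train_status}")
    fields = [f"id: {json.dumps(model_id)}", f"trainStatus: {train_status}"]
    if jsondata is not None:
        fields.append(f"jsondata: {to_graphql_literal(jsondata)}")
    return (
        "mutation { updateLibraryEngineModel(input: {"
        + ", ".join(fields)
        + "}) { id trainStatus jsondata } }"
    )


def status_update_for(model_id, train):
    """Run train() and return the mutation that reports its outcome."""
    try:
        result = train()  # expected to return a dict of model metadata
    except Exception as exc:
        return build_status_update(model_id, "failed", {"error": str(exc)})
    return build_status_update(model_id, "complete", result)
```

Here `train` is your own function; whatever dict it returns is stored as jsondata on success, and any exception it raises is reported in an error field.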

In some cases, such as with many containerized engines, the engine generates a data file or another representation of its state. The custom mapping data mentioned in step 3 (training your engine) is another example. Export that data and save it to the libraryEngineModel record using a multipart/form-data request. The file part requires a Content-Type header that specifies the MIME type of the provided data file. Once saved, that value is included in the model metadata as a field called contentType.

```
curl -X POST \
  https://api.aws-dev.veritone.com/v3/graphql \
  -H 'authorization: Bearer <TOKEN>' \
  -H 'content-type: multipart/form-data; boundary=---------------------------9051914041544843365972754266' \
  -d '-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="query"

mutation {
  updateLibraryEngineModel(input: {
    id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    trainStatus: complete
    jsondata: {
      fingerprints: 3
    }
  }) {
    id
    trainStatus
    dataUrl
    jsondata
    modifiedDateTime
  }
}
-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="file"; filename="file1.txt"
Content-Type: text/plain

contents of file1.txt

-----------------------------9051914041544843365972754266--'
```
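
If you assemble the request in code rather than with curl, the same two-part body can be built with the Python standard library alone. This is a sketch under stated assumptions: `build_multipart` is a hypothetical helper, the part names `query` and `file` are taken from the curl example above, and sending the request (with your bearer token) is left to your HTTP client.

```python
import uuid


def build_multipart(query, file_name, file_bytes, content_type):
    """Return (content_type_header, body) for a two-part form upload.

    Each part starts with "--<boundary>", and the body ends with
    "--<boundary>--". Lines are separated by CRLF.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="query"',
        b"",
        query.encode("utf-8"),
        delimiter,
        (
            'Content-Disposition: form-data; name="file"; '
            f'filename="{file_name}"'
        ).encode("utf-8"),
        f"Content-Type: {content_type}".encode("ascii"),
        b"",
        file_bytes,
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(lines)
    header = f"multipart/form-data; boundary={boundary}"
    return header, body
```

Pass the returned header as the request's content-type and the returned bytes as its body. A random boundary is generated for each request so that it is unlikely to collide with the file contents.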

Created Date	1/18/2024 12:25 AM

Last Modified Date	1/18/2024 12:28 AM

Last Published Date	1/18/2024 12:28 AM

Article Record Type	Documentation

Veritone Record Type	Documentation

Article Number	000004077
