The Veritone platform automatically triggers training jobs for all compatible engines whenever a given library is published. For example, when a user publishes a library containing people, the platform launches training jobs for all facial recognition engines.
During development it may be helpful to manually trigger training jobs against existing, unchanged libraries.
You can do so with the following mutation. Substitute your own engine, library, and library engine model IDs.
mutation {
  createJob(input: {
    tasks: [
      {
        engineId: "9f60a772-e8dd-480b-918e-5779d8eb02f0"
        payload: {
          mode: "library-train"
          libraryId: "5943cafd-7eea-4913-9496-2afdea89f08b"
          libraryEngineModelId: "88540e51-6418-4b05-9349-4b2f4c230eb1"
        }
      }
    ]
  }) {
    id
    tasks {
      records {
        id
        status
        payload
        engine {
          id
          name
        }
      }
    }
  }
}
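When triggering training jobs repeatedly during development, it can help to generate the mutation from code. The following is a minimal Python sketch, not an official client. It assumes the endpoint accepts the common GraphQL-over-HTTP convention of a JSON body with a `query` field; the helper names `build_train_job_query` and `build_request` are hypothetical, and the URL is the one used in the curl example in this document.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this document; adjust for your environment.
GRAPHQL_URL = "https://api.aws-dev.veritone.com/v3/graphql"


def build_train_job_query(engine_id: str, library_id: str, model_id: str) -> str:
    """Return the createJob mutation text with the three IDs filled in."""
    # json.dumps yields a double-quoted, escaped string, which is also a
    # valid GraphQL string literal for plain ID values.
    return (
        "mutation {\n"
        "  createJob(input: {\n"
        "    tasks: [{\n"
        f"      engineId: {json.dumps(engine_id)}\n"
        "      payload: {\n"
        '        mode: "library-train"\n'
        f"        libraryId: {json.dumps(library_id)}\n"
        f"        libraryEngineModelId: {json.dumps(model_id)}\n"
        "      }\n"
        "    }]\n"
        "  }) {\n"
        "    id\n"
        "    tasks { records { id status payload engine { id name } } }\n"
        "  }\n"
        "}\n"
    )


def build_request(query: str, token: str) -> urllib.request.Request:
    """Wrap the query in a POST request (assumed JSON body with a 'query' field)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result of `build_request(...)` to `urllib.request.urlopen` and decode the JSON response.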
Steps
- Update the model's trainStatus field to running.
mutation {
  updateLibraryEngineModel(input: {
    id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    trainStatus: running
  }) {
    id
    trainStatus
  }
}
- Get the library, its entities, and its identifiers.
identifierTypeId should be set to the appropriate value (for example, "face" if your engine requires faces, or "audio-recording" if it requires audio). Use limit and offset to page through returned values as necessary. Both entities and identifiers can contain metadata in a field called jsondata, which can optionally be used by your engine.
query GetIdentifiers($identifierTypeId: ID, $offset: Int = 0, $limit: Int = 25) {
  library(id: "17a86304-9a82-4943-a06e-a2eb1c2506cc") {
    id
    name
    entities(identifierTypeId: $identifierTypeId, offset: $offset, limit: $limit) {
      count
      records {
        id
        name
        jsondata
        identifiers(identifierTypeId: $identifierTypeId, limit: $limit) {
          count
          records {
            id
            url
            jsondata
          }
        }
      }
    }
  }
}
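Paging with limit and offset can be sketched as a small generator. This is an illustrative Python sketch under stated assumptions: `fetch_page` is a hypothetical callable that you supply, which runs the GetIdentifiers query for a given offset and limit and returns the `entities` object (a dict with a `records` list). The loop stops on the first short page rather than relying on the meaning of `count`. It pages entities only; identifiers nested under each entity would need similar handling if they exceed one page.

```python
from typing import Callable, Dict, Iterator, List

# Assumed shape of one page: {"count": <int>, "records": [ ... ]}
Page = Dict[str, object]


def iter_entities(
    fetch_page: Callable[[int, int], Page], limit: int = 25
) -> Iterator[dict]:
    """Yield every entity record, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        records: List[dict] = page.get("records") or []
        for record in records:
            yield record
        # A short (or empty) page means there is nothing left to fetch.
        if len(records) < limit:
            break
        offset += limit
```

Because the generator requests the next page only after the caller has consumed the current one, a plain `for entity in iter_entities(fetch_page)` loop keeps at most one page in flight.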
When training large libraries (hundreds or thousands of entities), it is strongly advised to implement code that performs well and avoids exceeding usage limits. Observe the following:
- Use paging with a moderate page size, such as the default of 30.
- Limit the number of entities and identifiers that are processed concurrently. In synchronous code, this is typically automatic in a simple loop-based implementation, but in asynchronous code such as JavaScript it may be necessary to use protective measures such as async.limit().
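For asynchronous code, one way to cap concurrency is a semaphore. The sketch below uses Python's asyncio as an analogue of the JavaScript measure mentioned above; `map_limited` and its `worker` argument are hypothetical names, and the default limit of 4 is an arbitrary assumption to tune for your engine.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_limited(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrent: int = 4,
) -> List[R]:
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # gather preserves the order of the input items in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A typical `worker` would download one identifier URL and feed it to the engine; results come back in the same order as the input.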
- Train your engine using the returned identifiers and metadata.
This step is largely engine-dependent, but the idea is to be able to map a trained item back to a specific entity or identifier in the library. This is typically done by using either the entity ID or the entity identifier ID, so that when the engine recognizes or detects that item on later executions, it returns an ID that can easily be mapped to library resources. Sometimes this is not possible and the engine generates its own identifiers. In such cases, generate a mapping from the engine's IDs to the library entity IDs or entity identifier IDs, and save it in some format, such as JSON. This mapping can then be saved with the engine model and referenced when necessary.
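When the engine generates its own identifiers, the mapping described above could be kept as a small JSON document. The following Python sketch is one possible layout, not a required format: the function names and the `entityId` and `identifierId` keys are hypothetical.

```python
import json
from typing import Dict, Iterable, Optional, Tuple

Mapping = Dict[str, Dict[str, str]]


def build_id_mapping(trained: Iterable[Tuple[str, str, str]]) -> Mapping:
    """Map each engine-generated ID to the library entity and identifier it came from.

    Each input tuple is (engine_item_id, entity_id, identifier_id).
    """
    mapping: Mapping = {}
    for engine_item_id, entity_id, identifier_id in trained:
        mapping[engine_item_id] = {
            "entityId": entity_id,
            "identifierId": identifier_id,
        }
    return mapping


def dump_mapping(mapping: Mapping) -> str:
    """Serialize the mapping to JSON so it can be stored with the engine model."""
    return json.dumps(mapping, sort_keys=True)


def resolve(mapping: Mapping, engine_item_id: str) -> Optional[Dict[str, str]]:
    """Look up the library IDs for an ID the engine returned, or None if unknown."""
    return mapping.get(engine_item_id)
```

At recognition time, the engine would load the saved JSON, call `resolve` with the ID it detected, and report the resulting entity ID.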
- Update the trainStatus and save model data.
If training failed for any reason, update the trainStatus to failed:
mutation {
  updateLibraryEngineModel(input: {
    id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    trainStatus: failed
    jsondata: {
      error: "something failed"
    }
  }) {
    id
    trainStatus
    jsondata
  }
}
If training completed successfully, update the trainStatus to complete. In the case of web API-based engines, the trained model is referenced by some sort of ID that is used to refer to that model in future invocations. In this step, you would save that ID (as well as any other information) to the libraryEngineModel metadata container, called jsondata. For example, suppose an API uses a parameter called collectionId. The update mutation would look something like the following:
mutation {
  updateLibraryEngineModel(input: {
    id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    trainStatus: complete
    jsondata: {
      collectionId: "abcd-1234-efgh-5678",
      fingerprints: 3
    }
  }) {
    id
    trainStatus
    jsondata
    modifiedDateTime
  }
}
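Since jsondata varies per engine, the update mutation is often assembled from a dictionary at runtime. The sketch below is an illustrative Python helper, not part of any Veritone library: it renders a dict as an inline GraphQL input literal (unquoted keys, escaped strings) in the same style as the mutations above. `EnumValue`, `to_graphql_literal`, and `build_update_mutation` are hypothetical names.

```python
import json
import re

_NAME = re.compile(r"^[_A-Za-z][_0-9A-Za-z]*$")


class EnumValue(str):
    """Marks a value to be written unquoted, such as a trainStatus value."""


def to_graphql_literal(value) -> str:
    """Render a Python value as a GraphQL input literal."""
    if isinstance(value, EnumValue):
        return str(value)
    if isinstance(value, bool):  # must be checked before int
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float, str)):
        # JSON number and string syntax is also accepted by GraphQL.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            if not _NAME.match(key):
                raise ValueError(f"not a valid GraphQL field name: {key!r}")
            parts.append(f"{key}: {to_graphql_literal(item)}")
        return "{" + ", ".join(parts) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_update_mutation(model_id: str, train_status: str, jsondata: dict) -> str:
    """Build an updateLibraryEngineModel mutation with inline input values."""
    input_literal = to_graphql_literal(
        {"id": model_id, "trainStatus": EnumValue(train_status), "jsondata": jsondata}
    )
    return (
        "mutation {\n"
        f"  updateLibraryEngineModel(input: {input_literal}) {{\n"
        "    id\n"
        "    trainStatus\n"
        "    jsondata\n"
        "  }\n"
        "}\n"
    )
```

For the failure case, the same helper could be called as `build_update_mutation(model_id, "failed", {"error": str(exc)})`, so that quotes in the error message are escaped correctly.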
In some cases, such as for many containerized engines, the engine generates a data file or other representation of its state. Another case is the custom mapping data mentioned in step 3. The idea is to export that data and save it to the libraryEngineModel record. To do so, use a multipart/form-data request. The Content-Type header is required for the file part and should specify the MIME type of the provided data file. Once saved, Content-Type will be included as a field called contentType in the model metadata.
curl -X POST \
  https://api.aws-dev.veritone.com/v3/graphql \
  -H 'authorization: Bearer <TOKEN>' \
  -H 'content-type: multipart/form-data; boundary=---------------------------9051914041544843365972754266' \
  -d '-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="query"

mutation {
  updateLibraryEngineModel(input: {
    id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    trainStatus: complete
    jsondata: {
      fingerprints: 3
    }
  }) {
    id
    trainStatus
    dataUrl
    jsondata
    modifiedDateTime
  }
}
-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="file"; filename="file1.txt"
Content-Type: text/plain

contents of file1.txt
-----------------------------9051914041544843365972754266--'
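The same multipart request can be assembled in code. The following Python sketch builds the body with the standard library only, using the `query` and `file` part names from the curl example above; `encode_multipart` is a hypothetical helper name, and an HTTP library with built-in multipart support would work equally well.

```python
import uuid
from typing import Tuple


def encode_multipart(
    query: str, filename: str, file_bytes: bytes, file_content_type: str
) -> Tuple[bytes, str]:
    """Return (body, content_type_header) for a two-part multipart/form-data request."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="query"',
        b"",  # blank line separates part headers from part content
        query.encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {file_content_type}".encode("ascii"),
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    # Multipart lines are terminated with CRLF.
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body is sent as the POST payload, with the returned string as the request's Content-Type header and the bearer token in the Authorization header.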