Training library-enabled engines

000004077

The Veritone platform automatically triggers training jobs for all compatible engines whenever a given library is published. For example, when a user publishes a library containing people, the platform launches training jobs for all facial recognition engines.

During development, it may be helpful to manually trigger training jobs against existing, unchanged libraries.

You can do so with the following mutation. Substitute your own engine, library, and library engine model IDs.

mutation {
  createJob(input: {
    tasks: [
      {
        engineId: "9f60a772-e8dd-480b-918e-5779d8eb02f0"
        payload: {
          mode: "library-train"
          libraryId: "5943cafd-7eea-4913-9496-2afdea89f08b"
          libraryEngineModelId: "88540e51-6418-4b05-9349-4b2f4c230eb1"
        }
      }
    ]
  }) {
    id
    tasks {
      records {
        id
        status
        payload
        engine {
          id
          name
        }
      }
    }
  }
}
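When triggering training jobs repeatedly during development, it can be convenient to generate the request body from code. The following is a minimal Python sketch, not part of any Veritone tooling: the function name `build_train_job_request` is hypothetical, and it only builds the JSON body that would be POSTed to the GraphQL endpoint. It sends nothing over the network.

```python
import json


def build_train_job_request(engine_id: str, library_id: str, model_id: str) -> dict:
    """Build a GraphQL request body for a manual library-train job.

    Hypothetical helper: it mirrors the createJob mutation shown above and
    does not send anything over the network.
    """
    # json.dumps yields a double-quoted, escaped string, which is also a valid
    # GraphQL string literal for plain IDs such as UUIDs.
    query = f"""
mutation {{
  createJob(input: {{
    tasks: [
      {{
        engineId: {json.dumps(engine_id)}
        payload: {{
          mode: "library-train"
          libraryId: {json.dumps(library_id)}
          libraryEngineModelId: {json.dumps(model_id)}
        }}
      }}
    ]
  }}) {{
    id
    tasks {{ records {{ id status }} }}
  }}
}}
"""
    return {"query": query.strip()}


request_body = build_train_job_request(
    "9f60a772-e8dd-480b-918e-5779d8eb02f0",
    "5943cafd-7eea-4913-9496-2afdea89f08b",
    "88540e51-6418-4b05-9349-4b2f4c230eb1",
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be serialized and sent with any HTTP client, together with the same bearer token used for other API calls.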

Steps

  1. Update the model's trainStatus field to running.
        mutation {
          updateLibraryEngineModel(input: {
            id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
            trainStatus: running
          }) {
            id
            trainStatus
          }
        }
  2. Get the library, its entities, and its identifiers.

    identifierTypeId should be set to the appropriate value (for example, "face" if your engine requires faces, or "audio-recording" if it requires audio). Use limit and offset to page through returned values as necessary. Both entities and identifiers can contain metadata in a field called jsondata, which can optionally be used by your engine.

    
        query GetIdentifiers($identifierTypeId: ID, $offset: Int = 0, $limit: Int = 25) {
          library(id: "17a86304-9a82-4943-a06e-a2eb1c2506cc") {
            id
            name
            entities(identifierTypeId: $identifierTypeId, offset: $offset, limit: $limit) {
              count
              records{
                id
                name
                jsondata
                identifiers(identifierTypeId: $identifierTypeId, limit: $limit) {
                  count
                  records {
                    id
                    url
                    jsondata
                  }
                }
              }
            }
          }
        }
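Paging through the entities with limit and offset could be sketched as follows in Python. This is an illustration under stated assumptions, not a definitive implementation: `fetch_page` is a hypothetical callable that would run the GetIdentifiers query above and return the `entities` object, and here it is replaced by an in-memory stand-in so the loop can be run without API access. Paging through the nested identifiers of each entity is not handled.

```python
from typing import Callable, Iterator

# Hypothetical signature: fetch_page(offset, limit) runs the GetIdentifiers
# query and returns the "entities" object, i.e. {"count": ..., "records": [...]}.
FetchPage = Callable[[int, int], dict]


def iter_entities(fetch_page: FetchPage, limit: int = 25) -> Iterator[dict]:
    """Yield every entity record, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        records = page.get("records", [])
        yield from records
        # A short (or empty) page means there is nothing left to fetch.
        if len(records) < limit:
            break
        offset += limit


# Stand-in for the real API call: serves slices of an in-memory list.
_fake_entities = [{"id": f"entity-{i}", "name": f"Person {i}"} for i in range(60)]


def fake_fetch_page(offset: int, limit: int) -> dict:
    records = _fake_entities[offset:offset + limit]
    return {"count": len(records), "records": records}


all_entities = list(iter_entities(fake_fetch_page, limit=25))
print(len(all_entities))
```

Because the generator requests the next page only when the caller asks for more records, at most one page of entities is held in memory at a time.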

    When training large libraries (hundreds or thousands of entities), it is strongly advised to implement code that performs well and avoids exceeding usage limits. Observe the following:

    • Use paging with a moderate page size, such as the default of 30.
    • Limit the number of entities and identifiers that are processed concurrently. In synchronous code, a simple loop-based implementation typically does this automatically, but in asynchronous code such as JavaScript it may be necessary to add a protective measure such as a concurrency limiter (for example, eachLimit() from the async library).
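The concurrency limit described above can be sketched in Python with an `asyncio.Semaphore`. This is one possible approach, not the platform's prescribed method: `process_all` and `fake_download` are hypothetical names, and the worker only sleeps briefly where a real engine would download or process an identifier.

```python
import asyncio


async def process_all(items, worker, max_concurrent: int = 5):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


# Demonstration with a stand-in worker that records peak concurrency.
active = 0
peak = 0


async def fake_download(identifier_url: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # stands in for a network request
    active -= 1
    return identifier_url.upper()


urls = [f"https://example.invalid/face-{i}.jpg" for i in range(20)]
results = asyncio.run(process_all(urls, fake_download, max_concurrent=5))
print(len(results), peak)
```

The value of `max_concurrent` is a tuning choice; a small number keeps request bursts well under any usage limit at the cost of a longer total run time.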
  3. Train your engine using the returned identifiers and metadata.

    This step is largely engine-dependent, but the goal is to be able to map a trained item back to a specific entity or identifier in the library. This is typically done by training with either the entity ID or the entity identifier ID, so that when the engine recognizes or detects that item in later executions, it returns an ID that maps directly to library resources. Sometimes this is not possible and the engine generates its own identifiers. In such cases, generate a mapping from the engine's IDs to the library entity IDs or entity identifier IDs, and save it in some format, such as JSON. This mapping can then be saved with the engine model and referenced when necessary.
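For engines that generate their own identifiers, the mapping described above might look like the following Python sketch. The tuple layout, the key names `entityId` and `identifierId`, and the example IDs are all assumptions made for illustration; any structure that lets the engine resolve its own IDs back to library resources would serve.

```python
import json


def build_id_mapping(trained_items):
    """Map engine-generated IDs to library entity and identifier IDs.

    trained_items is an iterable of (engine_id, entity_id, identifier_id)
    tuples; the tuple layout is an assumption made for this sketch.
    """
    return {
        engine_id: {"entityId": entity_id, "identifierId": identifier_id}
        for engine_id, entity_id, identifier_id in trained_items
    }


mapping = build_id_mapping([
    ("engine-face-001", "entity-a", "identifier-a1"),
    ("engine-face-002", "entity-a", "identifier-a2"),
    ("engine-face-003", "entity-b", "identifier-b1"),
])

# Serialize so the mapping can be stored alongside the engine model.
serialized = json.dumps(mapping, indent=2, sort_keys=True)

# Later, when the engine reports one of its own IDs, resolve it to the library.
restored = json.loads(serialized)
print(restored["engine-face-002"]["entityId"])
```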

  4. Update the trainStatus and save model data.

    If training failed for any reason, update the trainStatus to failed:

        mutation {
          updateLibraryEngineModel(input: {
            id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
            trainStatus: failed
            jsondata: {
              error: "something failed"
            }
          }) {
            id
            trainStatus
            jsondata
          }
        }
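One way to make sure the model never stays in the running state after an error is to wrap the training call so that every outcome ends in a status update. The Python sketch below assumes a hypothetical `update_model` callback that would issue the updateLibraryEngineModel mutation; here it only records its arguments so the control flow can be run locally.

```python
from typing import Callable, Optional

# Hypothetical callback: update_model(train_status, jsondata) would issue the
# updateLibraryEngineModel mutation for one model ID.
UpdateModel = Callable[[str, Optional[dict]], None]


def run_training(update_model: UpdateModel, train: Callable[[], dict]) -> bool:
    """Mark the model running, train, then record complete or failed."""
    update_model("running", None)
    try:
        model_data = train()
    except Exception as exc:  # report any training error to the platform
        update_model("failed", {"error": str(exc)})
        return False
    update_model("complete", model_data)
    return True


# Demonstration with stand-ins that only record the calls.
calls = []


def record_update(train_status, jsondata):
    calls.append((train_status, jsondata))


def failing_train():
    raise RuntimeError("something failed")


succeeded = run_training(record_update, failing_train)
print(succeeded, calls)
```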

    If training completed successfully, update the trainStatus to complete. For web API-based engines, the trained model is typically referenced by an ID that is used to refer to that model in future invocations. In this step, save that ID (along with any other information) to the libraryEngineModel metadata container, called jsondata. For example, if an API uses a parameter called collectionId, the update mutation would look something like the following:

        mutation {
          updateLibraryEngineModel(input: {
            id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
            trainStatus: complete
            jsondata: {
              collectionId: "abcd-1234-efgh-5678",
              fingerprints: 3
            }
          }) {
            id
            trainStatus
            jsondata
            modifiedDateTime
          }
        }
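Because jsondata is written inline as a GraphQL input object with unquoted keys, generating these mutations from code needs a small serialization step. The following Python sketch shows one way to do it. Both function names are hypothetical, and the serializer covers only typical JSON-like values (it assumes keys are valid GraphQL names and does not handle values such as NaN).

```python
import json


def to_graphql_literal(value) -> str:
    """Render a JSON-like Python value as a GraphQL input literal.

    Object keys are written unquoted, as in the mutations above, so keys are
    assumed to be valid GraphQL names (letters, digits, underscores).
    """
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {to_graphql_literal(v)}" for k, v in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(v) for v in value) + "]"
    # json.dumps covers strings, numbers, true/false and null.
    return json.dumps(value)


def build_update_model_mutation(model_id: str, train_status: str, jsondata=None) -> str:
    """Build an updateLibraryEngineModel mutation string."""
    # trainStatus is written unquoted in the mutations above, so it is not quoted here.
    args = [f"id: {json.dumps(model_id)}", f"trainStatus: {train_status}"]
    if jsondata is not None:
        args.append(f"jsondata: {to_graphql_literal(jsondata)}")
    return (
        "mutation { updateLibraryEngineModel(input: {"
        + ", ".join(args)
        + "}) { id trainStatus jsondata } }"
    )


mutation = build_update_model_mutation(
    "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
    "complete",
    {"collectionId": "abcd-1234-efgh-5678", "fingerprints": 3},
)
print(mutation)
```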

    In some cases, such as for many containerized engines, the engine generates a data file or other representation of its state. The custom mapping data mentioned in step 3 is another such case. Export that data and save it to the libraryEngineModel record using a multipart/form-data request. The Content-Type header is required for the file part and should specify the MIME type of the provided data file. Once saved, the Content-Type value is included in the model metadata as a field called contentType.

    curl -X POST \
      https://api.aws-dev.veritone.com/v3/graphql \
      -H 'authorization: Bearer <TOKEN>' \
      -H 'content-type: multipart/form-data; boundary=---------------------------9051914041544843365972754266' \
      -d '-----------------------------9051914041544843365972754266
    Content-Disposition: form-data; name="query"
    
    mutation {
      updateLibraryEngineModel(input: {
        id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9",
        trainStatus: complete
        jsondata: {
          fingerprints: 3
        }
      }) {
        id
        trainStatus
        dataUrl
        jsondata
        modifiedDateTime
      }
    }
    -----------------------------9051914041544843365972754266
    Content-Disposition: form-data; name="file"; filename="file1.txt"
    Content-Type: text/plain
    
    contents of file1.txt
    
    -----------------------------9051914041544843365972754266--'
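The same multipart body can be assembled in code instead of being typed by hand. The Python sketch below uses only the standard library and builds the two parts shown in the curl example, a `query` field and a `file` field. The function name `build_multipart_body` is hypothetical, and the sketch only constructs the header value and body bytes; sending them is left to whichever HTTP client the engine already uses.

```python
import uuid


def build_multipart_body(query: str, filename: str, file_bytes: bytes,
                         file_content_type: str):
    """Return (content_type_header, body) for a query part plus a file part."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="query"',
        b"",  # blank line separates part headers from part content
        query.encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {file_content_type}".encode(),
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    # multipart/form-data uses CRLF line endings between all of these lines.
    body = b"\r\n".join(parts)
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart_body(
    'mutation { updateLibraryEngineModel(input: {'
    'id: "4ae6d34d-8f4a-4e4e-aefa-12964cbf29c9", trainStatus: complete'
    '}) { id trainStatus dataUrl } }',
    "file1.txt",
    b"contents of file1.txt",
    "text/plain",
)
print(content_type)
print(body.decode("utf-8"))
```

The returned `content_type` string goes in the request's content-type header, alongside the same authorization header used in the curl example.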
Additional Technical Documentation Information
Properties
1/18/2024 12:25 AM
1/18/2024 12:28 AM