[API][yes]
[Search][yes]
[UI][yes]
The facial features engine accepts a set of face landmarks and audio, with their associated timestamps, as input. The engine computes the correlation between the movement of the face landmarks and the audio: when the lip movement tracks the audio (that is, when the person in frame is the one speaking), the correlation is close to 1.0.
Training and libraries
No training is required for the facial features engine.
Engine input
The facial features engine performs segment processing. The engine accepts as input a custom binary file that contains the following, in this order:
- 8 bytes containing the number of bytes of a byte-encrypted JSON string.
- A byte-encrypted JSON string.
- 8 bytes containing the number of bytes of a binary audio file.
- Binary audio file.
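As a sketch, the binary input could be assembled as follows. Two assumptions to confirm against the engine's actual specification: the 8-byte length fields are treated here as unsigned 64-bit big-endian integers, and the encryption step for the JSON string is omitted.

```python
import json
import struct

# Sketch of the payload; field names follow the examples in this section.
payload = {
    "faceLandmarks": [],          # time-series of 68-point landmark objects
    "faceTimes": [],              # timestamp (ms) of each element
    "voiceStartTime": 1586993745671,
}
json_bytes = json.dumps(payload).encode("utf-8")
audio_bytes = b"RIFF..."          # stand-in for the binary audio file contents

# 8-byte length prefix (assumed big-endian uint64), JSON, then the same
# pattern for the audio file.
blob = (
    struct.pack(">Q", len(json_bytes)) + json_bytes +
    struct.pack(">Q", len(audio_bytes)) + audio_bytes
)
```

The total file size is then 16 bytes of length prefixes plus the two payloads.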
Examples
A byte-encrypted JSON string:
{
  "faceLandmarks": [ <a time-series of 68-point face landmark objects> ],
  "faceTimes": [ <the timestamp (ms) of each element in the time-series> ],
  "voiceStartTime": <the timestamp (ms) of the beginning of the audio file>
}
A JSON string containing each of the entries:
{
  "faceLandmarks": [
    [{"type":"leftEye","locationPoly":[{"x":0.3992494462245779,"y":0.4791871959208469}, ... ]},
     {"type":"leftEyeBrow","locationPoly":[{"x":0.3751081599692518,"y":0.4328096755334755}, ... ]},
     {"type":"rightEye","locationPoly":[{"x":0.5054534265610093,"y":0.4739422820994278}, ... ]},
     {"type":"rightEyeBrow","locationPoly":[{"x":0.5046047780035698,"y":0.39965423487017754}, ... ]},
     {"type":"mouth","locationPoly":[{"x":0.4395773590846622,"y":0.6907280849157791}, ... ]},
     {"type":"nose","locationPoly":[{"x":0.4763590261411557,"y":0.4729081259905319}, ... ]},
     {"type":"jawOutline","locationPoly":[{"x":0.3631841793953267,"y":0.47464472245756806}, ... ]}],
    ...
    [{"type":"leftEye","locationPoly":[{"x":0.4126658274570773,"y":0.4603450470008631}, ... ]},
     {"type":"leftEyeBrow","locationPoly":[{"x":0.39045616012414125,"y":0.41106438140329515}, ... ]},
     {"type":"rightEye","locationPoly":[{"x":0.5192474931024308,"y":0.4613115045267522}, ... ]},
     {"type":"rightEyeBrow","locationPoly":[{"x":0.5186631830886299,"y":0.38466298937949167}, ... ]},
     {"type":"mouth","locationPoly":[{"x":0.4450666400615419,"y":0.6727261704418241}, ... ]},
     {"type":"nose","locationPoly":[{"x":0.48727477197650026,"y":0.46038189699300586}, ... ]},
     {"type":"jawOutline","locationPoly":[{"x":0.3656751344050108,"y":0.47420865840844106}, ... ]}]
  ],
  "faceTimes": [1586993745748, ... ],
  "voiceStartTime": 1586993745671
}
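Before packaging the payload, it can help to sanity-check it. A minimal sketch, using the field names from the example above; the consistency rules shown (one timestamp per landmark frame, no frame before the audio start) are assumptions, not documented requirements:

```python
import json

# A small (decrypted) payload in the shape of the example above,
# truncated to two frames with only the mouth landmark.
payload = json.loads("""
{
  "faceTimes": [1586993745748, 1586993745781],
  "voiceStartTime": 1586993745671,
  "faceLandmarks": [
    [{"type": "mouth", "locationPoly": [{"x": 0.44, "y": 0.69}]}],
    [{"type": "mouth", "locationPoly": [{"x": 0.45, "y": 0.67}]}]
  ]
}
""")

# Every landmark frame needs a matching timestamp, and no frame should
# precede the start of the audio.
assert len(payload["faceLandmarks"]) == len(payload["faceTimes"])
assert all(t >= payload["voiceStartTime"] for t in payload["faceTimes"])

# Offsets (ms) of each landmark frame relative to the audio start.
offsets = [t - payload["voiceStartTime"] for t in payload["faceTimes"]]
print(offsets)  # [77, 110]
```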
Engine output
In AION, store the facial features engine output as an object, with the object type set to facial-features. Each face maps to a specified user identity that corresponds to an entity in a library, so the object includes the entityId and the libraryId.
The confidence field holds the similarity score between the face and the face(s) enrolled for the entity. The mode field specifies whether the engine is run in enroll or verify mode.
Example
The simplest form of facial features engine output:
{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": [
    "facialFeatures"
  ],
  "object": [
    {
      "type": "facial-features",
      "lipVoiceCorrelation": {
        "confidence": 0.9
      },
      "lipMovement": {
        "confidence": 1.5
      }
    }
  ]
}
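A consumer of this output might extract the correlation score as follows; a minimal sketch using the field names from the example above:

```python
import json

# Engine output in the shape of the example above.
output = json.loads("""
{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["facialFeatures"],
  "object": [
    {
      "type": "facial-features",
      "lipVoiceCorrelation": {"confidence": 0.9},
      "lipMovement": {"confidence": 1.5}
    }
  ]
}
""")

# Filter on the object type, then read the lip/voice correlation score.
features = [o for o in output["object"] if o["type"] == "facial-features"]
score = features[0]["lipVoiceCorrelation"]["confidence"]
print(score)  # 0.9
```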