Speaker verification engines

Article number: 000003995
Product: aiWare
API: yes, Search: yes, UI: yes

Speaker verification engines analyze human voices in media assets and score their similarity to the voice(s) of a specified user identity.

Training and libraries

The speaker verification engine is trained by running it in enroll mode, which is specified when the engine is called. The hashed voiceprint of the trained identity is stored in the library, and the hash key is stored in a separate database.

[Note] In verify mode, the engine retrieves the hashed voiceprint corresponding to the specified username from the library, decrypts it using the hash key, and compares the decrypted voiceprint to the voiceprint extracted from the input audio.

Engine input

The speaker verification engine is an audio processing engine that performs segment processing. It accepts as input a custom binary file containing the following, in this order:

  1. 8 bytes containing the number of bytes of a byte-encrypted JSON string
  2. A byte-encrypted JSON string
  3. 8 bytes containing the number of bytes of a binary audio file
  4. Binary audio file

An example of the byte-encrypted JSON string is as follows: 

{
    "mode": "verify",
    "username": "jsmith@veritone.com",
    "libraryId": "13e6f4a3-0d5c-4e11-9a30-913e981cb9ad",
    "dbUser": "postgres",
    "dbHost": "127.0.0.1",
    "dbDatabase": "postgresdb",
    "dbSchema": "public",
    "dbPort": 5432,
    "userPhrase": "hello world"
}

[Note] The userPhrase is used by the engine's transcription functionality. It is the phrase that the spoken audio must match.
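
The four-part layout described above can be sketched in Python as follows. This is a minimal illustration, not the engine's reference implementation. The byte order and signedness of the 8-byte length fields (big-endian unsigned here), the UTF-8 encoding of the JSON string, and the function names pack_engine_input and unpack_engine_input are all assumptions, because the source does not specify them. Any encryption applied to the JSON bytes is not described in the source either and is omitted.

```python
import json
import struct
from typing import Tuple

# Assumption: each length field is an 8-byte unsigned big-endian integer.
# The source only says "8 bytes"; confirm byte order before relying on this.
LENGTH_FORMAT = ">Q"
LENGTH_SIZE = struct.calcsize(LENGTH_FORMAT)  # 8 bytes


def pack_engine_input(params: dict, audio: bytes) -> bytes:
    """Build the payload: JSON length, JSON bytes, audio length, audio bytes."""
    # Assumption: the JSON string is serialized as plain UTF-8 bytes.
    json_bytes = json.dumps(params).encode("utf-8")
    return b"".join([
        struct.pack(LENGTH_FORMAT, len(json_bytes)),
        json_bytes,
        struct.pack(LENGTH_FORMAT, len(audio)),
        audio,
    ])


def unpack_engine_input(payload: bytes) -> Tuple[dict, bytes]:
    """Split a payload built by pack_engine_input back into its two parts."""
    offset = 0
    (json_len,) = struct.unpack_from(LENGTH_FORMAT, payload, offset)
    offset += LENGTH_SIZE
    params = json.loads(payload[offset:offset + json_len].decode("utf-8"))
    offset += json_len
    (audio_len,) = struct.unpack_from(LENGTH_FORMAT, payload, offset)
    offset += LENGTH_SIZE
    audio = payload[offset:offset + audio_len]
    if len(audio) != audio_len:
        raise ValueError("payload is shorter than the declared audio length")
    return params, audio
```

Writing the length before each part lets a reader skip directly to the audio without scanning the JSON for a terminator, which is presumably why the format is laid out this way.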

Engine output

The speaker verification engine output should be stored as an object in AION. The type of the object is verification. Each speaker maps back to a specified user identity, which corresponds to an entity in a library; the object therefore includes the entityId along with the libraryId. The confidence is the similarity score between the speaker's audio and the audio sample(s) for the entity.

The mode specifies whether the engine was run in enroll or verify mode. An optional auxiliary object (transcription in the example below) contains a score showing the degree of match between the transcribed audio and the userPhrase in the input JSON object.

Example

Here is an example of the simplest type of speaker verification engine output:

{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": [
    "verification"
  ],
  "object": [{
    "type": "speaker-verification",
    "mode": "verify",
    "entityId": "11a14999-0531-4d3e-9a44-68cdd4f93659",
    "libraryId": "13e6f4a3-0d5c-4e11-9a30-913e981cb9ad",
    "confidence": 0.80,
    "transcription": {
      "text": "hello world",
      "confidence": 0.80
    }
  }]
}
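
A consumer of this output might turn the two scores into an accept or reject decision. The sketch below is one hedged way to do that in Python. The function name is_verified and both threshold values are hypothetical, since the source does not define any thresholds. It also assumes that the confidence inside the transcription object is the phrase-match score described above.

```python
import json

# Hypothetical thresholds for illustration only; the source defines none.
VOICE_THRESHOLD = 0.75
PHRASE_THRESHOLD = 0.75


def is_verified(output: dict,
                voice_threshold: float = VOICE_THRESHOLD,
                phrase_threshold: float = PHRASE_THRESHOLD) -> bool:
    """Return True if any verify-mode object passes both thresholds."""
    for obj in output.get("object", []):
        if obj.get("mode") != "verify":
            continue  # enroll-mode results are not verification decisions
        if obj.get("confidence", 0.0) < voice_threshold:
            continue
        # The transcription object is optional; skip the phrase check if absent.
        transcription = obj.get("transcription")
        if (transcription is not None
                and transcription.get("confidence", 0.0) < phrase_threshold):
            continue
        return True
    return False


example = json.loads("""
{
  "object": [{
    "type": "speaker-verification",
    "mode": "verify",
    "entityId": "11a14999-0531-4d3e-9a44-68cdd4f93659",
    "libraryId": "13e6f4a3-0d5c-4e11-9a30-913e981cb9ad",
    "confidence": 0.80,
    "transcription": {"text": "hello world", "confidence": 0.80}
  }]
}
""")

print(is_verified(example))                       # True
print(is_verified(example, voice_threshold=0.9))  # False
```

Suitable thresholds depend on the deployment and should be chosen from measured false-accept and false-reject rates rather than taken from this sketch.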

Article date: 1/16/2024 10:16 PM