Title	Media configuration

URL Name	000004157

Audience	Public

Product (Internal List) aiWare - aiWare

Body

The tables below index engine capabilities to help you choose the appropriate engine type for your media.

Your media may be:

Video
Image
Audio file
Data file (such as text)

Engines for each type of media are listed below.

API calls

To call a single engine, see the API example for running a job using the launch single engine template.

Select engines in the UI

To choose engines for your media when registering engines in the Developer utility, see Step 2 - Functionality.

Video

Video can also use the engines listed under Image and Audio to process the image and audio portions of a video.

Class	Capability	Description
Biometrics	Face detection	Detects faces in an image or video.
Biometrics	Face recognition	Identifies people in an image or video by associating each individual's face with their name.
Facial Features	Facial features	Computes metrics pertaining to face movement using a series of face landmarks and audio.
Vision	License plate recognition (ALPR)	Produces a text string of alphanumeric characters for each license plate recognized in an image or video.
Vision	Logo detection	Recognizes logos or branding elements in an image or video.
Vision	Object detection	Detects objects or concepts in an image or video from a general or broad ontology, such as "car" or "person."
Vision	Text recognition (OCR)	Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video.

Image

Class	Capability	Description
Biometrics	Face detection	Detects faces in an image or video.
Biometrics	Face recognition	Identifies people in an image or video by associating each individual's face with their name.
Vision	Image classification	Classifies the entire image rather than objects within an image, such as "landscape" or "basketball game."
Vision	License plate recognition (ALPR)	Produces a text string of alphanumeric characters for each license plate recognized in an image or video.
Vision	Logo detection	Recognizes logos or branding elements in an image or video.
Vision	Object detection	Detects objects or concepts in an image or video from a general or broad ontology, such as "car" or "person."
Vision	Text recognition (OCR)	Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video.

Audio

Class	Capability	Description
Audio	Audio fingerprinting	Recognizes a specific audio segment, such as a radio advertisement, as it appears in a longer audio file or on its own.
Biometrics	Speaker verification	Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In `enroll` mode, the engine enrolls the speaker's voice into the library under the username.
Speech	Speaker detection	Speaker Separation, Diarization. Partitions an input audio stream into segments according to who is speaking when.
Speech	Speaker recognition	Speaker Identification. Identifies speakers in an audio file based on trained recordings of their voice.
Speech	Transcription	Converts speech audio to text.
Verification	Speaker verification	Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In `enroll` mode, the engine enrolls the speaker's voice into the library under the username.

Data file

Class	Capability	Description
Data	Correlation	Associates two data products based on some commonality, such as occurrence over time. For example, it may associate weather data on a given date with stock prices on that date.
Data	Geolocation	Identifies the geographic location of a person or object in the real world or some virtual equivalent.
Data	Brand safety	Processes media to determine where content falls on a scale of sensitivity or concern.
Text	Anomaly detection	Assigns a value to each item in a time series according to how anomalous the item is.
Text	Content classification	Categorizes one or multiple documents according to a pre-defined ontology.
Text	Entity extraction	Also known as named-entity recognition. Classifies named entities located in unstructured text into pre-defined categories such as people, organizations, and locations.
Text	Keyword extraction	Identifies key terms and/or phrases that appear in documents, based on parts of speech, salience, or other criteria.
Text	Language identification	Detects one or multiple natural languages in text.
Text	Sentiment analysis	Classifies text according to sentiment. May include a score representing negative, neutral or positive, or include a wider breadth of tags such as "happy" or "excited."
Text	Summarization	Generates a summary of written text.
Text	Text extraction	Extracts textual information from documents and expresses that extracted text in a structured format.
Text	Translation	Translates natural language from a text source. Includes translating plain text, rich text, extracted text, recognized text (OCR), and transcripts.
Verification	Face verification	Determines the similarity between the face in an image and the face of a person with a specified username. In `enroll` mode, the engine enrolls the face image into the library under the username.
Vision	Text recognition (OCR)	Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video.
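
Example: looking up capabilities by media type

The tables above can be read as a simple lookup from media type to capability. The following is a minimal sketch of that lookup in Python, including the rule that video also draws on the Image and Audio capabilities. The class and capability names are copied from the tables; the `CAPABILITIES` dictionary and the `capabilities_for` function are hypothetical names introduced for illustration and are not part of the aiWARE API.

```python
# Illustrative lookup built from the tables in this article.
# CAPABILITIES and capabilities_for are hypothetical names, not aiWARE API calls.
CAPABILITIES = {
    "video": [
        ("Biometrics", "Face detection"),
        ("Biometrics", "Face recognition"),
        ("Facial Features", "Facial features"),
        ("Vision", "License plate recognition (ALPR)"),
        ("Vision", "Logo detection"),
        ("Vision", "Object detection"),
        ("Vision", "Text recognition (OCR)"),
    ],
    "image": [
        ("Biometrics", "Face detection"),
        ("Biometrics", "Face recognition"),
        ("Vision", "Image classification"),
        ("Vision", "License plate recognition (ALPR)"),
        ("Vision", "Logo detection"),
        ("Vision", "Object detection"),
        ("Vision", "Text recognition (OCR)"),
    ],
    "audio": [
        ("Audio", "Audio fingerprinting"),
        ("Biometrics", "Speaker verification"),
        ("Speech", "Speaker detection"),
        ("Speech", "Speaker recognition"),
        ("Speech", "Transcription"),
        ("Verification", "Speaker verification"),
    ],
    "data": [
        ("Data", "Correlation"),
        ("Data", "Geolocation"),
        ("Data", "Brand safety"),
        ("Text", "Anomaly detection"),
        ("Text", "Content classification"),
        ("Text", "Entity extraction"),
        ("Text", "Keyword extraction"),
        ("Text", "Language identification"),
        ("Text", "Sentiment analysis"),
        ("Text", "Summarization"),
        ("Text", "Text extraction"),
        ("Text", "Translation"),
        ("Verification", "Face verification"),
        ("Vision", "Text recognition (OCR)"),
    ],
}


def capabilities_for(media_type):
    """Return (class, capability) pairs for a media type.

    For video, the Image and Audio rows are appended as well, because
    video can use those engines for its image and audio portions.
    """
    key = media_type.strip().lower()
    if key not in CAPABILITIES:
        raise ValueError(f"Unknown media type: {media_type!r}")
    rows = list(CAPABILITIES[key])
    if key == "video":
        for extra in ("image", "audio"):
            for row in CAPABILITIES[extra]:
                if row not in rows:  # skip rows already listed under Video
                    rows.append(row)
    return rows


if __name__ == "__main__":
    for engine_class, capability in capabilities_for("video"):
        print(f"{engine_class}: {capability}")
```

Keeping the class alongside each capability preserves the distinction between rows that share a capability name, such as Speaker verification, which appears under both Biometrics and Verification.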