These tables index engine capabilities to help you choose the appropriate engine type for your media.
Your media may be:
- Video
- Image
- Audio file
- Data file (such as text)
Engines for each type of media are listed below.
API calls
To run a single engine, see the API example for running a job using the launch single engine template.
Select engines in the UI
To choose engines for your media when registering engines in the Developer utility, see Step 2 - Functionality.
Video
Video also uses the engines listed under Image and Audio to process the image and audio portions of a video.
| Class | Capability | Description |
|---|---|---|
| Biometrics | Face detection | Detects faces in an image or video. |
| Biometrics | Face recognition | Identifies people in an image or video by associating each individual's face with their name. |
| Facial Features | Facial features | Computes metrics pertaining to face movement using a series of face landmarks and audio. |
| Vision | License plate recognition (ALPR) | Produces a text string of alphanumeric characters for each license plate recognized in an image or video. |
| Vision | Logo detection | Recognizes logos or branding elements in an image or video. |
| Vision | Object detection | Detects objects or concepts in an image or video from a general or broad ontology, such as "car" or "person." |
| Vision | Text recognition (OCR) | Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video. |
Image
| Class | Capability | Description |
|---|---|---|
| Biometrics | Face detection | Detects faces in an image or video. |
| Biometrics | Face recognition | Identifies people in an image or video by associating each individual's face with their name. |
| Vision | Image classification | Classifies the entire image rather than objects within an image, such as "landscape" or "basketball game." |
| Vision | License plate recognition (ALPR) | Produces a text string of alphanumeric characters for each license plate recognized in an image or video. |
| Vision | Logo detection | Recognizes logos or branding elements in an image or video. |
| Vision | Object detection | Detects objects or concepts in an image or video from a general or broad ontology, such as "car" or "person." |
| Vision | Text recognition (OCR) | Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video. |
Audio
| Class | Capability | Description |
|---|---|---|
| Audio | Audio fingerprinting | Recognizes a specific audio segment, such as a radio advertisement, as it appears in a longer audio file or on its own. |
| Biometrics | Speaker verification | Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In enroll mode, the engine enrolls the speaker's voice into the library under the username. |
| Speech | Speaker detection | Also known as speaker separation or diarization. Partitions an input audio stream into segments according to who is speaking when. |
| Speech | Speaker recognition | Also known as speaker identification. Identifies speakers in an audio file based on trained recordings of their voice. |
| Speech | Transcription | Converts speech audio to text. |
| Verification | Speaker verification | Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In enroll mode, the engine enrolls the speaker's voice into the library under the username. |
Data file
| Class | Capability | Description |
|---|---|---|
| Data | Correlation | Associates two data products based on some commonality, such as occurrence over time. For example, may associate weather data on a given date with stock prices on that date. |
| Data | Geolocation | Identifies the geographic location of a person or object in the real world or some virtual equivalent. |
| Data | Brand safety | Processes media to determine where content falls on a scale of sensitivity or concern. |
| Text | Anomaly detection | Assigns a value to each item in a time series according to how anomalous that item is. |
| Text | Content classification | Categorizes one or multiple documents according to a pre-defined ontology. |
| Text | Entity extraction | Also known as named-entity recognition. Classifies named entities located in unstructured text into pre-defined categories such as people, organizations, and locations. |
| Text | Keyword extraction | Identifies key terms and/or phrases that appear in documents, based on parts of speech, salience, or other criteria. |
| Text | Language identification | Detects one or multiple natural languages in text. |
| Text | Sentiment analysis | Classifies text according to sentiment. May include a score representing negative, neutral or positive, or include a wider breadth of tags such as "happy" or "excited." |
| Text | Summarization | Generates a summary of written text. |
| Text | Text extraction | Extracts textual information from documents and expresses that extracted text in a structured format. |
| Text | Translation | Translates natural language from a text source. Includes translating plain text, rich text, extracted text, recognized text (OCR), and transcripts. |
| Verification | Face verification | Determines the similarity between the face in an image and the face associated with a specified username. In enroll mode, the engine enrolls the face image into the library under the username. |
| Vision | Text recognition (OCR) | Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video. |
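
As a rough illustration of how the tables above might be used when choosing engines, the following Python sketch stores the capability names per media type and applies the rule that video also draws on the Image and Audio engines. The names `CAPABILITIES` and `capabilities_for` are hypothetical and are not part of any engine API; the capability names are copied from the tables.

```python
from typing import Dict, Set

# Hypothetical lookup built from the tables above. These names are
# illustrative only and do not correspond to any real API.
CAPABILITIES: Dict[str, Set[str]] = {
    "video": {
        "Face detection", "Face recognition", "Facial features",
        "License plate recognition (ALPR)", "Logo detection",
        "Object detection", "Text recognition (OCR)",
    },
    "image": {
        "Face detection", "Face recognition", "Image classification",
        "License plate recognition (ALPR)", "Logo detection",
        "Object detection", "Text recognition (OCR)",
    },
    "audio": {
        "Audio fingerprinting", "Speaker verification", "Speaker detection",
        "Speaker recognition", "Transcription",
    },
    "data": {
        "Correlation", "Geolocation", "Brand safety", "Anomaly detection",
        "Content classification", "Entity extraction", "Keyword extraction",
        "Language identification", "Sentiment analysis", "Summarization",
        "Text extraction", "Translation", "Face verification",
        "Text recognition (OCR)",
    },
}


def capabilities_for(media_type: str) -> Set[str]:
    """Return the capability names that apply to a media type."""
    key = media_type.lower()
    if key not in CAPABILITIES:
        raise ValueError(f"unknown media type: {media_type!r}")
    result = set(CAPABILITIES[key])
    if key == "video":
        # Video also uses the engines listed under Image and Audio.
        result |= CAPABILITIES["image"] | CAPABILITIES["audio"]
    return result
```

For example, `capabilities_for("video")` includes "Transcription" because of the audio track, while `capabilities_for("image")` does not.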