| Class | Capability | Description |
|---|---|---|
| Audio | Audio fingerprinting | Recognizes a specific audio segment, such as a radio advertisement, as it appears in a longer audio file or on its own. |
| Biometrics | Face detection | Detects the presence of one or multiple faces in an image or video. |
| Biometrics | Face recognition | Identifies one or multiple people in an image or video by associating each individual's face with their name. |
| Biometrics | Speaker verification | Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In enroll mode, the engine enrolls the speaker's voice into the library under the username. |
| Data | Correlation | Associates two data products based on some commonality, such as occurrence over time. For example, it may associate weather data on a given date with stock prices on that date. |
| Data | Geolocation | Identifies the geographic location of a person or object in the real world or some virtual equivalent. |
| Data | Brand safety | Processes media to determine where content falls on a scale of sensitivity or concern. |
| Facial features | Facial features | Computes metrics pertaining to face movement using a series of face landmarks and audio. |
| Speech | Speaker detection | aka Speaker Separation, Diarization. Partitions an input audio stream into segments according to who is speaking when. |
| Speech | Speaker recognition | aka Speaker Identification. Identifies speakers in an audio file based on trained recordings of their voice. |
| Speech | Transcription | Converts speech audio to text. |
| Text | Anomaly detection | Assigns a value to each item in a time series according to how anomalous the item is. |
| Text | Content classification | Categorizes one or multiple documents according to a pre-defined ontology. |
| Text | Entity extraction | aka Named-entity recognition. Classifies named entities located in unstructured text into pre-defined categories such as people, organizations and locations. |
| Text | Keyword extraction | Identifies key terms and/or phrases that appear in documents, based on parts of speech, salience, or other criteria. |
| Text | Language identification | Detects one or multiple natural languages in text. |
| Text | Sentiment analysis | Classifies text according to sentiment. May include a score representing negative, neutral or positive, or include a wider breadth of tags such as "happy" or "excited." |
| Text | Summarization | Generates a summary of written text. |
| Text | Text extraction | Extracts textual information from documents and expresses that extracted text in a structured format. |
| Text | Translation | Translates natural language from a text source. Includes translating plain text, rich text, extracted text, recognized text (OCR), and transcripts. |
| Verification | Face verification | Determines the similarity between the face in an image and the face of a person with a specified username. In enroll mode, the engine enrolls the face image into the library under the username. |
| Verification | Speaker verification | Determines the similarity between the speaker's voice in an audio file and the voice of a person with a specified username. In enroll mode, the engine enrolls the speaker's voice into the library under the username. |
| Vision | Image classification | Classifies the entire image rather than objects within an image, such as "landscape" or "basketball game." |
| Vision | License plate recognition (ALPR) | Produces a text string of alphanumeric characters for each license plate recognized in an image or video. |
| Vision | Logo detection | Recognizes one or more logos or branding elements in an image or video. |
| Vision | Object detection | Detects one or multiple objects or concepts in an image or video from a general/broad ontology, such as "car" or "person." |
| Vision | Text recognition | aka Optical Character Recognition. Converts alphanumeric characters to text in a document, image, or video. |
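
The enroll and verify modes described for the face and speaker verification capabilities can be illustrated with a minimal sketch. It assumes each face image or voice sample has already been reduced to a numeric feature vector and that similarity is measured with cosine similarity. Neither assumption comes from the table above, and `VerificationLibrary`, `enroll`, and `verify` are hypothetical names, not part of any engine API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero-length vector has no direction, so treat it as dissimilar.
    return dot / norm if norm else 0.0


class VerificationLibrary:
    """Hypothetical in-memory library mapping usernames to enrolled features."""

    def __init__(self):
        self._enrolled = {}

    def enroll(self, username, features):
        # Enroll mode: store the sample's features under the username.
        self._enrolled[username] = list(features)

    def verify(self, username, features):
        # Verify mode: compare a new sample against the enrolled one.
        if username not in self._enrolled:
            raise KeyError(f"no enrollment found for {username!r}")
        return cosine_similarity(self._enrolled[username], features)


library = VerificationLibrary()
library.enroll("alice", [0.9, 0.1, 0.3])       # illustrative feature vector
score = library.verify("alice", [0.8, 0.2, 0.3])
print(f"similarity: {score:.3f}")
```

A real engine would derive the feature vectors from the media itself and would typically compare the score against a tuned threshold before accepting a match. Both steps are left out here.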