An object detection engine detects one or more objects in an image or video, based on a general (high-level) ontology. For example, the engine might detect objects such as person, desk, or aircraft.
Engine output
Report generally detected objects in engine output as an object whose type is object. An object definition can appear either in the top-level object array (for detections that are not time-based) or under the object key of an entry in the series array (for time-based detections).
For the full definition, see the official JSON Schema for the object validation contract.
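The two placements can be sketched in Python as follows. This is a minimal illustration, not part of any Veritone SDK: the helper names `make_object`, `time_series_output`, and `whole_media_output` are hypothetical, and the field names and `schemaId` value are copied from the time-series example on this page.

```python
# Hypothetical helpers for illustration only; not a Veritone API.
SCHEMA_ID = "https://docs.veritone.com/schemas/vtn-standard/master.json"


def make_object(label, confidence, bounding_poly=None):
    """Build one detection of type "object"; boundingPoly is optional."""
    obj = {"type": "object", "label": label, "confidence": confidence}
    if bounding_poly is not None:
        # bounding_poly is a sequence of (x, y) pairs
        obj["boundingPoly"] = [{"x": x, "y": y} for x, y in bounding_poly]
    return obj


def time_series_output(detections):
    """detections: iterable of (start_ms, stop_ms, object_dict) tuples."""
    return {
        "schemaId": SCHEMA_ID,
        "validationContracts": ["object"],
        "series": [
            {"startTimeMs": start, "stopTimeMs": stop, "object": obj}
            for start, stop, obj in detections
        ],
    }


def whole_media_output(objects):
    """Place detections in the top-level object array (not time-based)."""
    return {
        "schemaId": SCHEMA_ID,
        "validationContracts": ["object"],
        "object": list(objects),
    }
```

Passing the result of either function to `json.dumps` from the standard library would produce output shaped like the examples in the following sections.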
Time-series example
Here is an example of proper engine output for objects detected within a time series:
{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["object"],
  "series": [
    {
      "startTimeMs": 2000,
      "stopTimeMs": 2100,
      "object": {
        "type": "object",
        "label": "dog",
        "confidence": 0.9,
        "boundingPoly": [
          {
            "x": 0.1,
            "y": 0.1
          },
          {
            "x": 0.1,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.1
          }
        ]
      }
    },
    {
      "startTimeMs": 2100,
      "stopTimeMs": 2200,
      "object": {
        "type": "object",
        "label": "cat",
        "confidence": 0.55
      }
    }
  ]
}
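A consumer of this output often needs the polygon in pixel coordinates. The sketch below makes one loud assumption that this page does not state: that the x and y values in boundingPoly are fractions (0.0 to 1.0) of the frame width and height. The function names `poly_to_pixels` and `bounding_box` are hypothetical.

```python
def poly_to_pixels(bounding_poly, width, height):
    """Convert boundingPoly points to integer pixel coordinates.

    ASSUMPTION: x and y are fractions of the frame width and height.
    """
    return [
        (round(point["x"] * width), round(point["y"] * height))
        for point in bounding_poly
    ]


def bounding_box(points):
    """Return (left, top, right, bottom) enclosing the given pixel points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# The "dog" polygon from the example above, on a 1920x1080 frame.
dog_poly = [
    {"x": 0.1, "y": 0.1},
    {"x": 0.1, "y": 0.5},
    {"x": 0.5, "y": 0.5},
    {"x": 0.5, "y": 0.1},
]
pixels = poly_to_pixels(dog_poly, 1920, 1080)
box = bounding_box(pixels)
```

Under that assumption, the dog's polygon maps to the box (192, 108, 960, 540). The "cat" entry has no boundingPoly, so callers should check for the key before converting.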
Non-time-series example
Here is an example of proper engine output for objects detected over the entire input media:
{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["object"],
  "object": [
    {
      "type": "object",
      "label": "dog",
      "confidence": 0.9,
      "boundingPoly": [
        {
          "x": 0.1,
          "y": 0.1
        },
        {
          "x": 0.1,
          "y": 0.5
        },
        {
          "x": 0.5,
          "y": 0.5
        },
        {
          "x": 0.5,
          "y": 0.1
        }
      ]
    }
  ]
}
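Because detections can live in either place, code that reads engine output may want to handle both layouts in one pass. The following is a minimal sketch under that reading of the format; `iter_objects` and `labels_above` are hypothetical names, and the confidence threshold is an arbitrary illustration.

```python
def iter_objects(output):
    """Yield (start_ms, stop_ms, obj) for each detection of type "object".

    Detections from the top-level object array have no time range,
    so their start and stop values are None.
    """
    for obj in output.get("object", []):
        if obj.get("type") == "object":
            yield None, None, obj
    for entry in output.get("series", []):
        obj = entry.get("object")
        if obj and obj.get("type") == "object":
            yield entry.get("startTimeMs"), entry.get("stopTimeMs"), obj


def labels_above(output, min_confidence):
    """Return labels of detections whose confidence meets the threshold."""
    return [
        obj["label"]
        for _, _, obj in iter_objects(output)
        if obj.get("confidence", 0.0) >= min_confidence
    ]


# A trimmed version of the time-series example from this page.
sample = {
    "series": [
        {"startTimeMs": 2000, "stopTimeMs": 2100,
         "object": {"type": "object", "label": "dog", "confidence": 0.9}},
        {"startTimeMs": 2100, "stopTimeMs": 2200,
         "object": {"type": "object", "label": "cat", "confidence": 0.55}},
    ]
}
confident_labels = labels_above(sample, 0.6)
```

With the sample above, only the "dog" detection passes a 0.6 threshold. Treating a missing confidence as 0.0 is a design choice of this sketch, not something the format prescribes.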