API: yes | Search: yes | UI: yes
Text recognition engines process documents (primarily images), recognize the text they contain, and express that text in a structured format. Optical character recognition (OCR) is a technology commonly used to implement text recognition engines.
Their output data structure is similar to that of text extraction engines. The difference is the input: text extraction engines pull text content out of semi-structured files, such as PDFs or Microsoft Word documents, while text recognition engines recognize text in unstructured files, such as images.
Engine output
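Text recognition engine output takes one of two forms, depending on whether the processed file is time-based (e.g. audio, video) or non-time-based (e.g. image). In time-based output, each text object is wrapped in a "series" entry that carries "startTimeMs" and "stopTimeMs"; in non-time-based output, the text objects are listed directly in a top-level "object" array.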
See the text validation contract json-schema.
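As a rough illustration of the two forms, the sketch below decides which form a parsed output uses and runs a few structural checks on each text object. It is a minimal, hypothetical helper, not Veritone's validator: the function names are invented, the checks cover only fields visible in the examples on this page, and the assumption that "boundingPoly" coordinates lie in [0, 1] is inferred from those examples. The text validation contract json-schema remains the authoritative definition.

```python
import json


def detect_text_output_form(doc: dict) -> str:
    """Return "time-based" or "non-time-based" for a parsed engine output.

    Assumption: time-based output keeps its text objects in a top-level
    "series" list, non-time-based output in a top-level "object" list.
    """
    has_series = isinstance(doc.get("series"), list)
    has_object = isinstance(doc.get("object"), list)
    if has_series and not has_object:
        return "time-based"
    if has_object and not has_series:
        return "non-time-based"
    raise ValueError("expected exactly one of 'series' or 'object'")


def check_text_object(obj: dict) -> None:
    """Hypothetical minimal checks; the real json-schema is authoritative."""
    if obj.get("type") != "text":
        raise ValueError("object type must be 'text'")
    if not isinstance(obj.get("text"), str):
        raise ValueError("object must carry a 'text' string")
    for point in obj.get("boundingPoly", []):
        # Assumed normalized coordinates, as in the examples.
        if not (0.0 <= point["x"] <= 1.0 and 0.0 <= point["y"] <= 1.0):
            raise ValueError("boundingPoly coordinates should be in [0, 1]")


def check_text_output(doc: dict) -> str:
    """Check the declared contract and every text object; return the form."""
    if "text" not in doc.get("validationContracts", []):
        raise ValueError("'text' validation contract not declared")
    form = detect_text_output_form(doc)
    if form == "time-based":
        objects = [entry["object"] for entry in doc["series"]]
    else:
        objects = doc["object"]
    for obj in objects:
        check_text_object(obj)
    return form


sample = json.loads(
    '{"validationContracts": ["text"], "object": [{"type": "text", "text": "hi"}]}'
)
print(check_text_output(sample))  # prints "non-time-based"
```

Branching on the top-level key keeps the rest of a consumer's code independent of the form: both paths end with the same list of text objects.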
Example: time-based
{
"schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
"validationContracts": ["text"],
"series": [
{
"startTimeMs": 12000,
"stopTimeMs": 13000,
"object": {
"type": "text",
"text": "the quick brown fox",
"boundingPoly": [
{
"x": 0.1,
"y": 0.1
},
{
"x": 0.1,
"y": 0.5
},
{
"x": 0.5,
"y": 0.5
},
{
"x": 0.5,
"y": 0.1
}
]
}
},
{
"startTimeMs": 13000,
"stopTimeMs": 14000,
"object": {
"type": "text",
"text": "the quick brown fox jumped over the lazy dog",
"boundingPoly": [
{
"x": 0.1,
"y": 0.1
},
{
"x": 0.1,
"y": 0.5
},
{
"x": 0.5,
"y": 0.5
},
{
"x": 0.5,
"y": 0.1
}
]
}
}
]
}
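A consumer of time-based output typically needs each text segment with its time window and an on-frame location. The sketch below walks "series" in start-time order and turns each "boundingPoly" into an axis-aligned pixel box. It assumes, based only on the 0 to 1 values in the example, that x and y are fractions of the frame width and height; the helper names and the 1920x1080 frame size are hypothetical.

```python
from typing import Iterator, NamedTuple, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


class TimedText(NamedTuple):
    start_ms: int
    stop_ms: int
    text: str
    box: Optional[Box]


def poly_to_pixel_box(poly: list, width: int, height: int) -> Optional[Box]:
    """Axis-aligned pixel box around a boundingPoly.

    Assumes x and y are fractions of the frame width and height.
    Returns None when no polygon is present.
    """
    if not poly:
        return None
    xs = [p["x"] for p in poly]
    ys = [p["y"] for p in poly]
    return (round(min(xs) * width), round(min(ys) * height),
            round(max(xs) * width), round(max(ys) * height))


def iter_timed_text(doc: dict, width: int, height: int) -> Iterator[TimedText]:
    """Yield text segments from a time-based output, ordered by start time."""
    for entry in sorted(doc["series"], key=lambda e: e["startTimeMs"]):
        obj = entry["object"]
        if obj.get("type") != "text":
            continue
        yield TimedText(entry["startTimeMs"], entry["stopTimeMs"], obj["text"],
                        poly_to_pixel_box(obj.get("boundingPoly", []), width, height))


# First series entry from the example above, on an assumed 1920x1080 frame.
example = {"series": [{
    "startTimeMs": 12000, "stopTimeMs": 13000,
    "object": {"type": "text", "text": "the quick brown fox",
               "boundingPoly": [{"x": 0.1, "y": 0.1}, {"x": 0.1, "y": 0.5},
                                {"x": 0.5, "y": 0.5}, {"x": 0.5, "y": 0.1}]}}]}

for seg in iter_timed_text(example, 1920, 1080):
    print(seg.start_ms, seg.stop_ms, seg.text, seg.box)
# prints: 12000 13000 the quick brown fox (192, 108, 960, 540)
```

Sorting by "startTimeMs" is a defensive choice in this sketch; it does not rely on the engine emitting series entries in order.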
Example: non-time-based
{
"schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
"validationContracts": ["text"],
"language": "en-US",
"object": [
{
"type": "text",
"text": "The quick brown fox jumped over the lazy dog.",
"page": 5,
"paragraph": 3,
"sentence": 2,
"boundingPoly": [
{
"x": 0.1,
"y": 0.1
},
{
"x": 0.1,
"y": 0.2
},
{
"x": 0.2,
"y": 0.2
},
{
"x": 0.2,
"y": 0.1
}
]
},
{
"type": "text",
"text": "That worried the dog, but he was too lazy to do anything about it.",
"page": 5,
"paragraph": 3,
"sentence": 3,
"boundingPoly": [
{
"x": 0.2,
"y": 0.2
},
{
"x": 0.2,
"y": 0.3
},
{
"x": 0.3,
"y": 0.3
},
{
"x": 0.3,
"y": 0.2
}
]
}
]
}
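In non-time-based output, each text object can carry "page", "paragraph", and "sentence" indexes, so the original reading order can be rebuilt from them. The sketch below groups objects by (page, paragraph) and joins sentences in "sentence" order. The function name is hypothetical, and defaulting a missing index to 0 is an assumption of this sketch rather than schema behavior. The sample input lists the two sentences from the example in reverse order to show the sorting.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def paragraphs_by_page(doc: dict) -> Dict[Tuple[int, int], str]:
    """Rebuild paragraph text from a non-time-based output.

    Groups text objects by (page, paragraph) and joins their sentences in
    "sentence" order. Missing indexes default to 0 (an assumption).
    """
    grouped: Dict[Tuple[int, int], List[Tuple[int, str]]] = defaultdict(list)
    for obj in doc.get("object", []):
        if obj.get("type") != "text":
            continue
        key = (obj.get("page", 0), obj.get("paragraph", 0))
        grouped[key].append((obj.get("sentence", 0), obj["text"]))
    # Sort paragraphs by (page, paragraph) and sentences by sentence index.
    return {
        key: " ".join(text for _, text in sorted(sentences))
        for key, sentences in sorted(grouped.items())
    }


example = {"language": "en-US", "object": [
    {"type": "text",
     "text": "That worried the dog, but he was too lazy to do anything about it.",
     "page": 5, "paragraph": 3, "sentence": 3},
    {"type": "text",
     "text": "The quick brown fox jumped over the lazy dog.",
     "page": 5, "paragraph": 3, "sentence": 2},
]}

# One paragraph keyed (5, 3), with sentence 2 placed before sentence 3.
print(paragraphs_by_page(example))
```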
Translating recognized text
Some translation engines take the output of text recognition engines as their input. To learn how those engines are built, see Translating Recognized (OCR) Text.