Title	Text recognition engines

URL Name	000004074

Audience	Public

Product (Internal List) aiWare - aiWare

Body

API: yes

Search: yes

UI: yes

Text recognition engines process documents, primarily images, to recognize the text they contain and express that text in a structured format. They are often implemented with optical character recognition (OCR).

Their output data structure is similar to that of text extraction engines. The difference is the input: text extraction engines extract text content from semi-structured files such as PDFs or Microsoft Word documents, while text recognition engines recognize text in unstructured files such as images.

Engine output

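Text recognition engine output takes one of two forms, depending on whether the processed file is time-based (e.g. audio or video) or non-time-based (e.g. an image). In time-based output, each recognized text object sits in a "series" entry that carries "startTimeMs" and "stopTimeMs". In non-time-based output, text objects are listed in a top-level "object" array and located by "page", "paragraph", and "sentence" instead.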

See the text validation contract json-schema.
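Because the two forms differ in their top-level keys, a consumer can tell them apart before reading any text. The Python sketch below does this for a parsed output document. It assumes only the shapes shown in the examples in this article; output_form and declares_text_contract are hypothetical helper names, and the check is not a replacement for validating against the json-schema.

```python
from typing import Any


def output_form(doc: dict[str, Any]) -> str:
    """Classify a parsed text recognition output by its top-level shape.

    Assumption: time-based output has a "series" array and non-time-based
    output has a top-level "object" array, as in this article's examples.
    This is a quick shape check, not JSON Schema validation.
    """
    if isinstance(doc.get("series"), list):
        return "time-based"
    if isinstance(doc.get("object"), list):
        return "non-time-based"
    raise ValueError("expected a 'series' or an 'object' array")


def declares_text_contract(doc: dict[str, Any]) -> bool:
    """Return True if "text" is listed in the document's validationContracts."""
    return "text" in doc.get("validationContracts", [])


video_output = {"validationContracts": ["text"], "series": []}
image_output = {"validationContracts": ["text"], "language": "en-US", "object": []}
print(output_form(video_output))             # time-based
print(output_form(image_output))             # non-time-based
print(declares_text_contract(image_output))  # True
```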

Example: time-based

{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["text"],
  "series": [
    {
      "startTimeMs": 12000,
      "stopTimeMs": 13000,
      "object": {
        "type": "text",
        "text": "the quick brown fox",
        "boundingPoly": [
          {
            "x": 0.1,
            "y": 0.1
          },
          {
            "x": 0.1,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.1
          }
        ]
      }
    },
    {
      "startTimeMs": 13000,
      "stopTimeMs": 14000,
      "object": {
        "type": "text",
        "text": "the quick brown fox jumped over the lazy dog",
        "boundingPoly": [
          {
            "x": 0.1,
            "y": 0.1
          },
          {
            "x": 0.1,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.1
          }
        ]
      }
    }
  ]
}
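To read time-based output, walk the "series" array and pair each text object with its time range. The following Python sketch does that for an abbreviated copy of the example above (boundingPoly omitted). The helper names text_segments and text_at are hypothetical, and treating each range as half-open, so that adjacent segments do not both match their shared boundary, is an assumption of this sketch rather than something the format is documented to specify.

```python
import json

# Abbreviated copy of the time-based example above (boundingPoly omitted).
EXAMPLE = """
{
  "validationContracts": ["text"],
  "series": [
    {"startTimeMs": 12000, "stopTimeMs": 13000,
     "object": {"type": "text", "text": "the quick brown fox"}},
    {"startTimeMs": 13000, "stopTimeMs": 14000,
     "object": {"type": "text", "text": "the quick brown fox jumped over the lazy dog"}}
  ]
}
"""


def text_segments(doc):
    """Yield (start_ms, stop_ms, text) for each text object in a time-based output."""
    for entry in doc.get("series", []):
        obj = entry.get("object", {})
        if obj.get("type") != "text":
            continue  # skip anything that is not a text object
        start, stop = entry["startTimeMs"], entry["stopTimeMs"]
        if stop < start:
            raise ValueError(f"segment ends before it starts: {entry}")
        yield start, stop, obj["text"]


def text_at(doc, time_ms):
    """Return the text recognized at time_ms, or None if no segment covers it.

    Assumption: each segment covers the half-open range [startTimeMs, stopTimeMs).
    """
    for start, stop, text in text_segments(doc):
        if start <= time_ms < stop:
            return text
    return None


doc = json.loads(EXAMPLE)
for start, stop, text in text_segments(doc):
    print(f"{start / 1000:.1f}s-{stop / 1000:.1f}s: {text}")
print(text_at(doc, 13500))
```

Run against the example, this prints the two segments with their times and then the longer string, since 13500 ms falls in the second segment.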

Example: non-time-based

{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["text"],
  "language": "en-US",
  "object": [
    {
      "type": "text",
      "text": "The quick brown fox jumped over the lazy dog.",
      "page": 5,
      "paragraph": 3,
      "sentence": 2,
      "boundingPoly": [
        {
          "x": 0.1,
          "y": 0.1
        },
        {
          "x": 0.1,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.1
        }
      ]
    },
    {
      "type": "text",
      "text": "That worried the dog, but he was too lazy to do anything about it.",
      "page": 5,
      "paragraph": 3,
      "sentence": 3,
      "boundingPoly": [
        {
          "x": 0.2,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.3
        },
        {
          "x": 0.3,
          "y": 0.3
        },
        {
          "x": 0.3,
          "y": 0.2
        }
      ]
    }
  ]
}
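For non-time-based output, the page, paragraph, and sentence fields let a consumer rebuild running text. The following Python sketch groups text objects by page and paragraph and joins them in sentence order; the sample lists the two sentences from the example above in reverse order to show the sort. It also computes the axis-aligned box around a boundingPoly. It assumes every text object carries page, paragraph, and sentence, as in the example; paragraphs and bounding_box are hypothetical helper names.

```python
from collections import defaultdict

# The two objects from the non-time-based example, deliberately out of order.
objects = [
    {"type": "text",
     "text": "That worried the dog, but he was too lazy to do anything about it.",
     "page": 5, "paragraph": 3, "sentence": 3,
     "boundingPoly": [{"x": 0.2, "y": 0.2}, {"x": 0.2, "y": 0.3},
                      {"x": 0.3, "y": 0.3}, {"x": 0.3, "y": 0.2}]},
    {"type": "text",
     "text": "The quick brown fox jumped over the lazy dog.",
     "page": 5, "paragraph": 3, "sentence": 2,
     "boundingPoly": [{"x": 0.1, "y": 0.1}, {"x": 0.1, "y": 0.2},
                      {"x": 0.2, "y": 0.2}, {"x": 0.2, "y": 0.1}]},
]


def paragraphs(objs):
    """Map (page, paragraph) to its sentences joined in sentence order.

    Assumption: every text object has "page", "paragraph" and "sentence".
    """
    grouped = defaultdict(list)
    for obj in objs:
        if obj.get("type") == "text":
            grouped[(obj["page"], obj["paragraph"])].append(obj)
    return {
        key: " ".join(o["text"] for o in sorted(items, key=lambda o: o["sentence"]))
        for key, items in sorted(grouped.items())
    }


def bounding_box(poly):
    """Return (min_x, min_y, max_x, max_y) enclosing the polygon's points."""
    xs = [p["x"] for p in poly]
    ys = [p["y"] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


print(paragraphs(objects))
print(bounding_box(objects[1]["boundingPoly"]))
```

For the sample, paragraphs returns a single entry keyed (5, 3) containing both sentences in order, and the box around the first example sentence is (0.1, 0.1, 0.2, 0.2).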

Translating recognized text

Some translation engines take text recognition engine output as their input. To learn how those engines are built, see Translating Recognized (OCR) Text.
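The linked article is the reference for how such engines are built. As a rough illustration only, the Python sketch below assumes a translation step that keeps the structure of the text recognition output and replaces each "text" value, so timing, page, paragraph, sentence, and boundingPoly stay attached to the translated string. text_objects, translate_output, and the translate callable are hypothetical names, and str.upper stands in for a real translator.

```python
import copy
from typing import Any, Callable, Optional


def text_objects(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the text objects of either output form, in document order."""
    if "series" in doc:
        objs = [entry["object"] for entry in doc["series"] if "object" in entry]
    else:
        objs = doc.get("object", [])
    return [o for o in objs if o.get("type") == "text"]


def translate_output(
    doc: dict[str, Any],
    translate: Callable[[str], str],
    target_language: Optional[str] = None,
) -> dict[str, Any]:
    """Return a copy of doc with every "text" value passed through translate.

    Hypothetical helper. All other fields are copied unchanged. If
    target_language is given, it replaces the top-level "language" value
    (an assumption about how a translated document should be labeled).
    """
    out = copy.deepcopy(doc)  # never mutate the engine's original output
    for obj in text_objects(out):
        obj["text"] = translate(obj["text"])
    if target_language is not None:
        out["language"] = target_language
    return out


# str.upper stands in for a real translator, for illustration only.
source = {"language": "en-US",
          "object": [{"type": "text", "text": "The quick brown fox.",
                      "page": 1, "paragraph": 1, "sentence": 1}]}
result = translate_output(source, str.upper)
print(result["object"][0]["text"])  # THE QUICK BROWN FOX.
print(source["object"][0]["text"])  # The quick brown fox.
```

Working on a deep copy keeps the original engine output intact, so both versions can be stored side by side.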

Created Date	5/7/2024 6:26 PM

Last Modified Date	5/7/2024 6:26 PM

Last Published Date	5/7/2024 6:26 PM

Article Record Type	Documentation

Veritone Record Type	Documentation

Article Number	000004074