Text recognition engines

Article number: 000004074
Product: aiWare

Text recognition engines process documents (primarily images) to recognize text in them and express that recognized text in a structured format. Optical character recognition (OCR) is a technology that is often used to implement text recognition engines.

Their output data structure is similar to that of text extraction engines. The difference is in the input: text extraction engines extract text content from semi-structured files, such as PDFs or Microsoft Word documents, while text recognition engines recognize text in unstructured files, such as images.

Engine output

Text recognition engine output takes one of two forms, depending on whether the file being processed is a time-based file (e.g. audio, video) or a non-time-based file (e.g. image).

See the text validation contract JSON schema.

Example: time-based

{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["text"],
  "series": [
    {
      "startTimeMs": 12000,
      "stopTimeMs": 13000,
      "object": {
        "type": "text",
        "text": "the quick brown fox",
        "boundingPoly": [
          {
            "x": 0.1,
            "y": 0.1
          },
          {
            "x": 0.1,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.1
          }
        ]
      }
    },
    {
      "startTimeMs": 13000,
      "stopTimeMs": 14000,
      "object": {
        "type": "text",
        "text": "the quick brown fox jumped over the lazy dog",
        "boundingPoly": [
          {
            "x": 0.1,
            "y": 0.1
          },
          {
            "x": 0.1,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.5
          },
          {
            "x": 0.5,
            "y": 0.1
          }
        ]
      }
    }
  ]
}
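
As a rough illustration of how an engine might assemble the time-based form above, the following Python sketch turns a list of detections into a "series" array. The helper names (rect_to_bounding_poly, build_time_based_output) and the detection dictionary keys are hypothetical. It also assumes that boundingPoly coordinates are fractions of the frame size between 0 and 1, which the example values suggest but this article does not state.

```python
SCHEMA_ID = "https://docs.veritone.com/schemas/vtn-standard/master.json"


def rect_to_bounding_poly(left, top, width, height, frame_w, frame_h):
    """Convert a pixel rectangle into four corner points.

    Assumption: coordinates are expressed as fractions of the frame size.
    Corner order mirrors the example above: top-left, bottom-left,
    bottom-right, top-right.
    """
    x0, y0 = left / frame_w, top / frame_h
    x1, y1 = (left + width) / frame_w, (top + height) / frame_h
    return [
        {"x": x0, "y": y0},
        {"x": x0, "y": y1},
        {"x": x1, "y": y1},
        {"x": x1, "y": y0},
    ]


def build_time_based_output(detections, frame_w, frame_h):
    """Build a time-based output document from hypothetical detections.

    Each detection is a dict with the keys "text", "start_ms", "stop_ms"
    and "rect" (left, top, width, height in pixels).
    """
    series = []
    for d in detections:
        series.append({
            "startTimeMs": d["start_ms"],
            "stopTimeMs": d["stop_ms"],
            "object": {
                "type": "text",
                "text": d["text"],
                "boundingPoly": rect_to_bounding_poly(*d["rect"], frame_w, frame_h),
            },
        })
    return {
        "schemaId": SCHEMA_ID,
        "validationContracts": ["text"],
        "series": series,
    }
```

Calling build_time_based_output with one detection whose rect is (100, 50, 400, 200) on a 1000 x 500 frame yields a single series entry whose corner coordinates match those in the example above.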

Example: non-time-based

{
  "schemaId": "https://docs.veritone.com/schemas/vtn-standard/master.json",
  "validationContracts": ["text"],
  "language": "en-US",
  "object": [
    {
      "type": "text",
      "text": "The quick brown fox jumped over the lazy dog.",
      "page": 5,
      "paragraph": 3,
      "sentence": 2,
      "boundingPoly": [
        {
          "x": 0.1,
          "y": 0.1
        },
        {
          "x": 0.1,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.1
        }
      ]
    },
    {
      "type": "text",
      "text": "That worried the dog, but he was too lazy to do anything about it.",
      "page": 5,
      "paragraph": 3,
      "sentence": 3,
      "boundingPoly": [
        {
          "x": 0.2,
          "y": 0.2
        },
        {
          "x": 0.2,
          "y": 0.3
        },
        {
          "x": 0.3,
          "y": 0.3
        },
        {
          "x": 0.3,
          "y": 0.2
        }
      ]
    }
  ]
}
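
A consumer of this output usually needs to handle both forms. The Python sketch below is one possible way to do that, assuming only the structure shown in the two examples: time-based output keeps each text object under a "series" entry, and non-time-based output keeps them in a top-level "object" list. The function names are hypothetical and the sketch does not validate the document against the JSON schema.

```python
def iter_text_objects(output):
    """Yield (start_ms, stop_ms, obj) for each text object in an output.

    Objects from the non-time-based form have no timing, so both time
    values are None for them.
    """
    # Time-based form: one object per series entry.
    for entry in output.get("series", []):
        obj = entry.get("object", {})
        if obj.get("type") == "text":
            yield entry.get("startTimeMs"), entry.get("stopTimeMs"), obj
    # Non-time-based form: a top-level list of objects.
    for obj in output.get("object", []):
        if obj.get("type") == "text":
            yield None, None, obj


def text_by_page(output):
    """Group recognized text by page number, in document order.

    Objects without a "page" field are grouped under the key None.
    """
    pages = {}
    for _, _, obj in iter_text_objects(output):
        pages.setdefault(obj.get("page"), []).append(obj["text"])
    return pages
```

Applied to the non-time-based example above, text_by_page returns a dictionary with the single key 5 that maps to the two sentences in order.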

Translating recognized text

Some translation engines take the output of text recognition engines as their input. To learn how those engines are built, see Translating Recognized (OCR) Text.

Additional Technical Documentation Information
Properties
5/7/2024 6:26 PM