Rich text translation

000003925
[API][yes]
[Search][no]
[UI][no]

Rich text documents are one of the five input formats that translation engines can support. In this article, a rich text document is a styled textual document format such as Microsoft Word (.docx) or PDF.

[Note] The scope of this article covers translating the contents of these documents into new documents with the same styling as the original document format. If you are interested in processing the contents of rich text documents but not generating rich text outputs, consider building an engine that can process extracted text instead.

Engine input

Rich text translation engines should be implemented as segment processing engines. Each segment will be the entire content of one file.

Engine output

Because rich text translation produces a file and not merely data, generating its engine output is slightly more involved than for other cognitive engines: the engine must upload each output file as an asset and reference the resulting asset IDs in its output.

See the official JSON Schema for the media-translated validation contract.

The engine should create a separate output file for each desired output language. Once the files are generated, the engine should do the following to properly register the result with aiWARE:

  1. Call the getSignedWritableUrls query to retrieve pre-authorized URL destinations to which the files can be posted.

    query {
      getSignedWritableUrls(number: 2) {
        url          # The URL to PUT to
        unsignedUrl  # The URL to submit with createAsset
      }
    }
  2. For each file, upload the file to one of the URLs with an HTTP PUT request. Each file must have its own URL: a second PUT to the same URL overwrites the contents of the first.
  3. For each file, call the createAsset mutation to obtain its asset ID.

    mutation {
      createAsset(input: {
        containerId: "<The TDO ID of this task>"
        type: "media"
        contentType: "<the MIME type of your output file>"
        uri: "<the unsignedUrl from the getSignedWritableUrls call>"
        details: {
          language: "en-US"
        }
      }) {
        id
      }
    }
    [Tip] You can include multiple mutations in one request if you alias them.
    [Tip] Steps 2 and 3 can be combined by calling the createAsset mutation with multipart/form-data, which submits the contents of the output file while creating the asset instead of referencing the contents by uri.
  4. Return your .aion response in the following format:

    {
      "validationContracts": ["media-translated"],
      "media": [
        {
          "language": "en",
          "assetId": "<ID of the asset uploaded for the English translation>"
        },
        {
          "language": "fr",
          "assetId": "<ID of the asset uploaded for the French translation>"
        }
      ]
    }
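The four steps above can be sketched in Python as follows. This is a minimal illustration, not the official client: the helper names (`put_file`, `build_create_asset_mutation`, `build_media_output`), the `MIME_TYPES` table, and the dictionary keys used to describe each file are all hypothetical. Sending the GraphQL requests is left out because the endpoint and authentication details are not covered in this article, and whether the signed URL expects a particular Content-Type header is an assumption.

```python
import json
import urllib.request

# Assumption: MIME types for the two formats named in this article.
MIME_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pdf": "application/pdf",
}


def put_file(signed_url: str, data: bytes, content_type: str) -> int:
    """Step 2: upload one file to its own signed URL with an HTTP PUT."""
    request = urllib.request.Request(
        signed_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},  # assumption, see lead-in
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def build_create_asset_mutation(tdo_id: str, files: list[dict]) -> str:
    """Step 3: build one request with an aliased createAsset per file.

    Each entry in `files` is a hypothetical dict with the keys
    "language", "contentType" and "unsignedUrl".
    """
    fields = []
    for index, item in enumerate(files):
        # json.dumps produces a quoted, escaped string literal.
        fields.append(
            f"  asset{index}: createAsset(input: {{\n"
            f"    containerId: {json.dumps(tdo_id)}\n"
            f'    type: "media"\n'
            f"    contentType: {json.dumps(item['contentType'])}\n"
            f"    uri: {json.dumps(item['unsignedUrl'])}\n"
            f"    details: {{ language: {json.dumps(item['language'])} }}\n"
            f"  }}) {{ id }}"
        )
    return "mutation {\n" + "\n".join(fields) + "\n}"


def build_media_output(files: list[dict], asset_ids: list[str]) -> dict:
    """Step 4: pair each language with the asset ID returned for its file."""
    return {
        "validationContracts": ["media-translated"],
        "media": [
            {"language": item["language"], "assetId": asset_id}
            for item, asset_id in zip(files, asset_ids)
        ],
    }
```

With aliases, the GraphQL response carries one entry per alias (here `asset0`, `asset1`, and so on), so the asset IDs can be read back in the same order as `files` and passed to `build_media_output`.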

Advanced: Including Indexable Text in the Engine Output

[Warn] Indexing and UI support will be added over time for translated documents. Following this guidance will ensure that your engine output will be supported in the future, but support for these features may not be available in all cases today.

Outputting document files (per the specifications above) gives users a translated copy of their original file, but the file will not be indexed in aiWARE for searching or UI display. If you would like the content to be indexed in aiWARE, you may optionally also include engine output according to the text extraction specifications. A combined document file and extracted text output would look something like this:

[Note] Notice that the output conforms to both the media-translated and text validation contracts and includes both the media array (for file references) and the object array (for extracted text).
{
  "validationContracts": ["media-translated", "text"],
  "media": [
    {
      "assetId": "<ID of the asset uploaded for the English translation>",
      "language": "en"
    },
    {
      "assetId": "<ID of the asset uploaded for the French translation>",
      "language": "fr"
    }
  ],
  "object": [
    {
      "type": "text",
      "text": "this is the first line of text which was originally written in spanish",
      "language": "en",
      "sentence": 1
    },
    {
      "type": "text",
      "text": "c'est la première ligne de texte qui a été écrite en espagnol",
      "language": "fr",
      "sentence": 1
    }
  ]
}
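A combined output like the one above could be assembled with a small helper such as the following Python sketch. The function names and the input dictionary keys are hypothetical, and the sketch assumes that "sentence" is a 1-based position within each language, which matches the example but is not confirmed here. The `check_shape` function only mirrors the fields visible in the example; it is not a substitute for the official validation contract schemas.

```python
def build_combined_output(translations: list[dict]) -> dict:
    """Build output that carries both file references and extracted text.

    Each entry in `translations` is a hypothetical dict with the keys
    "language", "assetId" and "sentences" (a list of translated strings).
    """
    media = []
    objects = []
    for item in translations:
        media.append({"assetId": item["assetId"], "language": item["language"]})
        # Assumption: sentence numbers restart at 1 for each language.
        for number, text in enumerate(item["sentences"], start=1):
            objects.append({
                "type": "text",
                "text": text,
                "language": item["language"],
                "sentence": number,
            })
    return {
        "validationContracts": ["media-translated", "text"],
        "media": media,
        "object": objects,
    }


def check_shape(output: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape matches."""
    problems = []
    contracts = output.get("validationContracts", [])
    for name in ("media-translated", "text"):
        if name not in contracts:
            problems.append(f"missing validation contract: {name}")
    for entry in output.get("media", []):
        if not entry.get("assetId") or not entry.get("language"):
            problems.append("media entry needs assetId and language")
    for entry in output.get("object", []):
        if entry.get("type") != "text" or "text" not in entry:
            problems.append("object entry needs type 'text' and a text field")
    if not output.get("media"):
        problems.append("media array is empty or missing")
    if not output.get("object"):
        problems.append("object array is empty or missing")
    return problems
```

Running `check_shape` on the result of `build_combined_output` before returning it from the engine gives an early, local signal that one of the two arrays was left out.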
Additional Technical Documentation Information
Properties
5/7/2024 6:18 PM
5/7/2024 6:20 PM
5/7/2024 6:20 PM