API: partial
Search: yes
UI: no
Content classification engines assign text to one or more categories based on the words it contains.
Engine input
Content classification engines can list the MIME types they support natively (e.g. text/plain, application/pdf) in the supportedInputFormats field of their manifest. For these formats, the engine receives the entire file as input and is responsible for writing insights for the whole file to its .aion output.
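To make the idea concrete, here is a minimal sketch of matching a file against supportedInputFormats. It illustrates the concept only and is not how aiWARE itself performs the check. The manifest dict and the accepts_natively helper are hypothetical, and the MIME type is guessed from the file extension with Python's standard mimetypes module.

```python
import mimetypes

# Hypothetical manifest: only the supportedInputFormats key comes from the
# documentation; any other manifest fields are omitted here.
manifest = {"supportedInputFormats": ["text/plain", "application/pdf"]}


def accepts_natively(manifest: dict, filename: str) -> bool:
    """Return True if the file's MIME type, guessed from its extension,
    is listed in the manifest's supportedInputFormats."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # An unknown extension yields None, which is never in the list.
    return mime_type in manifest.get("supportedInputFormats", [])


for name in ("report.pdf", "notes.txt", "photo.jpg"):
    print(name, accepts_natively(manifest, name))
```

Guessing from the extension keeps the sketch short; a production system would more likely rely on the MIME type recorded for the uploaded asset.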
In the future, content classification engines will also be able to accept .aion text extraction output as input, extending processing to any file type supported by text extraction engines.
Engine output
Content classification engine output conforms to the concept validationContract: results are written to the object array as objects of type concept. Each object's objectCategory array lists one or more categories that the text has been classified into.
- The value of the class key is defined by the engine provider according to its own taxonomy.
- If the classification refers to a particular taxonomy, the @id key can provide a URI to the category definition.
- If the engine assigns a weighting or confidence to each classification, it can be expressed with the confidence key.
aiWARE does not mandate a master concept taxonomy that engines must conform to. Instead, engines use class names (and @id, where appropriate) to map to external taxonomies.
See the concept validation contract JSON Schema.
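The keys described above map naturally onto a small builder. The following is a minimal sketch, assuming one concept object per output as in the examples below; make_category and make_concept_output are hypothetical helper names, not part of any aiWARE SDK. Only class is written unconditionally, while @id, confidence and text are added when supplied.

```python
import json
from typing import Optional


def make_category(class_name: str, iri: Optional[str] = None,
                  confidence: Optional[float] = None) -> dict:
    """Build one objectCategory entry; only "class" is always present."""
    category = {"class": class_name}
    if iri is not None:
        category["@id"] = iri  # URI of the category in an external taxonomy
    if confidence is not None:
        category["confidence"] = confidence
    return category


def make_concept_output(categories: list, text: Optional[str] = None) -> dict:
    """Wrap the categories in a single concept object (hypothetical helper)."""
    concept = {"type": "concept"}
    if text is not None:
        concept["text"] = text  # the passage that was classified
    concept["objectCategory"] = categories
    return {"validationContracts": ["concept"], "object": [concept]}


# Reproduces the basic example: a single "person" category, no optional keys.
output = make_concept_output([make_category("person")])
print(json.dumps(output, indent=2))
```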
Examples
A basic example of content classification engine output using only the required keys:
{
  "validationContracts": ["concept"],
  "object": [
    {
      "type": "concept",
      "objectCategory": [
        {
          "class": "person"
        }
      ]
    }
  ]
}
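The rules stated in this section can also be checked mechanically before an engine writes its output. The sketch below is a hypothetical helper that checks only those rules (the concept contract is declared, every object has type concept, objectCategory holds at least one entry with a string class, and confidence, if present, is assumed to be a number). It is not a substitute for validating against the official JSON Schema.

```python
def check_concept_output(output: dict) -> list:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: it checks only the rules stated in this section.
    """
    problems = []
    if "concept" not in output.get("validationContracts", []):
        problems.append('validationContracts does not include "concept"')
    for i, obj in enumerate(output.get("object", [])):
        if obj.get("type") != "concept":
            problems.append(f'object[{i}].type is not "concept"')
        categories = obj.get("objectCategory")
        if not categories:
            problems.append(f"object[{i}].objectCategory is missing or empty")
            continue
        for j, category in enumerate(categories):
            if not isinstance(category.get("class"), str):
                problems.append(f"object[{i}].objectCategory[{j}].class is not a string")
            confidence = category.get("confidence")
            # Assumption: confidence, when present, is a plain number (bools rejected).
            if confidence is not None and (
                isinstance(confidence, bool) or not isinstance(confidence, (int, float))
            ):
                problems.append(f"object[{i}].objectCategory[{j}].confidence is not a number")
    return problems


# The basic example above passes every check.
basic_example = {
    "validationContracts": ["concept"],
    "object": [{"type": "concept", "objectCategory": [{"class": "person"}]}],
}
print(check_concept_output(basic_example))  # → []
```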
A standard example of engine output that classifies a particular sentence in a document into two categories from the IPTC Subject Codes taxonomy. IPTC is only one option; any taxonomy works as long as its classes can be referenced by URI.
{
  "validationContracts": ["concept"],
  "object": [
    {
      "type": "concept",
      "text": "this is the text being classified, and it's talking about business finance.",
      "objectCategory": [
        {
          "class": "economy, business and finance",
          "@id": "http://cv.iptc.org/newscodes/subjectcode/04000000",
          "confidence": 0.468
        },
        {
          "class": "financial and business service",
          "@id": "http://cv.iptc.org/newscodes/subjectcode/04006000",
          "confidence": 0.897
        }
      ]
    }
  ]
}
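On the consuming side, output like this is typically reduced to the most likely category. A minimal sketch, assuming the JSON has already been read from the .aion file into a string; top_category is a hypothetical helper. Because confidence is optional, entries without it are treated as 0.0, which is an assumption of this sketch rather than a documented rule.

```python
import json

# The IPTC example above, as it might be read from an .aion file.
aion_json = """
{
  "validationContracts": ["concept"],
  "object": [
    {
      "type": "concept",
      "text": "this is the text being classified, and it's talking about business finance.",
      "objectCategory": [
        {"class": "economy, business and finance",
         "@id": "http://cv.iptc.org/newscodes/subjectcode/04000000",
         "confidence": 0.468},
        {"class": "financial and business service",
         "@id": "http://cv.iptc.org/newscodes/subjectcode/04006000",
         "confidence": 0.897}
      ]
    }
  ]
}
"""


def top_category(concept: dict) -> dict:
    """Return the objectCategory entry with the highest confidence.

    Missing confidence counts as 0.0; on ties, max() keeps the first entry.
    """
    return max(concept["objectCategory"], key=lambda c: c.get("confidence", 0.0))


output = json.loads(aion_json)
for concept in output["object"]:
    best = top_category(concept)
    print(best["class"], best.get("@id"), best.get("confidence"))
    # → financial and business service http://cv.iptc.org/newscodes/subjectcode/04006000 0.897
```

Note that the broader and narrower IPTC codes are not mutually exclusive here, so keeping every category above a chosen threshold may suit some applications better than taking only the maximum.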