[API][Partial]
[Search][Yes]
[UI][No]
Entity extraction, sometimes called named entity extraction, is a Natural Language Processing (NLP) task that labels words, phrases, or concepts in text. Entity extraction engines classify named entities found in unstructured text into predefined categories such as People, Organizations, or Locations. They can also identify times and dates, which is useful when pre-processing large bodies of text.
Use cases
When building a search engine, you might want to index people, places, and things so your customers can find content based on those categories. For example, if users want to find articles about the city of London, articles about people named London can be set to rank lower.
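As a rough illustration of the London example, the sketch below ranks documents that mention a label, placing those where the label was extracted in the preferred category first. The `IndexedEntity` type, the `rank_documents` function, and the category names are hypothetical and are not part of aiWARE; a real search index would likely handle this with its own scoring features.

```python
from dataclasses import dataclass


@dataclass
class IndexedEntity:
    label: str     # text of the extracted entity
    category: str  # hypothetical category name, e.g. "location" or "person"


def rank_documents(docs, query_label, preferred_category):
    """Return IDs of documents mentioning query_label, preferred category first.

    docs maps a document ID to the list of IndexedEntity found in it.
    """
    def score(entities):
        matches = [e for e in entities if e.label.lower() == query_label.lower()]
        if not matches:
            return None  # document does not mention the label at all
        # Higher score when at least one match is in the preferred category.
        return 2 if any(e.category == preferred_category for e in matches) else 1

    scored = [(score(entities), doc_id) for doc_id, entities in docs.items()]
    scored = [(s, d) for s, d in scored if s is not None]
    # Sort by descending score, then by document ID for a stable order.
    return [d for s, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Documents that do not mention the label are dropped entirely, while those that mention it only in another category (a person named London) are kept but ranked lower.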
Engine input
Entity extraction engines can specify supportedInputFormats in their manifest to declare the MIME types they support natively (e.g., text/plain, application/pdf). In this case, engines are given the entire file as their input and are responsible for outputting the entire list of extracted entities in their .aion output.
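The whole-file flow can be sketched as follows for a text/plain input. This is a minimal sketch under stated assumptions: `KNOWN_NAMES` is a hypothetical lookup table standing in for a real NLP model, the category names are placeholders, the sentence splitter is deliberately naive, and sentence numbers are assumed to be 1-based as in the examples on this page. How an engine actually receives the file and writes its output is not shown.

```python
import re

# Hypothetical gazetteer standing in for a real entity extraction model.
KNOWN_NAMES = {"John Smith": "person", "London": "location"}


def extract_entities(text):
    """Return an entity-contract-shaped dict covering the whole input text."""
    objects = []
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for number, sentence in enumerate(sentences, start=1):
        for name, category in KNOWN_NAMES.items():
            if name in sentence:
                objects.append({
                    "type": "namedEntity",
                    "label": name,
                    "objectCategory": [{"class": category}],
                    "sentence": number,  # assumed 1-based
                })
    return {"validationContracts": ["entity"], "object": objects}
```

Because the engine sees the complete file, it returns every entity it finds in a single output document.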
Training and libraries
If entity extraction engines are made trainable with libraries, they can map their output back to the entities in the libraries they were trained on by including an entityId in their engine output.
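One way to picture this mapping is a lookup from trained labels to library identifiers that is applied to each extracted object. The sketch below is illustrative only: `attach_library_ids` and the shape of `trained_entities` are hypothetical, and how an engine actually stores what it learned from a library is up to the engine.

```python
def attach_library_ids(objects, trained_entities):
    """Return copies of objects with entityId/libraryId added where known.

    trained_entities is a hypothetical mapping of the form
    {label: {"entityId": ..., "libraryId": ...}} built during training.
    """
    result = []
    for obj in objects:
        enriched = dict(obj)  # copy so the caller's objects are not mutated
        match = trained_entities.get(obj.get("label"))
        if match is not None:
            enriched["entityId"] = match["entityId"]
            enriched["libraryId"] = match["libraryId"]
        result.append(enriched)
    return result
```

Objects whose label was not part of the training data pass through unchanged, so the same output can mix library-backed and plain entities.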
Engine output
See the entity validation contract json-schema.
Examples
Here is an example output that only specifies a label for the identified entity:
{
  "validationContracts": ["entity"],
  "object": [
    {
      "type": "namedEntity",
      "label": "John",
      "sentence": 1
    }
  ]
}
A more involved example includes a label, confidence, a mapping to a category classification taxonomy, sentiment readings, and page/paragraph/sentence referencing (all optional):
{
  "validationContracts": ["entity"],
  "object": [
    {
      "type": "namedEntity",
      "label": "John Smith",
      "confidence": 0.5,
      "objectCategory": [
        {
          "class": "person"
        }
      ],
      "sentiment": {
        "positiveValue": 1,
        "negativeValue": 0
      },
      "page": 1,
      "paragraph": 1,
      "sentence": 1
    }
  ]
}
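Since most of these fields are optional, an engine might assemble each object with a small helper that leaves out anything it did not compute. The following is a sketch, not an official helper: `make_entity` and `make_output` are hypothetical names, and the code does no validation against the entity contract's json-schema.

```python
import json


def make_entity(label, sentence=None, confidence=None, category=None,
                sentiment=None, page=None, paragraph=None):
    """Build one namedEntity object, omitting optional fields left as None."""
    obj = {"type": "namedEntity", "label": label}
    optional = {
        "confidence": confidence,
        "objectCategory": [{"class": category}] if category else None,
        "sentiment": sentiment,
        "page": page,
        "paragraph": paragraph,
        "sentence": sentence,
    }
    obj.update({key: value for key, value in optional.items() if value is not None})
    return obj


def make_output(entities):
    """Wrap entity objects in the top-level structure used in the examples."""
    return {"validationContracts": ["entity"], "object": list(entities)}


if __name__ == "__main__":
    full = make_entity(
        "John Smith", sentence=1, confidence=0.5, category="person",
        sentiment={"positiveValue": 1, "negativeValue": 0}, page=1, paragraph=1,
    )
    print(json.dumps(make_output([full]), indent=2))
```

Calling `make_entity("John", sentence=1)` yields the same object as the label-only example, while passing every argument yields the fields shown in the more involved one.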
Library entity example
This is an example output that maps an extracted entity back to an aiWARE library entity.
{
  "validationContracts": ["entity"],
  "object": [
    {
      "type": "namedEntity",
      "entityId": "<ID of the entity from an aiWARE library>",
      "libraryId": "<Optional ID of the library the entity is contained in>",
      "sentence": 1
    }
  ]
}