How does content classification work?
ABBYY Intelligent Document Processing solutions help you organize semi-structured and pure text information and enable automatic content classification. ABBYY makes sophisticated natural language processing (NLP) and data capture technologies available through an easy-to-use interface, putting classification within reach of any user.
In principle, the classification technique in Intelligent Document Processing consists of three steps:
Preparing data sets for classification training
At this step, the required document classes are defined. For each document class, several example documents with similar appearance and/or content are selected. Using machine learning and NLP algorithms, ABBYY technology analyzes the training documents within each document class and determines the parameters that should be used to identify that class.
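To make the idea of a training set concrete, here is a minimal sketch in plain Python. It is not ABBYY's API or algorithm: the class names, the sample texts, and the helper functions `tokenize` and `class_profiles` are all hypothetical, and simple word counts stand in for whatever parameters the real technology derives.

```python
import re
from collections import Counter

# Hypothetical training set: each class label maps to a few sample texts.
TRAINING_SET = {
    "invoice": [
        "Invoice number 1001 total amount due 250.00",
        "Invoice date and payment terms, amount due on receipt",
    ],
    "contract": [
        "This agreement is made between the parties",
        "The parties agree to the terms of this contract",
    ],
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def class_profiles(training_set):
    """Count how often each word occurs across the examples of each class.

    The word counts are an illustrative stand-in for the per-class
    parameters described in the text.
    """
    profiles = {}
    for label, documents in training_set.items():
        counts = Counter()
        for text in documents:
            counts.update(tokenize(text))
        profiles[label] = counts
    return profiles


profiles = class_profiles(TRAINING_SET)
```

In this toy data, words such as "invoice" and "amount" occur only in the invoice examples, which is the kind of signal that lets a classifier separate the classes.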
Training the Classification Model
Information about the document classes and their parameters is imported into the Classification Model, which is then trained during this step. The model can use the Image Classifier, the Text Classifier, or a combination of both. Performance can be tuned by adjusting the balance between high recall and high precision. Cross-validation is available to test the quality of the Classification Model.
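The trade-off between recall and precision, and the idea behind cross-validation, can be illustrated with a short generic sketch. It assumes the model returns a confidence score per document and accepts a document for a class when the score reaches a threshold; the functions `precision_recall` and `k_fold_indices` and the sample scores are hypothetical and do not reflect how the product implements these features.

```python
def precision_recall(scored, threshold):
    """Compute precision and recall for one class at a given threshold.

    scored: list of (confidence, truly_in_class) pairs.
    A document is accepted when confidence >= threshold.
    """
    tp = sum(1 for score, positive in scored if score >= threshold and positive)
    fp = sum(1 for score, positive in scored if score >= threshold and not positive)
    fn = sum(1 for score, positive in scored if score < threshold and positive)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each item lands in the test set of exactly one fold.
    """
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Hypothetical confidence scores with ground-truth labels.
scored = [(0.95, True), (0.80, True), (0.70, False), (0.60, True), (0.30, False)]

low = precision_recall(scored, 0.5)   # permissive threshold favors recall
high = precision_recall(scored, 0.9)  # strict threshold favors precision
folds = list(k_fold_indices(5, 2))
```

With these sample scores, the permissive threshold accepts every true member of the class but also one wrong document, while the strict threshold accepts no wrong documents but misses two of the three true members. Cross-validation repeats training and testing over the folds so that every example is used for evaluation once.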
Classification deployment
During the classification process, the Classification Model analyzes each incoming document. To determine the document type correctly, the Classification Model calculates the required parameters for each document and compares them with the information it received during the training step. Developers can create a routine that allows users to flexibly update the training data set and re-train the Classification Model.
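The compare-with-training-data step and the re-training routine can be sketched as a toy text classifier. This is an illustration under stated assumptions, not ABBYY's method: it uses word counts as features and cosine similarity as the comparison, and the names `train`, `cosine`, `classify`, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(training_set):
    """Build one word-count profile per class from its example texts."""
    return {
        label: Counter(word for text in docs for word in tokenize(text))
        for label, docs in training_set.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, model):
    """Return the class whose profile is most similar to the document."""
    features = Counter(tokenize(text))
    return max(model, key=lambda label: cosine(features, model[label]))


# Hypothetical training data.
training_set = {
    "invoice": ["invoice total amount due", "invoice payment amount"],
    "contract": ["agreement between the parties", "terms of the contract"],
}
model = train(training_set)
first = classify("amount due on this invoice", model)

# Re-training routine: extend the training set, then rebuild the model.
training_set["resume"] = ["work experience and education", "skills and work history"]
model = train(training_set)
second = classify("education and skills summary", model)
```

Here `first` is assigned to the invoice class, and after the training set is extended and the model rebuilt, `second` is assigned to the newly added resume class. A production system would also need a way to reject documents that match no class well, which this sketch leaves out.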