Document processing and handling is an everyday challenge faced by both small and large enterprises. Invoices, receipts, order forms, and other documents arrive in large quantities, from various sources, in varying quality and formats. None can be omitted; they are all equally important and must be entered and processed as part of the company’s systems and processes.
If this issue is familiar to you, Dear Reader, then you are probably a member of a team that tirelessly looks after the flow and processing of documents in accordance with procedures. Or you may work with RPA and be looking for innovative solutions to help your co-workers automate this responsible task.
I invite you to read this article, where I will try to explain the issue of intelligent document processing.
Intelligent document processing – what is it?
Intelligent document processing is designed to optimize the tedious document handling process by engaging machine learning models. There are already many ready-made solutions on the market, and there will certainly be even more. ABBYY FlexiCapture and AWS Document Understanding Solution (DUS) are two worth mentioning.
While these tools can retrieve information from documents, we still need the right tools to put this information to use, if only because not all applications have an API.
Please note that the information returned may contain errors. Unfortunately, AI still makes a lot of mistakes in this area.
UiPath, which has been trying to provide comprehensive solutions for process automation beyond RPA robots, is also trying to solve this problem. One such solution is Document Understanding, whose development I have been following with interest since 2020, and I am convinced that it deserves attention.
UiPath and Document Understanding – solution elements
Document Understanding is a UiPath framework for processing documents. On its own, it is not intelligent: at most, it provides methods (I would call them classic) for processing structured, repetitive documents using regular expressions.
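To give a feel for what such a "classic" approach looks like, here is a minimal sketch in plain Python rather than a UiPath workflow. The field names, the patterns, and the sample text are my own assumptions for a fixed, repetitive invoice layout; they do not come from the framework.

```python
import re

# Hypothetical patterns for one fixed invoice layout (assumption, not UiPath's).
# "Number" is listed before "No" so the longer label is matched first.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s+(?:Number|No\.?)\s*[:#]?\s*([A-Z0-9/-]+)", re.IGNORECASE
    ),
    "issue_date": re.compile(
        r"Issue\s+date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "total": re.compile(r"Total\s*:?\s*([\d\s.,]+\d)", re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Return the first match for every pattern, or None when nothing matches."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


# Made-up sample text standing in for the OCR output of a document.
sample = "Invoice No: FV/2021/0042\nIssue date: 2021-03-15\nTotal: 1 230,00 PLN"
fields = extract_fields(sample)
print(fields)
```

This works only as long as the layout and labels stay the same; a single renamed label is enough for a field to come back empty, which is exactly the limitation that machine learning models are meant to address.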
To discover its full potential, you should equip yourself with the AI Center and Action Center.
- The AI Center is at the heart of the entire endeavor. It provides access to machine learning models, stores data used for model training and evaluation, and enables the definition and planning of training.
- Action Center is the user interface. This is where people’s ability to make decisions comes into play when there is a suspicion that data has been misunderstood. The user can correct the data or report an exception for the document being processed.
Both services are available as separate UiPath products and can be used either as an on-premises installation or as a cloud-based service.
Equipped with these elements, and knowledge about their purpose, we can take a look at the document processing itself:
From the diagram above, two key elements emerge from the perspective of adjusting the tool to our needs: the stages of classification and extraction.
- Classifiers are used to identify and classify processed documents. Is the document an invoice or an order, or is it an unknown document? This is the stage that determines the use of an appropriate extractor.
- Extractors pull information out of documents. For an invoice, this can be, for example, the number, issue date, payment date, tax rate, individual items, and total. The extracted data can be passed on to the robots responsible for handling a given document.
The classification and extraction steps both rely on a confidence level expressed as a percentage. If the confidence for the processed document does not meet the specified requirements, the document is delegated for human verification. The user, working in the validation station available in Action Center, confirms or corrects the data extracted by the robot. Verified data is available later in the process, and the human validation data can be used in the AI Center to retrain the model and improve its performance.
After a few training sessions you will notice an improvement in the model's performance, but be aware that training may require many sample documents: tens or hundreds, if not thousands. Sometimes it takes a long time before we achieve a satisfactory result.
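The routing decision described above can be sketched in a few lines. This is an illustration in plain Python, not UiPath code: the `ExtractedField` type, the function name, and both threshold values are assumptions, and confidence is expressed as a fraction between 0 and 1 instead of a percentage.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model confidence as a fraction, 0.0 to 1.0


def needs_human_validation(
    classification_confidence: float,
    fields: Sequence[ExtractedField],
    class_threshold: float = 0.80,  # assumed value, tune per process
    field_threshold: float = 0.90,  # assumed value, tune per process
) -> bool:
    """Return True when the document should go to a human for verification."""
    # An uncertain document type means the wrong extractor may have been used.
    if classification_confidence < class_threshold:
        return True
    # A single uncertain field is enough to route the whole document.
    return any(field.confidence < field_threshold for field in fields)


# Illustrative values only; they are not taken from a real document.
fields = [
    ExtractedField("InvoiceNumber", "FV/2021/0042", 0.99),
    ExtractedField("DateDue", "2021-04-15", 0.71),
]
print(needs_human_validation(0.95, fields))
```

In this example the document is sent for validation because one field falls below the assumed field threshold, even though the classification itself is confident.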
AI Center – a point for servicing machine learning models
As already mentioned, the AI Center is at the heart of the whole endeavor. It provides access to machine learning models, stores data used for model training and evaluation, and enables the definition and planning of training.
It all sounds so great that I would like to say that UiPath has found a solution to all your document processing problems. Unfortunately, there is a catch: whether data can be extracted from a document at all depends on the availability of a suitable machine learning model.
UiPath has prepared several models available by default for the most popular documents:
However, if our document does not fit into the framework of the ready-made models, it is possible to create our own specialized models. This requires the involvement of someone familiar with data science, machine learning, and Python.
It is worth mentioning that the AI Center, in addition to the models available by default, also includes open-source models for image analysis, language analysis, or emotion analysis in written text. Thus, AI Center extends the use of robots beyond document processing to, for example, classifying emails or product reviews.
It’s me, your robot … You trust me, don’t you?
One of the undoubted advantages of RPA robots, besides the fact that they can work day and night, is that the chance of their making human mistakes, such as a typo or a deviation from the process, is negligible. Unfortunately, the situation is different when it comes to intelligent document processing.
Models operate based on data provided by OCR engines, and the results of OCR processing are directly dependent on the quality of the document itself, so when creating a robot, it is necessary to consider the level of OCR engine reliability.
OCR engine errors
The following example is a perfect illustration of this situation:
The image above shows the UiPath validation station used to validate or correct the data extracted from the document. As can be seen, the "Invoice Number" field has been located correctly and its confidence level is very high and acceptable (99%), yet the extracted value is incorrect.
This is a direct fault of the OCR engine. It may be caused by the small font and the average quality of the document (a 750×1000 JPG image). Unfortunately, the UiPath validation station does not currently display confidence levels for OCR results. This data is available on the robot code side, and it can be used there to decide whether a document should be delegated for human validation.
Here are the data available from the robot:
As a curiosity, it is worth pointing out the DateDue field, which was recognized correctly despite a low OCR confidence level. OCR Confidence for tabular values is -1, and the value of each row is available in a separate data set.
The natural conclusion is that skipping the OCR engine reliability check can lead to incorrect data being entered into the system, which is a fundamental problem. Just imagine a situation in which the robot posts an invoice for payment for 180,000 PLN instead of 100,000 PLN.
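A check that takes both confidence values into account could look roughly like the sketch below. It is plain Python with hypothetical names and thresholds, and it assumes confidence values expressed as fractions between 0 and 1; the only thing taken from the robot data is the convention that -1 marks a tabular value whose per-row figures live elsewhere.

```python
from typing import Optional, Sequence

OCR_NOT_AVAILABLE = -1  # marker reported for tabular values


def field_is_trusted(
    model_confidence: float,
    ocr_confidence: float,
    row_ocr_confidences: Optional[Sequence[float]] = None,
    min_model: float = 0.90,  # assumed threshold
    min_ocr: float = 0.90,    # assumed threshold
) -> bool:
    """Accept a field only if both the model and the OCR engine are confident."""
    if model_confidence < min_model:
        return False
    if ocr_confidence == OCR_NOT_AVAILABLE:
        # Tabular value: fall back to the per-row OCR confidences.
        # Without them there is nothing to verify, so do not trust the field.
        if not row_ocr_confidences:
            return False
        return min(row_ocr_confidences) >= min_ocr
    return ocr_confidence >= min_ocr


# Illustrative values only: a 99% model confidence does not help
# when the OCR engine itself was unsure about the characters it read.
print(field_is_trusted(0.99, 0.58))
print(field_is_trusted(0.95, OCR_NOT_AVAILABLE, [0.97, 0.93]))
```

Note that such a rule also sends correctly read fields with low OCR confidence to a human. That is the price of not letting a confident model mask an unreliable reading.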
A few words of summary
Does this completely disqualify the tool? In my opinion – no. You should check the reliability factors of the OCR engine and the model. In the event of any deviation from the adopted levels of certainty, it is necessary to delegate such a document for human validation, even if everything turns out to be correctly recognized.
When processing documents related to finances, I would suggest setting thresholds for amounts that are subject to mandatory human validation. An additional safeguard may be to use a second OCR engine and compare the results.
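Both suggestions can be combined into one simple rule, sketched here in plain Python. The function names and the 50,000 limit are assumptions chosen for illustration, and the sketch assumes that both OCR engines return the amount as an already normalized string such as "180000.00".

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse a normalized amount string; return None if it is not a finite number."""
    try:
        value = Decimal(text.replace(" ", ""))
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def must_validate(
    amount_primary: str,
    amount_secondary: str,
    limit: Decimal = Decimal("50000"),  # assumed threshold
) -> bool:
    """Return True when a human has to confirm the amount before posting."""
    primary = parse_amount(amount_primary)
    secondary = parse_amount(amount_secondary)
    # An unreadable amount from either engine always goes to a human.
    if primary is None or secondary is None:
        return True
    # The two OCR engines disagree: at least one of them misread the value.
    if primary != secondary:
        return True
    # Both engines agree, but large amounts are validated regardless.
    return primary >= limit


print(must_validate("180000.00", "100000.00"))
print(must_validate("1200.50", "1200.50"))
```

Using `Decimal` instead of floating-point numbers avoids rounding surprises when comparing monetary values.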
Tools for intelligent document processing have been with us for a long time and certainly support the daily work of thousands of office workers in their constant struggle with the processing of hundreds of documents a day. The trust we can place in these tools is still an open issue.
Who knows – perhaps the level of their advancement will soon eliminate the trust problem?