E-mails, case files, contracts, minutes and financial reports are just a few of the many documents that employees spend countless hours reading, understanding and condensing in order to identify the most relevant information. Imagine a scenario where, for instance, legal documents are anonymized automatically and key information such as the expiration date, value and customer is extracted from a contract of any size as soon as it is received. This scenario is already a reality, known as Document AI, Intelligent Document Processing or Document Intelligence.
Advances in technology and the adaptation of well-known machine learning techniques have made it possible for computers to scan, read and understand digital and paper documents at a level similar to that of humans. Combining the ability to understand key data with automation gives organizations a unique opportunity to enter the intelligent automation stage, in which AI capabilities are combined with robotics.
How it works
Information Extraction
Information extraction is an essential step in all Document Intelligence capabilities. On its own, it provides significant value because it converts unstructured text into structured data. Extracting relevant and valuable information from text-heavy documents by hand is labor-intensive and time-consuming. Using mathematical models to identify both known and hidden content, and storing that content in a structured form, makes subsequent identification of documents faster and more accurate.
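To make "unstructured text to structured data" concrete, here is a minimal sketch that pulls the contract fields mentioned earlier (customer, value, expiration date) into a structured record. It swaps the mathematical models for simple regular expressions and assumes, hypothetically, that each field appears on a labeled line such as `Customer: ...`; the `ContractFields` type, the patterns and the `extract_fields` function are illustrative names, not part of any Document Intelligence product.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ContractFields:
    """Structured record produced from an unstructured contract text."""
    customer: Optional[str]
    expiration_date: Optional[str]
    value: Optional[str]


# Hypothetical patterns: assume each field sits on its own labeled line.
# A production system would use trained models instead of hand-written rules.
PATTERNS = {
    "customer": re.compile(r"Customer:\s*(.+)"),
    "expiration_date": re.compile(r"Expires?:\s*(\d{4}-\d{2}-\d{2})"),
    "value": re.compile(r"Value:\s*([\d.,]+\s*[A-Z]{3})"),
}


def extract_fields(text: str) -> ContractFields:
    """Search the text for each field; missing fields become None."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1).strip() if match else None
    return ContractFields(**found)


contract = """Service Agreement
Customer: Acme A/S
Value: 250,000 DKK
Expires: 2026-12-31
"""

if __name__ == "__main__":
    print(asdict(extract_fields(contract)))
```

Once the fields are in a record like this, they can be stored in a database and queried, which is what makes later retrieval of the document faster than rereading it.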
Classification
Companies are often flooded with e-mails, whether internal support requests, customer service inquiries or external reviews. Handling them can be cumbersome, not only because each request must be handled individually, but also because identifying the right recipient can be time-consuming. Document Intelligence analyzes the words and structure of each document, assigns it to one of a set of predetermined categories, and uses that category to route it to the right recipient immediately.
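A common way to categorize text by its words is a multinomial Naive Bayes classifier over a bag of words. The sketch below uses that technique as an illustration; it is not necessarily the model used in practice, and the categories, training sentences and `ROUTES` mailbox addresses are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus the log likelihood of every word in the document.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples and recipients.
TRAINING = [
    ("My laptop will not start, please reset my password", "it_support"),
    ("VPN connection fails on my work laptop", "it_support"),
    ("I want a refund for my order", "customer_service"),
    ("Where is my invoice and delivery", "customer_service"),
    ("Great product, five stars", "review"),
    ("Terrible experience, would not recommend", "review"),
]
ROUTES = {
    "it_support": "helpdesk@example.com",
    "customer_service": "service@example.com",
    "review": "marketing@example.com",
}

router = NaiveBayesRouter().fit(TRAINING)


def route(text):
    """Classify a message and return the mailbox it should be sent to."""
    return ROUTES[router.predict(text)]


if __name__ == "__main__":
    print(route("Please reset my laptop password"))
```

In a real deployment the training set would be a large archive of already-routed e-mails, and the same predict-then-route step would run on every incoming message.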
Anonymization
Since the General Data Protection Regulation (GDPR) entered into force in May 2016, it has influenced how companies handle data. The regulation on data privacy and protection requires companies to follow strict policies for working with sensitive data. Before documents containing sensitive data can be used, they must be anonymized. Anonymization is the process of identifying sensitive words or phrases and replacing them with neutral nouns or pronouns, making it impossible for readers to identify the subjects of interest. By using supervised deep learning models, Document Intelligence provides a scalable, automatic solution for identifying and substituting sensitive information.
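Anonymization has two parts: detecting sensitive spans, and substituting them. The sketch below focuses on the substitution step. The supervised deep learning detector described above is replaced by a hypothetical stand-in (`detect_entities`, a small name list plus an e-mail regex) that returns character spans in the same `(start, end, label)` shape a named-entity model might produce; the names and placeholder texts are illustrative only.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)

# Hypothetical stand-in for a trained model: a fixed list of names and an
# e-mail pattern. A real system would predict these spans with a supervised model.
KNOWN_PERSONS = ["Jens Hansen", "Maria Jensen"]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

# Neutral replacements for each entity type.
NEUTRAL = {"PERSON": "the person", "EMAIL": "[email address]"}


def detect_entities(text: str) -> List[Span]:
    """Return the character spans of sensitive entities in the text."""
    spans = []
    for name in KNOWN_PERSONS:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), "PERSON"))
    for m in EMAIL.finditer(text):
        spans.append((m.start(), m.end(), "EMAIL"))
    return spans


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each span with its neutral placeholder.

    Spans are processed from the end of the text towards the start so that
    earlier offsets stay valid after each replacement. Assumes spans do not overlap.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + NEUTRAL[label] + text[end:]
    return text


if __name__ == "__main__":
    doc = "The lease was signed by Jens Hansen (jens@example.com)."
    print(anonymize(doc, detect_entities(doc)))
```

Keeping detection and substitution separate means the detector can be swapped for a trained model without changing how replacements are applied.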
Semantic Search
Obtaining relevant information as quickly and efficiently as possible is crucial to many companies. Hospitals use past patient records to decide on the treatment of current patients, law firms use previous cases to guide their handling of a current case, and customer service centers want to provide the most accurate information as fast as possible. Previously, this information had to be categorized in advance to define what was relevant. Semantic search goes beyond pure keyword matching and takes intent and contextual meaning into account. This yields higher-quality recommendations and makes results largely independent of how a query is phrased.
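Semantic search is typically implemented by mapping queries and documents to vectors (embeddings) and ranking documents by cosine similarity to the query. The sketch below shows that ranking step. As an assumption for illustration, `WORD_VECTORS` is a tiny hand-made table with three dimensions (payment, shipping, account) instead of a trained sentence-embedding model, and the example documents are hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy word vectors over three dimensions: (payment, shipping, account).
# A real system would obtain embeddings from a trained language model.
WORD_VECTORS: Dict[str, Vector] = {
    "refund": (1.0, 0.0, 0.0),
    "reimbursement": (0.9, 0.1, 0.0),
    "money": (0.8, 0.0, 0.1),
    "back": (0.3, 0.1, 0.0),
    "shipping": (0.0, 1.0, 0.0),
    "delivery": (0.1, 0.9, 0.0),
    "package": (0.0, 0.8, 0.1),
    "password": (0.0, 0.0, 1.0),
    "login": (0.0, 0.1, 0.9),
}


def embed(text: str) -> Vector:
    """Average the vectors of known words; unknown words are ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k documents ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        "How to request a refund",
        "Track your delivery and package",
        "Reset your login password",
    ]
    print(search("Can I get my money back?", docs))
```

The query "Can I get my money back?" shares no keyword with "How to request a refund", yet the two land close together in vector space, which is the phrasing-independence that keyword search lacks.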
Contact
Hans Peder Houe, Intelligent Automation Lead Denmark, tel. +45 2529 5798.