AI on Azure: Unlocking Unstructured Data — Processing Documents with Azure Document Intelligence (3/n)
Extracting data from various types of documents helps organisations automate humongous workloads. Most organisations amass a ton of documents in the form of contractual agreements, receipts, invoices, identification details, contact cards and more. Unlocking this unstructured data empowers organisations to act quickly and decide intelligently in their operations and decision-making processes.
In our series on AI on Azure, we have been looking at the services Azure offers in the AI ecosystem under the umbrella of Azure AI Services. In the last two articles, we introduced the AI Services in general and developed apps using their basic features.
The series is here:
- Azure on AI: Introducing Azure AI Services — Building a Basic AI App (1/n)
- Azure on AI: Developing Private Data Apps using AI Studio and AI Search (2/n)
- Azure on AI: Extracting data from documents using Azure AI Doc Intelligence (3/n)
In this article, we will address a common pain point: intelligently extracting fields and data from documents using Azure's AI-powered models, specifically the Document Intelligence service.
Document Intelligence
Document Intelligence is a managed, cloud-native, AI-powered document processing service. It analyses documents using AI models and extracts their data in a structured form.
Jump on to the Azure Portal and provision a Document Intelligence service:
Provisioning the Doc Intelligence service
Once the service is created, you can test your documents in the Document Intelligence Studio.
We need the API key and endpoint to connect to the DI Studio. In the DI dashboard, navigate to the Keys and Endpoint menu and fetch them:
Endpoint and Keys
Once you have noted the endpoint and the API key, open the Studio from the link given on the Overview page (you can also go to the DI Studio directly via its link).
DI Studio
The DI Studio allows us to try and test the models, as well as train models on our own documents. There are three types of work you can do in the Studio:
- Document Analysis: The document analysis feature helps extract data from most documents; for example, paragraphs of text, entities and so on:
Analysing documents to extract data
- Pre-built models: We may have a set of documents that fall into a particular category: invoices, passports, driving licences, receipts and so on. For these pre-defined structures, the DI service offers a set of pre-built models.
- Custom models: If none of the pre-built models suits our requirement, we can train a custom model on our own seed data.
Pre-built and custom models for document analysis
We can try out documents using the DI Studio, but the expectation is that we'd consume the service via its APIs, through SDKs or plain REST calls, rather than the Studio.
DI Studio in Action
Let’s test the features of the DI Studio. Click on the “Read” tab to test analysing a document. Azure already provides a few samples here, so you can see how the data is extracted when a document is processed.
Analysing and extracting data from a sample doc
Clicking on “Run analysis”, the component will analyse the document and extract the text, as shown in the output on the right-hand side. A structured JSON output is also generated for our consumption, and for storing in a database should we need to. There is also sample code, written in Python, that shows how to invoke the APIs.
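Outside the Studio, a minimal sketch of the same call using the Python SDK (azure-ai-formrecognizer) might look like the following; the endpoint, key and file name are placeholders you would replace with your own values:

# pip install azure-ai-formrecognizer
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: use your own resource's endpoint and API key
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-api-key>"
client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

# Analyse a local document with the prebuilt "read" model
with open("sample.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-read", document=f)
result = poller.result()

# Print the extracted text, page by page
for page in result.pages:
    print(f"-- Page {page.page_number} --")
    for line in page.lines:
        print(line.content)

The same client is reused in the sketches further down; only the model ID changes.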
Similarly, you can experiment with the other document analysis models such as layout and general documents. I’ve uploaded the AI-900 Certification Study Guide and asked the DI model to run through the document.
It managed to extract the text and tables:
Layout analysis on the document
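Programmatically, the tables come back as part of the layout result. A small sketch (reusing the client from the earlier snippet, with a made-up file name) that walks the extracted tables could look like this:

# Reuses the `client` from the earlier snippet; the file name is a placeholder
with open("ai-900-study-guide.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# Walk the extracted tables cell by cell
for i, table in enumerate(result.tables):
    print(f"Table {i}: {table.row_count} rows x {table.column_count} columns")
    for cell in table.cells:
        print(f"  [{cell.row_index}, {cell.column_index}] {cell.content}")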
We can also add custom query fields: click on the “Query fields” button next to “Run analysis” and add the fields that you want:
Creating custom fields
The custom fields will be picked up automatically by the model and shown in the output (I find this a bit hit-and-miss, most likely due to gaps in my understanding of how it works!).
Pre-built models
Similarly, you can explore the pre-built models, which work on documents with a specific format such as receipts, invoices, bills and statements.
You can either experiment with the provided samples or upload your own documents to test and try.
Programmatic/API Access
While the Studio helps us test the various models and features visually, the real power of DI comes when it is integrated into an application and invoked for document processing when and where needed.
For example, a bank may ask a user to upload a passport for identity verification purposes. The document can then be processed by invoking the DI APIs, which extract the required information. The extracted results (typically in JSON format) can be passed on to another function to check the validity of the passport.
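As a hedged sketch of that flow, the prebuilt ID document model can pull out the fields such a check would need; the field names below are the ones the prebuilt model typically returns, and the expiry check stands in for whatever validation the bank would actually run:

from datetime import date

# Reuses the `client` from the earlier snippet; the file name is a placeholder
with open("passport.jpg", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-idDocument", document=f)
result = poller.result()

doc = result.documents[0]
number = doc.fields.get("DocumentNumber")
expiry = doc.fields.get("DateOfExpiration")
print("Passport number:", number.value if number else None)
print("Expires on:", expiry.value if expiry else None)

# Stand-in for the downstream validity check (e.g. an external verification API)
if expiry and expiry.value and expiry.value >= date.today():
    print("Passport has not expired")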
The following curl invocation asks the DI service to process an invoice document using the prebuilt-invoice model (note the --data-binary flag, which uploads the PDF bytes unmodified):
curl --location 'https://ai-doc-intelligence-mk.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice:analyze?api-version=2023-07-31' \
--header 'Ocp-Apim-Subscription-Key: --my--api--key--' \
--header 'Content-Type: application/pdf' \
--data-binary '@/Users/mkonda/Downloads/invoice.pdf'
The location URL is made up of the endpoint, the model ID and the API version:
{endpoint}/formrecognizer/documentModels/{model-id}:analyze?api-version={version}
The API key is passed via the Ocp-Apim-Subscription-Key header, and the PDF is sent as the request body (the --data-binary attribute).
You can use Postman to test it out too:
Invoking the API via Postman
Postman helps test the API quickly: provide the URL with the POST method and attach the file in the body of the request. Do make sure you add the required authentication and the content type via the headers:
API key is provided via the Headers
The response to this request will be a success status (if all goes as expected). You may be confused as to where the result is, though. Once you submit your request, the DI service processes the document asynchronously and provides you a URL via the response headers (check the Postman figure). Look for the “Operation-Location” header among your response headers.
You can then call this Operation-Location URL (a GET request with the same API key header) to fetch the results, which come back as a JSON response:
Invoice results as JSON
The results are in JSON format, so the application can easily extract the required fields to act on them, or store them as fields in a relational database (or dump the JSON as-is into a NoSQL database if working with non-relational stores).
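Putting the whole submit-and-poll flow together in code, a minimal sketch with Python's requests library (endpoint, key and file path are placeholders) could look like this:

import time
import requests

# Placeholders: use your own resource's endpoint and API key
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-api-key>"
url = f"{endpoint}/formrecognizer/documentModels/prebuilt-invoice:analyze?api-version=2023-07-31"

# Submit the document; the analyse call is asynchronous
with open("invoice.pdf", "rb") as f:
    resp = requests.post(
        url,
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/pdf"},
        data=f.read(),
    )
resp.raise_for_status()

# The result URL comes back in the Operation-Location header
operation_url = resp.headers["Operation-Location"]

# Poll until the analysis has finished
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)

# The extracted data lives under analyzeResult in the JSON body
if result.get("status") == "succeeded":
    for doc in result["analyzeResult"]["documents"]:
        print(doc["fields"].get("InvoiceTotal"))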
Imagine an application processing files as and when they are uploaded, or batch processing them, to extract the structured data and store it in a database for further processing.
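As a rough illustration of that idea rather than a production design, a small batch job could walk a folder of PDFs, reuse the SDK client from the earlier snippets and keep the raw JSON results in a local SQLite table; the folder name and table schema are made up for the example:

import json
import sqlite3
from pathlib import Path

# Reuses the `client` from the earlier SDK snippet; folder and schema are made up
db = sqlite3.connect("extracted.db")
db.execute("CREATE TABLE IF NOT EXISTS invoices (file TEXT, result TEXT)")

for pdf in Path("incoming").glob("*.pdf"):
    poller = client.begin_analyze_document("prebuilt-invoice", document=pdf.read_bytes())
    result = poller.result()
    # Store the full structured result as JSON for later processing
    db.execute("INSERT INTO invoices VALUES (?, ?)",
               (pdf.name, json.dumps(result.to_dict(), default=str)))
    db.commit()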
I have a design in my head for an open-source framework built on top of the (paid) Azure Document Intelligence service. Do get in touch if you are interested in designing, developing and implementing this framework :)
We can also process custom documents by training a custom model; I’ll cover this in an upcoming article if I can.
Wrap up
Azure’s Document Intelligence service unlocks the intelligence hidden in the reams of documents that exist in document stores. We can invoke the service seamlessly via its APIs, which means documents can be processed as part of our normal application flow, delivering value to the business.
Stay tuned for the next article in the AI on Azure series!
Me @ Medium || LinkedIn || Twitter || GitHub
Of course, if you like my work, catch up with me for a coffee on BuyMeACoffee :)