Abstract: Document understanding focuses on identifying and extracting key-value pairs from unstructured documents and outputting them as structured data. Most earlier work on information extraction focused only on extracting entity relationships from text, so it does not apply directly to document understanding.

This article is shared from the Huawei Cloud community: “Paper Interpretation Series 13: The Impact of Global Information on Graph Network Document Parsing”, by the author a smile.

1. Background

Document understanding focuses on identifying and extracting key-value pairs from unstructured documents and outputting them as structured data. Most earlier work focused on extracting entity relationships from text and was not directly applicable to document understanding.

In ICDAR 2019, participants were asked to extract key-value pair information from documents such as invoices and receipts. To address this task, the paper proposes a graph network that incorporates global information and combines it with visual information to extract key information from unstructured documents.

2. Network structure

In this paper, the document understanding task is reformulated as a graph node classification task. Global and local text information is obtained as follows:

A [CLS] token is used to capture the classification information of the global text sequence, producing W0, which is placed in the same input vector as the individual text segments (W1, W2, …, Wn). After the BERT model, each element is encoded independently, so the model obtains embeddings for both the global and the local text.

Global and local image information is obtained in a similar way, except that the global and local image features are extracted with a CNN.

Text and image feature concatenation: the image features and the text features are concatenated.
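The concatenation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `fuse_features`, the vector sizes, and the convention that row 0 holds the global feature are all assumptions made for the example.

```python
from typing import List


def fuse_features(text_feats: List[List[float]],
                  image_feats: List[List[float]]) -> List[List[float]]:
    """Concatenate each node's text vector with its image vector."""
    if len(text_feats) != len(image_feats):
        raise ValueError("need one text and one image vector per node")
    # List addition appends the image vector to the end of the text vector.
    return [t + v for t, v in zip(text_feats, image_feats)]


# Assumed layout: row 0 is the global (whole-document) feature,
# rows 1..n are the individual text segments.
text_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
image_feats = [[1.0], [2.0], [3.0]]
fused = fuse_features(text_feats, image_feats)
```

Each fused row has the text dimension plus the image dimension, so the global node and the local nodes end up with feature vectors of the same size.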

Network construction:

Given the set of text segments in a document, a virtual global node is constructed as an information communication hub. In this way, any two non-adjacent nodes are still two-hop neighbors, which reduces information loss during message passing and lets global information be passed directly to the local nodes.
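The two-hop property can be checked with a small sketch. The article does not say how local nodes are connected to each other, so the `local_edges` argument below is a hypothetical input (for example, spatially neighboring segments), and the choice of index 0 for the global node is likewise an assumption.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

GLOBAL = 0  # assumed index of the virtual global node


def build_graph(num_segments: int,
                local_edges: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Undirected graph: segments are 1..num_segments, plus a global hub."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_segments + 1)}
    for a, b in local_edges:
        adj[a].add(b)
        adj[b].add(a)
    # Connect every text segment to the virtual global node.
    for seg in range(1, num_segments + 1):
        adj[GLOBAL].add(seg)
        adj[seg].add(GLOBAL)
    return adj


def hops(adj: Dict[int, Set[int]], src: int, dst: int) -> int:
    """Shortest path length between two nodes (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1  # unreachable


# Four segments; only (1, 2) and (3, 4) are directly linked.
adj = build_graph(4, [(1, 2), (3, 4)])
```

Here segments 1 and 4 share no direct edge, yet `hops(adj, 1, 4)` is 2 because both are attached to the global node.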

Neighbor aggregation: each node is updated by aggregating information from its two-hop neighbors through the Leaky ReLU activation function, and K attention heads are used to improve the model's capacity (several attention heads are computed and their outputs are then merged).
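The article gives no formulas for this step, so the sketch below swaps in a generic graph-attention-style update as an illustration. It assumes that Leaky ReLU is applied to the raw attention scores, that each node attends to its direct neighbors and itself, and that the heads are merged by concatenation. The attention vectors are hand-picked placeholders, not learned parameters, and the learned linear projection of a real model is omitted.

```python
import math
from typing import Dict, List, Set


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_head(feats: List[List[float]],
                   adj: Dict[int, Set[int]],
                   attn_vec: List[float]) -> List[List[float]]:
    """One head: softmax-weighted average of each node's neighbors."""
    out = []
    for i in range(len(feats)):
        nbrs = sorted(adj[i] | {i})  # include a self-loop
        # Score each pair from the concatenation [h_i || h_j].
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attn_vec, feats[i] + feats[j])))
            for j in nbrs
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * feats[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(feats[i]))
        ])
    return out


def multi_head(feats: List[List[float]],
               adj: Dict[int, Set[int]],
               attn_vecs: List[List[float]]) -> List[List[float]]:
    """Run K heads and concatenate their outputs per node."""
    heads = [attention_head(feats, adj, a) for a in attn_vecs]
    return [sum((head[i] for head in heads), []) for i in range(len(feats))]


# Node 0 is the global node; nodes 1 and 2 are text segments.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: {1, 2}, 1: {0}, 2: {0}}
out = multi_head(feats, adj, [[0.0] * 4, [0.5, -0.5, 0.25, 0.1]])
```

With K = 2 heads and 2-dimensional inputs, every node ends up with a 4-dimensional vector. The first head uses an all-zero attention vector, so it reduces to a plain average over each node's neighborhood.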

Information extraction:

3. Experimental results

The model was evaluated on the Alibaba Tianchi competition data and on SROIE.

Related ablation experiments: removing the visual features lowers the results, which shows that visual features play an important role in extracting structured information on both the Tianchi data and SROIE. Similarly, removing the global node also reduces the accuracy of the model, which confirms the importance of global connections in the graph structure.
