This article was originally published by AI Frontier.
dwz.cn/7nBdQV


This article is published with authorization from the Afandi Research Institute


Wang Lei, ZHANG Dongxiang, GAO Lianli, SONG Jingkuan, GUO Long, SHEN Hengtao

“Augmented learning is very similar to human learning, and DeepMind has applied it to scenarios like AlphaGo and Atari games. As a leader in the field of intelligent education, Avon Research Institute for the first time proposed an automatic arithmetic word problem solver based on DQN (Deep Q-Network), which can transform the process of solving word problems into Markov decision process and make use of the good generalization ability of BP neural Network. Store and approximate Q values of state-action pairs in enhanced learning. Experiments show that the algorithm performs well on standard test sets, increasing average accuracy by nearly 15 percent.”


The research background

The history of automatic solving mathematical word problems (MWP) can be traced back to the 1960s, and it continues to attract the attention of researchers in recent years. Automatic solvers Applied math problems first map human-readable sentences into logical forms that machines can understand, and then reason. This process can not be solved simply by pattern matching or end-to-end classification. Therefore, the design of applied mathematical problem solver with semantic understanding and reasoning ability has become an indispensable step on the road to general artificial intelligence.

For mathematical word problem solver, given a mathematical word problem text, not simply through the way of text question and answer end-to-end training, so as to directly get the solution, but through text processing and numerical reasoning, to get its solution expression, so as to calculate the answer. Therefore, this task not only involves in-depth understanding of the text, but also requires the solver to have strong logical reasoning ability, which is also the difficulty and emphasis in the research of natural language understanding.

In recent years, researchers have tried to solve mathematical word problems automatically by designing algorithms and writing solving systems from different perspectives, including template-based methods, statistics-based methods, expression tree-based methods, and deep learning-based model generation methods. At present, in the field of solving mathematical word problems, the training data set is not enough, the solving algorithm is not robust, the solving efficiency is not high, the solving effect is not good and so on. Due to the math problem itself needs to have enough natural language understanding, for digital, semantics, common sense strong reasoning ability, however, most of solving method and by human intervention, generality is not strong, and with the increase of the complexity of the data, most of the effect of algorithm fell sharply, so to design a solution on the efficiency and effectiveness are automatic solver of good performance, Is both difficult and very important.

Related work

Arithmetic word problem solver:

As an early attempt, the method based on verb classification and state transition reasoning can only solve the problem of addition and subtraction. In order to improve the solving ability, a large number of mapping rules are designed based on the label method to map variables and numbers into logical expressions for reasoning. Because of too much manual intervention, its expansion is difficult.

Based on the expression tree method, it tries to identify the relevant numbers, and classifies the operators between the number pairs, and constructs the solvable expression tree from bottom to top. In addition, some restrictions such as ratio units are considered to further ensure the correctness of the constructed expression. The equality tree-based approach takes a more violent approach by enumerating all possible equality trees through integer linear programming. The tree-based methods are faced with the exponential increase of solution space as the number of numbers increases and decreases.

System of equations word problem solver:

At present, the method of solving equations word problems is mainly based on template. It needs to classify the text into predefined equations templates, infer the permutation and combination of unknown slots by artificial features, and fill the slots with identified numbers and related noun units. The template-based approach is highly dependent on data, and its performance deteriorates dramatically when the number of topics corresponding to the same template decreases or the complexity of the template increases.

The main contributions of this paper are as follows:

  1. The first one attempts to design a general framework for automatic solution of mathematical word problems using deep reinforcement learning
  2. Aiming at the application problem scenario, the corresponding state, action, reward function and network structure of deep Q network are designed.
  3. The proposed method is verified on the main data sets of arithmetic word problems, and good results are obtained in both efficiency and effectiveness.

Plan to introduce

Mathematical word problem solver based on deep Q network

The framework proposed in this article is shown in the figure above. Given a mathematical word problem, firstly extract the relevant numbers used to construct the expression tree using the numerical pattern, and then adjust the order of the extracted relevant numbers according to the rules formulated by reordering. For example, for “3+4*5”, we want to preferentially calculate 4*5. The number 5 here, the corresponding text segment is “5 yuan per hour”. Obviously, the unit of the number “5” is “yuan/hour”. When the unit of the number “4” is “hour” and the unit of the number “3” is “yuan”, in this case, adjust 4 and 5 to the first digit sequence. Then, build the expression tree with the sorted digit sequence from the bottom up.

Firstly, according to the information of number “4” and number “5”, each other’s information and the relationship with the problem, the corresponding features are extracted as the states in the enhancement learning component. This feature vector is then used as the input of the forward neural network in the deep Q-network to obtain the Q values of “+”, “-“, “reverse”, “*”, “/”, and “reverse”/” for the six actions. The appropriate operator is selected as the current action according to the Epsilon-greedy. The numbers “4” and “5” start building the expression tree based on the action currently taken. Next, repeat the process of the previous step based on the numbers “4” and “3”, or the numbers “5” and “3”, to build the expression tree from the smallest common ancestor of the operator numbers. Until there are no more relevant numbers, the end of the construction. The design of each component of the deep Q network will be described in detail later.

State: For the current number pair, according to the number pattern, extract a single number, between the number pairs, three kinds of features related to the problem, and whether the two numbers have participated in the construction of the expression tree, as the current state. The characteristics of single digit, number pair and problem correlation help the network to select the correct operator as the current action. Whether a number participates in the construction of the expression tree indicates the hierarchical position of the current number pair in the current expression tree.

Actions: Because this article deals with simple arithmetic word problems, we only consider the four operations of addition, subtraction, multiplication and division. When building a tree, for addition and multiplication, the different order of numbers between two numbers will not affect the result of the calculation, but the different order of subtraction and division will lead to different results. It is necessary to add reverse subtraction and reverse division because we want to determine the order of numbers. Therefore, a total of six operators, namely addition, subtraction, multiplication, division, reverse subtraction and division, are required to be learned in the deep Q network.

Reward function: In the training stage, the deep Q network selects the correct action according to the current two numbers and obtains the correct operator. The environment will feedback a positive value as a reward, or a negative value as a punishment.

Parameter learning: in this paper, a two-layer forward neural network is used to calculate the desired Q value in the deep Q network. The network parameter θ will update the learning based on the reward function of the environment feedback. In this paper, empirical replay memory is used to store transitions between states, and batch samples (S, A, S ‘,r) are taken from the empirical replay memory to update the network parameter θ. The loss function of the model is as follows:

The gradient value of the loss function is used to update the parameters to narrow the gap between the predicted Q value and the expected target Q value, and the formula is as follows:

The algorithm flow is as follows:


The experiment

This paper adopts AI2, IL and CC data sets for arithmetic word problems. AI2 has 395 questions, which contain unrelated numbers and only involve addition and subtraction. IL had 562 questions, which contained unrelated numbers and involved only a single step of addition, subtraction, multiplication and division; CC has 600 questions, which do not contain irrelevant numbers and involve two steps of addition, subtraction, multiplication and division.

The accuracy of the three data sets is shown below:

By observing the above experimental results, it is found that the method proposed in this paper achieves the best effect on AI2 and CC data sets. ALGES performed well on IL, but poorly on AI2 and CC datasets, which proved that our method has better versatility. The unit dependence graph proposed by UnitDep has no obvious effect on AI2 data set with only addition and subtraction operations. The Context feature added by UnitDep has an obvious effect on CC data set, but the effect is significantly reduced on AI2 data set, which shows the limitations of artificial features. For the method proposed in this paper, the improvement effect of reordering on CC data sets is obvious. Since AI2 only involves addition and subtraction operation, and IL only involves single step operation, the effect remains unchanged on these two data sets.

In addition, breakpoint analysis of single-step and multi-step is also carried out in this paper. Experimental results show that the method proposed in this paper performs very well in multi-step, and the experimental results are shown as follows:

The running time is shown below:

By observing the time required for solving a single problem, we can find that the multi-step operation data set CC obviously consumes more time. ALGES takes the most time by enumerating all possible candidate trees. The method proposed in this paper is only second to the efficiency of SVM to do the operator, and the relevant number classification of ExpTree.

The trend of average rewards and accuracy is shown below:


conclusion

In this paper, an enhanced learning framework for solving mathematical word problems is proposed for the first time.

In the future, we will continue to design automatic solvers for mathematical word problems along the line of deep learning and enhanced learning to avoid too many artificial features. At the same time, we try to solve equations word problems on a larger and more diversified data set.

Thesis Title: MathDQN: Using Deep Reinforcement Learning to Solve Arithmetic Word Problems

MathDQN: Solving ArithmeticWord Problems via Deep Reinforcement Learning

Paper URL:

Cfm.uestc.edu.cn/~zhangdongx…

Team: Afan Research Institute, University of Electronic Science and Technology of China, Peking University

Wang Lei, ZHANG Dongxiang, GAO Lianli, SONG Jingkuan, GUO Long, SHEN Hengtao

For more content, you can follow AI Front, ID: AI-front, reply “AI”, “TF”, “big Data” to get AI Front series PDF mini-book and skill Map.