What is the focus of machine learning research? What are the latest developments in AI + Finance? …
The workshop focused on financial intelligence. AI experts gave talks on the practical application of AI in finance, on “small data”, on data privacy and security, and on other key issues, helping the industry tackle the technical challenges of combining AI with financial innovation.
After the workshop, many experts and scholars stayed on site for lively discussions.
On the paper side, the Alipay AI technical team contributed several research results to ICML, including: a generative adversarial user model that addresses reinforcement learning with small samples and is applied to recommendation-system optimization; nonlinear distributional gradient temporal-difference learning, which opens up a new direction building on distributional reinforcement learning; and the Particle Flow Bayes’ Rule algorithm, which improves the accuracy of high-dimensional Bayesian inference.
Below we introduce three of these papers, sharing Alipay AI’s latest research in the field of financial services:
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
Introduction: Applying reinforcement learning (RL) to recommendation systems makes it possible to account for users’ long-term benefit, and thus to maintain users’ long-term satisfaction and activity on the platform. However, reinforcement learning requires a large number of training samples. In this paper, we propose using a generative adversarial user model (GAN user model) as the simulation environment for reinforcement learning: we first train the policy offline in this simulated environment, and then update it in real time from online user feedback, greatly reducing the number of online training samples required.
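To make the idea concrete, here is a minimal Python sketch of the offline-training loop: a stand-in user model scores the displayed items and simulates a click, and a simple policy-gradient learner is trained entirely against that simulated feedback. All names (SimulatedUserModel, train_policy_offline), the linear policy, and the reward proxy are illustrative assumptions, not the paper’s actual model.

```python
import numpy as np

# Minimal sketch (names and model structure are illustrative, not the
# paper's): a learned user model acts as a simulated environment so the
# recommendation policy can be trained offline before touching live traffic.

rng = np.random.default_rng(0)
N_ITEMS, EMB_DIM = 50, 8
item_emb = rng.normal(size=(N_ITEMS, EMB_DIM))            # item feature vectors


class SimulatedUserModel:
    """Stand-in for the GAN-trained user model: scores the displayed items
    and samples a click, mimicking real user feedback."""

    def __init__(self):
        self.pref = rng.normal(size=EMB_DIM)               # latent user preference

    def respond(self, shown_items):
        logits = item_emb[shown_items] @ self.pref
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        click = rng.choice(len(shown_items), p=probs)      # simulated click
        return click, logits[click]                        # feedback + reward proxy


def train_policy_offline(user_model, episodes=500, k=5, lr=0.05):
    """Crude policy-gradient loop run entirely inside the simulated environment."""
    theta = np.zeros(EMB_DIM)                              # linear policy weights
    for _ in range(episodes):
        scores = item_emb @ theta
        shown = np.argsort(-scores)[:k]                    # recommend top-k items
        click_idx, reward = user_model.respond(shown)
        # push the policy toward the features of the clicked item
        theta += lr * reward * (item_emb[shown[click_idx]] - item_emb[shown].mean(axis=0))
    return theta


policy = train_policy_offline(SimulatedUserModel())
print("learned policy weights:", np.round(policy, 3))
```

In the paper’s setting the simulated policy is then fine-tuned against real online feedback; the sketch stops at the offline stage.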
Nonlinear Distributional Gradient Temporal-Difference Learning
Introduction: In this paper we introduce nonlinear distributional gradient temporal-difference learning. In recent years, distributional reinforcement learning, such as DeepMind’s C51 algorithm, has attracted wide attention in academia. Compared with traditional reinforcement learning algorithms, distributional reinforcement learning takes into account the distribution of the long-term return, which makes the learning process more stable and convergence faster. However, convergence is still hard to guarantee once distributional reinforcement learning is combined with neural networks and off-policy learning. We therefore combine distributional reinforcement learning with gradient temporal-difference learning, and propose the distributional mean squared Bellman error as our optimization objective. This work provides theoretical guarantees for distributional reinforcement learning and opens up a new research direction built on it.
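As a concrete illustration of “learning the distribution of the long-term return”, the sketch below uses the standard categorical (C51-style) representation and projects the distributional Bellman target onto a fixed support. It only shows what the distributional target looks like; the paper’s actual contribution, a gradient temporal-difference style objective with convergence guarantees under nonlinear function approximation, is not reproduced here, and the support bounds and toy numbers are assumptions.

```python
import numpy as np

# Illustrative sketch: C51-style categorical return distribution and the
# projected distributional Bellman target r + gamma * Z(s') on a fixed support.

V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
support = np.linspace(V_MIN, V_MAX, N_ATOMS)             # fixed return atoms
dz = (V_MAX - V_MIN) / (N_ATOMS - 1)                     # atom spacing


def project_bellman_target(next_probs, reward, gamma=0.99):
    """Project r + gamma * Z(s') back onto the fixed support (C51 projection)."""
    target = np.zeros(N_ATOMS)
    shifted = np.clip(reward + gamma * support, V_MIN, V_MAX)
    b = (shifted - V_MIN) / dz                            # fractional atom index
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    # distribute each shifted atom's probability onto its two neighbours
    for p, bj, l, u in zip(next_probs, b, lower, upper):
        if l == u:
            target[l] += p
        else:
            target[l] += p * (u - bj)
            target[u] += p * (bj - l)
    return target


# toy usage: next-state return distribution peaked around +2
next_probs = np.exp(-0.5 * ((support - 2.0) / 1.0) ** 2)
next_probs /= next_probs.sum()
target = project_bellman_target(next_probs, reward=1.0)
print("target mean return:", round(float(target @ support), 3))
```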
Particle Flow Bayes’ Rule
Introduction: In high-dimensional problems, computing the posterior probability is a major challenge because of the computational and precision issues introduced by high-dimensional integrals. Moreover, in many practical problems observations arrive sequentially, and Bayesian inference must be applied repeatedly: the posterior obtained after observing some of the data becomes the new prior, and a new posterior is then computed from the newly arrived data. This setting requires algorithms that can perform fast and efficient Bayesian updates online without storing large amounts of historical data. To tackle this challenging problem, we propose the Particle Flow Bayes’ Rule, a Bayesian operator based on ordinary differential equations (ODEs). We demonstrate the validity and generalization ability of the operator, which is obtained through meta-learning, on several classical and high-dimensional experiments. In high-dimensional problems in particular, the proposed algorithm has clear advantages over existing algorithms in both the accuracy and the efficiency of posterior estimation.
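The sketch below illustrates the sequential-update setting: particles representing the current belief are flowed toward the new posterior each time an observation arrives, so no history needs to be stored. Because the paper’s flow is a meta-learned ODE, the example substitutes a few Stein variational gradient descent (SVGD) steps as a runnable stand-in; the one-dimensional Gaussian model and all names here are illustrative assumptions, not the paper’s method.

```python
import numpy as np

# Sketch of sequential Bayesian updating with a particle flow. The paper's
# operator transports particles along a meta-learned ODE; as a runnable
# stand-in, this sketch uses SVGD steps to flow the prior particles toward
# the posterior after each new observation. Model (illustrative only):
# unknown mean theta with a Gaussian likelihood of known noise level.

rng = np.random.default_rng(1)
OBS_STD = 1.0                                            # known observation noise


def svgd_flow(particles, grad_log_post, steps=200, lr=0.05, bandwidth=0.5):
    """Deterministic particle flow toward the target density (SVGD stand-in)."""
    x = particles.copy()
    for _ in range(steps):
        diff = x[:, None] - x[None, :]                   # pairwise differences
        k = np.exp(-diff**2 / (2 * bandwidth**2))        # RBF kernel matrix
        grad_k = diff / bandwidth**2 * k                 # repulsive kernel gradients
        phi = (k @ grad_log_post(x) + grad_k.sum(axis=1)) / len(x)
        x += lr * phi
    return x


# start from prior particles N(0, 2^2) and absorb observations one at a time
particles = rng.normal(0.0, 2.0, size=100)
observations = [1.8, 2.2, 2.0, 1.9]                      # data arriving sequentially

for y in observations:
    # current particles play the role of the prior; only the newest
    # observation is needed, so no history has to be stored
    prior_mean, prior_var = particles.mean(), particles.var()
    grad_log_post = lambda x, y=y, m=prior_mean, v=prior_var: (
        -(x - m) / v - (x - y) / OBS_STD**2
    )
    particles = svgd_flow(particles, grad_log_post)

print("posterior mean ~", round(float(particles.mean()), 3))
print("posterior std  ~", round(float(particles.std()), 3))
```

In the actual Particle Flow Bayes’ Rule the velocity field driving the particles is shared across inference tasks and learned by meta-training, which is what allows it to generalize to new sequences of observations; the hand-coded SVGD flow above only mimics the interface of such an operator.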