Amazon Cloud Technology Community Day was held in Shanghai on June 26. The Chief Developer Evangelist of Amazon Cloud Technology, a Senior Data Scientist, a Senior Applied Scientist, and an Amazon Cloud Technology Machine Learning Hero were all present to share and discuss technical trends and hands-on projects in open source AI.
1. Yubo Wang: Amazon's Contributions and Practices in Open Source Machine Learning
The concept of open source has been around since the 1980s, but in recent years, with the rise of machine learning and cloud computing, open source has moved to the center of many developers' conversations and its importance has grown significantly. Today, four of the top five open source contributors, and seven of the top ten, are cloud vendors. Wang said cloud computing is an important driving force behind open source: cloud computing leads open source forward, and open source in turn pushes cloud computing further.
As a cloud computing service platform, Amazon follows a customer-first philosophy and provides a range of integrations between the cloud and open source tools, so that developers can use open source tools to move into production quickly in the cloud. In addition, when developers want to realize new ideas with new tools, Amazon proactively builds and contributes open source code to help them meet a variety of needs.
According to Wang, the number of open source contributors and projects at Amazon has grown year over year. Amazon currently maintains more than 2,500 open source repositories covering data, analytics, security, machine learning, and many other fields. Many projects are built around open source, such as an open source analytics platform based on OpenSearch and an open source architecture based on containerized microservices. Amazon believes the combination of cloud and open source empowers developers faster and enables more interaction, helping developers make the best use of open source in the cloud.
Turning to the combination of open source and machine learning, Wang argued that it is important not only to watch how open source drives the development of machine learning, but even more important to focus on the problems developers face in real production, so that more developers can master open source technology and build machine learning applications quickly. He summarized Amazon's efforts to build an open source machine learning ecosystem along four dimensions: product, research, empowerment, and community.
The first is product. Amazon offers a range of machine learning and artificial intelligence products in the cloud, many of them built on open source projects. Through these products, Amazon hopes to accelerate the application of open source machine learning in production.
The second is research. Amazon employs many scientists around the world working on artificial intelligence and machine learning, who contribute to the academic field and publish many cutting-edge papers. Amazon hopes this research can be combined with production practice and implemented quickly, building a good environment for developers.
The third is empowerment. Amazon believes artificial intelligence and machine learning should be in the hands of every developer. Through a series of products and capabilities, it helps everyone get started and learn quickly, so that everyone has more opportunities to grow with open source and machine learning.
Finally, the community. By building a machine learning community, Amazon helps developers gain a deeper understanding of open source and machine learning, so the field can move forward faster and develop better.
Wang then walked through each of these four dimensions in detail at Community Day.
Amazon's machine learning portfolio provides a complete stack of products and services, from frameworks and platforms to SaaS applications, to help developers build quickly. All of its machine learning cloud services rest on a solid open source foundation built by Amazon.
Globally, Amazon is the preferred platform for developers building applications with the open source frameworks TensorFlow and PyTorch. Amazon SageMaker helps developers implement machine learning quickly, and it offers two ways to extend its built-in machine learning: bring your own training script, or bring your own Docker container. Both are straightforward. SageMaker itself relies heavily on container technology, but its users have no particular need to understand or operate the underlying architecture. With the script mode, developers bring their own training script, using nearly the same code as in a local or other environment, simply passing parameters and generating a set of files while SageMaker pulls a standard image from its container registry; combining the script with the container yields fast, solid training results. SageMaker also supports bringing your own Docker container: integrate the scripts into your own container, publish it to a container repository, and train with it, which works equally well. For now, bringing your own script is the easiest path. Developers can develop and test locally, run distributed training and deployment in the cloud, and use cloud capabilities to iterate quickly and build better machine learning applications.
In addition, SageMaker has many built-in capabilities: for example, its automatic model tuning can rapidly adjust hyperparameters, while managed Spot training can greatly reduce the cost of training machine learning models.
Wang also introduced some of the open source machine learning projects initiated by Amazon.
The first is Gluon, an open source deep learning interface that lets developers build machine learning models more easily and quickly without compromising performance. Through its toolkits, Amazon wants to help more developers quickly adopt leading algorithms and pre-trained models from published papers. Amazon's toolkits GluonCV, GluonNLP, and GluonTS reproduce state-of-the-art (SOTA) results from top conferences in areas such as computer vision and natural language processing, and Amazon is making these toolkits available to more customers and developers.
The second is the Deep Java Library (DJL), which many independent developers use for deep learning. Amazon hopes that with DJL, developers can train and deploy machine learning models in Java in a portable and efficient way. DJL currently supports multiple engines and offers some 70 pre-trained models.
In addition, Wang introduced several other areas.
The first is Jupyter, which helps developers think with code and data, then build narratives around them to convey code- and data-driven insights to others. Amazon continues to optimize the Jupyter experience, for example by offering notebook sharing for enterprise developers, and continues to contribute to the Jupyter community. Members of the Jupyter Steering Committee currently work at Amazon, helping Jupyter integrate open source and the cloud more deeply.
The second is Amazon SageMaker Clarify, which builds on open source to give machine learning developers greater insight into their training data and models, so they can identify and limit bias and explain predictions.
The third is PennyLane, which Amazon began contributing to at the end of last year. PennyLane is now available in the cloud on Amazon Braket. Amazon hopes the cloud will allow quantum computing to integrate better with machine learning.
Amazon also offers a variety of educational and hands-on tools to help developers start their machine learning journey with open source solutions.
"Getting started is a crucial process for developers," Wang said. "Through a series of technical guides, tutorials, and lectures, Amazon can drive the developer community to flourish and foster a healthy atmosphere of technical discussion, providing developers with more help and influence."
2. Jinmin Wang: Exploring Deep Learning on Graphs in Artificial Intelligence
Any discussion of deep learning on graphs must first clarify a concept: what is artificial intelligence? Wang believes that achieving real artificial intelligence hinges on two points. The first is understanding why current AI algorithms make mistakes; the second is exploring the structural consistency between AI algorithms and the human brain.
"Studies show that the order of Chinese characters does not necessarily affect reading." Only after reading a sentence like this do you notice that its characters are scrambled. People do not understand natural language linearly; they understand text in chunks. Many models, by contrast, interpret text in a linear way.
From the perspective of image recognition, if an algorithm is shown a picture of a dog sitting on a motorcycle, it can only recognize that the picture contains a dog and a motorcycle; it cannot extract more structural information, whereas the human brain immediately perceives what makes the picture amusing.
Much of the data in daily life exists as graph structures, from small molecules to large-scale production and everyday life, so completing machine learning tasks on graphs is a very common requirement.
In recent years, applying deep learning algorithms to graph data has become a focus of developer attention, and Graph Neural Networks (GNNs) were born. A graph neural network is a deep neural network that learns vector representations of nodes, edges, or the whole graph, and its core idea is message passing. For example, to figure out which NBA team a person likes, you can look at which teams their friends like on social media: if 80% of their friends like a team, the person is more likely to like it too. When a node is modeled, it collects information from its neighboring nodes; this process is called message passing.
The information from all adjacent nodes is aggregated into a weighted sum; once this aggregated message is obtained, the node's existing state is updated through an update function. This is the most basic mathematical modeling of a graph neural network.
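The aggregate-then-update step described above can be sketched in a few lines of plain Python. This is a minimal, framework-free illustration: the uniform edge weight and the 50/50 mixing update rule are arbitrary choices for demonstration, not DGL's actual implementation.

```python
# Minimal message-passing step on an adjacency-list graph.
# Each node sums its neighbors' (weighted) feature vectors,
# then updates its own state with a simple update function.

def message_passing_step(adj, features, weight=1.0):
    """adj: {node: [neighbors]}; features: {node: [floats]}."""
    updated = {}
    for node, neighbors in adj.items():
        dim = len(features[node])
        # 1) Collect messages from neighbors into a weighted sum.
        agg = [0.0] * dim
        for nb in neighbors:
            for i, x in enumerate(features[nb]):
                agg[i] += weight * x
        # 2) Update: combine the node's own state with the aggregate.
        updated[node] = [0.5 * own + 0.5 * m
                         for own, m in zip(features[node], agg)]
    return updated

# Tiny triangle-shaped graph: node 0's new state mixes its own
# features with the sum of its two neighbors' features.
adj = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
print(message_passing_step(adj, feats))
```

Stacking several such steps lets information travel multiple hops, which is exactly how a multi-layer GNN builds up its node representations.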
Graph neural networks are widely used across different fields.
Molecular medicine: the first application is molecular property prediction. The input is the molecular structure graph; through message-passing modeling, a graph neural network produces a vector representation that is fed into a downstream classifier to predict properties such as toxicity. The second is drug molecule generation: an encoding model is built and transformed into a vector representation by the graph neural network, with guidance added so that the generated molecules satisfy the desired properties. The third is drug repurposing. Here Amazon built the drug knowledge graph DRKG, which represents the relationships among drugs, disease proteins, compounds, and other entities. Modeling this data with a graph neural network makes it possible to predict links between drug and disease-protein nodes, and thereby predict candidate drugs for new diseases. Of the 41 drugs recommended by this modeling so far, 11 have been used in clinical practice.
Knowledge graphs: graph neural networks can complete many downstream tasks on knowledge graphs, such as knowledge completion and node classification.
Recommendation systems: mainstream recommendation systems are mainly based on interaction data between users and products. If user A buys a product, the system records the purchase. Through data analysis, if user B's purchase history is similar to user A's, then user B is likely to be interested in the products user A buys. Graph-neural-network-based recommendation systems have already been deployed commercially.
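The "similar purchase history" intuition can be made concrete with a toy neighborhood-based recommender. This is a deliberately simple sketch of the idea, not the commercial GNN systems mentioned above; the users and items are made up.

```python
# Toy neighborhood-based recommendation: suggest to a user the items
# bought by the users whose purchase history overlaps most with theirs.

def recommend(purchases, user, top_k=2):
    """purchases: {user: set_of_items}; returns up to top_k items."""
    mine = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(mine & items)          # similarity to this user
        for item in items - mine:            # items our user lacks
            scores[item] = scores.get(item, 0) + overlap
    ranked = sorted((i for i in scores if scores[i] > 0),
                    key=lambda i: (-scores[i], i))
    return ranked[:top_k]

purchases = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "kettle"},  # similar to A, also bought a kettle
    "C": {"sofa"},                    # no overlap with A
}
print(recommend(purchases, "A"))
```

A graph formulation replaces this hand-written overlap count with message passing over the user-item bipartite graph, which lets similarity propagate beyond direct co-purchases.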
Computer vision: given an image, a graph neural network can model it as a scene graph; with a generator attached at the end, the process can also be reversed, generating a better picture from a scene graph.
Natural language processing: graph structures are ubiquitous in NLP as well. Tree-LSTM, for example, exploits the fact that a sentence is not a linear structure but has a grammatical one; training on the sentence's syntax tree yields a better analysis model. And today's hottest model, the Transformer, can be viewed as another variant of deep learning on graphs.
Graph neural networks already have some very good production deployments in both academia and industry, but many problems remain to be solved. How do you model graphs as they grow ever larger? How do you extract and process structured data from unstructured data? Answering these questions requires good tools for developing models.
Writing graph neural networks with traditional deep learning frameworks (TensorFlow, PyTorch, MXNet, etc.) is not easy. Message-passing computation is fine-grained, while tensor programming interfaces define coarse-grained computation, and this mismatch makes writing graph neural networks very difficult. Amazon developed DGL to bridge this gap. Wang introduced DGL from three angles: programming interface design, low-level system optimization, and open source community building.
The first is programming interface design, which is built around the concept of the graph. According to Wang, developers should first understand that graphs are "first-class citizens" of graph neural networks: all DGL functions and NN modules, including the core message-passing API, can accept and return graph objects.
The second is low-level system optimization. Other graph neural network frameworks, such as PyTorch Geometric (PyG), often use gather/scatter primitives to support message-passing computation, which generates a large number of redundant message objects and consumes considerable memory bandwidth. DGL instead uses efficient sparse operators to accelerate graph neural networks; it is 2 to 64 times faster than PyG, uses up to 6.3 times less memory, and is very friendly to large graphs.
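The difference between the two strategies can be seen even in plain Python. The sketch below contrasts a gather/scatter-style sum aggregation, which materializes one message per edge, with a fused accumulation that writes directly into the output (the spirit of a sparse-matrix kernel). Both compute the same result; this is an illustration of the memory argument, not DGL's or PyG's actual kernels.

```python
# Two ways to compute the same neighbor-sum aggregation.
# edges: list of (src, dst) pairs; feats: per-node feature vectors.

def gather_scatter_sum(edges, feats, n):
    # Gather: materialize one message per edge -> O(E) extra buffers.
    messages = [(dst, list(feats[src])) for src, dst in edges]
    # Scatter: add each materialized message into its destination.
    out = [[0.0] * len(feats[0]) for _ in range(n)]
    for dst, msg in messages:
        for i, x in enumerate(msg):
            out[dst][i] += x
    return out

def fused_sum(edges, feats, n):
    # Fused (SpMM-style): accumulate directly, no per-edge buffers.
    out = [[0.0] * len(feats[0]) for _ in range(n)]
    for src, dst in edges:
        for i, x in enumerate(feats[src]):
            out[dst][i] += x
    return out

edges = [(0, 1), (2, 1), (1, 0)]
feats = [[1.0], [2.0], [3.0]]
print(fused_sum(edges, feats, 3))  # identical to gather/scatter result
```

On a graph with hundreds of millions of edges, skipping the per-edge message buffer is what saves the memory bandwidth the talk refers to.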
Finally, Wang shared his experience building an open source community, summarized in the following lessons.
First, code isn't the only thing that matters: documentation is half of an open source project. Amazon designed documentation for different levels. For beginners, there is a 120-minute hands-on introduction to DGL; just download it and run it, and you can learn to train models step by step. For advanced users, there is a user guide covering design concepts and a DGL API reference, taking users from novice to expert in a stepwise way.
Second, the open source community needs a rich set of GNN model examples. The field develops very quickly, and to keep pace, the many different GNN application scenarios need to be covered by example models. DGL currently ships about 70 classic GNN model examples spanning various fields and research directions.
Third, community interaction matters. Amazon has set up many community activities for developers to communicate with one another, such as regular GNN user group meetings where cutting-edge scholars and developers from academia and industry share results from the GNN field. In addition, user forums, Slack, and WeChat groups provide different channels for communication.
3. Lei Wu: Application and Implementation of Large-scale Machine Learning in Computational Advertising
As a provider of advertising services to thousands of customers, FreeWheel is committed to creating a unified transaction platform that integrates buyers and sellers, connects media and advertisers, and provides comprehensive, quality, and cross-screen computing advertising services.
From the perspective of marketing goals, computational advertising divides into brand advertising and performance advertising. In brand advertising, FreeWheel uses machine learning for ad inventory forecasting and inventory recommendation. In performance advertising, when FreeWheel participates in the market on behalf of SSP traffic (the sell side), it uses machine learning to optimize the system; when it participates on behalf of DSP advertisers (the buy side), it uses machine learning to build a prediction model from historical bidding records. The model can estimate the win rate for a given price, or recommend a price for a target win rate. Combined with ad inventory forecasting, it can flexibly maximize the ROI of traffic purchases as traffic and prices fluctuate in the market.
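The two directions of that price/win-rate mapping can be illustrated with a purely empirical estimate from a bid log. This is a hedged sketch of the idea, not FreeWheel's actual model: the clearing prices are made up, and real systems learn a smooth curve rather than counting raw history.

```python
# Empirical win-rate estimate from historical clearing prices, plus
# the cheapest price that reaches a target win rate. Toy data only.

def win_rate(clearing_prices, price):
    """Fraction of past auctions a bid of `price` would have won,
    assuming a bid wins whenever it meets the clearing price."""
    wins = sum(1 for c in clearing_prices if price >= c)
    return wins / len(clearing_prices)

def price_for(clearing_prices, target):
    """Smallest historical clearing price whose win rate >= target."""
    for p in sorted(set(clearing_prices)):
        if win_rate(clearing_prices, p) >= target:
            return p
    return None

cleared = [1.0, 2.0, 4.0, 8.0]   # hypothetical past clearing prices
print(win_rate(cleared, 2.0))    # win rate if we bid 2.0
print(price_for(cleared, 0.5))   # cheapest bid hitting a 50% win rate
```

A production model replaces the lookup with a learned function so it can generalize to unseen prices and targeting segments, but the two query directions (price to win rate, win rate to price) are the same.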
Inventory forecasting plays an important role in computational advertising: for both brand and performance advertising, it lays a solid foundation for supply-and-demand planning and bidding strategy. Inventory forecasting means predicting ad inventory over a future period under different targeting conditions. For advertisers, the central goal is reaching the most relevant users with the lowest advertising budget, so targeting conditions such as gender, age, and region must be grouped by dimension and then forecast.
Many targeting conditions are used to describe traffic in computational advertising. Combining different dimensions yields a Cartesian product, so the number of combinations explodes exponentially as the dimensions, and the values within each dimension, multiply. With 1 million combinations there are 1 million time series to forecast; with a traditional method such as ARIMA, millions of models would need to be trained and maintained, an unrealistic amount of engineering. Moreover, in the actual scenario, 2,160 future time units must be predicted at hourly granularity. For such a long sequence, guaranteeing both accuracy and prediction efficiency is a great challenge: to predict 2,160 units accurately requires looking back at least as far. At FreeWheel, roughly a billion new ad delivery log entries arrive every day, so the overall data volume is very large.
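The Cartesian-product blow-up is easy to see with `itertools`. The dimension names and cardinalities below are hypothetical, chosen only to show how quickly the number of time series grows.

```python
import itertools
import math

# Hypothetical targeting dimensions and their possible values.
dims = {
    "gender": ["m", "f", "unknown"],                    # 3 values
    "age_band": ["18-24", "25-34", "35-54", "55+"],     # 4 values
    "region": [f"region_{i}" for i in range(50)],       # 50 values
    "device": ["tv", "desktop", "mobile", "tablet"],    # 4 values
}

# Each combination of values is one time series to forecast.
n_series = math.prod(len(v) for v in dims.values())
print(n_series)  # 3 * 4 * 50 * 4 = 2400 series from just 4 dimensions

# The combinations themselves are the Cartesian product.
combos = itertools.product(*dims.values())
print(next(combos))
```

Adding one more 50-value dimension multiplies the count by 50, which is why per-combination ARIMA models become unmanageable and a single shared model is attractive.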
In summary, Wu believes inventory forecasting faces four main challenges: dimension explosion, engineering complexity, ultra-long time series, and massive data samples.
To address these four challenges, FreeWheel designed and implemented a customized deep model based on the Wide & Deep architecture proposed by Google in 2016.
First, to tackle dimension explosion and engineering complexity, FreeWheel uses the Wide and Deep parts to extract the targeting conditions and the corresponding time series respectively, so a single model can handle millions of different time series. Training and maintaining only one model greatly reduces engineering complexity.
Second, to handle the ultra-long time series, FreeWheel designed an element-wise loss function, which makes backpropagation for the 2,160 time units independent of one another.
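Why does an element-wise loss decouple the time steps? Because the total loss is a sum of per-step terms, the gradient with respect to step t's prediction depends only on step t. The tiny sketch below shows this with squared error; the choice of squared error is an assumption for illustration, not necessarily FreeWheel's exact loss.

```python
# Element-wise loss over a forecast horizon: the total loss is a sum
# of per-step terms, so each step's gradient is purely local.

def elementwise_loss(pred, target):
    return [(p - t) ** 2 for p, t in zip(pred, target)]

def grad(pred, target):
    # d/dp_t of sum_t (p_t - y_t)^2 = 2 * (p_t - y_t):
    # step t's gradient involves only step t's error.
    return [2 * (p - t) for p, t in zip(pred, target)]

pred = [1.0, 2.0, 5.0]
target = [1.0, 4.0, 5.0]
print(sum(elementwise_loss(pred, target)))
print(grad(pred, target))  # only the mispredicted step gets a gradient
```

Contrast this with a sequential loss (e.g. one that feeds step t's output into step t+1), where an error at one hour would entangle gradients across all 2,160 outputs.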
Finally, facing the challenge of massive data, FreeWheel chose the Amazon SageMaker service from Amazon Cloud Technology and migrated the workload from its data center to the cloud. Compared with building and maintaining a distributed environment independently, this saves time and energy, Wu said. "This is in line with FreeWheel's philosophy of leaving professional work to professional people."
As for model effectiveness, model design and tuning certainly matter, but across the whole pipeline the effort roughly follows the 80/20 rule: in real applications, 80% of the time and energy is usually spent processing data and preparing features and training samples.
FreeWheel mainly uses Apache Spark for sample engineering, feature engineering, and related data processing. Wu briefly walked through this process.
For the time series problem, the first issue is sample completion. User behavior is usually not continuous in time, so a raw time series has missing periods that must be filled in. FreeWheel's solution is to prepare all the combinations in advance with ad impressions set to zero for every time period, then aggregate the "positive samples" for each combination and period from the online logs; a left join of the two tables then yields the desired result. However, when the two tables were joined in Spark, performance was very poor, taking nearly 7 hours on a Spark cluster of 10 EC2 machines. To reduce the execution time, the FreeWheel team tuned Spark by replacing the large and numerous join keys with hashes. After tuning, execution time dropped below 20 minutes on the same cluster.
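The skeleton-plus-left-join trick is easy to demonstrate in miniature. The pure-Python sketch below stands in for the two Spark tables; the combination names, hours, and counts are made up, and the hashed composite key at the end only gestures at the join-key tuning described above.

```python
import itertools

# Step 1: a skeleton of every (combo, hour) pair with zero impressions.
combos = ["combo_a", "combo_b"]
hours = [0, 1, 2]
skeleton = {(c, h): 0 for c, h in itertools.product(combos, hours)}

# Step 2: "positive samples" aggregated from online logs (sparse in time).
positives = {("combo_a", 1): 57, ("combo_b", 0): 9}

# Step 3: left join -- every skeleton row survives; missing hours stay 0.
samples = {key: positives.get(key, zero) for key, zero in skeleton.items()}
print(samples[("combo_a", 1)], samples[("combo_a", 2)])

# Join-key tuning idea: collapse many wide key columns into one hash,
# so the join compares a single narrow value instead of many fields.
def join_key(combo, hour):
    return hash((combo, hour))
```

In Spark the same shape appears as `skeleton_df.join(positives_df, keys, "left")` with nulls filled to zero; hashing the composite key shrinks the data shuffled during the join.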
Once the time series samples are obtained, feature engineering follows, in two parts. The first part uses Spark's Window operations to slide a window over the impressions pre-sorted by hour, which is what actually creates the time series samples. The second part is feature generation, such as deriving various time features from the timestamp. Since the data is ultimately fed into a TensorFlow deep model, all fields must be encoded in advance.
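The window-sliding step amounts to cutting an hourly series into (history, target) training pairs. Here is a framework-free sketch of that construction; the lookback and horizon lengths are illustrative (FreeWheel's real horizon is 2,160 hours).

```python
# Turn an hourly impression series into supervised samples with a
# sliding window: `lookback` hours of history -> next `horizon` hours.

def sliding_windows(series, lookback, horizon):
    samples = []
    for t in range(len(series) - lookback - horizon + 1):
        history = series[t:t + lookback]
        target = series[t + lookback:t + lookback + horizon]
        samples.append((history, target))
    return samples

hourly = [10, 12, 9, 14, 20, 18]  # toy impression counts per hour
print(sliding_windows(hourly, lookback=3, horizon=1))
```

Spark's `Window` with a row range performs the same slicing in a distributed way over the pre-sorted impressions, producing one training row per window position.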
With the samples ready, the next step is model training and inference. For training, to balance model quality and execution efficiency, FreeWheel borrowed the idea of transfer learning: pre-train the model on massive data to secure model quality, then fine-tune the parameters with each day's incremental data. For inference, because the model serves different downstream consumers, some of which need batch predictions, the work is divided into four types of training and inference tasks.
Once online, the model keeps MAPE at the finest granularity around 20%, and aggregated MAPE below 10%. As for execution efficiency, the offline cold start (pre-training) takes 2 hours, daily incremental training takes only 10 minutes, and batch inference completes in 5 minutes.
4. Jian Zhang: Practical Application of Graph Neural Network and DGL
As a senior data scientist at Amazon Cloud Technology, Dr. Jian Zhang spends much of his time using graph neural networks and DGL to help customers solve core business problems and create business value in real customer scenarios. In this talk he discussed the challenges of putting graph neural networks and DGL into production from four angles: data, model, speed, and interpretation.
Does your graph contain enough information? In academia, many scholars build models and improve algorithms on open datasets; the most common datasets in graph neural network research are CORA, Citeseer, and PubMed. These graphs are usually highly connected, with nodes of the same class clustered together, so graph neural networks built on them often perform well. In real business scenarios, however, limited by data collection methods, storage formats, and processing capabilities, the constructed graph data is sometimes very sparse, and despite great effort and tuning time the results are disappointing. If the connectivity of a customer's graph is low enough, any graph neural network model will eventually degenerate into an ordinary MLP. In addition, customers' business graphs often have very little label data: a graph of hundreds of millions of nodes may have only a few hundred thousand labeled ones, accounting for only 0.01% of the data. That makes it hard to reach other labeled nodes from any given labeled node to build connections, greatly reducing the effectiveness of the graph neural network.
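Two of the symptoms above, low connectivity and scarce labels, can be checked with trivial statistics before any model is trained. The helpers below are a hypothetical pre-flight check, not a method from the talk; what counts as "too sparse" depends on the task.

```python
# Quick sanity checks on a graph before training a GNN:
# average degree (connectivity) and label coverage.

def avg_degree(n_nodes, edges):
    """Mean degree of an undirected graph given as an edge list."""
    return 2 * len(edges) / n_nodes

def label_coverage(labels, n_nodes):
    """Fraction of nodes that carry a label."""
    return len(labels) / n_nodes

edges = [(0, 1), (1, 2)]      # toy 4-node graph with 2 edges
labels = {0: "fraud"}         # only one labeled node
print(avg_degree(4, edges))   # low average degree -> very sparse graph
print(label_coverage(labels, 4))
```

If the average degree is near 1 and label coverage is a tiny fraction of a percent, the message-passing layers have little to propagate, which is exactly the degenerate-to-MLP regime Zhang describes.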
Data scientists have a saying: the data and features set an upper limit on model performance, and models merely approach that ceiling, so it is better to focus on the data than on the models. If the information in the graph determines the upper bound, then what exactly is that information? How do you measure it? Can such a measure guide GNN design? Should you even use a graph at all? These are questions that machine learning practitioners and development engineers must confront; Zhang raised them in the hope that the community can pool its wisdom to solve them.
In what cases does a GNN model have the advantage? "I know there are all kinds of graph neural network models. Which one would you use for our graph?" an industrial client once asked Dr. Zhang, and that is a hard question to answer. First, the model design space is far larger than the set of off-the-shelf options. Second, different business scenarios impose different requirements, and designing or selecting a model for a specific business within a given scenario is not easy to pin down. Moreover, DGL's core development paradigm is message passing (MP), yet some problems in graph-adjacent fields cannot be expressed with MP. We also see that graph machine learning still has no model like GPT in NLP that can quickly solve most problems.
The difficulties go well beyond these, Zhang said; customers sometimes question the approach directly: "Dr. Zhang, look, our XGBoost and other models beat this GNN!" One customer in the financial industry used a financial knowledge graph to derive various relationships among customers, assembled thousand-dimensional features from them, and fed those straight into LightGBM, outperforming the graph neural network model. Although the graph neural network eventually surpassed this client's LightGBM model through follow-up techniques, the episode leaves much to think about: in what ways is a graph neural network better than a traditional machine learning model, and when?
Zhang believes most traditional machine learning models are feature-based, yet in real business scenarios not every node comes with features. In particular, as privacy-protection regulations strengthen and oversight of big data tightens, data collection is becoming harder and harder. A graph neural network model, even without features, can still exploit the relationships between entities; that is its advantage.
Graph neural networks and traditional machine learning models are not an either-or choice: which to use should be decided by the business scenario and problem, and the two can even be combined. Where does each GNN model apply? How do you use node/edge features? Must you use a GNN at all? How do you combine a GNN with other models? Zhang left these questions for the audience to ponder.
Can graph models do real-time inference? Once a model works, customers often ask whether it can go online for real-time inference. The problem has two aspects. Data points within a graph structure are correlated, so unlike traditional CV and NLP, they are not independently and identically distributed. Graph inference has two modes, transductive and inductive. In the transductive mode, the nodes/edges to be predicted already exist in the graph at training time and the training nodes can "see" them. The drawback is that when a prediction is needed, the nodes/edges must already exist and the graph must already be built, so real time is nearly impossible: to be real time, the model must handle nodes that arrive in the future. In the inductive mode, the nodes to be predicted are not visible during training and are only seen at inference time. Inferring on unseen nodes with the inductive mode covers two situations. The first is batch prediction, for example anti-fraud: build a graph from the past seven days of data to train the model; then, to detect user behavior that happens tomorrow, combine tomorrow's data with the previous seven days into a graph and run the trained model on it. That is batch inference, not real-time inference. To achieve true real-time inference, the node/edge to be predicted must be added to the existing graph in real time, and its n-hop subgraph extracted and fed to the trained model.
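The last step of that real-time path, extracting the n-hop subgraph around a newly inserted node, is a breadth-first search. Here is a minimal BFS sketch of that extraction; a production pipeline would also need neighbor sampling and feature fetching, which this deliberately omits.

```python
from collections import deque

# Extract the set of nodes within k hops of a seed node -- the piece
# a real-time inductive pipeline runs after inserting a new node.

def k_hop_subgraph(adj, seed, k):
    """adj: {node: [neighbors]}; returns all nodes within k hops."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # don't expand beyond k hops
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path 0-1-2-3
print(sorted(k_hop_subgraph(adj, seed=0, k=2)))
```

The latency problem Zhang raises lives exactly here: on a graph database holding billions of edges, this neighborhood lookup must complete in milliseconds for inference to count as real time.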
According to Zhang, neither the graph community nor the wider machine learning and big data communities have designed real-time (e.g. streaming) methods for storing, extracting, and querying graph data. Existing graph databases are often not fast enough at insertion and lookup, and in particular their sampling around a given center node/edge is too slow to meet real-time inference needs. Nor is there a proven architecture for real-time inference; this is a problem that needs solving and a big opportunity for developers.
How do you explain graph model results? Once a model goes live, one of the problems is how to interpret its results. There is some research on this question in academia, but such discussion is rare in industry.
For example, after the graph model produces a prediction for a node, the business side asks why. Telling them "because its neighbors had the most influence on it" is not an answer business people can accept.
Moreover, although a graph neural network can recognize certain patterns through the graph structure, the nodes all carry features, which are ultimately real numbers; after a series of linear and nonlinear transformations, the relationships among them go far beyond human cause-and-effect intuition. How to interpret the results of graph models remains a long road for developers.
The production deployment of graph neural networks faces multiple challenges, Zhang said, which resemble sending a rocket to the moon: the data is the fuel, the model is the engine, the data pipeline and implementation architecture are the overall rocket design, and model interpretation is the flight control center. Only when all four are solved can the rocket really fly to the moon.
5. Final Thoughts
Over the years, Amazon has accumulated many projects and much practical experience in artificial intelligence, and has been committed to co-creating with developers around the world, hoping to bring new vitality to the field. The Amazon Cloud Technology Summit China will open in Shanghai on July 21 under the theme "Building a New Pattern, Reshaping the Cloud Era," sharing stories of reshaping and building the cloud era with leading practitioners in the cloud computing industry. The Shanghai stop is only the beginning: the summit will continue in Beijing in August and Shenzhen in September.
The summit covers more than one hundred technical sessions, including sub-forums dedicated to artificial intelligence. It will focus on databases, big data, and intelligent lake house architecture, offering hands-on labs, technical architecture deep dives, and interpretations of customer cases and practices. In addition, a dedicated open source sub-forum will invite many well-known speakers to share their work.