Deep Reinforcement Learning Lab Report

Source: AAAI – 2020

Author: DeepRL

AAAI 2020 received more than 8,800 valid submissions, of which 7,737 entered the review process; 1,591 papers were ultimately accepted, an acceptance rate of 20.6%. Among the accepted papers, 52+ are on reinforcement learning, about 3% of the accepted total. They come from institutions including Google Brain, DeepMind, Tsinghua University, UCL, Tencent AI Lab, Peking University, IBM, and Facebook, and the authors range from Sutton (No. 48) to rising stars. The papers cover environments, theory and algorithms, applications, and multi-agent systems. Here is the detailed list:

[1]. Google Research Football: A Novel Reinforcement Learning Environment

Karol Kurach (Google Brain)*; Anton Raichuk (Google); Piotr Stańczyk (Google Brain); Michał Zając (Google Brain); Olivier Bachem (Google Brain); Lasse Espeholt (DeepMind); Carlos Riquelme (Google Brain); Damien Vincent (Google Brain); Marcin Michalski (Google); Olivier Bousquet (Google); Sylvain Gelly (Google Brain)

[2]. Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Xiaojian Ma (University of California, Los Angeles)*; Mingxuan Jing (Tsinghua University); Wenbing Huang (Tsinghua University); Chao Yang (Tsinghua University); Fuchun Sun (Tsinghua); Huaping Liu (Tsinghua University); Bin Fang (Tsinghua University)

[3]. Proximal Distilled Evolutionary Reinforcement Learning

Cristian Bodnar (University of Cambridge)*; Ben Day (University of Cambridge); Pietro Lio (University of Cambridge)

[4]. Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

Jie Wu (Sun Yat-sen University)*; Guanbin Li (Sun Yat-sen University); Si Liu (Beihang University); Liang Lin (DarkMatter AI)

[5]. RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

Nan Jiang (Tsinghua University)*; Sheng Jin (Tsinghua University); Zhiyao Duan (University of Rochester); Changshui Zhang (Tsinghua University)

[6]. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

Deheng Ye (Tencent)*; Zhao Liu (Tencent); Mingfei Sun (Tencent); Bei Shi (Tencent AI Lab); Peilin Zhao (Tencent AI Lab); Hao Wu (Tencent); Hongsheng Yu (Tencent); Shaojie Yang (Tencent); Xipeng Wu (Tencent); Qingwei Guo (Tsinghua University); Qiaobo Chen (Tencent); Yinyuting Yin (Tencent); Hao Zhang (Tencent); Tengfei Shi (Tencent); Liang Wang (Tencent); Qiang Fu (Tencent AI Lab); Wei Yang (Tencent AI Lab); Lanxiao Huang (Tencent)

[7]. The Effect of Reinforcement Learning on the Emergence of Multi‐Agent Systems

Nicolas Anastassacos (The Alan Turing Institute)*; Steve Hailes (University College London); Mirco Musolesi (UCL)

[8]. Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents

Felipe Leno da Silva (University of Sao Paulo)*; Pablo Hernandez-Leal (Borealis AI); Bilal Kartal (Borealis AI); Matthew Taylor (Borealis AI)

[9]. MetaLight: Value-based Meta-reinforcement Learning for Traffic Signal Control

Xinshi Zang (Shanghai Jiao Tong University)*; Huaxiu Yao (Pennsylvania State University); Guanjie Zheng (Pennsylvania State University); Nan Xu (University of Southern California); Kai Xu (Shanghai Tianrang Intelligent Technology Co., Ltd); Zhenhui (Jessie) Li (Penn State University)

[10]. Adaptive Quantitative Trading: an Imitative Deep Reinforcement Learning Approach

Yang Liu (University of Science and Technology of China)*; Qi Liu (University of Science and Technology of China); Hongke Zhao (Tianjin University); Zhen Pan (University of Science and Technology of China); Chuanren Liu (The University of Tennessee Knoxville)

[11]. Neighborhood Cognition Consistent Multi‐Agent Reinforcement Learning

Hangyu Mao (Peking University)*; Wulong Liu (Huawei Noah’s Ark Lab); Jianye Hao (Tianjin University); Jun Luo (Huawei Technologies Canada Co. Ltd.); Dong Li (Huawei Noah’s Ark Lab); Zhengchao Zhang (Peking University); Jun Wang (UCL); Zhen Xiao (Peking University)

[12]. SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Chao Wen (Nanjing University of Aeronautics and Astronautics)*; Xinghu Yao (Nanjing University of Aeronautics and Astronautics); Yuhui Wang (Nanjing University of Aeronautics and Astronautics, China); Xiaoyang Tan (Nanjing University of Aeronautics and Astronautics, China)

[13]. Unpaired Image Enhancement Featuring Reinforcement-Learning-Controlled Image Editing Software

Satoshi Kosugi (The University of Tokyo)*; Toshihiko Yamasaki (The University of Tokyo)

[14]. Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

Jun Wang (University of Science and Technology of China)*; Hefu Zhang (University of Science and Technology of China); Qi Liu (University of Science and Technology of China); Zhen Pan (University of Science and Technology of China); Hanqing Tao (University of Science and Technology of China (USTC))

[15]. Model and Reinforcement Learning for Markov Games with Risk Preferences

Wenjie Huang (Shenzhen Research Institute of Big Data)*; Hai Pham Viet (Department of Computer Science, School of Computing, National University of Singapore); William Benjamin Haskell (Supply Chain and Operations Management Area, Krannert School of Management, Purdue University)

[16]. Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning

Liang Tong (Washington University in St. Louis)*; Aron Laszka (University of Houston); Chao Yan (Vanderbilt University); Ning Zhang (Washington University in St. Louis); Yevgeniy Vorobeychik (Washington University in St. Louis)

[17]. Decentralized Deep Reinforcement Learning for Large‐Scale Traffic Signal Control

Chacha Chen (Pennsylvania State University)*; Hua Wei (Pennsylvania State University); Nan Xu (University of Southern California); Guanjie Zheng (Pennsylvania State University); Ming Yang (Shanghai Tianrang Intelligent Technology Co., Ltd); Yuanhao Xiong (Zhejiang University); Kai Xu (Shanghai Tianrang Intelligent Technology Co., Ltd); Zhenhui (Jessie) Li (Penn State University)

[18]. Deep Reinforcement Learning for Active Human Pose Estimation

Erik Gartner (Lund University)*; Aleksis Pirinen (Lund University); Cristian Sminchisescu (Lund University)

[19]. Be Relevant, Non‐redundant, Timely: Deep Reinforcement Learning for Real‐time Event Summarization

Min Yang (Chinese Academy of Sciences)*; Chengming Li (Chinese Academy of Sciences); Fei Sun (Alibaba Group); Zhou Zhao (Zhejiang University); Ying Shen (Peking University Shenzhen Graduate School); Chenglin Wu (fuzhi.ai)

[20]. A Tale of Reinforcement Learning with the Tightest Finite‐Time Bound

Gal Dalal (Technion)*; Balazs Szorenyi (Yahoo Research); Gugan Thoppe (Duke University)

[21]. Reinforcement Learning with Perturbed Rewards

Jingkang Wang (University of Toronto); Yang Liu (UCSC); Bo Li (University of Illinois at Urbana-Champaign)*

[22]. Exploratory Combinatorial Optimization with Reinforcement Learning

Thomas Barrett (University of Oxford)*; William Clements (Unchartech); Jakob Foerster (Facebook AI Research); Alexander Lvovsky (Oxford University)

[23]. Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction

Vishal Jain (Mila, McGill University)*; Liam Fedus (Google); Hugo Larochelle (Google); Doina Precup (McGill University); Marc G. Bellemare (Google Brain)

[24]. Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Xian Yeow Lee (Iowa State University)*; Sambit Ghadai (Iowa State University); Kai Liang Tan (Iowa State University); Chinmay Hegde (New York University); Soumik Sarkar (Iowa State University)

[25]. Modelling Sentence Pairs via Reinforcement Learning: An Actor-Critic Approach to Learn the Irrelevant Words

Mahtab Ahmed (The University of Western Ontario)*; Robert Mercer (The University of Western Ontario)

[26]. Transfer Reinforcement Learning using Output-Gated Working Memory

Arthur Williams (Middle Tennessee State University)*; Joshua Phillips (Middle Tennessee State University)

[27]. Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States

Yunan Ye (Zhejiang University)*; Hengzhi Pei (Fudan University); Boxin Wang (University of Illinois at Urbana-Champaign); Pin-Yu Chen (IBM Research); Yada Zhu (IBM Research); Jun Xiao (Zhejiang University); Bo Li (University of Illinois at Urbana-Champaign)

[28]. Deep Reinforcement Learning for General Game Playing

Adrian Goldwaser (University of New South Wales)*; Michael Thielscher (University of New South Wales)

[29]. Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning

Jianwen Sun (Nanyang Technological University)*; Tianwei Zhang (Nanyang Technological University); Xiaofei Xie (Nanyang Technological University); Lei Ma (Kyushu University); Yan Zheng (Tianjin University); Kangjie Chen (Tianjin University); Yang Liu (Nanyang Technological University, Singapore)

[30]. LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games

Leonard Adolphs (ETHZ)*; Thomas Hofmann (ETH Zurich)

[31]. Induction of Subgoal Automata for Reinforcement Learning

Daniel Furelos-Blanco (Imperial College London)*; Mark Law (Imperial College London); Alessandra Russo (Imperial College London); Krysia Broda (Imperial College London); Anders Jonsson (UPF)

[32]. MRI Reconstruction with Interpretable Pixel-Wise Operations Using Reinforcement Learning

Wentian Li (Tsinghua University)*; Xidong Feng (Department of Automation, Tsinghua University); Haotian An (Tsinghua University); Xiang Yao Ng (Tsinghua University); Yu-Jin Zhang (Tsinghua University)

[33]. Explainable Reinforcement Learning Through a Causal Lens

Prashan Madumal (University of Melbourne)*; Tim Miller (University of Melbourne); Liz Sonenberg (University of Melbourne); Frank Vetere (University of Melbourne)

[34]. Reinforcement Learning based Metapath Discovery in Large-scale Heterogeneous Information Networks

Guojia Wan (Wuhan University); Bo Du (School of Computer Science, Wuhan University)*; Shirui Pan (Monash University); Reza Haffari (Monash University, Australia)

[35]. Reinforcement Learning When All Actions are Not Always Available

Yash Chandak (University of Massachusetts Amherst)*; Georgios Theocharous (Adobe Research, USA); Blossom Metevier (University of Massachusetts, Amherst); Philip Thomas (University of Massachusetts Amherst)

[36]. Reinforcement Mechanism Design: With Applications to Dynamic Pricing in Sponsored Search Auctions

Weiran Shen (Carnegie Mellon University)*; Binghui Peng (Columbia University); Hanpeng Liu (Tsinghua University); Michael Zhang (Chinese University of Hong Kong); Ruohan Qian (Baidu Inc.); Yan Hong (Baidu Inc.); Zhi Guo (Baidu Inc.); Zongyao Ding (Baidu Inc.); Pengjun Lu (Baidu Inc.); Pingzhong Tang (Tsinghua University)

[37]. Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations

Aditya Modi (Univ. of Michigan Ann Arbor)*; Debadeepta Dey (Microsoft); Alekh Agarwal (Microsoft); Adith Swaminathan (Microsoft Research); Besmira Nushi (Microsoft Research); Sean Andrist (Microsoft Research); Eric Horvitz (MSR)

[38]. Joint Entity and Relation Extraction with a Hybrid Transformer and Reinforcement Learning Based Model

Ya Xiao (Tongji University)*; Chengxiang Tan (Tongji University); Zhijie Fan (The Third Research Institute of the Ministry of Public Security); Qian Xu (Tongji University); Wenye Zhu (Tongji University)

[39]. Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Tomas Brazdil (Masaryk University); Krishnendu Chatterjee (IST Austria); Petr Novotny (Masaryk University)*; Jiří Vahala (Masaryk University)

[40]. Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Qi Zhou (University of Science and Technology of China); Houqiang Li (University of Science and Technology of China); Jie Wang (University of Science and Technology of China)*

[41]. Reinforcement Learning with Non-Markovian Rewards

Maor Gaon (Ben-Gurion University); Ronen Brafman (BGU)*

[42]. Modular Robot Design Synthesis with Deep Reinforcement Learning

Julian Whitman (Carnegie Mellon University)*; Raunaq Bhirangi (Carnegie Mellon University); Matthew Travers (CMU); Howie Choset (Carnegie Mellon University)

[43]. BAR - A Reinforcement Learning Agent for Bounding-Box Automated Refinement

Morgane Ayle (American University of Beirut – AUB)*; Jimmy Tekli (BMW Group/Universite de Franche-Comte-UFC); Julia Zini (American University of Beirut – AUB); Boulos El Asmar (BMW Group/Karlsruher Institut fur Technologie-Kit); Mariette Awad (American University of Beirut – AUB)

[44]. Hierarchical Reinforcement Learning for Open-Domain Dialog

Abdelrhman Saleh (Harvard University)*; Natasha Jaques (MIT); Asma Ghandeharioun (MIT); Judy Hanwen Shen (MIT); Rosalind Picard (MIT Media Lab)

[45]. Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning

Liqiang Xiao (Artificial Intelligence Institute, SJTU)*; Lu Wang (Khoury College of Computer Science, Northeastern University); Hao He (Shanghai Jiao Tong University); Yaohui Jin (Artificial Intelligence Institute, SJTU)

[46]. Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning

Xiang Ni (IBM Research); Jing Li (NJIT); Wang Zhou (IBM Research); Mo Yu (IBM T. J. Watson)*; Kun-Lung Wu (IBM Research)

[47]. Actor Critic Deep Reinforcement Learning for Neural Malware Control

Yu Wang (Microsoft)*; Jack Stokes (Microsoft Research); Mady Marinescu (Microsoft Corporation)

[48]. Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Kristopher De Asis (University of Alberta)*; Alan Chan (University of Alberta); Silviu Pitis (University of Toronto); Richard Sutton (University of Alberta); Daniel Graves (Huawei)

[49]. Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

Liqun Chen (Duke University)*; Ke Bai (Duke University); Chenyang Tao (Duke University); Yizhe Zhang (Microsoft Research); Guoyin Wang (Duke University); Wenlin Wang (Duke University); Ricardo Henao (Duke University); Lawrence Carin (Duke University, CS)

[50]. Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks

Fabio Pardo (Imperial College London)*; Vitaly Levdik (Imperial College London); Petar Kormushev (Imperial College London)

[51]. Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tian Tan (Stanford University)*; Zhihan Xiong (Stanford University); Vikranth Dwaracherla (Stanford University)

[52]. Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

Sanket Shah (Singapore Management University)*; Arunesh Sinha (Singapore Management University); Pradeep Varakantham (Singapore Management University); Andrew Perrault (Harvard University); Milind Tambe (Harvard University)

For a full interpretation of the papers, see GitHub:

Github.com/NeuronDance…

