I run a website for sharing machine learning content called "Don't bother Python". Recently, many readers left comments saying they would like to see practical tutorials that apply the knowledge they have been learning. My first thought was to do a hands-on reinforcement learning project and make a robotic arm "learn by itself". So I am writing up this robotic-arm project as a live series on the site. Since the website has no subscription notification for new Python tutorials, and I have many followers on Zhihu, I am sharing the beginning of the "robotic arm" series here on Zhihu.
Just so you know, the "Building a Robotic Arm from Scratch" tutorial includes:
- Building structure
- Write static environment
- Write dynamic environment
- Add reinforcement learning algorithms
- Perfect the test
This Zhihu post covers only the first of the five chapters above, "Building structure". The remaining four sections will not be posted on Zhihu; if you are interested, you can view them directly via the link above.
Why do this project
The main purpose of this project is to learn by doing and build a reinforcement learning setup from scratch. In the earlier reinforcement learning tutorial series, we learned how various algorithms work, from the simplest Q-learning to DQN combined with neural networks, to DDPG for continuous actions, and A3C and DPPO for distributed training. But we never really put them into practice, because most of that series focused on the algorithms themselves. Setting up the simulation environment and tuning parameters are just as important, though. So that is what we are going to do in this series, to really get you started with reinforcement learning.
How we will do it
The project itself is simple: I will use the training code I wrote a year ago to teach a robotic arm to reach a preset point.
But this time I have reorganized the code structure to walk you through the exercise step by step, so you know what to think about when doing reinforcement learning and how to design a reasonable environment. I will explain this from the following aspects.
Main code structure
When doing reinforcement learning, it is best to plan how to break down the task first. In general, we try not to put all of the code (environment, reinforcement learning algorithm, main learning loop) in one script. Splitting it into three scripts is more efficient, easier to manage, and less distracting. This is how I have been presenting the code in my reinforcement learning series.
Specifically, the three scripts could look like this:
- Environment script (env.py)
- Reinforcement Learning Script (RL.py)
- Main loop script (main.py)
We import the environment and the reinforcement learning method in the main loop script, so the main loop script ties the two together. If you look at the code for this tutorial, you will see that I have packaged each step separately as part1, part2, and so on, and each part contains all three of the script files above. We will add the necessary pieces to each part one by one.
In this section, we start with the basic main.py. It contains the main loop of the program, which is also where the learning happens. How can the learning loop be kept simple? I adopted the same interface style as the gym module, so if you have used gym before, this will look very familiar.
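As a rough sketch of that layout (the folder names beyond part1 are my own assumption based on the description above), the project could look like this:

```
project/
├── part1/
│   ├── env.py    # environment
│   ├── rl.py     # reinforcement learning algorithm (DDPG)
│   └── main.py   # main training loop
├── part2/
│   ├── env.py
│   ├── rl.py
│   └── main.py
└── ...
```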
```python
# main.py
from part1.env import ArmEnv
from part1.rl import DDPG

MAX_EPISODES = 500
MAX_EP_STEPS = 200   # steps per episode (the exact value was lost in the original text)

# set up the environment
env = ArmEnv()
s_dim = env.state_dim
a_dim = env.action_dim
a_bound = env.action_bound

# set up the RL method
rl = DDPG(a_dim, s_dim, a_bound)

# start training
for i in range(MAX_EPISODES):
    s = env.reset()                       # reset the environment, get the initial state
    for j in range(MAX_EP_STEPS):
        env.render()                      # render the arm on screen
        a = rl.choose_action(s)           # RL chooses an action based on the state
        s_, r, done = env.step(a)         # the environment returns the next state and reward
        rl.store_transition(s, a, r, s_)  # store the transition in memory
        if rl.memory_full:
            rl.learn()                    # memory bank is full, start learning
        s = s_                            # move on to the next state
```
At this point, we know that we must have several functions and attributes in rl.py and env.py.
- rl.py
rl.choose_action(s)
rl.store_transition(s, a, r, s_)
rl.learn()
rl.memory_full
- env.py
env.reset()
env.render()
env.step(a)
env.state_dim
env.action_dim
env.action_bound
With these requirements in mind, we can plan ahead in rl.py and env.py. Create a script for env.py and write the following ArmEnv class, then add the functions mentioned above.
```python
# env.py
class ArmEnv(object):
    def __init__(self):
        pass

    def step(self, action):
        pass

    def reset(self):
        pass

    def render(self):
        pass
```
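The main loop also expects the attributes env.state_dim, env.action_dim, and env.action_bound. They will be filled in when we build the environment in the later sections; as a minimal placeholder sketch (the concrete values below are my own assumption, not the tutorial's), they could simply be class attributes for now:

```python
# env.py (sketch only: placeholder values, to be replaced when the environment is built)
class ArmEnv(object):
    state_dim = 2           # size of the observation vector (assumed)
    action_dim = 2          # one rotation value per arm joint (assumed)
    action_bound = [-1, 1]  # range of each action value (assumed)
```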
Then create an rl.py script that holds the RL method you want to use. Because I want the arm environment to use continuous actions (the angles by which the robot rotates its arm are continuous values), I will use the DDPG algorithm. However, if the environment you want to build uses discrete actions (for example, the robot can only choose the up, down, left, and right keys), you may need to pick a different RL algorithm and adjust the environment accordingly.
```python
# rl.py
class DDPG(object):
    def __init__(self, a_dim, s_dim, a_bound):
        pass

    def choose_action(self, s):
        pass

    def learn(self):
        pass

    def store_transition(self, s, a, r, s_):
        pass
```
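The main loop also checks rl.memory_full before calling rl.learn(). As a minimal sketch of how that flag might be maintained (this is my own illustration, not the code from the later sections; the buffer size is assumed), the DDPG class could keep a fixed-size memory buffer and a write pointer:

```python
import numpy as np

MEMORY_CAPACITY = 30000  # assumed buffer size, for illustration only

class DDPG(object):
    def __init__(self, a_dim, s_dim, a_bound):
        # each row stores one transition: s, a, r, s_
        self.memory = np.zeros((MEMORY_CAPACITY, s_dim * 2 + a_dim + 1), dtype=np.float32)
        self.pointer = 0
        self.memory_full = False

    def store_transition(self, s, a, r, s_):
        transition = np.hstack((s, a, [r], s_))
        index = self.pointer % MEMORY_CAPACITY   # overwrite the oldest transition when full
        self.memory[index, :] = transition
        self.pointer += 1
        if self.pointer >= MEMORY_CAPACITY:
            self.memory_full = True              # learning starts once the buffer is filled
```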
Now that we have these skeletons, the main structure is complete and we can start filling it in. Next, we will see how to set up the simulation environment.