Markov process-Markov reward process-Markov decision process
Overview
Markov Decision Processes (MDPs) are a mathematical formalization of reinforcement learning problems.
- The environment is required to be fully observable.
Markov property
"The future is independent of the past given the present: once the current state is known, all earlier information can be thrown away."
Definition: the state $S_t$ at time $t$ is called a Markov state (i.e., it satisfies the Markov property) if and only if $\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t]$
Note:
- The state contains all relevant historical information, i.e., everything that happened before is summarized in the current state (it can replace all previous states).
- Therefore, the environment is required to be fully observable (if it is only partially observable, some state information is missing).
- Whether the Markov property is satisfied is closely related to how the state is defined.
Example:
- Playing chess (the current board position summarizes the whole game history)
- Tetris (the current stack plus the falling piece determines everything that follows)
With a Markov state:
- a state transition matrix can be defined
- we can ignore the history and focus only on the present moment
Note: whether a state satisfies the Markov property is closely related to how the state is defined.
State transition matrix
State transition probability
The state transition probability is the probability of moving from a Markov state $s$ to a successor state $s'$; it is a conditional distribution given the current state: $\mathcal{P}_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s]$
State transition matrix
If the states are discrete (finite):
- the current states index the rows
- the successor states index the columns

giving the state transition matrix (a small numeric sketch follows below)
$$\mathcal{P} = \begin{pmatrix} \mathcal{P}_{11} & \cdots & \mathcal{P}_{1n} \\ \vdots & \ddots & \vdots \\ \mathcal{P}_{n1} & \cdots & \mathcal{P}_{nn} \end{pmatrix}$$
- $n$ is the number of states
- each row sums to 1
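A minimal sketch (assuming NumPy and a hypothetical 3-state chain chosen only for illustration): the transition matrix is an $n \times n$ array whose rows are probability distributions.

```python
import numpy as np

# Hypothetical 3-state Markov chain: rows = current state, columns = successor state.
P = np.array([
    [0.9, 0.1, 0.0],   # from state 0
    [0.0, 0.5, 0.5],   # from state 1
    [0.0, 0.0, 1.0],   # from state 2 (absorbing)
])

# Each row is a conditional distribution P[S_{t+1} = s' | S_t = s], so rows must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# Probability of being in each state after 5 steps, starting from state 0.
mu0 = np.array([1.0, 0.0, 0.0])
print(mu0 @ np.linalg.matrix_power(P, 5))
```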
State transition function
If the number of states is very large, or infinite (a continuous state space), it is more appropriate to use the function form of the transition probability given above: $\mathcal{P}(s' \mid s) = \mathbb{P}[S_{t+1} = s' \mid S_t = s]$
- At this point the row normalization becomes an integral, $\int_{s'} \mathcal{P}(s' \mid s)\,\mathrm{d}s' = 1$
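A minimal sketch of a transition function for a continuous state space (assuming a simple Gaussian random-walk dynamic, chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def transition_density(s_next: float, s: float, sigma: float = 0.5) -> float:
    """p(s' | s): density of the next state under a Gaussian random walk."""
    return norm.pdf(s_next, loc=s, scale=sigma)

# The analogue of "each row sums to 1": the density integrates to 1 over s'.
total, _ = quad(lambda s_next: transition_density(s_next, s=0.0), -np.inf, np.inf)
print(total)  # ~1.0
```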
Markov process
Definition
A Markov process (MP) is a memoryless random process, that is, a sequence of Markov states.
A Markov process can be defined by a two-tuple $\langle \mathcal{S}, \mathcal{P} \rangle$
- $\mathcal{S}$: the set of states
- $\mathcal{P}$: the state transition matrix
We usually assume that $\mathcal{P}$ exists and is stationary; when it is non-stationary, methods such as online learning and fast learning are adopted instead.
An example of a Markov process
- A Markov process can terminate in two ways:
- the time horizon runs out
- a terminal state is reached
Episode
Definition: in reinforcement learning, an episode is the sequence of states from the initial state $S_1$ to the terminal state $S_T$.
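A minimal sketch of sampling an episode from a Markov process, reusing the hypothetical transition matrix above and treating state 2 as the terminal state (both are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
TERMINAL = 2

def sample_episode(start: int, max_steps: int = 100) -> list[int]:
    """Roll the chain forward from `start` until the terminal state or a step limit."""
    state, episode = start, [start]
    for _ in range(max_steps):
        if state == TERMINAL:
            break
        state = rng.choice(len(P), p=P[state])
        episode.append(state)
    return episode

print(sample_episode(0))  # e.g. [0, 0, 1, 2]
```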
Markov reward process
Definition
A Markov reward process is obtained from a Markov process by attaching reward values to the transition structure.
A Markov Reward Process (MRP) is a four-tuple $\langle \mathcal{S}, \mathcal{P}, \mathcal{R}, \gamma \rangle$
- $\mathcal{S}$: the set of states
- $\mathcal{P}$: the state transition matrix
- $\mathcal{R}$: the reward function, $\mathcal{R}_s = \mathbb{E}[R_{t+1} \mid S_t = s]$, describing the expected reward received in state $s$
- $\gamma$: the discount factor, $\gamma \in [0, 1]$
Return
- Reward: an evaluation of a single state
- Return: an evaluation of an entire episode
The return $G_t$ is the cumulative discounted reward starting from time $t$: $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$
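A minimal sketch of computing a discounted return from a sequence of rewards (the reward values are made up for illustration):

```python
def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    g = 0.0
    # Accumulate from the back so each step is g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 0.0, 10.0]))  # 1 + 0.9**3 * 10 = 8.29
```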
Value functions in MRPs
Why a value function? The return is the outcome of a single episode, so it is a noisy sample-based quantity tied to a time index $t$; the value function, in contrast, depends only on the state $s$.
The value function of an MRP is defined as $v(s) = \mathbb{E}[G_t \mid S_t = s]$
Since the value function here is a function of the state $s$, it is called the state value function, also known as the V function.
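A minimal sketch of estimating $v(s)$ by Monte Carlo, i.e., averaging the returns of many episodes started from $s$ (reusing the hypothetical chain above and an assumed reward of +1 for every non-terminal step):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
TERMINAL, GAMMA = 2, 0.9

def sample_return(start: int, max_steps: int = 200) -> float:
    """Run one episode from `start` and accumulate the discounted return."""
    state, g, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        if state == TERMINAL:
            break
        g += discount * 1.0          # assumed reward of +1 per non-terminal step
        discount *= GAMMA
        state = rng.choice(len(P), p=P[state])
    return g

# v(0) is approximately the average return over many episodes starting in state 0.
print(np.mean([sample_return(0) for _ in range(5000)]))
```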
Bellman equation in MRPs (important)
The value function of the current state consists of two parts:
- first term: the immediate reward $R_{t+1}$
- second term: the value function of the successor state multiplied by the discount factor, $\gamma\, v(S_{t+1})$
so $v(s) = \mathbb{E}[R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s]$. Since there may be multiple successor states, if the transition matrix is known, then
$$v(s) = \mathcal{R}_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}\, v(s')$$
The matrix-vector form is
$$v = \mathcal{R} + \gamma \mathcal{P} v$$
This is essentially a linear system and can be solved directly (see the sketch below):
$$v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}$$
The direct solution only applies to small MRPs:
- the computational complexity is $O(n^3)$ in the number of states $n$
- it only works when $\mathcal{P}$ and $\mathcal{R}$ are known
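A minimal sketch of the direct solution $v = (I - \gamma \mathcal{P})^{-1}\mathcal{R}$ with NumPy (the transition matrix and rewards are made-up values for illustration):

```python
import numpy as np

gamma = 0.9
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
R = np.array([1.0, 1.0, 0.0])  # assumed expected reward per state; the terminal state gives 0

# Solve (I - gamma * P) v = R rather than forming the inverse explicitly.
v = np.linalg.solve(np.eye(len(P)) - gamma * P, R)
print(v)
```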
Markov decision process
In both the MP and the MRP we are passive observers: we watch the state transitions and compute returns. In an RL problem we instead want to influence the state transition process so as to maximize the return. Markov Decision Processes (MDPs) are therefore obtained by introducing decisions (actions) into the MRP.
Definition
A Markov decision process (MDP) is a five-tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$
- $\mathcal{A}$: the set of actions
- $\mathcal{P}$: the state transition matrix, $\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$
- $\mathcal{R}$: the reward function, $\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$, the reward for taking action $a$ in state $s$ ($\mathcal{S}$ and $\gamma$ are as in the MRP)
Policy
In an MDP, a policy $\pi$ is a probability distribution over actions given the state: $\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]$
- The policy is stationary: it depends only on the state $s$, not on the time $t$
- Finding a good policy is the ultimate goal of an RL problem
- If the distribution is one-hot, the policy is deterministic; otherwise it is a stochastic policy
The relationship between MDPs and MRPs
If the MDP is given a fixed policy $\pi$, it degenerates into an MRP with $\mathcal{P}^\pi_{ss'} = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{P}^a_{ss'}$ and $\mathcal{R}^\pi_s = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}^a_s$, as sketched below.
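A minimal sketch of this reduction (the MDP arrays and policy are made-up shapes for illustration: P has shape [A, S, S], R has shape [S, A], pi has shape [S, A]):

```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# Hypothetical MDP: P[a, s, s'] (each row over s' sums to 1) and R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))

# A uniformly random policy pi[s, a].
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Induced MRP:
# P_pi[s, s'] = sum_a pi(a|s) * P[a, s, s']
# R_pi[s]     = sum_a pi(a|s) * R[s, a]
P_pi = np.einsum('sa,ast->st', pi, P)
R_pi = (pi * R).sum(axis=1)

assert np.allclose(P_pi.sum(axis=1), 1.0)
print(P_pi, R_pi, sep='\n')
```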
Value functions in MDPs
- State value function (V function)
  - Definition: the expected return obtained by starting from state $s$ and then following policy $\pi$: $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$
- State-action value function (Q function)
  - Definition: the expected return obtained by starting from state $s$, taking action $a$, and then following policy $\pi$: $q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$
  - The action $a$ does not have to come from the policy $\pi$; only after action $a$ has been taken does $\pi$ take over the action selection
Bellman expectation equation
Similar to the MRP case, the value functions in MDPs can be split into two parts, the immediate reward and the discounted value of what follows:
$$v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s]$$
$$q_\pi(s, a) = \mathbb{E}_\pi[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]$$
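A minimal sketch of policy evaluation under these equations (using the same made-up MDP shapes as in the previous sketch): reduce the MDP plus policy to an MRP, solve for $v_\pi$ directly, then recover $q_\pi$.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Hypothetical MDP: P[a, s, s'], R[s, a], and a uniformly random policy pi[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Reduce to an MRP under pi.
P_pi = np.einsum('sa,ast->st', pi, P)
R_pi = (pi * R).sum(axis=1)

# Bellman expectation equation in matrix form: v_pi = R_pi + gamma * P_pi v_pi.
v_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

# q_pi(s, a) = R[s, a] + gamma * sum_{s'} P[a, s, s'] * v_pi(s').
q_pi = R + gamma * np.einsum('ast,t->sa', P, v_pi)

# Consistency check: v_pi(s) = sum_a pi(a|s) * q_pi(s, a).
assert np.allclose(v_pi, (pi * q_pi).sum(axis=1))
print(v_pi, q_pi, sep='\n')
```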