**Comparison Between Reinforcement Learning and Deep Learning**

Before comparing Reinforcement learning and Deep learning, I would like to briefly introduce both concepts.

Reinforcement learning is an area of machine learning distinct from Supervised learning and Unsupervised learning. It studies how an agent should interact with its environment to maximize the total reward. The purpose of a Reinforcement learning algorithm is to learn a function that determines what action to take in a given state such that the total expected future reward is maximized. Such a function is called the optimal policy. Finding the optimal policy requires exploration and experimentation in the given environment.

The most common theoretical framework for Reinforcement learning is the Markov Decision Process (MDP). At each time step $t$, an agent in a state $S_t$ chooses an action $A_t$, transitions into a new state $S_{t+1}$ and receives a reward $R_{t+1}$. An MDP is defined by four components: a set of states $S$, the possible configurations of the environment, with $S_t \in S$; a set of actions $A$, the possible actions the agent can take (for a robot, for example, the forces applied to its various joints), with $A_t \in A$; a transition function $P(s^{'}|s,a)$, which gives the probability of transitioning from state $s$ to state $s^{'}$ when taking action $a$ (the transition may be stochastic); and a reward function $R_a(s,s^{'})$, the random or deterministic reward received when transitioning from state $s$ to state $s^{'}$ after taking action $a$.
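To make the four components concrete, here is a minimal sketch of an MDP in Python. The two states, two actions, probabilities and rewards are hypothetical values chosen only for illustration; they do not come from any real system.

```python
import random

# Hypothetical two-state MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples,
# which encodes both P(s' | s, a) and R_a(s, s').
P = {
    "low": {
        "wait":   [(1.0, "low", 0.0)],
        "charge": [(0.9, "high", 1.0), (0.1, "low", 0.0)],
    },
    "high": {
        "wait":   [(1.0, "high", 2.0)],
        "charge": [(1.0, "high", 0.0)],
    },
}

def step(state, action, rng):
    """Sample (next_state, reward) according to P(s' | s, a)."""
    outcomes = P[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

# Roll out a short trajectory with a uniformly random policy.
rng = random.Random(0)
state = "low"
trajectory = []
for t in range(5):
    action = rng.choice(list(P[state]))
    next_state, reward = step(state, action, rng)
    trajectory.append((state, action, reward, next_state))
    state = next_state
```

Each entry of `trajectory` corresponds to one tuple $(S_t, A_t, R_{t+1}, S_{t+1})$ from the description above.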

Our goal is to learn an optimal policy $\pi$ that maps each state in $S$ to a probability distribution over the actions in $A$. The optimal policy gives a distribution over possible actions in a given state that maximizes the expected discounted future reward $E[G_t]$, where $G_t=\sum_{k=1}^{\infty}\gamma^{k-1}R_{t+k}$ and $\gamma$ is a discount factor between 0 and 1. The choice of $\gamma$ reflects how the algorithm weighs present reward against future reward: a value near 0 favors immediate reward, while a value near 1 gives future reward almost as much weight as present reward.
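As a small worked example of the discounted sum, the sketch below computes it for a finite list of rewards. The reward values are made up, and truncating the infinite sum to three terms is an assumption made for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^(k-1) * R_{t+k} over a finite list [R_{t+1}, R_{t+2}, ...]."""
    g = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]  # hypothetical R_{t+1}, R_{t+2}, R_{t+3}

myopic = discounted_return(rewards, gamma=0.0)      # only the first reward counts
balanced = discounted_return(rewards, gamma=0.5)    # 1 + 0.5*0 + 0.25*2
farsighted = discounted_return(rewards, gamma=1.0)  # plain sum of all rewards
```

With $\gamma=0$ only the immediate reward counts, while with $\gamma=1$ every future reward is weighted equally.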

As part of the broader family of machine learning methods, Deep learning has gained more and more attention in artificial intelligence (AI) in recent years. It is a method loosely inspired by how the human brain processes data and then makes decisions by recognizing patterns. It is also known as deep neural learning, and its models are called deep neural networks.

When humans make decisions, a vast number of neurons participate in the process. Deep learning works similarly, using a hierarchy of layers in an Artificial Neural Network (ANN). In ANN implementations, the “signal” on each connection is a real number, and the output of each neuron is computed by applying a non-linear function to the weighted sum of its inputs. During training, the connection weights and the threshold (bias) of each neuron are adjusted toward their optimal values.
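The computation inside a single artificial neuron can be sketched in a few lines of Python. The sigmoid non-linearity and the specific weights below are assumptions chosen for illustration; real networks use many activation functions and learn their weights from data.

```python
import math

def sigmoid(z):
    """A common non-linear activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals plus a bias, passed through a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical two-layer network: 2 inputs -> 3 hidden neurons -> 1 output.
x = [0.5, -1.0]
hidden = layer(x, [[0.1, 0.4], [-0.3, 0.2], [0.7, -0.5]], [0.0, 0.1, -0.2])
output = layer(hidden, [[0.6, -0.1, 0.3]], [0.05])
```

Stacking layers this way, with the output of one layer feeding the next, is what gives the network its hierarchical structure.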

I would like to demonstrate the difference between Reinforcement learning and Deep learning in three aspects. First, the nature, or motivation, of the two is quite different. Reinforcement learning trains a function to give the optimal action in a given state, while Deep learning is a complex function approximator modeled loosely on the human brain, in which information is processed in one neuron and passed on to the next.

Second, each is popular in some fields where the other may not be practical. For example, DeepMind's AlphaGo is a classic instance of Reinforcement learning. The algorithm was trained on a huge number of games until it could beat Go masters.

AlphaGo Zero, a later version, starts with no prior knowledge of the game and takes only the basic rules as input. It took only three days for AlphaGo Zero to surpass AlphaGo Lee, the version that beat world champion Lee Sedol in 4 out of 5 games. After 40 days it had become the best Go player in the world, purely through self-play, without historical data or human intervention.

Another example of Reinforcement Learning is a walking robot, where the actions are defined by how large a step to take and how high the robot should lift its leg. The reward may penalize losing balance. The Reinforcement learning algorithm is trained until it finds an optimal policy, which describes how to walk without falling.

Moreover, Reinforcement Learning is popular for training autonomous vehicles to drive safely, handle emergencies on the road and obey the traffic rules. Reinforcement Learning enables agents to learn from a system of rewards and penalties.

An example of using Deep learning is Amazon's credit card fraud detection system. The team built neural networks using data obtained from hundreds of thousands of online credit card purchases. Since the number of observations is very large, this is so-called big data analysis. The independent variables used in building the neural networks include the retailer, credit score, IP address and so on.

Deep learning algorithms can not only find patterns in these transactions, they can also recognize when a pattern indicates the need for a fraud investigation.

Deep learning techniques such as Convolutional Neural Networks (CNN) are very helpful for image recognition and classification. A CNN can be used to classify chest X-rays in order to diagnose patients with COVID-19, viral pneumonia, or bacterial pneumonia. Thousands of chest X-rays in PNG format are converted into matrices and fed as input to a predetermined sequence of layers, which allows the training algorithm to find the optimal parameters.

Third, the Deep learning algorithm learns from a training data set and then applies the trained model to other test data, while Reinforcement learning interacts with the environment, seeking ways to maximize the reward by dynamically learning from feedback. In other words, Reinforcement learning may not require a data set for training, while Deep learning may require a large amount of data and considerable computation power.
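The "learning from interaction" side of this contrast can be sketched with tabular Q-learning, one well-known Reinforcement learning algorithm. The choice of Q-learning, the five-cell corridor environment, and all hyperparameters are assumptions made for this illustration. Note that the agent is never given a data set: every training signal comes from rewards it receives while acting.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
LEFT, RIGHT = 0, 1

def env_step(state, action):
    """Environment feedback: return (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice([LEFT, RIGHT])
    best = max(q_row)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = env_step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal cells 0..3.
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, `policy` holds the greedy action for each non-terminal cell. In this corridor the learned behavior is to walk right toward the goal, discovered purely from reward feedback.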

In conclusion, there are several differences between Reinforcement learning and Deep learning, but the two aren’t mutually exclusive. In Deep Reinforcement learning, one can use a neural network to store what is learned from experience in order to improve performance. There is no absolute boundary between them.

**Written by Kailun Chen & Edited by Alexander Fleiss**