Deep Learning for Developers: Tools You Can Use to Code Neural Networks on Day 1
By Emil Wallner
Translated by Gao Ning, Happen, Chen Ling, Alieen
The wave of deep learning began five years ago. With the explosion of computing power and several success stories, deep learning has generated a lot of hype. It can now drive cars, beat humans at Atari games, and diagnose cancer.
When I started learning neural networks, I spent two weeks exploring: choosing the right tools, comparing cloud services, and searching for online courses. In retrospect, I wish I had simply built neural networks from day one, and that is the purpose of this article.
You don’t need any prior knowledge. Of course, a basic understanding of Python, the command line, and Jupyter Notebook will help.
Deep learning, a branch of machine learning, has proved to be an effective way to find patterns in raw data, such as images or sound. Say you want to classify images of cats and dogs. Without being specifically programmed to do so, the network first finds the edges in the picture and then builds patterns from the different edges. Next, it detects noses, tails and paws. In this way, a neural network can eventually classify dogs and cats.
For structured data, however, simpler machine learning algorithms often work better. For example, if you have ordered customer data in an Excel sheet and want to predict their next order, you can do it the old-fashioned way with a simpler machine learning algorithm.
The core logic
Imagine a machine with lots of randomly adjusted gears. The gears are stacked in many layers and interact with each other. At first, the machine does not work properly. The gears are then adjusted at random until they give the correct output.
An engineer then inspects all the gears and marks which ones are causing the errors. He starts with the last gear, because its position is the cumulative result of all the errors. Once he finds the error in the last layer, he looks at the previous layer. In this way he can calculate how much each gear contributes to the error. We call this process backpropagation.
The engineer then adjusts each gear based on the errors he found and restarts the whole machine. Run the machine, calculate the errors, adjust the gears, and repeat the process until the machine gives the correct output.
Predict → calculate the error → adjust the prediction (one training cycle)
Neural networks work the same way, with inputs and outputs, and then adjust the gears to find the relationship between the inputs and outputs. Given an input, the output is predicted by adjusting the gear, and then the predicted value is compared with the real value.
The neural network seeks to minimize the error (the difference between the predicted value and the actual value) by adjusting the gears until the difference between the predicted value and the actual value is as small as possible.
One of the most common ways to minimize the error is gradient descent. The error itself is calculated by an error function, also called a cost (or loss) function.
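As a rough illustration (not from the original article), here is a minimal gradient descent loop on a single weight; the data, learning rate, and step count are arbitrary choices for the example:

```python
# Minimize the squared error between a prediction w * x and a target y_truth
# by repeatedly stepping against the gradient of the cost.
x, y_truth = 1.5, 3.0          # one training example
w = 0.0                        # an arbitrarily initialized "gear"
learning_rate = 0.1

for step in range(50):
    y_pred = w * x                     # prediction
    error = y_pred - y_truth           # difference from the real value
    cost = error ** 2                  # cost function (squared error)
    gradient = 2 * error * x           # derivative of the cost with respect to w
    w -= learning_rate * gradient      # adjust the "gear" against the gradient

print(w)   # approaches y_truth / x = 2.0
```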
Shallow neural network
Many people think of artificial neural networks as digital replicas of our neocortex. This is a misconception.
We do not know enough about how the brain works to make such claims. The brain was just one source of inspiration for Frank Rosenblatt, the inventor of neural networks.
A shallow neural network: input → weight → sum → decision, then adjust by (predicted value - actual value) * learning rate
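As a hedged sketch of that loop in plain Python (my illustration, not the article's code), using the OR data that appears later in the article; the learning rate and epoch count are arbitrary:

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # OR inputs
y = np.array([0., 1., 1., 1.])                            # OR outputs
weights = np.zeros(2)
bias = 0.0
learning_rate = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    for inputs, truth in zip(X, y):
        prediction = sigmoid(np.dot(inputs, weights) + bias)  # input -> weight -> sum -> decision
        delta = (truth - prediction) * learning_rate          # error scaled by the learning rate
        weights += delta * inputs                             # adjust the weights
        bias += delta

print(sigmoid(X @ weights + bias).round(2))   # roughly [0, 1, 1, 1]
```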
Play with a neural network simulator for an hour or two and you’ll get a feel for it.
We'll start by implementing a simple neural network to get familiar with the syntax of TFlearn. We begin with the classic 101 problem: the OR operator. Although neural networks are better suited to other types of data, this is a good problem for understanding how they work.
All deep learning programs follow the same core logic:
1. Load the library, then load the data and clean it. Whether it is photos, audio, or sensor data, all input is converted to numbers. These long lists of numbers are the inputs to our neural network.
2. Design the neural network: choose the type and number of layers in your network.
3. Enter the learning process: the neural network knows the inputs and outputs and searches for the relationship between them.
4. The trained model then gives you a predicted value.
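Here is the neural network program. The original code listing did not survive the translation, so the sketch below reconstructs it from the line-by-line walkthrough that follows; the exact line positions and hyperparameters (such as the learning rate and the number of epochs) are my assumptions.

```python
# 1. Import the library that gives us access to Tensorflow's deep learning functions
import tflearn

# 2. The truth table for the logical OR operator, as training data
OR = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_truth = [[0.], [1.], [1.], [1.]]
neural_net = tflearn.input_data(shape=[None, 2])
neural_net = tflearn.fully_connected(neural_net, 1, activation='sigmoid')

# 3. Choose how to train the network
neural_net = tflearn.regression(neural_net, optimizer='sgd', learning_rate=2., loss='mean_square')
model = tflearn.DNN(neural_net)
model.fit(OR, Y_truth, n_epoch=2000, snapshot_epoch=False)
# 4. Predict with the trained model
print("0 or 0:", model.predict([[0., 0.]]))
print("0 or 1:", model.predict([[0., 1.]]))
print("1 or 0:", model.predict([[1., 0.]]))
print("1 or 1:", model.predict([[1., 1.]]))
```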
The first line: lines beginning with "#" are comments, generally used to explain the code.
The second line loads the TFlearn library. With this statement we can use Google Tensorflow's deep learning functions.
Lines 5 and 6 store the data from the above table in a list
The dot at the end of each number maps integers to floating point numbers. It stores numbers with decimal values to make calculations more accurate.
Line 7 initializes the neural network and specifies the dimensions of the input data.
Every OR operation takes a pair of values, so the dimension is 2.
None is the default value and represents the batch size.
Line 8 is the output layer.
The activation function maps the results of the process to the output layer
In this example, we use the Sigmoid function to map it to the (0,1) interval range
Line 11 applies regression. The optimizer argument selects the algorithm used to minimize the cost function. The learning rate determines how quickly the neural network adjusts, and the loss variable determines how the error is calculated.
Line 12 selects which neural network to use
It can also be used to specify where to store training logs
Line 13 trains your neural network/model.
Select your input data (OR) and the actual label (Y_truth)
Epochs determines the number of times your neural network runs data in cycles
If you set snapshot=True, the model is validated after each epoch.
Lines 14 to 18 make predictions using the trained model.
In this case, it returns the probability that the result is 1/True
The output
The first result indicates that the combination [0.] & [0.] has a 4% chance of being true, and so on. "Training Step" shows how many batches you have trained.
All the data is trained once in each batch, so it is roughly the same as an epoch. If the data is too large for memory, you need to split it into batches. The loss function calculates the amount of error in each iteration.
SGD stands for stochastic gradient descent, the method used to minimize the cost function.
Iter shows the current data index and the total number of input items.
You can find the above logic and syntax in most TFlearn neural networks. The best way to learn this code is to modify it and make some errors.
The loss curve shows the amount of error in each training step.
You can use Tensorboard to visualize each experiment and see how each parameter affects the training.
Here are some suggestions for examples you can run. I recommend that you spend a few hours practicing these examples to get comfortable with the runtime environment and the parameters in TFlearn.
Experiments
- Increase the number of training steps and epochs.
- Try adding or changing the arguments of each function mentioned in the documentation. For example, change g = tflearn.fully_connected(g, 1, activation='sigmoid') to g = tflearn.fully_connected(g, 1, activation='sigmoid', bias=False), or add integers to the input data.
- Change the shape of the input layer.
- Change the activation function of the output layer.
- Use different gradient descent methods.
- Change the way the neural network calculates the cost.
- Replace the X and Y data with the AND and NOT logical operations.
- Change the last item of Y_truth from [1.] to [0.] (this turns OR into XOR). For this to work, you need to add a layer to the network.
- Make it learn faster.
- Try to make each learning step take longer than 0.1 seconds.
Getting started
Tensorflow, used together with Python, is the most common tooling in deep learning.
TFlearn is a high-level framework that runs on top of Tensorflow.
Another common framework is Keras. This is a more robust library, but I find TFlearn’s syntax more concise and understandable.
Both are high-level frameworks that run on top of Tensorflow.
You can use your computer’s CPU to run simple neural networks. But most experiments take hours or even weeks to run. This is why most people use modern GPU cloud services for deep learning.
The simplest GPU cloud services solution is FloydHub (https://www.floydhub.com/). If you master basic command-line skills, FloydHub will take less than five minutes to deploy.
Use the FloydHub documentation to install the floyd-cli command-line tool. FloydHub also provides internal customer support for users who run into problems.
Let's use TFlearn, Jupyter Notebook, and Tensorboard to run your first neural network on FloydHub.
After installing FloydHub and logging in, download the files required for this guide.
Open the terminal and enter the following command:
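The command itself was lost in translation; it most likely clones the examples repository referenced later in this article:

```
git clone https://github.com/emilwallner/Deep-Learning-101.git
```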
Go to the folder and initialize FloydHub:
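Again the exact commands are missing; assuming the repository cloned above and the project name 101 mentioned below, they would look something like:

```
cd Deep-Learning-101
floyd init 101
```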
FloydHub will open a Web panel in your browser and prompt you to create a new project called 101. When finished, return to the terminal and enter the initialization command again.
Now you can run your neural network tasks on FloydHub.
You can apply different settings to the "floyd run" command. In our case, we want to:
- mount an uploaded public dataset on FloydHub: --data emilwallner/datasets/cifar-10/1:data specifies the data directory. You can view this dataset (and many other public ones) on FloydHub.
- use GPU cloud computing: --gpu
- activate Tensorboard: --tensorboard
- run the task in Jupyter Notebook mode: --mode jupyter
OK, to run our task:
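The command was lost in translation; combining the flags listed above, it would be along these lines:

```
floyd run --data emilwallner/datasets/cifar-10/1:data --gpu --tensorboard --mode jupyter
```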
After Jupyter opens in your browser, click on the "start-here.ipynb" file.
start-here.ipynb contains a simple neural network for understanding the TFlearn syntax. It learns the OR logic and then predicts all the combinations.
Click "Restart & Run All" under "Kernel" in the menu bar. If you can see the output messages, it is working and you can move on.
Go to your FloydHub project and find the Tensorboard link.
Deep neural network
A deep neural network is one that contains more than one hidden layer. There are many detailed tutorials on how CNN (convolutional Neural Network) works.
Therefore, we will focus on higher-level concepts that apply to most neural networks.
Note: This diagram is not a deep neural network. It requires more than one hidden layer.
You want to train neural networks to predict untrained data. It requires the ability to generalize. It’s a balance between learning and forgetting.
You want it to learn how to separate the signal from the noise, but at the same time forget the signal that only appears in the training data.
If the neural network has not learned enough, it underfits. The opposite is overfitting: learning too much from the training data.
Regularization is a way of reducing overfitting by forgetting signals that are specific to the training data.
To further understand these concepts, we conducted experiments on the CIFAR-10 dataset. The dataset contains 60,000 images in 10 categories, such as cars, trucks and birds. The goal is to predict which category a new image will fall into.
Example images from CIFAR
Often we would need to mine, clean, and filter the image data, but to keep things simple we will focus only on the neural network. You can run all the examples in the Jupyter notebook you installed earlier (https://github.com/emilwallner/Deep-Learning-101).
The input layer takes the images; the output layer sorts them into 10 categories. The hidden layers are a mixture of convolutional, pooling, and fully connected layers.
Choosing the number of layers
Let's compare the difference between a neural network with only one set of layers and one with three. Each set contains a convolutional layer, a pooling layer, and a fully connected layer.
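The notebooks with both networks are in the repository you downloaded earlier; as a rough, hedged sketch (filter counts, optimizer, and learning rate are my assumptions), the three-layer version might look like this in TFlearn:

```python
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.estimator import regression

net = input_data(shape=[None, 32, 32, 3])             # CIFAR-10 images: 32x32 RGB
for n_filters in (32, 64, 128):                       # three convolution + pooling blocks
    net = conv_2d(net, n_filters, 3, activation='relu')
    net = max_pool_2d(net, 2)
net = fully_connected(net, 512, activation='relu')
net = fully_connected(net, 10, activation='softmax')  # one output per CIFAR-10 category
net = regression(net, optimizer='adam', learning_rate=0.001,
                 loss='categorical_crossentropy')

model = tflearn.DNN(net, tensorboard_verbose=0)
# model.fit(X_train, Y_train, n_epoch=50, validation_set=(X_test, Y_test),
#           batch_size=128, show_metric=True)
```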
You can run these scripts by clicking Kernel > Restart & Run All in the menu bar, then glance at the training logs in Tensorboard. You will notice that the network with more layers is about 15% more accurate. The network with fewer layers underfits: proof that it does not learn enough.
You can run the same example from the folder you downloaded earlier, including the following experiment.
Let's look at the training accuracy and the validation accuracy. The best practice in deep learning is to split the dataset in two: one part is used to train the neural network and the rest is used to validate it. This tells us how well the neural network predicts new data, in other words, how well it generalizes.
As you can see, the accuracy on the training data is higher than on the validation set. The network has learned background noise and other details that get in the way of predicting new images.
To solve this overfitting problem, you can penalize complex functions and add noise to the neural network. Common regularization techniques for this include dropout layers and penalizing complex functions.
Dropout layers
One way to think about dropout regularization: instead of letting a few powerful neurons determine the final outcome, the power is distributed across many neurons.
The neural network is forced to learn several independent representations. When it makes the final prediction, it has several different patterns to draw on.
The following is an example of a neural network with a dropout layer.
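The example in the original is an image; as a hedged sketch of how a dropout layer can be added in TFlearn (the keep probability of 0.5 and the layer sizes are my assumptions):

```python
import tflearn

net = tflearn.input_data(shape=[None, 32, 32, 3])
net = tflearn.conv_2d(net, 32, 3, activation='relu')
net = tflearn.max_pool_2d(net, 2)
net = tflearn.fully_connected(net, 512, activation='relu')
net = tflearn.dropout(net, 0.5)     # randomly keep only half of the neurons during training
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net)
model = tflearn.DNN(net)
```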
In this comparison, the two neural networks are identical except that one has a dropout layer and the other does not.
In each layer of a neural network, the neurons become dependent on each other, and some neurons become more influential than others. The dropout layer randomly drops some neurons, so each neuron has to make a distinct contribution to the final output.
Another popular way to prevent overfitting is to apply an L1 or L2 regularization penalty to each layer.
L1 & L2 regularization
Say you want to describe a horse: if the description is too detailed, it rules out too many kinds of horses; if it is too general, it may include many other animals. L1 and L2 regularization help the network make this distinction.
If we compare it with a similar experiment, we get similar results.
Neural networks with regularization equations performed better than those without.
L2 regularization penalizes functions that are too complex. It measures how much each function contributes to the final output and penalizes the functions with large coefficients.
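In TFlearn this penalty can be attached to a layer through its regularizer and weight_decay arguments; a minimal hedged sketch (the weight_decay value is an arbitrary choice):

```python
import tflearn

net = tflearn.input_data(shape=[None, 32, 32, 3])
net = tflearn.fully_connected(net, 512, activation='relu',
                              regularizer='L2', weight_decay=0.001)  # penalize large weights
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net)
model = tflearn.DNN(net)
```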
Batch size
Another important parameter is batch size, the amount of data in each training step. The following is a comparison of a large batch of data with a small batch of data.
As you can see, large batches require fewer cycles but are more precise in training. Small batches, by contrast, are more random but require more steps to compensate.
A large batch does not require as many learning steps, but each step needs more memory and takes longer to compute.
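In TFlearn the batch size is a parameter of model.fit; a small hedged sketch reusing the OR example (the batch sizes and epoch counts are arbitrary):

```python
import tflearn

OR = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_truth = [[0.], [1.], [1.], [1.]]

net = tflearn.input_data(shape=[None, 2])
net = tflearn.fully_connected(net, 1, activation='sigmoid')
net = tflearn.regression(net, optimizer='sgd', learning_rate=2., loss='mean_square')
model = tflearn.DNN(net)

model.fit(OR, Y_truth, n_epoch=100, batch_size=4)  # large batch: the whole dataset per step
model.fit(OR, Y_truth, n_epoch=100, batch_size=1)  # small batch: one example per step, noisier
```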
Learning rate
The final experiment compares networks with large, medium and small learning rates.
The learning rate is often regarded as the most important parameter because of its influence. It regulates how much the predictions are adjusted at each step of the learning process. If the learning rate is too high or too low, the network may fail to converge, like the large learning rate in the figure above.
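The learning rate is set in tflearn.regression; a hedged sketch comparing a few values on the OR example (the specific rates and epoch count are arbitrary):

```python
import tensorflow as tf
import tflearn

OR = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_truth = [[0.], [1.], [1.], [1.]]

for lr in (10.0, 0.5, 0.0001):            # large, medium, and small learning rates
    tf.reset_default_graph()              # start a fresh graph for each run
    net = tflearn.input_data(shape=[None, 2])
    net = tflearn.fully_connected(net, 1, activation='sigmoid')
    net = tflearn.regression(net, optimizer='sgd', learning_rate=lr, loss='mean_square')
    model = tflearn.DNN(net)
    model.fit(OR, Y_truth, n_epoch=500, snapshot_epoch=False)
```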
There is no single way to design a neural network; a lot is determined by experimentation. Look at how others add layers and tune hyperparameters.
If you have access to a lot of computing power, you can also write programs to design and tune the hyperparameters for you.
When you are done running experiments, remember to spin down your GPU cloud instance, for example by cancelling it from the FloydHub web dashboard.
Next steps
In TFlearn's official examples (https://github.com/tflearn/tflearn/tree/master/examples/images), you can get a feel for some excellent convolutional neural networks. Try using some of these techniques to improve the validation accuracy on the CIFAR-10 dataset. The current best result is 96.53% (Graham, 2014).
It is well worth learning Python's syntax and becoming familiar with its command-line tooling. This reduces unnecessary cognitive load and lets you focus on deep learning concepts. Start with Codecademy's Python course, then do some command-line exercises. If you focus on just this one thing, you can master it in less than three days.
Original link:
https://medium.freecodecamp.org/deep-learning-for-developers-tools-you-can-use-to-code-neural-networks-on-day-1-34c4435ae6b