To take advantage of TensorFlow.js's performance, the data first needs to be converted into tensors, and at the same time a few best practices are applied, such as shuffling and normalization. The full procedure is shown in the code below.

/**
 * Convert the input data into tensors that can be fed to the model.
 */
function convertToTensor(data) {
  // Wrapping the calculations in tf.tidy() disposes of intermediate tensors.
  return tf.tidy(() => {
    // Step 1. Shuffle the data, which helps multi-epoch training.
    tf.util.shuffle(data);

    // Step 2. Convert the data to tensors.
    const inputs = data.map(d => d.horsepower);
    const labels = data.map(d => d.mpg);

    const inputTensor = tf.tensor2d(inputs, [inputs.length, 1]);
    const labelTensor = tf.tensor2d(labels, [labels.length, 1]);

    // Step 3. Normalize the data to the range 0-1 using min-max scaling.
    const inputMax = inputTensor.max();
    const inputMin = inputTensor.min();
    const labelMax = labelTensor.max();
    const labelMin = labelTensor.min();

    const normalizedInputs = inputTensor.sub(inputMin).div(inputMax.sub(inputMin));
    const normalizedLabels = labelTensor.sub(labelMin).div(labelMax.sub(labelMin));

    return {
      inputs: normalizedInputs,
      labels: normalizedLabels,
      // Return the min/max bounds so we can use them later.
      inputMax,
      inputMin,
      labelMax,
      labelMin,
    };
  });
}
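
As a minimal usage sketch, assuming `data` already holds the cleaned car entries as an array of {horsepower, mpg} objects (the values mentioned in the comments are made up):

// Assuming `data` looks like [{horsepower: 88, mpg: 27}, ...]
const tensorData = convertToTensor(data);
const {inputs, labels, inputMin, inputMax, labelMin, labelMax} = tensorData;
// `inputs` and `labels` are normalized 2D tensors, ready for training.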

Data shuffle

tf.util.shuffle(data);

We shuffle the order of the data before training. During training, the data set is split into smaller subsets called batches, and shuffling ensures that each batch contains a varied mix of examples, so the model does not pick up patterns that depend only on the order in which the data was collected.
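
As a quick illustration with made-up numbers, tf.util.shuffle rearranges an array in place:

const arr = [1, 2, 3, 4, 5];   // made-up sample values
tf.util.shuffle(arr);          // shuffles the array in place
console.log(arr);              // e.g. [3, 1, 5, 2, 4] -- a different order on each run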

Convert to tensors

const inputs = data.map(d => d.horsepower)
const labels = data.map(d => d.mpg);

const inputTensor = tf.tensor2d(inputs, [inputs.length, 1]);
const labelTensor = tf.tensor2d(labels, [labels.length, 1]);

Each example in the data has two parts: the input feature (horsepower) and the true value we want to predict (mpg), which in machine learning is called the label.

Each array is then converted to a 2D tensor of shape [num_examples, num_features_per_example]: inputs.length is the number of examples, and each example has 1 input feature (horsepower).
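
For example, with made-up horsepower values, a five-element array becomes a tensor of shape [5, 1]:

const sample = [88, 130, 95, 110, 72];                        // made-up horsepower values
const sampleTensor = tf.tensor2d(sample, [sample.length, 1]);
console.log(sampleTensor.shape);                              // [5, 1]: 5 examples, 1 feature each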

Data standardization

const inputMax = inputTensor.max();
const inputMin = inputTensor.min();
const labelMax = labelTensor.max();
const labelMin = labelTensor.min();

const normalizedInputs = inputTensor.sub(inputMin).div(inputMax.sub(inputMin));
const normalizedLabels = labelTensor.sub(labelMin).div(labelMax.sub(labelMin));

Normalizing the data, sometimes also called standardizing it, scales the values so that they all fall in the 0-1 range. This step genuinely matters: it removes differences in magnitude between values, and models work best with small numbers, typically in the 0 to 1 or -1 to 1 range.
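
A small sketch of the min-max scaling used above, with made-up values: each value x is mapped to (x - min) / (max - min):

const t = tf.tensor2d([50, 100, 200], [3, 1]);  // made-up values
const tMin = t.min();
const tMax = t.max();
t.sub(tMin).div(tMax.sub(tMin)).print();        // [[0], [0.3333333], [1]]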

return {
  inputs: normalizedInputs,
  labels: normalizedLabels,
  // Return the min/max bounds so we can use them later.
  inputMax,
  inputMin,
  labelMax,
  labelMin,
}

The minimum and maximum values used for scaling are returned for two reasons: we can use them to un-normalize the model's outputs back to their original range, and we can use the same bounds to normalize new data when making predictions.
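
A minimal sketch of both uses, assuming `preds` is a tensor of normalized predictions and inputMin/inputMax/labelMin/labelMax are the bounds returned by convertToTensor (the new input value is made up):

// Un-normalize predictions back to the original units (mpg): normalized * (max - min) + min
const unNormPreds = preds.mul(labelMax.sub(labelMin)).add(labelMin);

// Normalize a new horsepower value with the same bounds before predicting:
const newInput = tf.tensor2d([120], [1, 1]);     // made-up horsepower value
const normalizedNewInput = newInput.sub(inputMin).div(inputMax.sub(inputMin));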