Introduction

When I first came across the recode command, I thought it was only good for replacing missing values. We often see it used like this:

recode a (.=0)

This is equivalent to:

replace a = 0 if a == .

But look again: this little command is not that simple! Once you master recode, handling numeric variables becomes much smoother. Good things are meant to be shared, so here is a full breakdown of the recode command.


I. Syntax

The full syntax of the recode command is as follows:

recode varlist (erule) [(erule) ...] [if] [in] [, options]

Here varlist is one or more variables that we want to change, and erule is a recoding rule of the form old value = new value. Several rules can be specified at once; each rule is enclosed in parentheses, and rules are separated by spaces. if and in are the usual condition and range qualifiers and can be added as needed. Finally, there are several important options, which are described in detail below.

(1) General rule form

recode x (3=1) // change every observation of x equal to 3 to 1
recode x y (8=1) (7=2) (6=3) // in both x and y, change 8 to 1, 7 to 2, and 6 to 3
recode x (2 .=0) // change 2 and missing values of x to 0
recode x (1/5=4) // change every real number in the closed interval from 1 to 5 to 4

Note: 1/5 in a recode rule is not the same as 1/5 in a Stata numlist. In a numlist, 1/5 means the five integers 1, 2, 3, 4, 5; in recode, 1/5 covers every real number in the closed interval from 1 to 5.
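A quick way to see the difference is to recode a variable that holds a non-integer value. The sketch below uses a made-up variable x, and it borrows the generate() option (covered later in this post) so the original values stay visible. The variable names are only for illustration.

```stata
clear
input x
1
2.5
5
6
end

* 2.5 lies inside the closed interval [1, 5], so it is recoded too
recode x (1/5=4), generate(x_rec)
list x x_rec
```

Here x_rec should come out as 4, 4, 4, 6: the value 2.5 is caught by the rule even though it is not an integer, while 6 falls outside the interval and is left alone.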

In addition, we can use min and max to stand for a variable's minimum and maximum values, on either side of the equal sign.

recode x (min/3=max) // change every value from the minimum of x through 3 to the maximum of x
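To make the min and max keywords concrete, here is a small sketch on a made-up variable. The data and the name x_max are assumptions for illustration only.

```stata
clear
input x
1
2
3
4
10
end

* min is the smallest value of x (1) and max is the largest (10)
recode x (min/3=max), generate(x_max)
list x x_max
```

With these values, 1, 2 and 3 should all become 10, while 4 and 10 stay as they are.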

(2) The form of rules containing keywords:

To better illustrate the form of rules with keywords, we first use the input command to enter some variables and values:

clear
input v
1
2.1
3.2
.
5
6
7
8
end

And for the sake of illustration, here are two simple but very important options:

The first option is generate(), which names the new variable (or variables) that will hold the recoded values. The second is prefix(), which builds the new variable names by adding a prefix to the original names. Without either option, recode overwrites the original data, which is often inconvenient, so it is highly recommended to add one of them whenever you use recode.

For example, change the values from 1 through 5 in the original data to 0, and store the result in a new variable whose name is the original variable name prefixed with "new_".

recode v (1/5=0),prefix(new_)

At this point, the data set looks like the figure below:

Looking at the new variable, we notice that the values outside the rule remain the same. At this point, we can continue to specify general rules, or we can specify rules with keywords. The keyword rule is set for values that have not been changed by the previous rule and must come after the general rule. There are four main keywords: nonmissing, missing, else and *.

recode v (1/5=0) (nonmissing=1),generate(v1) // change nonmissing values not changed by the previous rule to 1
recode v (1/5=0) (missing=2),generate(v2) // change missing values not changed by the previous rule to 2
recode v (1/5=0) (else=1),generate(v3) // change all values not changed by the previous rule to 1
recode v (1/5=0) (*=1),generate(v4) // * is a synonym for else
recode v (1/5=0) (nonmissing=1) (missing=2),generate(v5)

Looking at the value set again, I believe you now have a basic understanding of the keyword rules.
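If the keyword rules still feel abstract, it can help to write out what they do by hand. The sketch below rebuilds the last command above with generate and replace, then checks that both versions agree. The name w5 is hypothetical, and the replace statements are only one way to spell out the same logic.

```stata
clear
input v
1
2.1
3.2
.
5
6
7
8
end

recode v (1/5=0) (nonmissing=1) (missing=2), generate(v5)

* Hand-written equivalent (w5 is a hypothetical name)
generate w5 = v
replace w5 = 0 if inrange(v, 1, 5)
replace w5 = 1 if !inrange(v, 1, 5) & !missing(v)
replace w5 = 2 if missing(v)

* The two variables should agree in every observation
assert w5 == v5
list v v5 w5
```

For this dataset both variables should read 0, 0, 0, 2, 0, 1, 1, 1: the first rule handles 1 through 5, nonmissing picks up 6, 7 and 8, and missing picks up the one missing value.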

Note that else (or its synonym *) cannot be combined with nonmissing or missing in the same command; doing so produces an error.

II. Other options and cases

(1) The test option

The rules in a recode command are ordered. Rules are applied from left to right, and once a value has been matched by a rule, any later rule that names the same value is ignored for it. Using the data entered above:

recode v (1/5=0) (5=1),generate(v6)

Here, the value 5 is specified twice: the first rule changes 5 to 0, and the second tries to change 5 to 1. Looking at the data, we can see that 5 becomes 0 and the rule changing 5 to 1 is ignored.

But the Results window gives no obvious warning.

Such mistakes are hard to spot in a large dataset. Add the test option, and the Results window warns you that the rules overlap.

In addition, the test option checks whether every rule in the recode command is actually used and displays a warning if any rule is not. For example, the dataset contains no value 10, so a rule that recodes 10 can never be applied, and the test option reports that at least one rule was never used.
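Both situations can be tried on the small dataset from earlier. This is only a sketch: the variable names v7 and v8 and the rule (10=1) are assumptions chosen for illustration, and the exact wording of the notes Stata prints is not reproduced here.

```stata
clear
input v
1
2.1
3.2
.
5
6
7
8
end

* Overlapping rules: 5 is matched by 1/5 and again by the second rule
recode v (1/5=0) (5=1), generate(v7) test

* A rule that is never used: no observation of v equals 10
recode v (1/5=0) (10=1), generate(v8) test

list v v7 v8
```

The recoded values are the same as they would be without the option; test only adds the checks on the rules.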

(2) The copyrest option

In a recode command that creates a new variable, adding an if or in qualifier restricts which observations are recoded, and observations outside that range are set to missing in the new variable. For example, recode only the first three rows:

recode v (1/5=0) in 1/3,gen(v9)

Looking at the data, you can see that every value outside the first three rows becomes missing. Adding the copyrest option copies the out-of-range observations to the new variable unchanged.

recode v (1/5=0) in 1/3,gen(v10) copyrest
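Putting the two commands side by side on the small dataset from earlier makes the difference easy to see. The sketch below just re-enters that data and lists the results.

```stata
clear
input v
1
2.1
3.2
.
5
6
7
8
end

recode v (1/5=0) in 1/3, generate(v9)
recode v (1/5=0) in 1/3, generate(v10) copyrest

* v9:  0 0 0 . . . . .   (rows 4 to 8 are set to missing)
* v10: 0 0 0 . 5 6 7 8   (rows 4 to 8 are copied from v)
list v v9 v10
```

Note that row 5 keeps the value 5 in v10 even though 5 falls inside 1/5: the row lies outside the in 1/3 range, so it is copied rather than recoded.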

(3) The label() option

The label() option is the soul of the recode command, and it is because of label() that recode deserves the title of classification wizard. In an earlier post introducing the label command, we showed how to attach value labels. That normally takes two steps: first label define to define the value label, then label values to attach it to a variable. With recode, you can do both in a single command!
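For reference, the longer route might look like the sketch below, which collapses a five-category variable into three categories and then labels it in two separate steps. It borrows the fullauto dataset (loading it needs an internet connection); the new variable name rep77_3 and the label name rep3 are hypothetical.

```stata
webuse fullauto, clear

* Collapse five categories into three by hand
generate rep77_3 = rep77
replace rep77_3 = 1 if inlist(rep77, 1, 2)
replace rep77_3 = 2 if rep77 == 3
replace rep77_3 = 3 if inlist(rep77, 4, 5)

* Step 1: define the value label
label define rep3 1 "Below average" 2 "Average" 3 "Above average"

* Step 2: attach it to the new variable
label values rep77_3 rep3

tabulate rep77_3
```

That is several commands for one variable, and the whole block has to be repeated for every additional variable that needs the same treatment.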

Let's look at the label() option in recode using the fullauto dataset from Stata's website:

webuse fullauto, clear

In the Variables window, we can see that rep77 and rep78 share the value label repair.

View the contents of the repair value label.

label list repair

We found that the value label of repair has 5 items:

Now suppose five categories are too many and we want three instead: combine categories 1 and 2 into "Below average", keep 3 as "Average", and combine 4 and 5 into "Above average", recoding the data to match.

Just add the new label text in double quotation marks after each rule, and name the new value label with the label() option at the end:

recode rep77 rep78 (1 2=1 "Below average") (3=2 "Average") (4 5=3 "Above average"),prefix(new) label(newrep)
list *rep* in 1/5,nolabel
label list repair newrep

At this point, the new label content and part of the data set are as follows:
