While deploying a project the other day, I ran into a very strange problem, and for a moment I suspected my career was over. It turned out that the shell script used different newline formats on Linux and Windows, which left the script unable to execute. (So much for a headline-making incident… 🙂)
It looks like a silly slip :) But as Shakespeare supposedly said, the best way to solve a problem is not to repeat it. So let's dig in and get to the bottom of what a newline character is in a computer system.
Historical origins
Before computers appeared, there was a communication device called the teleprinter (Teletype), which printed 10 characters per second, that is, 0.1 seconds per character. The catch was that moving to a new line after finishing one took 0.2 seconds, and any character that arrived during that time was lost. To solve this, two characters were added after each line to mark the end of the line.
These correspond to two actions:
- Pull the print head back from the right to the left; this is the carriage return.
- Pulling the print head back to the left is not enough; the paper also has to move up by one line. This is the line feed.
Later, these concepts were carried over to computer systems. But where there are people there is disagreement, and the systems went their separate ways.
- Windows: each line ends with a carriage return + line feed (the same two actions described above), \r\n;
- Unix: each line ends with a line feed only, \n;
- Mac: each line ends with a line feed only, \n.
PS: the old Mac system (classic Mac OS) used \r; all current Macs use \n, consistent with Unix.
The different newline rules lead to inconsistencies when files move between systems. For example, when a Unix/Mac file is opened on Windows in an editor that only recognizes \r\n (older versions of Notepad, for instance), all the text appears on one line, because a lone \n is not treated as a line ending.
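The difference is easy to see at the byte level. Here is a minimal sketch, assuming a POSIX shell with printf, wc and mktemp available; the file names and sample text are made up for illustration.

```shell
# Work in a throwaway directory so nothing existing is touched.
dir=$(mktemp -d)

# The same two lines, written once with Unix (LF) and once with Windows (CRLF) endings.
printf 'first line\nsecond line\n' > "$dir/unix.txt"
printf 'first line\r\nsecond line\r\n' > "$dir/win.txt"

# Count bytes: the CRLF version carries one extra byte (\r) per line.
unix_bytes=$(wc -c < "$dir/unix.txt" | tr -d ' ')
win_bytes=$(wc -c < "$dir/win.txt" | tr -d ' ')
echo "unix.txt: $unix_bytes bytes, win.txt: $win_bytes bytes"
```

The visible text is identical; only the invisible line terminators differ, which is why the problem is so easy to miss.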
ASCII
Binary | Decimal | Hexadecimal | Character/abbreviation | Description |
---|---|---|---|---|
00001001 | 9 | 09 | HT (Horizontal Tab) | Horizontal tab |
00001010 | 10 | 0A | LF/NL (Line Feed/New Line) | Line feed (newline) |
00001011 | 11 | 0B | VT (Vertical Tab) | Vertical tab |
00001100 | 12 | 0C | FF/NP (Form Feed/New Page) | Form feed (new page) |
00001101 | 13 | 0D | CR (Carriage Return) | Carriage return |
00001110 | 14 | 0E | SO (Shift Out) | Shift out |
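The hexadecimal column can be checked from a terminal. This is a small sketch, assuming a POSIX printf (which understands the \t, \n, \v, \f and \r escapes) and od; the variable name codes is only for illustration.

```shell
# Emit HT, LF, VT, FF and CR in table order and dump them as hex bytes.
codes=$(printf '\t\n\v\f\r' | od -An -tx1)
echo "$codes"
```

The five bytes come out in the same order as the first five rows of the table.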
Practice yields true knowledge
Let's go step by step. The test environments are Windows and Linux.
On Windows, create a new file win.txt and write a sentence in plain English. As mentioned above, the line ending here is CRLF.
talk is cheap,
show me your code.
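If no Windows machine is at hand, an equivalent file can be produced directly on Linux. This is a sketch under the assumption that the file contains exactly the two lines above, each terminated by CRLF; the scratch directory is my addition so that no existing win.txt gets overwritten.

```shell
# Scratch directory (an assumption of this sketch, not part of the original steps).
cd "$(mktemp -d)"

# \r\n writes the CR and LF bytes explicitly, as a Windows editor would.
printf 'talk is cheap,\r\nshow me your code.\r\n' > win.txt

# Dump the bytes; each line should end with \r followed by \n.
od -c win.txt
```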
- On Linux, open the newly created file with vim
It looks perfectly normal, doesn't it? It looks the same as on Windows. Note that vim checks the line endings when it opens a file. If every line ends with CRLF, vim automatically displays the text in DOS format, as indicated by [DOS] at the bottom.
DOS (Disk Operating System) uses CRLF, like Windows.
- Use cat -A to view all characters of the text
Every line ends with ^M$. ^M is how cat shows the carriage return (\r). The $ is not a character in the file; it is the marker cat prints at the end of each line, immediately before the line feed (\n).
- Use cat -v to display non-printing characters
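- Remove the carriage return on the first line and take a look (^M here is a literal carriage return, typed as Ctrl-V Ctrl-M; it is not the two characters ^ and M)
sed -i '1s/^M//' win.txt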
The ^M on the first line is gone, but the one on the second line is still there.
- Repeat step 1 (open the file in vim) to see what happens
This time vim displays the ^M symbol, and the [DOS] flag in the lower left corner is gone. This matches the behavior mentioned earlier: because the line endings are no longer all CRLF, vim displays the text in Unix format.
So far, we have confirmed that Windows and Linux follow different newline rules.
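The steps above can also be checked without looking at screenshots. The sketch below assumes a POSIX shell, grep and sed; instead of sed -i it writes to a second file (the hypothetical mixed.txt), because the -i syntax differs between GNU and BSD sed.

```shell
dir=$(mktemp -d)
printf 'talk is cheap,\r\nshow me your code.\r\n' > "$dir/win.txt"

# Keep a literal CR in a variable so the patterns work in any POSIX shell.
cr=$(printf '\r')

# Lines ending in CR (that is, CRLF lines) before the edit.
before=$(grep -c "${cr}\$" "$dir/win.txt")

# Strip the trailing CR from line 1 only, leaving line 2 untouched.
sed "1s/${cr}\$//" "$dir/win.txt" > "$dir/mixed.txt"

# Lines ending in CR after the edit.
after=$(grep -c "${cr}\$" "$dir/mixed.txt")
echo "CRLF lines before: $before, after: $after"
```

A file like mixed.txt, with mixed line endings, is the case where vim falls back to Unix format and shows the remaining ^M.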
Verify that the carriage return character really exists
I saw someone in the community try this experiment and reproduced it here; it shows intuitively what a carriage return is.
- Go back to the original file from the example above, check it with cat -A, then append ypm to the end of the first line:
sed -i '1s/.*/& ypm/g' win.txt
- Looking again with cat -A, we find that the command worked: .* matched the entire line, including the ^M but not the line feed, so ypm was added at the very end of the first line, after the ^M
- When viewing the text normally with cat, I noticed something strange: ypm had overwritten talk
How to explain this?
The key is that cat writes the bytes out unchanged, and the terminal interprets ^M as a carriage return.
That explains it: at the carriage return, the cursor jumps from the right back to the left, just like the print head, and the four characters of " ypm" (a space plus ypm) overwrite the four characters of talk. This gives an intuitive picture of what a carriage return does.
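The overwriting can be imitated in a few lines of awk. This is only a rough model of what a terminal does, not how any real terminal is implemented; render_cr is a hypothetical helper name, and the sketch ignores tabs, wide characters and every control character other than the carriage return.

```shell
# render_cr: emulate how a terminal draws a line that contains carriage returns.
# Each \r moves the "cursor" back to the first column; later text overwrites earlier text.
render_cr() {
  awk '{
    n = split($0, seg, "\r")
    out = ""
    for (i = 1; i <= n; i++) {
      # The new segment replaces the first length(seg[i]) characters of out.
      out = seg[i] substr(out, length(seg[i]) + 1)
    }
    print out
  }'
}

# The first line of win.txt after the sed command: text, CR, then " ypm".
shown=$(printf 'talk is cheap,\r ypm\n' | render_cr)
echo "$shown"
```

The result has " ypm" where talk used to be, while the rest of the line survives, which is what plain cat showed on screen.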
How to avoid it
There are several ways to deal with this difference, such as deleting the ^M characters or converting the text format. Personally, though, I think the fundamental solution is to stay aware of this format difference between systems when editing code; every IDE has its own setting for it.
- Remove the carriage returns with tr (in the first form, ^M is a literal carriage return typed as Ctrl-V Ctrl-M)
cat win.txt | tr -d '^M' > linux.txt
or
cat win.txt | tr -d '\015' > linux.txt
or
cat win.txt | tr -d '\r' > linux.txt
- In vim, enter :%s/^M//g or :set fileformat=unix
- Convert with a terminal command
dos2unix win.txt
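Whichever method is used, it is worth confirming that the conversion really worked. A minimal sketch with tr, assuming the same two-line sample file; the scratch directory and variable names are for illustration only.

```shell
dir=$(mktemp -d)
printf 'talk is cheap,\r\nshow me your code.\r\n' > "$dir/win.txt"

# Delete every CR byte; redirecting stdin avoids the extra cat process.
tr -d '\r' < "$dir/win.txt" > "$dir/linux.txt"

# Verify: no CR bytes remain and the line count is unchanged.
remaining=$(tr -cd '\r' < "$dir/linux.txt" | wc -c | tr -d ' ')
lines=$(wc -l < "$dir/linux.txt" | tr -d ' ')
echo "CR bytes left: $remaining, lines: $lines"
```

Note that tr -d '\r' removes every carriage return in the file, not only those at line ends, so it is best suited to plain text where \r has no other purpose.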
Conclusion
- Carriage return \r: CR (Carriage Return)
- Line feed \n: LF (Line Feed)
- Windows follows the original rule: both carriage return and line feed are required, with neither missing and the order unchanged, i.e. \r\n
- Unix performs both a carriage return and a line feed when it encounters \n, and a carriage return \r is displayed as the special character ^M
I didn't expect such a simple problem to lead to so much discussion. In fact, many people have run into it, and the community regularly sees incidents caused by newline characters. I hope no programmer's slip of the hand has caused losses in the tens of millions. In operations, databases, and similar fields, these small problems can be magnified, so they deserve attention. There are many ways to avoid them, but it is best to form the habit of checking line endings early, while writing code; the same applies when troubleshooting.
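One way to turn that habit into a routine is a small check that runs before deployment. The following is only a sketch: has_cr and the two sample scripts are hypothetical names invented for illustration, and a real project would point the loop at its own files.

```shell
# has_cr FILE: succeed (exit status 0) if FILE contains at least one CR byte.
has_cr() {
  [ "$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')" -gt 0 ]
}

# Two sample scripts: one saved with CRLF endings, one with LF endings.
dir=$(mktemp -d)
printf '#!/bin/sh\r\necho hello\r\n' > "$dir/deploy_win.sh"
printf '#!/bin/sh\necho hello\n' > "$dir/deploy_unix.sh"

# Report every script that still carries carriage returns.
for f in "$dir"/*.sh; do
  if has_cr "$f"; then
    echo "CRLF found: ${f##*/}"
  else
    echo "ok: ${f##*/}"
  fi
done
```

A check like this catches the problem before the script reaches a Linux host, where the stray \r on the first line would otherwise become part of the interpreter path.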
PS: Some parts may not be rigorous. Discussion is welcome, and please go easy on me…