When the project was deployed that day, I ran into a very strange failure and suspected my career was over. It turned out to be the difference in newline format between Linux and Windows: the shell script had been saved with Windows line endings, so it could not be executed on Linux. (So much for the headline-making incident… 🙂)

It seems like a silly problem, and I was tempted to slip away quietly :)… But as Shakespeare said, the best way to solve a problem is to never let it happen again. So let's dig in and get to the bottom of what a newline character is in a computer system.

Historical origins

Before computers appeared, there was a teleprinter called the Teletype Model 33. It printed 10 characters per second, that is, 0.1 seconds per character. There was one problem: after finishing a line, moving to the start of the next one took 0.2 seconds, exactly the time needed to transmit two characters. Any character that arrived during that window was lost. To solve this, the designers added two characters after each line to mark the end of the line.

There are two actions involved:

  • Move the print head from the right edge back to the left; this is the carriage return.
  • Moving the head back to the left is not enough; the paper also has to move up by one line. This is the line feed.

Later, these two concepts carried over to computer systems. But where there are people there is disagreement, and the systems went their separate ways.

  • On Windows, each line ends with a carriage return + line feed (the same two actions mentioned above): \r\n;
  • On Unix, each line ends with a line feed only: \n;
  • On Mac, each line ends with a line feed only: \n;

PS: The old Mac systems (before Mac OS X) used \r; all current Macs use \n, consistent with Unix.

The difference in newline rules leads to inconsistencies when files move between systems. For example, when a Unix/Mac file is opened on Windows in an editor that only recognizes \r\n as a line break (older versions of Notepad, for instance), all the text appears on one line, because the file contains no \r\n at all.
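
To make the three conventions concrete, here is a small sketch that writes the same two one-letter lines with each line ending and prints the raw bytes in hex. The helper name hexbytes is made up for this example, and the sketch assumes a POSIX shell with od and tr available.

```shell
# hexbytes: print stdin as one continuous string of hex byte values.
# (The helper name is ours, not a standard tool.)
hexbytes() { od -An -tx1 | tr -d ' \n'; echo; }

printf 'a\r\nb\r\n' | hexbytes   # Windows/DOS, CR LF: 610d0a620d0a
printf 'a\nb\n'     | hexbytes   # Unix and current Macs, LF: 610a620a
printf 'a\rb\r'     | hexbytes   # old Mac systems, CR: 610d620d
```

0d is the carriage return and 0a is the line feed; the letters a and b are 61 and 62.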

ASCII

Binary    Decimal  Hex  Character/abbreviation          Description
00001001  9        09   HT (Horizontal Tab)             Horizontal tab
00001010  10       0A   LF/NL (Line Feed/New Line)      Line feed
00001011  11       0B   VT (Vertical Tab)               Vertical tab
00001100  12       0C   FF/NP (Form Feed/New Page)      Form feed (new page)
00001101  13       0D   CR (Carriage Return)            Carriage return
00001110  14       0E   SO (Shift Out)                  Shift out
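
The decimal column can be checked from a shell. This is a minimal sketch, assuming a POSIX printf and od; the helper name ascii_code is hypothetical.

```shell
# ascii_code: print the decimal code of the single character produced by a
# printf escape such as '\r'. (Hypothetical helper name.)
ascii_code() { printf "$1" | od -An -tu1 | tr -d ' \n'; echo; }

# Walk through the escapes for HT, LF, VT, FF and CR.
for esc in '\t' '\n' '\v' '\f' '\r'; do
  printf '%s -> %s\n' "$esc" "$(ascii_code "$esc")"
done
```

The loop prints the codes 9 to 13, in the same order as the table rows from HT to CR.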

Learning by practice

Let's go step by step. The test environments are Windows and Linux.

  1. On Windows, create a new file win.txt and write a couple of lines of plain English in it. As explained above, its line endings are CRLF.
talk is cheap,
show me your code.
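
If no Windows machine is at hand, a file with the same bytes can be produced on Linux. This is a sketch rather than part of the original walkthrough; it assumes printf is available and it overwrites any win.txt in the current directory.

```shell
# Write the two sample lines with CRLF endings, as a Windows editor would.
printf 'talk is cheap,\r\nshow me your code.\r\n' > win.txt

# Byte count: 14 + 2 for the first line, 18 + 2 for the second, 36 in total.
wc -c < win.txt
```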
  2. On the Linux system, open the newly created file with vim

Doesn't that look normal? It looks exactly as it does on Windows. Note that vim checks the line endings when it opens a file. If every line ends in CRLF, it automatically reads the file in DOS format, as shown by [dos] in the status line at the bottom.

DOS (Disk Operating System) uses CRLF, just like Windows.

  3. Use the cat -A option to view all characters in the text

Each line ends in ^M$. The ^M is the carriage return (CR, \r). The $ is not a character in the file: cat -A prints it to mark the end of each line, that is, the position of the line feed \n.
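
cat -A is a GNU option. Where it is missing, the same information can be obtained with tools that are more widely available. The sketch below is an assumption-laden alternative, not the author's method: it feeds the sample text through a pipe (the function name sample is made up) and inspects it with cat -v, with sed's l command, and with grep.

```shell
# sample: the two lines from the walkthrough, with CRLF endings.
sample() { printf 'talk is cheap,\r\nshow me your code.\r\n'; }

# cat -v shows CR as ^M; unlike cat -A it does not print $ at line ends.
sample | cat -v

# sed's l command shows CR as \r and ends every line with $.
sample | sed -n l

# Count the lines that end in CR, i.e. the CRLF lines.
cr=$(printf '\r')            # a literal CR byte in a variable
sample | grep -c "${cr}\$"
```

For this sample, grep reports 2, because both lines end in CRLF.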

  4. Use the cat -v option to display non-printing characters

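  5. Remove the carriage return on the first line and look again
sed -i '1s/^M//' win.txt
Here ^M has to be entered as a real carriage return (Ctrl+V, then Ctrl+M), not as the two characters ^ and M. With GNU sed, '1s/\r$//' does the same thing.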

You can see that the ^M on the first line is gone, while the one on the second line is still there.
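
The same step can be reproduced without typing a control character and without touching a file. This is a sketch under the assumption of a POSIX shell and sed; the names sample and cr are ours.

```shell
# sample: the two lines from the walkthrough, with CRLF endings.
sample() { printf 'talk is cheap,\r\nshow me your code.\r\n'; }

cr=$(printf '\r')   # a literal CR byte, so no Ctrl+V Ctrl+M is needed

# Delete the CR at the end of line 1 only, then make the remaining CR visible.
sample | sed "1s/${cr}\$//" | cat -v
```

The first line comes out without ^M and the second line still ends in ^M, matching what the walkthrough shows.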

  6. Open the file in vim again, as in step 2, to see what happens

This time vim shows the ^M symbol, and the [dos] flag is gone from the lower-left corner. This matches the behavior described in step 2: the lines no longer all end in CRLF, so vim displays the text in Unix format.

So far, we have confirmed that Windows and Linux follow different line-break rules.

Verify that the carriage return character really exists

I saw someone in the community try this experiment and repeated it here. It shows intuitively what a "carriage return" is.

  1. Go back to the original example file and view all characters with cat -A
  2. Run sed -i '1s/.*/& ypm/g' win.txt to append " ypm" to the first line
  3. Looking again, the command worked. .* matches the entire line, including the trailing ^M (sed only sets the \n aside), so ypm is added at the very end of the first line, after the ^M
  4. When viewing the text with plain cat, something strange shows up: ypm covers talk. How can that be explained?

In normal mode, cat writes the bytes to the terminal unchanged, and the terminal treats ^M as a real carriage return.

That explains it. At the carriage return the cursor jumps back to the left edge, just as the print head moved from right to left, and the four characters that follow (a space plus ypm) overwrite the four characters of talk. That gives an intuition of what a carriage return is.
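
The overwrite can be imitated without a terminal. The sketch below is a simplified model, not how a real terminal is implemented: the hypothetical function render handles only carriage returns and lets the text after each CR overwrite the line from the first column. It assumes a POSIX shell, sed, and awk.

```shell
# sample: the two lines from the walkthrough, with CRLF endings.
sample() { printf 'talk is cheap,\r\nshow me your code.\r\n'; }

# render: crude model of a terminal line. Text after a CR overwrites the
# start of what is already on the line. (Hypothetical helper, CR only.)
render() {
  awk '{
    n = split($0, parts, "\r")
    out = parts[1]
    for (i = 2; i <= n; i++) {
      seg = parts[i]
      if (length(seg) >= length(out)) out = seg
      else out = seg substr(out, length(seg) + 1)
    }
    print out
  }'
}

# What is stored: " ypm" sits after the CR of line 1.
sample | sed '1s/.*/& ypm/' | cat -v

# What is displayed: " ypm" has replaced "talk".
sample | sed '1s/.*/& ypm/' | render
```

With cat -v the first line reads talk is cheap,^M ypm, while render prints " ypm is cheap," with a leading space, which is the effect seen with plain cat.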

How to avoid

There are several ways to avoid this difference, such as deleting the ^M characters or converting the text format. Personally, I think the fundamental solution is to pay attention to the line-ending format whenever files move between systems. When editing code, every IDE has its own setting for this.

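  1. Remove the carriage returns
tr -d '\r' < win.txt > linux.txt
or, using the octal code of CR:
tr -d '\015' < win.txt > linux.txt
Do not use cat -v win.txt | tr -d '^M' for this: cat -v turns each CR into the two visible characters ^ and M, and tr then deletes every ^ and every M in the text.
In vim, enter :%s/^M//g (type ^M as Ctrl+V, then Ctrl+M) or :set fileformat=unix, then save the file.
  2. Convert with a terminal command
dos2unix win.txt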
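
When dos2unix is not installed, the conversion can be sketched with awk in both directions. The function names to_unix and to_dos are hypothetical, and the sketch assumes an awk that understands \r in regular expressions (gawk and mawk do). Unlike tr -d '\r', it removes a CR only at the end of a line.

```shell
# to_unix: drop one CR at the end of each line (CRLF becomes LF).
to_unix() { awk '{ sub(/\r$/, ""); print }'; }

# to_dos: end every line with CRLF, without doubling an existing CR.
to_dos() { awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'; }

# Show the bytes before and after each conversion.
printf 'a\r\nb\r\n' | to_unix | od -An -c
printf 'a\nb\n'     | to_dos  | od -An -c
```

Because to_dos strips a trailing CR before adding CRLF, running it on a file that is already in DOS format leaves the file unchanged.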

Conclusion

  • Carriage return \r: CR (Carriage Return)
  • Newline \n: LF (Line Feed)
  • Windows follows the original rule: a line ends with carriage return + line feed. Both are required, in that order, that is, \r\n
  • On Unix, a newline \n alone performs both the carriage return and the line feed, and a carriage return \r is displayed as the special character ^M

I did not expect such a simple problem to take so many words. Many people have run into it, and newline characters regularly cause "bloody incidents" in the community. I hope no fellow programmer's slip of the hand has cost tens of millions because of it. In operations, databases, and similar fields, these "small problems" can be magnified, so they deserve attention. There are many ways to avoid them, but it is worth building the habit of watching line endings from the moment the code is written, and of checking them when troubleshooting.

PS: Some parts may not be rigorous. Discussion is welcome, but please go easy on me…
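
One way to turn that habit into a routine check is to test scripts for CRLF before deploying them. This is a minimal sketch, assuming a POSIX shell with grep and mktemp; the function name has_crlf and the file names are made up for the example.

```shell
# has_crlf: succeed (exit status 0) if the file has a line ending in CR LF.
has_crlf() {
  cr=$(printf '\r')
  grep -q "${cr}\$" "$1"
}

# Usage sketch with two throwaway scripts in a temporary directory.
tmp=$(mktemp -d)
printf '#!/bin/sh\r\necho hi\r\n' > "$tmp/win.sh"
printf '#!/bin/sh\necho hi\n'     > "$tmp/unix.sh"
for f in "$tmp"/*.sh; do
  if has_crlf "$f"; then
    echo "CRLF: ${f##*/}"
  else
    echo "LF:   ${f##*/}"
  fi
done
rm -r "$tmp"
```

The function returns a status rather than printing, so it can also be used in a condition that stops a deployment script.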
