1. I ran into list data like the following while scraping 58.com:

a = ['\r\n \r\n ', '\r\n critical -\r\n 龒ð©§¯龤', 'yuan/month \r\n']

As you can see, there are many unwanted characters: \r\n and spaces! Simply calling ''.join() or .strip() is not enough, because .strip() only removes whitespace at the two ends of a string.
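To see why, here is a minimal sketch using a made-up list shaped like the one above. The sample strings are illustrative assumptions, not real 58.com data. After joining and stripping, the \r\n in the middle of the string is still there:

```python
# Hypothetical sample shaped like the scraped list; not real 58.com data.
sample = ['\r\n \r\n ', '\r\n negotiable -\r\n 3000', 'yuan/month \r\n']

joined = ''.join(sample)
stripped = joined.strip()

# strip() trims only the two ends; the interior "\r\n" survives.
print(repr(stripped))
print('\r\n' in stripped)
```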

2. Solution

# Join the pieces into one string
a = ' '.join(a)

# Write it to a file to inspect the result
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(a)

The output is:

import re
c = re.sub(r'\s', ' ', a)  # replace each whitespace character with a space
print(c)

\s matches any whitespace character, such as a space, tab, form feed, newline (\n), or carriage return (\r).

Output result:
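One possible refinement, offered as a variation and not as the code from this post: replacing every \s match with a space turns a run such as \r\n \r\n into several spaces in a row. A minimal sketch, assuming a single space between words is what you want, matches whole runs with \s+ and then trims the ends. The helper name normalize_whitespace and the sample list are made up for illustration.

```python
import re

def normalize_whitespace(parts):
    """Join the strings, collapse each whitespace run to one space, trim the ends."""
    text = ' '.join(parts)
    return re.sub(r'\s+', ' ', text).strip()

# Hypothetical sample shaped like the scraped list; not real 58.com data.
sample = ['\r\n \r\n ', '\r\n negotiable -\r\n 3000', 'yuan/month \r\n']
print(normalize_whitespace(sample))  # negotiable - 3000 yuan/month
```

For this sample, ' '.join(text.split()) gives the same result without a regular expression, since str.split() with no arguments also splits on runs of whitespace.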