1. While scraping data from 58.com, I ran into list data in a format like the following:

a = ['\r\n \r\n ', '\r\n critical -\r\n 龒𩧯龤', 'yuan/month \r\n']

As you can see, the data is full of unwanted characters: \r\n sequences and spaces. Calling .join() or .strip() on its own is not enough to clean it up.
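To illustrate why .strip() alone falls short, here is a minimal sketch that reuses the sample list above (the variable names `stripped` and `joined` are only for illustration). It shows that .strip() trims whitespace at the ends of each element but leaves the \r\n in the middle of an element untouched.

```python
# Sample list modeled on the scraped data shown above.
a = ['\r\n \r\n ', '\r\n critical -\r\n 龒𩧯龤', 'yuan/month \r\n']

# strip() only removes whitespace at the two ends of each string.
stripped = [s.strip() for s in a]

# Joining the stripped pieces still leaves the inner '\r\n ' in place.
joined = ''.join(stripped)

print(repr(joined))
```

The repr() call makes the leftover \r\n visible, which a plain print() would hide as a line break.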

2. Solution

Step 1: Join the list elements into a single string with the .join() method.

a = ' '.join(a)

with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(a)

The output is:

Step 2: Use the sub() function from the re (regular expression) module to replace the newlines, tabs, and spaces in the string produced by step 1.

import re

# r'\s' is a raw string, so the backslash reaches the regex engine unchanged
c = re.sub(r'\s', ' ', a)
print(c)

\s matches any single whitespace character, such as a space, tab, newline, carriage return, or form feed.

The output is:

Problem solved!
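The two steps can also be combined into one small function. The following is only a sketch under a few assumptions: the helper name `clean_fragments` is hypothetical, the input is the sample list from section 1, and it uses the pattern \s+ together with a final .strip(), which is a variation on the single-character \s replacement above. Matching runs of whitespace collapses each run into one space instead of leaving one space per whitespace character.

```python
import re


def clean_fragments(fragments):
    """Join scraped text fragments and normalize their whitespace.

    Hypothetical helper: every run of whitespace (spaces, tabs,
    carriage returns, newlines, form feeds) becomes a single space.
    """
    joined = ' '.join(fragments)             # step 1: list -> one string
    collapsed = re.sub(r'\s+', ' ', joined)  # step 2: collapse whitespace runs
    return collapsed.strip()                 # drop leading/trailing space


a = ['\r\n \r\n ', '\r\n critical -\r\n 龒𩧯龤', 'yuan/month \r\n']
print(clean_fragments(a))
```

If the goal is to remove whitespace entirely rather than normalize it, the replacement string can be changed from ' ' to '', at the cost of also gluing separate words together.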