1. While scraping data from 58.com, I ran into list data in a format like the following:
a = ['\r\n \r\n ', '\r\n critical -\r\n 龒𩧯龤', 'yuan/month \r\n']
As you can see, the data contains many unwanted characters: \r\n sequences and spaces. Simply calling ''.join() or .strip() is not enough, because .strip() only removes whitespace at the start and end of a string, not in the middle.
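To illustrate why those two methods alone fall short, here is a minimal sketch. The list `sample` is a hypothetical stand-in for the scraped data (its text values are made up for the example), and the variable names are assumptions, not part of the original post.

```python
# Hypothetical sample shaped like the scraped list above (values are illustrative)
sample = ['\r\n \r\n ', '\r\n price -\r\n 100', 'yuan/month \r\n']

# strip() trims whitespace only at the two ends of each element
stripped = [s.strip() for s in sample]

# Joining the trimmed pieces still leaves the '\r\n' that sat inside an element
joined = ''.join(stripped)

print(repr(joined))
```

The `\r\n` in the middle of the second element survives, which is why a regular expression is used in the steps that follow.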
2. Solution
Step 1: Join the list elements into a single string with the .join() method.
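# Join the list elements into one string, separated by spaces
a = ' '.join(a)
# Write the joined string to a file to inspect it
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(a)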
The output is:
Step 2: Use the sub() function from the re (regular expression) module to replace the newlines, tabs, and spaces in the string produced by step 1.
import re

# Replace every whitespace character with a single space
c = re.sub(r'\s', ' ', a)
print(c)
\s matches any single whitespace character, such as a space, tab, newline, carriage return, or form feed.
Output result:
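As a self-contained sketch of the two steps together, the snippet below runs them on a hypothetical list (`sample`, with made-up text values) and also shows two common variants that are not in the original post: `\s+` to collapse each run of whitespace into one space, and an empty replacement to drop whitespace entirely. Which variant fits depends on whether the words should stay separated.

```python
import re

# Hypothetical sample shaped like the scraped list (values are illustrative)
sample = ['\r\n \r\n ', '\r\n price -\r\n 100', 'yuan/month \r\n']

# Step 1: join the list elements into one string
joined = ' '.join(sample)

# Step 2 as written above: each whitespace character becomes one space,
# so the string keeps its length and may contain runs of spaces
replaced = re.sub(r'\s', ' ', joined)

# Variant (assumption): collapse each run of whitespace, then trim the ends
collapsed = re.sub(r'\s+', ' ', joined).strip()

# Variant (assumption): remove all whitespace
removed = re.sub(r'\s', '', joined)

print(repr(replaced))
print(repr(collapsed))
print(repr(removed))
```

The pattern is written as a raw string (`r'\s'`) so that Python does not try to interpret the backslash as a string escape before the regular expression engine sees it.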