CSV files have a simple format, are quick to access, and are widely compatible. Much engineering, financial, and commercial data is stored and processed as CSV. I also use CSV for data processing in my own work. This post briefly summarizes my experience with CSV in Python, especially the points where the behavior I measured differs from the official documentation because of local compatibility.

The format of a CSV file is very simple and resembles a plain text document. Each line holds one record, and the fields within a line are separated by a comma (or a tab). Python ships with the csv module, which is designed specifically for reading and writing CSV files.
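As a small illustration of this layout, the sketch below parses a comma-separated and a tab-separated sample held in memory. The sample data and variable names are made up for the example, and io.StringIO stands in for a file opened in text mode.

```python
import csv
import io

# Hypothetical sample data: a header line and one record, three fields each.
comma_text = "name,qty,price\nbolt,10,0.25\n"
tab_text = "name\tqty\tprice\nbolt\t10\t0.25\n"

# io.StringIO stands in for a file object opened in text mode.
comma_rows = list(csv.reader(io.StringIO(comma_text)))
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

print(comma_rows)
print(tab_rows)
```

Both calls yield the same nested list; every field comes back as a string, so numeric conversion is left to the caller.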

The csv module offers two main ways to access CSV files: function methods and class methods.

1. Function methods

csv.reader(f[, dialect='excel'][, optional kwargs]) returns a CSV reader (essentially an iterator with __next__() and __iter__() methods) that iterates over the contents of a CSV file. f is the file object of the opened CSV file. Note that in practice the file must be opened in text mode ('r' or 'rt'), not the binary mode described in the official documentation (that advice applies to Python 2); otherwise an error is raised: iterator should return strings, not bytes (did you open the file in text mode?). dialect specifies the attribute set used to parse the CSV file. Individual attributes can also be overridden with keyword arguments.
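The text-mode requirement can be checked without touching the disk. In the sketch below, io.StringIO and io.BytesIO are stand-ins for files opened in text and binary mode respectively; the sample data is invented, and the exact wording of the error message may vary between Python versions.

```python
import csv
import io

# Text-mode stand-in: iterating yields str lines, which csv.reader accepts.
text_rows = list(csv.reader(io.StringIO("1,2,3\n4,5,6\n")))

# Binary-mode stand-in: iterating yields bytes, which csv.reader rejects.
try:
    list(csv.reader(io.BytesIO(b"1,2,3\n4,5,6\n")))
    binary_failed = False
except csv.Error as exc:
    binary_failed = True
    print("binary input rejected:", exc)

# A single dialect attribute can be overridden with a keyword argument.
semicolon_rows = list(csv.reader(io.StringIO("1;2;3\n"), delimiter=";"))

print(text_rows)
print(semicolon_rows)
```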

csv.writer(f[, dialect='excel'][, optional kwargs]) returns a CSV writer that converts data into delimited strings and writes them to a file. f is the file object of the CSV file to be written. Similarly, in practice the file must be opened in text mode ('w' or 'wt'), not the binary mode described in the official documentation.
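To see exactly what the writer emits, a minimal sketch can write into an in-memory buffer and inspect the raw text. The rows are invented for illustration; io.StringIO replaces a real file so that no operating system newline translation takes place.

```python
import csv
import io

buf = io.StringIO()            # stands in for open('out.csv', 'wt')
cw = csv.writer(buf)           # default dialect: 'excel'
cw.writerow([1, "two", 3.5])   # non-string values are converted to text
cw.writerows([["a", "b,c", 'say "hi"']])

written = buf.getvalue()
print(repr(written))
```

Fields containing the delimiter or the quote character are wrapped in quotes automatically, and each row ends with the dialect's line terminator, '\r\n' by default.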

import csv

l = []
with open('test1.csv', 'rt') as f:
    cr = csv.reader(f)
    for row in cr:
        print(row)
        l.append(row)  # collect every row (a list of strings) in a list

with open('1.csv', 'wt') as f2:
    cw = csv.writer(f2)
    # Use writerow() to write each element of the list as one line
    for item in l:
        cw.writerow(item)
    # Or use writerows() to write the whole nested list at once: each
    # outer element becomes one line and each inner element one field
    # cw.writerows(l)
test1.csv
1.csv

Note: in 1.csv there is a blank line between every two rows of data. This is caused by the lineterminator attribute; the fix is described below.

2. Class methods (dict methods)

csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) returns a reader that maps each row to a dict. f is the file object of the opened CSV file. Similarly, in practice the file must be opened in text mode ('r' or 'rt'). fieldnames (a list or other sequence) supplies the dictionary keys (column headings); each key corresponds to one column of the CSV file. If this parameter is not specified, the first line of the file is read and used as the keys.
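The effect of fieldnames, restkey, and restval is easiest to see on rows of uneven length. The following is a minimal sketch on invented in-memory data; the key name "extra" is an arbitrary choice for the example.

```python
import csv
import io

data = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"

# No fieldnames given: the first line supplies the keys.
# restval fills missing fields; restkey collects surplus fields in a list.
rows = [dict(r) for r in csv.DictReader(io.StringIO(data),
                                        restkey="extra", restval="")]

# Explicit fieldnames: every line, including the first, is treated as data.
rows_named = [dict(r) for r in csv.DictReader(io.StringIO(data),
                                              fieldnames=["x", "y", "z"],
                                              restkey="extra", restval="")]

for r in rows:
    print(r)
print(rows_named[0])
```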

csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds) returns a writer that maps dicts onto rows. It provides the writeheader(), writerow(rowdict), and writerows(rowdicts) methods. f is the file object of the CSV file to be written (opened in 'w' or 'wt' mode). fieldnames (a list or other sequence) supplies the dictionary keys (column headings); each key corresponds to one column of the CSV file, and this parameter must be specified.
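A short sketch of how restval and extrasaction behave, written against an in-memory buffer. The field names and row contents are hypothetical.

```python
import csv
import io

fieldnames = ["id", "name"]  # hypothetical column headings

buf = io.StringIO()
cw = csv.DictWriter(buf, fieldnames=fieldnames, restval="n/a")
cw.writeheader()                         # writes: id,name
cw.writerow({"id": 1, "name": "bolt"})
cw.writerow({"id": 2})                   # missing 'name' is filled with restval
out = buf.getvalue()

# extrasaction='raise' (the default): a key not in fieldnames is an error.
try:
    cw.writerow({"id": 3, "colour": "red"})
    raised = False
except ValueError:
    raised = True

# extrasaction='ignore': unknown keys are silently dropped.
buf2 = io.StringIO()
cw2 = csv.DictWriter(buf2, fieldnames, extrasaction="ignore")
cw2.writerow({"id": 3, "colour": "red"})
ignored_out = buf2.getvalue()

print(repr(out))
print(repr(ignored_out))
```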

import csv

l = []
with open('test2.csv', 'rt') as f:
    cr = csv.DictReader(f)
    for row in cr:
        print(row)
        l.append(row)  # each row is a dict: column heading -> value

with open('2.csv', 'wt') as f2:
    # fieldnames must match the keys of the dicts read from test2.csv
    cw = csv.DictWriter(f2, fieldnames=['title%d' % i for i in range(1, 7)])
    cw.writeheader()  # write fieldnames as the header line
    # Use writerow() to write each dict as one line, placing every value
    # in the column of its key; each dict is a row, each value a field
    for rowdict in l:
        cw.writerow(rowdict)
test2.csv
2.csv

Note: 2.csv has the same blank-line problem.

3. CSV file attribute set

An attribute set (called a dialect in the csv module) controls how CSV files are read and written. It includes:

  • delimiter (most important). Separates the fields on the same line, usually a comma (','), but a tab ('\t') is also common.
  • lineterminator (also important). Separates different rows of data, usually '\r\n' (for operating system reasons, this terminator may cause an extra blank line between rows of data when written to CSV, as shown in the previous examples), but '\n' is also used.
  • quotechar. Wraps fields that contain special characters (such as the delimiter, the quote character, or a newline), usually a double quote ('"').
  • escapechar. When quoting is set to QUOTE_NONE, the delimiter that follows an escape character is read as content rather than as a delimiter; when doublequote is set to False, the quote character that follows an escape character is read as content rather than as a quote. It works like '\' in Python and can be set to any single character (see the examples below).
  • quoting. Specifies when fields are quoted.
    • 0 is csv.QUOTE_MINIMAL (quote only the fields that need it; used by the Excel dialect).
    • 1 is csv.QUOTE_ALL (quote every field; used by the Unix dialect).
    • 2 is csv.QUOTE_NONNUMERIC (quote every non-numeric field).
    • 3 is csv.QUOTE_NONE (never quote; special characters are not wrapped in quote characters).
  • doublequote. Specifies how the quote character itself is represented inside a field.
    • True doubles the quote character ('""').
    • False prefixes the quote character with the escape character (such as '\"'). Note: if escapechar is not set at the same time, an error is raised.
  • strict. Whether strict mode is enabled. If True, an error is raised when the CSV input is malformed. The default is False.
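The quoting and doublequote settings above can be compared side by side by writing rows to memory. This is a minimal sketch: write_row is a hypothetical helper defined only for this example, the row contents are invented, and lineterminator='\n' is set purely to keep the output short.

```python
import csv
import io

def write_row(row, **fmt):
    """Write one row to an in-memory buffer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

row = ["x", "a,b", 'say "hi"', 7]

minimal = write_row(row)                                   # quoting=0 (default)
quote_all = write_row(row, quoting=csv.QUOTE_ALL)          # quoting=1
nonnumeric = write_row(row, quoting=csv.QUOTE_NONNUMERIC)  # quoting=2
# QUOTE_NONE: nothing is quoted, so the delimiter is escaped instead.
unquoted = write_row(["a,b", 7], quoting=csv.QUOTE_NONE, escapechar="\\")

# doublequote=False: the quote character is escaped rather than doubled.
escaped = write_row(row, doublequote=False, escapechar="\\")
round_trip = next(csv.reader(io.StringIO(escaped),
                             doublequote=False, escapechar="\\"))

# doublequote=False without an escapechar cannot represent a quote character.
try:
    write_row(row, doublequote=False)
    missing_escapechar_failed = False
except csv.Error:
    missing_escapechar_failed = True

for text in (minimal, quote_all, nonnumeric, unquoted, escaped):
    print(text, end="")
```

Reading the escaped line back with the same settings recovers the original fields as strings, which is a quick way to confirm that a chosen combination of attributes is lossless.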

The csv module has three built-in attribute sets:

  • csv.excel: comma-separated CSV files in Excel format
    • delimiter = ','
    • doublequote = True
    • lineterminator = '\r\n'
    • quotechar = '"'
    • quoting = 0
    • skipinitialspace = False
    • escapechar = None
  • csv.excel_tab: tab-separated CSV files in Excel format
    • delimiter = '\t' # all other attributes are inherited from the csv.excel class
  • csv.unix_dialect: comma-separated CSV files in Unix format
    • delimiter = ','
    • doublequote = True
    • lineterminator = '\n'
    • quotechar = '"'
    • quoting = 1
    • skipinitialspace = False
    • escapechar = None
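The values listed above can be confirmed directly from the interpreter. The sketch below looks up the three built-in dialects by their registered names ('excel', 'excel-tab', 'unix') and collects a few attributes; the summary dict is just a convenience for this example.

```python
import csv

summary = {}
for name in ("excel", "excel-tab", "unix"):
    d = csv.get_dialect(name)
    summary[name] = (d.delimiter, d.lineterminator, d.quoting)
    print(name, repr(d.delimiter), repr(d.lineterminator), d.quoting)

# excel_tab is a subclass of excel that only overrides the delimiter.
print(issubclass(csv.excel_tab, csv.excel))
print(csv.list_dialects())
```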

In addition, users themselves can register a new set of attributes through csv.register_dialect() or, more simply, specify attributes directly through keyword arguments when calling CSV read/write commands.
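Both routes mentioned here can be sketched as follows. The dialect name 'pipes' and the sample row are hypothetical, chosen only for illustration.

```python
import csv
import io

# Route 1: register a named dialect once, then refer to it by name.
csv.register_dialect("pipes", delimiter="|", lineterminator="\n",
                     quoting=csv.QUOTE_MINIMAL)

buf = io.StringIO()
csv.writer(buf, dialect="pipes").writerow(["a", "b|c", 3])
registered_out = buf.getvalue()

# Route 2: pass the same attributes as keyword arguments on each call.
buf2 = io.StringIO()
csv.writer(buf2, delimiter="|", lineterminator="\n").writerow(["a", "b|c", 3])
keyword_out = buf2.getvalue()

# The registered name works for reading as well.
rows = list(csv.reader(io.StringIO(registered_out), dialect="pipes"))

csv.unregister_dialect("pipes")  # clean up the registry after the demo
print(repr(registered_out), rows)
```

Registration pays off when the same format is used in several places; for a one-off read or write, keyword arguments are shorter.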

4. Blank lines between rows when writing CSV (line terminator example)

Going back to sections 1 and 2: the CSV files were written with a blank line between the data rows. This is because the default csv.excel attribute set is used, whose line terminator is '\r\n'. On Windows, a file opened in text mode translates every '\n' into '\r\n' on output, so the terminator reaches the disk as '\r\r\n', which shows up as an extra blank line between the data rows.

The solution is simply to set the line terminator to '\r' or '\n' when creating the CSV writer.

import csv

l = []
with open('test1.csv', 'rt') as f:
    cr = csv.reader(f)
    for row in cr:
        print(row)
        l.append(row)  # collect every row in a list

with open('1a.csv', 'wt') as f2:
    cw = csv.writer(f2, lineterminator='\n')
    # Use writerow() to write each element of the list as one line
    for item in l:
        cw.writerow(item)
    # Or use writerows() to write the whole nested list at once
    # cw.writerows(l)
1a.csv
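A different fix, and the one the current csv module documentation recommends, is to open the file with newline='' so that Python performs no newline translation at all and the dialect's own terminator is written unchanged. The sketch below compares the two options in a temporary directory; the file names and rows are made up for the example.

```python
import csv
import os
import tempfile

rows = [["1", "2", "3"], ["4", "5", "6"]]

with tempfile.TemporaryDirectory() as tmpdir:
    path_a = os.path.join(tmpdir, "a.csv")  # hypothetical file names
    path_b = os.path.join(tmpdir, "b.csv")

    # Option A: keep plain text mode and override the line terminator.
    with open(path_a, "wt") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)

    # Option B: disable newline translation and keep the default '\r\n'.
    with open(path_b, "wt", newline="") as f:
        csv.writer(f).writerows(rows)

    # Inspect the raw bytes of option B and read both files back.
    with open(path_b, "rb") as f:
        raw_b = f.read()
    with open(path_a, "rt", newline="") as f:
        rows_a = list(csv.reader(f))
    with open(path_b, "rt", newline="") as f:
        rows_b = list(csv.reader(f))

print(raw_b)
print(rows_a == rows_b == rows)
```

Option B produces the same bytes on every platform, whereas the bytes written by option A still depend on the operating system's text-mode translation.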

5. Escape character example

Section 3 introduced the role of the escape character; the following example demonstrates it.

import csv

with open('test1.csv', 'rt') as f:
    cr = csv.reader(f)
    print(next(cr))  # read and print the first row

The output is: ['1', '2', '3', '4', '5', '6'].

import csv

with open('test1.csv', 'rt') as f:
    cr = csv.reader(f, escapechar='3')  # treat the character '3' as the escape character
    print(next(cr))

The code above sets the character '3' as the escape character. The output is: ['1', '2', ',4', '5', '6'].

As you can see, the escape character '3' itself is no longer output, and the delimiter ',' immediately following it is output as content rather than being treated as a delimiter.
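The same behavior can be reproduced without test1.csv. The sketch below assumes the first line of that file is 1,2,3,4,5,6 (inferred from the output shown above) and adds a more typical case with a backslash as the escape character; the second sample line is invented.

```python
import csv
import io

line = "1,2,3,4,5,6\n"  # assumed first line of test1.csv

plain = next(csv.reader(io.StringIO(line)))
with_escape = next(csv.reader(io.StringIO(line), escapechar="3"))

# A more conventional choice: backslash escapes the delimiter that follows.
backslash = next(csv.reader(io.StringIO("a\\,b,c\n"), escapechar="\\"))

print(plain)
print(with_escape)
print(backslash)
```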