In previous episodes, we covered Python strings and character encoding in detail, which is essentially the basis for working with files. In today’s video, we’re going to be talking about file operations.

To warm up, let’s look at a simple example of opening a file using the open function:

myfile = open('myfile.txt','w')
myfile = open('myfile.txt','r')

Files can be opened in several modes, such as read-only, write, and append.

As you can see, when using the built-in open function to operate on a file, the first argument is the file name and the second argument is the processing mode. Typical mode strings are: r to open the file for reading only (the default), w to open the file for writing (creating it, or truncating it if it already exists), and a to open the file for appending to its end. Adding b to the end of the mode string selects binary data processing.

The built-in open function creates a Python file object as an interface to file operations.
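As a rough illustration of the modes described above, the following sketch writes, appends to, and then reads back a scratch file. The temporary directory and the file name demo.txt are assumptions made for this example only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')      # 'w' creates the file, or truncates an existing one
f.write('first line\n')
f.close()

f = open(path, 'a')      # 'a' keeps the existing content and appends to the end
f.write('second line\n')
f.close()

f = open(path, 'r')      # 'r' opens the file for reading only
text = f.read()
f.close()

f = open(path, 'rb')     # adding 'b' returns raw bytes instead of str
raw = f.read()
f.close()

print(repr(text))
print(type(raw).__name__)
```

Opening the same file with 'w' a second time would have discarded the first line, which is why the second write uses 'a'.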

One thing to keep in mind is that, in text mode, the content of a file is always a string. The data read from the file is returned as a string. If a string is not what you want, for example if you really want a floating-point number, you need to convert the string to a floating-point type yourself. Likewise, when writing data to the file, you must pass an already formatted string to the write method.
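The conversion in both directions might look like the sketch below. The list of numbers and the file name numbers.txt are made up for illustration; str and float are the ordinary built-ins.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

values = [1.5, 2.25, 3.0]

f = open(path, 'w')
for v in values:
    f.write(str(v) + '\n')   # write() accepts only strings in text mode
f.close()

f = open(path, 'r')
content = f.read()           # everything comes back as one string
f.close()

# split() breaks the text on whitespace; float() converts each piece back
restored = [float(piece) for piece in content.split()]

print(repr(content))
print(restored)
```

Passing a float straight to write would raise a TypeError, so the explicit str call is not optional.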

Now let’s look at a real example: we write two lines of text (each ending with a newline) to a file and then read them back using several different methods. First, writing the data:

myfile = open('myfile.txt','w')
myfile.write('hello text file\n')
myfile.write('goodbyt text file\n')
myfile.close()

The readline method is used first. It reads one line per call and finally returns an empty string, which indicates that the end of the file has been reached:

myfile = open('myfile.txt','r')
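print(myfile.readline(), end='')   # first line, already ends with '\n'
print(myfile.readline(), end='')   # second line
print(myfile.readline(), end='')   # empty string: end of file, prints nothing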

hello text file
goodbyt text file

Second, you can use the read method to read the entire file into a single string at once:

myfile = open('myfile.txt','r')
print(myfile.read())

hello text file
goodbyt text file

Finally, a more Pythonic approach scans the file line by line automatically:

myfile = open('myfile.txt','r')
for line in myfile:
    print(line, end='')

hello text file
goodbyt text file

This method relies on the concept of a file iterator. The file object myfile created by the open function automatically reads and returns a new line of data at each iteration of the loop. This form is usually easy to write, uses memory efficiently, and runs fast.

We’ll cover the concept of iterators later, but just remember that file iterators are the most convenient way to read data line by line.

Let’s talk about reading and writing binary files. Remember that we must use bytes strings to process binary files: in binary mode Python performs no conversion on the data, so reading a binary file returns a bytes object.

As a reminder, binary files should not be opened in text mode, because text mode decodes the file’s contents into Unicode text. Attempting to decode arbitrary binary data this way is pointless and may fail. We talked about this in the last section, so we won’t repeat it here and will just review one example:

myfile = open('data.bin','wb')
myfile.write(b'abcdefg')
myfile.close()

data = open('data.bin', 'rb').read()
print(data)
print(list(data))

b'abcdefg'
[97, 98, 99, 100, 101, 102, 103]
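The warning about text mode can be checked with a small sketch. The file name raw.bin and the particular bytes are assumptions chosen for this example because they are not valid UTF-8; the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'raw.bin')

f = open(path, 'wb')
f.write(b'\xff\xfe\x00\x80')   # bytes that do not form valid UTF-8 text
f.close()

# Text mode tries to decode the bytes and fails
f = open(path, 'r', encoding='utf-8')
try:
    f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
finally:
    f.close()

# Binary mode returns the bytes untouched
f = open(path, 'rb')
raw = f.read()
f.close()

print(decode_failed)
print(raw)
```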

Let’s talk a little bit about closing and flushing files.

Closing files. Calling a file’s close method terminates the connection to the external file; this is what closing the file manually means. Once a file object is no longer referenced, its memory is reclaimed, and Python closes the file automatically at that point if it is still open. Even so, closing manually is the safest approach. The context manager for file objects, which we’ll focus on later, can also close files automatically.
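As a brief preview of the context manager mentioned above, here is a minimal sketch, assuming a scratch file named closing.txt. It uses the file object's closed attribute to show when the file is actually closed.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'closing.txt')

with open(path, 'w') as f:
    f.write('hello text file\n')
    inside = f.closed    # still open inside the with block

outside = f.closed       # closed automatically when the block ends

print(inside, outside)
```

The file is closed on leaving the block even if an exception is raised inside it, which is the main reason this form is often preferred over calling close by hand.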

By default, file output is buffered, which means that written text may not be transferred from memory to the hard disk immediately. Closing a file, or calling its flush method, forces the buffered data out right away.
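The effect of buffering can be observed roughly as follows. This is only a sketch: the file name buffered.txt is made up, and the size seen before flushing depends on the buffer size and the platform, so only the size after flush should be relied on.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), 'buffered.txt')

f = open(path, 'w')
f.write('buffered text')

# Often 0 here, because the text may still be sitting in the buffer
size_before = os.path.getsize(path)

f.flush()                # push the buffered data out of the file object

size_after = os.path.getsize(path)
f.close()

print(size_before, size_after)
```

Note that flush hands the data to the operating system; os.fsync is the standard-library call for asking the operating system to write it to the physical disk.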

Moving beyond files of strings, we’ll end with a special kind of file storage: object storage.

The pickle module is an advanced tool that allows us to store almost any Python object directly in a file, without having to convert it to and from strings ourselves. It is a general-purpose data formatting and parsing tool. For example, here is how to store a dictionary object and a list object in a file:

import pickle
D = {'a': 1, 'b': 2, 'c': 3}
L = [3, 4, 5]
with open('datafile.pkl', 'wb') as file:
    pickle.dump(D, file)
    pickle.dump(L, file)

This makes it easy to store the two objects in the specified file. To get the objects back, you simply load them from the file in the same order:

with open('datafile.pkl', 'rb') as file:
    print(pickle.load(file))
    print(pickle.load(file))

{'a': 1, 'b': 2, 'c': 3}
[3, 4, 5]

The object serialization performed by the pickle module is essentially a conversion between in-memory Python objects, such as dictionaries and lists, and byte strings.
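The conversion to and from byte strings can be seen without a file at all, using pickle.dumps and pickle.loads. The variable names in this sketch are chosen only for illustration.

```python
import pickle

D = {'a': 1, 'b': 2, 'c': 3}

blob = pickle.dumps(D)          # object -> byte string
restored = pickle.loads(blob)   # byte string -> new, equal object

print(type(blob).__name__)
print(restored == D)            # same contents
print(restored is D)            # but a separate object
```

One caution from the pickle documentation applies here: loading a pickle can execute arbitrary code, so only unpickle data that comes from a source you trust.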

There is also the struct module, which deals with packed binary data. Just to give you an idea: struct can construct and parse packed binary data. In a sense, it is also a data conversion tool.

Let’s start by looking at how to pack data into binary form and store it in a file. The first argument to struct.pack is a format string: here > specifies big-endian byte order, i a 4-byte integer, 5s a 5-byte string, and f a 4-byte floating-point number.

import struct
F = open('data.bin', 'wb')
data = struct.pack('>i5sf', 8, b'abcde', 4.3)
print(data)
F.write(data)
F.close()

F = open('data.bin', 'rb')
data = F.read()
F.close()
print(data)
values = struct.unpack('>i5sf', data)
print(values)

b'\x00\x00\x00\x08abcde@\x89\x99\x9a'
b'\x00\x00\x00\x08abcde@\x89\x99\x9a'
(8, b'abcde', 4.300000190734863)

The reading half is easy to understand: just read the byte string from the file and unpack it with the same format string, and Python converts it directly into normal Python objects.
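To make the format string more concrete, the sketch below measures the packed size with struct.calcsize and slices the packed bytes field by field. The variable names are made up for this example, and the little-endian line is added only for contrast.

```python
import struct

fmt = '>i5sf'                    # big-endian: int, 5-byte string, float

size = struct.calcsize(fmt)      # 4 + 5 + 4 bytes, no padding with '>'
packed = struct.pack(fmt, 8, b'abcde', 4.3)

int_bytes = packed[0:4]          # the 4-byte integer
str_bytes = packed[4:9]          # the 5-byte string
float_bytes = packed[9:13]       # the 4-byte float

number, text, real = struct.unpack(fmt, packed)

# The same integer packed little-endian has its bytes reversed
little = struct.pack('<i', 8)

print(size)
print(int_bytes, str_bytes, float_bytes)
print(little)
```

The unpacked float differs slightly from 4.3 because f stores only a 4-byte single-precision value.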

However, I would like to say that in general binary mode is used to work with simpler binary files, such as images and audio files, whose contents are handled as raw bytes without being unpacked. And if you want to store data, use a database instead.

Well, that’s it for today. So far we’ve looked at the main data types in Python: lists, dictionaries, tuples, and strings. Building on these basic data types, we went further into the advanced container concepts (iteration and list comprehensions) as well as the trickier parts of strings (character encoding and file access). Make a habit of organizing and reviewing what you’ve learned!

This article is from the Python Enthusiast Community, a partner of the Cloud computing community. For more information, you can follow the Python Enthusiast Community.