Reading and Writing Files in Python (my blog: Zen Programmer)
One of the most common tasks you can do with Python is reading and writing files. Whether it's writing to a simple text file, reading a complicated server log, or analyzing raw byte data, all of these situations require reading or writing a file.
In this tutorial, you will learn:
- What makes up a file and why that's important in Python
- The basics of reading and writing files in Python
- Some scenarios for reading and writing files in Python
This tutorial is intended for beginner to intermediate Python developers, but there are some tips that more advanced programmers can benefit from.
What is a file?
Before we dive into how to use files in Python, it’s important to understand what files really are and how modern operating systems handle some aspects of them.
At its core, a file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything as simple as a text file or as complicated as a program executable. In the end, these bytes are translated into binary 1s and 0s for easier processing by the computer.
Files on most modern file systems are made up of three main parts:
- **Header:** metadata about the contents of the file (file name, size, type, and so on)
- **Data:** the contents of the file as written by the creator or editor
- **End of file (EOF):** a special character that indicates the end of the file
What this data represents depends on the format specification used, which is typically indicated by an extension. For example, a file with the .gif extension most likely conforms to the Graphics Interchange Format specification. There are hundreds, if not thousands, of file extensions out there. For this tutorial, you will only deal with .txt or .csv file extensions.
The file path
File paths are required to access files on the operating system. The file path is a string representing the location of the file. It is divided into three main parts:
- **Folder path:** the folder location on the file system, with subsequent folders separated by a forward slash / (Unix) or a backslash \ (Windows)
- **File name:** the actual name of the file
- **Extension:** the end of the file path, preceded by a period (.), used to indicate the file type
This is a simple example. Suppose you have a file in a file structure like this:
/
│
├── path/
│   │
│   ├── to/
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt
│
└── animals.csv
Suppose you want to access the cats.gif file and your current location is in the same folder as path. To reach the file, you need to go through the path folder, then the to folder, and finally arrive at the cats.gif file. The folder path is path/to/. The file name is cats. The file extension is .gif. So the full path is path/to/cats.gif.
Now assume that your current location, or current working directory (CWD), is the to folder of our sample folder structure. Instead of referring to cats.gif by its full path, path/to/cats.gif, you can simply refer to the file by its file name and extension, cats.gif.
/
│
├── path/
│   │
│   ├── to/  ← Your current working directory (CWD) is here
│   │   └── cats.gif  ← Accessing this file
│   │
│   └── dog_breeds.txt
│
└── animals.csv
But what about dog_breeds.txt? How would you access it without using the full path? You can use the special double-dot characters (..) to move one directory up. This means that, from the to directory, ../dog_breeds.txt references the dog_breeds.txt file.
/
│
├── path/  ← Referencing this parent folder
│   │
│   ├── to/  ← Current working directory (CWD)
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt  ← Accessing this file
│
└── animals.csv
The double dot (..) can be chained together to traverse multiple directories above the current directory. For example, to access animals.csv from the to folder, you would use ../../animals.csv.
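The path parts and the double-dot references described above can be checked with the standard library. This is a minimal sketch, assuming Unix-style separators; the variable names are only illustrative, and posixpath.normpath resolves the ".." segments purely as text without touching the file system.

```python
from pathlib import PurePosixPath
import posixpath

# The example path from the text, treated as a Unix-style path.
cats = PurePosixPath("path/to/cats.gif")

folder_path = str(cats.parent)  # the folder part of the path
file_name = cats.stem           # the file name without its extension
extension = cats.suffix         # the extension, including the period

# Resolve ".." segments textually, as if the CWD were path/to.
dog_breeds = posixpath.normpath("path/to/../dog_breeds.txt")
animals = posixpath.normpath("path/to/../../animals.csv")

print(folder_path, file_name, extension)
print(dog_breeds)
print(animals)
```

On Windows paths, PureWindowsPath and ntpath play the same roles.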
Line endings
A common problem when working with file data is the representation of a new line or line ending. The line ending has its roots in the Morse code era, when a specific pro-sign was used to communicate the end of a transmission or the end of a line.
Later on, this was standardized for teleprinters by both the International Organization for Standardization (ISO) and the American Standards Association (ASA). The ASA standard states that line endings should use the sequence of the carriage return (CR or \r) and the line feed (LF or \n) characters (CR+LF or \r\n). The ISO standard, however, allowed for either the CR+LF characters or just the LF character.
Windows uses the CR+LF characters to indicate a new line, while Unix and newer Mac versions use just the LF character. This can cause some complications when you process files that come from a different operating system. Here is a quick example. Suppose we examine the file dog_breeds.txt, which was created on a Windows system:
Pug\r\n
Jack Russel Terrier\r\n
English Springer Spaniel\r\n
German Shepherd\r\n
Staffordshire Bull Terrier\r\n
Cavalier King Charles Spaniel\r\n
Golden Retriever\r\n
West Highland White Terrier\r\n
Boxer\r\n
Border Terrier\r\n
The same output will be interpreted differently on Unix devices:
Pug\r
\n
Jack Russel Terrier\r
\n
English Springer Spaniel\r
\n
German Shepherd\r
\n
Staffordshire Bull Terrier\r
\n
Cavalier King Charles Spaniel\r
\n
Golden Retriever\r
\n
West Highland White Terrier\r
\n
Boxer\r
\n
Border Terrier\r
\n
This can make iterating over each line problematic, and you may need to account for situations like this.
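How Python itself treats these line endings depends on how the file is opened. The sketch below assumes a small temporary file written with Windows-style endings (the file name and contents are made up for illustration). It compares the default text mode, which translates \r\n to \n while reading, with newline='', which leaves the endings untouched.

```python
import os
import tempfile

windows_text = b"Pug\r\nBoxer\r\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "dog_breeds.txt")
    with open(sample_path, "wb") as writer:
        writer.write(windows_text)

    # Default text mode: "\r\n" is translated to "\n" while reading.
    with open(sample_path, "r", encoding="ascii") as reader:
        translated = reader.read()

    # newline="" turns the translation off, so "\r\n" is kept as-is.
    with open(sample_path, "r", encoding="ascii", newline="") as reader:
        untranslated = reader.read()

print(repr(translated))
print(repr(untranslated))
```

Tools that do not perform this translation are the ones that will show the stray \r characters.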
Character encodings
Another common problem you may face is the encoding of the byte data. An encoding is a translation from byte data to human-readable characters. This is typically done by assigning a numerical value to represent each character. The two most common encodings are ASCII and Unicode. ASCII can only store 128 characters, while Unicode can contain up to 1,114,112 characters.
ASCII is actually a subset of Unicode (UTF-8), meaning that ASCII and Unicode share the same numerical values for those characters. It's important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of characters. For example, if a file was created using the UTF-8 encoding and you try to parse it using the ASCII encoding, an error will be raised if the file contains characters outside of those 128 values.
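The mismatch described above is easy to reproduce without a file. This is a small sketch using an arbitrary sample string: it encodes the text as UTF-8 and then tries to decode the bytes as ASCII.

```python
text = "Café"
utf8_bytes = text.encode("utf-8")

# Plain ASCII characters have the same byte values in UTF-8.
assert "Cafe".encode("utf-8") == "Cafe".encode("ascii")

try:
    utf8_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError as error:
    decode_failed = True
    print(f"ASCII could not decode the data: {error}")

# Decoding with the encoding that was used to write the data works.
print(utf8_bytes.decode("utf-8"))
```

When opening a file, the same choice is made through the encoding argument of open(), for example open(path, encoding='utf-8').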
Opening and closing a file in Python
When you want to work with a file, the first thing to do is open it. This is done by calling the open() built-in function. open() has a single required argument, which is the path to the file. open() returns a file object for that file:
file = open('dog_breeds.txt')
Once you’ve opened the file, the next thing you need to learn is how to close it.
**Warning:** You should always make sure that an open file is properly closed.
It's important to remember that closing the file is your responsibility. In most cases, the file will eventually be closed when the application or script terminates, but there is no guarantee of exactly when that will happen. This can lead to unwanted behavior, including resource leaks. Closing files explicitly is also a best practice in Python, because it makes your code behave in a well-defined way.
When you manipulate files, there are two ways to make sure they close properly, even if you encounter errors. The first way to close a file is to use a try-finally block:
reader = open('dog_breeds.txt')
try:
    # Further file processing goes here
    pass
finally:
    reader.close()
If you're unfamiliar with what a try-finally block is, see Python Exceptions: An Introduction.
The second way to close a file is to use the following with statement:
with open('dog_breeds.txt') as reader:
    # Further file processing goes here
    pass
The with statement automatically takes care of closing the file once execution leaves the with block, even in the case of an error. I strongly recommend that you use the with statement whenever possible, because it makes for cleaner code and makes handling any unexpected errors easier.
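To see that the file really is closed even when something goes wrong, you can check the .closed attribute of the file object afterwards. The sketch below uses a temporary file and a deliberately raised ValueError, both of which are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "dog_breeds.txt")
    with open(sample_path, "w") as writer:
        writer.write("Pug\n")

    try:
        with open(sample_path) as reader:
            first_line = reader.readline()
            raise ValueError("simulated failure while processing")
    except ValueError:
        pass

    # The with block was left through an exception, yet the file is closed.
    closed_after_error = reader.closed

print(first_line.strip(), closed_after_error)
```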
Most likely, you'll also want to use the second positional argument, mode. This argument is a string that contains one or more characters describing how you want to open the file. The default and most common is 'r', which means to open the file as a text file in read-only mode:
with open('dog_breeds.txt', 'r') as reader:
    # Further file processing goes here
    pass
See the online documentation for the other modes, but the most commonly used ones are the following:
Mode | Meaning |
---|---|
'r' | Open for reading (default) |
'w' | Open for writing, truncating (overwriting) the file first |
'rb' or 'wb' | Open in binary mode (read/write using byte data) |
Let's go back and talk a little about file objects. A file object is:
“An object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource.” (source)
There are three different types of file objects:
- Text files
- Buffered binary files
- Raw binary files
Each of these file types is defined in the io module. Here is a quick rundown of the three types.
Text file types
Text files are the most common files you will encounter. Here are some examples of how these files are opened:
open('abc.txt')
open('abc.txt', 'r')
open('abc.txt', 'w')
For files of this type, open() returns a TextIOWrapper file object:
>>> file = open('dog_breeds.txt')
>>> type(file)
<class '_io.TextIOWrapper'>
This is the file object returned by open() by default.
Buffered binary file types
A buffered binary file type is used for reading and writing binary files. Here are some examples of how these files are opened:
open('abc.txt', 'rb')
open('abc.txt', 'wb')
For files of this type, open() returns a BufferedReader or BufferedWriter file object:
>>> file = open('dog_breeds.txt', 'rb')
>>> type(file)
<class '_io.BufferedReader'>
>>> file = open('dog_breeds.txt', 'wb')
>>> type(file)
<class '_io.BufferedWriter'>
Raw file types
A raw file type is:
“Generally used as a low-level building block for binary and text streams.” (source)
Therefore, it is not typically used.
Here is an example of how these files are opened:
open('abc.txt', 'rb', buffering=0)
For files of this type, open() returns a FileIO file object:
>>> file = open('dog_breeds.txt', 'rb', buffering=0)
>>> type(file)
<class '_io.FileIO'>
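The three file object types can be compared side by side. The following sketch assumes a throwaway file in a temporary directory; it opens the same file in each of the three ways and also peeks at the .buffer and .raw attributes, which expose the layer that each object wraps.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "abc.txt")
    with open(sample_path, "w") as writer:
        writer.write("abc\n")

    with open(sample_path, "r") as text_file, \
            open(sample_path, "rb") as buffered_file, \
            open(sample_path, "rb", buffering=0) as raw_file:
        type_names = [
            type(f).__name__ for f in (text_file, buffered_file, raw_file)
        ]
        # The layers wrap one another: text -> buffered -> raw.
        text_wraps_buffered = isinstance(text_file.buffer, io.BufferedReader)
        buffered_wraps_raw = isinstance(buffered_file.raw, io.FileIO)

print(type_names)
print(text_wraps_buffered, buffered_wraps_raw)
```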
Reading and writing opened files
Once you've opened a file, you'll want to read from it or write to it. First, let's cover reading a file. There are multiple methods that can be called on a file object to help you out:
Method | What it does |
---|---|
.read(size=-1) | Reads at most size characters from the file (bytes in binary mode). If no argument is passed, or None or -1 is passed, the entire file is read. |
.readline(size=-1) | Reads at most size characters from the line. Subsequent calls continue to the end of the line and then move on to the next line. If no argument is passed, or None or -1 is passed, the entire line (or the rest of the line) is read. |
.readlines() | Reads the remaining lines from the file object and returns them as a list. |
Using the same dog_breeds.txt file you used above, let's go through some examples of how to use these methods. Here is an example of how to open and read the entire file using .read():
>>> with open('dog_breeds.txt', 'r') as reader:
...     # Read & print the entire file
...     print(reader.read())
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
Here is an example of how to read up to 5 characters at a time from each line using .readline():
>>> with open('dog_breeds.txt', 'r') as reader:
...     # Read & print the first 5 characters of the line 5 times
...     print(reader.readline(5))
...     # Notice that the line is longer than 5 chars and continues
...     # down the line, reading 5 chars each time until the end of the
...     # line and then "wraps" around
...     print(reader.readline(5))
...     print(reader.readline(5))
...     print(reader.readline(5))
...     print(reader.readline(5))
Pug
Jack
Russe
l Ter
rier
Note that the first call to reader.readline(5) actually returns 'Pug\n', because the line is shorter than 5 characters, so print() emits an extra blank line after it that is not shown above.
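The exact strings returned by these calls are easier to see when they are collected rather than printed. In this sketch, io.StringIO stands in for an opened text file (an assumption made so the example needs no file on disk); it offers the same .read() and .readline() methods.

```python
import io

# io.StringIO stands in for a text file opened in 'r' mode.
sample = io.StringIO("Pug\nJack Russel Terrier\n")

# Five successive calls, each returning at most 5 characters.
chunks = [sample.readline(5) for _ in range(5)]
print(chunks)

sample.seek(0)                # go back to the start of the data
first_seven = sample.read(7)  # .read() ignores line boundaries
rest = sample.read()          # no size: everything that is left
at_eof = sample.read()        # at the end of the data: an empty string

print(repr(first_seven), repr(rest), repr(at_eof))
```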
Here is an example of how to read the entire file as a list using .readlines():
>>> f = open('dog_breeds.txt')
>>> f.readlines() # Returns a list object
['Pug\n', 'Jack Russel Terrier\n', 'English Springer Spaniel\n', 'German Shepherd\n', 'Staffordshire Bull Terrier\n', 'Cavalier King Charles Spaniel\n', 'Golden Retriever\n', 'West Highland White Terrier\n', 'Boxer\n', 'Border Terrier\n']
The above example can also be done by using list() to create a list from the file object:
>>> f = open('dog_breeds.txt')
>>> list(f)
['Pug\n', 'Jack Russel Terrier\n', 'English Springer Spaniel\n', 'German Shepherd\n', 'Staffordshire Bull Terrier\n', 'Cavalier King Charles Spaniel\n', 'Golden Retriever\n', 'West Highland White Terrier\n', 'Boxer\n', 'Border Terrier\n']
Iterate over each line in the file
A common thing to do while reading a file is to iterate over each line. Here is an example of how to use .readline() to perform that iteration:
>>> with open('dog_breeds.txt', 'r') as reader:
...     # Read and print the entire file line by line
...     line = reader.readline()
...     while line != '':  # At EOF, readline() returns an empty string
...         print(line, end='')
...         line = reader.readline()
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
Another way to iterate over each line in the file is to use the .readlines() method of the file object. Remember, .readlines() returns a list where each element in the list represents a line in the file:
>>> with open('dog_breeds.txt', 'r') as reader:
...     for line in reader.readlines():
...         print(line, end='')
Pug
Jack Russell Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
However, the above examples can be further simplified by iterating over the file object itself:
>>> with open('dog_breeds.txt', 'r') as reader:
...     # Read and print the entire file line by line
...     for line in reader:
...         print(line, end='')
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
This final approach is more Pythonic and can be quicker and more memory efficient, because it does not build a list of all the lines first. Therefore, it is the recommended one.
**Note:** Some of the examples above contain print('some text', end=''). The end='' is there to prevent Python from adding an additional newline to the text being printed, so that only what is read from the file is printed.
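Iterating over the file object combines well with other tools for iterables. As a small sketch, with io.StringIO again standing in for an opened text file, the loop below numbers each line with enumerate() and strips the line ending with .rstrip('\n'), an alternative to passing end='' to print().

```python
import io

# io.StringIO stands in for an opened text file here.
reader = io.StringIO("Pug\nBoxer\nBorder Terrier\n")

numbered = []
for line_number, line in enumerate(reader, start=1):
    breed = line.rstrip("\n")  # drop the line ending kept by iteration
    numbered.append(f"{line_number}: {breed}")

for entry in numbered:
    print(entry)
```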
Now let's dive into writing files. As with reading files, file objects have multiple methods that are useful for writing to a file:
Method | What it does |
---|---|
.write(string) | Writes a string to the file. |
.writelines(seq) | Writes the sequence to the file. No line endings are appended to the sequence items. It's up to you to add the appropriate line ending(s). |
Here is a quick example of using .write() and .writelines():
with open('dog_breeds.txt', 'r') as reader:
    # Note: readlines doesn't trim the line endings
    dog_breeds = reader.readlines()

with open('dog_breeds_reversed.txt', 'w') as writer:
    # Alternatively you could use
    # writer.writelines(reversed(dog_breeds))

    # Write the dog breeds to the file in reversed order
    for breed in reversed(dog_breeds):
        writer.write(breed)
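Because .writelines() adds no line endings of its own, items that do not already end in a newline would run together. The sketch below, which uses a made-up list of breeds and a temporary file, appends '\n' to each item before writing and then reads the result back.

```python
import os
import tempfile

breeds = ["Pug", "Boxer", "Beagle"]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "breeds.txt")

    with open(out_path, "w") as writer:
        # .write() returns the number of characters written.
        written = writer.write("Dog breeds\n")
        # .writelines() adds nothing, so append "\n" to each item.
        writer.writelines(f"{breed}\n" for breed in breeds)

    with open(out_path) as reader:
        lines = reader.readlines()

print(written)
print(lines)
```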
Working with bytes
Sometimes, you may need to work with files using byte strings. This is done by adding the 'b' character to the mode argument. All of the same methods of the file object apply. However, each of the methods expects and returns a bytes object instead:
>>> with open('dog_breeds.txt', 'rb') as reader:
...     print(reader.readline())
b'Pug\n'
Opening a text file with the b flag isn't that interesting. Suppose we have this cute picture of a Jack Russell Terrier (jack_russell.png):
You can actually open that file in Python and examine its contents! As defined by the .png file format, the header of the file is 8 bytes, broken up as follows:
Value | Interpretation |
---|---|
0x89 | A “magic” number indicating that this is the start of a PNG |
0x50 0x4E 0x47 | PNG in ASCII |
0x0D 0x0A | A DOS-style line ending \r\n |
0x1A | A DOS-style EOF character |
0x0A | A Unix-style line ending \n |
Sure enough, when you open the file and read these bytes individually, you can see that this is indeed a .png header:
>>> with open('jack_russell.png', 'rb') as byte_reader:
...     print(byte_reader.read(1))
...     print(byte_reader.read(3))
...     print(byte_reader.read(2))
...     print(byte_reader.read(1))
...     print(byte_reader.read(1))
b'\x89'
b'PNG'
b'\r\n'
b'\x1a'
b'\n'
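The same header check can be wrapped in a small helper. This is only a sketch: looks_like_png is a hypothetical function name, and io.BytesIO objects stand in for files opened in 'rb' mode so the example runs without an actual image.

```python
import io

# The 8-byte PNG signature described in the table above.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(stream):
    """Return True if the binary stream starts with the PNG signature."""
    return stream.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


# In-memory stand-ins for files opened with mode 'rb'.
fake_png = io.BytesIO(PNG_SIGNATURE + b"rest of the file")
fake_text = io.BytesIO(b"Pug\nBoxer\n")

png_result = looks_like_png(fake_png)
text_result = looks_like_png(fake_text)
print(png_result, text_result)
```

With a real file, you would pass the object returned by open('jack_russell.png', 'rb') instead.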
A complete example: dos2unix.py
Let's bring this whole thing home and look at a full example of how to read and write to a file. The following is a dos2unix-like tool that converts a file containing \r\n line endings to \n.
The tool is broken up into three major parts. The first is str2unix(), which converts a string from \r\n line endings to \n. The second is dos2unix(), which converts a file that contains \r\n line endings to \n. dos2unix() calls str2unix() internally. Finally, there's the __main__ block, which is called only when the file is executed as a script.
"""
A simple script and library to convert files or strings from DOS-like
line endings to Unix-like line endings.
"""
import argparse
import os


def str2unix(input_str: str) -> str:
    r"""
    Converts the string from \r\n line endings to \n

    Parameters
    ----------
    input_str
        The string whose line endings will be converted

    Returns
    -------
        The converted string
    """
    r_str = input_str.replace('\r\n', '\n')
    return r_str


def dos2unix(source_file: str, dest_file: str):
    """
    Converts a file that contains DOS-like line endings into Unix-like

    Parameters
    ----------
    source_file
        The path to the source file to be converted
    dest_file
        The path to the converted file for output
    """
    # NOTE: Could add file existence checking and file overwriting
    # protection
    # newline='' disables newline translation, so \r\n is read as-is
    # and \n is not turned back into \r\n when writing on Windows
    with open(source_file, 'r', newline='') as reader:
        dos_content = reader.read()
    unix_content = str2unix(dos_content)
    with open(dest_file, 'w', newline='') as writer:
        writer.write(unix_content)


if __name__ == "__main__":
    # Create our Argument parser and set its description
    parser = argparse.ArgumentParser(
        description="Script that converts a DOS like file to an Unix like file",
    )
    # Add the arguments:
    #   - source_file: the source file we want to convert
    #   - dest_file: the destination where the output should go
    # Note: the use of the argument type of argparse.FileType could
    # streamline some things
    parser.add_argument(
        'source_file',
        help='The location of the source '
    )
    parser.add_argument(
        '--dest_file',
        help='Location of dest file (default: source_file appended with `_unix`)',
        default=None
    )
    # Parse the args (argparse automatically grabs the values from
    # sys.argv)
    args = parser.parse_args()
    s_file = args.source_file
    d_file = args.dest_file
    # If the destination file wasn't passed, then assume we want to
    # create a new file based on the old one
    if d_file is None:
        file_path, file_extension = os.path.splitext(s_file)
        d_file = f'{file_path}_unix{file_extension}'
    dos2unix(s_file, d_file)
Tips and tricks
Now that you’ve mastered the basics of reading and writing files, here are some tips and tricks to help you improve your skills.
__file__
The __file__ attribute is a special attribute of the module, similar to __name__. It is:
“the pathname of the file from which the module was loaded, if it was loaded from a file.” (source)
**Note:** __file__ returns the path relative to where the initial Python script was called. If you need the full system path, you can use os.getcwd() to get the current working directory of your executing code.
Here is a real-world example. In one of my past jobs, I did a lot of testing on hardware devices. Each test was written as a Python script, with the test script's file name used as a title. These scripts were then executed and printed their status using the __file__ special attribute. Here is an example folder structure:
project/
│
├── tests/
│   ├── test_commanding.py
│   ├── test_power.py
│   ├── test_wireHousing.py
│   └── test_leds.py
│
└── main.py
Running main.py produces the following:
$ python main.py
tests/test_commanding.py Started:
tests/test_commanding.py Passed!
tests/test_power.py Started:
tests/test_power.py Passed!
tests/test_wireHousing.py Started:
tests/test_wireHousing.py Failed!
tests/test_leds.py Started:
tests/test_leds.py Passed!
Appending to a file
Sometimes, you may want to append to a file, or start writing at the end of an already populated file. This is easily done by using the 'a' character for the mode argument:
with open('dog_breeds.txt', 'a') as a_writer:
    a_writer.write('\nBeagle')
When you examine dog_breeds.txt again, you will see that the beginning of the file is unchanged and that Beagle has been added to the end of the file:
>>> with open('dog_breeds.txt', 'r') as reader:
...     print(reader.read())
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
Beagle
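The difference between the 'a' and 'w' modes becomes obvious when both are applied to the same file. The following sketch assumes a temporary file with made-up contents: appending keeps what is already there, while reopening with 'w' truncates the file first.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dog_breeds.txt")

    with open(path, "w") as writer:
        writer.write("Pug\nBoxer")

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as a_writer:
        a_writer.write("\nBeagle")
    with open(path) as reader:
        appended = reader.read()

    # 'w' truncates the file before writing.
    with open(path, "w") as writer:
        writer.write("Poodle")
    with open(path) as reader:
        overwritten = reader.read()

print(repr(appended))
print(repr(overwritten))
```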
Working with two files at the same time
Sometimes you may want to read a file and write to another file at the same time. If you use the example that was shown when you were learning how to write to a file, it can actually be combined into the following:
d_path = 'dog_breeds.txt'
d_r_path = 'dog_breeds_reversed.txt'
with open(d_path, 'r') as reader, open(d_r_path, 'w') as writer:
    dog_breeds = reader.readlines()
    writer.writelines(reversed(dog_breeds))
Create your own context manager
Sometimes, you may want finer control over a file object by placing it inside a custom class. When you do this, using the with statement is no longer possible unless you add a few magic methods: __enter__ and __exit__. By adding these, you'll have created what's called a context manager.
__enter__() is invoked when calling the with statement. __exit__() is called upon exiting from the with statement block.
Here is a template that you can use to make your custom class:
class my_file_reader():
    def __init__(self, file_path):
        self.__path = file_path
        self.__file_object = None

    def __enter__(self):
        self.__file_object = open(self.__path)
        return self

    def __exit__(self, type, val, tb):
        self.__file_object.close()

    # Additional methods implemented below
Now that you have a custom class that is a context manager, you can use it similarly to the built-in open():
with my_file_reader('dog_breeds.txt') as reader:
    # Perform custom class operations
    pass
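What __exit__() receives depends on how the with block ended. The sketch below uses a hypothetical TrackedReader class (not part of the template above) that wraps an in-memory stream and records the exception type passed to __exit__(): None after a clean exit, or the exception class when the block raised.

```python
import io


class TrackedReader:
    """Hypothetical wrapper that records how its with block ended."""

    def __init__(self, stream):
        self.stream = stream
        self.exit_info = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stream.close()
        self.exit_info = exc_type
        return False  # falsy: do not swallow the exception


clean = TrackedReader(io.StringIO("Pug\n"))
with clean as reader:
    first = reader.stream.readline()

failing = TrackedReader(io.StringIO("Pug\n"))
try:
    with failing:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(clean.exit_info, failing.exit_info)
```

In both cases the wrapped stream ends up closed, which is the point of putting the cleanup in __exit__().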
Here is a good example. Remember the cute Jack Russell image we had? Perhaps you want to open other .png files but don't want to parse the header every time. Here is an example of how to do this. This example also uses custom iterators. If you're not familiar with them, check out Python Iterators:
class PngReader():
    # Every .png file contains this in the header. Use it to verify
    # the file is indeed a .png.
    _expected_magic = b'\x89PNG\r\n\x1a\n'

    def __init__(self, file_path):
        # Ensure the file has the right extension
        if not file_path.endswith('.png'):
            raise NameError("File must be a '.png' extension")
        self.__path = file_path
        self.__file_object = None

    def __enter__(self):
        self.__file_object = open(self.__path, 'rb')
        magic = self.__file_object.read(8)
        if magic != self._expected_magic:
            # __exit__() is not called when __enter__() raises, so
            # close the file here before reporting the problem
            self.__file_object.close()
            raise TypeError("The File is not a properly formatted .png file!")
        return self

    def __exit__(self, type, val, tb):
        self.__file_object.close()

    def __iter__(self):
        # This and __next__() are used to create a custom iterator
        # See https://dbader.org/blog/python-iterators
        return self

    def __next__(self):
        # Read the file in "Chunks"
        # See https://en.wikipedia.org/wiki/Portable_Network_Graphics#%22Chunks%22_within_the_file
        initial_data = self.__file_object.read(4)
        # The file hasn't been opened or reached EOF. This means we
        # can't go any further so stop the iteration by raising the
        # StopIteration.
        if self.__file_object is None or initial_data == b'':
            raise StopIteration
        else:
            # Each chunk has a len, type, data (based on len) and crc
            # Grab these values and return them as a tuple
            chunk_len = int.from_bytes(initial_data, byteorder='big')
            chunk_type = self.__file_object.read(4)
            chunk_data = self.__file_object.read(chunk_len)
            chunk_crc = self.__file_object.read(4)
            return chunk_len, chunk_type, chunk_data, chunk_crc
You can now open .png files and properly parse them using your custom context manager:
>>> with PngReader('jack_russell.png') as reader:
...     for l, t, d, c in reader:
...         print(f"{l:05}, {t}, {c}")
00013, b'IHDR', b'v\x121k'
00001, b'sRGB', b'\xae\xce\x1c\xe9'
00009, b'pHYs', b'(<]\x19'
00345, b'iTXt', b"L\xc2'Y"
16384, b'IDAT', b'i\x99\x0c('
16384, b'IDAT', b'\xb3\xfa\x9a$'
16384, b'IDAT', b'\xff\xbf\xd1\n'
16384, b'IDAT', b'\xc3\x9c\xb1}'
16384, b'IDAT', b'\xe3\x02\xba\x91'
16384, b'IDAT', b'\xa0\xa99='
16384, b'IDAT', b'\xf4\x8b.\x92'
16384, b'IDAT', b'\x17i\xfc\xde'
16384, b'IDAT', b'\x8fb\x0e\xe4'
16384, b'IDAT', b')3={'
01040, b'IDAT', b'\xd6\xb8\xc1\x9f'
00000, b'IEND', b'\xaeB`\x82'
Don't reinvent the wheel
There are common situations that you may encounter while working with files. Most of these cases can be handled using other modules. Two common file types you may need to work with are .csv and .json. Real Python has already put together some great articles on how to handle them:
- Read and write CSV files in Python
- Use JSON data in Python
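As a brief taste of what those modules do for you, here is a sketch that round-trips the same made-up rows through the csv and json modules. An io.StringIO buffer is used instead of a file on disk, which is an assumption made purely to keep the example self-contained.

```python
import csv
import io
import json

rows = [
    {"breed": "Pug", "size": "small"},
    {"breed": "Boxer", "size": "medium"},
]

# csv: write to an in-memory text buffer, then read it back.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["breed", "size"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))

# json: serialize to a string and parse it again.
json_rows = json.loads(json.dumps(rows))

print(csv_rows)
print(json_rows)
```

With real files, the csv documentation recommends opening them with newline='' so that the module can manage line endings itself.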
There are also built-in libraries that you can use to help you:
- wave: Read and write WAV files (audio)
- aifc: Read and write AIFF and AIFC files (audio)
- sunau: Read and write Sun AU files
- tarfile: Read and write tar archive files
- zipfile: Work with ZIP archives
- configparser: Easily create and parse configuration files
- xml.etree.ElementTree: Create or read XML-based files
- msilib: Read and write Microsoft Installer files
- plistlib: Generate and parse Mac OS X .plist files
There are plenty more out there. Additionally, there are even more third-party tools available on PyPI. Some popular ones are the following:
- PyPDF2: PDF toolkit
- xlwings: Read and write Excel files
- Pillow: Image reading and manipulation
Conclusion
You now know how to work with files in Python, including some advanced techniques. Working with files in Python should now be easier than ever, and it is a rewarding feeling when you start doing it.
In this tutorial, you have learned:
- What is a file
- How to open and close a file correctly
- How to read and write files
- Some advanced techniques for working with files
- Some libraries for working with common file types