The project for this code reading is web-server, a subproject of 500 Lines or Less. 500 Lines or Less is both a project and a book of the same name, combining source code with explanatory text. It consists of independent chapters, in each of which an expert in the field introduces a simple implementation of some feature or requirement in 500 lines of code or fewer. This article covers the following parts:
- takeaway
- Project Structure Introduction
- Simple HTTP service
- Echo service
- File service
- File directory services and CGI services
- Service refactoring
- summary
- tip
takeaway
We’ve been digging through the source code for a dozen projects, so it’s time to talk about how to read the source code.
There are many Python projects, and many of them are excellent. Studying their source code gives us a deeper understanding of their APIs and of how they work and are implemented in detail. For developers like you and me who want to keep improving, knowing only how to call a project's API is not enough. Personally, I feel that reading books, building projects, and reinventing the wheel are all less effective than reading source code. Learning moves from imitation to creation: study good source code, imitate it, and eventually surpass it.
Choosing the right project also takes some skill. Here is my method:
- Pick small projects. While your skills are still limited, a small code base is easier to read; in the beginning I recommend projects of fewer than 5000 lines.
- Go deep in one direction and gradually work through the whole chain. For example, around the different stages of an HTTP service we read Gunicorn, WSGI, HTTP-Server, Bottle, and Mako: from the server to the WSGI specification, from web frameworks to template engines.
- Compare projects side by side, such as getopt and argparse for command-line parsing, or Blinker versus the signals in Flask and Django.
Once a project is chosen, the next question is how to read its source code. Our approach so far is what I call overview reading: based on the project's main functions, we analyze only the core implementation and set aside auxiliary and enhanced features for the time being, to avoid getting lost in details. A simple example: “Research shows that the order of Chinese characters does not necessarily affect reading. For instance, only after finishing this sentence do you notice that the characters in it are all scrambled.” In the same way, understanding the main functions of a project is enough to reach our initial goal.
Ha ha, happy April Fool’s Day
The problem with overview reading is that we learn how the code is implemented but not why it is implemented that way. So it is time to introduce another way of reading code: historical comparison. Historical comparison looks at how the requirements and the released code changed over time, to learn how each requirement was implemented. In a typical project, the git log and its commit messages present that history and those requirements. The 500Lines web-server project ships its evolution directly as a series of examples, which makes it a perfect case for demonstrating historical comparison.
The project structure
The version used for this code reading is fba689d1. The project directory structure is as follows:
Directory | Description |
---|---|
00-hello-web | Simple HTTP service |
01-echo-request-info | HTTP service that echoes the request information |
02-serve-static | Static file service |
03-handlers | HTTP file service that supports directory listings |
04-cgi | CGI service |
05-refactored | Refactored HTTP service |
Simple HTTP service
The HTTP service is very simple, starting the service like this:
serverAddress = ('', 8080)
server = BaseHTTPServer.HTTPServer(serverAddress, RequestHandler)
server.serve_forever()
Handler that only responds to GET requests:
class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    ...
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(len(self.Page)))
        self.end_headers()
        self.wfile.write(self.Page)
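The snippets in this article target Python 2, where the module is called BaseHTTPServer. As a minimal sketch, assuming Python 3 (where the same classes live in http.server), the hello service could look roughly like this. The names HelloHandler, PAGE, and fetch_once are made up for this illustration, and the server binds to port 0 so that the operating system picks a free port instead of 8080.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed page content; in Python 3 the body must be bytes, not str.
PAGE = b"<html>\n<body>\n<p>Hello, web!</p>\n</body>\n</html>\n"


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same fixed page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Keep the example quiet: skip the default access log on stderr.
        pass


def fetch_once():
    """Start the server on a free port, issue one GET, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

Calling fetch_once() plays the role of the curl request: it returns the status code and the body that the handler wrote.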
Requesting the service with curl shows the result:
# curl -v http://127.0.0.1:8080
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.64.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.3 Python/2.7.16
< Date: Wed, 31 Mar 2021 11:57:03 GMT
< Content-type: text/html
< Content-Length: 49
<
<html>
<body>
<p>Hello, web!</p>
</body>
</html>
* Closing connection 0
This article does not go into the details of how the HTTP protocol is implemented. To learn more about them, see the second blog post or my earlier article [Python HTTP source code Reading].
Echo service
The echo service evolves from the simple HTTP service and echoes the request back to the user. Let's compare the two files to see what changed:
The change is concentrated in the implementation of do_GET. Since the picture may not be very clear, I paste the code below:
# hello
def do_GET(self):
    self.send_response(200)
    ...
    self.wfile.write(self.Page)

# echo
def do_GET(self):
    page = self.create_page()
    self.send_page(page)
In echo, do_GET calls create_page and send_page. These two lines make the difference between echo and hello very clear. Because echo takes the client's request and sends it back as is, a fixed page is no longer sufficient: the page has to be created from a template first and then sent to the user. The body of hello's do_GET was extracted into send_page, and create_page is very simple:
def create_page(self):
    values = {
        'date_time'   : self.date_time_string(),
        'client_host' : self.client_address[0],
        'client_port' : self.client_address[1],
        'command'     : self.command,
        'path'        : self.path
    }
    page = self.Page.format(**values)
    return page
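The Page template that create_page fills in is not shown in this article. As a rough sketch of how the named placeholders and format(**values) fit together, assume a template like the one below; the markup, the PAGE_TEMPLATE name, and the sample values are invented for illustration.

```python
# Hypothetical template: every {name} placeholder matches a key in the values dict.
PAGE_TEMPLATE = """\
<html>
<body>
<table>
<tr><td>Date and time</td><td>{date_time}</td></tr>
<tr><td>Client host</td><td>{client_host}</td></tr>
<tr><td>Client port</td><td>{client_port}</td></tr>
<tr><td>Command</td><td>{command}</td></tr>
<tr><td>Path</td><td>{path}</td></tr>
</table>
</body>
</html>
"""


def create_page(values):
    """Fill the template; ** unpacks the dict into keyword arguments."""
    return PAGE_TEMPLATE.format(**values)


# Sample request data standing in for the attributes of a real handler.
example_page = create_page({
    "date_time": "Wed, 31 Mar 2021 11:57:03 GMT",
    "client_host": "127.0.0.1",
    "client_port": 54321,
    "command": "GET",
    "path": "/something.html",
})

if __name__ == "__main__":
    print(example_page)
```

Because create_page is a pure function of the values dictionary, it can be tested without starting a server.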
Looked at in isolation, echo's code feels plain. Comparing hello with echo is what reveals the craftsmanship of a master. The code shows how to write readable code and how to implement a new requirement:
- The function names create_page and send_page are clear and readable.
- create and send sit at the same logical level. As a counterexample, naming the two functions create_page and _do_GET would feel awkward.
- The five lines that implemented do_GET in hello are unchanged; they have only been extracted into the new send_page function. From a testing perspective, only the changed part (create_page) needs a new test case.
To make the comparison yourself, diff the hello version against 01-echo-request-info/server.py, or use the comparison tool provided by your IDE.
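If neither a diff command nor an IDE is at hand, the standard library can produce the same kind of comparison. The sketch below runs difflib.unified_diff over the two do_GET versions quoted earlier; the HELLO and ECHO strings and the compare helper are just names chosen for this example.

```python
import difflib

# The two versions of do_GET discussed above, as plain strings.
HELLO = '''\
def do_GET(self):
    self.send_response(200)
    self.send_header("Content-type", "text/html")
    self.send_header("Content-Length", str(len(self.Page)))
    self.end_headers()
    self.wfile.write(self.Page)
'''

ECHO = '''\
def do_GET(self):
    page = self.create_page()
    self.send_page(page)
'''


def compare(old, new):
    """Return a unified diff of two source snippets as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="hello", tofile="echo", lineterm=""))


if __name__ == "__main__":
    print("\n".join(compare(HELLO, ECHO)))
```

Lines prefixed with "-" exist only in hello, and lines prefixed with "+" exist only in echo.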
File service
The file service serves local HTML pages:
# Classify and handle request.
def do_GET(self):
    try:
        # Figure out what exactly is being requested.
        full_path = os.getcwd() + self.path
        # It doesn't exist...
        if not os.path.exists(full_path):
            raise ServerException("'{0}' not found".format(self.path))
        # ...it's a file...
        elif os.path.isfile(full_path):
            self.handle_file(full_path)
        ...
    # Handle errors.
    except Exception as msg:
        self.handle_error(msg)
Handling of files and exceptions:
def handle_file(self, full_path):
    try:
        with open(full_path, 'rb') as reader:
            content = reader.read()
        self.send_content(content)
    except IOError as msg:
        msg = "'{0}' cannot be read: {1}".format(self.path, msg)
        self.handle_error(msg)

def handle_error(self, msg):
    content = self.Error_Page.format(path=self.path, msg=msg)
    self.send_content(content)
The directory also provides a status-code version for comparison:
If the file does not exist, a 404 error should be reported according to the HTTP protocol specification:
def handle_error(self, msg):
    content = ...
    self.send_content(content, 404)

def send_content(self, content, status=200):
    self.send_response(status)
    ...
Thanks to Python's support for default argument values, send_content stays stable: callers that do not pass a status keep working, and the function will not need to change when 30x or 50x responses are added later.
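A small sketch makes the effect of the default argument visible. RecordingHandler below is a hypothetical stand-in for RequestHandler that records what would be sent instead of writing to a socket, and handle_server_error is an invented later addition, not something from the project.

```python
class RecordingHandler(object):
    """Hypothetical stand-in for RequestHandler: it records responses
    in a list instead of writing them to a socket."""

    def __init__(self):
        self.sent = []

    def send_content(self, content, status=200):
        # status defaults to 200, so older callers need no change.
        self.sent.append((status, content))

    def handle_file(self, content):
        # Existing caller: relies on the default status.
        self.send_content(content)

    def handle_error(self, msg):
        # Caller that overrides the default with 404.
        self.send_content(msg, 404)

    def handle_server_error(self, msg):
        # Invented later addition: send_content itself stays untouched.
        self.send_content(msg, 500)


handler = RecordingHandler()
handler.handle_file("<p>ok</p>")
handler.handle_error("not found")
handler.handle_server_error("boom")

if __name__ == "__main__":
    print(handler.sent)
```

Each new status code only adds a caller; the signature of send_content does not change.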
File directory services and CGI services
The file service needs to be upgraded to support directories. Normally, if a directory contains index.html, that file is displayed; without it, a directory listing is displayed, so users can browse without typing file names by hand.
I again compared the iterations; the comparison shows the changes to RequestHandler:
do_GET now has to handle three kinds of logic: HTML files, directories, and errors. Continuing with if-else would make the code ugly and hard to extend, so it is extended with the strategy pattern instead:
Cases = [case_no_file(),
         case_existing_file(),
         case_always_fail()]

# Classify and handle request.
def do_GET(self):
    try:
        # Figure out what exactly is being requested.
        self.full_path = os.getcwd() + self.path
        # Select a strategy: the first case whose test passes handles the request.
        for case in self.Cases:
            if case.test(self):
                case.act(self)
                break
    # Handle errors.
    except Exception as msg:
        self.handle_error(msg)
The three strategies, for a missing file, an existing file, and the fallback failure, are implemented as follows:
class case_no_file(object):
    '''File or directory does not exist.'''

    def test(self, handler):
        return not os.path.exists(handler.full_path)

    def act(self, handler):
        raise ServerException("'{0}' not found".format(handler.path))

class case_existing_file(object):
    '''File exists.'''

    def test(self, handler):
        return os.path.isfile(handler.full_path)

    def act(self, handler):
        handler.handle_file(handler.full_path)

class case_always_fail(object):
    '''Base case if nothing else worked.'''

    def test(self, handler):
        return True

    def act(self, handler):
        raise ServerException("Unknown object '{0}'".format(handler.path))
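The order of the cases matters, because the first case whose test passes wins. The sketch below keeps only the test methods from the classes above and runs them against a temporary directory; FakeHandler, first_matching_case, and demo are hypothetical helpers introduced for this illustration, and the code assumes Python 3.

```python
import os
import tempfile


class case_no_file(object):
    def test(self, handler):
        return not os.path.exists(handler.full_path)


class case_existing_file(object):
    def test(self, handler):
        return os.path.isfile(handler.full_path)


class case_always_fail(object):
    def test(self, handler):
        return True


# Same order as in the handler: most specific first, catch-all last.
CASES = [case_no_file(), case_existing_file(), case_always_fail()]


class FakeHandler(object):
    """Hypothetical stand-in for RequestHandler: it only carries full_path."""

    def __init__(self, full_path):
        self.full_path = full_path


def first_matching_case(full_path):
    """Return the class name of the first case whose test() accepts the path."""
    handler = FakeHandler(full_path)
    for case in CASES:
        if case.test(handler):
            return type(case).__name__


def demo():
    """Classify a missing path, a real file, and a directory."""
    with tempfile.TemporaryDirectory() as root:
        page = os.path.join(root, "index.html")
        with open(page, "w") as writer:
            writer.write("<p>hi</p>")
        return [
            first_matching_case(os.path.join(root, "missing.html")),
            first_matching_case(page),
            first_matching_case(root),
        ]


if __name__ == "__main__":
    print(demo())
```

With only these three cases, a directory falls through to case_always_fail, which is the gap that dedicated directory cases fill.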
Directory support is simple to add: it extends the list with the case_directory_index_file and case_directory_no_index_file strategies. CGI support likewise adds a case_cgi_file strategy.
class case_directory_index_file(object):
    ...

class case_directory_no_index_file(object):
    ...

class case_cgi_file(object):
    ...
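The bodies of these classes are elided above. Purely to illustrate how a CGI-style case can fit the same test/act interface, here is a sketch that is not the project's implementation: it assumes Python 3.7 or later, treats any file ending in .py as a script, and runs it with the current interpreter through subprocess.run. The names case_cgi_sketch, FakeHandler, and demo are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


class case_cgi_sketch(object):
    """Hypothetical CGI-style case: run a .py file and send back its output."""

    def test(self, handler):
        return (os.path.isfile(handler.full_path)
                and handler.full_path.endswith(".py"))

    def act(self, handler):
        # Run the script with the current interpreter and capture its stdout.
        result = subprocess.run([sys.executable, handler.full_path],
                                capture_output=True, check=True)
        handler.send_content(result.stdout)


class FakeHandler(object):
    """Hypothetical stand-in for RequestHandler that records responses."""

    def __init__(self, full_path):
        self.full_path = full_path
        self.sent = []

    def send_content(self, content, status=200):
        self.sent.append((status, content))


def demo():
    """Write a tiny script to a temporary directory and serve its output."""
    with tempfile.TemporaryDirectory() as root:
        script = os.path.join(root, "hello.py")
        with open(script, "w") as writer:
            writer.write('print("<p>generated</p>")\n')
        handler = FakeHandler(script)
        case = case_cgi_sketch()
        if case.test(handler):
            case.act(handler)
        return handler.sent


if __name__ == "__main__":
    print(demo())
```

Because the case only talks to the handler through full_path and send_content, adding it would not require any change to do_GET beyond a new entry in the case list.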
Service refactoring
After implementing the functionality, the authors refactor the code once:
The refactored RequestHandler is much cleaner and contains only the HTTP protocol details: handle_error handles exceptions and returns a 404 error, and send_content generates the HTTP response.
class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    # Classify and handle request.
    def do_GET(self):
        try:
            # Figure out what exactly is being requested.
            self.full_path = os.getcwd() + self.path
            # Figure out how to handle it.
            for case in self.Cases:
                if case.test(self):
                    case.act(self)
                    break
        # Handle errors.
        except Exception as msg:
            self.handle_error(msg)

    # Handle unknown objects.
    def handle_error(self, msg):
        content = self.Error_Page.format(path=self.path, msg=msg)
        self.send_content(content, 404)

    # Send actual content.
    def send_content(self, content, status=200):
        self.send_response(status)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)
The request-handling strategies are refactored as well: a base_case parent class defines the template and steps for handling a request, and provides a default way to read HTML files.
class base_case(object):
    '''Parent for case handlers.'''

    def handle_file(self, handler, full_path):
        try:
            with open(full_path, 'rb') as reader:
                content = reader.read()
            handler.send_content(content)
        except IOError as msg:
            msg = "'{0}' cannot be read: {1}".format(full_path, msg)
            handler.handle_error(msg)

    def index_path(self, handler):
        return os.path.join(handler.full_path, 'index.html')

    def test(self, handler):
        assert False, 'Not implemented.'

    def act(self, handler):
        assert False, 'Not implemented.'
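One detail of base_case is worth a note: assert statements are skipped when Python runs with the -O option, so under -O a subclass that forgets to override test or act would silently return None. A possible variation, shown here only as a sketch and not as the project's code, raises NotImplementedError instead; case_incomplete and try_act are hypothetical names.

```python
class base_case(object):
    """Variation on the parent class: placeholder methods raise explicitly."""

    def test(self, handler):
        raise NotImplementedError("test() must be overridden")

    def act(self, handler):
        raise NotImplementedError("act() must be overridden")


class case_incomplete(base_case):
    """Hypothetical subclass that overrides test() but forgets act()."""

    def test(self, handler):
        return True


def try_act(case):
    """Call act() and report the error message, or None if it succeeded."""
    try:
        case.act(None)
    except NotImplementedError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(try_act(case_incomplete()))
```

The failure now shows up the first time the incomplete case is used, regardless of interpreter flags.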
Handling an existing HTML file is then very simple: the class implements the test function and the act function, and act reuses the file handling inherited from the parent class.
class case_existing_file(base_case):
    '''File exists.'''

    def test(self, handler):
        return os.path.isfile(handler.full_path)

    def act(self, handler):
        self.handle_file(handler, handler.full_path)
The longest strategy is the one for a directory that has no index.html page:
class case_directory_no_index_file(base_case):
    '''Serve listing for a directory without an index.html page.'''

    # How to display a directory listing.
    Listing_Page = '''\
    <html>
    <body>
    <ul>
    {0}
    </ul>
    </body>
    </html>
    '''

    def list_dir(self, handler, full_path):
        try:
            entries = os.listdir(full_path)
            bullets = ['<li>{0}</li>'.format(e) for e in entries if not e.startswith('.')]
            page = self.Listing_Page.format('\n'.join(bullets))
            handler.send_content(page)
        except OSError as msg:
            # Use handler.path here: the case object itself has no path attribute.
            msg = "'{0}' cannot be listed: {1}".format(handler.path, msg)
            handler.handle_error(msg)

    def test(self, handler):
        return os.path.isdir(handler.full_path) and \
               not os.path.isfile(self.index_path(handler))

    def act(self, handler):
        self.list_dir(handler, handler.full_path)
list_dir dynamically generates an HTML page that lists the contents of the directory.
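The page generation can be separated from the file system to make it easy to try out. The sketch below is a variation rather than the project's code: render_listing is a hypothetical pure function, it assumes Python 3, and it makes two additions of its own, escaping file names with html.escape and sorting the entries (os.listdir returns them in arbitrary order).

```python
import html

LISTING_PAGE = """\
<html>
<body>
<ul>
{0}
</ul>
</body>
</html>
"""


def render_listing(entries):
    """Build the listing page from file names, skipping hidden entries."""
    bullets = ["<li>{0}</li>".format(html.escape(name))
               for name in sorted(entries)
               if not name.startswith(".")]
    return LISTING_PAGE.format("\n".join(bullets))


listing = render_listing(["b.html", ".hidden", "a&b.txt"])

if __name__ == "__main__":
    print(listing)
```

Escaping matters because a file name may contain characters such as & or <, which would otherwise be interpreted as markup by the browser.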
summary
Using historical comparison, we read through the evolution of the 500Lines web-server code and saw clearly how a file and directory service is built step by step:
- RequestHandler's do_GET method handles HTTP requests.
- send_content outputs the response, including the status code, response headers, and body.
- HTML files are read and served as pages.
- Directories are listed.
- CGI is supported.
Along the way, we also saw examples of how to extend code, write maintainable code, and refactor. I hope you learned as much as I did.
tip
As described earlier, requests are handled with the strategy pattern. Here is the implementation of the strategy pattern from the python-patterns project:
class Order:
    def __init__(self, price, discount_strategy=None):
        self.price = price
        self.discount_strategy = discount_strategy

    def price_after_discount(self):
        if self.discount_strategy:
            discount = self.discount_strategy(self)
        else:
            discount = 0
        return self.price - discount

    def __repr__(self):
        fmt = "<Price: {}, price after discount: {}>"
        return fmt.format(self.price, self.price_after_discount())


def ten_percent_discount(order):
    return order.price * 0.10


def on_sale_discount(order):
    return order.price * 0.25 + 20


def main():
    """
    >>> Order(100)
    <Price: 100, price after discount: 100>

    >>> Order(100, discount_strategy=ten_percent_discount)
    <Price: 100, price after discount: 90.0>

    >>> Order(1000, discount_strategy=on_sale_discount)
    <Price: 1000, price after discount: 730.0>
    """
ten_percent_discount takes 10% off, and on_sale_discount takes 25% off plus a further 20. Different orders can use different discount strategies, for example:
order_amount_list = [80, 100, 1000]
for amount in order_amount_list:
    # if/elif/else so that every amount in the list gets processed.
    if amount < 100:
        order = Order(amount)
    elif amount < 1000:
        order = Order(amount, discount_strategy=ten_percent_discount)
    else:
        order = Order(amount, discount_strategy=on_sale_discount)
    print(order)
The corresponding business logic is:
- Orders below 100 get no discount.
- Orders from 100 up to, but not including, 1000 get 10% off.
- Orders of 1000 or more get 25% off plus a further 20 off.
If the discount condition and the discount calculation are implemented together in one class, the result resembles the web-server cases:
class case_discount(object):

    def test(self, handler):
        # Discount condition.
        ...

    def act(self, handler):
        # Discount calculation.
        ...
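To make the analogy concrete, the discount rules can be written as an ordered list of cases, just like the request cases in web-server. This is a sketch under assumptions: the class names case_small_order, case_medium_order, and case_large_order and the price_after_discount function are invented, and each case works on a plain price rather than on an Order object.

```python
class case_small_order(object):
    """Orders below 100: no discount."""

    def test(self, price):
        return price < 100

    def act(self, price):
        return price


class case_medium_order(object):
    """Orders below 1000 that were not matched earlier: 10% off."""

    def test(self, price):
        return price < 1000

    def act(self, price):
        return price - price * 0.10


class case_large_order(object):
    """Everything else: 25% off plus a further 20 off."""

    def test(self, price):
        return True

    def act(self, price):
        return price - (price * 0.25 + 20)


# Order matters: the first case whose test passes decides the price.
CASES = [case_small_order(), case_medium_order(), case_large_order()]


def price_after_discount(price):
    for case in CASES:
        if case.test(price):
            return case.act(price)


if __name__ == "__main__":
    for amount in [80, 100, 1000]:
        print(amount, price_after_discount(amount))
```

Adding a new discount tier then means adding one class and one list entry, without touching the loop.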
Reference links
- Github.com/aosabook/50…
- Github.com/HT524/500Li…
- Shuhari. Dev/blog / 2020/0…