This article looks at the code in the application.py module of the web.py library. In short, this module implements a WSGI-compatible interface so that an application can be invoked by a WSGI application server. WSGI stands for Web Server Gateway Interface; see the WSGI wiki page for details.

Using the interface

Using the HTTP server that comes with web.py

Here is an example from the official Hello World document. Code like this typically serves as the application's entry point:

import web

urls = ("/.*", "hello")
app = web.application(urls, globals())

class hello:
    def GET(self):
        return 'Hello, world!'

if __name__ == "__main__":
    app.run()

The above example describes the most basic elements of a web.py application:

  • URL routing table

  • A web.application instance app

  • A call to app.run()

The call to app.run() initializes the various WSGI interfaces and starts a built-in HTTP server to interface with them. The code is as follows:

def run(self, *middleware):
    return wsgi.runwsgi(self.wsgifunc(*middleware))

Interfacing with a WSGI application server

If your application is going to run behind a WSGI application server such as uWSGI or Gunicorn, the application entry code should be written differently:

import web

class hello:
    def GET(self):
        return 'Hello, world!'

urls = ("/.*", "hello")
app = web.application(urls, globals())
application = app.wsgifunc()

In this scenario, the application code does not need to start an HTTP server. Instead, it exposes a WSGI-compatible interface for the WSGI server to invoke. The web.py framework provides this interface for us through the call application = app.wsgifunc(): the resulting application variable is the WSGI interface, as the code analysis later in this article shows.
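To make "WSGI-compatible interface" concrete, here is a minimal sketch of a hand-written WSGI callable and of the way a server invokes it. It does not use web.py at all; the names application and call_app are illustrative assumptions, and only the standard library's wsgiref.util.setup_testing_defaults is used to build a fake environ.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A hand-written WSGI callable, independent of web.py."""
    body = b"Hello, world!"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings as the response body.
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI callable roughly the way a server would (hypothetical helper)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)
```

The object returned by app.wsgifunc() plays the same role as application here: a callable that takes the environ and a start_response function and returns an iterable body.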

Implementation analysis of the WSGI interface

The analysis revolves around the following two lines of code:

app = web.application(urls, globals())
application = app.wsgifunc()

Instantiating web.application

Initializing this instance requires passing two parameters: the URL routing tuple and the result of globals().

You can also pass a third parameter, autoreload, which specifies whether Python modules should be reloaded automatically. This is useful for debugging, but we can ignore it when analyzing the main flow.

The initialization code for the application class is as follows:

class application:
    def __init__(self, mapping=(), fvars={}, autoreload=None):
        if autoreload is None:
            autoreload = web.config.get('debug', False)
        self.init_mapping(mapping)
        self.fvars = fvars
        self.processors = []
        
        self.add_processor(loadhook(self._load))
        self.add_processor(unloadhook(self._unload))
        
        if autoreload:
            ...

The code related to the autoreload feature is omitted. The rest mainly does the following:

  • self.init_mapping(mapping): initializes the URL routing map.

  • self.add_processor(): adds two processors.

Initialize the URL route mapping relationship

def init_mapping(self, mapping):
    self.mapping = list(utils.group(mapping, 2))

This function calls the utility function utils.group, whose effect is easiest to see with an example. Suppose the tuple passed in at initialization looks like this:

urls = ("/", "Index",
        "/hello/(.*)", "Hello",
        "/world", "World")

Then after calling init_mapping:

self.mapping = [["/", "Index"],
                ["/hello/(.*)", "Hello"],
                ["/world", "World"]]
                

The framework later traverses this list when it routes a URL.
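The grouping step can be sketched in a few lines. The function group_pairs below is a hypothetical stand-in written for this article, not the actual utils.group implementation; it only reproduces the effect described above.

```python
def group_pairs(flat, size=2):
    """Split a flat sequence into consecutive chunks of `size` items."""
    flat = list(flat)
    return [flat[i:i + size] for i in range(0, len(flat), size)]


urls = ("/", "Index",
        "/hello/(.*)", "Hello",
        "/world", "World")

mapping = group_pairs(urls)
for pattern, handler_name in mapping:
    print(pattern, "->", handler_name)
```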

Add processor

    self.add_processor(loadhook(self._load))
    self.add_processor(unloadhook(self._unload))

These two lines add two processors, self._load and self._unload, each wrapped by a decorator function. Processors run before and after the HTTP request is handled. They do not handle the request itself, but they can do extra work around it. For example, the official tutorial uses a processor to make the session available to a sub-application:

def session_hook():
    web.ctx.session = session

app.add_processor(web.loadhook(session_hook))

The definition and use of processors are more involved and are covered later.

The wsgifunc function

wsgifunc returns a WSGI-compatible function that implements URL routing and other features internally.

def wsgifunc(self, *middleware):
    """Returns a WSGI-compatible function for this application."""
    ...
    for m in middleware: 
        wsgi = m(wsgi)

    return wsgi

Apart from the internal function it defines, wsgifunc is as simple as that: if no middleware is passed in, it just returns its internally defined wsgi function.
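The middleware loop is easier to follow with a toy example. In this sketch, base_app, suffix_middleware and apply_middleware are illustrative names that do not exist in web.py; only the for loop mirrors the one in wsgifunc. Each middleware receives the current WSGI callable and returns a new one that wraps it, so the last middleware in the argument list ends up outermost.

```python
def base_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def suffix_middleware(suffix):
    """Build a hypothetical middleware that appends `suffix` to the body."""
    def middleware(app):
        def wrapped(environ, start_response):
            return list(app(environ, start_response)) + [suffix]
        return wrapped
    return middleware


def apply_middleware(app, *middleware):
    # Same shape as the loop in wsgifunc: each middleware wraps the previous result.
    for m in middleware:
        app = m(app)
    return app


app = apply_middleware(base_app,
                       suffix_middleware(b" [inner]"),
                       suffix_middleware(b" [outer]"))
body = b"".join(app({}, lambda status, headers: None))
print(body)
```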

The wsgi function

This function implements WSGI-compatible interfaces, as well as URL routing and other functions.

def wsgi(env, start_resp):
    # clear threadlocal to avoid interference of previous requests
    self._cleanup()

    self.load(env)
    try:
        # allow uppercase methods only
        if web.ctx.method.upper() != web.ctx.method:
            raise web.nomethod()

        result = self.handle_with_processors()
        if is_generator(result):
            result = peep(result)
        else:
            result = [result]
    except web.HTTPError, e:
        result = [e.data]

    result = web.safestr(iter(result))

    status, headers = web.ctx.status, web.ctx.headers
    start_resp(status, headers)

    def cleanup():
        self._cleanup()
        yield '' # force this function to be a generator

    return itertools.chain(result, cleanup())

for m in middleware:
    wsgi = m(wsgi)

return wsgi

Let’s take a closer look at this function:

    self._cleanup()
    self.load(env)
    

self._cleanup() calls utils.ThreadedDict.clear_all() internally to clear all thread-local data and avoid memory leaks (much of the web.py framework's data is stored in thread-local variables).

self.load(env) initializes the web.ctx variable from the values in env. These variables hold information about the current request that an application might use, such as web.ctx.fullpath.
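The load and cleanup pair can be pictured with the standard library's threading.local. This is only a sketch of the idea, not web.py's code: ctx stands in for web.ctx, and the load and cleanup functions and the way fullpath is assembled here are simplifying assumptions.

```python
import threading

ctx = threading.local()  # stand-in for web.ctx


def load(env):
    """Copy a few request details from the WSGI environ into the context."""
    ctx.method = env.get("REQUEST_METHOD", "GET")
    ctx.path = env.get("PATH_INFO", "/")
    query = env.get("QUERY_STRING", "")
    ctx.fullpath = ctx.path + ("?" + query if query else "")


def cleanup():
    """Drop everything stored for the current thread."""
    ctx.__dict__.clear()


load({"REQUEST_METHOD": "GET", "PATH_INFO": "/hello", "QUERY_STRING": "name=x"})
fullpath = ctx.fullpath
cleanup()
leftover = hasattr(ctx, "fullpath")
print(fullpath, leftover)
```

Because WSGI servers may reuse threads across requests, clearing the per-thread state at the start of each request keeps data from one request from leaking into the next.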

    try:
        # allow uppercase methods only
        if web.ctx.method.upper() != web.ctx.method:
            raise web.nomethod()

        result = self.handle_with_processors()
        if is_generator(result):
            result = peep(result)
        else:
            result = [result]
    except web.HTTPError, e:
        result = [e.data]

This section mainly calls self.handle_with_processors(), which routes the requested URL to the class or sub-application that handles the request and also calls the processors that were added (more on processors later). The result of the processing can take one of three forms:

  • If a generator is returned, it is passed through peep so that it can be iterated safely.

  • If any other value is returned, it is wrapped in a list.

  • If an HTTPError exception is raised (for example, when we use raise web.ok("Hello, World") to return the result), the data carried by the exception, e.data, is wrapped in a list.

    result = web.safestr(iter(result))

    status, headers = web.ctx.status, web.ctx.headers
    start_resp(status, headers)
    
    def cleanup():
        self._cleanup()
        yield '' # force this function to be a generator
                    
    return itertools.chain(result, cleanup())

The code above converts each item of the result from the previous step into a string, which forms the body of the HTTP response. It then does two things required by the WSGI specification:

  • Calls the start_resp function.

  • Returns the result as an iterator.

Now you can see that the application = app.wsgifunc() we mentioned earlier assigns the WSGI function to the application variable so that the application server can interface with our application using the WSGI standard.
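The trick of chaining a cleanup generator after the body can be tried in isolation. The sketch below is a simplified illustration written for Python 3, not the framework's code: respond and events are hypothetical names, and str stands in for web.safestr.

```python
import itertools
import types

events = []


def respond(result):
    """Turn a handler result into an iterable body followed by a cleanup step."""
    if isinstance(result, types.GeneratorType):
        body = result              # already iterable, stream it as-is
    else:
        body = [result]            # wrap a single value in a list

    def cleanup():
        events.append("cleanup")   # runs only once the body is exhausted
        yield ""                   # the yield makes this function a generator

    return itertools.chain((str(chunk) for chunk in body), cleanup())


def streaming():
    yield "a"
    yield "b"


lazy = respond("hello")
before = list(events)              # nothing has run yet
first = "".join(lazy)
after = list(events)               # cleanup ran after the body was consumed
second = "".join(respond(streaming()))
print(first, second, events)
```

Since cleanup is a generator, its body does not run when the chain is built. It runs only after the server has consumed the whole response body, which is why the yield is needed.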

Handling HTTP requests

The code analyzed so far shows how the web.py framework implements the WSGI-compatible interface; that is, we have seen how an HTTP request arrives at the framework and how the response travels back to the application server. So how does the framework call our application code internally to process a request? Answering that requires a detailed look at how processors are added and invoked, which was skipped earlier.

The loadhook and unloadhook decorators

These two functions are decorators for the actual hook functions (although they are not applied with the @ operator). The resulting processors run before request handling (loadhook) and after request handling (unloadhook), respectively.

loadhook

def loadhook(h):
    def processor(handler):
        h()
        return handler()
        
    return processor

This function returns a processor function, which calls the hook function h you supplied before calling the subsequent handler function.
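A small experiment shows the order of calls. The loadhook definition below repeats the one quoted above so that the example is self-contained; before_request, handler and calls are illustrative names.

```python
calls = []


def loadhook(h):
    def processor(handler):
        h()                 # run the hook first
        return handler()    # then continue with the rest of the chain
    return processor


def before_request():
    calls.append("hook")


def handler():
    calls.append("handler")
    return "response"


processor = loadhook(before_request)
result = processor(handler)
print(calls, result)
```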

unloadhook

def unloadhook(h):
    def processor(handler):
        try:
            result = handler()
            is_generator = result and hasattr(result, 'next')
        except:
            # run the hook even when handler raises some exception
            h()
            raise

        if is_generator:
            return wrap(result)
        else:
            h()
            return result
            
    def wrap(result):
        def next():
            try:
                return result.next()
            except:
                # call the hook at the end of iterator
                h()
                raise

        result = iter(result)
        while True:
            yield next()
            
    return processor

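This function also returns a processor. It calls the handler passed in as its argument first and then calls the hook function h you provided, even when the handler raises an exception. If the handler returns a generator, h is called once the generator is exhausted or raises.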
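The code quoted above is Python 2 (it relies on result.next()). The following is a rough Python 3 adaptation of the same idea, written for this article rather than taken from web.py: it detects generators with isinstance and uses try/finally with yield from, so the hook also runs if the generator is closed early, which is a small behavioral difference from the original.

```python
import types


def unloadhook(h):
    def processor(handler):
        try:
            result = handler()
        except BaseException:
            h()   # run the hook even when the handler raises
            raise
        if isinstance(result, types.GeneratorType):
            return wrap(result)
        h()
        return result

    def wrap(result):
        try:
            yield from result
        finally:
            h()   # run the hook when the generator ends, fails, or is closed

    return processor


log = []


def plain():
    log.append("plain")
    return "ok"


def stream():
    log.append("start")
    yield "a"
    yield "b"


def failing():
    raise ValueError("boom")


p = unloadhook(lambda: log.append("unload"))
plain_result = p(plain)        # hook runs right after the handler returns
chunks = list(p(stream))       # hook runs after the generator is exhausted
try:
    p(failing)                 # hook runs before the exception propagates
except ValueError:
    log.append("error seen")
print(log)
```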

The handle_with_processors function

def handle_with_processors(self):
    def process(processors):
        try:
            if processors:
                p, processors = processors[0], processors[1:]
                return p(lambda: process(processors))
            else:
                return self.handle()
        except web.HTTPError:
            raise
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            print >> web.debug, traceback.format_exc()
            raise self.internalerror()
    
    # processors must be applied in the reverse order. (??)
    return process(self.processors)

This function is fairly complex, and at its core it is implemented recursively (I suspect the same thing could be done without recursion). An example makes it clearer.

As mentioned earlier, when we initialize our application instance, we add two processors to self.processors:

    self.add_processor(loadhook(self._load))
    self.add_processor(unloadhook(self._unload))

So here is what self.processors looks like:

self.processors = [loadhook(self._load), unloadhook(self._unload)]

# For the explanation below, we abbreviate this as:
self.processors = [load_processor, unload_processor]

When the framework starts executing handle_with_processors, it runs these processors one by one. Let's break the code down, starting with a simplified version of the handle_with_processors function:

def handle_with_processors(self):
    def process(processors):
        try:
            if processors:  # position 2
                p, processors = processors[0], processors[1:]
                return p(lambda: process(processors))  # position 3
            else:
                return self.handle()  # position 4
        except web.HTTPError:
            raise
        ...

    # processors must be applied in the reverse order. (??)
    return process(self.processors)  # position 1

  1. Execution starts at position 1, which calls the internally defined function process(processors).

  2. If position 2 determines that the processor list is not empty, execution enters the if branch.

  3. At position 3, the current processor is called with a lambda function as its argument, and its result is returned.

  4. If position 2 determines that the processor list is empty, self.handle() is executed at position 4, which actually calls our application code (described below).

For the example above, there are currently two processors:

self.processors = [load_processor, unload_processor]

After entering the code from position 1, at position 2 it will determine that there is another processor to execute, and it will go to position 3, where the code to execute looks like this:

return load_processor(lambda: process([unload_processor]))

The load_processor function is a loadhook-decorated function, so at execution time it is effectively defined like this:

def load_processor(handler):  # handler is: lambda: process([unload_processor])
    self._load()
    return handler()  # i.e. process([unload_processor])

It will execute self._load(), and then proceed to the process function, still going to position 3, where the code to execute looks like this:

return unload_processor(lambda: process([]))

The unload_processor function is decorated with unloadhook, so at execution time it is effectively defined like this:

def unload_processor(handler):  # handler is: lambda: process([])
    try:
        result = handler()  # i.e. process([]), which calls self.handle()
        is_generator = result and hasattr(result, 'next')
    except:
        # run the hook even when handler raises some exception
        self._unload()
        raise

    if is_generator:
        return wrap(result)
    else:
        self._unload()
        return result

Now process([]) is executed and reaches position 4 (where self.handle() is called) to get the application's result, and then the hook function self._unload() is called.

To summarize the order of execution:

  • self._load()

    • self.handle()

  • self._unload()

If there are more processors, they execute in the same way: loadhook-decorated processors run in the order they were added, while unloadhook-decorated processors run in reverse, so the one added last runs first.
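The ordering can be checked with a stripped-down model of the chain. Everything in this sketch is a simplification for illustration: the hooks omit generator and exception handling, and run_recursive, run_iterative, mark and handle are hypothetical names. run_iterative is one possible way to build the same chain with a loop instead of a recursive function; it is not how web.py does it.

```python
order = []


def loadhook(h):
    def processor(handler):
        h()
        return handler()
    return processor


def unloadhook(h):
    # Simplified: no generator or exception handling.
    def processor(handler):
        result = handler()
        h()
        return result
    return processor


def mark(label):
    return lambda: order.append(label)


def handle():
    order.append("handle")
    return "response"


def run_recursive(processors):
    # Same recursion as process(): call the first processor and hand it a
    # lambda that runs the rest of the chain.
    if processors:
        p, rest = processors[0], processors[1:]
        return p(lambda: run_recursive(rest))
    return handle()


def run_iterative(processors):
    # Alternative: wrap the final handler from the inside out with a loop.
    handler = handle
    for p in reversed(processors):
        handler = (lambda p, inner: lambda: p(inner))(p, handler)
    return handler()


processors = [loadhook(mark("load 1")), unloadhook(mark("unload 1")),
              loadhook(mark("load 2")), unloadhook(mark("unload 2"))]

run_recursive(processors)
recursive_order = list(order)
order.clear()
run_iterative(processors)
iterative_order = list(order)
print(recursive_order)
```

Both variants walk the list in reverse in effect, which is probably what the "(??)" comment in the original source alludes to: the last processor is wrapped first, so it sits closest to handle().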

The handle function

With all that groundwork done, we finally reach the point where the code we wrote is actually called. After all the load processors have executed, self.handle() runs and calls our application code internally. It might return something like "hello, world". self.handle is defined as follows:

def handle(self):
    fn, args = self._match(self.mapping, web.ctx.path)
    return self._delegate(fn, self.fvars, args)

The first line calls self._match to route the request to the corresponding class or sub-application, and the second line calls self._delegate to dispatch the request to that class or pass it on to the sub-application.

_match function

The _match function is defined as follows:

def _match(self, mapping, value):
    for pat, what in mapping:
        if isinstance(what, application):  # position 1
            if value.startswith(pat):
                f = lambda: self._delegate_sub_application(pat, what)
                return f, None
            else:
                continue
        elif isinstance(what, basestring):  # position 2
            what, result = utils.re_subm('^' + pat + '$', what, value)
        else:  # position 3
            result = utils.re_compile('^' + pat + '$').match(value)

        if result:  # it's a match
            return what, [x for x in result.groups()]
    return None, None

The mapping parameter is self.mapping, the URL routing table; value is web.ctx.path, the path of the current request. The function iterates over self.mapping and handles each entry according to the type of its handler object:

  • Position 1: if the handler object is an application instance (i.e. a sub-application), an anonymous function is returned that calls self._delegate_sub_application to handle the request.

  • Position 2: if the handler object is a string, utils.re_subm is called, which replaces the part of value (i.e. web.ctx.path) that matches pat with what (i.e. the handler string we specified for the URL pattern). It returns the result of the replacement together with the match object (an instance of re.MatchObject).

  • Position 3: in all other cases, such as when a class object is given directly as the handler, pat is compiled and matched against value.

If result is not empty, we return the processing object and a list of parameters passed to functions such as GET that we implement.
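The core of the matching loop can be sketched with the standard re module. This is a reduced illustration that assumes plain handler names: match_route is a hypothetical function, and it leaves out both the sub-application branch and the substitution that utils.re_subm performs on string handlers.

```python
import re


def match_route(mapping, path):
    """Return (handler, captured_groups) for the first matching pattern."""
    for pattern, what in mapping:
        # Anchor the pattern so that it has to match the whole path.
        result = re.compile('^' + pattern + '$').match(path)
        if result:
            return what, list(result.groups())
    return None, None


mapping = [["/", "Index"],
           ["/hello/(.*)", "Hello"],
           ["/world", "World"]]

print(match_route(mapping, "/hello/alice"))
print(match_route(mapping, "/missing"))
```

The captured groups become the extra positional arguments of the handler method, so a GET method for "/hello/(.*)" would receive "alice" as its argument.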

_delegate function

The result returned from _match is passed as an argument to _delegate:

fn, args = self._match(self.mapping, web.ctx.path)
return self._delegate(fn, self.fvars, args)

Here:

  • fn: the object that handles the current request, typically a class name.

  • args: the arguments to pass to the request handler.

  • self.fvars: the global namespace passed in when the application was instantiated; it is used to look up the handler object.

The _delegate function is implemented as follows:

def _delegate(self, f, fvars, args=[]):
    def handle_class(cls):
        meth = web.ctx.method
        if meth == 'HEAD' and not hasattr(cls, meth):
            meth = 'GET'
        if not hasattr(cls, meth):
            raise web.nomethod(cls)
        tocall = getattr(cls(), meth)
        return tocall(*args)

    def is_class(o): return isinstance(o, (types.ClassType, type))

    if f is None:
        raise web.notfound()
    elif isinstance(f, application):
        return f.handle_with_processors()
    elif is_class(f):
        return handle_class(f)
    elif isinstance(f, basestring):
        if f.startswith('redirect '):
            url = f.split(' ', 1)[1]
            if web.ctx.method == "GET":
                x = web.ctx.env.get('QUERY_STRING', '')
                if x:
                    url += '?' + x
            raise web.redirect(url)
        elif '.' in f:
            mod, cls = f.rsplit('.', 1)
            mod = __import__(mod, None, None, [''])
            cls = getattr(mod, cls)
        else:
            cls = fvars[f]
        return handle_class(cls)
    elif hasattr(f, '__call__'):
        return f()
    else:
        return web.notfound()

This function does different things depending on the type of the parameter f:

  • If f is None, a 404 Not Found error is raised.

  • If f is an application instance, handle_with_processors() of the sub-application is called.

  • If f is a class object, the internal function handle_class is called.

  • If f is a string, the request is either redirected, or handle_class is called after the class named by the string has been looked up (the code we write usually goes through this branch).

  • If f is a callable object, it is called directly.

  • Otherwise, 404 Not Found is returned.
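The dispatch logic can be reduced to a small Python 3 sketch. It is an illustration under several assumptions, not the framework's code: delegate takes the HTTP method as an explicit parameter instead of reading web.ctx.method, NotFound and NoMethod are stand-in exception classes, and the sub-application, redirect and dotted-module branches are left out.

```python
class NotFound(Exception):
    """Stand-in for web.notfound()."""


class NoMethod(Exception):
    """Stand-in for web.nomethod()."""


def delegate(f, fvars, method, args=()):
    def handle_class(cls):
        meth = method
        # A HEAD request falls back to GET when the class has no HEAD method.
        if meth == 'HEAD' and not hasattr(cls, meth):
            meth = 'GET'
        if not hasattr(cls, meth):
            raise NoMethod(meth)
        return getattr(cls(), meth)(*args)

    if f is None:
        raise NotFound()
    if isinstance(f, type):
        return handle_class(f)
    if isinstance(f, str):
        return handle_class(fvars[f])   # look the class up by name
    if callable(f):
        return f()
    raise NotFound()


class Hello:
    def GET(self, name):
        return 'Hello, ' + name + '!'


fvars = {"Hello": Hello}
by_name = delegate("Hello", fvars, "GET", ["alice"])
by_class = delegate(Hello, fvars, "HEAD", ["bob"])
print(by_name, by_class)
```

The lookup fvars[f] is the reason globals() is passed to web.application: it lets the framework turn the class name from the URL mapping into the class object defined in the application module.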