Werkzeug is a comprehensive WSGI web application library. It started as a simple collection of WSGI utilities and has grown into one of the most advanced WSGI utility libraries; it is also the project Flask is built on. Werkzeug is the German word for “tool”. The word is a little hard for me to pronounce (and probably one of the reasons it’s not better known), but since the official logo happens to be a hammer, I’ll just call it “The German Hammer” for short. This series is divided into two parts. The first part introduces the implementation of 1) serving && WSGI, 2) Request && Response and 3) local; the second part likewise covers three topics:

  • middleware
  • routing && urls
  • datastructures

middleware

Werkzeug ships six middlewares in werkzeug.middleware:

Module        Purpose
shared_data   Serve static files
http_proxy    Proxy requests to another HTTP server
profiler      Profile each request
proxy_fix     Fix up the environ from X-Forwarded-For and related proxy headers
dispatcher    Combine several apps, mounted at different path prefixes
lint          WSGI protocol linter

SharedDataMiddleware

SharedDataMiddleware serves static files and directories, such as CSS files and images. Typical usage:

import os
from werkzeug.middleware.shared_data import SharedDataMiddleware

app = SharedDataMiddleware(app, {
    '/static': os.path.join(os.path.dirname(__file__), 'static')
})

As the example suggests, SharedDataMiddleware maps HTTP paths onto files and reads them from disk, which is essentially what Python's http.server does. SharedDataMiddleware also works like a class-based decorator: it takes an app and returns a new WSGI app that wraps it. Such a class essentially consists of just two methods, __init__ and __call__.
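To make the class-decorator idea concrete, here is a minimal sketch of a middleware with the same shape: __init__ stores the wrapped app, and __call__ makes the instance itself a WSGI app. AddHeaderMiddleware and the X-Served-By header are hypothetical names chosen for illustration; they are not part of Werkzeug.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[..., Callable[[bytes], object]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class AddHeaderMiddleware:
    """Hypothetical middleware: wraps an app and adds one response header."""

    def __init__(self, app: WSGIApp, name: str, value: str) -> None:
        # Like SharedDataMiddleware, keep a reference to the wrapped app.
        self.app = app
        self.header = (name, value)

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        def patched_start_response(status: str, headers: List[Tuple[str, str]], exc_info=None):
            # Append our header, then delegate to the server's start_response.
            return start_response(status, headers + [self.header], exc_info)

        return self.app(environ, patched_start_response)


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Wrapping returns a new WSGI callable, just like SharedDataMiddleware(app, ...).
app = AddHeaderMiddleware(hello_app, "X-Served-By", "sketch")
```

Because the wrapper is itself a WSGI app, it can be wrapped again by another middleware, which is what makes stacking possible.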

class SharedDataMiddleware:
    def __init__(
        self,
        app: "WSGIApplication",
        exports: t.Union[
            t.Dict[str, t.Union[str, t.Tuple[str, str]]],
            t.Iterable[t.Tuple[str, t.Union[str, t.Tuple[str, str]]]],
        ],
        disallow: None = None,
        cache: bool = True,
        cache_timeout: int = 60 * 60 * 12,
        fallback_mimetype: str = "application/octet-stream",
    ) -> None:
        self.app = app
        self.exports: t.List[t.Tuple[str, _TLoader]] = []
        self.cache = cache
        self.cache_timeout = cache_timeout

        if isinstance(exports, dict):
            exports = exports.items()

        for key, value in exports:
            ...
            if isinstance(value, str):
                if os.path.isfile(value):
                    loader = self.get_file_loader(value)
                else:
                    loader = self.get_directory_loader(value)
            ...
            self.exports.append((key, loader))
        ...

The constructor takes two main arguments, app and exports. exports can be a dict or an iterable of (URL prefix, path) pairs; for each string path, a file loader is created if it points to a single file, and a directory loader otherwise. Note that nothing is read at this point: a file is only opened when a request actually asks for it.
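The lazy loading can be sketched with closures. This is a simplified, stdlib-only illustration, not Werkzeug's code: the hypothetical make_file_loader returns a loader that, like the loader(None) call in __call__, returns (real_filename, opener), and the disk is only touched when opener() runs and returns (file, mtime, size).

```python
import os
from typing import BinaryIO, Callable, Optional, Tuple

Opener = Callable[[], Tuple[BinaryIO, float, int]]
Loader = Callable[[Optional[str]], Tuple[str, Opener]]


def make_file_loader(filename: str) -> Loader:
    """Hypothetical helper: build a lazy loader for one exported file."""

    def opener() -> Tuple[BinaryIO, float, int]:
        # Only here, when a request really needs the file, is it opened.
        f = open(filename, "rb")
        st = os.fstat(f.fileno())
        return f, st.st_mtime, st.st_size

    def loader(_path: Optional[str]) -> Tuple[str, Opener]:
        # Nothing touches the disk yet: just report the name and the opener.
        # (The path argument is unused for a single-file export in this sketch.)
        return os.path.basename(filename), opener

    return loader
```

Creating and even calling the loader works before the file exists; only opener() needs the file.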

The __call__ method handles each request:

def __call__(
    self, environ: "WSGIEnvironment", start_response: "StartResponse"
) -> t.Iterable[bytes]:
    path = get_path_info(environ)
    file_loader = None

    for search_path, loader in self.exports:
        if search_path == path:
            real_filename, file_loader = loader(None)

            if file_loader is not None:
                break
        ...  # prefix matching for directory exports omitted

    # no export matched (or the file is disallowed): hand over to the wrapped app
    if file_loader is None or not self.is_allowed(real_filename):  # type: ignore
        return self.app(environ, start_response)

    guessed_type = mimetypes.guess_type(real_filename)  # type: ignore
    mime_type = get_content_type(guessed_type[0] or self.fallback_mimetype, "utf-8")
    f, mtime, file_size = file_loader()

    headers = [("Date", http_date())]

    if self.cache:
        timeout = self.cache_timeout
        etag = self.generate_etag(mtime, file_size, real_filename)  # type: ignore
        headers += [
            ("Etag", f'"{etag}"'),
            ("Cache-Control", f"max-age={timeout}, public"),
        ]

        if not is_resource_modified(environ, etag, last_modified=mtime):
            f.close()
            start_response("304 Not Modified", headers)
            return []

        headers.append(("Expires", http_date(time() + timeout)))
    else:
        headers.append(("Cache-Control", "public"))

    headers.extend(
        (
            ("Content-Type", mime_type),
            ("Content-Length", str(file_size)),
            ("Last-Modified", http_date(mtime)),
        )
    )
    start_response("200 OK", headers)
    return wrap_file(environ, f)
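  • Look up the file that matches the request path (read from the WSGI environ)
  • Build the HTTP headers for the file: Date, Content-Type, Content-Length, Last-Modified and, when caching is enabled, Etag, Cache-Control and Expires
  • Return the file wrapped by wrap_file, so the server can stream it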

Here are two small details:

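  1. Once a file matches, it is served directly and the wrapped app is never called; only requests that match no export fall through to the app.
  2. Browser caching is enabled by default through the Etag, Cache-Control and Expires headers. If is_resource_modified finds that the client's cached copy is still current, a 304 Not Modified response is sent without a body.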
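The caching decision boils down to comparing validators. The sketch below is a simplified, stdlib-only stand-in for the is_resource_modified check: it assumes only an exact If-None-Match comparison (the real function also handles If-Modified-Since, If-Range and lists of ETags), and serve_cached and the make_etag recipe are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def make_etag(mtime: float, size: int, filename: str) -> str:
    # Hypothetical recipe: any value that changes whenever the file changes works.
    raw = f"{mtime}-{size}-{filename}".encode()
    return hashlib.sha1(raw).hexdigest()


def serve_cached(environ: Dict[str, str], data: bytes, mtime: float, filename: str,
                 timeout: int = 60 * 60 * 12) -> Tuple[str, List[Tuple[str, str]], bytes]:
    """Return (status, headers, body), answering 304 when the client copy is fresh."""
    etag = make_etag(mtime, len(data), filename)
    headers = [
        ("Etag", f'"{etag}"'),
        ("Cache-Control", f"max-age={timeout}, public"),
    ]
    # The client echoes the ETag it cached back in If-None-Match.
    if environ.get("HTTP_IF_NONE_MATCH") == f'"{etag}"':
        return "304 Not Modified", headers, b""
    headers.append(("Content-Length", str(len(data))))
    return "200 OK", headers, data
```

A matching ETag short-circuits the response with no body; a changed file yields a new ETag and therefore a fresh 200.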

ProxyMiddleware

ProxyMiddleware is used as follows:

from werkzeug.middleware.http_proxy import ProxyMiddleware

app = ProxyMiddleware(app, {
    "/static/": {"target": "http://127.0.0.1:5001/"},
})

As the usage shows, every request under /static/ is forwarded to the service at http://127.0.0.1:5001/. The core of the proxy looks like this:

# Excerpt from inside the proxy's WSGI callable; host, target, headers,
# remote_path and chunked are set up earlier (not shown).
from http import client

con = client.HTTPConnection(
    host, target.port or 80, timeout=self.timeout
)
con.connect()

remote_url = url_quote(remote_path)
querystring = environ["QUERY_STRING"]

if querystring:
    remote_url = f"{remote_url}?{querystring}"

con.putrequest(environ["REQUEST_METHOD"], remote_url, skip_host=True)

for k, v in headers:
    con.putheader(k, v)

con.endheaders()
stream = get_input_stream(environ)

while True:
    data = stream.read(self.chunk_size)

    if not data:
        break

    if chunked:
        con.send(b"%x\r\n%s\r\n" % (len(data), data))
    else:
        con.send(data)

resp = con.getresponse()
start_response(
    f"{resp.status} {resp.reason}",
    [
        (k.title(), v)
        for k, v in resp.getheaders()
        if not is_hop_by_hop_header(k)
    ],
)

def read() -> t.Iterator[bytes]:
    while True:
        try:
            data = resp.read(self.chunk_size)
        except OSError:
            break

        if not data:
            break

        yield data

return read()
  • Open an HTTP connection to the remote service
  • Send the request line and the request headers to the remote service
  • Read the body of the client's request and forward it to the remote service, using chunked encoding when required
  • Get the response from the remote service
  • Pass the remote status code and headers (minus hop-by-hop headers) back to the client via start_response
  • Return a generator that reads the remote response body in chunks, so it is streamed to the client
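Two details of the forwarding code are worth isolating: chunked transfer-encoding framing (the b"%x\r\n%s\r\n" format: chunk length in hex, then the data) and dropping hop-by-hop headers, which describe a single connection and must not be passed on. The helper names below are hypothetical; the header set is based on RFC 2616, section 13.5.1, and the closing zero-length chunk is how HTTP/1.1 marks the end of a chunked body.

```python
from typing import Iterable, Iterator, List, Tuple

# Headers that only apply to one hop (based on RFC 2616, section 13.5.1).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def encode_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as hex length, CRLF, data, CRLF; then send the last-chunk marker."""
    for data in chunks:
        if data:  # an empty chunk would be read as the end of the body
            yield b"%x\r\n%s\r\n" % (len(data), data)
    yield b"0\r\n\r\n"


def end_to_end_headers(headers: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop hop-by-hop headers and title-case the rest, as the proxy code does."""
    return [(k.title(), v) for k, v in headers if k.lower() not in HOP_BY_HOP]
```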

Working through how ProxyMiddleware implements a simple HTTP proxy is a good way to understand how requests and responses flow across the web.

ProfilerMiddleware

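ProfilerMiddleware shows how to profile every request. The app is run through profile.runcall(runapp); since runapp returns nothing, the response is relayed through a temporary response_body list: runapp consumes the app's iterator into it, and catching_start_response hands the app response_body.append as its write() callable. Because the iterator is consumed inside runcall, producing the body is profiled as well.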

def __call__(
    self, environ: "WSGIEnvironment", start_response: "StartResponse"
) -> t.Iterable[bytes]:
    
    response_body: t.List[bytes] = []

    def catching_start_response(status, headers, exc_info=None):  # type: ignore
        start_response(status, headers, exc_info)
        return response_body.append

    def runapp() -> None:
        app_iter = self._app(
            environ, t.cast("StartResponse", catching_start_response)
        )
        response_body.extend(app_iter)

    profile = Profile()
    start = time.time()
    profile.runcall(runapp)
    body = b"".join(response_body)
    elapsed = time.time() - start
    ...
    return [body]
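The same relay trick can be reproduced with the standard library alone. The sketch below uses cProfile and pstats with a plain WSGI app; profile_wsgi_call is a hypothetical helper name, and the report it returns is ordinary pstats text, not ProfilerMiddleware's own output format.

```python
import cProfile
import io
import pstats
import time
from typing import Callable, List, Tuple


def profile_wsgi_call(app: Callable, environ: dict,
                      start_response: Callable) -> Tuple[bytes, float, str]:
    """Run one request under cProfile; return (body, elapsed seconds, stats text)."""
    response_body: List[bytes] = []

    def catching_start_response(status, headers, exc_info=None):
        start_response(status, headers, exc_info)
        # Anything the app writes through write() is collected too.
        return response_body.append

    def runapp() -> None:
        # Consuming the iterator inside runcall profiles lazy body generation as well.
        app_iter = app(environ, catching_start_response)
        try:
            response_body.extend(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()

    profile = cProfile.Profile()
    start = time.perf_counter()
    profile.runcall(runapp)
    elapsed = time.perf_counter() - start

    out = io.StringIO()
    pstats.Stats(profile, stream=out).sort_stats("cumulative").print_stats(5)
    return b"".join(response_body), elapsed, out.getvalue()
```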

Rather than going through the remaining middlewares in detail, let’s take a closer look at the model behind them all: the onion.

An HTTP request travels through the middlewares like the layers of an onion, reaching the application at the core layer by layer; the response then passes back out through the same layers. Stacked decorators behave the same way:

from functools import cache

# count_calls: a call-counting decorator (definition not shown)
@cache
@count_calls
def fibonacci(num):
    if num < 2:
        return num
    return fibonacci(num - 1) + fibonacci(num - 2)

The call passes through each decorator layer on the way in and again on the way out, so every layer gets one chance to process the request (the arguments) and one chance to process the response (the return value).
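The same onion order can be observed with WSGI middleware. In this sketch the hypothetical Layer middleware records when the request enters and when the response leaves; for simplicity it buffers the body with list(), whereas real middlewares usually stream it. Wrapping the app twice shows the outer layer is entered first and left last.

```python
from typing import Callable, Iterable, List

trace: List[str] = []


class Layer:
    """Hypothetical middleware that logs its position in the onion."""

    def __init__(self, app: Callable, name: str) -> None:
        self.app = app
        self.name = name

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        trace.append(f"{self.name} request")  # on the way in
        body = list(self.app(environ, start_response))
        trace.append(f"{self.name} response")  # on the way out
        return body


def core_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    trace.append("app")
    start_response("200 OK", [])
    return [b"ok"]


# Equivalent to stacking decorators: outer(inner(core_app)).
app = Layer(Layer(core_app, "inner"), "outer")
```

Calling app once leaves trace as outer request, inner request, app, inner response, outer response.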


routing

Routing is a very important module. Here is a routing example:

from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule, NotFound, RequestRedirect
from werkzeug.wrappers import Request

url_map = Map([
    Rule('/', endpoint='blog/index'),
    Rule('/<int:year>/', endpoint='blog/archive'),
    Rule('/<int:year>/<int:month>/', endpoint='blog/archive'),
    Rule('/<int:year>/<int:month>/<int:day>/', endpoint='blog/archive'),
    Rule('/<int:year>/<int:month>/<int:day>/<slug>', endpoint='blog/show_post'),
    Rule('/about', endpoint='blog/about_me'),
    Rule('/feeds/', endpoint='blog/feeds'),
    Rule('/feeds/<feed_name>.rss', endpoint='blog/show_feed')
])
...
def application(environ, start_response):
    request = Request(environ)
    urls = url_map.bind_to_environ(environ)
    try:
        endpoint, args = urls.match()
    except HTTPException as e:
        # NotFound, MethodNotAllowed and RequestRedirect are all HTTPExceptions
        return e(environ, start_response)
    # views: hypothetical dict mapping endpoint names to handler functions
    response = views[endpoint](request, **args)
    return response(environ, start_response)
  • All routing rules of an application are managed by a Map object, whose main argument is a list of Rule objects.
  • Each Rule pairs a URL rule with an endpoint.
  • For each HTTP request, the Map's bind_to_environ method returns urls, a MapAdapter bound to that request.
  • urls.match() finds the matching rule and returns its endpoint together with the arguments defined in the rule; for example, /<int:year>/<int:month>/ yields a dict with year and month.
  • The endpoint is then used to look up the corresponding handler function (the front-controller pattern).

As in the example, the Rule constructor mainly takes the rule string and the endpoint of the handler function:

class Rule(RuleFactory):
    
    def __init__(
        self,
        string: str,
        defaults: t.Optional[t.Mapping[str, t.Any]] = None,
        subdomain: t.Optional[str] = None,
        methods: t.Optional[t.Iterable[str]] = None,
        build_only: bool = False,
        endpoint: t.Optional[str] = None,
        strict_slashes: t.Optional[bool] = None,
        merge_slashes: t.Optional[bool] = None,
        redirect_to: t.Optional[t.Union[str, t.Callable[..., str]]] = None,
        alias: bool = False,
        host: t.Optional[str] = None,
        websocket: bool = False,
    ) -> None:
        self.rule = string
        ...
        self.endpoint: str = endpoint  # type: ignore
        ...
        self.arguments = set()
        ...

Moving on to the Map object constructor:

class Map:
    def __init__(
        self,
        rules: t.Optional[t.Iterable[RuleFactory]] = None,
        default_subdomain: str = "",
        charset: str = "utf-8",
        strict_slashes: bool = True,
        merge_slashes: bool = True,
        redirect_defaults: bool = True,
        converters: t.Optional[t.Mapping[str, t.Type[BaseConverter]]] = None,
        sort_parameters: bool = False,
        sort_key: t.Optional[t.Callable[[t.Any], t.Any]] = None,
        encoding_errors: str = "replace",
        host_matching: bool = False,
    ) -> None:
        self._rules: t.List[Rule] = []
        ...
        self.converters = self.default_converters.copy()
        ...
        for rulefactory in rules or ():
            self.add(rulefactory)

The highlight is the add method on the Map object:

def add(self, rulefactory: RuleFactory) -> None:
    """Add a new rule or factory to the map and bind it.  Requires that the
    rule is not bound to another map.

    :param rulefactory: a :class:`Rule` or :class:`RuleFactory`
    """
    for rule in rulefactory.get_rules(self):
        rule.bind(self)
        self._rules.append(rule)
        self._rules_by_endpoint.setdefault(rule.endpoint, []).append(rule)
    self._remap = True
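The _rules_by_endpoint index built in add is what makes the reverse direction, building a URL from an endpoint, cheap. The toy sketch below illustrates that idea only; ToyMap and its simplified <name> placeholder syntax are hypothetical and much simpler than Werkzeug's Map and its URL building.

```python
import re
from typing import Dict, List, Tuple


class ToyMap:
    def __init__(self) -> None:
        self._rules: List[Tuple[str, str]] = []  # (pattern, endpoint), in order, for matching
        self._rules_by_endpoint: Dict[str, List[str]] = {}  # endpoint -> patterns, for building

    def add(self, pattern: str, endpoint: str) -> None:
        self._rules.append((pattern, endpoint))
        self._rules_by_endpoint.setdefault(endpoint, []).append(pattern)

    def build(self, endpoint: str, **values: object) -> str:
        # Only the patterns registered for this endpoint are considered;
        # use the first whose placeholders are exactly the supplied values.
        for pattern in self._rules_by_endpoint.get(endpoint, []):
            names = set(re.findall(r"<(\w+)>", pattern))
            if names == set(values):
                return re.sub(r"<(\w+)>", lambda m: str(values[m.group(1)]), pattern)
        raise LookupError(f"cannot build URL for {endpoint!r} with {sorted(values)}")
```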

bind_to_environ reads the path, query string and request method from environ and builds a MapAdapter from them:

def bind_to_environ(
    self,
    environ: "WSGIEnvironment",
    server_name: t.Optional[str] = None,
    subdomain: t.Optional[str] = None,
) -> "MapAdapter":
    ...
    path_info = _get_wsgi_string("PATH_INFO")
    query_args = _get_wsgi_string("QUERY_STRING")
    default_method = environ["REQUEST_METHOD"]
    server_name = server_name.lower()
    try:
        server_name = _encode_idna(server_name)  # type: ignore
    except UnicodeError:
        raise BadHost()
    return MapAdapter(
        self,
        server_name,
        script_name,
        subdomain,
        url_scheme,
        path_info,
        default_method,
        query_args,
    )

Then call the match method of the MapAdapter object:

def match(
    self,
    path_info: t.Optional[str] = None,
    method: t.Optional[str] = None,
    return_rule: bool = False,
    query_args: t.Optional[t.Union[t.Mapping[str, t.Any], str]] = None,
    websocket: t.Optional[bool] = None,
) -> t.Tuple[t.Union[str, Rule], t.Mapping[str, t.Any]]:
    ...
    for rule in self.map._rules:
        try:
            rv = rule.match(path, method)
        except RequestPath as e:
            raise RequestRedirect(
                self.make_redirect_url(
                    url_quote(e.path_info, self.map.charset, safe="/:|+"),
                    query_args,
                )
            )
        except RequestAliasRedirect as e:
            raise RequestRedirect(
                self.make_alias_redirect_url(
                    path, rule.endpoint, e.matched_values, method, query_args
                )
            )
        if rv is None:
            continue
        ...
        return rule.endpoint, rv

The matching process is simple: loop over all the rules and use each Rule's match method to check whether it matches the path and method:

def match(
    self, path: str, method: t.Optional[str] = None
) -> t.Optional[t.MutableMapping[str, t.Any]]:
    m = self._regex.search(path)
    if m is not None:
        groups = m.groupdict()
        ...
        result = {}
        for name, value in groups.items():
            try:
                value = self._converters[name].to_python(value)
            except ValidationError:
                return None
            result[str(name)] = value
        return result
  • A regular expression, compiled from the rule string, checks whether the path matches.
  • For a matching rule, each named group captured from the path is handed to its converter's to_python method, because every part of a URL is a string and must be converted to the expected type, such as int. If conversion fails with ValidationError, the rule does not match.
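The whole pipeline (rule string to regex, named groups to converters, one linear pass over the rules) can be sketched in a few lines. This is a toy illustration assuming a simplified rule syntax (<name> or <int:name> only) and a hypothetical ToyRule class; Werkzeug's real rule parser also handles defaults, methods, redirects, slashes and subdomains.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# converter name -> (regex fragment, to_python)
CONVERTERS: Dict[str, Tuple[str, Callable[[str], Any]]] = {
    "default": (r"[^/]+", str),
    "int": (r"\d+", int),
}


class ToyRule:
    def __init__(self, rule: str, endpoint: str) -> None:
        self.endpoint = endpoint
        self._converters: Dict[str, Callable[[str], Any]] = {}
        parts = []
        pos = 0
        # Each <conv:name> or <name> becomes a named regex group.
        for m in re.finditer(r"<(?:(\w+):)?(\w+)>", rule):
            parts.append(re.escape(rule[pos:m.start()]))
            conv, name = m.group(1) or "default", m.group(2)
            regex, to_python = CONVERTERS[conv]
            parts.append(f"(?P<{name}>{regex})")
            self._converters[name] = to_python
            pos = m.end()
        parts.append(re.escape(rule[pos:]))
        self._regex = re.compile("^" + "".join(parts) + "$")

    def match(self, path: str) -> Optional[Dict[str, Any]]:
        m = self._regex.search(path)
        if m is None:
            return None
        # Convert every captured string with its converter.
        return {name: self._converters[name](value) for name, value in m.groupdict().items()}


def match(rules: List[ToyRule], path: str) -> Tuple[str, Dict[str, Any]]:
    # One linear pass: the first rule that matches wins.
    for rule in rules:
        rv = rule.match(path)
        if rv is not None:
            return rule.endpoint, rv
    raise LookupError(path)
```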

The built-in converters are:

Name      Converter class
default   UnicodeConverter
string    UnicodeConverter
any       AnyConverter
path      PathConverter
int       IntegerConverter
float     FloatConverter
uuid      UUIDConverter

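Here is a brief look at NumberConverter, the base class of IntegerConverter and FloatConverter. Its to_python method checks that the value meets the configured limits (fixed_digits, min and max) and converts it with num_convert, which is int by default: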

class NumberConverter(BaseConverter):
    regex = r"\d+"
    num_convert: t.Callable = int

    def to_python(self, value: str) -> t.Any:
        if self.fixed_digits and len(value) != self.fixed_digits:
            raise ValidationError()

        value = self.num_convert(value)

        if (self.min is not None and value < self.min) or (
            self.max is not None and value > self.max
        ):
            raise ValidationError()

        return value

    ...

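For example, in a URL shortener, the rule /<short_id> matches a request to /1001 and dispatches it to the follow_short_link endpoint with short_id set to "1001" (a string, since no converter is given):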

# /1001
# Rule("/<short_id>", endpoint="follow_short_link"),
def on_follow_short_link(self, request, short_id):
    link_target = self.redis.get(f"url-target:{short_id}")
    if link_target is None:
        raise NotFound()
    self.redis.incr(f"click-count:{short_id}")
    return redirect(link_target)

Another approach to HTTP routing uses a prefix tree (trie), which is more efficient than the single O(N) pass over all rules used here; it will be covered later with the Gin framework.
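For contrast, here is a minimal prefix-tree router, a sketch of the general idea only (it is neither Werkzeug's nor Gin's implementation). Paths are split into segments, static segments become child nodes, and a segment starting with ':' is a hypothetical parameter placeholder, so lookup cost grows with the number of segments in the path rather than with the number of routes.

```python
from typing import Dict, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.param: Optional[Tuple[str, "Node"]] = None  # (name, child) for ":name"
        self.endpoint: Optional[str] = None


class TrieRouter:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, path: str, endpoint: str) -> None:
        node = self.root
        for seg in filter(None, path.split("/")):
            if seg.startswith(":"):
                # Simplification: one parameter child per node, first name wins.
                if node.param is None:
                    node.param = (seg[1:], Node())
                node = node.param[1]
            else:
                node = node.children.setdefault(seg, Node())
        node.endpoint = endpoint

    def match(self, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
        node = self.root
        args: Dict[str, str] = {}
        for seg in filter(None, path.split("/")):
            if seg in node.children:  # static segments win over parameters
                node = node.children[seg]
            elif node.param is not None:
                name, node = node.param
                args[name] = seg
            else:
                return None
        if node.endpoint is None:
            return None
        return node.endpoint, args
```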


datastructures

The datastructures module contains many classes. I picked out the following core ones; most of the remaining classes are built by combining them.

These data structures mainly hold data parsed from the request, such as headers and Accept values. Much of this data is immutable, which ensures application code cannot modify it by mistake. Immutability is implemented with the is_immutable function:

def is_immutable(self):
    raise TypeError(f"{type(self).__name__!r} objects are immutable")

It is very simple: any attempt to change the data raises an exception, which makes the data immutable.

ImmutableList & ImmutableDict

ImmutableList is built with the mixin pattern. The core of ImmutableListMixin looks like this:

class ImmutableListMixin:

    _hash_cache = None

    def __hash__(self):
        if self._hash_cache is not None:
            return self._hash_cache
        rv = self._hash_cache = hash(tuple(self))
        return rv

    def __delitem__(self, key):
        is_immutable(self)
    ...
    def append(self, item):
        is_immutable(self)
    ...
    def sort(self, key=None, reverse=False):
        is_immutable(self)
  • ImmutableListMixin overrides __hash__: the hash is computed from a tuple of the items, and tuples are immutable, so the hash of the object is deterministic. Because the contents can never change, it only needs to be computed once and is then served from _hash_cache.
  • Every operation that would alter the data, including magic methods such as __delitem__, append, and even in-place sort, calls is_immutable.

ImmutableList simply combines ImmutableListMixin with list; no additional implementation is needed:

class ImmutableList(ImmutableListMixin, list):
    ...

ImmutableDict is similar to ImmutableList, but replaces list with dict.
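Following the same recipe, a dict-based version can be sketched. This is a simplified illustration with hypothetical names (ImmutableDictMixinSketch, FrozenDict), not Werkzeug's ImmutableDictMixin; is_immutable is redefined here so the block runs on its own.

```python
def is_immutable(self):
    raise TypeError(f"{type(self).__name__!r} objects are immutable")


class ImmutableDictMixinSketch:
    _hash_cache = None

    def __hash__(self):
        # Hash a frozenset of the items once and cache it, like ImmutableListMixin.
        if self._hash_cache is None:
            self._hash_cache = hash(frozenset(self.items()))
        return self._hash_cache

    def __setitem__(self, key, value):
        is_immutable(self)

    def __delitem__(self, key):
        is_immutable(self)

    def update(self, *args, **kwargs):
        is_immutable(self)

    def pop(self, key, default=None):
        is_immutable(self)

    def popitem(self):
        is_immutable(self)

    def setdefault(self, key, default=None):
        is_immutable(self)

    def clear(self):
        is_immutable(self)


class FrozenDict(ImmutableDictMixinSketch, dict):
    """dict that refuses every in-place change (hypothetical name)."""
```

As with ImmutableList, the mixin comes first in the bases so its methods take precedence over dict's.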

TypeConversionDict

TypeConversionDict adds type conversion to dict.get:

class TypeConversionDict(dict):
    def get(self, key, default=None, type=None):
        try:
            rv = self[key]
        except KeyError:
            return default
        if type is not None:
            try:
                # convert the value with the given callable, e.g. int
                rv = type(rv)
            except ValueError:
                rv = default
        return rv

An example makes it easy to understand:

>>> d = TypeConversionDict(foo='42', bar='blub')
>>> d.get('foo', type=int)
42
>>> d.get('bar', -1, type=int)
-1

MultiDict

MultiDict is a dictionary that stores its values in lists, so a single key can have multiple values. Here are its constructor and add method:

class MultiDict(TypeConversionDict):
    
    def __init__(self, mapping=None):
        if isinstance(mapping, MultiDict):
            dict.__init__(self, ((k, l[:]) for k, l in mapping.lists()))
        elif isinstance(mapping, dict):
            tmp = {}
            for key, value in mapping.items():
                if isinstance(value, (tuple, list)):
                    if len(value) == 0:
                        continue
                    value = list(value)
                else:
                    value = [value]
                tmp[key] = value
            dict.__init__(self, tmp)
        else:
            tmp = {}
            for key, value in mapping or ():
                tmp.setdefault(key, []).append(value)
            dict.__init__(self, tmp)
    
    def add(self, key, value):
        dict.setdefault(self, key, []).append(value)
        

Get a feel for this with an example of MultiDict:

>>> d = MultiDict([('a', 'b'), ('a', 'c')])
>>> d
MultiDict([('a', 'b'), ('a', 'c')])
>>> d['a']
'b'
>>> d.getlist('a')
['b', 'c']
>>> 'a' in d
True

You may wonder what such a dictionary is good for. Here is one of the request headers of an HTTP request:

...
accept-language: en,zh;q=0.9,zh-TW;q=0.8,zh-CN;q=0.7
...

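accept-language carries several values, each with its own weight q, so it needs a container that can hold more than one value, such as MultiDict. (Werkzeug itself parses Accept headers into dedicated Accept classes; MultiDict is used for query arguments and form data, where a key such as tag in ?tag=a&tag=b can repeat.)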
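To see how the header above breaks down into several values, here is a small stdlib-only parser. parse_accept_language is a hypothetical helper, not Werkzeug's parser, and it does not special-case wildcards or extra parameters: each entry becomes a (language, quality) pair, a missing q counts as 1.0, and the result is sorted from most to least preferred.

```python
from typing import List, Tuple


def parse_accept_language(value: str) -> List[Tuple[str, float]]:
    result: List[Tuple[str, float]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed weight
        result.append((lang.strip(), quality))
    # sorted() is stable, so entries with equal quality keep their original order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```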

Most of the other classes in datastructures derive from the ones above and are not covered here.


summary

In this part, we learned that the core middleware mechanism of “The German Hammer” comes from the decorator pattern, and briefly looked at how static files, HTTP proxying and profiling are implemented. We saw how routing matches requests with regular expressions in a single loop over all rules and how route parameters are parsed, and we learned some details of the data structures used to handle HTTP headers.

tip

The iter_multi_items function in datastructures iterates over data using the yield keyword and the yield from statement:

def iter_multi_items(mapping):
    if isinstance(mapping, MultiDict):
        yield from mapping.items(multi=True)
    elif isinstance(mapping, dict):
        for key, value in mapping.items():
            if isinstance(value, (tuple, list)):
                for v in value:
                    yield key, v
            else:
                yield key, value
    else:
        yield from mapping

Here’s a quick overview of both. The yield keyword can be understood simply as a pause point in a function. A normal function runs to the end and hands back its result with return, and that is final. yield lets a function pause, hand a value to the outside world, and later resume where it left off:

def unlimit_generator():
    i = 0
    while i is not None:
        yield i
        i+=1

The infinite generator above, for example, produces 0 and then every positive integer in turn, on demand; range cannot do that, because it always needs a stop value.
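Because such a generator never finishes, the caller decides how much to take. A short sketch using itertools.islice; the generator is restated with while True so that the block runs on its own:

```python
from itertools import islice


def unlimit_generator():
    i = 0
    while True:
        yield i  # pause here until the caller asks for the next value
        i += 1


gen = unlimit_generator()
first_five = list(islice(gen, 5))  # [0, 1, 2, 3, 4]
# The generator remembers where it paused, so it continues from 5.
next_value = next(gen)  # 5
```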

Now take two generators that together produce the numbers 0 to 19, one yielding 0 to 9 and the other 10 to 19:

def generator2():
    for i in range(10):
        yield i

def generator3():
    for j in range(10, 20):
        yield j

Using only the yield keyword, combining them looks like this:

def generator():
    for i in generator2():
        yield i
    for j in generator3():
        yield j

Using the yield from statement, the code is pretty neat:

def generator():
    yield from generator2()
    yield from generator3()

yield from was also the basis of generator-based coroutines in Python 3 (before async/await arrived), which is worth exploring to get a feel for it.
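What makes yield from more than a shorthand for a for loop is that it also forwards values passed to send() into the inner generator and returns the inner generator's return value, which is what generator-based coroutines relied on. A small sketch with hypothetical names (averager, collect):

```python
from typing import Generator, List, Optional


def averager() -> Generator[float, Optional[float], float]:
    """Receive numbers via send(); return their average once None is sent."""
    total, count, average = 0.0, 0, 0.0
    while True:
        value = yield average
        if value is None:
            return average
        total += value
        count += 1
        average = total / count


def collect(results: List[float]) -> Generator[float, Optional[float], None]:
    # yield from forwards send() to averager and receives its return value.
    result = yield from averager()
    results.append(result)


results: List[float] = []
co = collect(results)
next(co)  # run up to averager's first yield
co.send(10)
co.send(20)
try:
    co.send(None)  # averager returns 15.0, collect stores it and finishes
except StopIteration:
    pass
```

After the last send, results holds the single value 15.0, produced by averager but collected one level up in collect.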
