Day 7: Using the urllib basic library (parsing links)

This is the 15th day of my participation in the First Challenge 2022.

I wish you all a happy and prosperous New Year

Yesterday we learned a little about cookies and exception handling in urllib. Today we will learn how to parse links with urllib.

The urllib.parse module defines a standard interface for handling URLs: splitting a URL into its parts, joining parts back together, and converting between the two forms.

urlparse

Here’s an example:

from urllib.parse import urlparse

result = urlparse("https://baidu.com/index.html;user?id=5#comment")
print(type(result))
print(result)

Results obtained:

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='https', netloc='baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')

As you can see, urlparse returns a ParseResult object. Looking at the definition of urlparse, it takes three arguments:

The first argument is the URL to parse.
The second, scheme, is a default scheme: it is used only when the URL itself does not specify one (such as https).
The third, allow_fragments, controls whether the fragment is parsed. Setting it to False leaves the fragment field empty, and the fragment text stays attached to the preceding component (path, params, or query).
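
The two optional arguments can be tried out with a minimal sketch like the one below. The URLs are made up for illustration and are not from the original example.

```python
from urllib.parse import urlparse

# No scheme in the URL itself, so the default passed via `scheme` is used.
# The leading "//" tells urlparse that "baidu.com" is the netloc.
no_scheme = urlparse("//baidu.com/index.html?id=5#comment", scheme="https")
print(no_scheme.scheme)    # https

# The URL already specifies a scheme, so the default is ignored.
has_scheme = urlparse("http://baidu.com/index.html", scheme="https")
print(has_scheme.scheme)   # http

# With allow_fragments=False the fragment is not split off:
# it stays attached to the preceding component (here, the query).
no_frag = urlparse("https://baidu.com/index.html?id=5#comment", allow_fragments=False)
print(no_frag.query)           # id=5#comment
print(no_frag.fragment == "")  # True
```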

urlunparse

The previous function parses a URL; this one does the opposite and constructs a URL from its parts. Here is an example:

from urllib.parse import urlunparse

data = ["https", "www.baidu.com", "index.html", "user", "a=6", "comment"]

print(urlunparse(data))

Result: https://www.baidu.com/index.html;user?a=6#comment

Note that the argument must contain exactly six components, otherwise an exception is raised. It can be either a list or a tuple; as long as it has six items, the URL is constructed successfully.
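
The six-component rule can be checked with a short sketch. It assumes CPython, where a wrong length surfaces as a ValueError from tuple unpacking; the text above only states that an exception is raised.

```python
from urllib.parse import urlparse, urlunparse

url = "https://www.baidu.com/index.html;user?a=6#comment"

# A ParseResult is a 6-item tuple, so it can be passed straight back in.
rebuilt = urlunparse(urlparse(url))
print(rebuilt == url)  # True

# A tuple works just as well as a list.
from_tuple = urlunparse(("https", "www.baidu.com", "index.html", "user", "a=6", "comment"))
print(from_tuple)

# Five components instead of six: the call fails.
try:
    urlunparse(["https", "www.baidu.com", "index.html", "a=6", "comment"])
    failed = False
except ValueError as exc:  # assumption: CPython raises ValueError here
    failed = True
    print("urlunparse rejected the input:", exc)
```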

urlsplit

Judging by the name, even with my rudimentary English, this is the URL split method. It works like urlparse but does not split out the params part (params stay inside path), so only five components are returned. See the example:

from urllib import parse

result = parse.urlsplit("https://www.baidu.com/index.html;user?id=5#comment")
print(result)

SplitResult(scheme='https', netloc='www.baidu.com', path='/index.html;user', query='id=5', fragment='comment')

The result is a named tuple, so we can get a value either by attribute name or by index:

print(result.netloc,result[1])

www.baidu.com www.baidu.com
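
To make the difference between urlsplit and urlparse concrete, here is a small side-by-side sketch on the same URL. The variable names are only for illustration.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.baidu.com/index.html;user?id=5#comment"

parsed = urlparse(url)   # six components, params split out of the path
split = urlsplit(url)    # five components, params left inside the path

print(len(parsed), parsed.path, parsed.params)  # 6 /index.html user
print(len(split), split.path)                   # 5 /index.html;user
```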

urlunsplit

This is the reverse of urlsplit: it builds a URL from five components. There is not much more to say:

from urllib import parse

result = parse.urlsplit("https://www.baidu.com/index.html;user?id=5#comment")
print(result)

print(result.netloc, result[1])

# Put the five components back together into a URL
unresult = parse.urlunsplit(result)
print(unresult)
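
urlunsplit does not need a SplitResult as input. As a hedged sketch, a plain list of five items (values chosen here for illustration) works too:

```python
from urllib.parse import urlsplit, urlunsplit

# Five components: scheme, netloc, path, query, fragment.
parts = ["https", "www.baidu.com", "/index.html;user", "id=5", "comment"]
url = urlunsplit(parts)
print(url)  # https://www.baidu.com/index.html;user?id=5#comment

# Splitting the result again gives back the same five values.
print(list(urlsplit(url)) == parts)  # True
```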

OK, it's the New Year, so learning a little and remembering the key points is enough. The code can be found here.
