This is the 15th day of my participation in the First Challenge 2022.
I wish you all a happy and prosperous New Year.
Yesterday we learned a little about cookies and exception handling in urllib. Today we will learn how to parse links with urllib.
The urllib.parse module defines a standard interface for handling URLs, such as extracting, merging, and converting the parts of a URL.
urlparse
Here’s an example:
from urllib.parse import urlparse
result = urlparse("https://baidu.com/index.html;user?id=5#comment")
print(type(result))
print(result)
Results obtained:
<class 'urllib.parse.ParseResult'>
ParseResult(scheme='https', netloc='baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
As you can see, urlparse returns a ParseResult object. Its signature is urlparse(urlstring, scheme='', allow_fragments=True), so it takes three arguments:
- urlstring: the URL to parse. This one is required.
- scheme: the default scheme. It is used only when the URL itself does not specify one (such as https); if the URL has a scheme, this argument is ignored.
- allow_fragments: whether to recognize the fragment. It defaults to True. Setting it to False leaves the fragment field empty, and the fragment text is kept as part of the preceding component (path, params, or query).
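The two optional arguments are easier to see in action. The following is a small sketch, reusing the same baidu.com URL as above; the variable names are only for illustration. Note that the scheme-less URL starts with `//`, because without it urlparse treats `baidu.com` as part of the path rather than as the netloc.

```python
from urllib.parse import urlparse

# The URL has no scheme, so the scheme argument is used as the default.
no_scheme = urlparse("//baidu.com/index.html;user?id=5#comment", scheme="https")
print(no_scheme.scheme, no_scheme.netloc)

# The URL already has a scheme, so the scheme argument is ignored.
has_scheme = urlparse("http://baidu.com/index.html", scheme="https")
print(has_scheme.scheme)

# With allow_fragments=False the fragment field stays empty and
# "#comment" remains part of the query.
no_fragment = urlparse(
    "https://baidu.com/index.html;user?id=5#comment", allow_fragments=False
)
print(repr(no_fragment.fragment), no_fragment.query)
```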
urlunparse
The previous function parses a URL; I think of this one as the opposite, building a URL from its parts. Here is an example:
from urllib.parse import urlunparse
data = ["https", "www.baidu.com", "index.html", "user", "a=6", "comment"]
print(urlunparse(data))
Results: https://www.baidu.com/index.html;user?a=6#comment
Note that the argument must contain exactly six items (scheme, netloc, path, params, query, fragment), otherwise an exception is raised. It can be a list or a tuple; either way the URL is built successfully.
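To make the six-item rule concrete, here is a minimal sketch. It assumes the same example values as above, passes them as a tuple instead of a list, and then shows what happens with too few items. Components that are not needed are passed as empty strings.

```python
from urllib.parse import urlunparse

# A tuple works just like a list.
parts = ("https", "www.baidu.com", "index.html", "user", "a=6", "comment")
built = urlunparse(parts)
print(built)

# Components you do not need are passed as empty strings.
simple = urlunparse(("https", "www.baidu.com", "/index.html", "", "", ""))
print(simple)

# Fewer than six items cannot be unpacked, so a ValueError is raised.
try:
    urlunparse(["https", "www.baidu.com", "index.html"])
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```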
urlsplit
With my rudimentary English, this should be the URL split method. It is similar to urlparse, but it does not separate out the params part (params stay inside path), so only five results are returned. See the example:
from urllib import parse
result = parse.urlsplit("https://www.baidu.com/index.html;user?id=5#comment")
print(result)
SplitResult(scheme='https', netloc='www.baidu.com', path='/index.html;user', query='id=5', fragment='comment')
The result is a named tuple, so we can get a value either by attribute name or by index:
print(result.netloc, result[1])
www.baidu.com www.baidu.com
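To see the difference between the two parsers side by side, here is a small comparison sketch on the same URL. The variable names are only for illustration.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.baidu.com/index.html;user?id=5#comment"

parsed = urlparse(url)   # six fields, params separated from path
split = urlsplit(url)    # five fields, params left inside path

print(len(parsed), parsed.path, parsed.params)
print(len(split), split.path)

# Because the result is a tuple, it can also be unpacked directly.
scheme, netloc, path, query, fragment = split
print(scheme, netloc, path, query, fragment)
```

If the URLs you handle never use the `;params` syntax, urlsplit is usually enough.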
urlunsplit
This is the reverse of urlsplit: it joins the five parts back into a URL. There is not much more to say, so here is the example:
from urllib import parse
result = parse.urlsplit("https://www.baidu.com/index.html;user?id=5#comment")
print(result)
print(result.netloc, result[1])
# Join the five parts back into a URL string
unresult = parse.urlunsplit(result)
print(unresult)
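A common use of urlsplit together with urlunsplit is to change one part of a URL and put it back together. The following is a minimal sketch of that idea. It relies on the `_replace` method that named tuples provide; the new query value `id=6` is just an illustrative assumption.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://www.baidu.com/index.html;user?id=5#comment"
parts = urlsplit(url)

# Splitting and joining without changes gives back the original URL.
round_trip = urlunsplit(parts)
print(round_trip == url)

# Replace the query and drop the fragment, then rebuild the URL.
changed = parts._replace(query="id=6", fragment="")
new_url = urlunsplit(changed)
print(new_url)
```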
OK, it's the New Year, so learning and remembering these few points is enough for today.