Introduction to Python Crawlers

Python training

I. Basic introduction

1. The industry

2. Set up the development environment

1. Install Python 3.6. During installation, be sure to check the option that adds Python to PATH. 2. Install PyCharm.

3. How to create a Python project in PyCharm

1. Select Create New Project. 2. Change the project storage path; the last folder in the path is the project name.

4. How to create a Python file in a PyCharm project

1. Right-click the project. 2. Select New. 3. Select Python File.

5. The code

1. Output: print("...")
2. Input: input("...")
3. Variable names: must not start with a digit or a special symbol
4. Data types: use type() to get the data type of a variable, for example:

    num = 10
    type(num)

5. Data type conversion: str(), int(), float()
6. Operators
   Arithmetic: +, -, *, /, %
   Comparison: <, >, <=, >=, ==, !=
   Logical: and, or, not
   Assignment: =, +=, -=, *=, /=, %=
7. Branch statements. Branch statements in Python have the following syntax:

    if conditional_expression_1:
        pass  # runs when conditional_expression_1 is true
    elif conditional_expression_2:
        pass  # runs when conditional_expression_2 is true
    elif conditional_expression_3:
        pass  # runs when conditional_expression_3 is true
    else:
        pass  # runs when all of the above conditions are false

8. for loop
9. Random numbers: 1. import the package (import random); 2. get a random number, for example random.randint(1, 10)
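The basics above can be tried together in one small script. This is a minimal sketch; the function name `grade` and its score thresholds are made up for illustration.

```python
import random


def grade(score):
    """Map a numeric score to a letter using an if/elif/else chain."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 60:
        return "C"
    else:
        return "D"


# Type conversion: input() always returns a str, so convert before comparing
score = int("82")
print(type(score))  # <class 'int'>
print(grade(score))  # B

# for loop over a range, combined with random numbers
for i in range(3):
    n = random.randint(1, 100)  # random integer, both ends inclusive
    print(i, n, grade(n))
```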

II. Strings

1. Strings

The string type in Python is str.

2. Data storage

  • Temporary storage: disappears when the program closes

    Variables

    Class objects

    Lists, etc.

    Temporary storage in Java:

    Arrays

    Objects

    List

    Set

    Map

    Temporary storage in Python:

    Variables

    Objects

    Tuple: similar to an array in Java; defined with parentheses; immutable

    List: similar to an ArrayList in Java; defined with square brackets; its contents can change

    Dictionary: similar to a Map in Java; defined with curly braces; stores key-value pairs

  • Persistent storage: does not disappear when the program closes

    Files, databases, network storage
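The three Python containers above behave as follows. A short sketch; the values are made up.

```python
# Tuple: parentheses, immutable (like a fixed Java array)
point = (3, 4)

# List: square brackets, contents can change (like Java's ArrayList)
names = ["Tom", "Ann"]
names.append("Bob")

# Dictionary: curly braces, key-value pairs (like Java's Map)
ages = {"Tom": 18, "Ann": 20}
ages["Bob"] = 19

print(point[0], names, ages["Bob"])  # 3 ['Tom', 'Ann', 'Bob'] 19

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as e:
    print("TypeError:", e)
```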

III. Object-oriented programming

1. The concept

Object orientation is a way of thinking.

Compare it with process orientation: putting an elephant into a refrigerator takes several steps:

Open the door, put the elephant in, and close the door.

Object-oriented thinking:

1. Identify which objects exist in the problem.

2. Consider the relationships between the objects.

Object: a concrete, real thing

Class: an abstract concept that groups objects with the same attributes and behaviors

In real life, objects come before classes.

In code, you need a class before you can create an object.

Converting a subclass to its parent class: no risk.

Converting a parent class to a subclass: may raise a type-conversion exception.

The three features of object orientation: inheritance, polymorphism, encapsulation.

2. Code

__init__: equivalent to a Java constructor.

__variable_name: defines a private attribute that cannot be accessed directly from outside the class (Python internally renames it to _ClassName__variable_name).

self: similar to Java's this.

class C(A, B): multiple inheritance. The order of the parent classes inside the parentheses matters: if a method is not found in the child class and several parent classes define a method with that name, Python searches the parent classes from left to right.
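A minimal sketch of the points above. The class names `Person`, `A`, `B` and `C` are illustrative only.

```python
class Person:
    def __init__(self, name, age):
        # __init__ plays the role of a Java constructor; self is like Java's this
        self.name = name
        self.__age = age  # double underscore: "private" attribute

    def get_age(self):
        return self.__age


class A:
    def hello(self):
        return "hello from A"


class B:
    def hello(self):
        return "hello from B"


class C(A, B):  # multiple inheritance: A is searched before B
    pass


p = Person("Tom", 18)
print(p.get_age())  # 18
try:
    print(p.__age)  # not accessible directly from outside the class
except AttributeError:
    print("AttributeError")
print(p._Person__age)  # 18: Python actually renames it (name mangling)

print(C().hello())  # hello from A
print([cls.__name__ for cls in C.__mro__])  # ['C', 'A', 'B', 'object']
```

`C.__mro__` (method resolution order) is the exact left-to-right search order Python uses for the parent classes.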

3. Methods (functions)

  • Java method definition

    access_modifier [modifier] return_type methodName([parameter list]) { method body; return ...; }

  • Calling a Java method

    methodName([argument list])

  • Python method definition

    def method_name([parameter list]):

        method body

        return

  • Calling a Python method

    method_name([argument list])

  • Arguments in Python can be passed in the order of the parameters (positionally) or by parameter name (keyword arguments)

  • Parameters that have default values can be omitted when calling

  • The return value of a method can be received in a variable

  • A method can return multiple values, which can be received in multiple variables

  • Benefits of methods: they encapsulate code, which

    1. protects the internal code;
    2. makes external calls easy;
    3. reduces code coupling
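The rules about arguments and return values can be seen in a short sketch; `describe` and `min_max` are made-up example functions.

```python
def describe(name, age=18, city="Beijing"):
    """Parameters with default values may be omitted by the caller."""
    return "%s, %d, %s" % (name, age, city)


def min_max(numbers):
    """Return two values at once (Python packs them into a tuple)."""
    return min(numbers), max(numbers)


print(describe("Tom"))                   # Tom, 18, Beijing  (defaults used)
print(describe("Ann", 20))               # Ann, 20, Beijing  (positional)
print(describe("Bob", city="Shanghai"))  # Bob, 18, Shanghai (keyword)

low, high = min_max([4, 1, 9])           # unpack multiple return values
print(low, high)                         # 1 9
```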

IV. File I/O, email, GUI

1. File operations

Module: os

Check whether a file exists:

os.path.exists(file_path)

It returns True if the file exists, otherwise False.

How to create multi-level folders:

os.makedirs("folder path")

Each / in the path separates a folder from its subfolder.

For example, with a/b/c, folder a is created, then folder b inside a, then folder c inside b.

open: opens a file

file_object = open(file_path, mode, encoding)

File path: a relative path is relative to the location of the current .py file (./ is the current folder, ./xx is a folder one level down); an absolute path is the file's location on the computer, for example C:/a/b/test.txt

For example: file_object = open("file path", mode="mode", encoding="utf-8")

Modes:

w: write. If the file does not exist, it is created; if it already exists, its old content is discarded and writing starts from an empty file.

r: read

a: append. If the file does not exist, it is created; if it already exists, new content is added to the end of the existing file.

File operations

Write: file_object.write(content)

Read: content = file_object.read() reads everything in the file

Close: file_object.close()
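Putting the file operations together: a sketch that works in a temporary folder so it can run anywhere. It uses the `with` statement instead of calling `close()` by hand, because `with` closes the file even if an error occurs. The folder and file names are placeholders.

```python
import os
import tempfile

base = tempfile.mkdtemp()               # scratch folder for this demo
folder = os.path.join(base, "a", "b", "c")

# Create the multi-level folders a/b/c only if they do not exist yet
if not os.path.exists(folder):
    os.makedirs(folder)

path = os.path.join(folder, "test.txt")

# "w": create the file, or discard its old content
with open(path, mode="w", encoding="utf-8") as file:
    file.write("line 1\n")

# "a": append to the end of the existing file
with open(path, mode="a", encoding="utf-8") as file:
    file.write("line 2\n")

# "r": read everything back
with open(path, mode="r", encoding="utf-8") as file:
    content = file.read()

print(os.path.exists(path))  # True
print(content)
```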

2. time

time.sleep(seconds): blocks the program for the given number of seconds

time.time(): gets the current time as the number of seconds since the epoch (a float)
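A small sketch that uses both calls to measure how long a sleep takes. Crawlers use the same `time.sleep` to pause between requests.

```python
import time

start = time.time()          # seconds since the epoch, as a float
time.sleep(0.2)              # block the program for 0.2 seconds
elapsed = time.time() - start
print("elapsed: %.2f s" % elapsed)
```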

3. Sending email

Required imports:

    from email.mime.text import MIMEText

    import smtplib

Steps

1. Prepare the required data:

Sender nickname

Sender account

Authorization code of the sender account

Recipient email address

Email title

Email body

2. Assemble the mail with MIMEText:

    from email.mime.text import MIMEText
    msg = MIMEText(text)
    msg["From"] = sender_nickname
    msg["Subject"] = title

3. Log in to the email account with smtplib:

    import smtplib
    # 1. Connect to the SMTP server
    client = smtplib.SMTP("smtp.qq.com", 25)
    # 2. Log in with the sender's email address and authorization code
    client.login(sender_email, auth_code)

4. Send the email, passing the sender email address, the recipient email address, and the mail:

    client.sendmail(sender_email, receiver_email, msg.as_string())

5. Log out:

    client.quit()
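The five steps can be combined into two functions. This is a sketch assuming QQ Mail's SMTP server (`smtp.qq.com`, port 25, as above); `build_message` and `send_mail` are hypothetical helper names, and the addresses and authorization code are placeholders. Attaching the nickname to the address with `email.utils.formataddr` is an addition to the original steps. Only `build_message` runs offline; `send_mail` needs a real account.

```python
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(nickname, sender, receiver, title, body):
    # Step 2: assemble the mail (plain text, UTF-8)
    msg = MIMEText(body, "plain", "utf-8")
    msg["From"] = formataddr((nickname, sender))
    msg["To"] = receiver
    msg["Subject"] = title
    return msg


def send_mail(sender, auth_code, receiver, msg):
    # Steps 3-5: log in, send, log out (requires network access)
    client = smtplib.SMTP("smtp.qq.com", 25)
    try:
        client.login(sender, auth_code)  # authorization code, not the password
        client.sendmail(sender, [receiver], msg.as_string())
    finally:
        client.quit()  # always log out, even if sending fails


msg = build_message("Crawler bot", "sender@qq.com", "receiver@example.com",
                    "Crawl finished", "All chapters were saved.")
print(msg["Subject"])  # Crawl finished
```

Wrapping the send in `try/finally` guarantees that step 5 (logging out) happens even when login or sending raises an exception.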

4. Visualization (GUI)

Similar to GUI programming in Java

Package:

    tkinter

Window

Create a window:

    window = tkinter.Tk()

Configure the window

Set the window title: window.title("title")

Set the window size: window.geometry("400x300")

Set whether the window width and height can be changed: window.resizable(False, False)

Display the window:

    window.mainloop()

Components

Text label: Label

Attribute text: the text displayed

Input box: Entry

Attribute width: the width

Button: Button

Note: after creating a component, call its pack() method so that it is displayed.

V. Front-end web pages

Why?

Crawlers get their data from web pages, so you need to understand how web pages are built.

Later, when we build a Flask project, we will need a front-end to display the data, so we also need to be able to write front-end web pages.

Learning path
  1. Set up the environment

  2. Install the development tools

  3. Create a project

    css: stores the CSS code files; img: stores the image files used in the project, such as GIF images; js: stores the JS code files

  4. Write the code

1. HTML

Tag format: <tag_name>

For example: <span></span> <p></p> <h1></h1> <ul></ul> <li></li> <a></a> ...

These tags can contain other tags: <start_tag attributes>content</end_tag>

<tag_name />

For example:

    <img />
    <meta />
    <br />
    <hr />

These are called self-closing (single) tags and cannot contain other tags: <tag_name attributes />

All tags have id and class attributes. id: like a person's ID number, an id value must not be repeated within a web page. class: classifies tags, so the same value can be repeated. Click events can be set on all tags.
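Because id values are unique and class values can repeat, a crawler usually locates a single element by id and a group of elements by class. The sketch below shows the idea with Python's built-in `html.parser`. The sample HTML and the class name `TagFinder` are invented for illustration; the crawler sections later use lxml XPath for the same job.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the names of tags that carry a given id or class."""

    def __init__(self, id_value=None, class_value=None):
        super().__init__()
        self.id_value = id_value
        self.class_value = class_value
        self.by_id = []
        self.by_class = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.id_value and attrs.get("id") == self.id_value:
            self.by_id.append(tag)
        # class may hold several space-separated names
        if self.class_value and self.class_value in (attrs.get("class") or "").split():
            self.by_class.append(tag)


page = """
<div id="content">
  <p class="item">one</p>
  <p class="item big">two</p>
  <span class="note">three</span>
</div>
"""

finder = TagFinder(id_value="content", class_value="item")
finder.feed(page)
print(finder.by_id)     # ['div']
print(finder.by_class)  # ['p', 'p']
```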

Text tags:
span: text tag (newer)
font: text tag (older, obsolete in HTML5)
h1~h6: heading tags
p: paragraph
a: hyperlink; attribute href. Uses: jump to an external URL, jump to an internal page, or jump to a specified area of the current page

Image: img displays an image; attribute src (the address of the image). Note: GIF images are supported.

Input: input; attributes:
type: the value is predefined by the system and cannot be customized: text (the default), password, button, radio (single choice), checkbox (multiple choice)
value (custom): when type is text or password, it is the content of the input box; when type is button, it is the text shown on the button; when type is radio or checkbox, it is the value of that option
name (custom): radio or checkbox inputs with the same name form one group

Media: audio: audio; video: video; src: location of the resource file

Others: br: line break; hr: horizontal rule; ul: unordered list, li: list item; ol: ordered list, li: list item; select: drop-down selector, option: an option in it; div: block

2. CSS (Cascading Style Sheets)

Purpose: beautify and position HTML tags.

1. Where to write CSS code
Option 1 (not recommended): in the tag's style attribute (inline style)
Option 2 (for beginners): add a style tag inside the head tag and write the CSS code in it
Option 3 (for experienced use): write it in a CSS file and import it into the HTML file with a link tag

2. How does CSS find HTML tags? Through selectors. Common selectors:
id selector: #name
class selector: .name
tag selector: p, a, div, ...
universal selector: *{}

3. If several selectors style the same tag, which one wins (selector weight)?
Same priority: the one written last wins
Different priority: the one with the higher priority wins
Priority: universal selector < tag selector < class selector < id selector < inline style

4. What can be styled?
Text: size font-size; font font-family; style font-style; weight font-weight; color color
  Color values: a predefined color name; six-digit hex #RRGGBB, each channel 00~FF (for example color: #009900;); rgb(r, g, b), each channel 0~255 (for example color: rgb(0,0,255);); rgba(r, g, b, a), where a is the opacity, 1 = opaque and 0 = fully transparent (for example rgba(0,0,0,0.5))
  Underline: text-decoration: underline
Background: background-color; background-image: url("");
  background-size: with 1 value, the width of the background (px or %); with 2 values, the width and height (px or %)
  background-repeat: no-repeat (not tiled) or repeat (tiled, the default)
  background-position: first value horizontal (left, right, center), second value vertical (top, bottom, center)
Box model: width, height, padding (top right bottom left), margin (top right bottom left), border, border-radius (for example border-top-left-radius)
  box-sizing: content-box (standard box) or border-box (IE box)
  Standard box footprint: width/height + left-right/top-bottom padding + left-right/top-bottom border + left-right/top-bottom margin
  IE box footprint: width/height + left-right/top-bottom margin (padding and border are already included in width/height)
Positioning (position):
  relative: the element is positioned relative to its normal position
  absolute: the element is positioned relative to its nearest positioned ancestor, or relative to <html> if there is none
  Centering with margin: margin: 0 auto (first value: top and bottom margins; second value: left and right margins, auto adapts); the element must be a block element
  Centering with positioning (does not affect other tags): left: 50%; margin-left: -(width/2)px;
  Stacking level: z-index
Floating: float: left; float: right; clear floats with clear
Shadow: box-shadow: x y blur-radius color
Animation and transition: animation, transition

5. Element types
Inline elements: width and height have no effect, the width follows the content (span, a)
Inline-block elements: width and height work, and they do not take a whole line (input, img, etc.)
Block elements: width and height work, and each takes a whole line (p, div, li, h1~h6, etc.)
Change the element type with the display property: block (block element), inline-block (inline-block element), inline (inline element), none (hides the element without keeping its space)
visibility: hidden hides the element but keeps its space
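The two box-model footprint formulas can be checked with a little arithmetic. This sketch covers the horizontal direction only and assumes equal left and right values; `footprint_width` is a hypothetical helper, not a CSS API.

```python
def footprint_width(width, padding, border, margin, box_sizing="content-box"):
    """Horizontal space a box occupies on the page, in px."""
    if box_sizing == "content-box":
        # Standard box: width covers only the content area
        return width + 2 * padding + 2 * border + 2 * margin
    if box_sizing == "border-box":
        # IE box: width already includes padding and border
        return width + 2 * margin
    raise ValueError("unknown box-sizing: " + box_sizing)


# width: 100px; padding: 10px; border: 1px; margin: 5px
print(footprint_width(100, 10, 1, 5))                # 132
print(footprint_width(100, 10, 1, 5, "border-box"))  # 110
```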

VI. Python crawlers

1. Static crawling

  1. Imports: network requests: from urllib.request import Request, urlopen; decompressing or compressing data: import gzip; HTTPS certificate verification: import ssl

    Converting data to HTML format: from lxml import etree

from urllib.request import Request, urlopen  # network request package
import gzip  # decompress or compress data
import ssl  # HTTPS certificate verification
from lxml import etree  # convert data to HTML format

if __name__ == '__main__':
    # Skip HTTPS certificate verification (convenient for practice, but insecure)
    ssl._create_default_https_context = ssl._create_unverified_context
    path = "https://www.biqooge.com/0_3/5372509.html"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.67",
        "accept-encoding": "gzip",
    }
    # Encapsulate the request
    req = Request(url=path, headers=headers)
    # Open the connection
    conn = urlopen(req)
    # Check whether the connection succeeded (response code 200)
    if conn.code == 200:
        # Get the data returned by the server
        data = conn.read()
        # Check whether the data is gzip-compressed
        isGzip = conn.headers.get("content-encoding")
        print(isGzip)
        if isGzip == "gzip":
            # Decompress the data
            data = gzip.decompress(data)
        # Decode the bytes with the site's encoding (GBK)
        data = data.decode(encoding="gbk")

        # Parse the data
        # 1. Convert the data to HTML format
        html = etree.HTML(data)
        # 2. Get the chapter name with an XPath
        titleXpath = "//div[@class='bookname']/h1/text()"
        titleTag = html.xpath(titleXpath)
        if len(titleTag) > 0:
            print(titleTag[0])
            # 3. Get the chapter text and put each piece on its own line
            textXpath = "//div[@id='content']/text()"
            textTag = html.xpath(textXpath)
            text = "\n".join("".join(textTag).split())
            print(text)
            # 4. Store locally (append mode)
            with open("%s.txt" % titleTag[0], mode="a", encoding="utf-8") as file:
                file.write(titleTag[0] + "\n")
                file.write(text)
    else:
        print("Connection failed")
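The decompress, decode and clean-up steps in the script do not depend on the network, so they can be pulled into functions and tested offline. A sketch; `decode_body` and `clean_text` are hypothetical names, and the sample bytes are produced locally with `gzip.compress` rather than downloaded.

```python
import gzip


def decode_body(raw, content_encoding, encoding):
    """Decompress the response body if needed, then turn bytes into str."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(encoding=encoding)


def clean_text(pieces):
    """Join text nodes, split on any whitespace (including &nbsp;), one piece per line.

    Fine for Chinese novel text, which has no spaces between words.
    """
    return "\n".join("".join(pieces).split())


# Simulate a gzip-compressed, GBK-encoded response
raw = gzip.compress("第一章 开始".encode("gbk"))
html_text = decode_body(raw, "gzip", "gbk")
print(html_text)  # 第一章 开始

print(clean_text(["\xa0\xa0第一段\r\n", "\xa0\xa0第二段"]))
```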
  2. Crawl the table of contents, then crawl each chapter in it
from urllib.request import Request, urlopen  # network request package
import gzip  # decompress or compress data
from lxml import etree
# The previous single-chapter script; it must expose a main(path) function
import Demo01_crawler as pc
import time
import random

paths = {}


def download():
    path = "https://www.biqooge.com/0_3/"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.67",
        "accept-encoding": "gzip",
    }
    # Encapsulate the request
    req = Request(url=path, headers=headers)
    # Open the connection
    conn = urlopen(req)
    # Check whether the connection succeeded
    if conn.code == 200:
        # Get the data returned by the server
        data = conn.read()
        # Check whether the data is gzip-compressed
        isGzip = conn.headers.get("content-encoding")
        print(isGzip)
        if isGzip == "gzip":
            # Decompress the data
            data = gzip.decompress(data)
        # Decode the bytes with the site's encoding (GBK)
        data = data.decode(encoding="gbk")
        # Parse the data
        # 1. Convert the data to HTML format
        html = etree.HTML(data)
        # 2. Locate the <a> tags of the chapter list
        aXpath = "//div[@id='list']/dl/dd/a"
        aTag = html.xpath(aXpath)
        print(aTag)
        n = 5  # only collect the first 5 chapters
        for a in aTag:
            if n <= 0:
                break
            itemPath = a.xpath("./@href")
            itemName = a.xpath("./text()")
            # Wait a random 1-4 seconds
            tt = random.randint(1, 4)
            time.sleep(tt)
            if len(itemPath) > 0 and len(itemName) > 0:
                print(itemName[0], "https://www.biqooge.com" + itemPath[0])
                paths[itemName[0]] = "https://www.biqooge.com" + itemPath[0]
                n -= 1
    else:
        print("Connection failed")


if __name__ == '__main__':
    download()
    for name in paths:
        pc.main(paths[name])
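The script builds chapter URLs by string concatenation (`"https://www.biqooge.com" + itemPath[0]`), which only works when every href starts with `/`. A sketch of a more forgiving alternative using `urllib.parse.urljoin`, which resolves absolute paths, relative paths and full URLs against the page URL. `collect_chapters` and the sample hrefs are made up.

```python
from urllib.parse import urljoin


def collect_chapters(page_url, links, limit=5):
    """links: (name, href) pairs from the table of contents; keep the first `limit`."""
    chapters = {}
    for name, href in links[:limit]:
        # urljoin handles "/abs/path.html", "relative.html" and full URLs alike
        chapters[name] = urljoin(page_url, href)
    return chapters


links = [
    ("Chapter 1", "/0_3/5372509.html"),
    ("Chapter 2", "5372510.html"),
    ("Chapter 3", "https://example.com/other.html"),
]
chapters = collect_chapters("https://www.biqooge.com/0_3/", links)
for name, url in chapters.items():
    print(name, url)
```

The random `time.sleep` is most useful right before each real request, that is, inside the loop that calls the chapter crawler, rather than while only reading links from a page that is already downloaded.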

2. Downloading resources (images, files, videos)

  1. Downloading images
Download a file with urlretrieve: urlretrieve(url="image URL", filename="local file path")
from urllib.request import Request, urlopen, urlretrieve  # urlretrieve(url=..., filename=...) downloads a file
# etree.HTML(data) converts data to HTML
from lxml import etree
# gzip.decompress() decompresses data
import gzip
# HTTPS certificate verification
import ssl
# File operations
import os

if __name__ == '__main__':
    # Skip HTTPS certificate verification
    ssl._create_default_https_context = ssl._create_unverified_context

    # Request address
    path = "https://www.baidu.com"
    # Request headers
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.67",
        "accept-encoding": "gzip",
    }
    # Encapsulate the request
    req = Request(url=path, headers=headers)
    # Open the connection
    conn = urlopen(req)
    # Check whether the request succeeded
    if conn.code == 200:
        # Read the data returned by the server
        data = conn.read()
        # The response may be gzip-compressed; decompress it only if it is
        if conn.headers.get("content-encoding") == "gzip":
            data = gzip.decompress(data)
        # Decode with the site's encoding, otherwise Chinese text is garbled
        data = data.decode(encoding="utf-8")

        # Parse
        # Convert to HTML format
        html = etree.HTML(data)
        # Prepare an XPath for the data to fetch
        imgTag = "//img[@id='s_lg_img']/@src"
        # Run the XPath to fetch the data
        imgs = html.xpath(imgTag)
        # Complete the address (the src starts with //)
        if len(imgs) > 0:
            imgPath = "https:" + imgs[0]
            print(imgs)
            # Create the folder if it does not exist
            if not os.path.exists("./img"):
                os.makedirs("./img")
            # Download
            urlretrieve(url=imgPath, filename="./img/bd.png")
    else:
        print("Connection failed!")
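The script saves the image under the fixed name `./img/bd.png`. When downloading many files it helps to derive the local name from the URL. A sketch; `local_path_for` is a hypothetical helper, the sample URLs are made up, and falling back to `download.bin` when the URL has no file name is an arbitrary choice.

```python
import os
import tempfile
from urllib.parse import urlparse


def local_path_for(url, folder="./img", default="download.bin"):
    """Build a local file path from the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path) or default
    os.makedirs(folder, exist_ok=True)  # same effect as the exists()/makedirs() check
    return os.path.join(folder, name)


folder = tempfile.mkdtemp()
p1 = local_path_for("https://example.com/img/logo.png?v=2", folder)
p2 = local_path_for("https://example.com/", folder)
print(p1)
print(p2)
```

The result can then be passed to the download call, for example `urlretrieve(url=img_url, filename=local_path_for(img_url))`.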

3. Crawling dynamic resources

  1. Crawling Baidu Images (dynamically loaded content)

    # Chrome browser driver for Selenium (Selenium 4 API)
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from urllib.request import urlretrieve
    import time
    import os

    if __name__ == '__main__':
        # Load the driver: Chrome(service=Service("driver path"))
        chrome = Chrome(service=Service("./chromedriver.exe"))
        # Send the request (Baidu Images search for 风景, "scenery")
        chrome.get(url="https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E9%A3%8E%E6%99%AF&oq=%E9%A3%8E%E6%99%AF&rsp=-1")
        # Block the program for a while so the browser can load the dynamic data
        time.sleep(10)
        # Get the tags whose class attribute is main_img
        # find_elements(By.CLASS_NAME, ...): get tags by class attribute value
        # find_element(By.ID, ...): get a tag by its id
        # find_elements(By.TAG_NAME, ...): get tags by tag name
        # (older code used find_elements_by_class_name etc., removed in newer Selenium 4 releases)
        imgs = chrome.find_elements(By.CLASS_NAME, "main_img")
        # Create the local folder once, before the loop
        if not os.path.exists("./img"):
            os.makedirs("./img")
        num = 0
        # Walk through the data
        for img in imgs:
            # get_attribute: gets the value of the specified attribute
            # tag.text: gets the text content of the tag
            imgPath = img.get_attribute("src")
            print(imgPath)
            if not imgPath:
                continue  # skip images whose src is not set yet
            num += 1
            # Download to local
            urlretrieve(url=imgPath, filename="./img/%d.jpg" % num)
        print(imgs)
        # Exit the browser
        chrome.quit()
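Most of the long Baidu Images URL is query parameters. The search keyword is the `word` parameter, percent-encoded as UTF-8: `%E9%A3%8E%E6%99%AF` is 风景 ("scenery"). A sketch that builds such a URL with `urllib.parse.urlencode`. Keeping only the `tn`, `ie` and `word` parameters is an assumption, and whether Baidu returns the same page for the shortened URL is untested; `build_search_url` is a hypothetical name.

```python
from urllib.parse import quote, urlencode


def build_search_url(keyword):
    # Only a few of the original parameters are kept (assumption)
    params = {"tn": "baiduimage", "ie": "utf-8", "word": keyword}
    return "https://image.baidu.com/search/index?" + urlencode(params)


print(quote("风景"))  # %E9%A3%8E%E6%99%AF
print(build_search_url("风景"))
```

The resulting string can be passed to the driver's `get()` method to search for a different keyword.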

4. Crawling data and saving it to a database

  1. Saving to the database

    from urllib.request import Request, urlopen
    import ssl
    import gzip
    from lxml import etree
    # The package Python uses to operate MySQL
    import pymysql

    def getData(path, headers, encoding):
        ssl._create_default_https_context = ssl._create_unverified_context
        req = Request(url=path, headers=headers)
        conn = urlopen(req)
        if conn.code == 200:
            data = conn.read()
            if conn.headers.get("Content-Encoding") == "gzip":
                data = gzip.decompress(data)
            return data.decode(encoding=encoding)
        else:
            print("Error code:", conn.code)
            return ""

    def saveBook(bookName, bookPath):
        # Get the database connection object
        # host: database IP, port: port number, user: account, password: password,
        # database: database name, charset: encoding
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                               password="mysql", database="test", charset="utf8")
        # Get a cursor to execute SQL
        cursor = conn.cursor()
        # %s placeholders: pymysql escapes the values, so quotes in names are safe
        sql = "insert into books (b_name, b_path) values (%s, %s)"
        # execute() returns the number of affected rows
        # (for a query, use fetchall() to get the result set)
        result = cursor.execute(sql, (bookName, bookPath))
        if result < 1:
            print("Insert failed!")
        else:
            # Commit, because the data in the database was changed
            conn.commit()
        # Close the database connection
        cursor.close()
        conn.close()

    if __name__ == '__main__':
        path = "https://www.xbiquge.la/xiaoshuodaquan/"
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
            "Accept-Encoding": "gzip",
        }
        booksData = getData(path=path, headers=headers, encoding="utf-8")
        bookshtml = etree.HTML(booksData)
        booksXpath = "//div[@id='main']/div[@class='novellist']/ul/li/a"
        books_a = bookshtml.xpath(booksXpath)
        for book_a in books_a:
            bookName = book_a.xpath("./text()")
            bookPath = book_a.xpath("./@href")
            if bookName and bookPath:
                saveBook(bookName[0], bookPath[0])

        print(booksData)
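Building SQL with `%` string formatting breaks as soon as a book name contains a quote, and it opens the door to SQL injection; passing the values separately lets the driver escape them. The sketch demonstrates this with Python's built-in `sqlite3` so it runs without a MySQL server. Note that the placeholder differs: sqlite3 uses `?`, while PyMySQL uses `%s`. The table layout mirrors the `books` table above but is an assumption, and `save_book` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table books (b_id integer primary key, b_name text, b_path text)")


def save_book(conn, name, path):
    cursor = conn.cursor()
    # Values are passed separately; the driver quotes them safely
    cursor.execute("insert into books (b_name, b_path) values (?, ?)", (name, path))
    conn.commit()
    return cursor.rowcount  # number of inserted rows


print(save_book(conn, "Tom's Diary", "https://example.com/1/"))  # 1
print(conn.execute("select b_name from books").fetchall())       # [("Tom's Diary",)]
```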
  2. Reading from the database

    # The package Python uses to operate MySQL
    import pymysql

    class Book:
        def __init__(self, id, name, path):
            self.id = id
            self.name = name
            self.path = path

        def __str__(self):
            return "id:%d name:%s path:%s" % (self.id, self.name, self.path)

    def getBooks():
        books = []
        # Get the database connection object
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                               password="mysql", database="test", charset="utf8")
        # Get a cursor to execute SQL
        cursor = conn.cursor()
        sql = "select * from books"
        # Execute SQL; for a query, execute() returns the number of rows
        cursor.execute(sql)
        # fetchall() returns the query result as a tuple of rows
        result = cursor.fetchall()
        for book in result:
            id = book[0]
            name = book[1]
            path = book[2]
            books.append(Book(id=id, name=name, path=path))
        # Close the database connection
        cursor.close()
        conn.close()
        print("Query complete!")
        return books

    if __name__ == '__main__':
        # Query the books table
        books = getBooks()
        num = 0
        for book in books:
            if num > 10:
                break
            print(book.name + ":\t" + book.path)
            num += 1
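Instead of fetching every row and counting in Python, the database can return only the rows you need with `LIMIT`, and each row can be unpacked straight into an object. Sketched with `sqlite3` and a dataclass, both substitutions for the PyMySQL code and the hand-written `Book` class above; the sample rows are invented.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Book:
    id: int
    name: str
    path: str

    def __str__(self):
        return "id:%d name:%s path:%s" % (self.id, self.name, self.path)


conn = sqlite3.connect(":memory:")
conn.execute("create table books (b_id integer primary key, b_name text, b_path text)")
conn.executemany("insert into books (b_name, b_path) values (?, ?)",
                 [("Book %d" % i, "https://example.com/%d/" % i) for i in range(1, 21)])

# Ask the database for 11 rows (the loop above also prints 11 books)
rows = conn.execute("select b_id, b_name, b_path from books order by b_id limit ?",
                    (11,)).fetchall()
books = [Book(*row) for row in rows]  # one object per row
print(len(books))  # 11
print(books[0])    # id:1 name:Book 1 path:https://example.com/1/
```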
  3. Reading the books from the database, crawling each book's chapters, and storing them

    from urllib.request import Request, urlopen
    import ssl
    import gzip
    from lxml import etree
    # The package Python uses to operate MySQL
    import pymysql

    class Book:
        def __init__(self, id, name, path):
            self.id = id
            self.name = name
            self.path = path

        def __str__(self):
            return "id:%d name:%s path:%s" % (self.id, self.name, self.path)

    class Item:
        def __init__(self, id, bid, name, path):
            self.id = id
            self.bid = bid
            self.name = name
            self.path = path

        def __str__(self):
            return "Chapter %d: %s---%s" % (self.id, self.name, self.path)

    def saveItem(bid, name, path):
        # Get the database connection object
        # host: database IP, port: port number, user: account, password: password,
        # database: database name, charset: encoding
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                               password="mysql", database="test", charset="utf8")
        # Get a cursor to execute SQL
        cursor = conn.cursor()
        # %s placeholders for every value; pymysql quotes them safely
        sql = "insert into items (b_id, i_name, i_path) values (%s, %s, %s)"
        # execute() returns the number of affected rows
        result = cursor.execute(sql, (bid, name, path))
        if result < 1:
            print("Insert failed!")
        else:
            # Commit, because the data in the database was changed
            conn.commit()
        # Close the database connection
        cursor.close()
        conn.close()

    # Get all chapters of a saved book and store them
    def getData(headers, book):
        ssl._create_default_https_context = ssl._create_unverified_context
        req = Request(url=book.path, headers=headers)
        conn = urlopen(req)
        if conn.code == 200:
            data = conn.read()
            if conn.headers.get("Content-Encoding") == "gzip":
                data = gzip.decompress(data)
            itemsData = data.decode(encoding="utf-8")
            itemshtml = etree.HTML(itemsData)
            item_aXpath = "//div[@id='list']/dl/dd/a"
            item_apath = itemshtml.xpath(item_aXpath)
            for item_a in item_apath:
                i_name = item_a.xpath("./text()")
                i_path = item_a.xpath("./@href")
                if not i_name or not i_path:
                    continue
                # Build the full chapter URL from the book URL and the href
                i_path = book.path + i_path[0]
                saveItem(book.id, i_name[0], i_path)
            print("Done!")
        else:
            print("Error code:", conn.code)

    def getBooks():
        books = []
        # Get the database connection object
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                               password="mysql", database="test", charset="utf8")
        # Get a cursor to execute SQL
        cursor = conn.cursor()
        sql = "select * from books"
        # Execute SQL; for a query, execute() returns the number of rows
        cursor.execute(sql)
        # fetchall() returns the query result as a tuple of rows
        result = cursor.fetchall()
        for book in result:
            books.append(Book(id=book[0], name=book[1], path=book[2]))
        # Close the database connection
        cursor.close()
        conn.close()
        print("Query complete!")
        return books

    if __name__ == '__main__':
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
            "Accept-Encoding": "gzip",
        }
        # Query the books table
        books = getBooks()
        num = 0
        for book in books:
            if num > 10:
                break
            print(book.name + ":\t" + book.path)
            # Get all chapters of each novel and store them in the database
            getData(headers=headers, book=book)
            num += 1
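`saveItem` opens a new MySQL connection for every chapter. A sketch of the alternative: collect all chapters of a book first, then insert them with one `executemany` call inside a single transaction. It is shown with `sqlite3` (placeholder `?`) so it runs offline; PyMySQL cursors also provide `executemany`, with `%s` placeholders. `save_items` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (i_id integer primary key, b_id integer, i_name text, i_path text)")


def save_items(conn, bid, chapters):
    """chapters: list of (name, url) pairs for one book."""
    rows = [(bid, name, url) for name, url in chapters]
    with conn:  # commits on success, rolls back if an exception occurs
        conn.executemany("insert into items (b_id, i_name, i_path) values (?, ?, ?)", rows)
    return len(rows)


chapters = [("Chapter %d" % i, "https://example.com/1/%d.html" % i) for i in range(1, 4)]
print(save_items(conn, 1, chapters))  # 3
print(conn.execute("select count(*) from items where b_id = 1").fetchone()[0])  # 3
```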

VII. Flask

Tip: when creating a project, keep the environment (for example a virtual environment) separate from the code.

1. Install Flask and PyMySQL

pip install flask pymysql -i https://mirrors.aliyun.com/pypi/simple

2. Introduction to Flask

  • Flask is a Python micro-framework for the web. At its core it implements the WSGI (Web Server Gateway Interface) specification for web services, and it includes the Jinja2 templating engine.

  • Flask implements only the core functionality; database operations have to be implemented yourself.

  • Fixing "unable to connect to MySQL 8.0 remotely":

    mysql> use mysql;
    mysql> select user,host from user;
    mysql> update user set host='%' where user='root' and host='localhost';
    mysql> grant all privileges on *.* to 'root'@'%';
    mysql> flush privileges;

  • Quick start

    • Alt+Insert: shortcut for creating a file (PyCharm)
    • static: folder for static files
    • templates: folder for template files
  • Flask service entry

import json
from flask import Flask, make_response, render_template, request
# DB: the database utility class defined below
from DB import DB
# get_book: the author's crawler function that returns one book's data
from spider_book import get_book

# Ctrl+P in PyCharm shows a method's parameters
app = Flask(__name__, static_url_path="/s", static_folder="static")


@app.route('/')  # route path assumed; the original decorator was lost
def index_handle():
    # When an instance enters a with context, its __enter__ method is called;
    # when it exits the context, its __exit__ method is called
    with DB() as c:
        c.execute('select * from books')
        ret = list(c.fetchall())
    return render_template('index.html', title="I am a library", books=ret)


@app.route('/search', methods=['GET', 'POST'])
def search_handle():
    kw = request.form.get('kw', '')  # request.form is dictionary-like
    ret = None
    if request.method == 'POST':
        sql = """
        select * from books where b_name like %s
        """
        with DB() as c:
            # If args is a list or tuple, %s can be used as a placeholder in the query.
            # If args is a dict, %(name)s can be used as a placeholder in the query.
            c.execute(sql, args=(f'%{kw}%',))
            ret = list(c.fetchall())
    return render_template('search.html', kw=kw, results=ret)


@app.route('/spider', methods=['GET'])
def spider_handle():
    path = request.args.get("path")
    # Get the book according to its path
    book = get_book(path)
    # json.dumps() converts a dict or list object to a JSON string
    # JSON format: object {"key": "value", ...}, array [{}, {}]
    # resp_data = json.dumps({'code': 100, 'msg': 'OK'})
    resp_data = json.dumps(book)
    response = make_response(resp_data)
    response.headers['Content-Type'] = 'application/json; charset=utf-8'
    # return response  # alternative: send the book as JSON instead of a page
    return render_template("book.html", book=book)


if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000, debug=True)
  • Database utility class
from pymysql import Connect
from pymysql.cursors import DictCursor

# Commands to run in the mysql client so that root can connect remotely:
"""
mysql> use mysql;
mysql> select user,host from user;
mysql> update user set host='%' where user='root' and host='localhost';
mysql> grant all privileges on *.* to 'root'@'%';
mysql> flush privileges;
"""
DB_CONFIG = {
    'host': '127.0.0.1',  # alternative: 180.76.121.47
    'port': 3306,  # alternative: 3307
    'user': 'root',
    'password': 'mysql',
    'database': 'test',
    'charset': 'utf8',
}


class DB:
    def __init__(self):
        # **DB_CONFIG unpacks the dict into keyword arguments
        self.conn = Connect(**DB_CONFIG, cursorclass=DictCursor)
        print('-- Database connection successful! --')

    def __enter__(self):
        # Called when the instance is used in a with statement
        return self.conn.cursor()  # return a cursor instance

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Called when the instance exits the with context
        if exc_type:
            # An error occurred while executing SQL: roll back the transaction
            self.conn.rollback()
            print('--->db_error:', exc_val)
        else:
            # SQL executed successfully: commit the transaction
            self.conn.commit()
        return True  # True: the exception is swallowed here; False: it keeps propagating

    def create_db(self, name):
        with self as c:
            c.execute(f'create database {name} charset utf8')

    def create_table(self, table_name, *fields):
        # fields -> ('id integer', 'name varchar(20)')
        with self as c:
            sql = 'create table %s (%s)'
            field_args = ', '.join(fields)
            c.execute(sql % (table_name, field_args))


if __name__ == '__main__':
    # When the instance enters the with context, its __enter__ method is called
    # When it exits the context, its __exit__ method is called
    with DB() as c:
        # c.execute("show tables")
        c.execute('desc books')
        for row in c.fetchall():
            print(row)

    # Create a database or a table
    db = DB()
    # db.create_db('group1')
    # db.create_table('table_name', 'id integer', 'name varchar(20)')
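The `with DB() as c:` pattern relies on `__enter__` and `__exit__`. The sketch below reproduces the same commit/rollback logic on top of `sqlite3` so the behavior can be observed without MySQL; `SqliteDB` is a hypothetical stand-in, not the author's class. Because `__exit__` returns `True`, the simulated error is swallowed and the program keeps running, while the failed insert is rolled back.

```python
import sqlite3


class SqliteDB:
    """Hypothetical sqlite3 stand-in for the DB utility class."""

    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        # Called when entering the with block; hand out a cursor
        return self.conn.cursor()

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Called when leaving the with block, with or without an exception
        if exc_type:
            self.conn.rollback()  # undo everything done inside the block
            print("--->db_error:", exc_val)
        else:
            self.conn.commit()  # make the changes permanent
        return True  # True swallows the exception; False would re-raise it


conn = sqlite3.connect(":memory:")
conn.execute("create table books (b_name text)")
conn.commit()

with SqliteDB(conn) as c:
    c.execute("insert into books values (?)", ("kept",))

with SqliteDB(conn) as c:
    c.execute("insert into books values (?)", ("discarded",))
    raise ValueError("simulated failure")  # rolled back, then swallowed

rows = [r[0] for r in conn.execute("select b_name from books")]
print(rows)  # ['kept']
```

Swallowing exceptions keeps a web request from crashing, but it also hides errors; returning `False` instead lets the caller see them.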


If you have any questions, please message me.