My mom set me up on a blind date before I've even graduated from college. Am I already on the matchmaking market? 【Python crawler: matchmaking data collection】

Hi, I’m Latiao.

Believe it or not, I'm still an undergrad and my family has already set me up on a blind date. Is it really that urgent? Is this what the marriage market looks like now? If you don't work hard, your family will catch you and marry you off.

This is a chat with my mom, and I’ll show you this girl.

And that was the end of it. Was there something wrong with how I handled the conversation? Guys, tell me I'm not hopelessly tactless. Do you really expect me to go out and chat someone up? That got me thinking about the matchmaking market, so I decided to crawl some data from a matchmaking site. It gives everyone a look at the situation single men and women are in today, and it's a chance to learn some technique along the way. Why not? Let's get straight to the point!


Crawl target

Website: XXXX matchmaking

Results

Tools

Development environment: Windows 10, Python 3.7
Development tools: PyCharm, Chrome
Toolkit: requests, docx (installed as the python-docx package), lxml

Key learning content

1. Extracting data with XPath
2. Storing data in a docx document

Project idea analysis

Pick the age group you want to browse (your "wealth password") and fetch the data for that listing page.

Extract the corresponding hyperlinks with XPath.

Extract the picture addresses as well, so the pictures can be saved later.

def get_data(url):
    # Fetch the listing page and parse it into an element tree
    response = requests.get(url, headers=headers)
    data = etree.HTML(response.text)
    # Relative links to each detail page, and the matching picture paths
    href_list = data.xpath("//div[@class='e-img']/a/@href")
    img_list = data.xpath("//div[@class='e-img']/a/img/@src")
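To see what these two XPath expressions return without touching the live site, here is a minimal sketch that runs them against a small inline HTML string. The markup in SAMPLE_HTML and the helper name extract_links are assumptions made for illustration; they only mimic the `e-img` structure that the expressions above expect.

```python
from lxml import etree

# Hypothetical markup that mimics the listing page's structure (an assumption).
SAMPLE_HTML = """
<html><body>
  <div class="e-img"><a href="/zhenghun/1001.html"><img src="/uploads/1001.jpg"></a></div>
  <div class="e-img"><a href="/zhenghun/1002.html"><img src="/uploads/1002.jpg"></a></div>
</body></html>
"""


def extract_links(page_source):
    """Return (href_list, img_list) for every profile card on the page."""
    tree = etree.HTML(page_source)
    # @href and @src select attribute values, so each result is a list of strings
    href_list = tree.xpath("//div[@class='e-img']/a/@href")
    img_list = tree.xpath("//div[@class='e-img']/a/img/@src")
    return href_list, img_list


href_list, img_list = extract_links(SAMPLE_HTML)
for href, img in zip(href_list, img_list):
    print(href, img)
```

Both lists come back in document order, which is why pairing them with zip lines each link up with its own picture.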

Join each relative link with the site's base URL to build the detail page address, fetch the detail page, and download the picture. The detail page provides:

Name
Education
Profession
Marital status
Work address
Requirements
...

    for href, img in zip(href_list, img_list):
        # Download the picture and save it to a temporary file
        img = requests.get("https://www.csflhjw.com" + img, headers=headers).content
        with open("1.jpg", "wb") as f:
            f.write(img)
        # Fetch and parse the detail page
        res = requests.get("https://www.csflhjw.com" + href, headers=headers)
        html = etree.HTML(res.text)
        # xpath() returns a list, so take the first match for each field
        name = html.xpath('//div[@class="team-e"]/h2/text()')[0]
        edu = html.xpath('//div[@class="team-e"]/p[1]/text()')[0]
        profession = html.xpath('//div[@class="team-e"]/p[2]/text()')[0]
        sponsa = html.xpath('//div[@class="team-e"]/p[3]/text()')[0]  # marital status
        children = html.xpath('//div[@class="team-e"]/p[4]/text()')[0]
        house = html.xpath('//div[@class="team-e"]/p[5]/text()')[0]
        add = html.xpath('//div[@class="team-e"]/p[6]/text()')[0]  # work address
        ask_for = html.xpath('//div[@class="hunyin-1-2"]/p[2]/span/text()')[0]  # requirements

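The code above builds each address by plain string concatenation, which works as long as every scraped path starts with a slash. As a hedged alternative, the standard library's urljoin can resolve the link against the base URL instead. The helper name absolute_url and the sample paths below are assumptions for illustration, not links taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.csflhjw.com"


def absolute_url(path):
    """Resolve a link scraped from the page against the site root."""
    return urljoin(BASE_URL, path)


# Hypothetical scraped values covering three common shapes of link
print(absolute_url("/zhenghun/1001.html"))              # path with a leading slash
print(absolute_url("uploads/1001.jpg"))                 # path without a leading slash
print(absolute_url("https://img.example.com/1001.jpg")) # already an absolute URL
```

The difference only matters when the site mixes link styles: concatenation would glue an already absolute URL onto the base, while urljoin leaves it unchanged.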

Create a docx document and save the data to it

document = Document()
document.add_heading('Sweet Matchmaker')

        # Inside the loop in get_data: one group of paragraphs per profile
        document.add_paragraph("Name:" + name)
        document.add_paragraph(edu)
        document.add_paragraph(profession)
        document.add_paragraph(sponsa)
        document.add_paragraph(children)
        document.add_paragraph(house)
        document.add_paragraph(add)
        document.add_paragraph(ask_for)
        document.add_picture("1.jpg")
        document.add_paragraph("")

Complete source code

import requests
from docx import Document
from lxml import etree

document = Document()
document.add_heading('Sweet Matchmaker')


headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
}


def get_data(url):
    response = requests.get(url, headers=headers)
    # print(response.text)
    data = etree.HTML(response.text)
    href_list = data.xpath("//div[@class='e-img']/a/@href")
    img_list = data.xpath("//div[@class='e-img']/a/img/@src")
    # print(href_list)
    for href, img in zip(href_list, img_list):
        # Download the picture and save it to a temporary file
        img = requests.get("https://www.csflhjw.com" + img, headers=headers).content
        with open("1.jpg", "wb") as f:
            f.write(img)
        res = requests.get("https://www.csflhjw.com" + href, headers=headers)
        # print(res.text)
        html = etree.HTML(res.text)
        # xpath() returns a list, so take the first match for each field
        name = html.xpath('//div[@class="team-e"]/h2/text()')[0]
        edu = html.xpath('//div[@class="team-e"]/p[1]/text()')[0]
        profession = html.xpath('//div[@class="team-e"]/p[2]/text()')[0]
        sponsa = html.xpath('//div[@class="team-e"]/p[3]/text()')[0]  # marital status
        children = html.xpath('//div[@class="team-e"]/p[4]/text()')[0]
        house = html.xpath('//div[@class="team-e"]/p[5]/text()')[0]
        add = html.xpath('//div[@class="team-e"]/p[6]/text()')[0]  # work address
        ask_for = html.xpath('//div[@class="hunyin-1-2"]/p[2]/span/text()')[0]  # requirements
        document.add_paragraph("Name:" + name)
        document.add_paragraph(edu)
        document.add_paragraph(profession)
        document.add_paragraph(sponsa)
        document.add_paragraph(children)
        document.add_paragraph(house)
        document.add_paragraph(add)
        document.add_paragraph(ask_for)
        document.add_picture("1.jpg")
        document.add_paragraph("")


def main():
    # range(1, 2) covers only page 1; raise the upper bound to crawl more pages
    for i in range(1, 2):
        url = "https://www.csflhjw.com/zhenghun/9.html?page={}".format(i)
        get_data(url)


if __name__ == '__main__':
    main()
    document.save('demo.docx')
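One fragile spot in the script is the `[0]` after each xpath call: if a profile leaves a field empty, the list is empty and the lookup raises IndexError, which stops the whole crawl. A minimal sketch of a more forgiving lookup follows. The helper first_text and the markup in SAMPLE_DETAIL are assumptions for illustration, not part of the original script or the real page.

```python
from lxml import etree


def first_text(tree, xpath, default=""):
    """Return the first XPath match, stripped, or `default` if nothing matched."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else default


# Hypothetical detail-page markup in which the second <p> is empty (an assumption).
SAMPLE_DETAIL = """
<div class="team-e">
  <h2>Profile A</h2>
  <p>Education: Bachelor</p>
  <p></p>
</div>
"""

tree = etree.HTML(SAMPLE_DETAIL)
name = first_text(tree, '//div[@class="team-e"]/h2/text()')
edu = first_text(tree, '//div[@class="team-e"]/p[1]/text()')
# The empty paragraph yields no text node, so the default is used instead
profession = first_text(tree, '//div[@class="team-e"]/p[2]/text()', default="Not provided")
print(name, edu, profession)
```

Swapping a helper like this in for the bare `[0]` lookups would let the crawler skip over incomplete profiles instead of crashing on them.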

PS: I can already picture my future self being nagged about getting married. Hang in there, brothers: earn good money, build a career first, and start a family after that! This article is for learning and exchange only. If it helped you, remember to give Latiao a "triple" (san lian) of support!
