Foreword

If you are tired of writing crawlers in Python, try writing one in Go! This article will be updated over time.

Go's standard library provides the net/http package, which natively supports HTTP requests and responses.

1. Send the request

  • Constructing the client
	var client http.Client
  • Construct a GET request:
	reqList, err := http.NewRequest("GET", URL, nil)
  • Constructing a POST request

Go provides the cookiejar.New function (in net/http/cookiejar), which creates a cookie jar that retains the cookies received in responses. This matters for websites that can only be accessed after logging in: once you log in, the server sets a cookie that identifies the user, and sending that cookie back with later requests tells the server who is making the call. For example, to crawl a class schedule we have to log in to the school's academic affairs system, because every student's schedule is different and the server needs to know whose schedule to return. The crawler therefore has to send the login cookie in its request headers.

	jar, err := cookiejar.New(nil)  // create an empty cookie jar
	if err != nil {
		// handle the error
	}
When constructing a POST request, encode the data to send as the request body and pass it to http.NewRequest together with the URL:

	var client http.Client
	Info := "muser=" + muserid + "&passwd=" + password  // form-encoded request body
	var data = strings.NewReader(Info)
	req, err := http.NewRequest("POST", URL, data)
  • Add headers
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36")
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9")
	req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9")
  • Send the request
	resp, _ := client.Do(req)  // Send the request (check the error in real code)
	defer resp.Body.Close()    // Always close the body when done
	bodyText, _ := ioutil.ReadAll(resp.Body)  // Read the whole response body into memory
  • About the cookie

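As mentioned above, when a request is sent through a client that has a cookie jar, the cookies set by the response are stored in client.Jar.

	myStr := fmt.Sprintf("%s", client.Jar)  // format the jar's contents as a string for inspection
After printing client.Jar, pick out the cookie returned by the server and put it in the request header. This is how cookies can be handled for sites that require login. A client with a Jar also resends stored cookies automatically on later requests to the same site, and client.Jar.Cookies(u) returns the cookies for a given *url.URL directly.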
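
The login flow described above can be sketched with a local test server instead of a real site. This is a minimal sketch under stated assumptions: the /login and /schedule paths, the session cookie, and the helper names newSite and fetchSchedule are all hypothetical and only illustrate how a client with a cookie jar carries the login cookie into the next request.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"net/url"
	"strings"
)

// newSite starts a hypothetical test site: /login sets a session cookie,
// and /schedule only answers when that cookie is sent back.
func newSite() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc123", Path: "/"})
		fmt.Fprint(w, "logged in")
	})
	mux.HandleFunc("/schedule", func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie("session")
		if err != nil || c.Value != "abc123" {
			http.Error(w, "not logged in", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "schedule for session "+c.Value)
	})
	return httptest.NewServer(mux)
}

// fetchSchedule logs in, then requests the protected page with the same client.
// The jar stores the cookie from the login response and resends it automatically.
func fetchSchedule(baseURL string) (string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return "", err
	}
	client := &http.Client{Jar: jar}

	form := url.Values{"muser": {"student"}, "passwd": {"secret"}}
	resp, err := client.Post(baseURL+"/login", "application/x-www-form-urlencoded",
		strings.NewReader(form.Encode()))
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	resp, err = client.Get(baseURL + "/schedule")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	site := newSite()
	defer site.Close()

	body, err := fetchSchedule(site.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because the jar is attached to the client, there is no need to copy the cookie into the request header by hand in this sketch; doing it manually is only needed when the cookie comes from somewhere else, such as a browser session.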

That completes the request-sending part!

2. Parse the web page

2.1 CSS selectors: goquery provides the NewDocumentFromReader function, which parses a web page.

	doc, err := goquery.NewDocumentFromReader(resp.Body)
2.2 XPath syntax: htmlquery provides the Parse function, which parses a web page.

	root, _ := htmlquery.Parse(resp.Body)
2.3 Regular expressions

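	reId, _ := regexp.Compile(`id=(\d+)`)  // Compile the pattern
	allId := reId.FindAll(bodyText, 1)     // n = 1 returns at most one match; pass -1 for all matches
	for _, item := range allId {
		fmt.Println(string(item))  // each item is a []byte holding one full match
	}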
3. Obtain node information

3.1 CSS selectors

With the doc obtained in 2.1, we can select nodes using CSS selector syntax.

	doc.Find("#main > div.right > div.detail_main_content").
		Each(func(i int, s *goquery.Selection) {
			Data.title = s.Find("p").Text()
			Data.time = s.Find("#fbsj").Text()
			Data.author = s.Find("#author").Text()
			Data.count = Read_Count(Read_Id)
			fmt.Println(Data.title, Data.time, Data.author, Data.count)
		})

	doc.Find("#news_content_display").Each(func(i int, s *goquery.Selection) {
		Data.content = s.Find("p").Text()  // Concatenated text of all <p> nodes
	})
3.2 Xpath syntax

With the root obtained in 2.2, we can select nodes using XPath expressions.

	tr := htmlquery.Find(root, "//*[@id='LB_kb']/table/tbody/tr/td")   // Use Xpath to get node information
	for _, row := range tr { //len(tr)=13
		classNames := htmlquery.Find(row, "./font")
		classPosistions := htmlquery.Find(row,"./text()[4]")
		classTeachers := htmlquery.Find(row,"./text()[5]")
		if len(classNames) != 0 {
			className = htmlquery.InnerText(classNames[0])
			classPosistion = htmlquery.InnerText(classPosistions[0])
			classTeacher = htmlquery.InnerText(classTeachers[0])
		}
	}
4. Save the information

4.1 Use native SQL statements to save data in MySQL

  • Define database link parameters
const (
	usernameClass = "root"
	passwordClass = "root"
	ipClass       = ""
	portClass     = "3306"
	dbnameClass   = "class"
)
  • Connecting to a Database
var DB *sql.DB

func InitDB() {
	path := strings.Join([]string{usernameClass, ":", passwordClass, "@tcp(", ipClass, ":", portClass, ")/", dbnameClass, "?charset=utf8"}, "")
	DB, _ = sql.Open("mysql", path)  // sql.Open does not connect yet; Ping does
	if err := DB.Ping(); err != nil {
		fmt.Println("open database fail")
	}
}
  • Defining data types
type Class struct {
	classData   string
	teacherName string
	position    string
}
  • Insert data
func InsertData(Data Class) bool {
	tx, err := DB.Begin()  // Start a transaction
	if err != nil {
		fmt.Println("tx fail")
		return false
	}
	stmt, err := tx.Prepare("INSERT INTO class_data (`class`,`teacher`,`position`) VALUES (?, ?, ?)")  // Prepare the insert statement
	if err != nil {
		fmt.Println("Prepare fail", err)
		_ = tx.Rollback()  // Release the transaction on failure
		return false
	}
	defer stmt.Close()
	_, err = stmt.Exec(Data.classData, Data.teacherName, Data.position)  // Execute the insert
	if err != nil {
		fmt.Println("Exec fail", err)
		_ = tx.Rollback()
		return false
	}
	_ = tx.Commit()  // Commit the transaction
	return true
}
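
The connection string and the transactional insert above can also be written a little more compactly. The following is a sketch, not the code used by the crawler itself: the helper names buildDSN and insertClass and the host 127.0.0.1 are assumptions, and the DSN layout assumes the "mysql" driver is go-sql-driver/mysql, whose format is user:password@tcp(host:port)/dbname?params. Only the standard library is imported here, so main just prints the DSN; insertClass needs a real *sql.DB opened with a registered driver.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Class mirrors the row type used in this section.
type Class struct {
	classData   string
	teacherName string
	position    string
}

// buildDSN assembles a data source name in the
// user:password@tcp(host:port)/dbname?charset=utf8 layout.
func buildDSN(user, password, host, port, dbname string) string {
	return fmt.Sprintf("%s:%s@tcp(%s:%s)/%s?charset=utf8", user, password, host, port, dbname)
}

// insertClass writes one row inside a transaction and rolls the
// transaction back if any step fails.
func insertClass(db *sql.DB, c Class) (err error) {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Roll back on any error returned below, so the transaction is never left open.
	defer func() {
		if err != nil {
			_ = tx.Rollback()
		}
	}()

	_, err = tx.Exec(
		"INSERT INTO class_data (`class`,`teacher`,`position`) VALUES (?, ?, ?)",
		c.classData, c.teacherName, c.position,
	)
	if err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// 127.0.0.1 is a placeholder host for illustration.
	fmt.Println(buildDSN("root", "root", "127.0.0.1", "3306", "class"))
}
```

Returning an error instead of a bool is a deliberate choice in this sketch: the caller can log or retry based on what actually went wrong.
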
4.2 Using GORM to save data to MySQL

  • Define the GORM model
type NewD struct {
	Title   string `gorm:"type:varchar(255);not null;"`
	Time    string `gorm:"type:varchar(256);not null;"`
	Author  string `gorm:"type:varchar(256);not null;"`
	Count   string `gorm:"type:varchar(256);not null;"`
	Content string `gorm:"type:longtext;not null;"`
}
  • Connecting to a Database
var db *gorm.DB

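func Init() {
	var err error
	path := strings.Join([]string{userName_New, ":", password_New, "@tcp(", ip_New, ":", port_New, ")/", dbName_New, "?charset=utf8"}, "")
	db, err = gorm.Open("mysql", path)
	if err != nil {
		fmt.Println("open database fail", err)
		return
	}
	_ = db.AutoMigrate(&NewD{})  // Create or update the table for NewD
	sqlDB := db.DB()             // Underlying *sql.DB
	_ = sqlDB                    // Configure the connection pool here if needed
}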
  • Write data
	NewA := NewD{
		Title:   Data.title,
		Time:    Data.time,
		Author:  Data.author,
		Count:   Data.count,
		Content: Data.content,
	}
	err = db.Create(&NewA).Error  // Insert the record into the database
