A record of the whole process of building a command-line tool

Video tutorial: www.bilibili.com/video/BV1a8…

0. Preparation before the project

1. Define your goals

  • Crawl novels from the command line
  • Batch-replace file contents from the command line
  • Intranet penetration (exposing a service on a private network to the outside)

2. Project name

EasyTool

Commands:

et novel
et proxy server
et proxy client
et create

3. Project directory structure

1. Initialize the project

mkdir code
cd code

We use the third-party library Cobra to build the command-line tool.

Install the dependency:

go get -u github.com/spf13/cobra

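Initialize the project with the cobra generator. (Newer Cobra releases ship the generator as a separate tool, cobra-cli: install it with go install github.com/spf13/cobra-cli@latest, run go mod init et first, then run cobra-cli init.)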

The project is called EasyTool, or et for short.

cobra init --pkg-name et 

Execute:

go run main.go

If you get this error:

main.go:18:8: package et/cmd is not in GOROOT (c:\go\src\et\cmd)

create a Go module named et so that the import path et/cmd can be resolved:

go mod init et

2. Set up the welcome page

Create global/welcome.go in the project root:

package global

// Welcome holds the ASCII-art banner used as the root command's Long text.
var Welcome = `
(ASCII-art banner)
`

Modify cmd/root.go:

import "et/global"
// ...
var rootCmd = &cobra.Command{
	Use:   "et",
	Short: "An example of command-line development",
	Long:  global.Welcome,
	// Uncomment the following line if your bare application
	// has an action associated with it:
	// Run: func(cmd *cobra.Command, args []string) { },
}


3. Crawl novels

1. Add the novel command

Use the Cobra generator to add the command:

cobra add novel

Then edit the generated cmd/novel.go:


import (
	"fmt"
	"et/pkg/novel"
	"github.com/spf13/cobra"
)

// novelCmd represents the novel command
var novelCmd = &cobra.Command{
	Use:   "novel",
	Short: "Novel crawler.",
	Long: `Instructions:
1. Generate the configuration file: et novel config
2. Fill in the fields of novel.json
3. Run the crawler: et novel run`,
	Run: func(cmd *cobra.Command, args []string) {
		if len(args) == 0 {
			fmt.Println("Missing parameters")
			return
		}
		switch args[0] {
		case "config":
			novel.CreateJson()
		case "run":
			n := novel.NewNovel()
			n.Run()
		default:
			fmt.Println("Invalid command")
		}
	},
}

2. Generate a configuration file

Implement the et novel config command

Approach: write a prepared configuration template to novel.json in the current working directory.

Create pkg/novel/novel.go.

First, prepare the fields of the configuration file:

var json = `{ "host":"", "url":"", "chapter":"", "novel":"", "is_fix":false }`


  • host: the site's domain name, including the scheme, since it is prepended to links as-is
  • url: the address of the novel's chapter-list page
  • chapter: the CSS selector for the chapter links on that page
  • novel: the CSS selector for the text of a chapter
  • is_fix: whether to prepend host to each chapter link. Many sites omit the domain in their links, so the links are incomplete; this option completes them by string concatenation.
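Plain concatenation only works when the links are root-relative paths such as /book/1.html. It breaks for links that are relative to the current page, and for links that are already absolute. One alternative, which is not what the tutorial's code does, is to resolve each href against the chapter-list URL with the standard library's net/url. The sketch below takes that approach; resolveLink is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink resolves href (absolute, root-relative or page-relative)
// against the URL of the page it was found on.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://example.com/book/123/"
	for _, href := range []string{"/book/123/1.html", "2.html", "https://other.example.org/3.html"} {
		link, err := resolveLink(page, href)
		if err != nil {
			fmt.Println("bad link:", err)
			continue
		}
		fmt.Println(link)
	}
}
```

With this approach the is_fix flag becomes unnecessary, because absolute links pass through unchanged.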

pkg/novel/novel.go

package novel

import (
	"os"

	"github.com/spf13/cobra"
)

// json is the configuration template written by et novel config.
var json = `{ "host":"", "url":"", "chapter":"", "novel":"", "is_fix":false }`

// CreateJson writes the template to ./novel.json.
// os.WriteFile creates the file if needed (or truncates it), so no os.Create call is required.
func CreateJson() {
	err := os.WriteFile("./novel.json", []byte(json), 0644)
	cobra.CheckErr(err)
}

3. Read the configuration file

Use Viper to read the JSON configuration file.

Install Viper:

go get github.com/spf13/viper

Add a function that loads the configuration:

import "github.com/spf13/viper" // merge into the file's import block

// Novel mirrors the fields of novel.json.
type Novel struct {
	Host    string `mapstructure:"host"`
	Url     string `mapstructure:"url"`
	Chapter string `mapstructure:"chapter"`
	Novel   string `mapstructure:"novel"`
	IsFix   bool   `mapstructure:"is_fix"`
}

// NewNovel reads novel.json from the working directory into a Novel.
func NewNovel() Novel {
	v := viper.New()
	v.SetConfigName("novel") // name of config file (without extension)
	v.SetConfigType("json")  // REQUIRED if the config file does not have the extension in the name
	v.AddConfigPath(".")     // look for the config file in the working directory
	cobra.CheckErr(v.ReadInConfig()) // find and read the config file
	var config Novel
	cobra.CheckErr(v.Unmarshal(&config))
	return config
}

4. Implement the crawling logic

After reading the configuration, fetch the pages with net/http and extract their content with goquery. Install goquery first:

go get github.com/PuerkitoBio/goquery

goquery selects elements with CSS selectors, much like jQuery; the selectors come from the chapter and novel fields of novel.json:

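// Imports used by this part (merge into the file's import block):
import (
	"fmt"
	"net/http"
	"os"
	"sync"

	"github.com/PuerkitoBio/goquery"
	"github.com/spf13/cobra"
)

// ChapterNode is one chapter: its page URL and its title.
type ChapterNode struct {
	Url  string
	Name string
}

// Run fetches the chapter list and hands each chapter to a pool of workers.
func (n *Novel) Run() {
	cobra.CheckErr(os.MkdirAll("./novel", 0755))
	content := make(chan ChapterNode)

	// Start 100 workers; the WaitGroup lets Run wait until all of them are done.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n.SaveContent(content)
		}()
	}

	doc, err := Request(n.Url)
	cobra.CheckErr(err)
	// Find every chapter link on the chapter-list page
	doc.Find(n.Chapter).Each(func(i int, s *goquery.Selection) {
		// For each link, take the chapter title and its href
		chapter := s.Text()
		link, _ := s.Attr("href")
		if n.IsFix {
			link = n.Host + link
		}
		fmt.Printf("Chapter name: %s Chapter URL: %v\n", chapter, link)
		content <- ChapterNode{Url: link, Name: chapter}
	})

	// No more chapters: closing the channel ends each worker's range loop.
	close(content)
	wg.Wait()
}

// Request downloads url and parses the response body as an HTML document.
func Request(url string) (*goquery.Document, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	// Load the HTML document
	return goquery.NewDocumentFromReader(resp.Body)
}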

5. Saving the novel

Chapters are saved as text, either all in one file or one file per chapter (the latter is more like storing each chapter as a database record).

Storing everything in a single file means appending the chapters in order. That does not fit well with a channel feeding concurrent workers, because the workers finish chapters out of order. To get some practice with channels and goroutines, this tutorial instead stores each chapter of the novel in its own file:

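// SaveContent is a worker: it saves each chapter to ./novel/<name>.txt until c is closed.
func (n *Novel) SaveContent(c chan ChapterNode) {
	for node := range c {
		doc, err := Request(node.Url)
		if err != nil {
			fmt.Println("Failed to fetch", node.Url, err)
			continue
		}
		content := doc.Find(n.Novel).Text()
		// A "/" in a chapter title would be treated as a directory separator (needs "strings").
		name := strings.ReplaceAll(node.Name, "/", "_")
		if err := os.WriteFile("./novel/"+name+".txt", []byte(content), 0644); err != nil {
			fmt.Println("Failed to save", name, err)
		}
		time.Sleep(1 * time.Second) // pause between requests from this worker (needs "time")
	}
}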
