A few days ago a programmer got worked to death, which left us short on programmers. So when I asked one of them to verify the reliability of our current data, he snapped back: "Time is tight and the task is urgent. If you don't want to affect the schedule, do it yourself~"

Excuse me?! How dare you talk to a product manager like that!

Na quickly bought the programmer a takeaway lunch box to smooth things over.

Here's the situation: we have tens of thousands of records, mostly information about travel routes, but some of them have expired or been taken down. We need to know what percentage of the total is no longer available.

Plan 1:

Sample a few hundred records, click each link, eyeball the page, note the result, and tally everything up.

Well, that sounds feasible. Let's try it first.

Start the stopwatch on my Samsung phone. Double-click the cell to turn the URL into a clickable hyperlink. Click it, wait about five seconds for the page... OK, this route is still live, so write a 1 in the table to mark it as normal (a 0 means unavailable). Check the stopwatch: 25 seconds in total.
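(A small aside, not part of the original routine: instead of double-clicking cells one by one, a helper column with the HYPERLINK formula makes every URL clickable at once, e.g. =HYPERLINK(A2) filled down, assuming the raw URLs sit in column A.)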

Sampling 200 records = 5,000 seconds. Add in breaks and round up to 7,200 seconds. Fine, that's only two hours.

Na, of course, vetoes this: two hours for a mere 200 records? Only a pig would work that way.

Plan 2:

Randomly sample a few hundred records, have a crawler fetch the basic information for each one, then filter the results and count the bad-case ratio in one go.

But where do I get a crawler? Hey, Jia, can I borrow your crawler? (Jia: Do you have a machine? Do you have access? Can you even use Beautiful Soup?)

...Silence for 1 minute... silence for 3 minutes... silence for 10 minutes... asleep for 30 minutes...

Hmph. You're just picking on me because my hard drive is too small to hold a development environment. But I still have Excel.

And isn't Excel itself a ready-made crawler?

Let's try one row:

Sure enough, it works. Let's do it! (Wakes up and wipes off the drool.)

1. First tidy the data up, at least into an Excel table. Call it whatever you like; I'll call it "TableLvmamaSet". (A quick sketch of making the table is just below.)
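If you haven't turned a range into a table before, here's a minimal sketch. It assumes the raw data, with a header row, starts at A1 of the active sheet; the sub name is mine, not from the original.

Sub MakeLvmamaTable()
    Dim lo As ListObject
    'Convert the contiguous block starting at A1 into a table and give it the name used later
    Set lo = ActiveSheet.ListObjects.Add( _
        SourceType:=xlSrcRange, _
        Source:=ActiveSheet.Range("A1").CurrentRegion, _
        XlListObjectHasHeaders:=xlYes)
    lo.Name = "TableLvmamaSet"
End Sub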

2. Open the "Developer" tab; see the earlier Excel tutorial if you need to enable it.

3. Insert a module

4. Write the code ~~~~~ it's not hard, honestly, I'm not lying to you, it's super simple. I could write it in the time it takes to eat two crayfish.

Sub crawler()


Dim URL As String
Dim RowNum As Long
Dim UrlList As Range
Dim DestiArea As Range
Dim Cell As Range

'First activate the source workbook and grab the URL column from the source table
Windows("Lvmama merchandise sampling.xlsx").Activate
Range("TableLvmamaSet[url]").Select

'Keep a reference to the selected URLs; many Excel operations change the selection, so store it now
Set UrlList = Selection

'Then open the target file
Windows("Workbook 1").Activate

'Write the results into the destination file
For Each Cell In UrlList
 
 'There are too many rows, so each result takes its own column: the URL in row 1, the page content from row 2 down
 URL = Cell.Text
 RowNum = Cell.Row
 Cells(1, RowNum).Value = URL
 
 Set DestiArea = Cells(2, RowNum)
 
    With ActiveSheet.QueryTables.Add(Connection:="URL;" & URL, Destination:=DestiArea)
        .Name = URL
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

Next
End Sub
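One caveat (my addition, not in the original macro): if one of the sampled URLs is dead, .Refresh raises a runtime error and the whole loop stops. A minimal guard is sketched below; it reuses the variables declared in the sub above, and the "FETCH FAILED" marker is made up for illustration.

'Sketch: the same For Each loop, with an error guard so one dead URL doesn't abort the run
For Each Cell In UrlList
    URL = Cell.Text
    RowNum = Cell.Row
    Cells(1, RowNum).Value = URL
    Set DestiArea = Cells(2, RowNum)

    On Error Resume Next    'keep going if this page fails to load
    With ActiveSheet.QueryTables.Add(Connection:="URL;" & URL, Destination:=DestiArea)
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .Refresh BackgroundQuery:=False
    End With
    If Err.Number <> 0 Then Cells(2, RowNum).Value = "FETCH FAILED"    'hypothetical marker for bad URLs
    On Error GoTo 0
Next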

5. Run it, go have a cup of coffee, and check back in 20 minutes.

And it worked. Each column holds the content of one web page, and the product details really did get pulled down.

6. Then use Excel's Find tool to search the fetched pages (or let a small macro do the counting, as in the sketch below).
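A minimal counting sketch, assuming each result column holds one fetched page (with its URL in row 1) and that taken-down pages contain some marker phrase. The sub name and the marker text "已下架" ("taken down") are placeholders I made up; substitute whatever actually appears on the dead pages.

Sub CountBadCases()
    Dim col As Long, lastCol As Long
    Dim total As Long, bad As Long
    Dim hit As Range

    'The rightmost column that received a result (URLs were written across row 1)
    lastCol = Cells(1, Columns.Count).End(xlToLeft).Column

    For col = 1 To lastCol
        If Len(Cells(1, col).Value) > 0 Then    'this column holds one fetched page
            total = total + 1
            'Placeholder marker text -- replace with the real "taken down" phrase
            Set hit = Columns(col).Find(What:="已下架", LookIn:=xlValues, LookAt:=xlPart)
            If Not hit Is Nothing Then bad = bad + 1
        End If
    Next col

    If total = 0 Then Exit Sub    'nothing fetched yet
    MsgBox "Bad cases: " & bad & " of " & total & " (" & Format(bad / total, "0.0%") & ")"
End Sub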

7. And so:

Done. Keep the code and you can reuse it next time. Na is going back to her crayfish.

I’ll see you next time.