When writing a crawler, have you ever needed to download a specified number of files from a set of search results?

For example, the search results are spread over 10 pages and add up to 50 items, and you need to download a specified number of those 50 items.

To implement this, my first idea was:

1. Request the 10 pages one after another and append each page's results to the same list, so that once all 10 pages have been requested the list contains every result;

2. Then take the specified number of items from that large list and download them.

This worked, but it was too slow: no matter how few items you wanted to download, every page had to be requested and appended to the list first, which is time-consuming and unnecessary.

So here is another way to think about it: to download n items, fetch only n items, and don't request all of the data in advance.
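The difference between the two ideas can be sketched by counting requests. This is only an illustration: `fetch_page`, `PAGES`, and `requests_made` are hypothetical stand-ins for a real crawler's page request, not part of the original code.

```python
# Hypothetical stand-in for the search results: 10 pages, 5 items per page.
PAGES = [[page * 5 + k for k in range(1, 6)] for page in range(10)]
requests_made = []  # records which pages were "requested"

def fetch_page(page_number):
    """Pretend to request one page of search results (hypothetical helper)."""
    requests_made.append(page_number)
    return PAGES[page_number]

def collect_all_then_slice(count):
    """First idea: request every page, then cut the list down."""
    everything = []
    for page_number in range(len(PAGES)):
        everything.extend(fetch_page(page_number))
    return everything[:count]

def collect_until_enough(count):
    """Second idea: stop requesting pages once count items are collected."""
    collected = []
    for page_number in range(len(PAGES)):
        if len(collected) >= count:
            break
        collected.extend(fetch_page(page_number))
    return collected[:count]

requests_made.clear()
eager = collect_all_then_slice(7)
eager_requests = len(requests_made)

requests_made.clear()
lazy = collect_until_enough(7)
lazy_requests = len(requests_made)

print(eager == lazy, eager_requests, lazy_requests)  # True 10 2
```

Both functions return the same seven items, but the second one touches only the pages it needs.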

Concrete implementation

The above example can be abstracted as follows

First there is a nested list

[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]

Then extract items from this list into a new list, for example the first three, the first five, or the first eight numbers.

You can do this with a nested for loop; pay attention to the conditions for breaking out of the loops, as follows:

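source = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]

def get_data(source, count):
    target = []
    for temp in source:               # outer loop: one sublist at a time
        for j in temp:                # inner loop: items of the current sublist
            target.append(j)
            if len(target) >= count:
                break                 # enough items: leave the inner loop
        else:
            continue                  # inner loop was not broken: next sublist
        break                         # inner loop was broken: leave the outer loop
    return target

target = get_data(source, 6)
print(target)  # [1, 2, 3, 4, 5, 6]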

source is the original list; count is the number of items to extract.
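For comparison, the same extraction can also be written without explicit loop control. The sketch below swaps in a different technique from the standard library, `itertools.chain.from_iterable` plus `itertools.islice`, rather than the for…else approach this post is about; the function name `take` is an assumption made for illustration.

```python
from itertools import chain, islice

source = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]

def take(source, count):
    """Return the first count items of the flattened nested list."""
    # chain.from_iterable walks the sublists lazily, one item at a time;
    # islice stops consuming as soon as count items have been produced.
    return list(islice(chain.from_iterable(source), count))

print(take(source, 6))  # [1, 2, 3, 4, 5, 6]
```

Because both helpers are lazy, sublists after the one that satisfies count are never iterated.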

According to the notes in the rookie tutorial (Runoob), for…else works as follows:

1. The for loop itself behaves like an ordinary for loop; the else block is executed when the loop finishes normally.

2. If the for loop is exited by break, the else block is not executed.

Let's analyze the execution and the result for different values of count.
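The two rules can be seen in a small, self-contained sketch. The function name `first_even` and its sample inputs are made up for illustration.

```python
def first_even(numbers):
    """Report whether the loop ended via break or ran to completion."""
    for n in numbers:
        if n % 2 == 0:
            result = f"break at {n}"
            break              # rule 2: leaving via break skips the else block
    else:
        result = "else ran"    # rule 1: the loop finished without break
    return result

print(first_even([1, 3, 4, 5]))  # break at 4
print(first_even([1, 3, 5]))     # else ran
print(first_even([]))            # else ran
```

Note that an empty sequence also counts as finishing normally, so the else block runs in that case too.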

1. count = 3

When count=3, we get the following result

target = [1, 2, 3]

source contains four sublists, each containing five numbers.

The outer loop starts with the first sublist. When the inner loop reaches the third number, len(target) >= count is satisfied and the inner for loop is exited with break.

Since the inner for loop did not finish normally, its else block is skipped (the else block contains continue, which would move on to the next iteration of the outer for loop). Execution then reaches the break statement after the else block, which exits the outer for loop.

In summary, target = [1, 2, 3] is obtained

2. count = 8

When count=8, we get the following result

target = [1, 2, 3, 4, 5, 6, 7, 8]

After the inner loop has gone through the whole first sublist, target contains 5 numbers, so len(target) >= count is not yet satisfied.

At this point the inner for loop has finished its pass over the first sublist. Because it finished normally, the else block is executed; it contains continue, so the outer for loop moves on to the second sublist.

When the inner loop reaches the third number of the second sublist, the length of target equals 8, so len(target) >= count is satisfied and the inner for loop is exited with break.

As before, since the inner for loop did not finish normally, the else block is skipped and the break statement after it is executed, which exits the outer for loop.

In summary, target = [1, 2, 3, 4, 5, 6, 7, 8] is obtained
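Both walkthroughs can be checked with an instrumented copy of the loop. This is a sketch for illustration only: the name `get_data_traced` and the `events` list are additions that record which branch was taken, and are not part of the code discussed above.

```python
source = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]

def get_data_traced(source, count):
    """Same loop structure as in the post, plus a log of the branches taken."""
    target = []
    events = []
    for index, sublist in enumerate(source):
        for item in sublist:
            target.append(item)
            if len(target) >= count:
                events.append(f"inner break in sublist {index}")
                break
        else:
            # Inner loop finished normally: move on to the next sublist.
            events.append(f"else/continue after sublist {index}")
            continue
        # Reached only when the inner loop was left via break.
        events.append("outer break")
        break
    return target, events

for count in (3, 8):
    target, events = get_data_traced(source, count)
    print(count, target, events)
```

For count = 3 the log holds an inner break in sublist 0 followed by the outer break; for count = 8 it first records the else/continue after sublist 0, then the inner break in sublist 1 and the outer break.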

That is how for…else can be used to break out of a doubly nested loop. I hope it helps!