I wrote a script in Python that automatically generates indexes for documents! A lazy person's wisdom


Introduction: To practice algorithm problems, I set up a GitHub repository, PiperLiu/ACMOI_Journey, to record my progress and to summarize methods and lessons learned. One requirement came to mind: every time I add a problem to my notes, could a program categorize and index it automatically? I implemented this as a small, entry-level Python script that involves file reading and writing, command-line parameters, list manipulation, and so on, and I am sharing it here.

Requirements

I have a Markdown document that looks like this:

# ACM/OI Journey

Leave traces and experience of practicing problems here.

Occasional methodology summaries are here: [./notes/readme.md](./notes/readme.md).

Learning materials:
- OI Wiki: https://oi-wiki.org/
- LeetCode China: https://leetcode-cn.com/

## Archive

## Date archive

Notice that the two second-level headings, ## Archive and ## Date archive, are empty.

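My requirement: whenever I finish a problem, I record it under ## Date archive in this format: - uu date title-and-summary category-A category-B category-C … [program file 1] [program file 2] [program file 3] … Fields are separated by single spaces, so the title and its quoted summary form one field (in the original Chinese notes that field contains no spaces; the English examples here add spaces only for readability).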
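To make the format concrete, here is a minimal sketch of splitting one such entry into its fields. The helper name `parse_entry` and the sample line are made up for illustration; the sketch assumes that fields are separated by single spaces and that program files are written as `[label](path)` links.

```python
import re

# Matches a field that starts with a Markdown link such as "[cpp](./12.cpp)".
LINK = re.compile(r'\[.*\]\(.*\)')


def parse_entry(row):
    """Split a "- uu ..." line into its fields (hypothetical helper)."""
    fields = row.split(" ")
    files = [field for field in fields if LINK.match(field)]
    return {
        "date": fields[2],
        "title": fields[3],
        # Everything between the title and the file links is a category.
        "categories": fields[4:len(fields) - len(files)],
        "files": files,
    }


# Made-up sample line; the title-and-summary field contains no spaces.
sample = ('- uu 2020.11.27 Integer-to-Roman"match-from-the-largest-values" '
          'matching string [cpp](./vsc_leetcode/12.cpp)')
print(parse_entry(sample)["categories"])
```

Slicing with `len(fields) - len(files)` rather than a negative index keeps the category list correct even when an entry has no program files.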

Let's say I solve two problems today, so I record them under ## Date archive, as shown below.

## Date archive
- uu 2020.11.26 Container With Most Water "Because the two edges jointly determine the upper bound, move the shorter edge inward, giving up the search over suboptimal solutions" two-pointer search [py](./vsc_leetcode/11.py) [cpp](./vsc_leetcode/11.cpp)
- uu 2020.11.27 Integer to Roman "In everyday life we describe a number starting from its largest digits, so match characters starting from the largest values" matching string [cpp](./vsc_leetcode/12.cpp)

There is nothing under ## Archive yet. I want my script to automatically create third-level headings under ## Archive (two-pointer, search, matching, string) and to put the corresponding problems under each of them.

The end result:

## Archive
- [matching](#matching)
- [string](#string)
- [two-pointer](#two-pointer)
- [search](#search)

### matching
- Integer to Roman "In everyday life we describe a number starting from its largest digits, so match characters starting from the largest values" [cpp](./vsc_leetcode/12.cpp) 2020.11.27

### string
- Integer to Roman "In everyday life we describe a number starting from its largest digits, so match characters starting from the largest values" [cpp](./vsc_leetcode/12.cpp) 2020.11.27

### two-pointer
- Container With Most Water "Because the two edges jointly determine the upper bound, move the shorter edge inward, giving up the search over suboptimal solutions" [py](./vsc_leetcode/11.py) [cpp](./vsc_leetcode/11.cpp) 2020.11.26

### search
- Container With Most Water "Because the two edges jointly determine the upper bound, move the shorter edge inward, giving up the search over suboptimal solutions" [py](./vsc_leetcode/11.py) [cpp](./vsc_leetcode/11.cpp) 2020.11.26

## Date archive
- 2020.11.26 Container With Most Water "Because the two edges jointly determine the upper bound, move the shorter edge inward, giving up the search over suboptimal solutions" two-pointer search [py](./vsc_leetcode/11.py) [cpp](./vsc_leetcode/11.cpp)
- 2020.11.27 Integer to Roman "In everyday life we describe a number starting from its largest digits, so match characters starting from the largest values" matching string [cpp](./vsc_leetcode/12.cpp)

A Markdown engine renders it as shown in the image below.

As shown above, the script not only adds the third-level headings (### matching, ### string, and so on) but also creates an index of links pointing to those headings.
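The grouping and the index can be sketched in a few lines. This is only an illustration under stated assumptions: `render_archive` and its tuple layout are hypothetical names, not part of refresh.py, and the anchors assume single-word, lowercase category names so that `#name` matches the heading.

```python
def render_archive(entries):
    """Build the lines that go under "## Archive" (hypothetical helper).

    entries is a list of (title, date, categories, files) tuples.
    """
    by_category = {}  # dicts keep insertion order in Python 3.7+
    for title, date, categories, files in entries:
        line = "- {} {} {}".format(title, " ".join(files), date)
        for name in categories:
            by_category.setdefault(name, []).append(line)

    # One index link per category, then one "### " section per category.
    index = ["- [{0}](#{0})".format(name) for name in by_category]
    body = []
    for name, lines in by_category.items():
        body += ["", "### " + name] + lines
    return index + body


entries = [
    ("Integer to Roman", "2020.11.27", ["matching", "string"],
     ["[cpp](./vsc_leetcode/12.cpp)"]),
]
print("\n".join(render_archive(entries)))
```

An entry with several categories is simply repeated under each of them, which is the behaviour shown in the result above.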

The final program implementation is shown below.

Python and script files

Which brings us to Python. I think this is what Python does best: scripts. I remember that Python Cat had an article about why the comment symbol in Python is # instead of //.

The reason is probably that Python’s roots are in writing easy-to-use script files, similar to the shell.

Consider Python's features: it is interpreted and dynamically typed, it can be used interactively from the command line, it can invoke commands directly through os.system(), and so on. So Python is well suited to small tasks (scripts).

The overall logic

The logic is: read the document into a list of lines, then handle the ## Archive section and the ## Date archive section of that list.
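The core list manipulation can be sketched as follows: find the two headings in the list of lines and splice new content between them. This is a simplified illustration, not the script itself; the function name `replace_section` is hypothetical, and it assumes both headings are present exactly as written.

```python
def replace_section(lines, start_heading, end_heading, new_body):
    """Return a copy of lines with the text between two headings replaced.

    Raises ValueError if either heading is missing.
    """
    start = lines.index(start_heading) + 1  # first line after the heading
    end = lines.index(end_heading)          # the closing heading itself
    # Keep one blank line between the new body and the closing heading.
    return lines[:start] + new_body + [""] + lines[end:]


doc = ["# Title", "", "## Archive", "", "## Date archive", "- an entry"]
print(replace_section(doc, "## Archive", "## Date archive",
                      ["### search", "- an entry"]))
```

Working on a list of lines keeps the rest of the document untouched: only the slice between the two headings is rebuilt.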

The details are in the code (the file refresh.py), which I have annotated with comments.

# -*- coding: utf-8 -*-
import os.path as osp
import re


def refreah():
    # The file to process is README.md. It sits in the same directory
    # as this script, so build its path from __file__.
    dirname = osp.dirname(__file__)
    filepath = osp.join(dirname, "README.md")

    # Read the whole file into memory and split it into lines.
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()
    row_list = content.split('\n')

    # Collect the entries already filed under each category,
    # plus the "uu" entries that still have to be archived.
    un_packed_rows = []
    dict_cata = {}  # e.g. {"matching": [entry 1, ...], "string": [...]}
    dict_row_flag = False  # True while inside "## Archive"
    date_row_flag = False  # True once "## Date archive" is reached
    dict_row_num = 0
    date_row_num = 0
    cur_cata = None
    for idx, row in enumerate(row_list):
        if dict_row_flag:
            if row.startswith("### "):
                cur_cata = row[4:]
                dict_cata.setdefault(cur_cata, [])
            elif (cur_cata is not None and row.startswith("- ")
                    and not re.match(r'\[.*\]\(.*\)', row[2:])):
                # Index lines look like "- [name](#name)" while entries look
                # like "- title files date"; the regex skips the index lines.
                # Appending keeps the existing order stable between runs.
                dict_cata[cur_cata].append(row)
        elif row == "## Archive":
            dict_row_flag = True
            dict_row_num = idx + 1

        if date_row_flag:
            # "- uu " is my own marker: an entry that carries it has not
            # been archived yet, so queue it and strip the marker.
            if row.startswith("- uu "):
                un_packed_rows = [row] + un_packed_rows
                row_list[idx] = "- " + row[5:]
        elif row == "## Date archive":
            date_row_flag = True
            dict_row_flag = False
            date_row_num = idx + 1

    # File every queued entry under each category it names.
    for row in un_packed_rows:
        row = row.split(" ")
        files = [ele for ele in row if re.match(r'\[.*\]\(.*\)', ele)]
        file_name = "".join(ele + ' ' for ele in files)
        # Fields: "-", "uu", date, title, categories..., files...
        catas = row[4:len(row) - len(files)]
        for c in catas:
            dict_cata.setdefault(c, [])
            row_ = '- ' + row[3] + ' ' + file_name + row[2]
            dict_cata[c].append(row_)

    # Rebuild the document: everything up to "## Archive", the regenerated
    # categories, then everything from the blank line before
    # "## Date archive" onward.
    row_list_a = row_list[:dict_row_num]
    row_list_c = row_list[date_row_num - 2:]
    row_list_b = []
    for key in dict_cata:
        row_list_b.append("\n### " + key)
        row_list_b.extend(dict_cata[key])
    if row_list_b:
        # No blank line before the first heading.
        row_list_b[0] = row_list_b[0][1:]
    row_list = row_list_a + row_list_b + row_list_c

    # Joining avoids adding one more trailing blank line on every run.
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write('\n'.join(row_list))

    print("\033[1;34mREADME.md refresh done\033[0m")
    print("\033[1;36mhttps://github.com/PiperLiu/ACMOI_Journey\033[0m")
    print("star" + "\033[1;36m the above repo \033[0m" + "and practise together!")


def cata_index():
    # Generate the index under "## Archive", for example:
    #   - [matching](#matching)
    #   - [string](#string)
    # The idea is simple: collect every "### " heading between
    # "## Archive" and "## Date archive" and turn it into a link.
    dirname = osp.dirname(__file__)
    filepath = osp.join(dirname, "README.md")
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()
    row_list = content.split('\n')

    cata_list = []
    dict_row_flag = False
    dict_row_num = 0
    for idx, row in enumerate(row_list):
        if row == "## Archive":
            dict_row_flag = True
            dict_row_num = idx + 1
        elif row == "## Date archive":
            break
        elif dict_row_flag and row.startswith("### "):
            cata = row[4:]
            cata_list.append("- [" + cata + "](#" + cata + ")")
    if cata_list:
        cata_list.append("")  # blank line between index and first heading

    # Insert the index right after the "## Archive" line.
    row_list = row_list[:dict_row_num] + cata_list + row_list[dict_row_num:]
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write('\n'.join(row_list))


refreah()
cata_index()

The end result is that I execute the script from the command line, and the document is organized automatically.

Using argparse

Notice that I passed a parameter, -r, when running the script above. The idea is to make refresh.py more versatile: it does different things for different parameters, which act like different "buttons".

I wrapped each feature in its own function to keep them decoupled: the functions do not depend on each other, which helps prevent logic errors.

In addition, I created a new function to get parameters.

import argparse

def get_args():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        '--refresh', '-r',
        action='store_true',
        help='refresh README.md'
    )

    # parse_known_args() ignores arguments it does not recognize
    args = parser.parse_known_args()[0]
    return args

This gives us the -r parameter. In the main routine we check whether the user asked for the refresh feature and, if so, call the corresponding functions.


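def main(args=None):
    # Parse the command line when main() is called, not when it is defined
    if args is None:
        args = get_args()
    if args.refresh:
        refreah()
        cata_index()

if __name__ == "__main__":
    main()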
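To see how the flag behaves without touching README.md, the parser can be fed an explicit argument list. A minimal sketch: `build_parser` and `selected_actions` are hypothetical names used only here, and `--verbose` stands in for any argument the parser does not know.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="README.md helper")
    parser.add_argument('--refresh', '-r', action='store_true',
                        help='refresh README.md')
    return parser


def selected_actions(argv):
    """Return the names of the actions requested in argv."""
    # parse_known_args returns (namespace, list_of_unrecognized_arguments).
    args, _unknown = build_parser().parse_known_args(argv)
    actions = []
    if args.refresh:
        actions.append("refresh")
    return actions


print(selected_actions(["-r"]))
print(selected_actions([]))
print(selected_actions(["--refresh", "--verbose"]))
```

With `action='store_true'` the flag defaults to False and becomes True only when -r or --refresh is present, so adding another "button" later is a matter of one more `add_argument` call and one more `if`.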

Note: encoding

In addition, because the document is in Chinese, the encoding deserves attention.

For example, add # -*- coding: utf-8 -*- at the top of the script, and pass encoding='utf-8' when opening the file.
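A quick way to check the file side of this is a round trip through a temporary file. This is a minimal sketch; the helper name `roundtrip` and the sample Chinese text are made up for illustration.

```python
import os
import tempfile


def roundtrip(text):
    """Write text to a temporary README.md as UTF-8 and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "README.md")
        # Without encoding=..., open() falls back to a platform-dependent
        # default, which may not be UTF-8 on every system.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


sample = "## 归档\n- 2020.11.26 盛最多水的容器\n"
print(roundtrip(sample) == sample)
```

In Python 3 the source file itself is read as UTF-8 by default, so the coding comment mainly matters for Python 2 or for editors that rely on it; the explicit `encoding` argument to `open()` is the part that protects the document.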

Points worth improving: Better regex

If you read my code, you'll see that the logic for reading and classifying lines is a bit rough.

Checking only whether the first few characters of a line are "- [" and so on is fragile. And for an entry in the format - uu date title-and-summary category-A category-B category-C … [program file 1] [program file 2] [program file 3] …, I rely on if/else checks for square brackets and parentheses to tell category fields from program-file fields.

That is not robust, and it keeps me from writing freely in an entry. One possible improvement is to use the more advanced features of regular expressions.

I don't have the energy to go into that yet; I may come back to it in the future. Stay tuned.
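As one possible direction for the regex improvement mentioned above, named groups can describe the whole entry at once. This is only a sketch under assumptions that are not in refresh.py: the date looks like 2020.11.26, the summary is wrapped in straight double quotes, and the names `ENTRY`, `FILE` and `parse_entry_regex` are hypothetical.

```python
import re

ENTRY = re.compile(
    r'^- (?:uu )?'                      # list marker, optional "uu" flag
    r'(?P<date>\d{4}\.\d{2}\.\d{2}) '   # date such as 2020.11.26
    r'(?P<title>.+?"[^"]*") '           # title followed by a quoted summary
    r'(?P<rest>.*)$'                    # categories and file links
)
FILE = re.compile(r'\[[^\]]*\]\([^)]*\)')  # one "[label](path)" link


def parse_entry_regex(row):
    """Parse an entry line; return None if the line is not an entry."""
    m = ENTRY.match(row)
    if m is None:
        return None
    rest = m.group("rest")
    return {
        "date": m.group("date"),
        "title": m.group("title"),
        "files": FILE.findall(rest),
        # Whatever remains after removing the links is the category list.
        "categories": FILE.sub("", rest).split(),
    }


row = ('- uu 2020.11.26 Container With Most Water "move the shorter edge" '
       'two-pointer search [py](./vsc_leetcode/11.py) [cpp](./vsc_leetcode/11.cpp)')
print(parse_entry_regex(row))
```

Because the quoted summary marks where the title ends, the title may contain spaces, and index lines such as "- [matching](#matching)" are rejected because they carry no date.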
