preface

Today is the first day of genwen Challenge in August. Compared with the first phase of genwen Challenge, the participation threshold of this activity has been lowered, the minimum is only 7 days of genwen Challenge! However, we will find that there are some new article requirements in this activity compared with the last one, among which there is a requirement that the ratio of code to text should not exceed 70%. How can we know the proportion of code in our article?

Began to theme

Let’s start by creating an index.md file to use as a test article in this article.

# index.mdToday is August 1, 2021, the first day of the Gwen ChallengeCopy the code

Create a checkarticle. py file in the root directory and read the contents of the index.md file to see if it works.

Get the contents of the MD file
with open('index.md'.'r', encoding='UTF-8') as f:
    content = f.read()
    print(content) 
    # Today is August 1, 2021, the first day of the Gwen Challenge
Copy the code

Create a CheckArticle class that will be used to inspect our article.

class CheckArticle:
    def __init__(self, content) :
        self.content = content
Copy the code

This class needs to pass in a content argument (markdown document content), and in __init__ we hang the content argument on the content property of the instance.

Match the content

Once we have the contents of the document, the next step is to match the code and code blocks in the document to see how much of the document they occupy.

Match the code

In markdown documentation the code consists of two backquotes, for example:

`hello world`
Copy the code

The above source code looks like this in Markdown: Hello World, which we call markdown code.

We write down regular expressions that match the code to match the “code” in our article.

` (. *?) `# match code
Copy the code

Mount the expression matching the code to the instance in the class’s __init__ method.

def __init__(self, content) :
    self.content = content
    # match code re
    self.short_code = r'`(.*?) ` '
Copy the code

Create a match_short_code method in the CheckArticle class that matches the “code” in the content of our document.

Class CheckArticle: # omit some code def match_code(self): short_code_result = re.findall(self.short_code, self.content) print(short_code_result)Copy the code

Now we run the code matching is empty because we don’t have regular expression requirements in our documentation.

We need to add some code and code blocks to the index.md file for our next test.

# index.mdToday is2021 ` `years8 ` `month1 ` `It was also the first day of the challenge```javascript
console.log('javascript')
```

An unknown piece of code

Copy the code

Run the code and see if the code matches.

with open('index.md'.'r', encoding='UTF-8') as f:
    content = f.read()
    print(CheckArticle(content).match_short_code())
Copy the code

What’s going on? Why is there an empty string?

Let’s paste the expression and the contents of the Markdown document into Regexr for a look.

Match_short_code = match_short_code = match_short_code = match_short_code = match_short_code = match_short_code = match_short_code = match_short_code

Let’s run it again and try ~ ~

def match_short_code(self) :
    short_code_result = re.findall(self.short_code, self.content)
    short_code_result = filter(None, short_code_result)
    sum = 0
    for item in short_code_result:
        sum+ =len(item)
    return sum
Copy the code

NICE! Now returning 6 means the length of the current document code is 6, the same length as in the index.md document.

Matching code block

It is also possible to wrap code in markdown documents, which is often used to wrap multiple lines of code, for example

Print (' hello World ') print(' Hello World ')
Copy the code

Let’s write down a regular expression that matches a block of code.

```([\s\S]*?) ` ` `# Match code block
Copy the code

As above, mount the expression that matches the code block to the instance.

def __init__(self, content) :
    # Ignore some code
    # Match code block re
    self.long_code = r'```([\s\S]*?) ` ` ` '
Copy the code

Create a match_long_code method in the CheckArticle class that matches the code blocks in the contents of our document

def match_long_code(self): long_code_result = re.findall(self.long_code, self.content) sum = 0 for item in long_code_result: Sum += len(item.replace("\n", "")) return sumCopy the code

Run the

No problem, the length of the returned code block is the same as the length of the code block in index.md

Let’s store the length of the code and the block of code

def __init__(self, content) :
    # Ignore some code
    # Code length
    self.short_code_len = self.match_short_code()
    Code block length
    self.long_code_len = self.match_long_code()
Copy the code

Calculate the percentage of code

Create a get_code_percent method that requires the length of the code/code block passed in

def get_code_percent(self, num) :
    return str(round((num / self.content_len) * 100.2)) + The '%'
Copy the code

In the __init__ method, add two lines of code

def __init__(self, content) :
    # Calculate the code ratio
    self.short_percent = self.get_code_percent(self.short_code_len)
    self.long_percent = self.get_code_percent(self.long_code_len)
Copy the code
Get the contents of the MD file
with open('index.md'.'r', encoding='UTF-8') as f:
    content = f.read()
    index_md = CheckArticle(content)
    print('Code to ratio', index_md.short_percent)  # Code ratio 6.52%
    print('Block ratio', index_md.long_percent)  # Code block ratio 45.65%
Copy the code

To optimize the

Target: Input the file path when running the script and output the percentage of the code directly

Clear the checkArticle. py file of all code except the code of the CheckArticle class

Create a new index.py file

# index.py
import os, sys
from CheckArticle import CheckArticle

Get the file path
file_path = input("Enter the path of the article file you want to detect (relative path supported):\n")
if not os.path.exists(file_path):
    print('File does not exist, please check the correct path')
    sys.exit()

Get the contents of the MD file
try:
    with open(file_path, 'r', encoding='UTF-8') as f:
        content = f.read()
        index_md = CheckArticle(content)
        print('Code to ratio', index_md.short_percent)  # Code ratio 6.52%
        print('Block ratio', index_md.long_percent)  # Code block ratio 45.65%
except:
    print('Reading error')
Copy the code

Let’s look at the code/block ratio for the article you’re reading right now:

Enter the path of the article file you want to detect (relative path supported):./20210801.md 7.5% code block ratio 30.96%Copy the code

The last

I wish you all the best to win your favorite prize 😁😁 in August

If you think there are any other ideas that could be improved you can post them in the comments section 📃 📃

And I’m working on an online version