The problem

Our math OCR previously rendered its formulas with KaTeX, a front-end solution. Users often need to type Chinese inside a formula, for example (in these examples, words such as "high temperature" stand for the Chinese text the user types):

Fe_{2}O_{3} + 3 C O \stackrel{high temperature}{=} 2 F e + 3CO_{2}

Or:

c = \sqrt{a^{square} + b_{xy}^{square} + e^{exponent x}}

Our new service supports full LaTeX syntax, so both chemical formulas and math formulas are now handled by it; KaTeX, after all, is only a fast web renderer for LaTeX math. Now let's combine the two formulas and render them with our service:

Fe_{2}O_{3} + 3 C O \stackrel{high temperature}{=} 2 F e + 3CO_{2} \\ c = \sqrt{a^{square} + b_{xy}^{square} + e^{exponent x}}

WTF! Wasn't the Chinese problem already solved? Why does it break again? We already handle Chinese, so why doesn't that take effect here?

Some examples:

Fe_{2}O_{3} + 3CO \stackrel{high temperature}{=} 2 Fe + 3CO_{2}

Fe_{2}O_{3} + 3CO \stackrel{high temperature}{=} 2 Fe + 3CO_{2}$

Fe_{2}O_{3} + 3 C O \stackrel{\mbox{high temperature}}{=} 2 F e + 3CO_{2}

Analysis

Inline formulas and display formulas

LaTeX uses $ and $$ to open and close inline and display (inter-line) formulas, respectively. To show Chinese characters inside an inline formula, they must be wrapped in \mbox{}; only then are they displayed correctly. Until now, users typed Chinese text directly after a chemfig formula. Because a chemfig formula has an explicit start marker, that Chinese text is not treated as part of the formula, so it displays normally. As soon as Chinese appears inside a formula, however, it fails to render. KaTeX does not have this problem, presumably because KaTeX handles it on its own.
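To make the inline/display distinction concrete, here is a minimal Python sketch that splits a string on $$...$$ and $...$ delimiters. The helper `split_math_segments` is hypothetical, not part of our service, and it assumes no escaped dollar signs and no nesting. Anything outside a delimiter pair comes back as plain text, which is why Chinese written after a closed formula displays normally.

```python
import re

# Display math ($$...$$) is tried before inline math ($...$) at each
# position, otherwise "$$" would be read as the start of an inline formula.
MATH_PATTERN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def split_math_segments(source):
    """Split a string into ("text" | "inline" | "display", content) pairs.

    Hypothetical helper for illustration: escaped dollar signs are not handled.
    """
    segments = []
    pos = 0
    for match in MATH_PATTERN.finditer(source):
        # Plain text between the previous formula and this one
        if match.start() > pos:
            segments.append(("text", source[pos:match.start()]))
        if match.group(1) is not None:
            segments.append(("display", match.group(1)))
        else:
            segments.append(("inline", match.group(2)))
        pos = match.end()
    # Trailing plain text after the last formula
    if pos < len(source):
        segments.append(("text", source[pos:]))
    return segments
```

For example, `split_math_segments("$Fe_{2}O_{3}$高温")` returns the formula as an `inline` segment and `高温` as a separate `text` segment.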

The solution

There are two ways to fix this:

1. Follow standard LaTeX syntax: users wrap Chinese text inside a formula in \mbox{} themselves, or close the formula with **$**, so that both the inline formula and the Chinese text outside it display correctly.
2. Because a math formula, unlike a chemfig formula, has no obvious start marker, wrap every run of consecutive Chinese characters in \mbox{} on the backend. The code has to locate these runs itself and wrap each one in \mbox{}.
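The backend wrapping described above can also be expressed as a single `re.sub` over a character class, an alternative to the character-by-character loop used in this post. This is a minimal sketch: `wrap_chinese_runs` and `CHINESE_RUN` are hypothetical names, and the class covers only the CJK Unified Ideographs range plus a handful of common Chinese punctuation marks.

```python
import re

# CJK Unified Ideographs (U+4E00..U+9FFF) plus a few common Chinese
# punctuation marks; this is an assumed, non-exhaustive set.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff，。、；：？！（）《》“”]+")


def wrap_chinese_runs(expr):
    """Wrap every run of consecutive Chinese characters in \\mbox{...}."""
    # A function replacement is used so the backslash in "\mbox" is
    # inserted literally instead of being parsed as a regex escape.
    return CHINESE_RUN.sub(lambda m: "\\mbox{" + m.group(0) + "}", expr)
```

Passing the replacement as a function matters here: given a replacement string such as `"\\mbox{"`, `re.sub` would try to interpret `\m` as an escape and raise an error.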

Implementation

To keep users' existing input habits, we naturally chose the second approach. Here is the code:

```python
# Full-width and CJK punctuation that should be treated as Chinese text.
CHINESE_PUNCTUATION = (
    "！？｡。，、；：＂＇（）《》〈〉「」『』【】〔〕〖〗〘〙〚〛〝〞〟～｟｠｢｣､…‘’“”·"
    "＃＄％＆＊＋－／＜＝＞＠［＼］＾＿｀｛｜｝"
)


def with_mbox(mix_str):
    """Automatically wrap runs of Chinese characters in \\mbox{}.

    :param mix_str: chemfig expression that may contain Chinese text
    :return: the chemfig expression with each run of consecutive
             Chinese characters wrapped in \\mbox{...}
    """
    flag = False  # True while inside a run of Chinese characters
    t = ""
    for char in mix_str:
        if not flag and is_chinese(char):
            # A Chinese run starts: open \mbox{
            flag = True
            t += "\\mbox{" + char
        elif flag and not is_chinese(char):
            # The run ends: close the brace before the current character
            t += "}" + char
            flag = False
        else:
            # Still inside a Chinese run, or ordinary non-Chinese text
            t += char
    if flag:
        # The string ended inside a Chinese run: close the last \mbox
        t += "}"
    return t


def is_chinese(check_char):
    """Check whether a character is Chinese, including Chinese punctuation.

    :param check_char: a single character
    :return: True | False
    """
    return "\u4e00" <= check_char <= "\u9fff" or is_zw_punctuation(check_char)


def is_zw_punctuation(char):
    """Check whether a single character is full-width/Chinese punctuation."""
    return len(char) == 1 and char in CHINESE_PUNCTUATION
```

The function is then applied to the chemfig expression taken from the request body:

```python
chem_fig = with_mbox(request.json['chemfig'])
```

References

About TeX, LaTeX and KaTeX: blog.csdn.net/wobushisong…

Python matching Chinese characters: www.cnblogs.com/iamjqy/p/68…