Python is a strongly typed, dynamically typed language: a name can be bound to objects of any type at runtime, but operations on mismatched types are not allowed. Dynamic typing makes code quick to write, but as the saying goes, "dynamic typing feels great for a while, and then refactoring is a crematorium." It causes plenty of trouble too. What would we gain if a dynamic language could carry static type annotations? This article covers Python’s support for static typing, the current state of the community, an introduction to and comparison of type-checking tools, and type parsing in practice.

Source | Alibaba Technology official account

One: Background

Python is a strongly typed, dynamically typed language. A name can be bound to objects of different types at runtime (dynamic typing), but operations on mismatched types are rejected (strong typing; for example, a str and an int cannot be added).
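A minimal sketch of these two properties, using only built-ins. The names `value` and `try_add` are illustrative and do not come from the article:

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
assert type(value) is int
value = "forty-two"
assert type(value) is str


# Strong typing: mismatched operand types are rejected rather than coerced.
def try_add(left, right):
    """Return left + right, or the name of the exception that was raised."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__


print(try_add("1", "2"))  # str + str concatenates
print(try_add("1", 2))    # str + int raises TypeError, reported by name
```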

Dynamic typing makes code quick to write, but as the saying goes: "dynamic typing feels great for a while, and then refactoring is a crematorium." Dynamic typing also brings a lot of trouble. If a dynamic language can carry static type annotations, there are several main benefits:

  • Code is easier to write. IDE tooling can offer jump-to-definition, type hints, and similar features.
  • Code is more reliable. With type definitions available, many tools can detect semantic errors early, during static analysis, before the code runs.
  • Refactoring is safer. The input and output parameters of each interface are explicit, which makes restructuring code clearer and more stable.

Most mainstream languages today, such as Java, Go, and Rust, are statically typed. Dynamic languages such as Python and JavaScript are also embracing static typing; TypeScript is one example.

This article focuses on Python’s support for static typing, the current state of the community, an introduction to and comparison of type-checking tools, and type parsing in practice.

Two: Python’s static typing support

Python 3.0, whose design was laid out back in 2006 with a long list of planned improvements, introduced the syntax for function annotations.

def add(a: int, b: int) -> int:
    return a + b

With Python 3.5, annotations could be used as type hints (PEP 484, together with the typing module), and IDEs could use them for type checking.

By Python 3.7, static typing support was largely in place.
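As a small sketch of what these hints look like in practice, the snippet below uses the standard `typing` module. The functions `word_lengths` and `first_long_word` are hypothetical examples, not code from the article:

```python
from typing import Dict, List, Optional, get_type_hints


def word_lengths(words: List[str]) -> Dict[str, int]:
    """Map each word to its length."""
    return {word: len(word) for word in words}


def first_long_word(words: List[str], min_len: int = 5) -> Optional[str]:
    """Return the first word with at least min_len characters, or None."""
    for word in words:
        if len(word) >= min_len:
            return word
    return None


# Variable annotations (PEP 526) are available from Python 3.6 onward.
greeting: str = "hello"

# Hints are stored as metadata; the interpreter itself does not enforce them.
print(get_type_hints(word_lengths))
```

Because the hints are only metadata at runtime, enforcing them is left to external tools such as the checkers described in the next section.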

Next, let me introduce the type-checking tools and some basic concepts.

Three: Introduction to type-checking tools

Type-checking tools for Python are available both from the Python community and from major tech companies: mypy, pytype (Google), Pyre (Facebook), and Pyright (Microsoft).

These tools offer broadly similar functionality. Here is a brief introduction to each:

1 mypy

mypy is the earliest of these tools. It was started by Jukka Lehtosalo, and Guido van Rossum, the creator of Python, has been one of its core developers. It is integrated into the major editors (PyCharm, Emacs, Sublime Text, VS Code, and others) and has a large user base and mature documentation.
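To illustrate the kind of mistake such a checker is meant to catch, here is a minimal sketch. The function `parse_port` is hypothetical, and no checker output is reproduced here:

```python
def parse_port(value: str) -> int:
    """Convert a textual port number to an int."""
    return int(value)


# This call matches the annotation.
port_ok = parse_port("8080")

# This call runs without error because int() also accepts an int, yet the
# argument violates the declared `str` parameter type. A static checker such
# as mypy is expected to flag this call before the program ever runs.
port_mismatch = parse_port(8080)

print(port_ok, port_mismatch)
```

The second call is exactly the sort of silent mismatch that becomes costly during refactoring, since nothing fails at runtime.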

2 pytype

Google’s pytype does type checking and also ships some useful utilities. Here is a quick look at what they do:

  • annotate-ast, an in-progress tool that annotates AST nodes with types.
  • merge-pyi, which merges a generated pyi file back into the original source file; it can even keep the types hidden and load them only during type checking.
  • pytd-tool, which parses pyi files into pytype’s own PyTD representation.
  • pytype-single, which analyzes a single Python file, given pyi files for all of its dependencies.
  • pyxref, a cross-reference generator.

3 pyre

Facebook’s pyre-check has two distinctive features:

  • Watchman integration, which watches code files and tracks changes.
  • A query feature, which runs targeted checks on part of the source code, such as querying the type of an expression on a given line or listing all methods of a class, and so avoids a global check.

4 pyright

Microsoft’s Pyright is the most recent of the four to be open-sourced and claims the following benefits:

  • Fast. It claims to be five or more times faster than mypy and other checkers written in Python.
  • Independent of the Python environment. It is written in TypeScript, runs on Node, and does not depend on a Python installation or third-party packages.
  • Highly configurable. It can be configured freely, including different execution environments (PYTHONPATH settings, Python versions, platform targets).
  • Comprehensive checks. It supports type checking and related syntax (PEP 484, PEP 526, PEP 544), and covers function return values, class variables, global variables, and even conditional and loop statements.
  • Two ways to run it. It comes as a command-line tool and as a VS Code extension that speaks the Language Server Protocol.
  • Built-in stubs. It bundles a copy of Typeshed (that is, static pyi files used to check built-in modules, the standard library, and third-party packages).
  • Language service features. Hover information, jump to symbol definition, and real-time feedback while editing.

Four: Using pytype

Let’s focus on pytype. Why pytype? First, mypy is relatively old, and many of its features are not as novel or practical as those of the newer tools. Second, our plan is to use a Python LSP server to process Python files and provide syntax services, so a tool written in Python fits best. pyre-check is written in OCaml, so we chose pytype, which is written in Python and also provides useful utilities, such as parsing a pyi file and generating pyi files from Python files.

1 Basic Concepts

pyi files

The "i" in pyi stands for interface: the type definitions of a Python file are stored separately, as an interface, in a pyi file that assists type checking.

For example, the "Python 3.6 Typeshed Stubs" bundled with PyCharm contain many built-in pyi files that support type hints and navigation to definitions while coding.
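Since a pyi file uses ordinary Python syntax with the bodies elided, a rough idea of its content can be shown with the standard `ast` module. This is only a sketch: the stub text is a made-up `utils.pyi`, and it deliberately uses `ast` rather than pytype’s own parser.

```python
import ast

# Contents of a hypothetical stub file, utils.pyi: signatures only, no bodies.
STUB_SOURCE = '''
from typing import List

VERSION: str

def add(a: int, b: int) -> int: ...
def split_words(text: str) -> List[str]: ...
'''


def return_types(stub_source):
    """Map each top-level function in the stub to its return annotation."""
    tree = ast.parse(stub_source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.returns is not None:
            # ast.unparse requires Python 3.9 or newer.
            result[node.name] = ast.unparse(node.returns)
    return result


print(return_types(STUB_SOURCE))
```

Real stubs also use constructs such as overloads and conditional definitions, which is why a dedicated stub parser is the more robust choice in practice.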

Typeshed stubs

The Typeshed stubs mentioned above are a pre-packaged collection of pyi files, and PyCharm appears to maintain some stubs of its own as well. Many larger open source projects are also starting to ship stubs, for example PyTorch; TensorFlow is considering it too.

For many Python libraries, producing pyi files takes a lot of work, especially where there are many C API calls, so this will take some patience.

2 In practice

After reading pytype’s source code, I combined its more practical pieces with our requirements into the following example:

The overall effect

import logging
import sys
import os
import importlab.environment
import importlab.fs
import importlab.graph
import importlab.output
from importlab import parsepy
from sempy import util
from sempy import environment_util
from pytype.pyi import parser

The sample demo uses importlab to resolve the project's dependencies and the corresponding pyi files:

def main():
    # Root directory of the project to analyze
    ROOT = '/path/to/demo_project'
    # typeshed: https://github.com/python/typeshed
    TYPESHED_HOME = '/path/to/typeshed_home'
    util.setup_logging()
    # Load typeshed; returns None if TYPESHED_HOME is configured incorrectly
    typeshed = environment_util.initialize_typeshed_or_return_none(TYPESHED_HOME)
    # Load the valid files in the target directory
    inputs = ...  # the loader call is missing from the source text
    # Create the importlab environment
    env = environment_util.create_importlab_environment(inputs, typeshed)
    # Build the import graph from the pyi files and the project files
    import_graph = importlab.graph.ImportGraph.create(env, inputs, trim=True)
    # Print the entire dependency tree
    logging.info('Source tree:\n%s', importlab.output.formatted_deps_list(import_graph))
    # Module aliases, e.g. import numpy as np -> {'np': 'numpy'}
    alias_map = {}
    # Imported module name -> path of its pyi file,
    # e.g. import os -> {'os': '/path/to/os/__init__.pyi'}
    import_path_map = {}
    for file_name in inputs:
        # If a matching pyi file is found, the dependency is placed in resolved.
        # Built-in dependencies are skipped.
        # Custom dependencies are placed in unresolved and must be parsed by us.
        (resolved, unresolved) = import_graph.get_file_deps(file_name)
        for item in resolved:
            item_name = item.replace('.pyi', '') \
                .replace('.py', '') \
                .replace('/__init__', '').split('/')[-1]
            import_path_map[item_name] = item
        for item in unresolved:
            file_path = os.path.join(ROOT, item.new_name + '.py')
            import_path_map[item.name] = file_path
        import_stmts = parsepy.get_imports(file_name, env.python_version)
        for import_stmt in import_stmts:
            alias_map[import_stmt.new_name] = import_stmt.name

    print('\n\n#################################\n\n')
    print('For code search, alias_map alone is enough: it links each '
          'imported module to the object being used')
    print('alias_map: ', alias_map)

    # For code completion, the current file and the referenced pyi files must
    # be parsed further. If the current file is an __init__ file, search the
    # methods of all files in that directory as well.
    print('\n\n#################################\n\n')
    print('For code completion, the current file and the referenced pyi '
          'files must be parsed further')
    print('import_path_map: ', import_path_map)

    print('\n\n\nNext, use pytype to parse the pyi files and analyze the '
          'return types of third-party dependencies, in order to resolve '
          'the types of the current variables\n\n')
    # Parse the dependent pyi files with pytype to get the return types
    # of the methods being called
    fname = '/path/to/parsed_file'
    with open(fname, 'r') as reader:
        sourcecode = reader.read()
    ret = parser.parse_string(sourcecode, filename=fname, python_version=3)

    constant_map = dict()
    function_map = dict()
    for key in import_path_map.keys():
        v = import_path_map[key]
        with open(v, 'r') as reader:
            src = reader.read()
        try:
            res = parser.parse_pyi(src, v, key, 3)
        except Exception:
            continue
        # Aliases and classes are skipped; only constants and functions are collected
        for constant in res.constants:
            constant_map[constant.name] = constant.type.name
        for function in res.functions:
            signatures = function.signatures
            sig_list = []
            for signature in signatures:
                sig_list.append((signature.params, signature.return_type))
            function_map[function.name] = sig_list

    var_type_from_pyi_list = []
    for alias in ret.aliases:
        variable_name = alias.name
        if alias.type is not None:
            typename_in_source = alias.type.name
            typename = typename_in_source
            # Skip plain names that are not attribute accesses on a module
            if '.' not in typename:
                continue
            # Replace a module alias with the real module name
            if typename.split('.')[0] in alias_map:
                real_module_name = alias_map[typename.split('.')[0]]
                typename = real_module_name + typename[typename.index('.'):]
            if typename in function_map:
                possible_return_types = [item[1].name for item in function_map[typename]]
                var_type_from_pyi_list.append((variable_name, possible_return_types))
            if typename in constant_map:
                possible_return_type = constant_map[typename]
                var_type_from_pyi_list.append((variable_name, possible_return_type))

    print('\n\n#################################\n\n')
    print('Return types parsed from the pyi files:')
    for item in var_type_from_pyi_list:
        print('variable name:', item[0], 'return type:', item[1])


if __name__ == '__main__':
    sys.exit(main())

The sample code being parsed:

# demo.py
import os as abcdefg
import re
from demo import utils
from demo import refs


cwd = abcdefg.getcwd()
support_version = abcdefg.supports_bytes_environ
pattern = re.compile(r'.*')


add_res = utils.add(1, 3)
mul_res = refs.multi(3, 5)


c = abs(1)

Specific steps

First, pytype builds on another Google open source project: importlab.

importlab analyzes the dependencies between files. The files in the typeshed directory can be added to its environment as well, and importlab then generates the dependency graph.

env = environment_util.create_importlab_environment(inputs, typeshed)
import_graph = importlab.graph.ImportGraph.create(env, inputs, trim=True)
# If a matching pyi file is found, the dependency is placed in resolved.
# Built-in dependencies are skipped.
# Custom dependencies are placed in unresolved and must be parsed by us.
(resolved, unresolved) = import_graph.get_file_deps(file_name)

From the import information we learn where each variable comes from (an imported alias, or the return value of a method call):

{'ast': 'ast', 'astpretty': 'astpretty', 'abcdefg': 'os', 're': 're', 'utils': 'demo.utils', 'refs': 'demo.refs', 'JsonRpcStreamReader': 'pyls_jsonrpc.streams.JsonRpcStreamReader'}
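The article obtains this alias map through importlab's `parsepy.get_imports`. As a rough, standard-library-only sketch of the same idea, the mapping can be approximated with `ast`; `build_alias_map` is a hypothetical helper, and the source text mirrors the imports of demo.py:

```python
import ast

SOURCE = '''
import os as abcdefg
import re
from demo import utils
from demo import refs
'''


def build_alias_map(source):
    """Map each name bound by an import to the fully qualified name."""
    alias_map = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Simplification: a dotted plain import such as `import a.b`
                # is keyed by its full dotted name here.
                alias_map[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                local_name = alias.asname or alias.name
                alias_map[local_name] = f"{node.module}.{alias.name}"
    return alias_map


print(build_alias_map(SOURCE))
```

Relative imports (where `node.module` is None) are ignored in this sketch; a full solution needs the package context that importlab provides.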

The dependency graph also tells us where each referenced dependency lives:

import_path_map: {
    'ast': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/ast.pyi',
    'astpretty': '/Users/zhangxindong/Desktop/search/code/sempy/venv/lib/python3.9/site-packages/astpretty.py',
    'os': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/os/__init__.pyi',
    're': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/re.pyi',
    'utils': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/demo/utils.py',
    'refs': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/demo/refs/__init__.py',
    'streams': '/Users/zhangxindong/Desktop/search/code/sempy/venv/lib/python3.9/site-packages/pyls_jsonrpc/streams.py'
}

Next, we parse the corresponding files. My requirement is to get the return type of certain methods. For pyi files, pytype does the parsing for us, and we then match on the call relationship.

print('\n\n\nNext, use pytype to parse the pyi files and analyze the '
      'return types of third-party dependencies, in order to resolve '
      'the types of the current variables\n\n')
# Parse the dependent pyi files with pytype to get the return types
# of the methods being called
fname = '/path/to/parsed_file'
with open(fname, 'r') as reader:
    sourcecode = reader.read()
ret = parser.parse_string(sourcecode, filename=fname, python_version=3)

constant_map = dict()
function_map = dict()
for key in import_path_map.keys():
    v = import_path_map[key]
    with open(v, 'r') as reader:
        src = reader.read()
    try:
        res = parser.parse_pyi(src, v, key, 3)
    except Exception:
        continue
    # Aliases and classes are skipped; only constants and functions are collected
    for constant in res.constants:
        constant_map[constant.name] = constant.type.name
    for function in res.functions:
        signatures = function.signatures
        sig_list = []
        for signature in signatures:
            sig_list.append((signature.params, signature.return_type))
        function_map[function.name] = sig_list

var_type_from_pyi_list = []
for alias in ret.aliases:
    variable_name = alias.name
    if alias.type is not None:
        typename_in_source = alias.type.name
        typename = typename_in_source
        # Skip plain names that are not attribute accesses on a module
        if '.' not in typename:
            continue
        # Replace a module alias with the real module name
        if typename.split('.')[0] in alias_map:
            real_module_name = alias_map[typename.split('.')[0]]
            typename = real_module_name + typename[typename.index('.'):]
        if typename in function_map:
            possible_return_types = [item[1].name for item in function_map[typename]]
            # print('The possible return type of', typename_in_source, 'is', possible_return_types)
            var_type_from_pyi_list.append((variable_name, possible_return_types))
        if typename in constant_map:
            possible_return_type = constant_map[typename]
            var_type_from_pyi_list.append((variable_name, possible_return_type))

For example:

pattern = re.compile(r'.*')

From the /Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/re.pyi file we loaded, there are two definitions of re.compile that differ only in their parameters. Both return a Pattern.

So we know that the pattern variable is of type re.Pattern.

  • The following return types were parsed from the pyi files.
  • Variable name: cwd, return type: ['str']
  • Variable name: support_version, return type: bool
  • Variable name: pattern, return type: ['typing.Pattern', 'typing.Pattern']
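The matching step behind these results can be sketched without pytype. The maps below are hand-written stand-ins shaped like `alias_map`, `function_map`, and `constant_map` from the walkthrough (with return types already reduced to names), and `canonical_name` and `resolve_type` are hypothetical helpers:

```python
# Hypothetical inputs shaped like the maps built in the walkthrough.
alias_map = {"abcdefg": "os", "re": "re"}
function_map = {
    "os.getcwd": ["str"],
    "re.compile": ["typing.Pattern", "typing.Pattern"],  # two definitions
}
constant_map = {"os.supports_bytes_environ": "bool"}


def canonical_name(dotted_name, aliases):
    """Replace a leading module alias with the real module name."""
    head, sep, tail = dotted_name.partition(".")
    if not sep:
        return None  # a bare name, not an attribute access on a module
    return aliases.get(head, head) + sep + tail


def resolve_type(dotted_name):
    """Return the candidate types for an expression like 'abcdefg.getcwd'."""
    name = canonical_name(dotted_name, alias_map)
    if name in function_map:
        return function_map[name]
    if name in constant_map:
        return [constant_map[name]]
    return []


for expr in ("abcdefg.getcwd", "abcdefg.supports_bytes_environ", "re.compile"):
    print(expr, "->", resolve_type(expr))
```

Built-ins such as `abs` contain no dot and therefore resolve to an empty list here, which matches the walkthrough skipping non-attribute names.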

Five: Applications

Some of this Python syntax analysis is already used in Aliyun Dev Studio’s code documentation search recommendations and intelligent code completion.

1 Code document search recommendation

When developers do not know how to use an API (for example, how to call it or what parameters it takes), they can hover the mouse over the API to display the API summary provided by the intelligent coding plug-in. They can click "API Documentation Details" to see the official API documentation, code examples, and other details in the right pane, or search directly for the API documentation they need. Documentation search recommendations are currently supported for JavaScript and Python.

During documentation collection we obtain each API name together with the class it belongs to. In real code, syntax analysis lets us derive the class from the method being called, and that class information is then used for the documentation search.

2 Intelligent code completion

As developers write code, the intelligent coding plug-in automatically senses the code context and offers accurate completion candidates. Candidates marked with the ✨ symbol are intelligent completion results. Intelligent completion is currently supported for Java, JavaScript, and Python.

Throughout the completion process, syntax analysis gives a more accurate picture of the class of each variable the user is working with. This helps filter out unreasonable options recommended by the deep learning model and recall reasonable completions from the class's own set of methods.
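The filter-and-recall idea can be sketched as follows. This is an illustrative assumption about how such a step might look, not the plug-in's actual implementation; `filter_candidates`, `recall_members`, and the sample suggestions are all hypothetical:

```python
def filter_candidates(candidates, inferred_class):
    """Keep only candidates that are public attributes of the inferred class."""
    members = {name for name in dir(inferred_class) if not name.startswith("_")}
    return [name for name in candidates if name in members]


def recall_members(prefix, inferred_class):
    """Recall public members of the class that start with the typed prefix."""
    return sorted(
        name for name in dir(inferred_class)
        if name.startswith(prefix) and not name.startswith("_")
    )


# Suppose a model suggested these names for a variable inferred to be a str.
model_suggestions = ["upper", "append", "split", "keys"]
print(filter_candidates(model_suggestions, str))  # drops list/dict methods
print(recall_members("sp", str))                  # members matching a prefix
```

Here `dir()` on a live class stands in for the member list that the article derives from pyi files, which keeps the sketch self-contained.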

Six: Summary

The ideas and tools behind Python's static typing support are by now quite mature, but because of heavy historical baggage and limited push from the community, the practical results are still limited. In addition, the official tooling, the major companies, and individual IDEs each have their own implementations and analysis methods, and no unified standard or format has emerged. You can choose your own parsing approach based on the strengths and weaknesses described above and on the toolset and data that fit your needs. We look forward to better static typing support from the Python community.


This article is the original content of Aliyun and shall not be reproduced without permission.