- A deep dive on Python Type Hints
- Vicki Boykis
- The Nuggets translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: Hu Qimei
- Proofread by: Ultrasteve, Talisk, Sunui, Jiang Wu Slag
In-depth understanding of Python’s type hints
Title: Presser, by Konstantin Makovsky 1900
Introduction
Ever since Python’s type hints were released in 2014, people have been applying them to their code. I would hazard a guess that about 20-30% of Python 3 code currently uses hints (sometimes called annotations). I’ve seen them in more and more books and tutorials in the last year.
In fact, I’m curious now — if you’re actively developing in Python 3, do you use annotations and hints in your code?
— Vicki Boykis (@vboykis) May 14, 2019
This is a typical example of what code looks like using type hints.
Code without type hints:
def greeting(name):
    return 'Hello ' + name
Code with type hints:
def greeting(name: str) -> str:
    return 'Hello ' + name
The general form of a type hint is usually this:
def function(variable: input_type) -> return_type:
    pass
However, there is still a lot of confusion about what they are (for the purposes of this article, I’ll call them hints) and how they might benefit your code.
When I started investigating and measuring whether type hints worked for me, I became very confused. So, as I usually do with things I don’t understand, I decided to dig deeper, while also hoping that this article will be useful to others.
As always, if you want to comment on something you’ve seen, please feel free to submit a pull request.
How does a computer compile our code
To understand what the core Python developers are trying to do with type hints, let’s go down a few layers from Python to better understand how computers and programming languages work.
The core function of a programming language is to use the CPU for data processing and store input and output in memory.
The CPU is pretty stupid. It can do hard work, but it only understands machine language, which at bottom is electrical signals represented as zeros and ones.
To get to those zeros and ones, we need to move from a high-level language down to a low-level one, which is where compiled and interpreted languages come in.
A program is either compiled or interpreted (Python is executed by an interpreter), and either way the code is translated into lower-level machine code that tells the hardware what to do.
There are several ways to turn code into machine-readable instructions: you can build a binary and have a compiler translate it (C++, Go, Rust, etc.), or you can run the code directly and let an interpreter execute it. The latter is how Python (and PHP, Ruby, and similar scripting languages) works.
How does the hardware know how to store these zeros and ones in memory? The software — our code — needs to tell the hardware how to allocate memory for the data. What kind of data are these? This depends on the data type chosen by the language.
Every language has data types, and they’re often the first thing you learn when you learn programming.
You’ve probably seen tutorials like this (the excellent textbook by Allen Downey, “Think Like a Computer Scientist”) about what they are. In short, they are different ways of representing data in memory.
Depending on the language used, there may be strings, integers, and other types. For example, Python’s basic data types include:
int, float, complex
str
bytes
tuple
frozenset
bool
array
bytearray
list
set
dict
There are also advanced data types made up of several basic data types. For example, Python lists can contain integers, strings, or both.
To know how much memory needs to be allocated, the computer needs to know what type of data is being stored. Fortunately, Python’s standard library function sys.getsizeof tells us how many bytes each different data type takes up.
This excellent answer gives approximate sizes for some “empty” data structures:
import sys
import decimal
d = {"int": 0,
     "float": 0.0,
     "dict": dict(),
     "set": set(),
     "tuple": tuple(),
     "list": list(),
     "str": "a",
     "unicode": u"a",
     "decimal": decimal.Decimal(0),
     "object": object(),
     }
# Create new dict that can be sorted by size
d_size = {}
for k, v in sorted(d.items()):
    d_size[k] = sys.getsizeof(v)
sorted_x = sorted(d_size.items(), key=lambda kv: kv[1])
sorted_x
[('object', 16),
 ('float', 24),
 ('int', 24),
 ('tuple', 48),
 ('str', 50),
 ('unicode', 50),
 ('list', 64),
 ('decimal', 104),
 ('set', 224),
 ('dict', 240)]
If we sort the results, we can see that by default, the largest data structure is the empty dictionary, followed by the set. An integer takes up very little space compared to a string.
This gives us an idea of how much memory is occupied by different types of data in the program.
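To make the measurement easy to repeat, here is a minimal sketch that reports sizes for a handful of built-in values. The helper name `sizes_in_bytes` and the sample values are assumptions for illustration, and the exact byte counts differ between Python versions and platforms, so none are shown.

```python
import sys

def sizes_in_bytes(values):
    """Map each value's type name to the size sys.getsizeof reports for it."""
    return {type(v).__name__: sys.getsizeof(v) for v in values}

# Hypothetical sample values: one instance of several built-in types
samples = [0, 0.0, "a", (), [], set(), {}]
report = sizes_in_bytes(samples)

# Print the smallest objects first; exact numbers vary by interpreter
for name, size in sorted(report.items(), key=lambda kv: kv[1]):
    print(f"{name:<6} {size} bytes")
```

Note that sys.getsizeof only counts the object itself, not the objects it refers to, so a list of large strings reports roughly the same size as a list of small ones.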
Why should we care? Because some types are more efficient than others and better suited to different tasks. There are also cases where we need to do rigorous checks on types to make sure they don’t violate some of the constraints of our program.
But what exactly are these types? Why do we need them?
Here’s where the type system comes in.
Type System Introduction
Long ago, people who relied on manual math realized that when they proved equations, they could reduce many logical problems by using “types” to mark numbers or other elements in equations.
In the beginning, computer science basically relied on doing a lot of math by hand, and some principles continued. Type systems became a way to reduce the number of errors in programs by assigning different variables or elements to specific types.
Here are some examples:
- If we write software for a bank, we cannot use strings in the snippet that calculates the total amount of user accounts.
- If we’re dealing with survey data and want to know whether people did or didn’t do something, a yes or no Boolean is most appropriate.
- In a large search engine, we have to limit the number of characters allowed in the search box, so we need to validate certain types of strings.
In programming today, there are two different type systems: static and dynamic. Steve Klabnik writes:
In a static system, the compiler examines the source code and assigns “type” tags to parameters in the code, which it then uses to infer information about program behavior. In a dynamic typing system, the compiler generates code to keep track of the data types (also coincidentally called “types”) that the program uses.
What does that mean? For compiled languages, it typically means you need to specify types in advance so that the compiler can type-check at compile time and make sure the program is sound.
Perhaps the best explanation for both is what I read recently:
I used to use statically typed languages, but for the past few years I’ve mostly used Python. The initial experience was annoying, and it felt like it just slowed me down, when Python could have just let me do what I wanted, even if I made occasional mistakes. It’s a bit like commanding someone who is inquisitive, rather than someone who always says they agree with you, but you’re not sure they understand everything correctly.
One caveat here: static and dynamic typing are closely related to, but not synonyms for, compiled and interpreted languages. A dynamically typed language such as Python can be compiled, and a statically typed language such as Java can be interpreted, for example through the Java REPL.
Data types in static and dynamic typing languages
So what’s the difference between the data types in the two languages? In static typing, you must first define the type. For example, if you use Java, your program might look like this:
public class CreatingVariables {
    public static void main(String[] args) {
        int x, y, age, height;
        double seconds, rainfall;
        x = 10;
        y = 400;
        age = 39;
        height = 63;
        seconds = 4.71;
        rainfall = 23;
        double rate = calculateRainfallRate(seconds, rainfall);
    }
    private static double calculateRainfallRate(double seconds, double rainfall) {
        return rainfall/seconds;
    }
}
Notice at the beginning of this program that we declare the type of the variable:
int x, y, age, height;
double seconds, rainfall;
The method signature must also declare the types of the parameters passed in for the code to compile correctly. In Java, you have to declare types from the start so that the compiler knows what to check when it compiles the code.
Python hides the type. Python code looks like this:
# The function must be defined before it is called
def calculateRainfall(seconds, rainfall):
    return rainfall/seconds
x = 10
y = 400
age = 39
height = 63
seconds = 4.71
rainfall = 23
rate = calculateRainfall(seconds, rainfall)
What’s the theory behind this?
How does Python handle data types
Python is dynamically typed, which means it only checks the types of variables you declare when you run the program. As we saw in the code snippet above, you don’t have to plan ahead for types and memory allocation.
Here’s what happened:
In Python, CPython compiles the source code into a simpler form of bytecode. These instructions are similar to CPU instructions, but they are not executed by the CPU, but by the virtual machine software. (These virtual machines do not mimic the entire operating system, just a simplified CPU execution environment)
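The bytecode described above can be inspected with the standard library’s dis module. This is only a sketch: the function `calculate_rainfall` is a stand-in for the earlier example, and the instruction names CPython emits differ from version to version, so no output is listed.

```python
import dis

def calculate_rainfall(seconds, rainfall):
    return rainfall / seconds

# The compiled bytecode is stored on the function's code object as raw bytes
raw_bytecode = calculate_rainfall.__code__.co_code

# dis decodes those bytes into named instructions for the virtual machine
instructions = [ins.opname for ins in dis.get_instructions(calculate_rainfall)]
print(instructions)
```

Nothing in the instruction list mentions int, float, or str: the bytecode says “divide these two objects”, and the types are only looked at when that instruction runs.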
When CPython compiles a program, how does it know the type of a variable if it does not specify a data type? The answer is it doesn’t know, it just knows that variables are objects. Everything in Python is an object until it becomes a concrete type, and that’s when it’s checked.
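A small sketch of what “variables are just objects” looks like in practice. The variable names are illustrative only.

```python
value = "Vicki"
first_type = type(value)    # the name currently refers to a str object

value = 4.71                # the same name can be rebound to a float
second_type = type(value)

# Every value, including None and built-in functions, is an instance of object
everything_is_object = all(
    isinstance(v, object) for v in ("Vicki", 4.71, [1, 2], None, len)
)

print(first_type.__name__, second_type.__name__, everything_is_object)
# prints: str float True
```

The type belongs to the object, not to the name, which is why the interpreter cannot know ahead of time what `value` will hold at any given line.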
For a type like a string, Python assumes that anything surrounded by single or double quotes is a string. Python has a numeric type for numbers. If we try to do something on a type that Python cannot do, Python will prompt us.
For example, something like this:
name = 'Vicki'
seconds = 4.71
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-71805d305c0b> in <module>
      3
      4
----> 5 name + seconds
TypeError: must be str, not float
It tells us that we cannot add strings and floating point numbers. Python does not know that name is a string and seconds is a floating-point number until the moment it executes.
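To see that the check really happens only at the moment of execution, consider this sketch. The function name `broken_total` is hypothetical. Defining the function succeeds even though its body contains the invalid addition, and the error appears only once the function is called.

```python
def broken_total():
    name = 'Vicki'
    seconds = 4.71
    return name + seconds  # str + float is invalid, but nothing checks it yet

# Defining the function raised no error
defined_ok = callable(broken_total)

try:
    broken_total()
    failed_at_runtime = False
except TypeError:
    # The problem only surfaces when the offending line actually executes
    failed_at_runtime = True

print(defined_ok, failed_at_runtime)
# prints: True True
```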
In other words, this is duck typing: when we perform the addition, Python doesn’t care what type the object is. It only cares whether the addition method it calls returns something sensible, and if not, it raises an exception.
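Duck typing can be sketched with one untyped function that works for any objects supporting `+`. The `Minutes` class here is hypothetical and exists only to show that an unrelated type works as long as it implements `__add__`.

```python
class Minutes:
    """Hypothetical class, unrelated to int or str, that supports +."""
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Minutes(self.amount + other.amount)

def add(a, b):
    # No type is checked here; Python just tries the addition
    return a + b

total_numbers = add(2, 3)
total_text = add("rain", "fall")
total_minutes = add(Minutes(10), Minutes(5)).amount

try:
    add("rain", 3)
    mixed_failed = False
except TypeError:
    mixed_failed = True  # str and int do not know how to add to each other

print(total_numbers, total_text, total_minutes, mixed_failed)
# prints: 5 rainfall 15 True
```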
So what does that mean? If we write code the way we would in Java or C, we won’t encounter any errors until the CPython interpreter executes the exact line that has the problem.
This has proven to be inconvenient for teams writing a lot of code. Instead of dealing with just a few variables, you’re dealing with a lot of classes calling each other, and you need to be able to check everything quickly.
If you can’t write good test code and find bugs in your program before it goes into production, you’ll break the entire system.
In general, there are many benefits to using type hints:
If you use complex data structures, or functions with many inputs, type hints make the code much easier to read when you come back to it later. Our example, a simple function with a single argument, is easy to follow either way.
But what if you’re dealing with a code base with many inputs, like this example from the PyTorch documentation?
def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
What is model? Let’s look at the following code.
model = Net().to(device)
Wouldn’t it be cool if we could specify it in the method signature without having to look at the code? Like this:
def train(args, model (type Net), device, train_loader, optimizer, epoch):
What is device?
device = torch.device("cuda" if use_cuda else "cpu")
What is torch.device? It is a special PyTorch type. If we go to the rest of the documentation and code, we can find:
A :class:`torch.device` is an object representing the device on which a :class:`torch.Tensor` is or will be allocated.
The :class:`torch.device` contains a device type ('cpu' or 'cuda') and optional device ordinal for the device type. If the device ordinal is not present, this represents the current device for the device type; e.g. a :class:`torch.Tensor` constructed with device 'cuda' is equivalent to 'cuda:X' where X is the result of :func:`torch.cuda.current_device()`.
A :class:`torch.device` can be constructed via a string or via a string and device ordinal
Wouldn’t it be nice if we could annotate this right in the signature, without having to look it up in the program?
def train(args, model (type Net), device (type torch.device), train_loader, optimizer, epoch):
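In real Python syntax, the wished-for signature might look something like the sketch below. To keep it runnable without PyTorch installed, `Net` and `Device` are hypothetical stand-in classes (in the actual script they would be the model class and torch.device), and the types chosen for `args`, `train_loader`, and `optimizer` are assumptions, not something the PyTorch example specifies.

```python
import argparse
from typing import Any, Iterable, Tuple, get_type_hints

class Net:
    """Hypothetical stand-in for the model class defined in the script."""

class Device:
    """Hypothetical stand-in for torch.device, so the sketch runs without PyTorch."""

def train(args: argparse.Namespace, model: Net, device: Device,
          train_loader: Iterable[Tuple[Any, Any]], optimizer: Any,
          epoch: int) -> None:
    """Body omitted; only the signature matters for this sketch."""

# The hints are plain data attached to the function and can be read back
hints = get_type_hints(train)
print(hints["model"].__name__, hints["device"].__name__, hints["epoch"].__name__)
# prints: Net Device int
```

A reader can now tell what each argument is from the signature alone, without tracing back to where `model` and `device` were created.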
There are many more examples like this.
So type hints are helpful for programming.
Type hints also help others read your code. Code with type hints is easier to read without having to check the contents of the entire program as in the example above. Type hints improve readability.
So what has Python done to offer the same readability as statically typed languages?
Python type hints
This is where type hints come in. They began as comments next to code, called type annotations or type hints. I’m going to call them type hints. In other languages, annotations and hints mean something completely different.
In Python 2 people started adding hints to their code to indicate what various functions returned.
That code would look something like this:
users = [] # type: List[UserID]
examples = {} # type: Dict[str, Any]
At first, type hints were just comments. But over time Python moved to a more uniform approach to type hints, starting with function annotations:
Function annotations, both for parameters and return values, are completely optional.
Function annotations are nothing more than a way of associating arbitrary Python expressions with various parts of a function at compile-time.
By itself, Python does not attach any particular meaning or significance to annotations. Left to its own, Python simply makes these expressions available as described in Accessing Function Annotations below.
The only way that annotations take on meaning is when they are interpreted by third-party libraries. These annotation consumers can do anything they want with a function's annotations. For example, one library might use string-based annotations to provide improved help messages, like so:
PEP 484 was developed alongside mypy, a project out of Dropbox that checks the types in your program when you run the checker over it. Keep in mind that types are not checked at run time. You will only hit a problem at run time if you call a method on an incompatible type, such as trying to slice a dictionary or pop a value from a string.
From the implementation details:
While these annotations are available at run time via the __annotations__ attribute, no type checking is done at run time. Instead, the proposal assumes a separate offline type checker that users can run over their own source code. In essence, this type checker acts like a very powerful linter. (Of course, individual users could use a similar checker at run time for Design by Contract enforcement or just-in-time optimization, but these tools are not yet mature enough.)
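A short sketch of that behaviour, reusing the `greeting` function from the introduction (the body uses an f-string here so that the deliberately wrong call does not fail for an unrelated reason):

```python
def greeting(name: str) -> str:
    return f'Hello {name}'

# The hints are stored on the function object...
annotations = greeting.__annotations__

# ...but the interpreter never enforces them: passing an int raises nothing
result = greeting(42)

print(annotations)
print(result)
# prints: Hello 42
```

Only an external checker such as mypy would flag `greeting(42)`. The interpreter itself treats the hints as inert metadata.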
How does that work in practice?
Type checking also means that you can use an integrated development environment (IDE) more easily. PyCharm, for example, provides code completion and checking based on types, as does VS Code.
Type checks are also beneficial in another way: they can stop you from making stupid mistakes. Here’s a good example.
Here we will add a name to the dictionary:
names = {'Vicki': 'Boykis', 'Kim': 'Kardashian'}
def append_name(dict, first_name, last_name):
    dict[first_name] = last_name
append_name(names, 'Kanye', 9)
If we allowed the program to execute like this, we’d end up with a bunch of malformed entries in the dictionary.
So how to correct it?
from typing import Dict
names_new: Dict[str, str] = {'Vicki': 'Boykis', 'Kim': 'Kardashian'}
def append_name(dic: Dict[str, str], first_name: str, last_name: str):
    dic[first_name] = last_name
append_name(names_new, 'Kanye', 9.7)
names_new
Running this through mypy gives:
(kanye) mbp-vboykis:types vboykis$ mypy kanye.py
kanye.py:9: error: Argument 3 to "append_name" has incompatible type "float"; expected "str"
As we can see, mypy does not allow this type. It makes sense to include mypy in the test stage of your continuous integration pipeline.
Type hints in the development environment
One of the biggest benefits of using type hints is that you get the same auto-completion functionality in the IDE as you would in a static language.
For example, let’s say you have a piece of code that just wraps the two functions you used above into a class.
from typing import Dict
class rainfallRate:
    def __init__(self, hours, inches):
        self.hours = hours
        self.inches = inches
    def calculateRate(self, inches: int, hours: int) -> float:
        return inches / hours
# Example call; the values are illustrative
rainfallRate(hours=2, inches=5).calculateRate(inches=5, hours=2)
class addNametoDict:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.dict: Dict[str, str] = {}
    def append_name(self, dict: Dict[str, str], first_name: str, last_name: str):
        dict[first_name] = last_name
# Example call; the values are illustrative
names = addNametoDict('Vicki', 'Boykis')
names.append_name(names.dict, 'Kim', 'Kardashian')
The neat thing is that, now that we’ve added the types, the IDE shows the expected parameter and return types as we type a call to the class’s methods.
Start using type hints
Mypy has some good advice for developing a code base:
1. Start small -- get a clean mypy build for some files with few hints
2. Write a mypy runner script to ensure consistent results
3. Run mypy in Continuous Integration to prevent type errors
4. Gradually annotate commonly imported modules
5. Write hints as you modify existing code and write new code
6. Use MonkeyType or PyAnnotate to automatically annotate legacy code
To start using type hints in your own code, it’s helpful to understand the following:
First, if you are using anything beyond strings, ints, bools, and other basic Python types, you need to import the typing module.
Second, the typing module makes several complex types available:
Dict, Tuple, List, Set, and so on.
For example, Dict[str, float] means that you want to check for a dictionary where the key is of type str and the value is of type float.
There are also types called Optional and Union.
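A brief sketch of how these types read in practice. The rainfall data and the function names `lookup` and `to_inches` are made up for illustration.

```python
from typing import Dict, Optional, Union

# Keys must be str, values must be float (hypothetical sample data)
rainfall_by_city: Dict[str, float] = {"Philadelphia": 3.2, "Seattle": 5.9}

def lookup(city: str) -> Optional[float]:
    """Optional[float] means the result is either a float or None."""
    return rainfall_by_city.get(city)

def to_inches(amount: Union[int, float, str]) -> float:
    """Union[...] means any one of the listed types is acceptable."""
    return float(amount)

print(lookup("Seattle"), lookup("Nowhere"), to_inches("2.5"))
# prints: 5.9 None 2.5
```

Optional[X] is shorthand for Union[X, None], which makes it a natural fit for lookups that may not find anything.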
Third, here is the form of the type hint:
import typing
def some_function(variable: type) -> return_type:
    do_something
If you want to start using type hints in more depth, a lot of smart people have written tutorials. Here’s the best tutorial to get started. It also shows you how to set up a testing environment.
So how do you decide? To use or not to use?
Should you use type hints?
It depends on your use case, as Guido and the mypy documentation state:
The goal of mypy is not to convince everyone to write statically typed Python; statically typed programming is, and will be, entirely optional. The goal of mypy is to increase programmer productivity and improve software quality by providing Python programmers with more choices and making Python a more competitive alternative to other statically typed languages on large projects.
Because of the overhead of setting up mypy and thinking through the types required, type hints don’t make sense for small code bases (such as a Jupyter notebook). What is a small code base? Anything under 1K lines of code, conservatively speaking.
For large code bases, when you need to collaborate with others, package, and when you need a version control and continuous integration system, type hints can make a lot of sense and save a lot of time.
My opinion is that type hints are becoming more common. It wouldn’t be a bad thing to take the lead in using it, even in less common places, over the next few years.
Thank you
Special thanks to Peter Baumgartner, Vincent Warmerdam, Tim Hopper, Jowanza Joseph, and Dan Boykis for reading drafts of this article. All remaining errors are mine 🙂