English | Python Tips and Tricks, You Haven’t Already Seen
Original author | Martin Heinz (martinheinz.dev)
Translator | Cat Under the Pea Flowers
Disclaimer: This article is translated with the authorization of the original author. Please credit the original source when reprinting, and do not use it for commercial or illegal purposes.
Many articles have been written about cool Python features such as variable unpacking, partial functions, and enumerating iterables, but there is much more to say about Python. In this article, I will show some features that I know and use but that are rarely mentioned elsewhere. Let’s get started.
1. Sanitize the input string
The problem of sanitizing user input applies to almost any program you write. It’s usually enough to convert characters to lowercase or uppercase, and sometimes you can use regular expressions to get the job done, but for complex cases, there are better ways:
user_input = "This\nstring has\tsome whitespaces... \r\n"

character_map = {
    ord('\n'): ' ',
    ord('\t'): ' ',
    ord('\r'): None,
}
user_input.translate(character_map)
# 'This string has some whitespaces...  '
In this example, you can see that the whitespace characters “\n” and “\t” have been replaced with single spaces, and “\r” has been removed entirely. This is a simple example, but we can go a step further and use the unicodedata module and its combining() function to generate a much larger remapping table and use it to remove all accents from a string.
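The larger remapping table mentioned above could look roughly like the following. This is only a sketch: the names combining_map and strip_accents are made up for illustration, and it assumes that normalizing to NFD first (so that accented letters are split into a base letter plus a combining mark) is acceptable for your input.

```python
import sys
import unicodedata

# Build a translation table that maps every combining character
# (accents, diacritics, ...) to None, which deletes it on translate().
combining_map = dict.fromkeys(
    code_point
    for code_point in range(sys.maxunicode + 1)
    if unicodedata.combining(chr(code_point))
)


def strip_accents(text):
    # NFD normalization splits "é" into "e" plus a combining accent,
    # which the translation table can then remove.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.translate(combining_map)


print(strip_accents("Crème brûlée, naïve café"))  # Creme brulee, naive cafe
```

Building the table walks over every Unicode code point once, so it is worth creating it a single time and reusing it.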
2. Slice the iterator
If you try to slice an iterator directly, you’ll get a TypeError saying that the object is not subscriptable, but there is a simple solution:
import itertools

s = itertools.islice(range(50), 10, 20)  # <itertools.islice object at 0x7f70fab88138>
for val in s:
    ...
Using itertools.islice, we can create an islice object, which is an iterator that produces the items we want. There is an important caveat, though: it consumes all the items of the underlying iterator up to the start of the slice, as well as all the items in the islice object itself.
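To make the caveat about consumed items concrete, here is a small sketch using a real iterator (the variable names are only illustrative). After slicing, the underlying iterator has moved past both the skipped items and the sliced ones:

```python
import itertools

numbers = iter(range(10))  # a true iterator: it can only move forward

window = list(itertools.islice(numbers, 2, 5))
leftover = list(numbers)

print(window)    # [2, 3, 4]
print(leftover)  # [5, 6, 7, 8, 9]  (0 and 1 were consumed and discarded)
```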
(For more on iterator slicing, see Python Advanced: Iterators and iterator Slicing.)
3. Skip the beginning of the iterable
Sometimes you have to deal with files that start with a variable number of unwanted lines (such as comments). Again, itertools provides a simple solution:
string_from_file = """
// Author: ...
// License: ...
//
// Date: ...
Actual content...
"""

import itertools

# strip() removes the leading blank line, which would otherwise stop dropwhile immediately
lines = string_from_file.strip().split("\n")
for line in itertools.dropwhile(lambda line: line.startswith("//"), lines):
    print(line)
This code prints only what comes after the initial comment section. This is useful when we only want to discard items at the beginning of an iterable (in this case, the comment lines) and don’t know how many of them there are.
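One detail worth illustrating is that dropwhile stops testing as soon as the predicate fails once, so matching items later in the iterable are kept. The sample lines below are made up for this sketch; a plain filter is shown next to it for comparison.

```python
import itertools

lines = ["// header", "// more header", "content", "// trailing comment", "more content"]

# dropwhile discards only the leading run of matching items
kept = list(itertools.dropwhile(lambda line: line.startswith("//"), lines))

# a comprehension, by contrast, removes every matching item
filtered = [line for line in lines if not line.startswith("//")]

print(kept)      # ['content', '// trailing comment', 'more content']
print(filtered)  # ['content', 'more content']
```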
4. Functions that only support keyword arguments (kwargs)
When you want to enforce clearer arguments at the call site, it can be useful to create functions that accept only keyword arguments:
def test(*, a, b):
    pass

test("value for a", "value for b")  # TypeError: test() takes 0 positional arguments...
test(a="value", b="value 2")  # Works...
As you can see, this is easily done by placing a single * parameter before the keyword parameters. If we put positional parameters before the * parameter, then the function can obviously take positional arguments as well.
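As a sketch of mixing both kinds of parameters, consider the hypothetical function below (connect and its parameters are invented for illustration). Everything before the * can be passed positionally, everything after it must be named:

```python
def connect(host, port=5432, *, timeout, retries=3):
    # host and port may be passed positionally;
    # timeout and retries must always be passed by keyword.
    return {"host": host, "port": port, "timeout": timeout, "retries": retries}


settings = connect("db.example.com", 5433, timeout=10)
print(settings)

rejected = None
try:
    connect("db.example.com", 5433, 10)  # third positional argument is not allowed
except TypeError as error:
    rejected = str(error)
print(rejected)
```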
5. Create an object that supports the with statement
We all know how to use the with statement, such as opening a file or acquiring a lock, but can we implement our own? Yes, we can implement the context manager protocol using the __enter__ and __exit__ methods:
class Connection:
    def __init__(self):
        ...

    def __enter__(self):
        # Initialize connection...
        return self  # the returned object is what "as c" binds to

    def __exit__(self, type, value, traceback):
        # Close connection...
        ...

with Connection() as c:
    # __enter__() executes
    ...
    # c.__exit__() executes
This is the most common way to implement context management in Python, but there is an even simpler way:
from contextlib import contextmanager

@contextmanager
def tag(name):
    print(f"<{name}>")
    yield
    print(f"</{name}>")

with tag("h1"):
    print("This is Title.")
The code snippet above implements the context management protocol using the contextmanager decorator. The first part of the tag function (before yield) is executed when the with block is entered, then the body of the with block is executed, and finally the rest of the tag function is executed.
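One thing the tag example does not cover is what happens when the with block raises. In a generator-based context manager, the exception is re-raised at the yield, so cleanup code only runs reliably if it sits in a finally clause. The following is a minimal sketch of that pattern; recorded and log are names invented for illustration.

```python
from contextlib import contextmanager


@contextmanager
def recorded(events, name):
    events.append(f"enter {name}")
    try:
        yield events  # the yielded value is what "as ..." binds to
    finally:
        # runs even if the with block raises
        events.append(f"exit {name}")


log = []
try:
    with recorded(log, "job") as events:
        events.append("working")
        raise ValueError("boom")
except ValueError:
    pass

print(log)  # ['enter job', 'working', 'exit job']
```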
6. Save memory with __slots__
If you’ve ever written a program that creates a large number of instances of a class, you’ve probably noticed that it can suddenly need a lot of memory. That’s because Python uses a dictionary to represent the attributes of a class instance, which makes attribute access fast but is not very memory-efficient. This is usually not a problem; however, if it becomes one for your program, you can try using __slots__:
class Person:
    __slots__ = ["first_name", "last_name", "phone"]

    def __init__(self, first_name, last_name, phone):
        self.first_name = first_name
        self.last_name = last_name
        self.phone = phone
What happens here is that when we define the __slots__ attribute, Python uses a small fixed-size array for the attributes instead of a dictionary, which greatly reduces the memory required per instance. There are also some downsides to using __slots__: we can’t add any new attributes to instances and can only use the ones listed in __slots__. Also, classes with __slots__ can’t use multiple inheritance when more than one of the base classes defines non-empty __slots__.
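The restriction on new attributes can be seen in a short sketch. The Point class here is just an illustrative stand-in for any slotted class:

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(1, 2)
point.x = 10  # attributes listed in __slots__ behave as usual

try:
    point.z = 3  # not listed in __slots__
    added = True
except AttributeError:
    added = False

print(added)                       # False
print(hasattr(point, "__dict__"))  # False: there is no per-instance dictionary
```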
7. Limit CPU and memory usage
If you don’t want to optimize your program’s memory or CPU usage, but want to limit it directly to a fixed number, Python also has a library that can do this:
import signal
import resource
import os

# To limit CPU time
def time_exceeded(signo, frame):
    print("CPU exceeded...")
    raise SystemExit(1)

def set_max_runtime(seconds):
    # Install the signal handler and set a resource limit
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))
    signal.signal(signal.SIGXCPU, time_exceeded)

# To limit memory usage
def set_max_memory(size):
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (size, hard))
Here, we see both options: limiting the maximum CPU run time and limiting the maximum memory usage. For the CPU limit, we first get the soft and hard limits for that particular resource (RLIMIT_CPU), and then set the limit using the number of seconds passed as the argument together with the previously retrieved hard limit. Finally, we register a signal handler that makes the program exit if the CPU time is exceeded. For memory, we again retrieve the soft and hard limits, and set the limit with setrlimit using the size argument and the retrieved hard limit.
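To try the CPU limit without affecting your own interpreter, one option is to apply it in a child process. The sketch below assumes a Unix-like system (the resource module is not available on Windows) and a hard limit of at least one second. The child script is a condensed copy of the code above followed by a busy loop, and CHILD_CODE is just a name chosen for this example.

```python
import subprocess
import sys

# Child script: cap CPU time at 1 second, then spin until the limit is hit.
CHILD_CODE = """
import resource
import signal

def time_exceeded(signo, frame):
    print("CPU exceeded...")
    raise SystemExit(1)

signal.signal(signal.SIGXCPU, time_exceeded)
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
resource.setrlimit(resource.RLIMIT_CPU, (1, hard))

while True:
    pass
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
    timeout=30,  # safety net in case the limit is not enforced
)
print(result.returncode, result.stdout.strip())
```

When the soft limit is reached, the kernel sends SIGXCPU, the handler runs, and the child exits with status 1.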
8. Control what you can import
Some languages have very obvious mechanisms for exporting members (variables, methods, interfaces), such as Golang, which exports only members that begin with a capital letter. In Python, on the other hand, everything is exported unless we use __all__ :
def foo():
    pass

def bar():
    pass

__all__ = ["bar"]
With the code snippet above, we can restrict what from some_module import * imports. In this example, a wildcard import brings in only bar. We can also set __all__ to an empty list so that a wildcard import brings in nothing at all; if __all__ lists a name that the module does not define, the wildcard import raises an AttributeError.
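The effect of __all__ can be checked without creating a file on disk. The sketch below builds a throwaway module object named some_module in memory (an assumption made purely for the demonstration) and then runs a wildcard import into an empty namespace:

```python
import sys
import types

# Build a small in-memory module equivalent to the snippet above
module = types.ModuleType("some_module")
exec(
    "def foo(): pass\n"
    "def bar(): pass\n"
    "__all__ = ['bar']\n",
    module.__dict__,
)
sys.modules["some_module"] = module

# Run the wildcard import in a fresh namespace and see what arrived
namespace = {}
exec("from some_module import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

print(imported)  # ['bar']
```

Note that __all__ only affects wildcard imports; from some_module import foo still works.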
9. A simple way to compare operators
Implementing all of the comparison operators for a class can be annoying because there are quite a few of them: __lt__, __le__, __gt__, and __ge__. But what if there was an easier way? functools.total_ordering comes to the rescue:
from functools import total_ordering

@total_ordering
class Number:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value

    def __eq__(self, other):
        return self.value == other.value

print(Number(20) > Number(3))
print(Number(1) < Number(5))
print(Number(15) >= Number(15))
print(Number(10) <= Number(2))
How exactly does this work? The total_ordering decorator is used to simplify the ordering process for our class instance. Just define __lt__ and __eq__, which are the minimum requirements, and the decorator will map the rest of the operations — it fills in the blanks for us.
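To see the filled-in operators at work, here is a slightly more defensive sketch. The Version class is hypothetical, and returning NotImplemented for foreign types is a common convention rather than something total_ordering requires:

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


# __gt__, __le__ and __ge__ were added to the class by the decorator
generated = [name for name in ("__gt__", "__le__", "__ge__") if name in vars(Version)]
print(generated)

newest = max([Version(1, 9), Version(2, 0), Version(1, 10)])
print(newest.major, newest.minor)  # 2 0
```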
(What follows is the second of the two original posts, which I have combined here for your convenience.)
10. Use the slice function to name slices
Using lots of hard-coded index values can quickly hurt both maintainability and readability. One option is to use constants for all index values, but we can do better:
#              ID       First Name   Last Name
line_record = "2        John         Smith"

ID = slice(0, 8)
FIRST_NAME = slice(9, 21)
LAST_NAME = slice(22, 27)

name = f"{line_record[FIRST_NAME].strip()} {line_record[LAST_NAME].strip()}"
# name == "John Smith"
In this case, we avoid cryptic indexes by first naming them with the slice function and then using those names to cut out parts of the string. You can also use the .start, .stop, and .step attributes to get more information about a slice object.
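A quick sketch of inspecting a slice object follows. The short_record string is made up for illustration, and indices() is the slice method that clamps the slice to a given sequence length:

```python
FIRST_NAME = slice(9, 21)

print(FIRST_NAME.start, FIRST_NAME.stop, FIRST_NAME.step)  # 9 21 None

# A record that is shorter than the slice expects
short_record = "2        Al"

# indices() returns the (start, stop, step) that would actually be used
clamped = FIRST_NAME.indices(len(short_record))
print(clamped)                   # (9, 11, 1)
print(short_record[FIRST_NAME])  # Al
```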
11. Prompt the user for a password at runtime
Many command-line tools and scripts need a user name and password to operate. So if you happen to write one, you might find the getpass module useful:
import getpass

user = getpass.getuser()
password = getpass.getpass()
# Do Stuff...
This very simple module lets you prompt the user for a password and also get the login name of the current user. Note, however, that not every system supports hiding the password while it is typed. Python will try to warn you about that, so be sure to read the warnings on the command line.
12. Find word/string close matches
Now for a somewhat more obscure feature of the Python standard library. If you ever find yourself needing something like Levenshtein distance [2] to find words similar to some input string, Python’s difflib has you covered.
import difflib

difflib.get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'], n=2)
# returns ['apple', 'ape']
difflib.get_close_matches looks for the best “good enough” matches. Here, the first argument is the word being matched against the list of candidates in the second argument. We can also provide the optional argument n, which specifies the maximum number of matches to return. Another optional keyword argument, cutoff (0.6 by default), sets the minimum similarity score a candidate must reach to count as a match.
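To get a feel for the cutoff argument, the sketch below prints the similarity score of each candidate using difflib.SequenceMatcher and then raises the cutoff. The value 0.78 is an arbitrary choice for this example.

```python
import difflib

words = ["ape", "apple", "peach", "puppy"]

# Similarity score of each candidate against the input word
scores = {
    word: round(difflib.SequenceMatcher(None, word, "appel").ratio(), 2)
    for word in words
}
print(scores)  # {'ape': 0.75, 'apple': 0.8, 'peach': 0.4, 'puppy': 0.4}

default_matches = difflib.get_close_matches("appel", words)
strict_matches = difflib.get_close_matches("appel", words, cutoff=0.78)

print(default_matches)  # ['apple', 'ape']
print(strict_matches)   # ['apple']
```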
13. Use the IP address
If you have to do networking in Python, you may find the ipaddress module very useful. One use case is generating a list of IP addresses from a CIDR (Classless Inter-Domain Routing) block:
import ipaddress

net = ipaddress.ip_network('74.125.227.0/29')  # Works for IPv6 too
# IPv4Network('74.125.227.0/29')

for addr in net:
    print(addr)
# 74.125.227.0
# 74.125.227.1
# 74.125.227.2
# 74.125.227.3
# ...
Another nice feature is to check network membership for IP addresses:
ip = ipaddress.ip_address("74.125.227.3")
ip in net  # True

ip = ipaddress.ip_address("74.125.227.12")
ip in net  # False
There are many more interesting features, which can be found here [3], so I won’t go over them. Note, however, that there is only limited interoperability between the ipaddress module and other network-related modules. For example, you cannot use IPv4Network instances as address strings; you need to convert them with str first.
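The conversion mentioned above is a plain str() call. Here is a short sketch, using the same network as before, that also shows two other attributes of network objects, hosts() and num_addresses:

```python
import ipaddress

net = ipaddress.ip_network("74.125.227.0/29")

# Explicit conversion for code that expects plain strings
cidr_text = str(net)
usable_hosts = [str(host) for host in net.hosts()]

print(cidr_text)                          # 74.125.227.0/29
print(net.num_addresses)                  # 8
print(usable_hosts[0], usable_hosts[-1])  # 74.125.227.1 74.125.227.6
```

hosts() leaves out the network and broadcast addresses, which is why it yields six addresses for this block rather than eight.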
14. Debug program crashes in the shell
If you’re one of those people who refuse to use an IDE and write code in Vim or Emacs, you might still want a debugger like the ones IDEs have.
And you know what? You have one: just run your program with python3.8 -i. Once your program terminates, -i launches an interactive shell where you can inspect all the variables and call functions. Neat, but how about an actual debugger (pdb)? Let’s use the following program (script.py):
def func():
    return 0 / 0

func()
And run the script using python3.8 -i script.py:
# Script crashes...
Traceback (most recent call last):
  File "script.py", line 4, in <module>
    func()
  File "script.py", line 2, in func
    return 0 / 0
ZeroDivisionError: division by zero
>>> import pdb
>>> pdb.pm()  # Post-mortem debugger
> script.py(2)func()
-> return 0 / 0
(Pdb)
We have seen the crash, now let’s set a breakpoint:
def func():
    breakpoint()  # import pdb; pdb.set_trace()
    return 0 / 0

func()
Now run it again:
> script.py(3)func()
-> return 0 / 0
(Pdb)  # we start here
(Pdb) step
ZeroDivisionError: division by zero
> script.py(3)func()
-> return 0 / 0
(Pdb)
Most of the time, print statements and tracebacks are enough for debugging, but sometimes you need to poke around to understand what’s going on inside your program. In those cases, you can set a breakpoint, and execution will stop there when the program reaches it. You can then examine the program, for example by listing function arguments, evaluating expressions, listing variables, or just stepping through as shown above.
pdb is a full-featured Python shell, so in theory you can execute anything in it, but you will also need some debugger commands, which can be found here [4].
15. Define multiple constructors in a class
Function overloading is a very common feature in programming languages (Python not included). Even though you can’t overload normal functions, you can still overload constructors using class methods:
import datetime

class Date:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    @classmethod
    def today(cls):
        t = datetime.datetime.now()
        return cls(t.year, t.month, t.day)

d = Date.today()
print(f"{d.day}/{d.month}/{d.year}")
# 14/9/2019
Instead of using class methods, you might be tempted to put all the logic for the alternative constructors into __init__ and handle it with *args, **kwargs, and a bunch of if statements. That might work, but it becomes difficult to read and maintain.
Therefore, I recommend putting very little logic into __init__ and doing everything in a separate method/constructor. This results in clean code for both the maintainer and the user of the class.
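Following that recommendation, further alternative constructors can be added next to today() without touching __init__. The from_iso_string method below is a hypothetical addition for illustration; it assumes input in the form "YYYY-MM-DD" and does no validation.

```python
import datetime


class Date:
    def __init__(self, year, month, day):
        # __init__ stays trivial: it only stores the values
        self.year = year
        self.month = month
        self.day = day

    @classmethod
    def today(cls):
        t = datetime.datetime.now()
        return cls(t.year, t.month, t.day)

    @classmethod
    def from_iso_string(cls, text):
        # Hypothetical alternative constructor: parses "YYYY-MM-DD"
        year, month, day = (int(part) for part in text.split("-"))
        return cls(year, month, day)


d = Date.from_iso_string("2019-09-14")
print(f"{d.day}/{d.month}/{d.year}")  # 14/9/2019
```

Because the class methods call cls(...) rather than Date(...), subclasses get instances of their own type from both constructors.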
16. Use decorators to cache function calls
Have you ever written a function that performs expensive I/O operations or some fairly slow recursion and that could benefit from caching (memoizing) its results? If so, there is a simple solution: lru_cache from functools:
from functools import lru_cache
import requests

@lru_cache(maxsize=32)
def get_with_cache(url):
    try:
        r = requests.get(url)
        return r.text
    except requests.RequestException:  # catch request failures only, not every exception
        return "Not Found"

for url in ["https://google.com/",
            "https://martinheinz.dev/",
            "https://reddit.com/",
            "https://google.com/",
            "https://dev.to/martinheinz",
            "https://google.com/"]:
    get_with_cache(url)

print(get_with_cache.cache_info())
# CacheInfo(hits=2, misses=4, maxsize=32, currsize=4)
In this example, we make GET requests whose results are cached (up to 32 of them). You can also see that we can inspect the cache of our function with the cache_info method. The decorator also provides a cache_clear method for invalidating the cached results.
I also want to point out that this decorator should not be used with functions that have side effects, or with functions that create mutable objects on every call.
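For an example that runs without network access, the same decorator can be applied to slow recursion. The fib function below is a standard illustration, not taken from the original example, and it is safe to cache because its result depends only on its argument:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    # Pure function: the result depends only on n, so caching is safe
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
info = fib.cache_info()
print(result)  # 832040
print(info)    # CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)

fib.cache_clear()  # invalidate everything cached so far
print(fib.cache_info().currsize)  # 0
```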
17. Look for the most frequent elements in the iterable
Finding the most common elements in a list is a very common task. You could use a for loop and a dictionary (map), but that’s unnecessary because the collections module has the Counter class:
from collections import Counter

cheese = ["gouda", "brie", "feta", "cream cheese", "feta", "cheddar",
          "parmesan", "parmesan", "cheddar", "mozzarella", "cheddar", "gouda",
          "parmesan", "camembert", "emmental", "camembert", "parmesan"]

cheese_count = Counter(cheese)
print(cheese_count.most_common(3))
# Prints: [('parmesan', 4), ('cheddar', 3), ('gouda', 2)]
In fact, Counter is just a dictionary that maps elements to occurrences, so you can use it as a normal dictionary:
print(cheese_count["mozzarella"])
# Prints: 1

cheese_count["mozzarella"] += 1
print(cheese_count["mozzarella"])
# Prints: 2
In addition, you can easily add more elements using the update(more_words) method. Another cool feature of Counter is that you can combine and subtract instances of Counter using mathematical operations (addition and subtraction).
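A short sketch of update and the arithmetic operators, using two small made-up counters:

```python
from collections import Counter

monday = Counter(["feta", "feta", "brie"])
tuesday = Counter(["feta", "gouda"])

# update() adds counts in place
monday.update(["brie", "cheddar"])
print(monday)  # feta: 2, brie: 2, cheddar: 1

# + adds counts; - subtracts them and drops anything that is not positive
combined = monday + tuesday
difference = monday - tuesday

print(combined)    # feta: 3, brie: 2, cheddar: 1, gouda: 1
print(difference)  # feta: 1, brie: 2, cheddar: 1
```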
Summary
Not all of these features are essential or useful in everyday Python programming, but some may come in handy from time to time, and they may simplify tasks that could otherwise be tedious and annoying.
I should also point out that all of these features are part of the Python standard library, although some of them seem to me like rather non-standard things to find in a standard library. So, whenever you want to implement something in Python, look in the standard library first, and if you can’t find it there, you probably haven’t looked hard enough (and if it really isn’t there, it’s surely in some third-party library).
If you use Python, I think most of the tips shared here will be useful almost every day, so I hope they come in handy. Also, if you have any thoughts on any of these Python tricks and manipulations, or if you know of a better way to solve the above problem, please let me know! 🙂
Links
[1] Original article: https://martinheinz.dev/blog/1
Original text: https://mp.weixin.qq.com/s/vaFL75hm1lx3mvURY4V6_A
[2] Levenshtein distance: https://en.wikipedia.org/wiki/Levenshtein_distance
[3] here: https://docs.python.org/3/howto/ipaddress.html
[4] here: https://docs.python.org/3/library/pdb.html#debugger-commands
The public account “Python Cat” publishes a series of quality articles, including the Cat Philosophy series, the Python Advanced series, book recommendations, technical writing, and recommended and translated English articles. You are welcome to follow it.