Interview questions for a Python crawler engineer


I. Basic Python skills

1. Briefly describe Python’s features and advantages

Python is an open-source, interpreted language. Compared with Java, C++, and other languages, Python is dynamically typed and very flexible.

2. What data types does Python have?

Python has six standard built-in data types: the immutable types are Number, String, and Tuple, and the mutable types are List, Dict, and Set.

3. The difference between lists and tuples

Lists and tuples are both iterable objects that support indexing, slicing, iteration, and so on, but tuples are immutable. Because a tuple is immutable (and therefore hashable), it can be used as a dictionary key.
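For example (a small illustrative sketch; the names are arbitrary):

point = (3, 4)              # a tuple is immutable and hashable
scores = {point: 'A'}       # so it can be used as a dictionary key
print(scores[(3, 4)])       # A
# scores[[3, 4]] = 'B'      # would raise TypeError: unhashable type: 'list'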

4. How does Python work

CPython:

When a Python program is run, the code in the .py file is compiled into bytecode, and the result is stored in an in-memory PyCodeObject, which is then interpreted and executed by the Python virtual machine. When the program finishes, the interpreter saves the PyCodeObject to a .pyc file. On each subsequent run, Python looks for a .pyc file matching the source file; if one exists and the source has not changed since the .pyc was generated, the .pyc is loaded directly, otherwise the source is recompiled and the .pyc file is regenerated.
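A quick way to see this in practice (a small sketch; script.py is just a placeholder name): the standard library's dis module disassembles a function's bytecode, and py_compile writes a .pyc file explicitly.

import dis
import py_compile

def add(a, b):
    return a + b

dis.dis(add)                        # prints the bytecode the virtual machine executes
# py_compile.compile('script.py')   # would write the compiled bytecode into __pycache__/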

5. Why does Python run slowly?

A). Python is dynamically rather than statically typed, so at runtime the interpreter must check the data type of each variable and handle type conversions, comparison operations, and variable references.

B). Python’s compiler starts up faster than Java’s, but it recompiles the source to bytecode almost every time the program runs.

C). Python’s object model leads to inefficient memory access. A NumPy array’s pointer refers to a contiguous buffer of raw values, while a Python list holds pointers to boxed objects, each of which in turn points to its data.
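A rough illustration of the memory difference, assuming NumPy is installed (the printed sizes are only indicative):

import sys
import numpy as np

nums = list(range(1000))
arr = np.arange(1000)

# A list stores pointers to separately allocated int objects;
# a NumPy array stores the raw values in one contiguous buffer.
print(sys.getsizeof(nums) + sum(sys.getsizeof(i) for i in nums))  # list plus boxed ints
print(arr.nbytes)                                                 # contiguous data buffer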



6. What are the solutions to Python’s slowness?

A). Use an alternative interpreter, such as PyPy or Jython.

B). Use Cython for applications with high performance requirements and many statically typed variables.

C). For IO-heavy applications, Python provides the asyncio module to improve asynchronous performance.
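A minimal asyncio sketch for the IO-bound case; asyncio.sleep stands in for real network or disk IO:

import asyncio

async def fetch(n):
    # placeholder for a real network or disk operation
    await asyncio.sleep(1)
    return n

async def main():
    # The three "requests" run concurrently, so this takes about 1 second, not 3
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    print(results)

asyncio.run(main())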


7. Describe the global interpreter lock (GIL)

Each thread must acquire the GIL before executing, which guarantees that only one thread runs Python bytecode at a time; in other words, only one thread can use the CPU at any given moment, so multi-threading is not truly parallel. However, the GIL is released during IO operations (which is why Python threads still work well for IO-bound tasks and why asynchronous IO is possible). To make use of multiple CPU cores, use multiple processes instead.
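A hedged sketch of the usual workaround for CPU-bound work: multiprocessing gives each task its own interpreter and GIL (the count function is only placeholder work):

from multiprocessing import Pool

def count(n):
    # CPU-bound placeholder work
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    with Pool(4) as pool:
        # Each task runs in a separate process, so multiple CPU cores can be used
        print(pool.map(count, [10_000_000] * 4))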

8. Deep copy shallow copy

A deep copy recursively copies an object and every object it contains, while a shallow copy creates a new outer object but copies only references to the nested objects. Therefore, when a nested object in the original is modified, the deep copy is unaffected, while the shallow copy reflects the change.
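A small sketch using the standard copy module:

import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)     # the inner lists are recursively copied as well

original[0].append(9)
print(shallow[0])  # [1, 2, 9]  - the shallow copy sees the change
print(deep[0])     # [1, 2]     - the deep copy does not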

9. The difference between is and ==

is tests object identity, while == tests equality.

is checks whether two names refer to the same object, that is, whether the two objects have the same address in memory, while == checks whether two objects have equal values. Note that, to improve performance, Python caches small integers and interned strings and reuses the cached object when a new one with the same value is created. For example:

a = 8
b = 8
print(a is b)  # True: both names refer to the same cached small integer
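The caching only applies to small values. In CPython, integers from -5 to 256 are cached, which is why the comparison above is True; larger integers created at runtime are distinct objects (an illustrative extension, and an implementation detail rather than a language guarantee):

x = int("100000")
y = int("100000")
print(x == y)   # True: the values are equal
print(x is y)   # False: two different objects in memory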

10. File reading and writing

Describe the differences and functions of read, readline, and readlines in file reading

Besides the amount of content they read, they also differ in the type of value they return.

read() reads the entire file into a single string and returns a str.

readline() reads one line into a string and returns a str.

readlines() reads the entire file and returns a list of strings, one element per line.
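A small sketch of the three calls (demo.txt is only a placeholder file name):

with open('demo.txt', 'r', encoding='utf-8') as f:
    content = f.read()        # the whole file as one str

with open('demo.txt', 'r', encoding='utf-8') as f:
    first_line = f.readline() # one line as a str (including the trailing '\n')

with open('demo.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()     # a list of str, one element per line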

11. One line of code

Square each element of [0, 1, 2, 3, 4, 5] and print the result as a tuple, once using an anonymous function and once using a comprehension.

print(tuple(map(lambda x: x * x, [0, 1, 2, 3, 4, 5])))
print(tuple(i * i for i in [0, 1, 2, 3, 4, 5]))


12. One line of code

Calculate n factorial with reduce (n! = 1 × 2 × 3 × … × n)

from functools import reduce  # reduce lives in functools in Python 3

n = 5  # example value; n is assumed to be given
print(reduce(lambda x, y: x * y, range(1, n + 1)))

13. One line of code

Filter the numbers below 100 that are divisible by 3 and print them as a set

print(set(filter(lambda n: n % 3 == 0, range(1, 100))))

14. Use one line of code

text = 'Obj{"Name": "pic", "data": [{"name": "async", "number": 9, "price": "$3500"}, {"name": "Wade", "number": 3, "price": "$5500"}], "Team": "Hot"'

Print a tuple of the player prices in text, e.g. ('$3500', '$5500')

import json
import re

# Extract the JSON list with a regex, parse it, then collect the prices
print(tuple(i.get("price") for i in json.loads(re.search(r'\[.*\]', text).group(0))))

15. Please write down the basic skeleton of a recursive function

def recursions(n):
    if n == 1:
        # Exit condition (base case)
        return 1
    # Continue recursing
    return n * recursions(n - 1)

16. Slicing

Write the output below

tpl = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
print(tpl[3:])
print(tpl[:3])
print(tpl[::5])
print(tpl[-3])
print(tpl[3])
print(tpl[::-5])
print(tpl[:])
del tpl[3:]
print(tpl)
print(tpl.pop())
tpl.insert(3, 3)
print(tpl)


[15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
[0, 5, 10]
[0, 25, 50, 75]
85
15
[95, 70, 45, 20]
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
[0, 5, 10]
10
[0, 5, 3]

17. File path

Prints the path of the current file directory

import os
print(os.path.dirname(os.path.abspath(__file__)))

Prints the current file path

import os
print(os.path.abspath(__file__))

Prints the directory path two levels above the current file

import os
print(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

18. Please write down the results and answer the questions

tpl = (1, 2, 3, 4, 5)
apl = (6, 7, 8, 9)
print(tpl.__add__(apl))

Question: Has the value of tpl changed?

The running results are as follows:

(1, 2, 3, 4, 5, 6, 7, 8, 9)

A: Tuples are immutable; __add__ returns a new tuple, so tpl itself is unchanged.

19. Please write down the results and answer the questions

name = ('James', 'Wade', 'Kobe')
team = ['A', 'B', 'C']
tpl = {name: team}
print(tpl)
apl = {team: name}
print(apl)

Question: Does this code run to completion? Why or why not? What is the output?

A: The code does not run to completion; it throws an exception at the line that builds apl, because dictionary keys must be immutable (hashable) objects, and a list is mutable and therefore cannot be used as a dictionary key. The output is as follows:

{('James', 'Wade', 'Kobe'): ['A', 'B', 'C']}
TypeError: unhashable type: 'list'

20. Decorators

Write the decorator code skeleton

def log(func):
    def wrapper(*args, **kw):
        print('call %s():' % func.__name__)
        return func(*args, **kw)
    return wrapper

A brief description of the role of decorators in Python:

Add new functionality to a function without changing the original code.
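A short usage sketch of the skeleton above: applying @log adds logging to greet without touching its body (greet is just an example function):

@log
def greet(name):
    return 'hello %s' % name

print(greet('Wade'))
# Output:
# call greet():
# hello Wade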

Multi-processing and multi-threading

Which is more stable, multi-processing or multi-threading? Why?

Multi-processing is more stable, because processes run independently: if one process crashes, the other processes are not affected.

What is the fatal drawback of multithreading?

Because all threads share the process's memory, a failure in any one thread can bring down the entire process.

What are the methods of interprocess communication?

Shared variables (shared memory), queues, and pipes.
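A minimal sketch of queue-based communication with multiprocessing.Queue (the message content is arbitrary):

from multiprocessing import Process, Queue

def worker(q):
    q.put('result from child process')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # receives the message sent by the child process
    p.join()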

II. Python details

1. Concatenating strings: join vs +

When strings are concatenated with +, each + operation allocates a new block of memory, and the result of the previous concatenation plus the right-hand operand are copied into it. Concatenating with + therefore involves repeated memory allocations and copies. join, by contrast, first computes how much memory the result needs, allocates it once, and then copies the strings in. This is why join performs better than +, and why join should be preferred when concatenating a sequence of strings.
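A rough way to observe the difference with timeit (sizes and repeat counts are arbitrary, and results vary by machine and interpreter version):

import timeit

parts = ['x'] * 10000

def concat_plus():
    s = ''
    for p in parts:
        s += p             # repeated allocation and copying
    return s

def concat_join():
    return ''.join(parts)  # memory is sized once, strings copied once

print(timeit.timeit(concat_plus, number=100))
print(timeit.timeit(concat_join, number=100))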

2. Python's garbage collection mechanism

Refer to https://blog.csdn.net/xiongchengluo1129/article/details/80462651

Garbage collection in Python is based primarily on reference counting, supplemented by mark-and-sweep and generational collection. The main flaw of reference counting is that it cannot handle circular references.

In Python, when an object's reference count drops to zero, the Python virtual machine reclaims the object's memory.

The principle of reference counting is that each object maintains an ob_refcnt field, which records how many references currently point to the object. The reference counter is incremented by 1 when the object is created, when it is referenced by another name, when it is passed into a function, and when it is stored in a container:

The object is created: a = 14

The object is referenced by another name: b = a

The object is passed as an argument to a function: func(a)

The object is stored in a container: lst = [a, "a", "b", 2]

Correspondingly, the reference counter is decremented by 1 when the object's name is destroyed with del, when the reference is rebound to a new object, when the function the object was passed to finishes executing, and when the object is removed from a container:

The object's name is explicitly destroyed: del a

The reference is rebound to a new object: a = 26

The object leaves its scope, for example when the func function returns, the reference counts of its local variables are decremented (this does not apply to global variables).

The element is removed from a container, or the container itself is destroyed.

When an object's reference count drops to zero, its memory is freed by the Python virtual machine.

sys.getrefcount(a) shows the reference count of object a, but the result is one higher than expected, because passing a into the function call temporarily adds one more reference.
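For example (the exact counts depend on interpreter state, so treat the numbers as illustrative):

import sys

a = []
print(sys.getrefcount(a))   # typically 2: the name a plus the temporary argument reference

b = a                       # another reference to the same list
print(sys.getrefcount(a))   # typically 3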

Advantages of reference counting:

1. High efficiency

2. No pauses at runtime: as soon as an object has no references, its memory is freed, without waiting for a specific moment as other mechanisms do. Another benefit of this real-time behavior is that the time spent reclaiming memory is spread out over normal execution.

3. Objects have a deterministic lifetime

4. Easy to implement

Disadvantages of reference counting:

1. Maintaining reference counts consumes resources: the work is proportional to the number of reference assignments, unlike mark-and-sweep, whose cost is basically related to the amount of memory reclaimed.

2. It cannot handle circular references. If A and B reference each other and nothing else references either of them, both have a reference count of 1, yet they clearly should be reclaimed.

# Circular reference example: after this, list1 and list2 reference each other,
# so their reference counts never drop to zero even if both names are deleted
list1 = []
list2 = []
list1.append(list2)
list2.append(list1)

To address these two shortcomings, Python introduces two additional mechanisms: mark-and-sweep and generational collection.
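Continuing the circular reference example above, the gc module can reclaim such cycles even though their reference counts never drop to zero (a small sketch; the number printed may vary):

import gc

list1 = []
list2 = []
list1.append(list2)
list2.append(list1)

del list1, list2     # the two lists still reference each other, so their counts stay at 1
print(gc.collect())  # the cyclic collector finds and frees them; returns the number of unreachable objects found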

Mark and sweep

The mark-and-sweep algorithm is a garbage collection algorithm based on tracing. It works in two phases: the first is the marking phase, in which the GC marks all live objects; the second is the sweep phase, in which all unmarked, inactive objects are collected. So how does the GC decide which objects are live and which are not?

Objects connected by references (pointers) form a directed graph: the objects are the nodes and the references are the edges. Traversal starts from the root objects and follows the directed edges; every reachable object is marked as live, and every unreachable object is inactive and will be swept. The root objects are global variables, the call stack, and registers.




In the diagram above, treat the small black circle as a global variable, i.e. as the root object. Starting from it, object 1 is directly reachable and is marked; objects 2 and 3 are indirectly reachable and are also marked; objects 4 and 5 are unreachable. So 1, 2, and 3 are live objects, while 4 and 5 are inactive objects and are collected by the GC.

As an auxiliary garbage collection technique in Python, the mark-and-sweep algorithm deals mainly with container objects such as list, dict, tuple, and class instances, since string and numeric objects can never form circular references.

Python uses a doubly linked list to keep track of these container objects. However, this naive mark-and-sweep algorithm has an obvious drawback: before sweeping the inactive objects it must scan the entire heap, even if only a small number of live objects remain.

Generational collection

Generational collection, like mark-and-sweep, works on container objects and serves as an auxiliary garbage collection technique in Python.

The logic of the GC

Allocate memory -> threshold exceeded -> trigger garbage collection -> link all collectable objects into a single list -> traverse and compute effective reference counts -> split into two sets: effective count == 0 and effective count > 0 -> objects with effective count > 0 are promoted to the older generation -> objects with effective count == 0 are reclaimed

In Python, a generation is a linked list: all memory blocks belonging to the same generation are linked together in the same list. The structure that represents a generation, gc_generation, contains the head of the generation's list, the threshold on the number of objects, and the current number of objects.

Python defines three generations of objects by default. The higher the generation index, the longer its objects have lived, and newly created objects are added to generation 0. GC is triggered inside _PyObject_GC_Malloc: every time a new object is allocated, Python checks whether generation 0 has exceeded its threshold, and if so, garbage collection begins.

Generational collection trades space for time. Python groups objects into different sets according to how long they have survived; each set is called a generation. Memory is divided into three generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2), each corresponding to one linked list; the older the generation, the longer its objects have lived and the less frequently it is collected. New objects are allocated in the young generation. When the number of objects in the young generation reaches its threshold, garbage collection is triggered: collectable objects are reclaimed, and surviving objects are moved into the middle generation, and so on. Objects in the old generation are the ones that have lived the longest, sometimes for the entire lifetime of the program. Generational collection is itself built on top of the mark-and-sweep technique.
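The thresholds and current counts of the three generations can be inspected through the gc module (the values in the comments are typical CPython defaults and may differ):

import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): collection thresholds for generations 0, 1, 2
print(gc.get_count())      # current object counts in each of the three generations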

3. Recursion

What is Python's default recursion depth limit? Why is recursion depth limited?

The recursion depth limit can be checked with sys.getrecursionlimit() from the built-in sys module.

Because unbounded recursion would overflow the C stack and crash the Python interpreter.
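For example:

import sys

print(sys.getrecursionlimit())   # usually 1000 by default
sys.setrecursionlimit(2000)      # the limit can be raised, at the risk of a C stack overflow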