The difference between processes and threads in an operating system

Process vs. thread comparison:

  1. Scheduling overhead: In a traditional OS, the process is the basic unit of independent scheduling and dispatching, and every switch of its context (register contents and page-table information) is expensive. In a threaded OS, the thread is the basic unit of scheduling and dispatching, and switching costs are low. Switching between threads of the same process does not cause a process switch; switching between threads of different processes does

1.1 Why process switching costs more than thread switching: a process switch involves switching virtual address spaces, while a thread switch does not, because threads share the virtual address space of their owning process. A process switch changes the virtual address space, and translating virtual addresses to physical addresses requires looking up the page table. The TLB normally caches these translations, but its entries become invalid after the page-table switch, so virtual-to-physical translation is slow and costly until the TLB warms up again

  2. Concurrency: with the introduction of threads, not only can processes execute concurrently, but multiple threads within the same process can execute concurrently (even all of them), and threads belonging to different processes can too, which greatly improves concurrency

  3. Owning resources: a process is the basic unit of resource ownership, whereas a thread owns only the minimal resources needed to run independently (such as its stack and registers)

  4. Independence: threads within the same process are far less independent of each other than separate processes are. To prevent processes from interfering with or corrupting one another, each process has its own address space and other resources

  5. Multiprocessor support: a traditional single-threaded process can run on only one processor at a time, no matter how many processors exist, while the threads of a multithreaded process can be assigned to multiple processors simultaneously, speeding up the process's completion

  6. Interprocess communication:

6.1 Pipe: half-duplex communication between related processes (e.g. parent and child)

6.2 Named pipe (FIFO): communication between processes with or without a kinship relationship

6.3 Semaphore: A counter that controls access to a shared resource by multiple processes, often as a locking mechanism

6.4 Message queue: A linked list of messages stored in the kernel

6.5 Signal: a relatively complex mechanism used to notify the receiving process that an event has occurred

6.6 Shared memory: maps a segment of memory so that it can be accessed by other processes. Shared memory is created by one process and can be accessed by multiple processes

6.7 Socket: can be used for communication between processes on different devices as well as on the same machine
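
As a concrete illustration of the pipe mechanism (6.1), here is a minimal POSIX-only Python sketch; the function name and message are my own. A parent and a forked child exchange a message over two half-duplex pipes:

```python
import os

def pipe_demo(message: str) -> str:
    """Parent sends `message` to a forked child over one pipe; the
    child uppercases it and replies over a second pipe (a classic
    pipe is half-duplex, so two-way traffic needs two pipes)."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child: close the unused ends
        os.close(to_child_w)
        os.close(to_parent_r)
        data = os.read(to_child_r, 1024)
        os.write(to_parent_w, data.upper())
        os._exit(0)
    os.close(to_child_r)                # parent: close the unused ends
    os.close(to_parent_w)
    os.write(to_child_w, message.encode())
    os.close(to_child_w)
    reply = os.read(to_parent_r, 1024).decode()
    os.waitpid(pid, 0)                  # reap the child
    return reply

print(pipe_demo("hello from parent"))   # HELLO FROM PARENT
```

Note that this works only between related processes, which is exactly the pipe limitation described above; unrelated processes would need a named pipe (FIFO) or another mechanism.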

  7. Interthread communication

7.1 Locking mechanisms: mutexes, condition variables, read/write locks, volatile variables, notify/wait

7.2 Semaphores: unnamed and named thread semaphores

7.3 Signals: similar to interprocess signal handling
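
To make the mutex + condition-variable item (7.1) concrete, here is a minimal Python sketch (all names are illustrative) of a producer handing items to a consumer thread:

```python
import threading

lock = threading.Lock()
cond = threading.Condition(lock)   # condition variable tied to the mutex
items = []                         # shared state guarded by `lock`

def producer(values):
    for v in values:
        with cond:                 # acquire the mutex
            items.append(v)
            cond.notify()          # wake one waiting consumer

def consumer(n):
    out = []
    for _ in range(n):
        with cond:
            while not items:       # re-check: guards against spurious wakeups
                cond.wait()        # releases the mutex while waiting
            out.append(items.pop(0))
    return out

def run_demo():
    result = []
    t = threading.Thread(target=lambda: result.extend(consumer(3)))
    t.start()
    producer([1, 2, 3])
    t.join()
    return result

print(run_demo())  # [1, 2, 3]
```

The `while not items` loop, rather than a plain `if`, is the classic defensive pattern: a woken thread must re-verify the condition before proceeding.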


How does a thread crash affect its process

  1. If the process does not handle SIGSEGV (segmentation fault), one thread crashing terminates the whole process, i.e. all of its threads.

  2. If the SIGSEGV signal is caught and the thread crashed at a thread-private location (e.g. its own stack), the other threads can carry on.

  3. If SIGSEGV is caught but the thread crashed while touching thread-shared locations (the heap, global variables, etc.), the other threads will likely have problems as well.


Process scheduling

  1. First come, first served (FCFS)
  2. Shortest job (process) first (SJF)
  3. Highest-priority-first scheduling

3.1 Non-preemptive priority scheduling

3.2 Preemptive priority scheduling

  4. Highest response ratio next (HRRN) scheduling
  5. Round-robin (time-slice) scheduling
  6. Multilevel feedback queue scheduling
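
As a toy illustration of time-slice round-robin, here is a simulation in Python; the burst times and quantum are made-up values:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin scheduling: each process runs for at most
    `quantum` time units, then goes to the back of the ready queue.
    Returns the order in which processes finish."""
    queue = deque(bursts)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining > quantum:
            queue.append((name, remaining - quantum))  # preempted: requeue
        else:
            finished.append(name)                      # finishes in this slice
    return finished

print(round_robin([("A", 5), ("B", 2), ("C", 3)], quantum=2))  # ['B', 'C', 'A']
```

Note how the shortest job (B) finishes first even though A arrived earlier, which is exactly the fairness property that makes round-robin attractive for time-sharing systems.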

Common algorithms for batch processing systems:

  1. First come, first served
  2. Shortest job first
  3. Shortest remaining time first
  4. Highest response ratio next

Common algorithms of time-sharing system:

  1. Round-robin scheduling
  2. Priority queue scheduling
  3. Multilevel queues
  4. Lottery scheduling

Real-time system commonly used algorithms:

  1. Rate-monotonic scheduling
  2. Earliest deadline first scheduling
  3. Least laxity first

Can a single-core CPU run multiple threads

Yes, and it is worthwhile because it improves the user experience. Using multithreading to improve efficiency mostly applies to I/O-bound work such as network I/O: if one task takes particularly long, other tasks would otherwise be unable to execute, so the CPU switches between tasks (time slicing) to prevent this.


The five layers of the computer network protocol stack?

Physical layer, data link layer, network layer, transport layer, application layer


The difference between TCP and UDP?

TCP is byte-stream oriented; UDP is message (datagram) oriented

TCP provides reliable delivery; UDP is best-effort

TCP is connection-oriented; UDP does not establish connections

TCP connections are strictly point-to-point; UDP supports one-to-one, one-to-many, and many-to-many communication

TCP provides full-duplex communication over its connection; UDP simply sends independent datagrams in either direction

UDP has no congestion control, and its header overhead is small, only eight bytes (a TCP header is at least twenty)
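
The "no connection, message-oriented" nature of UDP can be seen in a few lines of Python over the loopback interface (the function name and payloads are arbitrary):

```python
import socket

def udp_echo_once():
    """One UDP round trip: no handshake, no connection state; each
    sendto/recvfrom deals in whole datagrams, not a byte stream."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    addr = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"ping", addr)         # no connect() needed
    data, peer = server.recvfrom(1024)   # message boundary is preserved
    server.sendto(data + b"-pong", peer)
    reply, _ = client.recvfrom(1024)
    server.close()
    client.close()
    return reply

print(udp_echo_once())  # b'ping-pong'
```

A TCP version of the same exchange would need `listen()`, `accept()`, and `connect()` first — that extra setup is the connection establishment UDP skips.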


The difference between GET and POST

Both are carried over TCP connections; the differences come from HTTP conventions and browser behavior. It is commonly said that GET generates one packet while POST generates two (headers first, then the body). That does not mean one is always more efficient: on a good network the time difference is negligible, while on a poor network sending in two steps can help detect data loss earlier.

In terms of data transfer, GET carries a small amount of data spliced into the URL as query parameters, and GET requests can be cached. POST can carry a large amount of data in the request body, and the request is not cached because the parameters travel in the body. Neither is secure in principle (over plain HTTP both travel in clear text), but POST is comparatively more secure than GET because the parameters do not appear in the URL.


TCP three-way handshake and four-way wave:

Three-way handshake: first the client sends SYN (synchronize); second the server returns SYN+ACK (synchronize plus acknowledge); third the client sends ACK. Three messages confirm both directions while staying efficient.

Four-way wave: first one side sends FIN (finish); second the other side returns ACK; third, once it has no more data to send, that side sends its own FIN; fourth the first side returns a final ACK. Because the middle ACK and FIN are sent separately, teardown takes one more message than the handshake.


The differences between HTTP1.0 and HTTP2.0:

  1. HTTP/2 uses a binary framing format rather than text
  2. HTTP/2 is fully multiplexed rather than ordered and blocking: a single connection can carry many requests in parallel
  3. HTTP/2 uses header compression to reduce overhead
  4. HTTP/2 allows the server to proactively "push" responses into the client's cache

Differences between HTTP and HTTPS:

  1. HTTPS = HTTP + SSL/TLS, i.e. an additional encryption layer under HTTP, so private information can be transmitted safely
  2. HTTPS uses both kinds of encryption: asymmetric encryption to authenticate the certificate and exchange keys, and symmetric encryption for efficient transmission of the content
  3. They connect on different default ports: HTTP uses port 80 (or 8080), HTTPS uses port 443
  4. HTTPS requires a certificate

The principle of HTTPS packet capture:

Charles acts as a man-in-the-middle proxy: during the handshake it receives the server's certificate, dynamically generates a certificate of its own and sends that to the browser, and then intercepts, decrypts, and re-encrypts the data passing through it — that is the packet capture. This only works if the client has been made to trust Charles's root certificate.


A complete network request process

  1. Obtain the IP address of the URL through DNS
  2. The BROWSER establishes a TCP connection with the server
  3. The client sends an HTTP request
  4. The server receives the request and executes the corresponding business logic
  5. The server sends a response
  6. The browser parses and renders the HTML
  7. The connection is closed with the four-way wave

The role of DNS

It maintains a table mapping domain names to IP addresses, used to resolve the domain names in requests


Kernel mode and user mode

  1. To protect the security of the operating system core, Linux divides the address space into kernel space and user space
  2. Kernel space is shared by all processes, which is what mechanisms such as Binder rely on
  3. In kernel mode, the process runs in kernel space and the CPU executes instructions without checks, so errors can easily crash the system
  4. In user mode, the process runs in user space; the code it executes is checked by the CPU and can access only permitted addresses
  5. Partitioning the space isolates operating system code from application code, so the operating system is unaffected even if an individual application fails

What happens when you enter the URL and jump to the page?

First the URL is resolved to an IP address, a TCP connection is established, and the HTTP request is sent; the server processes the request and responds; the browser parses and renders the page; finally the TCP connection is closed. Protocols involved include DNS, TCP/IP, HTTP, ARP, ICMP, and the HTML format.


Why not use MAC as IP address?

A MAC address is unique, but you cannot tell from the address where the device is located. An IP address, by contrast, groups computers hierarchically into networks, so routing tables can determine where to forward packets.


The implementation principle of Ping

Ping works by sending ICMP packets: the sender transmits an ICMP Echo Request to the destination IP address, the receiver answers with an ICMP Echo Reply, and ping reports whether (and how quickly) the replies arrive. Reaching the right host relies on IP addresses being unique on the network.


TCP sliding window flow control and congestion avoidance

Sliding window: flow control means keeping the sender's rate low enough that the receiver can keep up; if the sender transmits too fast, the receiver may not be able to receive in time and data would be lost. The window sizes are adjusted dynamically by the algorithm. In general, the effective send window is no larger than the receive window.

Congestion control: slow start, congestion avoidance, fast retransmit, fast recovery
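
A deliberately simplified model of the send-window idea: at most `window` segments may be unacknowledged at once. Real TCP overlaps sending and acking; this sketch (all names my own) only shows the bookkeeping:

```python
def send_with_window(segments, window):
    """Send `segments` while never having more than `window` of them
    in flight; returns the sequence numbers in the order they were
    acknowledged (here the 'network' acks strictly in order)."""
    in_flight, acked = [], []
    for seq, _segment in enumerate(segments):
        while len(in_flight) >= window:     # window full: must wait for an ack
            acked.append(in_flight.pop(0))  # oldest in-flight segment is acked
        in_flight.append(seq)
    acked.extend(in_flight)                 # drain the remaining acks
    return acked

print(send_with_window(["s0", "s1", "s2", "s3", "s4"], window=2))  # [0, 1, 2, 3, 4]
```

The point of the window is the `while` loop: the sender stalls exactly when the receiver has not yet freed up room, which is the flow-control behavior described above.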


2021.4.16 Updated interview questions

How SSL/TLS encryption works in HTTPS

Process:

  1. The client verifies the server's CA-issued certificate
  2. A random symmetric session key is generated
  3. Subsequent traffic is encrypted with that key

Principle:

  1. SSL encryption is based on asymmetric encryption algorithms, which produce a pair of long strings called a key pair (a public key and a private key). Data encrypted with the public key can only be decrypted with the private key. In practice, a web site with a server certificate stores the private key on the server, packages the public key together with site-related information (such as the domain name, owner name, and expiration date) into an SSL certificate, and publishes that certificate on the Internet.

  2. When a user visits the site, the browser obtains this SSL certificate. When the user submits data, the client encrypts it with the public key from the certificate; under asymmetric encryption, only the private key can decrypt it. Even if the data is intercepted in transit, the interceptor cannot obtain the private key and therefore cannot decipher the ciphertext. This is why the HTTPS protocol based on SSL encryption is considered secure, and why browsers treat HTTPS sites as secure.


HTTP status codes

1xx: the request has been received and is being processed

2xx: the request was processed successfully

3xx: further action is needed to complete the request, such as a redirect

4xx: the request cannot be fulfilled as sent (client error)

5xx: the server failed while processing the request (server error)


How to use UDP to implement TCP

  1. Add seq/ack numbers to ensure data reaches the peer
  2. Add send and receive buffers, mainly to support timeout retransmission
  3. Add a timeout retransmission mechanism

– When sending, the sender generates a random initial seq = x and assigns sequence numbers to each segment according to its size. When the data arrives, the receiver places it in its buffer and sends back an ack packet indicating what it has received. On receiving the ack, the sender deletes the acknowledged data from its buffer. A periodic task checks whether any unacknowledged data has timed out and needs to be retransmitted.

The key point here is knowing the differences between the TCP and UDP header structures.
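
The retransmission idea above can be sketched as a stop-and-wait ARQ over a simulated lossy channel. No real sockets are involved; the loss rate and seed are arbitrary values I chose so the run is deterministic:

```python
import random

def stop_and_wait(packets, loss_rate=0.3, seed=42):
    """Resend each packet until it gets through, mimicking seq/ack
    plus timeout retransmission on top of an unreliable (UDP-like)
    channel. Returns (delivered packets, retransmission count)."""
    rng = random.Random(seed)
    delivered, retransmissions = [], 0
    for seq, pkt in enumerate(packets):
        while True:
            if rng.random() < loss_rate:   # the packet (or its ack) was lost
                retransmissions += 1       # timeout fires: send it again
                continue
            delivered.append((seq, pkt))   # ack received: move to next packet
            break
    return delivered, retransmissions

delivered, resends = stop_and_wait(["a", "b", "c", "d"])
print(delivered)   # every packet arrives, in order, exactly once
```

Despite random losses, the receiver's view is complete and in order — which is exactly the guarantee TCP builds on top of an unreliable network, at the cost of the extra resends.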


The concept of TCP connections

In fact, a "connection" is not a wire joining two devices. It is a virtual concept: after the two parties confirm each other's existence, each records connection state locally, and that pair of state records on the two devices is the connection


Virtual memory

Virtual memory is a memory-management technique of computer systems. It gives an application the impression that it has contiguous available memory, i.e. one complete, contiguous address space. In practice, that space is typically split into physical memory fragments, with some parts temporarily stored on external disk and swapped in and out as needed.


CPU affinity

CPU affinity means keeping a process running on a given CPU for as long as possible without migrating it to another processor; put simply, it binds a specified process or thread to a particular CPU. On a multi-core machine, each CPU has its own cache holding data the process uses; if the OS schedules the process onto another CPU, that cache no longer helps and the hit ratio drops. After binding, the program always runs on the specified CPU, the OS does not move it, and performance improves.

Soft affinity: the scheduler tries to keep a process on the same CPU for as long as possible without strictly forbidding migration. The Linux kernel scheduler has this behavior built in, so processes usually do not migrate frequently between processors. That is normally what we want, since infrequent migration means lower overhead.

Hard affinity: Binds processes or threads to a specified CPU core using the API provided by the Linux kernel.


What happens to a loaded page when the network cable is unplugged

After you unplug the network cable, the current page still displays normally: content that has already been delivered does not depend on the connection staying up


TCP handshake failure

If the first handshake fails (A's SYN is lost), neither A nor B allocates resources, and the connection simply fails. If A sends multiple SYNs within a period, A accepts only the SYN+ACK responding to its latest SYN and ignores the others, and any resources B allocated for the stale ones are released

If the second handshake fails (B's SYN+ACK is lost), A allocates no resources; B has allocated resources but never receives A's ACK, so after a period it releases them. If B receives multiple SYNs from A, it replies SYN+ACK to each, but A acknowledges only the one matching its latest SYN and completes the handshake with that

If the third handshake fails (A's ACK is lost), B never receives the ACK. In practice B retransmits SYN+ACK several times (the count is configurable); if it still gets no ACK, it releases its resources and answers any subsequent data from A with RST


The difference between asymmetric encryption and symmetric encryption and the connection of certificates

  1. Symmetric encryption:

1.1 If A and B use the same key for both encryption and decryption, the communication uses symmetric encryption

1.2 Each pair of communicating parties needs its own unique key, so once there are many connections there are inevitably too many keys to manage

1.3 Symmetric encryption generally cannot provide complete authentication: it cannot verify the identities of the sender and receiver

1.4 Safely storing and distributing a large number of keys is difficult, and their security cannot be guaranteed

  2. Asymmetric encryption:

2.1 Asymmetric encryption uses a public key and a private key. The public key is distributed freely; the private key is kept by its owner. Only the private key can decrypt what the public key has encrypted

2.2 The public key and private key are a pair. If the public key is used to encrypt data, only the corresponding private key can be used to decrypt data. If data is encrypted with a private key, it can only be decrypted with the corresponding public key

2.3 A typical application is the digital signature, e.g. the DSA algorithm

  3. Certificates:

With the digital certificate at its core, encryption technologies (encrypted transmission, digital signatures, digital envelopes, and other security techniques) can encrypt and decrypt information transmitted over the network, sign it and verify signatures, guaranteeing the confidentiality and integrity of information sent online and the non-repudiation of transactions

An example to remember it by:

A person S has a public/private key pair and gives the public key to his friends. Friend A writes a letter to S and encrypts it with S's public key before sending. As long as S's private key is not leaked, only S can decrypt the letter; even if it falls into someone else's hands, nobody else can read it, so it is very safe. When S replies, he uses a "digital signature": he encrypts the letter with his private key. When friend A receives it, A decrypts it with the public key, which proves the letter came from S. Now the complicated case: friend B cheats friend A by secretly replacing A's copy of S's public key with B's own. B then signs a letter to A with B's private key; A decrypts it with what A believes is S's public key, so A cannot be sure whether the public key really belongs to S. A therefore asks a certificate authority (CA) to certify the key: the CA encrypts S's public key together with some identifying information using the CA's private key, generating a digital certificate. Once S has the certificate, he can write to A with his digital signature and the digital certificate attached. On receiving the letter, A unlocks the digital certificate with the CA's public key, obtains S's genuine public key, and can then determine whether the digital signature really is S's
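
The public/private-key mechanics in the story can be shown with textbook RSA on tiny primes. This is purely for intuition (and needs Python 3.8+ for the modular inverse via `pow`); real systems use vetted libraries, padding schemes, and far larger keys:

```python
def make_keys():
    p, q = 61, 53
    n = p * q                    # modulus, part of both keys
    phi = (p - 1) * (q - 1)
    e = 17                       # public exponent, coprime with phi
    d = pow(e, -1, phi)          # private exponent: modular inverse of e
    return (e, n), (d, n)        # (public key, private key)

def apply_key(m, key):
    exp, n = key
    return pow(m, exp, n)        # both operations are modular exponentiation

pub, priv = make_keys()

# Encryption: public key in, private key out — like A's letter to S.
c = apply_key(65, pub)
print(apply_key(c, priv))        # 65

# Signing is the mirror image: private key in, public key to verify.
sig = apply_key(65, priv)
print(apply_key(sig, pub))       # 65
```

The symmetry of the two round trips is exactly why the same key pair supports both confidentiality (encrypt with public, decrypt with private) and signatures (sign with private, verify with public).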


2021.9.20 Updated interview questions

What is the DNS resolution process?

  1. The client first checks the local HOSTS file and its own cache for the domain name; if no answer is found there, it sends the resolution request to the local DNS server.

  2. After receiving the request, the local DNS server queries the local cache. If the entry exists, the local DNS server directly returns the query result.

  3. If the entry does not exist in the local DNS cache, the local DNS server sends the request to a root DNS server. The root DNS server then returns to the local DNS server the address of the primary DNS server for the queried top-level domain (a subdomain of the root).

  4. The local DNS server then sends the request to that server. The server that receives the request checks its own records and, if it does not hold the final answer, returns the address of the relevant subordinate DNS server.

  5. Repeat step 4 until the correct record is found.

  6. The local DNS server stores the returned results in a cache for future use and returns them to the client.

Recursive query: In this mode, when the DNS server receives a client request, it must reply to the client with an accurate query result. If the DNS server does not store the query DNS information locally, the server queries other servers and submits the returned query results to the client.

Iterative query: if the DNS server cannot answer, it gives the client the address of another DNS server that can handle the request. Each server the client queries does not reply with the final result directly but instead tells the client the address of the next DNS server, and the client keeps submitting the request until the result is returned.
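
The iterative walk from the root downward can be modelled with a toy referral table. All the server names and the address below are made up for illustration:

```python
# Each "server" either refers the resolver onward or gives the answer.
ROOT = {"com.": "com-server"}
SERVERS = {
    "com-server": {"example.com.": "example-server"},
    "example-server": {"www.example.com.": "93.184.216.34"},
}

def resolve(name):
    """Iteratively follow referrals starting from the root, as the
    local DNS server does on the client's behalf."""
    zone = ROOT
    while True:
        for suffix, target in zone.items():
            if name.endswith(suffix):
                if target in SERVERS:    # a referral: ask the next server
                    zone = SERVERS[target]
                    break
                return target            # an authoritative answer
        else:
            return None                  # no server knows this name

print(resolve("www.example.com."))  # 93.184.216.34
```

Each iteration narrows the match from `com.` to `example.com.` to the full name — the same funnel the real hierarchy of root, TLD, and authoritative servers implements.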


Why not one, two, or four TCP handshakes?

Not once or twice: with only one or two messages, the two parties cannot both confirm that a working connection has been established (the server would never learn whether its reply arrived)

Not four times: three messages already confirm both directions, so a fourth or fifth adds nothing


Deadlock generation and solution?

Definition of deadlock

A group of processes is deadlocked if each process in the group is waiting for an event that can only be triggered by another process in the same group. This is usually caused by multiple processes competing for resources

  1. Contention for non-preemptable resources can cause deadlock
  2. Contention for consumable resources can cause deadlock
  3. An improper order of process execution can cause deadlock

The necessary conditions for a deadlock to occur

Deadlock will not occur if any of the following four conditions are not true

  1. Mutual exclusion: a resource can be used by only one thread at a time
  2. Hold and wait: a thread blocked while requesting a resource keeps holding the resources it has already acquired
  3. No preemption: a resource cannot be forcibly taken from a thread until the thread has finished using it
  4. Circular wait: the processes form a circular chain, each waiting for a resource held by the next

Methods to handle deadlocks

  1. Deadlock prevention: break any one of the four necessary conditions above, except mutual exclusion
  2. Deadlock avoidance: e.g. the Banker's algorithm
  3. Deadlock detection: the OS's detection algorithm
  4. Deadlock recovery: the OS's recovery algorithm

From top to bottom, the protection against deadlock weakens, but this is matched by higher resource utilization and less frequent blocking of processes (i.e. increased concurrency)


2021.10.11 Updated interview questions

Does TCP lose packets?

This one is debatable: some people say yes, some say no. A reliable way to put it: packets can be lost on the wire, but TCP's timeout-retransmission mechanism resends lost or corrupted data over and over until it arrives correctly, so from the perspective of the upper (application) layer, no data is lost. Of course, if you see it differently, feel free to comment or reach out privately


Sticky packets and unpacking

Unpacking and sticky packets are common in socket programming. During socket communication, if one end sends several messages in quick succession, TCP may coalesce them into a single TCP segment before sending; this is called a "sticky packet". If a message exceeds the maximum TCP segment size, it is split into multiple maximum-length TCP segments for transmission; this is called "unpacking".

Causes of sticky packets:

  1. The data to be sent is smaller than the TCP send buffer, so TCP batches data written to the buffer several times into one segment
  2. The application layer at the receiving end does not read data from the receive buffer promptly
  3. Data is sent too fast, so several messages accumulate in the buffer before being sent out together (if the client slept briefly after each send, sticking would not occur)
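
The standard fix for sticky packets is framing at the application layer. A minimal length-prefix scheme in Python (the 4-byte big-endian prefix is one common choice, not the only one):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix each message with its length so the receiver can split
    the TCP byte stream back into whole messages."""
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes):
    """Recover complete messages from a stream where several frames
    may have been 'stuck together' into one read."""
    msgs, i = [], 0
    while i + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, i)
        if i + 4 + length > len(stream):
            break                        # incomplete frame: wait for more data
        msgs.append(stream[i + 4 : i + 4 + length])
        i += 4 + length
    return msgs

# TCP may hand the receiver both messages in a single recv():
stream = frame(b"hello") + frame(b"world")
print(unframe(stream))  # [b'hello', b'world']
```

The `break` on an incomplete frame also handles the unpacking case: the receiver simply buffers what it has and resumes once the rest of the split message arrives.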

Synchronous vs. asynchronous, blocking vs. non-blocking

On Zhihu I saw a relatively simple way to tell them apart:

Story: Lao Wang boils water.

Characters: Lao Wang, two kettles (an ordinary kettle, and a whistling kettle that signals when the water boils).

Lao Wang considers several ways to wait:

1. Lao Wang boils water in the ordinary kettle and stands there watching it until the water boils. – Synchronous blocking

Lao Wang thought about it, but this method was not clever enough.

2. Lao Wang still uses the ordinary kettle, but no longer stands there foolishly watching the water boil; he goes to the dorm to surf the Internet, coming back every once in a while to check whether the water has boiled, and leaving again if it has not. – Synchronous non-blocking

Lao Wang thought about it, and now the method is more clever, but still not good enough.

3. This time Lao Wang uses the whistling kettle and stands there waiting, but he no longer checks periodically: when the water boils, the kettle will notify him automatically. – Asynchronous blocking

Lao Wang thought: no, wait — since the kettle can notify me, why am I still standing here foolishly waiting? Time for another method.

4. Lao Wang still uses the whistling kettle, but goes to the living room to surf the Internet, waiting for the kettle to tell him when the water is ready. – Asynchronous non-blocking

Lao Wang suddenly felt much more relaxed

As you can see, with synchrony one party must poll or wait until the result is produced, whereas with asynchrony the result arrives via a notification or callback. Blocking versus non-blocking describes the thread itself: a blocked thread is stuck and cannot do other work while waiting, while a non-blocking thread can.
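
The kettles map loosely onto code. A small Python sketch with threads (function names and timings are arbitrary): the synchronous call blocks until the "kettle" is done, while the asynchronous call registers a callback and returns immediately.

```python
import threading

def boil_sync():
    """Synchronous blocking: stand by the kettle until it whistles."""
    done = threading.Event()
    threading.Timer(0.01, done.set).start()   # the kettle, boiling
    done.wait()                               # blocked; can do nothing else
    return "boiled"

def boil_async(callback):
    """Asynchronous: hand over a callback and walk away at once."""
    threading.Timer(0.01, lambda: callback("boiled")).start()

results = []
finished = threading.Event()
boil_async(lambda water: (results.append(water), finished.set()))
# While the async kettle heats up, we are free to do other work here...
results.append(boil_sync())   # ...but this call just stands and waits.
finished.wait()
print(results.count("boiled"))  # 2
```

Both kettles end up boiled, but only the asynchronous caller was free to do something else in the meantime — that is the entire difference the story is illustrating.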


Welcome to point out mistakes