If you have been preparing for interviews recently, you can follow my interview column — I hope it helps you.

Preface

The May Fourth holiday has just passed and the autumn recruitment season is around the corner, so I am writing this article. It comes first-hand from a friend who just interviewed at ByteDance and passed; besides the ByteDance offer, he also got through JD, Baidu, Tencent, and Alibaba, so his experience carries some weight. If you are going to take part in the autumn recruitment, you can save this as a reference. If it actually helps your interview, I will be honored.

I have also gathered all the materials he used before the interview, and he is happy to share them for free with anyone who needs them. Just click and claim them!

  • Java basic knowledge summary
  • A line of Internet companies Java interview core knowledge points

Enough talk — sit tight, here we go!

The first and second rounds were held back-to-back; the third round came much later because of the May Day holiday, and the HR round took place one day after the third.

First round: 45 minutes

Project: introduce the project requirements, design ideas, and main technologies (since the project is AI-related, not much was asked about the underlying technology)

I. Java

1. Garbage collection algorithm

Typical garbage collection algorithms

The JVM specification does not specify how GC works, and vendors can implement a garbage collector in different ways. Several common GC algorithms are discussed here.

Mark-sweep algorithm

The most basic garbage collection algorithm has two phases, mark and sweep. The mark phase marks all the objects that need to be reclaimed, and the sweep phase frees the space occupied by the marked objects. As shown in the figure:

As the figure shows, the biggest problem with this algorithm is severe memory fragmentation, which can later prevent large objects from finding a large enough contiguous block of free space.

Copying algorithm

The copying algorithm was proposed to fix the fragmentation problem of mark-sweep. Memory is divided into two halves of equal size, and only one half is in use at a time. When that half fills up, the surviving objects are copied over to the other half and the used half is cleared in one go, as shown in the figure:

This algorithm is simple to implement, efficient, and does not produce fragmentation, but its biggest problem is that the usable memory is cut in half. Moreover, the more objects survive, the more the efficiency of the copying algorithm drops.

Mark-compact algorithm

The mark-compact algorithm combines the two algorithms above to avoid their drawbacks. Its marking phase is the same as mark-sweep's, but instead of sweeping right after marking, it moves all live objects toward one end of memory and then clears everything beyond the end boundary. As shown in the figure:

Generational Collection algorithm

Generational collection is the approach most JVMs use today. The core idea is to divide memory into regions according to object lifetimes. The GC heap is generally split into the tenured (old) generation and the young generation. In the old generation, each collection only needs to reclaim a small number of objects, while in the young generation a large amount of garbage is reclaimed on every collection, so a different algorithm can be chosen for each region.

Most JVMs today use a copying algorithm for the young generation, because most objects die on every collection and only a few need to be copied. The young generation is usually not split 1:1, however; it is divided into a larger Eden space and two smaller Survivor spaces (From and To). Each round uses Eden plus one of the Survivor spaces, and on collection the surviving objects from Eden and that Survivor space are copied into the other Survivor space.

The old generation uses the Mark-compact algorithm because it only collects a small number of objects at a time.

Also, don't forget the Permanent Generation in the method area mentioned in Java Basics: The Java Virtual Machine (JVM). It stores classes, constants, method descriptions, and so on. Collecting the permanent generation mainly means discarding unused constants and unloading useless classes.

Objects are mainly allocated in the young generation's Eden space and From Survivor space; in a few cases allocation goes directly to the old generation. When Eden and From run out of space, a GC occurs. After the GC, the surviving objects in Eden and From are moved to the To space, and Eden and From are then cleared.

If the To space cannot hold an object, the object goes straight to the old generation. After the GC, Eden and To become the spaces in use, and the roles alternate on every collection. Each time an object survives a GC in a Survivor space, its age increases by 1; by default, objects whose age exceeds 15 are promoted to the old generation.
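To make this concrete, here is a small sketch of my own (not from the interview): a program that produces both short-lived and long-lived objects, run with explicit HotSpot sizing flags. The flag values in the comment are only illustrative JDK 8-era settings, not recommendations.

    // Suggested run (assumed JDK 8 flags):
    //   java -Xms64m -Xmx64m -Xmn16m -XX:SurvivorRatio=8 \
    //        -XX:MaxTenuringThreshold=15 -XX:+PrintGCDetails PromotionDemo
    public class PromotionDemo {
        public static void main(String[] args) {
            byte[][] survivors = new byte[64][];
            for (int i = 0; i < 10_000; i++) {
                byte[] buffer = new byte[128 * 1024];            // short-lived: dies in Eden
                if (i % 200 == 0) {
                    survivors[(i / 200) % survivors.length] = buffer; // long-lived: survives GCs
                }
            }
            // With -XX:+PrintGCDetails, the minor-GC log lines show objects being copied
            // between the Survivor spaces and, once old enough, promoted to the old generation.
            System.out.println("done, live arrays kept: " + survivors.length);
        }
    }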

Typical garbage collectors

Garbage collection algorithms are the theory behind garbage collectors, and the collectors are their concrete implementations. Here are some of the garbage collectors that the HotSpot virtual machine provides.

Serial/Serial Old

The oldest collectors; both are single-threaded and must suspend all user threads during collection. Serial is a young-generation collector based on the copying algorithm, while Serial Old is an old-generation collector that uses mark-compact. They are simple and efficient, but the drawback is that user threads must be paused.

ParNew

The multithreaded version of the Serial collector; it uses multiple threads for garbage collection.

Parallel Scavenge

A young-generation parallel collector that uses the copying algorithm. It differs from the previous two in that its goal is a controllable throughput.

Parallel Old

The old-generation counterpart of Parallel Scavenge; it uses the mark-compact algorithm and multiple threads.

CMS

The Concurrent Mark Sweep collector is a concurrent collector whose goal is the shortest collection pause time; it therefore uses the mark-sweep algorithm.

G1

The Garbage First (G1) collector is one of the most advanced results of collector technology. It is a server-oriented collector that makes full use of multi-CPU and multi-core environments; it is a parallel and concurrent collector that builds a predictable pause-time model.
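As a quick way to see which of these collectors a JVM is actually running with, here is a small sketch of my own using the standard java.lang.management API. The startup flags in the comment are the usual HotSpot switches for selecting a collector on JDK 8-era VMs; they are listed only as a reference, not as part of the original write-up.

    // Try running with one of (roughly, JDK 8): -XX:+UseSerialGC, -XX:+UseParallelGC,
    // -XX:+UseConcMarkSweepGC, or -XX:+UseG1GC (optionally -XX:MaxGCPauseMillis=200)
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class WhichCollector {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // e.g. "PS Scavenge"/"PS MarkSweep" for Parallel, "G1 Young Generation" for G1
                System.out.println(gc.getName() + ", collections so far: " + gc.getCollectionCount());
            }
        }
    }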

2. Differences between CMS and G1

CMS: a collector aiming for the shortest collection pause time, implemented with concurrent mark-sweep.

Process:

1. Initial marking: occupies the CPU exclusively (STW) and only marks the objects directly reachable from GC Roots

2. Concurrent marking: Can be executed in parallel with user threads, marking all reachable objects

3. Remarking: occupies the CPU exclusively (STW) to correct the marks for objects changed while user threads kept running during the concurrent marking phase

4. Concurrent sweep: runs in parallel with user threads and cleans up the garbage

Advantages:

Concurrency, low pauses

Disadvantages:

1. CPU-sensitive: the concurrent phases do not pause user threads, but they occupy some CPU threads and can slow the application down

2. Cannot handle floating garbage: during the final concurrent sweep step, user threads are still running and keep producing garbage; because this garbage appears after marking, it has to wait for the next GC to be cleaned up. This garbage is called floating garbage

3. Because CMS uses mark-sweep, it produces a lot of space fragmentation. When fragmentation becomes excessive, allocating large objects gets difficult: the old generation may have plenty of total space but no contiguous block large enough for the current object, forcing an early Full GC. To address this, CMS provides a switch parameter that turns on a merge-and-compact pass when CMS cannot keep up and a Full GC is about to happen. The compaction, however, is not concurrent: the fragmentation is gone, but the pause gets longer

G1: a garbage collector aimed at server-side applications

Features:

1. Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs (or cores) to shorten stop-the-world pauses. Where other collectors would have to stop the Java threads to run GC work, G1 can let the Java program keep executing concurrently.

2. Generational collection: G1 keeps the concept of generations. Although G1 can manage the entire GC heap on its own without cooperating with other collectors, it still treats newly created objects and old objects that have survived multiple GCs differently to get better collection results. In other words, G1 manages both the young and the old generation by itself.

3. Space integration: G1 divides the heap into independent Regions. Viewed as a whole it collects based on the mark-compact algorithm, while locally (between two Regions) it works by copying. Either way, this means G1 does not produce memory fragmentation while it runs.

4. Predictable pauses: this is another big advantage of G1 over CMS. Reducing pause time is a goal shared by G1 and CMS, but besides pursuing low pauses, G1 can also build a predictable pause-time model, letting the user explicitly specify that within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.

3. Stop the world

This cannot be explained clearly in a sentence or two; interested readers can look up a tutorial on their own.

4. How to determine whether an object is alive

There are two ways to determine whether an object is alive:

Reference counting: each object keeps a reference-count attribute; adding a reference increments it, releasing one decrements it, and when it reaches 0 the object can be reclaimed. This approach is simple, but it cannot handle objects that reference each other in a cycle (the sketch after the GC Roots list below illustrates this).

Reachability analysis: search downward from the GC Roots; the path traversed is called a reference chain. When no reference chain connects an object to the GC Roots, the object is proved unusable, i.e., unreachable.

In the Java language, GC Roots include:

Objects referenced in the VM stack.

Objects referenced by class static properties in the method area.

Objects referenced by constants in the method area.

Objects referenced by JNI in the native method stack.
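A small sketch of my own to illustrate the point above: two objects that reference each other in a cycle are still collected, which shows the JVM relies on reachability analysis rather than reference counting. (System.gc() is only a hint, so the exact output is not strictly guaranteed.)

    import java.lang.ref.WeakReference;

    public class CycleGcDemo {
        static class Node { Node other; byte[] payload = new byte[1 << 20]; }

        public static void main(String[] args) throws InterruptedException {
            Node a = new Node();
            Node b = new Node();
            a.other = b;                              // a and b reference each other
            b.other = a;
            WeakReference<Node> ref = new WeakReference<>(a);

            a = null;                                 // drop the only GC-root references
            b = null;
            System.gc();                              // request a collection (only a hint)
            Thread.sleep(100);

            // Under reference counting the cycle would keep both objects alive;
            // with reachability analysis the weak reference gets cleared.
            System.out.println("collected = " + (ref.get() == null));
        }
    }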

II. Computer networks

1. TCP three-way handshake and four-way wave: process and states

Three-way handshake. First handshake: host A sends a packet with SYN=1 and seq=10001 to the server; from SYN=1, host B knows that A wants to establish a connection, and A enters the SYN_SENT state. Second handshake: host B replies with a packet carrying ack=(A's seq + 1), SYN=1, ACK=1, and a randomly generated seq=20001; B's state changes from LISTEN to SYN_RECV. Third handshake: on receiving the reply, host A checks that the ack number equals its first seq + 1 and that the ACK flag is 1; if so, A sends a packet with ack=(B's seq + 1) and ACK=1. Once B verifies these values, both hosts enter the ESTABLISHED state.

The three-way handshake is complete, and data transfer between hosts A and B begins

  • Name and meaning of each state

    CLOSED: nothing much to say; this is the initial state.
    LISTEN: also easy to understand; a server socket in this state is listening and ready to accept connections.
    SYN_RECV: this state means a SYN packet has been received. Normally it is a brief intermediate state during the three-way handshake on the server side, and you will rarely see it with netstat unless you deliberately write a client test program that withholds the last ACK of the handshake. Once the ACK from the client arrives, the server moves to ESTABLISHED.
    SYN_SENT: this state mirrors SYN_RECV from the client's point of view. When a client socket calls connect(), it first sends a SYN packet, enters SYN_SENT, and waits for the server to send the second packet of the handshake. SYN_SENT therefore means the client has sent its SYN.
    ESTABLISHED: easy to understand; the connection has been established.

  • Four-way wave (connection teardown)

FIN_WAIT_1: both FIN_WAIT_1 and FIN_WAIT_2 are states that wait for the other side's FIN packet. The difference is this: FIN_WAIT_1 means a socket in the ESTABLISHED state wants to close the connection and has actively sent a FIN packet, at which point it enters FIN_WAIT_1; when the peer responds with an ACK, it moves on to FIN_WAIT_2. Under normal circumstances the peer should respond with an ACK promptly regardless of the situation, so FIN_WAIT_1 is rarely observed, while FIN_WAIT_2 can often be seen with netstat.

FIN_WAIT_2: already explained above. A socket in FIN_WAIT_2 is effectively half-closed: one side has asked to close the connection, while the other side may still send data before it closes its half.

TIME_WAIT: the peer's FIN has been received and an ACK has been sent; after waiting 2MSL, the socket returns to the CLOSED state. If, while in FIN_WAIT_1, a packet arrives carrying both the FIN and ACK flags, the socket can enter TIME_WAIT directly without passing through FIN_WAIT_2.

CLOSING: a special and fairly rare state. Normally, after you send a FIN you expect to receive the peer's ACK first (or together with its FIN) and then the peer's FIN. The CLOSING state means that after sending your FIN you received the peer's FIN without its ACK. When does that happen? When both sides close the socket at almost the same time: both send FIN packets simultaneously, and both end up in CLOSING.

CLOSE_WAIT: this state means "waiting to close". How to understand that? When the peer closes its socket and sends you a FIN packet, your system responds with an ACK and enters CLOSE_WAIT. What you then need to consider is whether you still have data to send to the peer; if not, you can close your socket and send your own FIN, closing the connection. So what you do in CLOSE_WAIT is essentially wait for yourself to close the connection. LAST_ACK: the endpoint has sent its FIN and is waiting for the peer's ACK; once the ACK arrives, it enters CLOSED.

  • Why does establishing a connection take three packets, but closing it takes four? Because when a server socket in the LISTEN state receives a SYN, it can send the ACK and its own SYN in a single packet (the ACK acknowledges, the SYN synchronizes). When closing, however, receiving the peer's FIN only means the peer has no more data to send to you; you may not have finished sending all of your own data, so you cannot close the socket immediately. You may still send some data first, and only then send your own FIN to tell the peer that you now agree to close the connection. For this reason the ACK and the FIN are, in most cases, sent separately.

  • Why does TIME_WAIT wait 2MSL before returning to CLOSED? Although both sides have agreed to close the connection and all four teardown packets have been exchanged, so it seems we could go straight to CLOSED (just as SYN_SENT goes to ESTABLISHED), we must assume the network is unreliable: you (the active closer) cannot guarantee that your last ACK was received. The peer, stuck in LAST_ACK, will retransmit its FIN if the ACK times out, so the TIME_WAIT state exists to resend that possibly lost ACK.

  • Are four packets always needed to close a TCP connection? Not necessarily. The four-way teardown is the safest way to close a connection, but sometimes we dislike the TIME_WAIT state (for example, when the MSL is set high and the server accumulates too many connections in TIME_WAIT; closing them faster frees resources for new connections). In that case we can set the SO_LINGER option on the socket so that close() skips TIME_WAIT and terminates the connection by sending an RST instead of the normal four-way teardown. This is usually not a great idea — TIME_WAIT is often there for our benefit.
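For reference, a minimal sketch of the SO_LINGER trick in Java (my addition; the host and port are placeholders). Setting linger to 0 makes close() abort the connection with an RST instead of the normal FIN exchange, so the socket never enters TIME_WAIT.

    import java.net.Socket;

    public class LingerDemo {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("example.com", 80)) {
                socket.setSoLinger(true, 0);   // linger of 0 seconds: abortive close via RST
                socket.getOutputStream().write("HEAD / HTTP/1.0\r\n\r\n".getBytes());
            } // close() here resets the connection instead of the normal FIN handshake
        }
    }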

2. Which phase do connect, listen, and accept correspond to?

  1. connect(): a blocking function that establishes a connection to the server through the TCP three-way handshake

The client actively connects to the server, telling the Linux kernel to complete the TCP three-way handshake automatically.

Normally the client's connect call blocks by default until the three-way handshake succeeds.

2. listen() on the server is not a blocking function: it tells the Linux kernel which socket to listen on and how long its connection queue should be

The listen function turns socketFd into a passive socket and listens on it; the backlog parameter sets the length of the queue in the kernel.

3. accept() blocks: it takes a completed connection (in the ESTABLISHED state) off the queue. If no completed connection is waiting in the queue, it blocks until a client finishes connecting.

Problem 1: What happens if the server does not call accept() in time to drain the completed-connection queue?

When the server's connection queue is full, the server does not respond to the SYN of a new connection, so the client's connect() eventually returns ETIMEDOUT — at least, that is what the textbook says. In practice, when the TCP connection queue is full, Linux does not reject the connection as the book describes; it merely delays it.
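A minimal sketch of my own (not the interviewee's code) mapping the three calls onto Java's socket API; the port and backlog values are arbitrary. new ServerSocket(port, backlog) performs socket/bind/listen, accept() blocks for a completed connection, and new Socket(host, port) on the client side performs the blocking connect.

    import java.net.ServerSocket;
    import java.net.Socket;

    public class EchoServer {
        public static void main(String[] args) throws Exception {
            // socket() + bind() + listen(backlog = 50) in one constructor call
            try (ServerSocket server = new ServerSocket(8080, 50)) {
                while (true) {
                    // accept(): blocks until a completed (ESTABLISHED) connection
                    // can be taken off the kernel's queue
                    try (Socket client = server.accept()) {
                        client.getOutputStream().write("hello\n".getBytes());
                    }
                }
            }
        }
    }

    // Client side: new Socket("localhost", 8080) performs connect(), i.e. the
    // blocking three-way handshake described above.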

III. Operating systems

1. Linux C program layout

A program essentially consists of a BSS segment, a data segment, and a text segment. An executable program as stored on disk (before being loaded into memory) is divided into a code section, a data section, and an uninitialized data section.

  • BSS segment (uninitialized data) : In segment-based memory management architectures, the BSS segment is usually an area of memory used to hold uninitialized global variables in a program. BSS is short for Block Started by Symbol. The BSS segment belongs to static memory allocation.
  • Data segment: In a segmented memory management architecture, a data segment is usually an area of memory used to hold initialized global variables in a program. The data segment belongs to static memory allocation.
  • Code segment: In segment-based memory management architectures, the text segment is usually an area of memory used to hold the program's executable code. The size of this area is determined before the program runs, and the region is read-only. The code segment may also contain read-only constants such as string literals.

The object file generated after the compilation of the program contains at least the three segments, and the general structure of the three segments is shown as follows:

The text and data segments are allocated space at compile time, while the BSS segment does not add to the executable's size; the linker obtains memory for it.

The contents of the BSS segment (uninitialized data) are not stored in the program file on disk, because the kernel sets them to 0 before the program starts running. Only the text segment and the initialized data segment need to be stored in the program file.

The data segment (already initialized data) allocates space for the data, whose values are stored in the object file.

The data segment contains the initialized global variables and their values. The size of the BSS segment is read from the executable, and the linker then obtains a memory block of that size, placed immediately after the data segment. When this memory enters the program's address space, it is cleared to zero. At that point the whole region containing the data segment and the BSS segment is usually referred to collectively as the data segment.

The executable runs in two more areas: the stack area and the heap area.

(4) Stack area: automatically allocated and released by the compiler; it holds function parameter values, local variables, and so on. Each time a function is called, the return address and some information about the call are pushed onto the stack, and the called function then allocates space on the stack for its automatic and temporary variables, so a fresh stack frame is used for every call. The stack is a contiguous memory region that grows from high addresses toward low addresses, and its maximum size is predefined by the system; if the requested stack space exceeds this limit, an overflow is reported.

(5) Heap area: the address range between the BSS segment and the stack, used for dynamically allocated memory; it is allocated and released by the programmer. The heap grows from low addresses toward high addresses and is managed as a linked structure. Frequent malloc/free calls leave holes in the memory space and cause fragmentation, and when heap space is requested the library has to run an algorithm to search for a free block that is large enough, so the heap is much less efficient than the stack.

The following figure shows the corresponding storage space of C’s source file:

At this point the program has not yet been loaded into memory; as stored on disk, the BSS segment takes up no space. BSS gets its memory space at link time.

The following figure shows the storage layout when the program is running, that is, when the program is in memory:

IV. Brain teasers

A coin is biased; how do you design a fair coin out of it? (Hint: rejection sampling.) I will leave this one for you to think about — use some divergent thinking; after all, who has not been tripped up by a brain teaser!

V. Algorithms and data structures

Heap sort: building the heap and the sorting process

Just a brief outline here. Building the heap starts from the last non-leaf node and works toward the root: each adjustment compares a parent with its left and right children, swaps the largest of the three into the parent position, and, because the swap may leave the receiving child's subtree violating the heap property, that subtree has to be re-adjusted after every swap; this is done for the first non-leaf node, the second non-leaf node, and so on, all the way up to the top of the heap. The sorting process then repeatedly removes the element at the top of the heap: swap the top element with the last element of the heap, shrink the heap by one, and sift the new top element down to restore the heap (each downward adjustment stops as soon as nothing more needs to move).

The interviewer also asked about the time complexity of heap adjustment, but I won't go into detail on that.
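Here is a compact heap-sort sketch (my own code, not the interviewee's) that follows the steps just described: build a max-heap starting from the last non-leaf node, then repeatedly swap the top with the last element and sift the new top down. Building the heap is O(n), each downward adjustment is O(log n), so the whole sort is O(n log n).

    public class HeapSort {
        public static void sort(int[] a) {
            int n = a.length;
            for (int i = n / 2 - 1; i >= 0; i--) {   // build: start at the last non-leaf node
                siftDown(a, i, n);
            }
            for (int end = n - 1; end > 0; end--) {  // sort: move the current max to the end
                swap(a, 0, end);
                siftDown(a, 0, end);                 // restore the heap on the remaining prefix
            }
        }

        // Push a[i] down until both children are smaller (max-heap property holds again)
        private static void siftDown(int[] a, int i, int size) {
            while (2 * i + 1 < size) {
                int child = 2 * i + 1;
                if (child + 1 < size && a[child + 1] > a[child]) child++; // pick larger child
                if (a[i] >= a[child]) break;          // nothing to fix: adjustment stops here
                swap(a, i, child);
                i = child;
            }
        }

        private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }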

Coding by hand:

Longest non-repeating substring

Approach and solution. Idea 1: brute force. The real interview answer would not use brute force, but that does not mean we can ignore it. Starting from each index of the string, add the following characters to a set one by one; as soon as a character is already in the set, the inner loop ends and the set's size is recorded. Doing this for every starting position and taking the largest size gives the answer. The code is as follows (the key syntax is commented for readers who do not use Java, same below).

    public int lengthOfLongestSubstring(String s) {
        int maxLen = 0;
        for (int i = 0; i < s.length(); i++) {
            HashSet<Character> set = new HashSet<>();
            for (int j = i; j < s.length(); j++) {
                if (set.contains(s.charAt(j))) break;   // duplicate found, stop extending
                set.add(s.charAt(j));
            }
            maxLen = Math.max(maxLen, set.size());
        }
        return maxLen;
    }

This is the only solution I will post here, but there are plenty of interesting approaches — feel free to think outside the box; a sliding-window version is sketched below.
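For completeness, here is a sketch of the standard sliding-window solution (my addition, not part of the original write-up), which runs in O(n) instead of the brute force's O(n^2):

    import java.util.HashMap;
    import java.util.Map;

    public class Solution {
        public int lengthOfLongestSubstring(String s) {
            Map<Character, Integer> lastIndex = new HashMap<>(); // char -> last position seen
            int maxLen = 0;
            int left = 0;                                        // window is s[left..right]
            for (int right = 0; right < s.length(); right++) {
                char c = s.charAt(right);
                if (lastIndex.containsKey(c) && lastIndex.get(c) >= left) {
                    left = lastIndex.get(c) + 1;                 // jump past the duplicate
                }
                lastIndex.put(c, right);
                maxLen = Math.max(maxLen, right - left + 1);
            }
            return maxLen;
        }
    }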

Second round: 1 hour and 10 minutes

Project: design ideas, the biggest problem encountered, how to keep a distributed primary key globally unique (we talked about the project for a long time...). With a large number of hosts sending messages to a single server, how do you guarantee performance?

Computer network:

1. The usual boilerplate questions on the TCP three-way handshake and four-way wave

I have already covered this above, so I will skip it.

2. Implementation principle of TCP Keep Alive

For example, on the server side: if no packet has been exchanged on a connection for a certain time (the default is 7,200,000 milliseconds, about 2 hours), the server sends a keep-alive probe to the peer (the probe is an ACK carrying the current TCP sequence number minus one). The client will then be in one of three situations:

  1. The client is still alive and the network connection is fine. The client returns an ACK; on receiving it, the server resets the timer and sends the next probe 2 hours later. If data is transferred on the connection within those 2 hours, the timer is pushed back 2 hours from that point.

  2. The client has shut down abnormally or the network is disconnected. In either case the client does not respond. The server gets no answer to its probe and retransmits the keep-alive packet after a certain interval (the default is 1000 ms), repeating this a certain number of times (the default is 5 times on Windows 2000/XP/2003 and 10 times on Vista and later).

  3. The client crashed but has since restarted. In this case the server does receive a response to its probe, but the response is a reset, which causes the server to terminate the connection.
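A small illustrative sketch (my addition): in Java you can only switch keep-alive on per socket; the probe interval and retry counts discussed above are operating-system settings, not part of this API.

    import java.net.Socket;

    public class KeepAliveDemo {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("example.com", 80)) {
                socket.setKeepAlive(true);   // ask the OS to send keep-alive probes when idle
                System.out.println("keep-alive enabled: " + socket.getKeepAlive());
            }
        }
    }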

3. The TCP packet

www.cnblogs.com/sui77626523…

Third round: 1 hour

Project: asked about all my projects, the design ideas, and the biggest difficulties encountered

I won't go into my own experience here.

Operating system:

1. malloc (walked through everything, from the memory layout to the free-list allocation algorithm and the underlying system calls)

2. Copy-on-write

The CopyOnWrite idea

Copy-on-write (COW) is a general optimization strategy in computer programming. The core idea: if multiple callers access the same resource (such as data in memory or on disk) at the same time, they all get the same pointer to the same shared resource. Only when a caller tries to modify the content does the system actually make a private copy for that caller, while the resource seen by all other callers remains unchanged. The whole process is transparent to the other callers. The main advantage is that if no caller modifies the resource, no private copy is ever created, so multiple callers can share the same resource as long as they are only reading.

In plain English, copy-on-write means that when different processes access the same resource, a new copy of the data is made only for an update operation, and that copy then replaces the original; otherwise everyone keeps accessing the same resource.

The JDK's CopyOnWriteArrayList and CopyOnWriteArraySet containers use the COW idea. How do they work? Simply put, reads need no lock and can proceed freely; only an update takes a lock, copies the original data, modifies the copy, and finally swaps the copy in to replace the original data. While a modification is in progress, reads are not blocked — they just keep reading the old data. This should be distinguished from read-write locks. (A small usage sketch follows below.)
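As a usage sketch (my addition), the following shows the snapshot behaviour: an iterator sees the list as it was when iteration started, so a concurrent writer never causes a ConcurrentModificationException.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class CowDemo {
        public static void main(String[] args) throws InterruptedException {
            List<Integer> list = new CopyOnWriteArrayList<>(new Integer[]{1, 2, 3});

            Thread writer = new Thread(() -> {
                for (int i = 4; i <= 1000; i++) list.add(i);  // each add copies the array
            });
            writer.start();

            int sum = 0;
            for (int value : list) {      // iterates over the snapshot taken at this point
                sum += value;             // elements added meanwhile are not seen
            }
            writer.join();
            System.out.println("sum over snapshot = " + sum + ", final size = " + list.size());
        }
    }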

Source code analysis

Let's look at CopyOnWriteArrayList's add() method. It is quite simple: take the lock, copy the array, apply the change to the copy, and then replace the existing array with the copy.

    public boolean add(E e) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len + 1);
            newElements[len] = e;
            setArray(newElements);
            return true;
        } finally {
            lock.unlock();
        }
    }

The get(int index) method of CopyOnWriteArrayList is just plain, unlocked access.

    public E get(int index) {
        return get(getArray(), index);
    }

    @SuppressWarnings("unchecked")
    private E get(Object[] a, int index) {
        return (E) a[index];
    }    


Pros and cons

1. Advantages

Copy-on-write works well for data that is read far more often than it is written — configuration, blacklists, logistics addresses, and other data that changes very rarely. Reads are lock-free, which helps achieve higher concurrency.

CopyOnWriteArrayList is thread-safe and performs better than Vector. Vector is synchronized, so every method acquires a lock when executed, whereas CopyOnWriteArrayList only locks add, delete, and update operations; reads take no lock, so read performance is better than Vector's.

2. Disadvantages

Data consistency: the implementation only guarantees eventual consistency of the data. While an element has been added to the copy but the copy has not yet replaced the original, readers still see the old data.

Memory usage: if the objects are large, frequent copying consumes memory and can cause Java GC problems; in that case another container, such as ConcurrentHashMap, should be considered.

Computer network:

1. How to ensure reliable UDP transmission at the application layer
2. The differences between HTTPS and HTTP, and the SSL handshake mechanism of HTTPS
3.

Brain teasers:

1. Convert rand5 to rand7 — it is a rejection-sampling problem (a sketch follows below). 2. There are 32 stones of identical size and a balance scale; at minimum, how many weighings are needed to find the heaviest stone, the second heaviest, and the n-th heaviest?
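A sketch of the rejection-sampling idea for question 1 (my own code; rand5() below is just a stand-in for the given uniform generator over 1..5): combine two rand5 calls into a uniform number in 1..25, keep only 1..21, and map it to 1..7.

    import java.util.Random;

    public class Rand7FromRand5 {
        private static final Random RANDOM = new Random();

        static int rand5() { return RANDOM.nextInt(5) + 1; }      // uniform on 1..5 (stand-in)

        static int rand7() {
            while (true) {
                int x = (rand5() - 1) * 5 + rand5();               // uniform on 1..25
                if (x <= 21) return (x - 1) % 7 + 1;               // keep 1..21, map to 1..7
                // otherwise reject and resample (happens with probability 4/25)
            }
        }

        public static void main(String[] args) {
            int[] counts = new int[8];
            for (int i = 0; i < 700_000; i++) counts[rand7()]++;
            for (int v = 1; v <= 7; v++) System.out.println(v + ": " + counts[v]);
        }
    }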

Coding by hand:

Snake-order traversal of an array
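The article gives no details for this one, so purely as an assumption I read "snake-order traversal" as walking a 2D array row by row while reversing direction on alternate rows; the sketch below reflects that reading.

    public class SnakeTraversal {
        public static int[] snake(int[][] matrix) {
            int rows = matrix.length, cols = rows == 0 ? 0 : matrix[0].length;
            int[] out = new int[rows * cols];
            int k = 0;
            for (int r = 0; r < rows; r++) {
                if (r % 2 == 0) {
                    for (int c = 0; c < cols; c++) out[k++] = matrix[r][c];      // left to right
                } else {
                    for (int c = cols - 1; c >= 0; c--) out[k++] = matrix[r][c]; // right to left
                }
            }
            return out;
        }

        public static void main(String[] args) {
            int[][] m = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
            System.out.println(java.util.Arrays.toString(snake(m))); // 1 2 3 6 5 4 7 8 9
        }
    }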

Talk:

What books I read in my spare time, which technology websites I follow and for how long — that sort of question.

HR round

Nothing much to say here — no technical questions, just a chat; just don't let the conversation die.

I was a little surprised not to be asked about databases — after all, databases were what had been prepared most thoroughly. It is clear that this department values operating systems and computer networks.


I am running short on time, so I will fill in the rest when I can. All right, that's it! Remember to claim the materials:

  • Java basic knowledge summary
  • A line of Internet companies Java interview core knowledge points

Previous popular articles:

  • Java basic knowledge summary
  • Performance Tuning series (JVM, MySQL, Nginx, and Tomcat)
  • From being laid off to five 30K+ offers — the road was rough, but keep your head down; the future is not far off
  • 100 Java project parses with source code and learning documentation!

end