I am a third-year undergraduate (junior) at a 985 university, looking for an internship in big data. Since I have no more classes, I can intern for about six months.

Alibaba

Date: March 26, 2020

First interview

When I interviewed with Alibaba, I had only just started reviewing, and I had covered only big data topics, not language fundamentals such as Python and Java. It did not go well.

  1. Write an algorithm: find the minimum value in a rotated sorted array. I remembered that it uses binary search, but I didn't manage to write it.
  2. Asked about Java GC (couldn't answer)
  3. Asked about the Python GIL (couldn't answer)
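For reference, here is a minimal sketch of the rotated-array minimum with binary search (my own version, not the interviewer's reference answer). The idea is to compare the middle element with the rightmost element of the current range: if the middle is larger, the minimum must be to its right; if it is smaller, the minimum is at the middle or to its left.

```python
from typing import List


def find_min_rotated(nums: List[int]) -> int:
    """Return the minimum of a sorted array rotated at an unknown pivot."""
    if not nums:
        raise ValueError("empty array")
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:
            lo = mid + 1      # minimum lies strictly right of mid
        elif nums[mid] < nums[hi]:
            hi = mid          # minimum is at mid or to its left
        else:
            hi -= 1           # duplicates: can't tell the side, shrink by one
    return nums[lo]
```

For example, `find_min_rotated([3, 4, 5, 1, 2])` returns `1`. The `else` branch only matters when the array may contain duplicates; without duplicates the search is O(log n).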

Many thanks to the interviewer. After this interview I no longer felt so confident, and I realized how much I still had to learn. I had thought interviews would be easy, but there are a lot of strong people out there.

360

First interview (43 min)

8 April 2020

  1. Self-introduction
  2. How would you implement data synchronization between MySQL and HDFS?
  3. Talk about your understanding of Hadoop. (I covered how HDFS works and the MapReduce process; I wanted to go on to the HDFS checkpoint mechanism but wasn't given the chance.)
  4. What is the difference between Spark and MapReduce? Does Spark make MapReduce unnecessary? (I didn't answer this well.)
  5. What do you know about Hive? (I mentioned data skew.)
  6. With a very large amount of data, how do you find the Top 10? (I made a mistake at first and corrected it later.)
  7. Common Linux commands? (I answered hadoop fs -ls, hadoop fs -mkdir, and hdfs fsck.)
  8. Which language do you use most? (I use Python a lot.) Common data types in Python? What's the difference between a list and a tuple? Mutable vs. immutable objects
  9. How do you install common Python libraries (Requests, BS4, Keras, and so on)?
  10. Algorithms: array deduplication; the difference between dynamic programming and recursion
  11. Do you have any questions for me? (I asked whether headcount was tight and what the main work would be.)
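To review the MapReduce process from question 3, here is a toy, single-process word count that mimics map → shuffle (hash-partition by key, group values) → reduce. It is only a sketch of the data flow; real Hadoop also spills map output to disk and merge-sorts it, which is left out, and the function names are my own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]],
                  num_reducers: int) -> List[Dict[str, List[int]]]:
    # Partition by hash(key) % num_reducers, then group values by key,
    # so every value for a given key lands at the same reducer.
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partitions: List[Dict[str, List[int]]]) -> Dict[str, int]:
    # Reducer: each partition processes its keys in sorted order and sums the values.
    result: Dict[str, int] = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = sum(part[key])
    return result


def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines), num_reducers))
```

For example, `word_count(["a b a", "B c"])` gives `{"a": 2, "b": 2, "c": 1}` regardless of how many reducers are used.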
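For the Top 10 question, one standard approach (not necessarily what the interviewer expected) is a min-heap of size k: it holds the k largest values seen so far, so the data can be streamed in O(n log k) time and O(k) memory instead of being fully sorted. The question didn't say whether "Top 10" meant the largest values or the most frequent items; this sketch assumes largest values. For most frequent items you would count first (for example per hash partition) and then run the same heap over the counts.

```python
import heapq
from typing import Iterable, List


def top_k(stream: Iterable[int], k: int = 10) -> List[int]:
    """Return the k largest values from the stream, largest first."""
    if k <= 0:
        return []
    heap: List[int] = []              # min-heap: heap[0] is the smallest of the current top k
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x, in one step
    return sorted(heap, reverse=True)
```

If the data doesn't fit on one machine, each partition can compute its own top k and the partial results can be merged with the same function.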
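For question 8, a small demo of list vs. tuple and mutable vs. immutable objects. The function name is just for illustration; each check records what Python actually does.

```python
def list_vs_tuple_demo() -> dict:
    nums_list = [1, 2, 3]
    nums_list[0] = 99                  # lists are mutable: in-place assignment works

    nums_tuple = (1, 2, 3)
    try:
        nums_tuple[0] = 99             # tuples are immutable: this raises TypeError
        tuple_assignable = True
    except TypeError:
        tuple_assignable = False

    # Tuples of hashable items can be dict keys; lists are unhashable.
    try:
        hash([0, 0])
        list_hashable = True
    except TypeError:
        list_hashable = False
    tuple_as_key = (0, 0) in {(0, 0): "origin"}

    # A tuple only freezes its references; a list stored inside it can still change.
    nested = ([1], 2)
    nested[0].append(2)

    # += rebinds an int name to a new object, but extends a list in place.
    a = 1
    a_id = id(a)
    a += 1
    b = [1]
    b_id = id(b)
    b += [2]

    return {
        "list": nums_list,
        "tuple_assignable": tuple_assignable,
        "list_hashable": list_hashable,
        "tuple_as_key": tuple_as_key,
        "nested": nested,
        "int_same_object": id(a) == a_id,
        "list_same_object": id(b) == b_id,
    }
```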
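For question 10, the interviewer didn't say whether the array was sorted, so this sketch shows two common deduplication variants, then the usual way to explain recursion vs. dynamic programming with Fibonacci: recursion is a technique (a function calling itself), while DP is a strategy of solving each overlapping subproblem once, either top-down (memoized recursion) or bottom-up.

```python
from functools import lru_cache
from typing import Hashable, List, Sequence


def dedup_keep_order(items: Sequence[Hashable]) -> list:
    # dict preserves insertion order (Python 3.7+), so first occurrences are kept.
    return list(dict.fromkeys(items))


def dedup_sorted_in_place(nums: List[int]) -> int:
    # Two pointers on a sorted list; returns the new length, nums[:length] is unique.
    if not nums:
        return 0
    write = 1
    for read in range(1, len(nums)):
        if nums[read] != nums[write - 1]:
            nums[write] = nums[read]
            write += 1
    return write


def fib_recursive(n: int) -> int:
    # Plain recursion: recomputes the same subproblems, exponential number of calls.
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Top-down DP: same recursion, but each subproblem is computed once and cached.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


def fib_dp(n: int) -> int:
    # Bottom-up DP: build up from the base cases, O(n) time and O(1) space.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```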

Second interview (23 min)

Asked about the Flask source code (I have not read the Spark source code yet). Every project was probed in depth: the algorithm models and so on.

HR interview (30 min)

20 April 2020

  1. How long can you intern?
  2. How do you balance projects and community activities with daily life?
  3. Briefly compare the pros and cons of Spark Streaming, Storm, and Flink
  4. Future plans and so on

Tencent

I got the Tencent interview through an internal referral, for a backend development role related to big data. The interviews were held in Tencent Meeting: in the first round I wrote code while sharing my screen, and in the second round I wrote code in Tencent Docs.

First interview

23 April 2020

  1. Self-introduction
  2. Differences between Spark and Hadoop
  3. The MapReduce flow and the RDD flow
  4. The difference between new and malloc, how epoll works, and polymorphism
  5. Then an algorithm: binary search, find the minimum of a rotated array (this problem again! I said I had solved it before, and the interviewer didn't ask me to write it again).
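To review the epoll question: the core idea is to register file descriptors with the kernel once, block in a single call until some of them are ready, and then handle only the ready ones. Python's `selectors.DefaultSelector` uses epoll on Linux (kqueue on macOS/BSD), so this sketch shows the pattern; `socket.socketpair()` is only a local stand-in for a real client connection, and the payload is assumed to be small enough to fit in the socket buffer.

```python
import selectors
import socket


def read_when_ready(payload: bytes, timeout: float = 1.0) -> bytes:
    sel = selectors.DefaultSelector()        # epoll on Linux, kqueue on macOS/BSD
    writer, reader = socket.socketpair()     # stand-in for a real network connection
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ)
    received = b""
    try:
        writer.sendall(payload)
        while len(received) < len(payload):
            events = sel.select(timeout=timeout)  # block until the kernel reports readiness
            if not events:                        # timed out: nothing more is coming
                break
            for key, _mask in events:
                chunk = key.fileobj.recv(4096)    # won't block: the fd is known to be readable
                if not chunk:                     # empty read means the peer closed
                    return received
                received += chunk
    finally:
        sel.unregister(reader)
        sel.close()
        writer.close()
        reader.close()
    return received
```

Unlike select/poll, epoll keeps the registered set inside the kernel, so each wait doesn't have to pass the whole descriptor list again; that is the usual reason it scales better with many connections.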

Second interview

28 April 2020

  1. Self-introduction. Are you at home rather than at school?
  2. Since I had studied Hadoop, talk about the MapReduce process, then partitioning and data skew in MapReduce (for example, why increasing the number of reducers can improve efficiency, how to write a custom partitioner, and how to redesign keys)
  3. Then asked about Hadoop Streaming (and Spark Streaming)
  4. How to read data from a socket
  5. An algorithm: given a tree where each node can be either taken or not taken, with the constraint that directly connected nodes (such as a parent and its child) cannot both be taken, how would you prune the search? I couldn't answer it.
  6. I ran out of ideas on that one, so we switched: given a binary tree and two nodes A and B, find their lowest common ancestor. (This last problem is from Jian Zhi Offer, i.e. "Coding Interviews".)
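For the data-skew part of question 2, one common mitigation is key salting with two-stage aggregation: append a random suffix so a single hot key spreads over several reducers, aggregate the partial results, then strip the suffix and aggregate again. This is a single-process sketch of that idea; `partition` only mimics a default hash partitioner, and in Hadoop the two stages would be two jobs (or a custom Partitioner plus a second pass).

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    # Mimics a default hash partitioner: the same key always goes to the same reducer.
    return hash(key) % num_reducers


def salted_count(records: List[str], num_reducers: int = 4,
                 salt_buckets: int = 8, seed: int = 0) -> Tuple[Dict[str, int], Dict[int, int]]:
    rng = random.Random(seed)
    # Stage 1: add a random suffix so one hot key is spread over several reducers.
    stage1: Dict[int, Counter] = defaultdict(Counter)
    for key in records:
        salted = f"{key}#{rng.randrange(salt_buckets)}"
        stage1[partition(salted, num_reducers)][salted] += 1
    # Stage 2: strip the suffix and merge the partial counts for each original key.
    final: Counter = Counter()
    for counts in stage1.values():
        for salted, c in counts.items():
            final[salted.rsplit("#", 1)[0]] += c
    load = {reducer: sum(c.values()) for reducer, c in stage1.items()}  # records per reducer
    return dict(final), load
```

Without the salt, all 1,000 records of a hot key would hit one reducer; with it, the hot key is split across up to `salt_buckets` salted keys, at the cost of an extra aggregation step.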
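Looking back at questions 5 and 6, here is a sketch of both. The tree problem was stated loosely, so I am assuming a binary tree and that the goal is to maximize the sum of taken values (similar to LeetCode 337, "House Robber III"). Instead of pruning a search, each node returns two values, the best sum if it is taken and the best if it is not, which gives an O(n) DP. The lowest common ancestor function assumes both nodes are present in the tree. `TreeNode` is a hypothetical helper class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def max_independent_sum(root: Optional[TreeNode]) -> int:
    """Largest sum of taken values when a node and its child are never both taken."""
    def dfs(node: Optional[TreeNode]) -> Tuple[int, int]:
        # Returns (best sum if node is taken, best sum if node is not taken).
        if node is None:
            return 0, 0
        l_take, l_skip = dfs(node.left)
        r_take, r_skip = dfs(node.right)
        take = node.val + l_skip + r_skip                 # children must be skipped
        skip = max(l_take, l_skip) + max(r_take, r_skip)  # children are free to choose
        return take, skip
    return max(dfs(root))


def lowest_common_ancestor(root: Optional[TreeNode], a: TreeNode,
                           b: TreeNode) -> Optional[TreeNode]:
    """Return the LCA of nodes a and b (both assumed to be in the tree)."""
    if root is None or root is a or root is b:
        return root
    left = lowest_common_ancestor(root.left, a, b)
    right = lowest_common_ancestor(root.right, a, b)
    if left is not None and right is not None:
        return root          # a and b are in different subtrees
    return left if left is not None else right
```

Both functions recurse once per node, so very deep (degenerate) trees may need an iterative version to avoid Python's recursion limit.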

I usually develop mostly in Python, Java, and Scala and rarely use C++, so this round felt very uncomfortable. I still need to learn more. Later Tencent contacted me again, but by then I already had the ByteDance offer, so I declined rather than waste both sides' time.

ByteDance

First interview

Self-introduction

Project introduction: the deep collaborative filtering algorithm, and the synchronization strategy between MySQL and HDFS

SQL: write a query using LEFT JOINs over nested subqueries that returns all ad IDs that intersect with a given date.

Two algorithm problems: ① level-order traversal of a binary tree; ② count the occurrences of k in a sorted array (I used binary search to find one occurrence, then scanned forward and backward).
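A sketch of both problems. The level-order traversal is a BFS with a queue, processing one level per outer iteration. For counting k, an alternative to scanning outward from one hit (which degrades to O(n) when k repeats many times) is two binary searches for the left and right boundaries, which stays O(log n); Python's `bisect` module provides both. `Node` is a hypothetical helper class.

```python
from bisect import bisect_left, bisect_right
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_order(root: Optional[Node]) -> List[List[int]]:
    levels: List[List[int]] = []
    queue = deque([root] if root is not None else [])
    while queue:
        level = []
        for _ in range(len(queue)):      # exactly the nodes of the current level
            node = queue.popleft()
            level.append(node.val)
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
        levels.append(level)
    return levels


def count_occurrences(sorted_nums: List[int], k: int) -> int:
    # First index with value > k minus first index with value >= k.
    return bisect_right(sorted_nums, k) - bisect_left(sorted_nums, k)
```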

Spark: explain wide and narrow dependencies. (I walked from the action to the shuffle, and from the Aggregator to how the DAG is parsed and split into stages.)
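A toy model of the stage-splitting idea: narrow dependencies (map, filter, ...) are pipelined inside one stage, while a wide dependency needs a shuffle and therefore starts a new stage. This is only an illustration of the rule; Spark's DAGScheduler actually walks backwards from the action's final RDD over a general DAG, whereas this sketch assumes a simple linear chain of operations, and the operation names are just labels.

```python
from typing import List, Tuple

# Each step is (operation name, True if it is a wide / shuffle dependency).
Lineage = List[Tuple[str, bool]]


def split_stages(lineage: Lineage) -> List[List[str]]:
    """Cut a linear lineage into stages at every wide (shuffle) dependency."""
    stages: List[List[str]] = [[]]
    for name, is_wide in lineage:
        if is_wide and stages[-1]:
            stages.append([])        # a shuffle boundary starts a new stage
        stages[-1].append(name)      # narrow ops are pipelined into the current stage
    return stages
```

For a word-count-like chain `textFile → flatMap → map → reduceByKey → mapValues → sortByKey`, this yields three stages, split before `reduceByKey` and before `sortByKey`.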

How Hive works, and the difference between internal (managed) and external tables

MySQL index structure (clustered vs. non-clustered indexes)
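A toy model of the clustered vs. non-clustered distinction as it works in InnoDB: the clustered index stores full rows ordered by primary key, while a secondary index stores only (indexed column, primary key) pairs, so a lookup through it needs a second trip back to the clustered index. Sorted Python lists stand in for the B+ trees here, and `ToyTable` is a hypothetical class, not MySQL code.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class ToyTable:
    def __init__(self, rows: List[Row], pk: str, secondary: str):
        # Clustered index: full rows, ordered by primary key.
        self.clustered: List[Tuple[object, Row]] = sorted(
            ((r[pk], r) for r in rows), key=lambda t: t[0])
        self.clustered_keys = [k for k, _ in self.clustered]
        # Secondary index: (indexed value, primary key) pairs only, no full rows.
        self.sec_index: List[Tuple[object, object]] = sorted(
            (r[secondary], r[pk]) for r in rows)

    def get_by_pk(self, key) -> Optional[Row]:
        # One search in the clustered index returns the whole row.
        i = bisect_left(self.clustered_keys, key)
        if i < len(self.clustered_keys) and self.clustered_keys[i] == key:
            return self.clustered[i][1]
        return None

    def get_by_secondary(self, value) -> List[Row]:
        # Range-scan the secondary index, then look each primary key up
        # in the clustered index to fetch the full row.
        i = bisect_left(self.sec_index, (value,))
        rows = []
        while i < len(self.sec_index) and self.sec_index[i][0] == value:
            rows.append(self.get_by_pk(self.sec_index[i][1]))
            i += 1
        return rows
```

This second lookup is why a covering index (one that already contains every column the query needs) can be noticeably faster.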

I was also asked about the differences between the SQL join types.
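A quick way to see the INNER vs. LEFT JOIN difference is an in-memory SQLite database (the join semantics shown here are standard SQL; the `ads` and `clicks` tables are made up for illustration). INNER JOIN keeps only ads that have matching clicks, while LEFT JOIN keeps every ad and fills the missing side with NULL, which `COUNT(c.day)` then counts as 0.

```python
import sqlite3


def join_demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE ads(ad_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE clicks(ad_id INTEGER, day TEXT);
        INSERT INTO ads VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO clicks VALUES (1, '2020-04-01'), (1, '2020-04-02'), (2, '2020-04-01');
    """)
    # INNER JOIN: ad 3 has no clicks, so it disappears from the result.
    inner = cur.execute(
        "SELECT a.ad_id, COUNT(c.day) FROM ads a JOIN clicks c ON a.ad_id = c.ad_id "
        "GROUP BY a.ad_id ORDER BY a.ad_id").fetchall()
    # LEFT JOIN: every ad is kept; COUNT ignores the NULLs from the missing side.
    left = cur.execute(
        "SELECT a.ad_id, COUNT(c.day) FROM ads a LEFT JOIN clicks c ON a.ad_id = c.ad_id "
        "GROUP BY a.ad_id ORDER BY a.ad_id").fetchall()
    conn.close()
    return inner, left
```

`join_demo()` returns `([(1, 2), (2, 1)], [(1, 2), (2, 1), (3, 0)])`.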

Second interview

  1. Self-introduction
  2. Coding problem: convert a number into words, e.g. 1001 becomes "one thousand and one". I used recursion, but my solution got a bit too complicated.
  3. Tell me about your two most difficult projects
  4. The difference between cache and persist in Spark
  5. What do you want to learn next? (I briefly mentioned Flink, the Spark source code, and parameter servers.)
  6. Anything else you want to ask me?
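A tidier recursive version of the number-to-words problem, as a sketch. The exact target format wasn't specified beyond the example, so this assumes English words in British style, with "and" before a final part below 100 (which reproduces "one thousand and one"). The recursion splits off the largest scale (billion, million, thousand) and recurses on the quotient and the remainder.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (1000, "thousand")]


def _below_thousand(n: int) -> str:
    # Handles 1..999.
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else "")
    rest = n % 100
    return ONES[n // 100] + " hundred" + (" and " + _below_thousand(rest) if rest else "")


def to_words(n: int) -> str:
    if n == 0:
        return "zero"
    if n < 0:
        return "minus " + to_words(-n)
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = to_words(head) + " " + name
            if rest == 0:
                return words
            # "and" before a trailing part below 100: 1001 -> "one thousand and one"
            return words + (" and " if rest < 100 else " ") + to_words(rest)
    return _below_thousand(n)
```

For example, `to_words(1234)` gives `"one thousand two hundred and thirty-four"`.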

HR interview

  1. Self-introduction
  2. How long do you want to intern?
  3. Do you plan to go to graduate school?
  4. Would you rent a place in Beijing?
  5. What do your parents think about you moving to Beijing?
  6. What weaknesses do you see in yourself?

Other

I also applied to many other companies, such as Mogujie, Baidu, and Zhihu, but heard nothing back; perhaps I just wasn't a good match.

Conclusion

I feel my skills still fall well short. My review route was:

  1. Languages: Python, Java, Scala
  2. Computer fundamentals: computer networks, operating systems, and databases (indexing is a key topic)
  3. Big data: Hadoop, Spark, Hive, HBase, Flume, Kafka, Storm, etc.
  4. Algorithms: basically working through Jian Zhi Offer ("Coding Interviews").

You can also follow my WeChat official account, Big Data Growth Notes.