I am a junior at a 985 university, looking for an internship in big data. Since I have no classes, I can intern for about six months.

Alibaba

Date: March 26, 2020

First interview

When Alibaba interviewed me, I had only just started reviewing. I had gone over Python, Java, and so on, and had only skimmed the big data material. I failed, plain and simple.

Write an algorithm: find the minimum value in a rotated sorted array. I remembered that it uses binary search, but I could not write it out.
Asked about Java GC (could not answer)
Asked about the Python GIL (could not answer)
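
For reference, here is a minimal sketch of the rotated-array question, assuming the input is a non-empty ascending array without duplicates that has been rotated some number of positions. The function name `find_min_rotated` is my own for illustration, not something from the interview.

```python
def find_min_rotated(nums):
    """Return the smallest value in a rotated ascending array (no duplicates assumed)."""
    if not nums:
        raise ValueError("nums must not be empty")
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:
            # The minimum lies strictly to the right of mid.
            lo = mid + 1
        else:
            # nums[mid] could itself be the minimum, so keep it in range.
            hi = mid
    return nums[lo]


print(find_min_rotated([4, 5, 6, 7, 0, 1, 2]))
```

Comparing against `nums[hi]` rather than `nums[lo]` also handles an array that was not rotated at all.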

Many thanks to the interviewer. After the interview, I no longer felt so confident, and I know I have a lot to learn. I had underestimated the interview, and there are too many strong people around me.

360

First interview, 43 min

8 April 2020

Introduce yourself
How do you implement data synchronization between MySQL and HDFS?
Talk about your understanding of Hadoop. (I covered how HDFS works and the MapReduce process. I wanted to go on to the HDFS checkpoint mechanism, but the interviewer did not let me.)
What is the difference between Spark and MapReduce? Does Spark eliminate the need for MapReduce? (I did not answer this well)
What do you know about Hive? (I mentioned data skew)
If you have a lot of data, how do you find the Top 10? (I made a mistake at first, then corrected it)
Commonly used Linux commands? (hadoop fs -ls, mkdir, hdfs fsck)
Which language do you use most? (I use Python a lot.) Common data types in Python? What is the difference between a list and a tuple? Immutable and mutable objects
Common Python libraries (Requests, BS4, Keras, and so on)? How do you install packages?
Algorithms: array deduplication; the difference between dynamic programming and recursion
Do you have any questions for me? (I asked whether they were short on headcount and what the main work would be)
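
Two of the questions above are easy to sketch in Python. This is only my own illustration, not the answer I gave in the interview: it reads "Top 10" as the ten largest values in a stream too big to sort in full, and it reads deduplication as removing repeats while keeping first-seen order. The names `top_k` and `dedupe` are made up for the example.

```python
import heapq


def top_k(values, k=10):
    """Return the k largest values, largest first, using a size-k min-heap."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k, so swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


def dedupe(items):
    """Remove duplicates while preserving the order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(top_k(range(100), 10))
print(dedupe([1, 2, 1, 3, 2]))
```

The heap never holds more than k items, so memory stays constant however long the input is. On a cluster the same idea would presumably run per partition, followed by a merge of the partial results.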

Second interview, 23 min

Asked about the Flask source code (I have not read the Spark source code yet). Each project was probed in more depth, including the algorithms and models.

HR interview, 30 min

20 April 2020

How long can you intern?
How do you balance projects and student clubs with your daily life?
Briefly describe the pros and cons of Spark Streaming, Storm, and Flink
Future plans, and so on

Tencent

For Tencent I applied through an internal referral for a backend development role, though one related to big data. The interviews were held over Tencent Meeting. In the first interview I wrote code by hand over screen sharing, and in the second I wrote it in Tencent Docs.

First interview

23 April 2020

Introduce yourself
Difference between Spark and Hadoop
The MapReduce flow and the RDD flow
Difference between new and malloc; how epoll works; talk about polymorphism
Algorithms: binary search, then find the minimum of a rotated array (this problem again; I said I had done it before, so the interviewer did not ask me to write it).

Second interview

28 April 2020

Introduce yourself. Are you at home? Not back at school?
Since you have learned Hadoop, walk through the MapReduce process, then partitioning and data skew in MapReduce (for example, why increasing the number of reducers can improve efficiency, how to write a custom partitioner, and how to redesign keys).
Then asked about Hadoop Streaming (Spark Streaming)
How do you read data from a socket?
Asked me to write an algorithm: given a tree, the val of each node is either taken or not taken. The constraint is that directly connected nodes (such as a parent and its child) cannot be taken, or not taken, at the same time. Asked how to prune. I could not answer it.
Since I had no ideas on that one, the interviewer switched problems: given a binary tree and two nodes A and B, find their lowest common ancestor. (This last problem is from the interview-prep book 剑指 Offer.)
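
The lowest common ancestor problem has a short recursive solution. The sketch below is my own illustration and assumes both nodes are present in the tree and are compared by identity. `TreeNode` and `lowest_common_ancestor` are names chosen for the example.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def lowest_common_ancestor(root, a, b):
    """Return the deepest node that has both a and b in its subtree."""
    if root is None or root is a or root is b:
        return root
    left = lowest_common_ancestor(root.left, a, b)
    right = lowest_common_ancestor(root.right, a, b)
    if left is not None and right is not None:
        # a and b were found in different subtrees, so root is the answer.
        return root
    # Otherwise both are on one side (or neither was found here).
    return left if left is not None else right


n6, n2 = TreeNode(6), TreeNode(2)
n5 = TreeNode(5, n6, n2)
n1 = TreeNode(1)
root = TreeNode(3, n5, n1)
print(lowest_common_ancestor(root, n6, n2).val)
```

Each node is visited at most once, so the running time is linear in the size of the tree.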

I usually develop in Python, Java, and Scala and rarely use C++, so these interviews were uncomfortable. I still need to learn more. Later, Tencent picked up my application again, but by then I already had the ByteDance offer, so I did not take up either side's time.

ByteDance

First interview

Introduce yourself

Project introduction: I described the deep collaborative filtering algorithm and the synchronization strategy between MySQL and HDFS

Write a SQL query in MySQL (involving a left join): find all ad IDs that intersect with a given date.

Write two algorithms: ① level-order traversal of a tree; ② in a sorted array, count the occurrences of the number k (I used binary search, then scanned backwards and forwards from the hit)
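
Both problems can be sketched briefly in Python. This is an illustration rather than what I wrote in the interview. For the second problem it swaps my "binary search, then scan both ways" approach for two binary searches from the standard `bisect` module, which stays logarithmic even when k fills most of the array. The names `Node`, `level_order`, and `count_occurrences` are made up for the example.

```python
from bisect import bisect_left, bisect_right
from collections import deque


class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def level_order(root):
    """Return node values grouped by depth, top to bottom, left to right."""
    if root is None:
        return []
    levels = []
    queue = deque([root])
    while queue:
        level = []
        # Everything in the queue right now belongs to the same depth.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
        levels.append(level)
    return levels


def count_occurrences(sorted_nums, k):
    """Count how many times k appears in an ascending sorted list."""
    return bisect_right(sorted_nums, k) - bisect_left(sorted_nums, k)


tree = Node(1, Node(2, Node(4)), Node(3))
print(level_order(tree))
print(count_occurrences([1, 2, 3, 3, 3, 4], 3))
```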

Spark: explain wide dependencies. (I went from actions to shuffle, and from the aggregator to how the DAG is parsed and split into stages)

How Hive works. The difference between internal and external tables

MySQL index structure (clustered and non-clustered indexes)

Also the differences between the SQL join types

Second interview

Introduce yourself
Write a solution that converts a number into words, for example 1001 into "one thousand and one". (I used recursion, but it got a bit too complicated to write)
Describe your two most difficult projects
The difference between Spark cache and persist
What do you want to learn next? (I briefly mentioned Flink, the Spark source code, and parameter servers)
Anything else you want to ask me?
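
The number-to-words question can be written recursively without too much code. The sketch below is my own illustration. It assumes non-negative integers and English output with "and" before a trailing part smaller than one hundred, which matches the 1001 example. The actual interview may have expected a different output format, and the name `number_to_words` is made up.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Largest scale first, so the first match is the leading unit.
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"),
          (1_000, "thousand"), (100, "hundred")]


def number_to_words(n):
    """Spell out a non-negative integer in English words."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = number_to_words(head) + " " + name
            if rest == 0:
                return words
            # "and" only joins a trailing part below one hundred.
            joiner = " and " if rest < 100 else " "
            return words + joiner + number_to_words(rest)


print(number_to_words(1001))
```

Each call peels off the largest scale and recurses on both the leading count and the remainder, so the table of scales is the only thing to extend for bigger numbers.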

HR interview

Introduce yourself
How long do you want to intern?
Do you plan to go to graduate school?
Will you rent a place in Beijing?
What do your parents think about you coming to Beijing?
What weaknesses do you see in yourself?

Other

I also applied to many other companies, such as Mogujie, Baidu, and Zhihu, but heard nothing back. Perhaps I was not a good match.

Conclusion

I feel my skills still fall short in many areas. My review route was:

Languages: Python, Java, Scala
Computer fundamentals: computer networks, operating systems, and databases (indexes are a focus)
Big data: Hadoop, Spark, Hive, HBase, Flume, Kafka, Storm, etc.
Algorithms: mostly working through the problems in 剑指 Offer.

You can also follow my public account, Big Data Growth Notes.

2020 Big Data Development Internship Interviews (Alibaba, 360, Tencent, ByteDance)