Python is a high-level, dynamic programming language created by Guido van Rossum and first released in 1991. Its applications range from web development to data science to DevOps.

It emphasizes simplicity, readability, and scalability, and is especially strong in data science, where it supports business analysis (exploratory analysis, data visualization), machine learning (data cleaning, algorithm modeling), deep learning (building neural network frameworks), and other scenarios. As a result, most data workers prefer to work in Python, and novice data analysts usually consider starting their data analysis journey with it.

Through close communication with the tens of thousands of data professionals gathered at Corsykesci.com, we found that while Python itself is very friendly to complete beginners in data analysis, most of them still run into the age-old problem of “starting data analysis, then giving up.”

The reasons behind this can be summarized as follows:

  1. Setting up a local programming environment by hand is a tedious project, and the installation bugs encountered along the way easily make beginners lose interest in learning (especially friends from non-computer-science backgrounds);
  2. Faced with the huge amount of data analysis knowledge and tutorials shared publicly on the Internet, beginners do not know how to filter and choose among them;
  3. After finishing the tutorials and learning to program, they find that when facing real business data analysis problems, their minds go blank and they have no idea where to start.

But the truth is, we all have the opportunity to make data learning, data analysis tools, and their interaction with people a little bit better.

Part1: Basic knowledge required for beginners

Python’s popularity in the field of artificial intelligence and machine learning is largely due to its vast ecosystem of third-party libraries and its strong general-purpose programming capabilities. Therefore, the fastest way to master Python for data analysis is to learn its third-party libraries and toolkits.

For beginners, mastering Pandas, NumPy, Matplotlib, and Seaborn is enough to carry out simple data analysis.

Learning materials for machine learning with scikit-learn are in preparation; please stay tuned!

Week1

Pandas

Pandas is a library for fast, simple data manipulation, aggregation, and visualization.

  1. Kesci × Machine Heart: Key Python Code, Hands-on from Scratch
  2. Pandas basic commands quick-reference table
  3. How to use Pandas for data analysis
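As a first taste of what these tutorials cover, here is a minimal sketch of the kind of filtering and aggregation Pandas makes easy. The sales table and its column names are invented for illustration:

```python
import pandas as pd

# A small made-up sales table
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shanghai"],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "sales": [100, 120, 90, 150],
})

# Filter rows by a condition
beijing = df[df["city"] == "Beijing"]

# Aggregate total sales per city in one declarative statement
totals = df.groupby("city")["sales"].sum()
print(totals)
```

The `groupby` call replaces an explicit loop over cities, which is the core idiom of data manipulation in Pandas.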

Week2

Numpy

NumPy is the most fundamental library for scientific computing. For n-dimensional vectors and NumPy arrays, it provides a variety of functions that speed up operations.

  1. NumPy Quick Start Guide: Basics
  2. NumPy Quick Start Guide: Advanced
  3. These 100 exercises will take you through NumPy
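The speed-up comes from vectorization: operating on whole arrays at once instead of looping in Python. A small sketch with toy values:

```python
import numpy as np

# A 2 x 3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

# Vectorized operations: no explicit Python loop is needed
doubled = a * 2             # elementwise multiplication
col_means = a.mean(axis=0)  # per-column means -> [1.5, 2.5, 3.5]

print(doubled.sum())
```

The `axis` argument is the key concept: it selects which dimension of the n-dimensional array an aggregation runs over.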

Week3

Matplotlib

Note: Matplotlib is designed to generate powerful visualizations simply, but it is a low-level library, so producing a visualization takes more code than with higher-level libraries.

  1. Learn Python from scratch [1] — Matplotlib (bar chart)
  2. Learn Python from scratch [2] — Matplotlib (pie chart)
  3. Learn Python from scratch [3] — Matplotlib
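The “more code, more control” trade-off looks like this in practice. A minimal bar-chart sketch (the labels and values are invented, and the non-interactive Agg backend is used so no display is required):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

# Invented data: the four libraries in this study plan
labels = ["Pandas", "NumPy", "Matplotlib", "Seaborn"]
weeks = [1, 1, 1, 1]

fig, ax = plt.subplots()
bars = ax.bar(labels, weeks, color="steelblue")  # draw the bars
ax.set_ylabel("Weeks of study")                  # label the y axis
ax.set_title("Four-week learning plan")          # add a title
fig.savefig("plan.png")                          # write the figure to a file
```

Every element (labels, title, colors) is set explicitly here; higher-level libraries would infer most of this from the data.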

Week4

Seaborn

Note: Seaborn focuses on visualizing statistical models and provides effects such as heat maps that depict the overall distribution of data.

  1. Seaborn visualization: learning categorical plots
  2. Seaborn visualization: time series, regression & heat maps
  3. Seaborn visualization: learning distribution plots
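For instance, the heat-map effect mentioned above takes only a few lines. The three-column random DataFrame below is hypothetical, and the Agg backend avoids needing a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import numpy as np
import pandas as pd
import seaborn as sns

# A made-up dataset: 100 rows of random values in three columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Heat map of the pairwise correlations, annotated with the values
corr = df.corr()
ax = sns.heatmap(corr, annot=True, cmap="coolwarm")
ax.figure.savefig("heatmap.png")
```

One call produces the colored grid, annotations, and color bar that would take many lines of raw Matplotlib.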

How to learn

  1. For each learning item above, log in to Kesci and click the “Fork” button to copy the project into your personal K-Lab workspace, then click “Run” to experience K-Lab’s interactive programming interface, where you master knowledge while typing and running code.
  2. We have created the “DATA TRAIN | Data Analysis Study Plan” in the “Projects” column of the website and will keep updating it.

Part2: Choose a good data analysis programming environment

We all know that to do a good job, one must first sharpen one’s tools. K-Lab is such a tool: an online data analysis collaboration platform.

Zero data engineering problems

After logging in to K-Lab, every user gets a personal workspace with free 2-core, 8 GB cloud computing resources. Python 3, Python 2, and R are all preinstalled (use whichever language you choose), and 100+ common data analysis packages can be imported directly. This means that once you log in, you can learn a programming language and do data analysis entirely in the cloud.

Interactive programming

  1. Traditional integrated development environments (IDEs) are being replaced in data analytics; Jupyter, JupyterLab, and RStudio are outstanding examples of this trend.
  2. Computational narratives are becoming widespread: live code, narrative text, and visualizations are woven together, making it easy for data workers to tell stories with code and data.

K-Lab provides online data analysis services based on Jupyter Notebook, continuing this interactive programming design so that the entire process and the results of a data analysis are kept together in one place.

Part3: Test your skills on real business data projects

The ultimate goal of learning Python is to master data analysis skills and be able to solve problems related to data analysis in real work or daily life.

For students in higher education and for those hoping to move into data work, real production-level industry data is hard to obtain, and learning cases built on enterprises’ innovative data applications are scarce.

Therefore, we have also opened the “Financial Industry Data Algorithm Training Camp” on the official Kesci website, taking the earlier Qianhai Credit “Good Letter Cup” Big Data Algorithm Contest as a case study and breaking it down into learning tutorials that give advanced data analysis enthusiasts hands-on practice with real industry data.

Reading this far, you may be thinking: “But I know nothing about finance!” In fact, we have taken this into account; the advantages of this case are:

  • Rich data resources that are open for use. Qianhai Credit, a professional third-party commercial credit investigation agency under Ping An, provided 40,000 records from its credit loan business and 4,000 records from its cash loan business.
  • Low business complexity and a universal application scenario. The task is to build a credit scoring model for the “cash loan” business from the “credit loan” business data; it is easy to understand and requires little knowledge of financial business.
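To make the task concrete, here is a hypothetical sketch of such a credit scoring model on synthetic data. The feature names, the toy default rule, and the logistic-regression choice are all our own illustration, not the contest’s actual data or required method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for loan records: two invented features
# (income, debt ratio) and a binary default label.
rng = np.random.default_rng(42)
n = 2000
income = rng.normal(50, 15, n)
debt_ratio = rng.uniform(0, 1, n)

# Toy rule: default is more likely with low income and high debt
p_default = 1 / (1 + np.exp(0.05 * income - 4 * debt_ratio))
y = rng.binomial(1, p_default)
X = np.column_stack([income, debt_ratio])

# Train a scoring model and evaluate it on held-out data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.2f}")
```

A real solution would involve feature engineering on the 40,000 credit-loan records and validation against the contest’s own metric; the point here is only the overall shape of the pipeline.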

At the same time, we provide an online submission and evaluation system that scores and ranks submissions in real time against a preset evaluation metric, making it easy to assess your data analysis ability objectively and to keep optimizing and improving.

All things are difficult before they begin. I look forward to your persistence and transformation.