Kaggle is a popular data science competition platform, well suited to beginners in data analytics and machine learning. It hosts many datasets that closely resemble real-world business scenarios, which makes it a great place to practice. Once the Kaggle API is configured, you can write scripts to download data whenever you need it.

Installation

pip install kaggle

After installation, run:

kaggle competitions list

The command reports that kaggle.json cannot be found; ignore this. The point of this step is to make the tool generate its configuration folder, which on Windows is usually C:\Users\<username>\.kaggle

Configuration

Go to Kaggle, click your profile picture in the upper right corner, and select Account. Once inside, scroll down to the API section near the bottom and select Create New API Token.

Kaggle then automatically downloads a kaggle.json file. Save it to the .kaggle folder generated in the first step.
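On Linux or macOS, placing the token can be scripted. The following is a minimal sketch, assuming a POSIX shell and that the configuration folder is ~/.kaggle; install_kaggle_token is a hypothetical helper name, not part of the Kaggle CLI. It copies the downloaded file into the folder and restricts its permissions so that other users on the machine cannot read the credentials.

```shell
#!/bin/sh
# install_kaggle_token: hypothetical helper, not part of the Kaggle CLI.
# Copies a downloaded kaggle.json into the config folder and makes it
# readable and writable by the owner only.
install_kaggle_token() {
    src="$1"                          # path to the downloaded kaggle.json
    dest_dir="${2:-$HOME/.kaggle}"    # config folder; ~/.kaggle is assumed as the default

    if [ ! -f "$src" ]; then
        echo "token file not found: $src" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/kaggle.json" || return 1
    chmod 600 "$dest_dir/kaggle.json"
}
```

For example, if the browser saved the token to the Downloads folder, calling install_kaggle_token "$HOME/Downloads/kaggle.json" would put it in place.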

Downloading datasets

Run the following:

kaggle competitions list

This shows some recent competitions. The column to watch is the prize 😃

Besides list, kaggle competitions has several other subcommands that I won't go into here.

kaggle competitions {list, files, download, submit, submissions, leaderboard}

What most people care about is downloading datasets:

kaggle datasets {list, files, download, create, version, init, metadata, status}

The most commonly used are list (list the available datasets), files (list the data files in a dataset), and download (download the files).

kaggle datasets list

Usage:

usage: kaggle datasets list [-h] [--sort-by SORT_BY]
[--size SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME]
[--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]

Two arguments are used most often: -s, followed by a search keyword, and -p, which selects the page of results to display. Each page shows 20 rows.
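The two arguments can be combined in a small wrapper. This is only a sketch: search_datasets is a hypothetical function name, and it assumes the kaggle command is installed and configured as described earlier. It passes the first argument to -s and the second to -p, falling back to the first page when no page is given.

```shell
#!/bin/sh
# search_datasets: hypothetical wrapper around `kaggle datasets list`.
#   $1 = search keyword (passed to -s)
#   $2 = results page (passed to -p); defaults to page 1
search_datasets() {
    keyword="$1"
    page="${2:-1}"
    kaggle datasets list -s "$keyword" -p "$page"
}
```

For example, search_datasets "news" 2 would request the second page of datasets matching "news".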

kaggle datasets download

Usage:

usage: kaggle datasets download
[-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
[-o] [-q] [dataset]
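As an illustration of the options in the usage line above, the sketch below downloads a dataset into a chosen folder with -p and extracts it with --unzip. fetch_dataset is a hypothetical function name, the default folder name data is an assumption, and the kaggle command is assumed to be installed and configured.

```shell
#!/bin/sh
# fetch_dataset: hypothetical wrapper around `kaggle datasets download`.
#   $1 = dataset identifier in "owner/dataset-name" form
#   $2 = destination folder (passed to -p); "data" is an assumed default
fetch_dataset() {
    dataset="$1"
    dest="${2:-data}"

    mkdir -p "$dest" || return 1
    # -p sets the download folder, --unzip extracts the archive after download
    kaggle datasets download -p "$dest" --unzip "$dataset"
}
```

For example, fetch_dataset "noxmoon/chinese-official-daily-news-since-2016" news_data would download and extract that dataset into a news_data folder.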

More realistic usage

Running a single download command from the command line barely uses what the tool can do. We can also call the Kaggle API from shell scripts for more complex tasks, such as:

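#!/bin/sh
DATASET="noxmoon/chinese-official-daily-news-since-2016"
ARCHIVE_FILE="chinese-official-daily-news-since-2016.zip"
DATA_FILE="chinese_news.csv"
DATA_DIR="data"
COL_NAME="headline"
LINES=3000
OUTPUT_FILE="headlines.txt"

# Refuse to overwrite an existing data directory
if [ -d "${DATA_DIR}" ]; then
    echo "${DATA_DIR} exists, please remove it before running the script"
    exit 1
fi

echo "Creating dir"
mkdir -p "${DATA_DIR}"
cd "${DATA_DIR}" || exit 1

# Download the dataset archive and unpack it
kaggle datasets download -d "${DATASET}"
unzip "${ARCHIVE_FILE}"

echo "Deleting original dataset archive"
rm -f "${ARCHIVE_FILE}"

echo "Extracting, cutting, shuffling data"
# Find the column named COL_NAME in the header row, print that column for
# every data row, then keep LINES randomly chosen rows.
# Note: this simple comma split does not handle commas inside quoted fields.
awk -v col="${COL_NAME}" -F '"*,"*' '
    NR == 1 { for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); if (h == col) c = i }; next }
    c       { print $c }
' "${DATA_FILE}" | shuf -n "${LINES}" > "${OUTPUT_FILE}"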

Download and decompress in one go!

Reference

Github.com/Kaggle/kagg…