Kaggle is a popular data science competition platform that suits beginners in data analytics and machine learning. It hosts many datasets that closely resemble real-world business scenarios, which makes it a great place to practice. Once the Kaggle API is configured, you can download data from the command line and script those downloads so they are easy to repeat later.
Installation
pip install kaggle
After installation, run:
kaggle competitions list
The command will complain that kaggle.json cannot be found; ignore that for now. The point of running it is to have the configuration folder created, usually C:\Users\<user name>\.kaggle on Windows (~/.kaggle on Linux and macOS).
Configuration
Go to Kaggle, click your profile picture in the upper-right corner and select Account. On that page, scroll down to the API section and click Create New API Token.
This automatically downloads a kaggle.json file; move it into the .kaggle folder created in the first step.
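On Linux or macOS the same step can be done from a terminal. The sketch below assumes the token landed in ~/Downloads/kaggle.json (an assumption, adjust as needed), and the function name install_kaggle_token is hypothetical. The CLI reads kaggle.json from ~/.kaggle unless the KAGGLE_CONFIG_DIR environment variable points elsewhere, and it warns when the key file is readable by other users, hence the chmod 600.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded kaggle.json into the CLI's config
# directory and restrict it so only the owner can read the API key.
install_kaggle_token() {
    src="${1:-$HOME/Downloads/kaggle.json}"   # assumed download location
    dest_dir="${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}"
    if [ ! -f "$src" ]; then
        echo "token not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/kaggle.json"
    chmod 600 "$dest_dir/kaggle.json"   # owner read/write only
    echo "$dest_dir/kaggle.json"        # report where the token was installed
}
```

Usage: `install_kaggle_token ~/Downloads/kaggle.json`. Running `kaggle competitions list` again afterwards should print competitions instead of the missing-file error.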
Downloading datasets
Run the following again:
kaggle competitions list
This time you should see a list of recent competitions; keep an eye on the prize money in the reward column 😃
Besides list, kaggle competitions has several other subcommands, which I won't go into here:
kaggle competitions {list, files, download, submit, submissions, leaderboard}
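As a brief illustration of two of these subcommands, here is a sketch that searches competitions by keyword and fetches a competition's files. The function names are hypothetical, titanic is used only as an example competition name, and Kaggle requires you to accept a competition's rules on its web page before the API will let you download its data.

```shell
#!/bin/sh
# Hypothetical helpers around the kaggle competitions subcommands.

# Search competitions by keyword.
find_competitions() {
    kaggle competitions list -s "$1"
}

# List a competition's files, then download them into a directory.
fetch_competition() {
    competition="$1"
    dest="${2:-.}"   # default: current directory
    kaggle competitions files -c "$competition"
    kaggle competitions download -c "$competition" -p "$dest"
}
```

Usage: `fetch_competition titanic data/titanic`. Competition downloads usually arrive as a zip archive that you unzip yourself.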
What most people care about, though, is downloading datasets:
kaggle datasets {list,files,download,create,version,init,metadata,status}
The most commonly used ones are list (lists available datasets), files (lists the files in a dataset) and download (downloads a dataset or a single file from it).
kaggle datasets list
Its usage is:
usage: kaggle datasets list [-h] [--sort-by SORT_BY]
[--size SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME]
[--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]
Two arguments are especially useful: -s SEARCH filters the results by a keyword, and -p PAGE selects which page of results to show (each page holds 20 results by default).
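Putting the two arguments together, a small wrapper might look like this. It is a sketch, and the function name search_datasets is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: search public datasets by keyword.
# -s filters by search term; -p picks the results page (defaults to page 1 here).
search_datasets() {
    keyword="$1"
    page="${2:-1}"
    kaggle datasets list -s "$keyword" -p "$page"
}
```

Usage: `search_datasets "chinese news" 2` shows the second page of matches for that keyword.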
kaggle datasets download
Its usage is:
usage: kaggle datasets download
[-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
[-o] [-q] [dataset]
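Here dataset is the owner/dataset-name slug from the dataset's URL (the script below passes it with -d, which the CLI also accepts). -f downloads a single file instead of the whole dataset, -p sets the destination folder, -w downloads into the current working directory, --unzip extracts the archive and deletes the zip afterwards, -o forces existing files to be overwritten, and -q suppresses progress output. The sketch below (the function name get_dataset is hypothetical) checks the slug format, lists the dataset's files, then downloads and unzips it into a folder.

```shell
#!/bin/sh
# Hypothetical helper: show a dataset's files, then download and unzip it.
# $1: dataset slug (owner/dataset-name); $2: target directory (default: data)
get_dataset() {
    dataset="$1"
    dest="${2:-data}"
    case "$dataset" in
        */*) ;;  # looks like owner/dataset-name
        *) echo "expected owner/dataset-name, got: $dataset" >&2; return 2 ;;
    esac
    mkdir -p "$dest"
    kaggle datasets files "$dataset"
    kaggle datasets download "$dataset" -p "$dest" --unzip
}
```

Usage: `get_dataset noxmoon/chinese-official-daily-news-since-2016 data`.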
More practical usage
Running a one-off download command at the command line is only the beginning; with the Kaggle API you can also write shell scripts for more involved workflows, for example:
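#!/bin/sh
# Download a news dataset, unzip it and save a random sample of headlines.
set -e

DATASET="noxmoon/chinese-official-daily-news-since-2016"
ARCHIVE_FILE="chinese-official-daily-news-since-2016.zip"
DATA_FILE="chinese_news.csv"
DATA_DIR="data"
COL_NAME="headline"
LINES=3000
OUTPUT_FILE="headlines.txt"

# Refuse to run if the output directory already exists
if [ -d "${DATA_DIR}" ]; then
    echo "${DATA_DIR} exists, please remove it before running the script"
    exit 1
fi

echo "Creating dir"
mkdir -p "${DATA_DIR}"
cd "${DATA_DIR}"

kaggle datasets download -d "${DATASET}"
unzip "${ARCHIVE_FILE}"

echo "Deleting original dataset archive"
rm -f "${ARCHIVE_FILE}"

echo "Extracting, cutting, shuffling data"
# Find the column named COL_NAME in the header row, print it from every
# data row, then keep a random sample of LINES rows.
awk -v col="${COL_NAME}" -F '"*,"*' '
    NR == 1 { for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); if (h == col) idx = i }; next }
    idx { v = $idx; gsub(/^"|"$/, "", v); print v }
' "${DATA_FILE}" | shuf -n "${LINES}" > "${OUTPUT_FILE}"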
Download, unzip and extract the headlines in one go!
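The awk step in that script is meant to pull a single named column out of the CSV. To try the idea without downloading anything, here is a self-contained sketch (the function name column_by_name is hypothetical) that locates the column from the header row and prints it for every data row. It splits on commas with optional surrounding quotes, so it assumes no field contains a comma of its own; for arbitrary CSV data a real CSV parser is safer.

```shell
#!/bin/sh
# Hypothetical helper: print the column whose header equals $1 from CSV file $2.
# Splitting on '"*,"*' breaks on quoted fields that themselves contain commas.
column_by_name() {
    awk -v col="$1" -F '"*,"*' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/"/, "", h)              # drop quotes around header names
                if (h == col) idx = i
            }
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        {
            v = $idx
            gsub(/^"|"$/, "", v)              # quotes left on first/last fields
            print v
        }
    ' "$2"
}
```

Usage: `column_by_name headline chinese_news.csv | shuf -n 3000 > headlines.txt`.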