Environment preparation
Task duration: 5 to 10 minutes
Pre-deployment environment setup
Before we can start deploying, we need to do some preparatory work.
Update packages with yum
yum update -y
Install development and build tools
yum install gcc gcc-c++ -y
Install dependent libraries
yum install python-pip python-devel python-distribute libxml2 libxml2-devel python-lxml libxslt libxslt-devel openssl openssl-devel -y
Upgrade pip
pip install --upgrade pip
This step is optional, but recommended for deployment stability
This step may take 5 to 10 minutes
Deploy MariaDB
Task duration: 10 to 20 minutes
Because MySQL was removed from the default package list in CentOS 7, we use MariaDB instead.
Install MariaDB
yum install mariadb-server mariadb -y
Start the MariaDB service
systemctl start mariadb
Set the root password
The default root password is empty. You can set a root password with the following command, replacing Password (the value after the word password) with a password of your choice.
(This step is optional and can be skipped.)
mysqladmin -u root password "Password"
Check whether the installation succeeded
Now try connecting to the MariaDB server with the following command:
mysql -u root -p
Then enter the password you just set (Password in the example above). If all goes well, you should see a prompt starting with MariaDB [(none)]> or mysql>, indicating that the connection was successful.
Type SHOW DATABASES; and press Enter. You should see output similar to the following, indicating that everything is fine.
+----------+
| Database |
+----------+
| mysql    |
| test     |
+----------+
2 rows in set (0.13 sec)
After that, press Ctrl+C or type exit at the prompt to leave the client and go to the next step.
If you did not set a root password, connect by running mysql on its own, without the -u and -p options.
Deploy Redis
Task duration: 10 to 20 minutes
Download and extract the installation package
Download the installation package
wget http://download.redis.io/redis-stable.tar.gz
Extract the installation package
tar -xzvf redis-stable.tar.gz
Move the extracted directory to /usr/local
mv redis-stable /usr/local/redis
Compile and install
cd /usr/local/redis
make install
Configure Redis
Create the configuration directory and copy the default configuration file into it
mkdir -p /etc/redis
cp /usr/local/redis/redis.conf /etc/redis/redis.conf
In /etc/redis/redis.conf, change the daemonize setting to the following so that Redis runs in the background:
daemonize yes
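If you would rather script this change than edit the file by hand, the following is a minimal sketch in Python (already present on the host for pyspider). The helper name set_redis_option is hypothetical, not part of Redis or pyspider; it only rewrites the text and leaves reading and writing the file to you. It avoids Python 3 only syntax so that it also runs on the Python 2.7 that ships with CentOS 7.

```python
import re


def set_redis_option(conf_text, key, value):
    """Return conf_text with the directive `key value` set.

    Every existing, uncommented `key ...` line is replaced; if there is
    none, the directive is appended at the end. Lines starting with '#'
    are left untouched.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]+.*$", re.MULTILINE
    )
    new_line = "{0} {1}".format(key, value)
    if pattern.search(conf_text):
        # A function is used as the replacement so that backslashes in
        # the value are not interpreted as regex escapes.
        return pattern.sub(lambda match: new_line, conf_text)
    separator = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return conf_text + separator + new_line + "\n"
```

To apply it, read /etc/redis/redis.conf into a string, call set_redis_option(text, "daemonize", "yes"), and write the result back to the same path.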
Start the Redis service
/usr/local/bin/redis-server /etc/redis/redis.conf
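Because Redis now runs as a daemon, the command returns immediately and prints little, so it can help to confirm that the server is answering. The sketch below sends a PING over a plain TCP socket and checks for the PONG reply. The function name redis_ping is hypothetical, and the host 127.0.0.1 and port 6379 are assumptions based on the Redis defaults; adjust them if your redis.conf differs.

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Return True if a Redis server at host:port answers PING with PONG."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Nothing is listening, or the host is unreachable.
        return False
    try:
        # Redis accepts simple "inline" commands terminated by CRLF.
        sock.sendall(b"PING\r\n")
        reply = sock.recv(64)
    except (socket.error, socket.timeout):
        return False
    finally:
        sock.close()
    return reply.startswith(b"+PONG")
```

Calling redis_ping() on the server should return True once Redis is up. If a password is configured in redis.conf, the server replies with an error instead and the function returns False.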
Deploy pyspider
Task duration: 10 to 20 minutes
Install dependencies
pip install --upgrade chardet
easy_install mysql-connector==2.1.3
easy_install redis
Install pyspider
pip install pyspider
Configure pyspider
Start by creating a configuration directory
mkdir /etc/pyspider
Then create pyspider.conf.json in /etc/pyspider with the content shown below.
For details about the configuration options, see the official pyspider documentation.
Example code: /etc/pyspider/pyspider.conf.json
{
    "taskdb": "mysql+taskdb://root:Password@/taskdb",
    "projectdb": "mysql+projectdb://root:Password@/projectdb",
    "resultdb": "mysql+resultdb://root:Password@/resultdb",
    "message_queue": "redis:///db",
    "webui": {
        "username": "root",
        "password": "Password",
        "need-auth": true
    }
}
In the mysql connection strings, root is your MySQL user name and Password is the root password you set earlier.
The username and password in the webui configuration are the credentials required to access the web UI. Alternatively, you can set need-auth to false and omit the username and password.
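The same configuration can be generated instead of typed, which avoids quoting mistakes in the connection strings. This is a minimal sketch, not pyspider's own tooling: the function name build_pyspider_config is hypothetical, and the host and port values (127.0.0.1, 3306 for MariaDB, 6379 for Redis) are assumptions based on the default ports of those services, not values taken from this guide. The key names and the mysql+<type>:// and redis:// URL forms follow the example file.

```python
import json


def build_pyspider_config(db_password, db_user="root",
                          db_host="127.0.0.1", db_port=3306,
                          redis_host="127.0.0.1", redis_port=6379,
                          webui_user="root", webui_password=None):
    """Build the pyspider configuration as a dict.

    Web UI authentication is enabled only when webui_password is given.
    Host and port defaults are assumptions; change them to match your setup.
    """
    def mysql_url(kind):
        # One database per pyspider component: taskdb, projectdb, resultdb.
        return "mysql+{kind}://{user}:{pw}@{host}:{port}/{kind}".format(
            kind=kind, user=db_user, pw=db_password,
            host=db_host, port=db_port)

    config = {
        "taskdb": mysql_url("taskdb"),
        "projectdb": mysql_url("projectdb"),
        "resultdb": mysql_url("resultdb"),
        "message_queue": "redis://{0}:{1}/db".format(redis_host, redis_port),
        "webui": {"need-auth": webui_password is not None},
    }
    if webui_password is not None:
        config["webui"]["username"] = webui_user
        config["webui"]["password"] = webui_password
    return config
```

For example, json.dumps(build_pyspider_config("Password", webui_password="Password"), indent=4) produces text that can be saved as /etc/pyspider/pyspider.conf.json.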
Start the service
pyspider -c /etc/pyspider/pyspider.conf.json
If all is well, visit http://<your CVM IP address>:5000 and you should see the home page of the pyspider dashboard.
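pyspider can take a few seconds to bring up all of its components, so the page may not load immediately. As a rough check from a second terminal, the sketch below polls the port until something accepts a TCP connection. The function name wait_for_port and the timeout values are hypothetical choices for illustration; it only confirms that the port is open, not that the dashboard itself rendered correctly.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False if no
    connection could be made within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except (socket.error, socket.timeout):
            if time.time() >= deadline:
                return False
            time.sleep(interval)
        else:
            sock.close()
            return True
```

On the server itself, wait_for_port("127.0.0.1", 5000) returning True suggests the web UI is listening. If the page is still unreachable from your browser, the cause is more likely a firewall or security group rule than pyspider.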
Once the service starts normally, stop it and run it in the background with the following command:
nohup pyspider -c /etc/pyspider/pyspider.conf.json &
You can also run pyspider under Supervisor, which is the recommended approach. We will not go into details here; refer to the Supervisor documentation.
Deployment is complete
Task duration: 1 to 2 minutes
Access the service
At this point you can visit http://<your CVM IP address>:5000 and use your crawler to collect data. For writing and using pyspider crawler scripts, refer to online materials.
