Background
The MaxCompute platform supports Spark jobs. Spark jobs can be submitted to MaxCompute in a community-compatible way through the Cupid platform built on MaxCompute, and they share project resources with existing SQL/MR jobs in MaxCompute. For details, see the official documentation: help.aliyun.com/document_de… A MaxCompute Spark job is one type of job on the MaxCompute platform. Like any other job, each Spark job has a unique InstanceId, and you can manage the job with it; for example, with the InstanceId you can obtain the job's Logview or stop the job. For Spark jobs, however, the information available through the InstanceId alone is quite limited: some Spark-specific information cannot be displayed, which has been a pain point for users operating and maintaining Spark jobs. The rest of this article introduces a MaxCompute Spark job management tool, Cupid Console.
Introduction to Cupid Console
Cupid Console is a plug-in included in MaxCompute client 0.33.1 and later. Download the latest version of the MaxCompute client from github.com/aliyun/aliy… After configuring the client, run odpscmd. Cupid Console adds a set of spark commands for managing and controlling Spark jobs. Run help spark; to see how these commands are used:
Usage: spark list [-s <yarnState>(NEW,RUNNING,FINISHED,FAILED,KILLED)];
spark info [-i <instanceId>] [-a <appId>];
spark kill [-i <instanceId>] [-a <appId>];
spark view [-i <instanceId>] [-a <appId>];
spark search <appNameStr>;
The spark commands provided by Cupid Console let you manage Spark jobs by either InstanceId or ApplicationId, which is closer to the usage habits of Yarn users.
Using the Cupid Console commands
1. spark list
This command lists all Spark jobs under the current Project. The results include StartTime, InstanceId, State, RunningMode, and ApplicationName.
odps@ yeshan_test>spark list;
StartTime InstanceId State RunningMode ApplicationName
2020-02-09 20:52:14 20200209125214443gpwta5pr2 FAILED default com.aliyun.odps.spark.benchmark.Benchmark
2020-02-10 20:36:32 20200210123631787gu3325pr2 FINISHED default com.aliyun.odps.spark.examples.sparksql.SparkSQL
2020-02-10 20:38:38 20200210123838453gujojv21 FINISHED default SparkPi
2020-02-10 20:40:19 20200210124018718gt87hssa NEW default SparkPi
In addition, the -s parameter is supported to filter jobs by state. For example, the following command lists only FAILED jobs.
odps@ yeshan_test>spark list -s FAILED;
StartTime InstanceId State RunningMode ApplicationName
2020-02-09 20:52:14 20200209125214443gpwta5pr2 FAILED default com.aliyun.odps.spark.benchmark.Benchmark
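The same listing can also be driven from a script instead of the interactive shell, for example to poll for failed jobs. A minimal sketch, assuming odpscmd is on your PATH and already configured for the target project (the client's -e option runs a single command and exits):
#!/bin/bash
# Minimal sketch: check for FAILED Spark jobs from a script (e.g. a cron job).
# Assumes odpscmd is on PATH and configured for the target project;
# -e runs a single command non-interactively and exits.
failed=$(odpscmd -e "spark list -s FAILED;" | grep -c ' FAILED ')
echo "FAILED Spark jobs in this project: ${failed}"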
2. spark info
This command queries detailed information about a single job by InstanceId or AppId. For example, if we know that a Spark job's InstanceId is 20200210123631787gu3325pr2, the spark info command returns the following information.
odps@ yeshan_test>spark info -i 20200210123631787gu3325pr2;
project: "yeshan_test"
instanceId: "20200210123631787gu3325pr2"
applicationId: "application_1581338179928_1489502267"
applicationTags: ""
runningMode: "default"
applicationType: "SPARK"
yarnApplicationState: 5
finalApplicationStatus: 1
originalTrackingUrl: "http://master396f51a3-3ac1-44c1-937b-450ff524d0c3cupid-11-196-129-13:8088"
trackingUrl: "http://jobview.odps.aliyun.com/proxyview/jobview/?h=http://service.cn.maxcompute.aliyun-inc.com/api&p=yeshan_test&i=20200210123631787gu3325pr2&t=spark&id=application_1581338179928_1489502267&metaname=20200210123631787gu3325pr2&token=eU8xaWRLWFBYcExyMzB4WE9DcUFWcC95cnNFPSxPRFBTX09CTzpwNF8yNDcwNjM5MjQ1NDg0NDc5NzksMTU4MTU5NzQzMCx7IlN0YXRlbWVudCI6W3siQWN0aW9uIjpbIm9kcHM6UmVhZCJdLCJFZmZlY3QiOiJBbGxvdyIsIlJlc291cmNlIjpbImFjczpvZHBzOio6cHJvamVjdHMveWVzaGFuX3Rlc3QvaW5zdGFuY2VzLzIwMjAwMjEwMTIzNjMxNzg3Z3UzMzI1cHIyIl19XSwiVmVyc2lvbiI6IjEifQ=="
diagnostics: ""
applicationName: "com.aliyun.odps.spark.examples.sparksql.SparkSQL"
startedTime: 1581338192231
finishedTime: 1581338272045
In the example above, the spark info command returns the basic information about the job in detail, including the project name, ApplicationId, running mode, trackingUrl, application name, and the start and end times of the job.
3. spark kill
This command stops a running Spark job by InstanceId or AppId. Killing a Spark job by its InstanceId is equivalent to killing that InstanceId directly with the client's kill command.
odps@ yeshan_test>spark kill -i 20200210130226166gp1525pr2;
please check instance status. [status 20200210130226166gp1525pr2;]
odps@ yeshan_test>spark list -s KILLED;
StartTime InstanceId State RunningMode ApplicationName
2020-02-10 21:02:26 20200210130226166gp1525pr2 KILLED default SparkPi
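Because killing a Spark job by InstanceId is the same operation as killing the instance itself, the two kill commands below should be interchangeable. A minimal sketch reusing the InstanceId from the example above; the plain kill is the MaxCompute client's generic instance kill, and the status command (which the kill output itself suggests) verifies the result:
odps@ yeshan_test>spark kill -i 20200210130226166gp1525pr2;
odps@ yeshan_test>kill 20200210130226166gp1525pr2;
odps@ yeshan_test>status 20200210130226166gp1525pr2;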
4. spark view
This command retrieves the Logview and Jobview links of a job by InstanceId or AppId. It is useful when the links printed at submission time were not saved or have expired.
odps@ yeshan_test>spark view -i 20200210123631787gu3325pr2;
Some env might need to set following flags.
set odps.moye.trackurl.host=****
set odps.cupid.webproxy.endpoint=****
jobview: http://jobview.odps.aliyun.com/proxyview/jobview/?h=http://service.cn.maxcompute.aliyun-inc.com/api&p=yeshan_test&i=20200210123631787gu3325pr2&t=spark&id=application_1581338179928_1489502267&metaname=20200210123631787gu3325pr2&token=TkpWV0VxZ0tLS29XN2VXd0xMTGRNMVg1elZNPSxPRFBTX09CTzpwNF8yNDcwNjM5MjQ1NDg0NDc5NzksMTU4MTU5ODgxMCx7IlN0YXRlbWVudCI6W3siQWN0aW9uIjpbIm9kcHM6UmVhZCJdLCJFZmZlY3QiOiJBbGxvdyIsIlJlc291cmNlIjpbImFjczpvZHBzOio6cHJvamVjdHMveWVzaGFuX3Rlc3QvaW5zdGFuY2VzLzIwMjAwMjEwMTIzNjMxNzg3Z3UzMzI1cHIyIl19XSwiVmVyc2lvbiI6IjEifQ==
logview: http://logview.odps.aliyun.com/logview/?h=http://service.cn.maxcompute.aliyun.com/api&p=yeshan_test&i=20200210123631787gu3325pr2&token=cGREcHlQbkxTQnJDR2hrM1RHaVdCTDNRa3ZRPSxPRFBTX09CTzpwNF8yNDcwNjM5MjQ1NDg0NDc5NzksMTU4MTk0NDQxMCx7IlN0YXRlbWVudCI6W3siQWN0aW9uIjpbIm9kcHM6KiJdLCJFZmZlY3QiOiJBbGxvdyIsIlJlc291cmNlIjpbImFjczpvZHBzOio6KiJdfV0sIlZlcnNpb24iOiIxIn0=
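If only the Logview link is needed, spark view can also be run non-interactively and the link extracted from the output. A minimal sketch, assuming odpscmd is on your PATH and that the command prints a logview: line as in the sample output above:
# Minimal sketch: recover just the Logview link for a job.
# Assumes odpscmd is on PATH/configured and that spark view prints a
# "logview:" line as in the sample output above.
odpscmd -e "spark view -i 20200210123631787gu3325pr2;" | grep 'logview:'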
5. spark search
This command finds Spark jobs in a project by application name. For example, the following command finds the InstanceIds of all Spark jobs named SparkPi. After obtaining a job's InstanceId, you can run spark info to get more detailed information about it.
odps@ yeshan_test>spark search SparkPi;
instanceId: 20200210123838453gujojv21, appName: SparkPi
instanceId: 20200210124018718gt87hssa, appName: SparkPi
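Putting search and info together, a small script can look up every job with a given application name and print its details. A minimal sketch, assuming odpscmd is on your PATH and that spark search prints lines of the form "instanceId: <id>, appName: <name>" as in the sample output above:
#!/bin/bash
# Minimal sketch: find all Spark jobs named "SparkPi" and dump details for each.
# Assumes odpscmd is on PATH/configured; the awk field positions are based on
# the "instanceId: <id>, appName: <name>" output format shown above.
odpscmd -e "spark search SparkPi;" \
  | awk -F'[ ,]+' '/instanceId:/ {print $2}' \
  | while read -r id; do
      odpscmd -e "spark info -i ${id};"
    done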
Read more: https://yqh.aliyun.com/detail/6615?utm_content=g_1000106113