The semi-final problem
In the semi-final, teams are required to design and implement models and algorithms that predict the types of routine electrocardiograms (ECGs), using the training set provided by the organizing committee, within the specified time. The teams' predictions on the test set will be used to calculate scores. The routine ECGs used in the competition mainly include normal ECGs and eight types of abnormal ECGs: atrial fibrillation, first-degree atrioventricular block, complete right bundle branch block, left anterior fascicular block, premature ventricular contraction, premature atrial contraction, early repolarization pattern change, and T-wave change.
The semi-final data
The semi-final data are divided into a training set, a validation set, and a test set, of which the training set and validation set are visible to the participating teams. The training set is mainly used to build models and algorithms. The validation set has hidden labels; each team submits its predictions on the validation set to the intranet ranking website, which both confirms that the file format of the predictions is correct and gives an estimate of each team's standing relative to the other teams. The test set is used to calculate each team's performance and ranking in the semi-final; it will not be made public during or after the competition and is used only to evaluate the algorithms.
To make the data easy to read, all electrocardiograms are stored in MAT format. Each file contains the voltage signals of 12 leads (I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, and V6), along with variables such as sex and age. The ECG data are measured in millivolts at a sampling rate of 500 Hz. The labels corresponding to the training data are stored in the reference.csv file, and the category numbers and abbreviations are listed in the following table.
Label | Category | Abbreviation
---|---|---
0 | Normal | Normal
1 | Atrial fibrillation | AF
2 | First-degree atrioventricular block | FDAVB
3 | Complete right bundle branch block | CRBBB
4 | Left anterior fascicular block | LAFB
5 | Premature ventricular contraction | PVC
6 | Premature atrial contraction | PAC
7 | Early repolarization pattern change | ER
8 | T-wave change | TWC
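As a first step, a team might load one recording and the training labels to inspect the data. The snippet below is a minimal sketch: the file name and the top-level MAT variable name (here `ECG`) are assumptions for illustration, since the exact key names are defined by the data files themselves; printing the keys of the loaded dictionary reveals the actual names.

```python
import scipy.io as sio
import pandas as pd

# Load one training recording (the file name here is hypothetical).
record = sio.loadmat("/media/jdcloud/Train/TRAIN0001.mat")
print(record.keys())  # inspect the actual variable names stored in the file

# Assumed key for the 12-lead signal: one row per lead (I .. V6),
# sampled at 500 Hz, in millivolts.
signal = record["ECG"]

# Training labels live in reference.csv (path assumed to sit next to the data).
labels = pd.read_csv("/media/jdcloud/Train/reference.csv")
print(labels.head())
```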
The semi-final format
The following table outlines the schedule and rules of the semi-final.
Period | Arrangement | Notes
---|---|---
May 5 to 18:00 on May 8 | Teams entering the semi-final receive the cloud desktop account and password provided by Jingdong Cloud, familiarize themselves with the cloud desktop, and install personal software on it | The cloud desktop is connected to the Internet, and no semi-final data is present in the system. A cloud desktop account can be logged in from only one PC at a time
May 9 to May 12 | The organizing committee attaches data disks to each cloud desktop system | Teams cannot log in to the cloud desktop during this period
10:00 AM May 13 to 10:00 AM June 10 | The official semi-final period, during which teams can submit their results on the validation set to ranking.jdworkspace.com to view the leaderboard | The cloud desktop is disconnected from the Internet, and files cannot be downloaded from the cloud desktop to a local PC
June 10 to June 30 | The organizing committee tests each team's algorithm on the test set | Teams cannot log in to the cloud desktop
Each team will receive a Jingdong Cloud account and password from the organizing committee on May 5, and the "Contest Cloud Desktop User Manual" can be downloaded from the website. After receiving the account and password, teams can log in and familiarize themselves with the cloud desktop system before 18:00 on May 8. Based on the software requirements submitted by each team during registration, the organizing committee has pre-installed the most commonly requested software, forming a universal cloud desktop image. Teams must install any software not already present before 18:00 on May 8. The organizing committee will mount a data cloud disk to each cloud desktop after 18:00 on May 8.
Download the Contest Cloud Desktop User Manual here
The official semi-final period runs from 10:00 AM on May 13 to 10:00 AM on June 10. Please note that the cloud desktop cannot access the Internet during this period. Each team accesses the semi-final training set at /media/jdcloud/Train and the validation set at /media/jdcloud/Val. To help teams gauge the relative standing of their algorithms, the organizing committee has set up an intranet website at "ranking.jdworkspace.com" where teams upload their predictions on the validation set and view the rankings (submitting at least once is recommended to confirm the file format is correct; note that this result is not included in the final score).
Please note:
- From May 5 to June 10, teams can upload files from their local PCs to the cloud desktop.
- All teams should back up their code on the code cloud disk or the data cloud disk to guard against accidental loss.
- To ensure fair competition, the organizing committee will inspect each team's code during testing. To protect each team's intellectual property, the organizing committee will not disclose the code, and the code will not be given to anyone except where legally required and for the committee's own testing and inspection.
- To keep the ranking site stable, please do not submit predictions on the validation set too frequently.
After the semi-final ends on June 10, the organizing committee will use the test set to evaluate each team's algorithm. The semi-final results will be announced after June 30. The organizing committee will recommend teams to participate in the MICCAI 2019 workshop "Machine Learning and Medical Engineering for Cardiovascular Healthcare" (MLMECh-MICCAI), to be held in Shenzhen in October this year; please watch the contest website or the DLAB official account for details.
Scoring
This competition adopts a scoring method based on multi-label classification [1], which measures the prediction accuracy of the algorithm on each ECG. The main reason for using a multi-label score is that more than one abnormality may be present in a single ECG. Each team's semi-final result is the arithmetic average of the per-class F1 scores across all categories. This approach is detailed below.
First, the following four variables are defined for the $j$-th category, where $0 \le j \le 8$: $TP_j$, the number of recordings in which class $j$ appears in both the reference label and the prediction; $FP_j$, the number in which class $j$ is predicted but absent from the reference; $FN_j$, the number in which class $j$ appears in the reference but is not predicted; and $TN_j$, the number in which class $j$ appears in neither.

The precision, recall, and F1 score of each class are then calculated as:

$$\mathrm{Precision}_j = \frac{TP_j}{TP_j + FP_j}, \qquad \mathrm{Recall}_j = \frac{TP_j}{TP_j + FN_j}, \qquad F1_j = \frac{2 \cdot \mathrm{Precision}_j \cdot \mathrm{Recall}_j}{\mathrm{Precision}_j + \mathrm{Recall}_j}$$

The average F1 score is the arithmetic mean of these nine per-class scores:

$$F1 = \frac{1}{9} \sum_{j=0}^{8} F1_j$$
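To make the rule concrete, here is a minimal sketch of this computation in Python, assuming references and predictions are represented as one set of class indices per recording (this representation is an assumption for illustration; the official score is computed by the organizing committee's own code):

```python
NUM_CLASSES = 9  # categories 0-8 from the table above


def macro_f1(references, predictions):
    """Arithmetic mean of the per-class F1 scores.

    references, predictions: lists of label sets, one set per recording.
    """
    f1_scores = []
    for j in range(NUM_CLASSES):
        tp = sum(1 for r, p in zip(references, predictions) if j in r and j in p)
        fp = sum(1 for r, p in zip(references, predictions) if j not in r and j in p)
        fn = sum(1 for r, p in zip(references, predictions) if j in r and j not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / NUM_CLASSES


# Example: two recordings, the second carrying two abnormalities.
refs = [{0}, {1, 5}]
preds = [{0}, {1}]
print(macro_f1(refs, preds))  # 2/9: only classes 0 and 1 reach F1 = 1
```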
Algorithm testing
The organizing committee will evaluate each team's algorithm and compute its score on the test set. To make it possible to test the algorithms of 100 teams in a short time, please follow the sample code and the process below (Python as an example):
- Develop the algorithm on the training set (using PyCharm, for example).
- Generate predictions on the validation set in a format consistent with the sample "answers.csv".
- Log in to ranking.jdworkspace.com and submit the predictions to see the current ranking (submitting at least once is recommended to confirm the format is correct).
- Update the "run.sh" file so that the predictions can be reproduced by running this bash file directly (sketches of a prediction script and of run.sh appear below).
Download the sample code here
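As a companion to the list above, here is a minimal sketch of a prediction script that walks a data directory and writes an answers.csv. The MAT variable name, the CSV layout, and the `predict_labels` stub are assumptions for illustration only; the authoritative format is defined by the sample code and its answers.csv.

```python
import csv
import glob
import os

import scipy.io as sio


def predict_labels(signal):
    """Placeholder for a trained model; returns a list of class indices 0-8."""
    return [0]  # assumption: always predict "Normal"


def main(test_path="/media/jdcloud/Val", output="answers.csv"):
    with open(output, "w", newline="") as f:
        writer = csv.writer(f)
        for mat_file in sorted(glob.glob(os.path.join(test_path, "*.mat"))):
            record = sio.loadmat(mat_file)
            signal = record["ECG"]  # assumed variable name, as above
            name = os.path.splitext(os.path.basename(mat_file))[0]
            writer.writerow([name] + predict_labels(signal))


if __name__ == "__main__":
    main()
```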
Please note that the final score depends entirely on the predictions on the test set. The organizing committee decided not to use the validation set to evaluate the teams, mainly because its data are visible during the semi-final, so some teams would inevitably label the data to help train their models, among other potential forms of unfair competition. The main function of the validation set is to form a leaderboard so that teams know their relative standing; at the same time, it verifies the file format of the predictions, which facilitates the subsequent algorithm testing.
To ensure that the organizing committee can test the algorithms of the 100 semi-final teams in a short time, please refer to the run.sh file in the sample code. No matter what language is used to implement the algorithm, make sure it can be run directly through the run.sh file. The organizing committee runs the algorithm on the test set by modifying "--test_path"; the test set has the same layout as the validation set. Also refer to the answers.csv file in the sample code: the committee uses each team's answers.csv both to build the leaderboard based on the validation set and to compute the final score based on the test set.
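For instance, a run.sh could be as simple as the sketch below, where predict.py is a hypothetical entry point that accepts the --test_path flag mentioned above:

```bash
#!/bin/bash
# Sketch of run.sh: the organizing committee re-points --test_path at the
# test set when scoring; by default it targets the validation set here.
python predict.py --test_path /media/jdcloud/Val --output answers.csv
```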
Reference
[1] Zhang, Min-Ling, and Zhi-Hua Zhou. "A review on multi-label learning algorithms." IEEE Transactions on Knowledge and Data Engineering 26.8 (2014): 1819-1837.