Original link: tecdat.cn/?p=19211

Original source: Tuoduan Data Tribe WeChat public account

 

As COVID-19, the disease caused by the novel coronavirus, spreads around the world, concern keeps growing. In this article, we use MATLAB to analyze COVID-19 datasets.

COVID-19 data source

After unzipping the download, we find the following files:

  • COVID_19_data.csv – Daily case counts worldwide by province/state, 2020
  • Confirmed.csv – Time series of confirmed cases
  • Deaths.csv – Time series of deaths
  • Recovered.csv – Time series of recovered cases

Map visualization

We visualize the number of confirmed cases on a map. We start by importing the confirmed-case time series, which includes the latitude and longitude of each province/state.

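% Import the confirmed-case time series, keeping names such as "Country/Region" as-is
opts = detectImportOptions(filenames(4), "TextType","string", "VariableNamingRule","preserve");
times_conf = readtable(filenames(4), opts);
vars = times_conf.Properties.VariableNames;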

The dataset contains a “Province/State” variable, but we want to aggregate the data at the “Country/Region” level. Before doing so, we need to clean up the data a little by standardizing a few country names.

% Standardize country names
times_conf.("Country/Region")(times_conf.("Country/Region") == "China") = "Mainland China";
times_conf.("Country/Region")(times_conf.("Country/Region") == "Czechia") = "Czech Republic";

We can now aggregate the data by country/region with groupsummary, summing the confirmed cases and averaging the latitude and longitude.

% Sum and average every numeric column (Lat, Long and the daily counts) per country
times_conf_country = groupsummary(times_conf,"Country/Region",{'sum','mean'},vars(3:end));
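As a rough illustration of what this groupsummary call computes, the sketch below reproduces a per-country sum and mean with base functions (unique and accumarray) instead of groupsummary. The rows are made up for illustration and are not taken from the dataset; the variable names are hypothetical.

```matlab
% Made-up example rows: one row per province/state (illustration only)
countryNames = {'Australia'; 'Australia'; 'Canada'; 'Canada'; 'Canada'};
lat   = [-33.9; -37.8; 49.3; 43.7; 45.5];
cases = [3; 4; 1; 2; 0];

% Assign each row the index of its country group (groups come back sorted)
[groups, ~, idx] = unique(countryNames);
idx = idx(:);                                  % ensure a column of subscripts

% Aggregate within each group
totalCases = accumarray(idx, cases);           % plays the role of "sum"
meanLat    = accumarray(idx, lat, [], @mean);  % plays the role of "mean"
```

With groupsummary the grouping, sorting and naming of the output columns (sum_..., mean_...) are handled for you; the sketch only makes the arithmetic visible.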

The output contains columns we don't need, such as the sums of latitude and longitude and the means of the daily counts. Let’s remove them.

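newNames = regexprep(times_conf_country.Properties.VariableNames,"^(sum_)(?=L(a|o))","remove_"); % sums of Lat/Long
newNames = regexprep(newNames,"^(mean_)(?=[0-9])","remove_");                                    % means of daily counts
times_conf_country.Properties.VariableNames = newNames;
times_conf_country(:, startsWith(newNames,"remove_")) = [];                                    % drop tagged columns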
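The two regular expressions rely on lookahead: (?=...) checks what follows the prefix without consuming it, so only the prefix itself is replaced. The sketch below applies the same patterns to a hypothetical list of column names to show which ones get tagged and which survive.

```matlab
% Hypothetical column names of the kind groupsummary produces with {'sum','mean'}
names = {'Country/Region','GroupCount','sum_Lat','mean_Lat', ...
         'sum_Long','mean_Long','sum_1/22/20','mean_1/22/20'};

% Tag sums of coordinates: "sum_" followed by "La" or "Lo"
tagged = regexprep(names, '^(sum_)(?=L(a|o))', 'remove_');
% Tag means of the daily counts: "mean_" followed by a digit
tagged = regexprep(tagged, '^(mean_)(?=[0-9])', 'remove_');

% Keep only the columns that were not tagged
kept = tagged(~strncmp(tagged, 'remove_', 7));
disp(kept)
```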

 

% Exclude mainland China from the map
times_conf_exChina = times_conf_country(times_conf_country.("Country/Region") ~= "Mainland China",:);

Let’s visualize the data for the first and last dates in the dataset with geobubble.


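% Columns holding the per-country sums of the daily counts (sum_<date>)
dateCols = find(~cellfun(@isempty, regexp(times_conf_exChina.Properties.VariableNames,"^sum_\d","once")));
for ii = dateCols([1 end])    % first and last date
    times_conf_exChina.Category = categorical(repmat("<100",height(times_conf_exChina),1));
    times_conf_exChina.Category(table2array(times_conf_exChina(:,ii)) >= 100) = ">=100";
    figure
    gb = geobubble(times_conf_exChina,"mean_Lat","mean_Long", ...
        "SizeVariable",times_conf_exChina.Properties.VariableNames{ii},"ColorVariable","Category");
    gb.LegendVisible = "off";
end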

We can see that at first only countries near mainland China were affected. Note that the United States already had confirmed cases as early as January 22, 2020.

Confirmed cases in the US

Let’s drill down into the United States at the state level.

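figure
t = tiledlayout("flow");
for ii = [5, length(vars)]    % first and last date columns of the province-level table
    % Declare both categories up front so the two-row BubbleColorList always applies
    times_conf_us.Category = categorical(repmat("<100",height(times_conf_us),1), ["<100",">=100"]);
    times_conf_us.Category(times_conf_us.(vars{ii}) >= 100) = ">=100";
    nexttile
    gb = geobubble(times_conf_us,"Lat","Long","SizeVariable",vars{ii},"ColorVariable","Category");
    gb.BubbleColorList = [1,0,1;1,0,0];
    gb.LegendVisible = "off";
    gb.Title = "As of " + vars{ii};
    gb.SizeLimits = [0, max(times_conf_us.(vars{length(vars)}))];
    gb.MapCenter = [44.9669 -113.6201];   % western US (negative longitude)
    gb.ZoomLevel = 1.7678;
end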

 

The maps show that the outbreak started in Washington State and that large outbreaks later developed in California and New York.

Rank countries/territories by confirmed cases

Let’s compare the number of confirmed cases by country/region using COVID_19_data.csv. The date format is inconsistent across rows, so we first import the dates as text.

opts = detectImportOptions(filenames(3), "TextType","string","DatetimeType","text");
provData = readtable(filenames(3), opts);

Clean up the date format: expand two-digit years to four digits, then convert the text to datetime values.

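% Expand a trailing two-digit year ("/20") to "/2020", then convert to datetime
provData.ObservationDate = regexprep(provData.ObservationDate,"\/20$","/2020");
provData.ObservationDate = datetime(provData.ObservationDate);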
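The year fix works because the pattern "\/20$" only matches "/20" at the very end of a string. The sketch below applies the same pattern to a few made-up date strings. It assumes every string ends with a year; a string such as '3/20' without a year would also be rewritten.

```matlab
% Made-up date strings in mixed formats (illustration only)
raw = {'1/22/20'; '01/23/2020'; '2/1/20'};

% Same pattern as above: "\/" matches a literal slash, "$" anchors at the end
fixed = regexprep(raw, '\/20$', '/2020');
disp(fixed)
```

Strings that already carry a four-digit year are left alone, because "2020" is not preceded by a slash in its last three characters.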

We also need to standardize the values in Country_Region.

provData.Country_Region(provData.Country_Region == "Iran (Islamic Republic of)") = "Iran";
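When more than one name needs fixing, a small lookup table keeps the mapping in one place. This is a minimal sketch using ismember on cell arrays; the alias pairs are the two renamed in this article (Iran and Czechia), and the input list is made up.

```matlab
% Made-up list of raw country names
names = {'Mainland China'; 'Iran (Islamic Republic of)'; 'Italy'; 'Czechia'};

% Alias table: raw name -> standardized name
aliasFrom = {'Iran (Islamic Republic of)', 'Czechia'};
aliasTo   = {'Iran', 'Czech Republic'};

% loc holds the position of each name in aliasFrom (0 if it is not an alias)
[isAlias, loc] = ismember(names, aliasFrom);
names(isAlias) = aliasTo(loc(isAlias));
disp(names)
```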

This dataset also contains a province/state variable, so again we aggregate the data at the country/region level.

% Sum confirmed cases, deaths and recoveries over provinces for each date and country
countryData = groupsummary(provData,{'ObservationDate','Country_Region'}, ...
    "sum",{'Confirmed','Deaths','Recovered'});

countryData contains cumulative daily totals, so for the ranking we only need the numbers for the most recent date.
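Because the counts are cumulative, the row for the most recent date holds each country's running total, and sorting those totals gives the ranking. The sketch below shows that step on a made-up matrix (rows are dates, columns are hypothetical countries A, B and C) rather than on countryData itself.

```matlab
% Made-up cumulative counts: rows are dates (oldest first), columns are countries
countries = {'A', 'B', 'C'};   % hypothetical labels
cumConfirmed = [ 1  0  5;
                 4  2  9;
                10  3 12];

% The last row holds the most recent cumulative totals
latest = cumConfirmed(end, :);

% Rank countries by their latest totals, largest first
[sortedCounts, order] = sort(latest, 'descend');
ranking = countries(order);
disp(ranking)
```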

 

Increase in confirmed cases by country/territory

We can also compare how quickly cases are increasing in these countries.

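figure
plot(countryData.ObservationDate(countryData.Country_Region == labelsK(2)), ...
    countryData.sum_Confirmed(countryData.Country_Region == labelsK(2)))
hold on
for ii = 3:length(labelsK)
    plot(countryData.ObservationDate(countryData.Country_Region == labelsK(ii)), ...
        countryData.sum_Confirmed(countryData.Country_Region == labelsK(ii)))
end
hold off
legend(labelsK(2:end))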

 

South Korea is showing signs of slowing growth, but elsewhere the increase is accelerating.

Increase in new cases by country/region

We can compute the daily number of new cases by subtracting the cumulative count of confirmed cases on the previous date from the count on the current date.
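In array terms this is a first difference of the cumulative series. A minimal sketch with made-up numbers: diff gives the change between consecutive dates, and the first date, which has no earlier count, is taken as-is (an assumption; it could also be dropped).

```matlab
cumConfirmed = [10 15 15 22 30];                  % made-up cumulative counts

% New cases: first value as-is, then differences between consecutive dates
newCases = [cumConfirmed(1), diff(cumConfirmed)];
% newCases is [10 5 0 7 8]

% A cumulative sum undoes the differencing, which is a handy sanity check
rebuilt = cumsum(newCases);
```

A downward revision of a cumulative total would show up as a negative daily value.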


for ii = 1:length(labelsK)
    country = provData(provData.Country_Region == labelsK(ii),:);
    country = groupsummary(country,{'ObservationDate','Country_Region'}, ...

    if labelsK(ii) ~= "Others"
        nexttile
Copy the code

China and South Korea are no longer seeing many new cases, which suggests that the epidemic has been brought under control there.

China

As the rate of infection in China is slowing, let’s look at how many cases are still active. Active cases are confirmed cases minus recovered cases and deaths.
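A quick worked example with made-up counts for three consecutive dates shows the element-wise arithmetic:

```matlab
% Made-up cumulative counts for three consecutive dates
confirmed = [100 150 180];
deaths    = [  2   5   8];
recovered = [ 10  40  90];

% Active cases, one value per date
active = confirmed - deaths - recovered;
% active is [88 105 82]
```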

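for ii = 1:length(labelsK)
    % Active cases = confirmed - deaths - recovered
    by_country{ii}.Active = by_country{ii}.Confirmed - by_country{ii}.Deaths - by_country{ii}.Recovered;
end

figure
area(by_country{1}.ObservationDate, by_country{1}.Active)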

 

Fitting curve

The number of active cases is falling, and the curve looks roughly Gaussian. Can we fit a Gaussian model and predict when active cases will reach zero?

We use the Curve Fitting Toolbox to fit a Gaussian.


ft = fittype("gauss1");

[fobj, gof] = fit(x,y,ft,opts);
gof
gof = 

  struct with fields:

           sse: 4.4145e+08
       rsquare: 0.9743
           dfe: 47
    adjrsquare: 0.9732
          rmse: 3.0647e+03
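The gauss1 library model is y = a1*exp(-((x-b1)/c1)^2). Without the toolbox, a rough substitute is to fit a parabola to log(y), because log(y) = log(a1) - (x-b1)^2/c1^2 is quadratic in x. The sketch below uses that log-parabola method (a different technique from the toolbox's nonlinear least squares) on noise-free synthetic data, recovers a1, b1 and c1, and then answers the question above in a practical form: a Gaussian only approaches zero, so it reports the day on which the curve drops below one active case (a threshold chosen here for illustration).

```matlab
% Synthetic, noise-free data from a known Gaussian (illustration only)
a = 5000; b = 20; c = 8;
x = (1:40)';
y = a * exp(-((x - b) / c).^2);

% log(y) = -(1/c^2)*x^2 + (2b/c^2)*x + (log(a) - b^2/c^2): fit a quadratic
p = polyfit(x, log(y), 2);

% Recover the gauss1 parameters from the quadratic coefficients
cHat = sqrt(-1 / p(1));
bHat = p(2) * cHat^2 / 2;
aHat = exp(p(3) + bHat^2 / cHat^2);

% Solve aHat*exp(-((x-bHat)/cHat)^2) = 1 for the right-hand side of the peak
xZero = bHat + cHat * sqrt(log(aHat));
```

Fitting in log space weights small counts more heavily than fitting y directly and needs every y > 0, so on real, noisy data the parameters will generally differ from those returned by fit with "gauss1".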

Let’s generate a forecast by extending the dates 20 days beyond the last observation.
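A minimal sketch of the forecast step, assuming the fit used day indices 1, 2, ... as x: the model is evaluated on the observed days plus 20 more. The parameters are hypothetical stand-ins for the fitted fobj; nObs = 50 is inferred from dfe = 47 for a three-parameter model (50 - 3 = 47).

```matlab
% Hypothetical gauss1 parameters standing in for the fitted fobj (illustration only)
a1 = 5000; b1 = 20; c1 = 8;
gauss1 = @(x) a1 * exp(-((x - b1) / c1).^2);   % same form as the gauss1 library model

nObs = 50;                  % consistent with dfe = 47 for a 3-parameter fit
xFuture = (1:nObs + 20)';   % observed day indices plus a 20-day horizon
yhat = gauss1(xFuture);     % model values over the extended range
```

With the toolbox, evaluating the fitted object directly, yhat = fobj(xFuture), plays the same role. To plot against calendar dates, the index can be shifted onto the first observation date, for example by_country{1}.ObservationDate(1) + days(xFuture - 1).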

Now let’s plot the result.

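figure
area(by_country{1}.ObservationDate, by_country{1}.Active)   % observed active cases
hold on
plot(xdates, yhat, "LineWidth", 2)                           % Gaussian fit and 20-day forecast
hold off
legend("Active cases", "Gaussian fit")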

 

 

South Korea

Let’s look at the number of active cases, recovered cases and deaths in South Korea.

 

Here we could not obtain a good fit with the Gaussian model.


 
