Original link: tecdat.cn/?p=19211

Original source: Tuoduan Data Tribe WeChat official account

As the novel coronavirus COVID-19 spreads around the world, concern keeps growing. This article uses MATLAB to analyze COVID-19 datasets.

COVID-19 data source

We examine the unzipped archive. It contains:

Data.csv – Daily case counts worldwide by province/state, 2020
Confirmed.csv – Time series data of confirmed cases
Shuffle.csv – Time series data on deaths
Recovered.csv – Time series data of recovered patients

Map visualization

We visualize the number of confirmed cases on a map. We start by loading the time series that contains the latitude and longitude variables.

opts = detectImportOptions(filenames(4), "TextType","string");

The dataset contains a “Province/State” variable, but we want to aggregate the data at the “Country/Region” level. Before we do that, we need to clean up the country names a little.

times_conf.("Country/Region")(times_conf.("Country/Region") == "China") = "Mainland China";
times_conf.("Country/Region")(times_conf.("Country/Region") == "Czechia") = "Czech Republic";

We can now use groupsummary to aggregate the data by country/region, summing the confirmed cases and averaging the latitude and longitude.

vars = times_conf.Properties.VariableNames;   % assumed definition; not shown in the source
times_conf_country = groupsummary(times_conf,"Country/Region",{'sum','mean'},vars(3:end));
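To make the aggregation concrete, here is a minimal sketch on a made-up three-row table. The names T, Region, Lat and Cases are illustrative and do not come from the dataset. It shows that groupsummary prefixes every output variable with the method name, which is why the result needs tidying afterwards.

```matlab
% Toy table standing in for times_conf (all values are invented)
T = table(["A";"A";"B"], [10;20;30], [1;2;3], ...
    'VariableNames', {'Region','Lat','Cases'});

% Apply both methods to both data variables, grouped by Region
G = groupsummary(T, "Region", {'sum','mean'}, {'Lat','Cases'});

% G has one row per region; besides Region and GroupCount it contains
% sum_Lat, mean_Lat, sum_Cases and mean_Cases
disp(G.Properties.VariableNames)
```

Only some of these columns are meaningful (the sum of the cases, the mean of the coordinates), so the rest have to be dropped.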

The output contains unnecessary columns, such as the sums of latitude and longitude and the means of the daily counts. Let’s flag these variables with a remove_ prefix so that they can be deleted.

vars = regexprep(vars,"^(sum_)(?=L(a|o))","remove_");
vars = regexprep(vars,"^(mean_)(?=[0-9])","remove_");
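The two lookahead patterns are easier to follow on a small example. The sketch below uses an invented list of variable names; the erase call and the keep mask are assumptions about how the remaining clean-up could be done, since that part is not shown in the source.

```matlab
% Variable names as groupsummary might produce them (illustrative)
vars = ["Country/Region","GroupCount","sum_Lat","sum_Long","sum_1/22/20", ...
        "mean_Lat","mean_Long","mean_1/22/20"];

% Flag unwanted columns: sums of coordinates and means of daily counts.
% (?=...) is a lookahead, so only the prefix itself is replaced.
vars = regexprep(vars, "^(sum_)(?=L(a|o))", "remove_");
vars = regexprep(vars, "^(mean_)(?=[0-9])", "remove_");

% Assumed follow-up: strip the prefixes from the columns we keep
vars = erase(vars, ["sum_","mean_"]);
keep = ~startsWith(vars, "remove_");
disp(vars(keep))
```

The kept names are the country, the group count, the daily totals and the averaged coordinates.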

times_conf_exChina = times_conf_country(times_conf_country.("Country/Region") ~= "Mainland China",:);

Let’s visualize the data for the first and last dates in the dataset using geobubble.

for ii = [4, length(vars)]
    times_conf_exChina.Category = categorical(repmat("<100",height(times_conf_exChina),1));
    times_conf_exChina.Category(table2array(times_conf_exChina(:,ii)) >= 100) = ">=100";
    % the geobubble call that creates gb is not shown in the source
    gb.LegendVisible = "off";
end
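Since the snippet above omits the call that creates the chart, here is a minimal, self-contained sketch of a geographic bubble chart colored by a two-level category. The coordinates and counts are invented, and the vector form of geobubble is used here for brevity; the article itself works from a table.

```matlab
% Invented coordinates and case counts for three locations
lat   = [35.68; 37.57; 47.61];
lon   = [139.69; 126.98; -122.33];
cases = [2; 150; 1];

% Two-level category that controls the bubble color
category = categorical(repmat("<100", numel(cases), 1), ["<100", ">=100"]);
category(cases >= 100) = ">=100";

figure
gb = geobubble(lat, lon, cases, category);
gb.BubbleColorList = [1,0,1; 1,0,0];   % one RGB row per category
gb.LegendVisible = "off";
gb.Title = "Toy example";
gb.SizeLimits = [0, max(cases)];
```

Fixing SizeLimits to the same range on every chart keeps bubble sizes comparable between dates.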

We can see that the outbreak initially affected only the countries around mainland China. Note, however, that the United States already had confirmed cases as early as January 22, 2020.

Confirmed cases in US

Let’s drill down into the United States at the state level.

figure
t = tiledlayout("flow");
for ii = [5, length(vars)]
    % the nexttile and geobubble calls that create gb are not shown in the source
    gb.BubbleColorList = [1,0,1;1,0,0];
    gb.LegendVisible = "off";
    gb.Title = "As of " + vars(ii);
    gb.SizeLimits = [0, max(times_conf_us.(vars{length(vars)}))];
    gb.MapCenter = [44.9669 -113.6201];   % sign restored: US longitudes are west (negative)
    gb.ZoomLevel = 1.7678;
end

You can see that the outbreak started in Washington State and was followed by large outbreaks in California and New York.

Rank countries/territories by confirmed cases

Let’s compare the number of confirmed cases by country/region using COVID_19_data.csv. The date-time format is inconsistent, so we import it as text at first.

opts = detectImportOptions(filenames(3), "TextType","string","DatetimeType","text");

Clean up the date-time format.

provData.ObservationDate = regexprep(provData.ObservationDate,"\/20$","/2020");
provData.ObservationDate = datetime(provData.ObservationDate);
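As a small illustration of this clean-up, the sketch below applies the same regular expression to three invented date strings. Passing an explicit InputFormat is an addition for the example; the article relies on datetime's automatic format detection.

```matlab
% Invented date strings with mixed two- and four-digit years
raw = ["01/22/20"; "01/23/2020"; "02/01/20"];

% Expand a trailing "/20" to "/2020"; four-digit years do not match
fixed = regexprep(raw, "\/20$", "/2020");

% Parse with an explicit format instead of relying on auto-detection
d = datetime(fixed, "InputFormat", "MM/dd/yyyy");
disp(d)
```

Note that the pattern is specific to the year 2020, which is all this dataset contains.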

We also need to standardize the values in Country_Region.

provData.Country_Region(provData.Country_Region == "Iran (Islamic Republic of)") = "Iran";

The dataset contains a province/state variable. Let’s aggregate the data at the country/region level.

countryData = groupsummary(provData,{'ObservationDate','Country_Region'}, ...
    "sum",{'Confirmed','Deaths','Recovered'});

countryData contains cumulative daily totals. To rank the countries, we only need the latest numbers.
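The code for this step is not shown, so here is one way it could be done, sketched on an invented table. The column name sum_Confirmed follows groupsummary's default naming, and the top-2 cut-off is arbitrary; labelsK is reused here only as an illustrative name for the resulting labels.

```matlab
% Toy cumulative data: two dates, three countries (numbers are invented)
countryData = table( ...
    datetime(2020,3,[1;1;1;2;2;2]), ...
    ["A";"B";"C";"A";"B";"C"], ...
    [100;50;10;120;90;15], ...
    'VariableNames', {'ObservationDate','Country_Region','sum_Confirmed'});

% Keep only the rows for the most recent date
isLatest = countryData.ObservationDate == max(countryData.ObservationDate);
latest = countryData(isLatest, :);

% Rank countries by cumulative confirmed cases, largest first
latest = sortrows(latest, "sum_Confirmed", "descend");
labelsK = latest.Country_Region(1:2);   % labels of the top two countries
disp(latest)
```

Because the counts are cumulative, the row for the last date already holds each country's total.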

Increase in confirmed cases by country/territory

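We can also examine how quickly cases are growing in these countries.

figure
plot(countryData.ObservationDate(countryData.Country_Region == labelsK(2)), ...
    countryData.sum_Confirmed(countryData.Country_Region == labelsK(2)));   % y-data assumed; the source truncates this call
hold on
for ii = 3:length(labelsK)
    plot(countryData.ObservationDate(countryData.Country_Region == labelsK(ii)), ...
        countryData.sum_Confirmed(countryData.Country_Region == labelsK(ii)));   % assumed, as above
end
hold off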

Although growth in South Korea is showing signs of slowing, it is accelerating elsewhere.

Increase in new cases by country/region

We can calculate the number of new cases by subtracting the cumulative number of confirmed cases on one date from the number on the next date.
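In MATLAB this differencing is a one-liner with diff. The sketch below uses invented cumulative counts; keeping the first day's count as its own new-case value is a choice made for the example.

```matlab
% Invented cumulative confirmed counts on five consecutive days
cumulative = [10; 15; 15; 40; 42];

% New cases per day: the first day keeps its own count,
% later days are the difference from the previous day
newCases = [cumulative(1); diff(cumulative)];
disp(newCases)   % 10, 5, 0, 25, 2
```

A flat stretch in the cumulative curve shows up as zeros in the new-case series.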


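for ii = 1:length(labelsK)
    country = provData(provData.Country_Region == labelsK(ii),:);
    country = groupsummary(country,{'ObservationDate','Country_Region'}, ...
        "sum",{'Confirmed','Deaths','Recovered'});   % assumed: same arguments as the earlier call
    if labelsK(ii) ~= "Others"
        nexttile
        % the plotting commands are not shown in the source
    end
end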

As you can see, China and South Korea are not seeing many new cases, which suggests that the epidemic there has been contained.

China

As the rate of infection in China is slowing, let’s look at how many active cases remain. Active cases are the confirmed cases minus the recovered cases and the deaths.

for ii = 1:length(labelsK)
    by_country{ii}.Active = by_country{ii}.Confirmed - by_country{ii}.Deaths - by_country{ii}.Recovered;
end

figure

Fitting curve

The number of active cases is falling, and the curve looks roughly Gaussian. Can we fit a Gaussian model and predict when the number of active cases will reach zero?

I use Curve Fitting Toolbox for the Gaussian fit.


ft = fittype("gauss1");

[fobj, gof] = fit(x,y,ft,opts);
gof

gof = struct with fields:
           sse: 4.4145e+08
       rsquare: 0.9743
           dfe: 47
    adjrsquare: 0.9732
          rmse: 3.0647e+03

Let’s extend the forecast 20 days beyond the last date in the data.
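The forecasting code is not shown, so the following is only a sketch of how it might look, run on synthetic data rather than the article's series. The start date, the synthetic curve and the names xhat and startDate are assumptions; xdates and yhat match the names used in the plotting code. It requires Curve Fitting Toolbox.

```matlab
% Synthetic stand-in for the active-case curve (not the article's data)
startDate = datetime(2020,1,22);
x = (1:50)';
y = 5e4 * exp(-((x - 30) / 12).^2);

% gauss1 is the library model a1*exp(-((x-b1)/c1)^2)
ft = fittype("gauss1");
[fobj, gof] = fit(x, y, ft);

% Extend the day index 20 days past the data and evaluate the fit there
xhat   = (1:(numel(x) + 20))';
yhat   = fobj(xhat);
xdates = startDate + days(xhat - 1);
```

Because a Gaussian only approaches zero asymptotically, "zero active cases" in such a forecast has to be read as the curve falling below some small threshold.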

Now let’s plot the result.

figure
area(ObservationDate,by_country{1}.Active)
hold on
plot(xdates,yhat,"LineWidth",2)
hold off

South Korea

Let’s look at the number of active cases, recovered cases and deaths in South Korea.

In this case, the Gaussian model does not produce a suitable fit.

