background

When processing GPS data, there are often a lot of data with similar time stamps and distances in the record, which on the one hand wastes a lot of storage space, and on the other hand causes the graph to be expressed not smooth or does not conform to the standard. Therefore, it is necessary to minimize the number of data points by some rules while ensuring the shape of the vector curve is unchanged, which is called thinning.

Commonly used thinning algorithms, including but not limited to (this section is from the web)

  • Step size compression algorithm: extract a point every certain step length along the continuous curve, compress all the other points, and then use straight line continuity or curve fitting to approximate between adjacent extraction points.
  • Segment filtering algorithm: When the length of a segment is less than a certain filter value, the midpoint of the segment is replaced by the midpoint of the segment, just as the two ends of the segment degenerate to the midpoint.
  • Douglas-peuker algorithm (also known as Lamer-Douglas-Peuker algorithm, iterative adaptive point algorithm, splitting and merging algorithm) is an algorithm that approximates the curve as a series of points and reduces the number of points.

PostGIS has some common algorithms built in, so let’s do a simple test.

Use PostGIS functions

The PostGIS functions used are as follows

  • ST_Simplify
  • ST_SimplifyVW
  • ST_SimplifyPreserveTopology

Track smoke dilute

The function ST_Simplify() performs trajectory thinning based on douglas-Peucker algorithm, and the specific principle is not explained here. However, it should be noted that when WGS84 is used, the tolerance parameter of the algorithm is in degrees.

Latitude and longitude vary by degree in different regions, and if the Earth is assumed to be a perfect sphere (thus assuming that the error is not very great), the region of latitude B:

  • The latitude changes by one degree, and the spherical north-south distance changes by PI R/180…….. 111.7 km
  • The longitude changes by one degree, and the distance between the east and west directions of the sphere changes: πR/180cosB …. 111.7cosB

For example, if Beijing B = 40, cosB = 0.766 and longitude changes by 1 degree, the distance from east to west changes by 85.567km.

For simplicity of calculation, let’s assume that one degree in China is about 100 kilometers.

Douglas Peucker algorithm

After thinning, the trajectory distance will be lost due to topological shape changes. We want the result to be as small a proportion of space as possible and as large a proportion of distance as possible.

To test the comparison of thinning results when the volume gap is 1000 m, 100 m and 10 m respectively.

with simplified_trajectory as 
(
	-- 1 degree = 100 km
	select 
		tr.tid, tr.dt,
		st_npoints(traj) as np_orgin,
		(st_length(traj::geography)/1000) : :int as km_orgin,
		-- 0.01 degree = 1000 m
		st_npoints( ST_Simplify(traj,0.01.true))as np_01,
		(st_length(ST_Simplify(traj,0.01.true)::geography)/1000) : :int as km_01,
		-- 0.001 degree = 100 m
		st_npoints( ST_Simplify(traj,0.001.true))as np_001,
		(st_length(ST_Simplify(traj,0.001.true)::geography)/1000) : :int as km_001,
		-- 0.0001 degree = 10 m
		st_npoints( ST_Simplify(traj,0.0001.true))as np_0001,
		(st_length(ST_Simplify(traj,0.0001.true)::geography)/1000) : :int as km_0001,
		1 as endflag
	from demo.t_taxi_trajectory tr
)
select st.tid, st.dt, st.np_orgin, st.km_orgin,
	(st.np_01 / st.np_orgin::float) : :decimal(4.2) as np_01_pct,
	(st.km_01 / st.km_orgin::float) : :decimal(4.2) as km_01_pct,
	(st.np_001 / st.np_orgin::float) : :decimal(4.2) as np_001_pct,
	(st.km_001 / st.km_orgin::float) : :decimal(4.2) as km_001_pct,
	(st.np_0001 / st.np_orgin::float) : :decimal(4.2) as np_0001_pct,
	(st.km_0001 / st.km_orgin::float) : :decimal(4.2) as km_0001_pct
from simplified_trajectory st
where 1=1	
	and st.dt = '2008-02-06'
	and st.tid  in (28.10012)
	order by st.tid, st.dt
;
Copy the code

Visvalingam – Whyatt algorithm

ST_SimplifyVW has another built-in thinning algorithm, not as well known as DP. Use the same method, here I don’t write the statement, interested in the official documentation.

During the test, it was found that the tolerance in VW algorithm was different from DP, for unknown reasons.

Step size compression algorithm

For the original track points, the step size algorithm of picking one out of every ten is used, that is, picking one out of every ten original points.

The normal practice is to start with raw GPS point data, but for demonstration and testing, choose to restore the track to GPS and then exhaust it
with dump_pt as
(
	select tid, 
		(st_dumppoints(tr.traj)).path[1] as sn, 
		(st_dumppoints(tr.traj)).geom as pt 
	from demo.t_taxi_trajectory tr
),
Every 10 primitive points are divided into a group
rank_pt as
(
	select 
		tid, to_char(to_timestamp(st_m(pt)), 'yyyy-mm-dd') as ts ,
		st_setsrid( st_makepointm( st_x(pt), st_y(pt), st_m(pt) ), 4326) pt,
		row_number() over (partition by tid, sn/10 order by sn) as rn
	from dump_pt 
)
select rpt.tid, rpt.ts,
	st_makeline( array_agg(rpt.pt order by rpt.ts )::geometry[] ) as traj
from rank_pt rpt
where rpt.rn = 1
group by rpt.tid, rpt.ts
;
Copy the code

The influence of tolerance threshold in DP algorithm

tid | dt | np_orgin | km_orgin | np_01_pct | km_01_pct | np_001_pct | km_001_pct | np_0001_pct | km_0001_pct -------+------------+----------+----------+-----------+-----------+------------+------------+-------------+------------- 28 947 | 2008-02-06 | | 180 | | | | | 0.97 0.03 0.94 0.01 0.15 0.98 10012 | 2008-02-06 | | 687 | | | | 0.00 0.00 0.00 | | | 0.14 0.00 0.40 (2 rows)Copy the code

You can see from this sample data

  • Long original distance (947km)
    • When tolerance is 1000 m, the number of coordinate points is reduced to 1%, and the distance is reduced to 94%.
    • When tolerance is 100 m, the number of coordinate points is reduced to 3% and the distance is reduced to 97%.
    • When tolerance is 10 meters, the number of coordinate points is reduced to 15%, and the distance is reduced to 98%.
  • Short original distance (5km)
    • When tolerance is 1000m, the number of coordinate points is reduced to 0%, and the distance is reduced to 0% (serious deviation);
    • When tolerance is 100 meters, the number of coordinate points is reduced to 0%, and the distance is reduced to 0% (serious deviation).
    • When tolerance is 10 meters, the number of coordinate points is reduced to 15%, and the distance is reduced to 40%.

Therefore, when the original track distance is long, the effect of tolerance of 100 meters is very good. If the original distance is very short, smaller tolerances are required.

Comparison of the effect of track thinning

A sample track is processed by different thinning algorithms, and the volume and accuracy are compared.

drop table if exists demo.t_02;
create table if not exists demo.t_02 (id text, traj geometry(linestringm, 4326));


insert into demo.t_02 
select 'src', traj from demo.t_taxi_trajectory where tid=28 and dt='2008-02-02'
union all
select 'dp', ST_Simplify(traj,0.001.true)  from demo.t_taxi_trajectory where tid=28 and dt='2008-02-02'
union all
select 'vw', ST_SimplifyVW(traj,0.00001)  from demo.t_taxi_trajectory where tid=28 and dt='2008-02-02'
;

--
with dump_pt as
(
	select tid, 
		(st_dumppoints(tr.traj)).path[1] as sn, 
		(st_dumppoints(tr.traj)).geom as pt 
	from demo.t_taxi_trajectory tr
	where tr.tid in (28) 
		and tr.dt = '2008-02-02' 
),
--
rank_pt as
(
	select 
		tid, to_char(to_timestamp(st_m(pt)), 'yyyy-mm-dd') as ts ,
		st_setsrid( st_makepointm( st_x(pt), st_y(pt), st_m(pt) ), 4326) pt,
		row_number() over (partition by tid, sn/10 order by sn) as rn
	from dump_pt 
)
--
insert into demo.t_02 
select 10-1 ' ',
	st_makeline( array_agg(rpt.pt order by rpt.ts )::geometry[] ) as traj
from rank_pt rpt
where rpt.rn = 1
group by rpt.tid, rpt.ts
;
Copy the code
algorithm Number of track points Track kilometers
The original trajectory 387 190
DP 0.001 130 188
VW 0.00001 107 174
Take 1 out of 10 steps 39 140

The result comparison figure is as follows (the conclusion is only valid for this data). It can be seen that DP has the best accuracy and the worst step size is 10 and 1.

Topology extraction dilute

Douglas Peucker algorithm

Similar to ST_Simplify, the function ST_SimplifyPreserveTopology also uses the DP algorithm for thinning. But ST_SimplifyPreserveTopology ensures that the data is a valid topology.

I don’t know exactly, but I do understand

  • M values are not supported, so cannot be used to dilute lineStringm-based trace data.
  • The Polygon object is thinned out, preferably using this function. (Don’t ask why, because I don’t understand)
-- 0.001 tolerance of DP topology dilution
select 
	st_multi(st_simplifypreservetopology(fence_polygon, 0.001)) as fence_polygon
from demo.t_area_fence where id = 510107;
Copy the code

Topology thinning effect comparison

A sample fence is processed with different thinning algorithms, and the capacity and accuracy are compared.

Warning: data extraction of national territorial boundaries is only a research activity to verify the technical feasibility and cannot be used in any other scenarios!!

drop table if exists demo.t_03;
create table if not exists demo.t_03 (id text, fence_polygon geometry(multipolygon, 4326));


insert into demo.t_03 
select 'src', fence_polygon from demo.t_area_fence where id = 510107
union all
select 'dp', st_multi(ST_Simplify(fence_polygon,0.001.true)) from demo.t_area_fence where id = 510107
union all
select 'dppt', st_multi(st_simplifypreservetopology(fence_polygon, 0.001)) as fence_polygon from demo.t_area_fence where id = 510107
;
Copy the code
algorithm Number of fence points
The original fence 830
DP 0.001 126
DP topology 0.001 127

The effect comparison figure is as follows (the conclusion is only valid for this data), and the part in the red box has slight differences, which can be considered in GIS business scenarios with technical research nature.

Warning: data extraction of national territorial boundaries is only a research activity to verify the technical feasibility and cannot be used in any other scenarios!!