Background
Because of GPS device and network issues, the collected location data may contain various anomalies, including but not limited to:
- The device goes offline, producing a large time and distance gap between adjacent points (jump points).
- A device fault causes a sudden, large positional deviation under consecutive timestamps (drift points).
- A device fault causes multiple records to be reported with the same timestamp (duplicate data).
Considering our business scenario, we can roughly conclude:
- Duplicate data is easy to remove with a simple timestamp de-duplication algorithm.
- Jump points caused by data interruptions can be tolerated within an acceptable time range (or the missing data can be simulated with a fitting algorithm).
- Drift points have a large impact on subsequent analysis, and significant drift (such as crossing regions at the speed of light) is unacceptable.
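As a sketch of the first point, timestamp de-duplication can be as simple as keeping the first record seen per timestamp. The tuple layout `(timestamp_seconds, lon, lat)` is an illustrative assumption, not from the original code:

```python
def dedup_by_timestamp(fixes):
    """Keep only the first GPS fix for each timestamp.

    fixes: list of (timestamp_seconds, lon, lat) tuples (assumed layout).
    """
    seen = set()
    out = []
    for fix in fixes:
        ts = fix[0]
        if ts not in seen:
            seen.add(ts)
            out.append(fix)
    return out
```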
During data preprocessing, we need to exclude the drift point data. Common approaches include:
- Sliding-window filtering (median, mean);
- Kalman filtering;
- Particle filtering;
- Velocity-based filtering.
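To illustrate the first approach, a sliding-window median filter can be sketched as follows. This is a minimal illustration; the window size is an assumed tuning parameter that should match the sampling rate:

```python
import statistics

def median_filter(values, window=5):
    """Replace each sample with the median of its sliding-window neighborhood.

    Isolated spikes (drift points) are suppressed because a single outlier
    cannot become the median of its window.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)           # window is truncated at the edges
        hi = min(len(values), i + half + 1)
        out.append(statistics.median(values[lo:hi]))
    return out
```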
Algorithm description and code
Based on the specific business scenario (ordinary passenger vehicles must travel below 180 km/h, heavy trucks below 100 km/h), we use a simple algorithm here that filters on estimated speed. The idea is as follows:
- Assume that over a period of time, the vehicle speed follows a normal distribution;
- Sort the GPS points by timestamp;
- The speed at the current GPS point can be estimated from the time and distance differences between two adjacent GPS points;
- Because of missing points, drift points, and so on, the estimated speed differs from the actual speed;
- Points whose estimated speed exceeds 50 m/s (180 km/h) are suspected drift points;
- Remove the suspected drift points from the trajectory sequence.
Note: This algorithm assumes a normal distribution and a maximum speed threshold, so the results are not rigorous.
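The steps above can be sketched in Python before looking at the SQL implementation. This is a minimal illustration, not the article's implementation; the point layout `(lon, lat, ts_seconds)` and the haversine helper are assumptions:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters (spherical-Earth approximation)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def drop_drift_points(points, mps_max=50.0):
    """points: time-ordered (lon, lat, ts_seconds) tuples.

    Keep a point only if the speed computed from the last *kept* point
    does not exceed mps_max; otherwise treat it as a drift point.
    """
    if not points:
        return []
    kept = [points[0]]
    for lon, lat, ts in points[1:]:
        p_lon, p_lat, p_ts = kept[-1]
        dt = ts - p_ts
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp, skip it
        if haversine_m(p_lon, p_lat, lon, lat) / dt <= mps_max:
            kept.append((lon, lat, ts))
    return kept
```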
alter table demo.t_taxi_trajectory add column traj_clean geometry(linestringm, 4326);

/*
 * Detect abnormal trajectory points based on estimated speed and
 * return a new trajectory with the abnormal points removed.
 * @traj     input trajectory (timestamp stored in the M dimension)
 * @mps_max  speed threshold in meters/second
 */
create or replace function f_get_clean_trajectory(traj geometry, mps_max int = 50)
returns geometry as $$
declare
    traj_clean  geometry;
    pt          geometry;
    p0          geometry = null;
    dur_meters  float    = 0;
    dur_seconds int      = 0;
begin
    for pt in select (st_dumppoints(traj)).geom loop
        -- 1st point
        if p0 is null then
            p0 := pt;
            traj_clean := st_makeline(pt);
        -- other points
        else
            dur_meters  := st_distance(pt::geography, p0::geography);
            dur_seconds := st_m(pt) - st_m(p0);
            -- Estimate the current point's speed from the distance and time
            -- between two adjacent points; keep the point only if the speed
            -- is below the threshold. (Assumes duplicate timestamps were
            -- already removed, otherwise dur_seconds could be 0.)
            if (dur_meters / dur_seconds) <= mps_max then
                -- raise notice '% - % - %', dur_meters, dur_seconds, (dur_meters / dur_seconds);
                traj_clean := st_addpoint(traj_clean, pt);
                p0 := pt;
            else
                -- raise notice '% - % - %', dur_meters, dur_seconds, (dur_meters / dur_seconds);
                null;
            end if;
        end if;
    end loop;
    return traj_clean;
end;
$$ language plpgsql strict;
Algorithm testing
We evaluate the algorithm by comparing the trajectory shape before and after cleaning. Since the sample data consists of Beijing taxi trajectories, and a taxi in the city will not exceed 120 km/h, we set the threshold to 30 m/s (108 km/h).
-- Added a post-cleaning trace field
alter table demo.t_taxi_trajectory add column traj_clean geometry;
update demo.t_taxi_trajectory
set traj_clean = demo.f_get_clean_trajectory(traj,30)
where traj_clean is null
;
Restore track points and display them in QGIS
-- Restore the trace point
with dump_pt as
(
select
tid, dt,
--(st_dumppoints(tr.traj)) as DPT -- Original trace
(st_dumppoints(tr.traj_clean)) as dpt -- Track after cleaning
from demo.t_taxi_trajectory tr
where tr.tid = 1353 and tr.dt = '2008-02-03'
),
--
pt_list as
(
select
(dpt).path[1] as rn,
(dpt).geom as pt ,
*
from dump_pt
)
--
select
tid, dt, rn,
to_timestamp(st_m(pt)) as ts,
st_distance(pt::geography, (lag(pt) over w)::geography)::int as len_m ,
to_timestamp(st_m(pt)) - (lag(to_timestamp(st_m(pt))) over w) as dur,
pt,
1 as endflag
from pt_list
window w as (partition by tid, dt order by rn)
order by tid, ts
;
Visual effect – drift points
- The outlines of the normal points coincide completely, indicating that the overall shape is not deformed;
- Point A, an obvious drift point, has been removed;
- Point B is not a drift point, but there is an obvious jump in the track there; most likely a long stretch of points is missing near point B.
Visual effect – missing points
 rn  |           ts           | len_m |   dur
-----+------------------------+-------+----------
 910 | 2008-02-03 20:25:28+08 |    67 | 00:04:28
 911 | 2008-02-03 20:28:55+08 |     2 | 00:03:27
 912 | 2008-02-03 20:29:36+08 |   355 | 00:00:41
 913 | 2008-02-03 20:46:53+08 |  5945 | 00:17:17
 914 | 2008-02-03 20:58:17+08 | 12019 | 00:11:24
 915 | 2008-02-03 21:02:01+08 |  1636 | 00:03:44
 916 | 2008-02-03 21:02:06+08 |    20 | 00:00:05
From the query results:
- The normal collection interval of track points is within 3 minutes;
- Rows 913 and 914 show obvious time gaps (17 minutes and 11 minutes);
- This verifies the hypothesis that a long stretch of points is missing.
Extended thinking: in a strict application scenario, the trajectory could be split at the jump points to form multiple continuous trajectory segments.
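Such a split can be sketched as follows. This is a minimal illustration under assumed inputs: time-ordered `(lon, lat, ts_seconds)` tuples, with the 3-minute gap threshold taken from the normal collection interval observed above:

```python
def split_at_gaps(points, max_gap_s=180):
    """Split a trajectory into segments at large time gaps.

    points: time-ordered (lon, lat, ts_seconds) tuples.
    A new segment starts whenever the gap to the previous point
    exceeds max_gap_s (3 minutes by default).
    """
    segments = []
    current = []
    prev_ts = None
    for p in points:
        if prev_ts is not None and p[2] - prev_ts > max_gap_s:
            segments.append(current)   # close the segment before the gap
            current = []
        current.append(p)
        prev_ts = p[2]
    if current:
        segments.append(current)
    return segments
```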