preface

A couple of days ago I wrote an article about performance optimization: it described how we optimized one feature, along with some ideas for further improvement. I have been on a business trip these past few days, and with nothing to do in the evenings I implemented those ideas with a colleague, only to discover that my earlier inference was wrong. Let's talk about that here.

Recap

In the data-import feature (N records), a key step is spatial matching: finding, among dozens of regions (R), the region each record falls in. Last time we discussed three versions of the optimization:

  • The first version inserted records one at a time
  • The second version batch-inserted the records and then modified them
  • The third version matched each batch (B) of points against the regions, then batch-inserted

The improvement idea for the fourth version was to batch-insert first, use each region to batch-match the inserted data, and then batch-update the region fields. The rationale: V3 performs N spatial matches, each against R regions, while V4 performs N/B*R spatial matches, each against B points (N >> R, B > R). We speculated this would cut the time at least in half, maybe even by an order of magnitude.
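The back-of-envelope comparison behind that speculation can be sketched as follows (the numbers are illustrative, not measurements from the project):

```java
// Count of spatial-match operations under each strategy (illustrative numbers only).
public class MatchCostSketch {
    public static void main(String[] args) {
        long n = 20_000; // points to import (N)
        long r = 50;     // candidate regions (R)
        long b = 1_000;  // batch size (B)

        long v3Matches = n;           // V3: one point-vs-R-regions match per point
        long v4Matches = (n / b) * r; // V4: one region-vs-B-points match per region, per batch

        System.out.println("V3 spatial matches: " + v3Matches); // 20000
        System.out.println("V4 spatial matches: " + v4Matches); // 1000
    }
}
```

With these numbers V4 issues far fewer match operations, which is why an order-of-magnitude win seemed plausible on paper.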

Show me the code

V3 code


    @Override
    public void transformCSV(Path filePath) {
        // Delete previous lightning data
        flushDb();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath.toFile())));
             CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
            Iterator<CSVRecord> iterator = parser.iterator();
            // Skip the header
            if (iterator.hasNext()) {
                iterator.next();
            }

            List<List<String>> csvList = new ArrayList<>();
            while (iterator.hasNext()) {
                CSVRecord record = iterator.next();
                // The contents of each line
                List<String> value = new ArrayList<>();
                for (int j = 0; j < record.size(); j++) {
                    value.add(record.get(j));
                }
                csvList.add(value);
                if (csvList.size() >= BATCH_SIZE) {
                    thunderBoltDataFilter(csvList);
                    csvList.clear();
                }
            }
            // Flush the final partial batch
            if (!csvList.isEmpty()) {
                thunderBoltDataFilter(csvList);
            }

        } catch (FileNotFoundException e) {
            log.error("Temporary file not found", e);
        } catch (Exception e) {
            log.error("Undefined exception", e);
            flushDb();
        }
    }

    private void flushDb() {
        remove(new QueryWrapper<>());
    }

    private List<OriginalThunderbolt> loadDataFromCSV(List<List<String>> csvList) {


        List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();

        for (List<String> csvString : csvList) {
            OriginalThunderbolt originalThunderbolt = new OriginalThunderbolt();

            originalThunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());

            originalThunderbolt.setCode(csvString.get(0));

            if (StringUtils.isNotEmpty(csvString.get(1))) {
                Long time = TimeUtils.stringToDateLong(csvString.get(1));
                originalThunderbolt.setTime(time);
            }

            originalThunderbolt.setType(csvString.get(2));
            originalThunderbolt.setHeight(Double.valueOf(csvString.get(3)));
            originalThunderbolt.setStrength(Double.valueOf(csvString.get(4)));
            originalThunderbolt.setLatitude(Double.valueOf(csvString.get(5)));
            originalThunderbolt.setLongitude(Double.valueOf(csvString.get(6)));
            originalThunderbolt.setProvinces(csvString.get(7));
            originalThunderbolt.setCities(csvString.get(8));
            originalThunderbolt.setCounties(csvString.get(9));
            originalThunderbolt.setLocationMode(csvString.get(10));
            originalThunderbolt.setSteepness(csvString.get(11));
            originalThunderbolt.setDeviation(csvString.get(12));
            originalThunderbolt.setLocatorNumber(csvString.get(13));

            originalThunderbolt.setStatus("Out of sync");
            originalThunderbolt.setCreateTime(System.currentTimeMillis());
            originalThunderbolt.setUpdateTime(System.currentTimeMillis());
            originalThunderbolt.setDeleted(false);

            originalThunderboltList.add(originalThunderbolt);

        }

        return originalThunderboltList;
    }

    private QueryWrapper<Thunderbolt> buildWrapper(ThunderboltPageModel thunderboltPageModel) {
        QueryWrapper<Thunderbolt> wrapper = new QueryWrapper<>();

        if (StringUtils.isNotEmpty(thunderboltPageModel.getForestryBureauName())) {
            wrapper.likeRight("forestry_bureau", thunderboltPageModel.getForestryBureauName());
        }
        if (StringUtils.isNotEmpty(thunderboltPageModel.getForestFarmName())) {
            wrapper.likeRight("forest_farm", thunderboltPageModel.getForestFarmName());
        }
        if (StringUtils.isNotEmpty(thunderboltPageModel.getCode())) {
            wrapper.likeRight("code", thunderboltPageModel.getCode());
        }
        if (StringUtils.isNotEmpty(thunderboltPageModel.getType())) {
            wrapper.eq("type", thunderboltPageModel.getType());
        }
        if (StringUtils.isNotEmpty(thunderboltPageModel.getLocatorNumber())) {
            wrapper.likeRight("locator_number", thunderboltPageModel.getLocatorNumber());
        }
        if (StringUtils.isNotEmpty(thunderboltPageModel.getStatus())) {
            wrapper.eq("status", thunderboltPageModel.getStatus());
        }
        if (thunderboltPageModel.getDiscoverStartTime() != null) {
            wrapper.ge("time", thunderboltPageModel.getDiscoverStartTime());
        }
        if (thunderboltPageModel.getDiscoverEndTime() != null) {
            wrapper.le("time", thunderboltPageModel.getDiscoverEndTime());
        }

        wrapper.orderByDesc("update_time");

        wrapper.eq(DELETED_COLUMN, DEFAULT_STATUS);

        return wrapper;
    }

    private void thunderBoltDataFilter(List<List<String>> csvList) {
        List<OriginalThunderbolt> originalThunderbolts = loadDataFromCSV(csvList);

        List<Point> points = new ArrayList<>();
        List<OriginalThunderbolt> dmzThunderboltList = new ArrayList<>();
        originalThunderbolts.forEach(th -> {
            // Keep only the lightning points inside the overall area
            if (SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), findParasFromResource())) {
                Point point = new Point(String.valueOf(th.getLongitude()), String.valueOf(th.getLatitude()));
                dmzThunderboltList.add(th);
                points.add(point);
            }
        });
        // Call the SIM service to match lightning points with forest regions
        List<OrganizationVO> matchedData = simOrgClient.matchAlarmPointWithOrgRegion(points);
        List<Thunderbolt> list = new ArrayList<>();
        // Model transformation, insert database
        for (int i = 0; i < dmzThunderboltList.size(); i++) {
            OrganizationVO organizationVO = matchedData.get(i);
            OriginalThunderbolt originalThunderbolt = dmzThunderboltList.get(i);
            Thunderbolt thunderbolt = new Thunderbolt(organizationVO, originalThunderbolt);
            list.add(thunderbolt);

        }
        insertBatchThunderbolt(list);

    }
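The accumulate-and-flush loop in transformCSV can be factored into a small reusable helper. A minimal sketch (the class and names are mine, not the project's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic chunked-processing helper: buffer items, flush every batchSize,
// then flush whatever remains at the end of input.
public class BatchProcessor<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    public BatchProcessor(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) { // flush after adding, so batches are exactly batchSize
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public void finish() { // flush the final partial batch, skipping an empty one
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        BatchProcessor<Integer> bp = new BatchProcessor<>(3, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 7; i++) bp.add(i);
        bp.finish();
        System.out.println(sizes); // [3, 3, 1]
    }
}
```

Guarding the final flush also avoids calling the downstream filter with an empty list when the row count happens to be an exact multiple of the batch size.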

V4 code

 @Override
    public void transformCSV(Path filePath) {
        List<SynOrganizationVO> organizationVOList = simOrgClient.findAllByType("Forest");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath.toFile())));
             CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
            Iterator<CSVRecord> iterator = parser.iterator();
            // Skip the header
            if (iterator.hasNext()) {
                iterator.next();
            }

            List<List<String>> csvList = new ArrayList<>();
            while (iterator.hasNext()) {
                CSVRecord record = iterator.next();
                // The contents of each line
                List<String> value = new ArrayList<>();
                for (int j = 0; j < record.size(); j++) {
                    value.add(record.get(j));
                }
                csvList.add(value);
                if (csvList.size() >= BATCH_SIZE) {
                    thunderBoltDataFilter(csvList, organizationVOList);
                    csvList.clear();
                }
            }
            // Flush the final partial batch
            if (!csvList.isEmpty()) {
                thunderBoltDataFilter(csvList, organizationVOList);
            }

        } catch (FileNotFoundException e) {
            log.error("Temporary file not found", e);
        } catch (Exception e) {
            log.error("Undefined exception", e);
            deleteByStatus("Out of sync");
        }
    }

    private void flushDb() {
        remove(new QueryWrapper<>());
    }

    private void deleteByStatus(String status) {
        QueryWrapper<Thunderbolt> wrapper = new QueryWrapper<>();
        if (StringUtils.isNotEmpty(status)) {
            wrapper.eq("status", status);
        }
        remove(wrapper);
    }

    private List<OriginalThunderbolt> loadDataFromCSV(List<List<String>> csvList) {


        List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();

        for (List<String> csvString : csvList) {
            OriginalThunderbolt originalThunderbolt = new OriginalThunderbolt();

            originalThunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());

            originalThunderbolt.setCode(csvString.get(0));

            if (StringUtils.isNotEmpty(csvString.get(1))) {
                Long time = TimeUtils.stringToDateLong(csvString.get(1));
                originalThunderbolt.setTime(time);
            }

            originalThunderbolt.setType(csvString.get(2));
            originalThunderbolt.setHeight(Double.valueOf(csvString.get(3)));
            originalThunderbolt.setStrength(Double.valueOf(csvString.get(4)));
            originalThunderbolt.setLatitude(Double.valueOf(csvString.get(5)));
            originalThunderbolt.setLongitude(Double.valueOf(csvString.get(6)));
            originalThunderbolt.setProvinces(csvString.get(7));
            originalThunderbolt.setCities(csvString.get(8));
            originalThunderbolt.setCounties(csvString.get(9));
            originalThunderbolt.setLocationMode(csvString.get(10));
            originalThunderbolt.setSteepness(csvString.get(11));
            originalThunderbolt.setDeviation(csvString.get(12));
            originalThunderbolt.setLocatorNumber(csvString.get(13));

            originalThunderbolt.setStatus("Out of sync");
            originalThunderbolt.setCreateTime(System.currentTimeMillis());
            originalThunderbolt.setUpdateTime(System.currentTimeMillis());
            originalThunderbolt.setDeleted(false);

            originalThunderboltList.add(originalThunderbolt);

        }

        return originalThunderboltList;
    }
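The V4 version of thunderBoltDataFilter is not shown above. Based on the article's description, the intended flow is roughly the following self-contained sketch; every type and the toy longitude-only "contains" checks are hypothetical stand-ins, not the project's real API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

public class V4FlowSketch {
    // Hypothetical stand-in for a table row; not the project's Thunderbolt entity.
    static class Row {
        final double lon, lat;
        String status = "Out of sync";
        String region = null;
        Row(double lon, double lat) { this.lon = lon; this.lat = lat; }
    }

    public static void main(String[] args) {
        // 1) Batch insert: every row starts as "Out of sync".
        List<Row> table = new ArrayList<>(List.of(new Row(1, 1), new Row(5, 5), new Row(9, 9)));

        // 2) Per region, batch-match the still-pending rows (toy containment check).
        Map<String, DoublePredicate> regions = new LinkedHashMap<>();
        regions.put("RegionA", lon -> lon < 4);
        regions.put("RegionB", lon -> lon >= 4 && lon < 8);

        for (Map.Entry<String, DoublePredicate> r : regions.entrySet()) {
            for (Row row : table) {
                if ("Out of sync".equals(row.status) && r.getValue().test(row.lon)) {
                    // 3) Batch modification: attach the region and mark synced.
                    row.region = r.getKey();
                    row.status = "Synced";
                }
            }
        }
        for (Row row : table) {
            System.out.println(row.status + " / " + row.region);
        }
        // On failure, the real code deletes rows still marked "Out of sync" (deleteByStatus).
    }
}
```

The status field doubles as the rollback mechanism: anything still "Out of sync" after an exception is swept away by deleteByStatus.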

The test results

Without any further improvements to the database,

With 20K rows: V3 took 42 s, V4 took 41 s.

To make matters worse, V4's time was erratic, sometimes reaching nearly 2 minutes, and as the data volume grew its time no longer grew linearly. V3's time, in contrast, always grew linearly.

These test results were a real slap in the face, so we hurried to analyze the problem again. The earlier inference had several flaws:

  1. We assumed that a point-finds-region query and a region-finds-points query cost about the same. That is not true: in V3, each match queries a fixed space of R regions, while in V4 the query space is not constant but grows with the batch index, since the i-th batch runs against the (i-1)*B rows already inserted. Tagging rows with a status before and after batch processing, and indexing that status column at the database level, should alleviate this to some extent.
  2. Within a single batch, V4's spatial matching took much longer than V3's, which may come from differences in the underlying spatial analysis algorithms; that layer is something of a black box. I had not considered this before.
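To see why the growing query space in point 1 matters, here is a toy tally under my own simplifying assumption that, without a status index, each batch's region-match query scans every row inserted so far:

```java
// Cumulative rows scanned by V4's per-batch match queries (assumed no status index).
public class QueryGrowthSketch {
    public static void main(String[] args) {
        long n = 20_000; // total rows (N)
        long b = 1_000;  // batch size (B)
        long batches = n / b;

        long rowsTouched = 0;
        for (long i = 1; i <= batches; i++) {
            rowsTouched += i * b; // batch i's query sees all i*b rows inserted so far
        }
        System.out.println("Rows scanned across all batches: " + rowsTouched); // 210000
        // This sum is ~N^2 / (2B): super-linear in N, consistent with V4's non-linear timings.
    }
}
```

Under this model the scan cost grows quadratically with N, whereas V3's per-point cost stays constant, which matches the observed linear-vs-erratic behavior.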

In response to the test results, we rolled back the V4 changes…

conclusion

  • Never let an optimization stop at theory: get hands-on and test it against real data;
  • Establish a benchmark before testing; do not optimize by feel;
  • Branch and protect your code, because your optimization may well be rolled back. PS: IDEA's Local History feature is really great;