Iterators and batch queries
Iterator pattern
There are many articles and examples on the web that explain the iterator pattern.
The iterator pattern provides a way to access the elements of an aggregate object sequentially without exposing its internal representation. It assigns the task of traversal to the iterator rather than to the aggregate, which simplifies the aggregate's interface and implementation and puts each responsibility where it belongs.
The concepts of iterator and collection are closely linked. Collection classes in the JDK typically implement the Iterator interface internally and expose it to callers.
The core methods of the Iterator interface are hasNext and next. The former reports whether the iterator still has untraversed elements, and the latter returns the next element in the traversal.
public interface Iterator<E> {
    boolean hasNext();

    E next();

    default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}
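To make the "without exposing its internal representation" point concrete, here is a minimal sketch. The NameRing class and its fields are hypothetical names invented for this illustration: the aggregate keeps its array private and hands out only an Iterator.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical aggregate: a ring of names whose backing array stays private. */
class NameRing implements Iterable<String> {
    private final String[] names; // internal representation, never exposed
    private final int start;      // non-negative index at which traversal begins

    NameRing(String[] names, int start) {
        this.names = names.clone();
        this.start = start;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int visited = 0; // number of elements returned so far

            @Override
            public boolean hasNext() {
                return visited < names.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Wrap around the end of the array so every element is visited once.
                String name = names[(start + visited) % names.length];
                visited++;
                return name;
            }
        };
    }
}
```

A caller can write `for (String s : new NameRing(arr, 1))` without knowing that the data lives in an array or where the traversal starts.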
Batch queries
Batch queries are very common in application development when a caller needs to obtain a large amount of data from a database. Having the data provider load all the data and return it to the caller at once is very unfriendly to system memory and network I/O. Data providers therefore generally offer batch query methods that return the data in multiple non-overlapping batches. To ensure that the data is ordered and that batches do not overlap, the data provider must use the same ordering for every batch.
Batch query methods
There are many ways to implement batch queries. The most common ones are described below.
Paging query
The core parameters of a paging query are the current page number and the number of items per page. The interface is as follows:
public interface PageGet<T> {
    /** Get data based on the page number and the number of items per page. */
    List<T> pageGetFunc(int page, int pageSize, QueryParam queryParam);
}
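As a rough illustration of how a caller drives such an interface, the sketch below uses a hypothetical in-memory data source (InMemoryPageSource, with QueryParam left out for brevity) and a loop that requests pages until a short page signals the end. It assumes zero-based page numbers and a positive page size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory data source that serves zero-based pages from a sorted list. */
class InMemoryPageSource {
    private final List<Integer> sortedIds;

    InMemoryPageSource(List<Integer> ids) {
        List<Integer> copy = new ArrayList<>(ids);
        Collections.sort(copy); // one fixed ordering shared by every batch
        this.sortedIds = copy;
    }

    /** Returns page number `page` (starting at 0) with at most `pageSize` items. */
    List<Integer> pageGet(int page, int pageSize) {
        int from = Math.min(page * pageSize, sortedIds.size());
        int to = Math.min(from + pageSize, sortedIds.size());
        return new ArrayList<>(sortedIds.subList(from, to));
    }

    /** Caller-side loop: request pages until one comes back shorter than pageSize (> 0). */
    static List<Integer> fetchAll(InMemoryPageSource source, int pageSize) {
        List<Integer> all = new ArrayList<>();
        int page = 0;
        while (true) {
            List<Integer> batch = source.pageGet(page++, pageSize);
            all.addAll(batch);
            if (batch.size() < pageSize) {
                return all;
            }
        }
    }
}
```

Because every page is cut from the same sorted list, consecutive pages neither overlap nor skip items.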
Scrolling query
The core parameter of a scrolling query is the cursor position. The cursor is usually taken from the last item of the previous batch. The interface is as follows:
public interface ScrollGet<T> {
    /** Get data based on the cursor position. */
    List<T> scrollGetFunc(T cursor, QueryParam queryParam);
}
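The following sketch shows one way a scrolling data source might behave. InMemoryScrollSource is a hypothetical stand-in for a real database: it returns the items that sort strictly after the cursor, up to a fixed batch size, and treats a null cursor as "start from the beginning".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory source that serves batches of ids strictly after a cursor. */
class InMemoryScrollSource {
    private final TreeSet<Integer> ids; // sorted, unique ids
    private final int batchSize;

    InMemoryScrollSource(List<Integer> ids, int batchSize) {
        this.ids = new TreeSet<>(ids);
        this.batchSize = batchSize;
    }

    /** Returns up to batchSize ids greater than the cursor; a null cursor starts at the beginning. */
    List<Integer> scrollGet(Integer cursor) {
        Iterable<Integer> candidates = (cursor == null) ? ids : ids.tailSet(cursor, false);
        List<Integer> batch = new ArrayList<>();
        for (Integer id : candidates) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(id);
        }
        return batch;
    }
}
```

The caller passes the last id of the previous batch as the next cursor, so the source never has to skip over earlier rows by offset.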
Special paging/scrolling query
In both paging and scrolling queries, you must specify a sort order when retrieving data. Sorting a large amount of data can itself put the database under stress. In an ES search scenario, for example, sorting millions of matching records is time-consuming. In addition, the deep paging caused by batch queries over large amounts of data may put further pressure on the database.
One way to solve this problem is to narrow the range of data each query can match: slice the large data set matched by the query condition by ID or time (usually the same dimension as the sort field). Sharding first and then batching reduces the amount of data that takes part in each sort, which reduces the pressure on the database. Another benefit of sharding is that it makes it possible to fetch batches in parallel, especially for paging queries.
Taking the paging interface as an example, the following batch retrieval method adds a Range parameter. Range is essentially a query condition, and from an implementation point of view it could be folded into QueryParam. It is kept as a separate parameter here to show that what the caller ultimately needs is the data matched by the QueryParam conditions, while Range is only a sharding device used during retrieval.
public interface SpecPageGet<T> {
    /** Get data based on the page number and the number of items per page, restricted to the given data range. */
    List<T> specPageGetFunc(int page, int pageSize, Range range, QueryParam queryParam);
}
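The article does not define Range, so the sketch below is only one possible shape for it. IdRange is a hypothetical half-open ID interval with a helper that cuts it into consecutive, non-overlapping shards of a fixed width. Each shard could then be passed as the Range argument of a paged query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical half-open id range [from, to). */
final class IdRange {
    final long from;
    final long to;

    IdRange(long from, long to) {
        this.from = from;
        this.to = to;
    }

    /** Splits this range into consecutive, non-overlapping shards of at most `width` ids. */
    List<IdRange> split(long width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<IdRange> shards = new ArrayList<>();
        for (long start = from; start < to; start += width) {
            // The last shard is clipped so it never extends past the end of the range.
            shards.add(new IdRange(start, Math.min(start + width, to)));
        }
        return shards;
    }

    @Override
    public String toString() {
        return "[" + from + ", " + to + ")";
    }
}
```

A fixed width is the simplest choice. When the data is unevenly distributed, the shards produced this way will hold very different numbers of rows.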
Iterator wrapping
In the scenarios above, the caller needs all the data matched by a condition and retrieves it through the batch interface in multiple non-overlapping batches. We can think of all the matched data as a collection whose elements are returned to the caller in a fixed order. This fits the idea of an iterator very well, so we can wrap the batch interfaces above as iterators.
Paging query iterator wrapper
Utility method
// Static method of IteratorWrapUtils; needs java.util.Collections, java.util.Iterator,
// java.util.List and java.util.function.Function.
/**
 * Wrap the paging interface as an iterator.
 *
 * @param pageGetFunc the page retrieval function; takes the page number and returns that page's list of data
 * @param size the page size, specified by the caller
 * @param <T> the element type
 * @return an iterator that fetches pages lazily
 */
public static <T> Iterator<T> wrapPageGetApiToIterator(
        Function<Integer, List<T>> pageGetFunc,
        int size) {
    return new Iterator<T>() {
        private int page = 0;
        private boolean pageGetHasMore = true;
        private Iterator<T> currentIterator = Collections.emptyIterator();

        // Fetch the next page only when the current one is exhausted and more pages may exist.
        private void tryStorageGet() {
            if (currentIterator.hasNext()) {
                return;
            }
            if (!pageGetHasMore) {
                return;
            }
            List<T> list = pageGetFunc.apply(page);
            // A page shorter than the page size means there is no further page.
            pageGetHasMore = list.size() >= size;
            currentIterator = list.iterator();
            page++;
        }

        @Override
        public boolean hasNext() {
            tryStorageGet();
            return currentIterator.hasNext();
        }

        @Override
        public T next() {
            tryStorageGet();
            return currentIterator.next();
        }
    };
}
Interface wrapper
public interface PageGet<T> {
    int DEFAULT_PAGE_SIZE = 100;

    /** Get data based on the page number and the number of items per page. */
    List<T> pageGetFunc(int page, int pageSize, QueryParam queryParam);

    default Iterator<T> getDataIterator(QueryParam queryParam) {
        return getDataIterator(queryParam, DEFAULT_PAGE_SIZE);
    }

    default Iterator<T> getDataIterator(QueryParam queryParam, int pageSize) {
        // Bind pageSize and queryParam so that the wrapper only has to supply the page number.
        return IteratorWrapUtils.wrapPageGetApiToIterator(
                (page) -> pageGetFunc(page, pageSize, queryParam),
                pageSize
        );
    }
}
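To see how lazily the paging wrapper fetches, the sketch below pairs a condensed copy of it (renamed wrapPages so the example compiles on its own) with a hypothetical CountingBackend that counts how many page requests it receives. Both names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class PagingIteratorDemo {
    /** Condensed copy of the paging wrapper, repeated here so the sketch is self-contained. */
    static <T> Iterator<T> wrapPages(Function<Integer, List<T>> pageGetFunc, int size) {
        return new Iterator<T>() {
            private int page = 0;
            private boolean hasMorePages = true;
            private Iterator<T> current = Collections.emptyIterator();

            private void fetchIfNeeded() {
                if (current.hasNext() || !hasMorePages) {
                    return;
                }
                List<T> list = pageGetFunc.apply(page++);
                hasMorePages = list.size() >= size;
                current = list.iterator();
            }

            @Override
            public boolean hasNext() {
                fetchIfNeeded();
                return current.hasNext();
            }

            @Override
            public T next() {
                fetchIfNeeded();
                return current.next();
            }
        };
    }

    /** Hypothetical backend: serves ids 0..total-1 in zero-based pages and counts calls. */
    static final class CountingBackend {
        final int total;
        final int pageSize;
        int calls = 0;

        CountingBackend(int total, int pageSize) {
            this.total = total;
            this.pageSize = pageSize;
        }

        List<Integer> page(int page) {
            calls++;
            List<Integer> out = new ArrayList<>();
            int end = Math.min((page + 1) * pageSize, total);
            for (int i = page * pageSize; i < end; i++) {
                out.add(i);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend(5, 2);
        Iterator<Integer> it = wrapPages(backend::page, backend.pageSize);
        it.next();
        it.next();
        System.out.println("calls after 2 items: " + backend.calls);   // prints 1
        while (it.hasNext()) {
            it.next();
        }
        System.out.println("calls after draining: " + backend.calls);  // prints 3
    }
}
```

One consequence of the `list.size() >= size` check is that when the total number of items is an exact multiple of the page size, the iterator issues one extra request that returns an empty page before it reports that it is exhausted.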
Scroll query iterator wrapper
Utility methods
// Static method of IteratorWrapUtils; needs java.util.Collections, java.util.Iterator,
// java.util.List and java.util.function.Function.
/**
 * Wrap the scrolling interface as an iterator.
 *
 * @param scrollGetFunc the batch retrieval function; takes a cursor as input and returns a list
 * @param size the batch size, specified by the caller
 * @param <T> the element type
 * @return an iterator that fetches batches lazily
 */
public static <T> Iterator<T> wrapScrollGetApiToIterator(
        Function<T, List<T>> scrollGetFunc,
        int size) {
    return new Iterator<T>() {
        // Cursor for the next request; null requests the first batch.
        private T lastScrollGetItem = null;
        private boolean pageGetHasMore = true;
        private Iterator<T> currentIterator = Collections.emptyIterator();

        // Fetch the next batch only when the current one is exhausted and more batches may exist.
        private void tryStorageGet() {
            if (currentIterator.hasNext()) {
                return;
            }
            if (!pageGetHasMore) {
                return;
            }
            List<T> list = scrollGetFunc.apply(lastScrollGetItem);
            if (!list.isEmpty()) {
                // The last item of this batch becomes the cursor for the next one.
                lastScrollGetItem = list.get(list.size() - 1);
            }
            pageGetHasMore = list.size() >= size;
            currentIterator = list.iterator();
        }

        @Override
        public boolean hasNext() {
            tryStorageGet();
            return currentIterator.hasNext();
        }

        @Override
        public T next() {
            tryStorageGet();
            return currentIterator.next();
        }
    };
}
Interface wrapper
public interface ScrollGet<T> {
    int DEFAULT_PAGE_SIZE = 100;

    /** Get data based on the cursor position. */
    List<T> scrollGetFunc(T cursor, QueryParam queryParam);

    default Iterator<T> getDataIterator(QueryParam queryParam) {
        return getDataIterator(queryParam, DEFAULT_PAGE_SIZE);
    }

    default Iterator<T> getDataIterator(QueryParam queryParam, int pageSize) {
        // pageSize must match the batch size that scrollGetFunc actually returns;
        // if the batches are smaller, iteration stops after the first batch.
        return IteratorWrapUtils.wrapScrollGetApiToIterator(
                (lastItem) -> scrollGetFunc(lastItem, queryParam),
                pageSize
        );
    }
}
Special paging/scrolling query
How can both the sharding and the batching of data retrieval be wrapped into one iterator? The key is to move the sharding process inside the iterator so that sharding is transparent to the caller. How to shard usually depends on the characteristics of the data itself: whether the data is evenly or densely distributed within the specified range, and the maximum number of records the database allows in a sort, both strongly influence the choice of shard ranges. Implement this according to your specific scenario.
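One possible shape for such a wrapper is sketched below. It is not the author's implementation. ShardedIterators.overShards and its parameter names are hypothetical: the method takes an iterator of shards and a function that builds a per-shard iterator, then chains them so the caller sees a single stream.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

class ShardedIterators {
    /**
     * Chains one lazily created iterator per shard into a single iterator.
     * `shards` and `perShard` are hypothetical names, not part of the original interfaces.
     */
    static <R, T> Iterator<T> overShards(Iterator<R> shards, Function<R, Iterator<T>> perShard) {
        return new Iterator<T>() {
            private Iterator<T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Move to the next shard only when the current one is exhausted;
                // the loop also skips shards that turn out to be empty.
                while (!current.hasNext() && shards.hasNext()) {
                    current = perShard.apply(shards.next());
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

With the interfaces above, perShard could be a lambda such as `range -> IteratorWrapUtils.wrapPageGetApiToIterator(page -> specPageGetFunc(page, pageSize, range, queryParam), pageSize)`, so that each shard is paged independently. How the shard ranges themselves are generated still depends on the data, as discussed above.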