Another conjecture of list to map
Java8 uses lambda expressions for functional programming to make it very easy to manipulate collections. A common operation is to convert a List into a Map. The Collectors toMap() method is commonly used. A common problem is that when a list contains the same elements, an exception will be thrown if you do not specify which one to take. Therefore, this designation is required.
Of course, another overloaded method, toMap(), can be specified directly. Here, we want to talk about a different approach: Can distinct() be used to filter out duplicate elements from a list before a map is transferred, and then do not have to worry about duplicate elements when a map is transferred?
Use distinct() to de-duplicate a list
Failed to use distinct() directly
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class ListToMap { @AllArgsConstructor @NoArgsConstructor @ToString private static class VideoInfo { @Getter String id; int width; int height; } public static void main(String [] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // preferred: handle duplicated data when toMap() Map<String, VideoInfo> id2VideoInfo = list.stream().collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); System.out.println("No Duplicated1: "); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); // handle duplicated data using distinct(), before toMap() Map<String, VideoInfo> id2VideoInfo2 = list.stream().distinct().collect( Collectors.toMap(VideoInfo::getId, x -> x) ); System.out.println("No Duplicated2: "); id2VideoInfo2.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); }}Copy the code
There are three elements in the list, two of which we consider to be duplicates. The first transformation uses toMap() to specify the handling of duplicate keys directly, so it can be converted to a Map as normal. The second attempt to unduplicate a list before converting it to a map failed again, throwing IllegalStateException, so distinct() should have failed.
No Duplicated1:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
Exception in thread "main" java.lang.IllegalStateException: Duplicate key ListToMap.VideoInfo(id=123, width=1, height=2)
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1253)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.stream.DistinctOps$1$2.accept(DistinctOps.java:175)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at example.mystream.ListToMap.main(ListToMap.java:79)
Copy the code
Reason: Distinct () depends on equals()
If you look at the APIS for Distinct (), you can see the following description:
Returns a stream consisting of the distinct elements (according to {@link Object#equals(Object)}) of this stream.
Obviously, distinct() objects are de-duplicated according to their equals() method. If our VideoInfo class does not overrride the equals() method of the superclass Object, Object will be used.
But Object’s equals() method returns true only if two objects are exactly the same. The desired effect is to treat two VideoInfo objects as the same as long as the id/width/height of the VideoInfo object are the same. So let’s say we override equals() belonging to videoInfo. Related article: Figure out Equals and hashCode once
Override the note for equals()
We design VideoInfo’s equals() as follows:
@Override public boolean equals(Object obj) { if (! (obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; return this.id.equals(vi.id) && this.width == vi.width && this.height == vi.height; }Copy the code
This way, two videoInfo objects are the same as long as all three of their properties are the same. Happy to run the program, still failed! According to?
Effective Java is a great book. James Gosling, the father of Java, said it was the Java tutorial he needed. In this book, the authors point out that if you override equals() on a class, you must override its hashCode() method as well! Must be! There is no room for negotiation!
The overridden equals() must satisfy the following conditions:
-
Comparing equals(), two objects that are equal must have the same value in hashCode();
-
HashCode () can have the same or different values for two objects that are not equal, compared to equals();
Because these are Java rules, violating these rules will cause Java programs to stop running properly.
For more details, I suggest you read the original book, which will benefit a lot. Highly recommended!
Finally, I designed VideoInfo’s hashCode() method as follows:
@Override
public int hashCode() {
int n = 31;
n = n * 31 + this.id.hashCode();
n = n * 31 + this.height;
n = n * 31 + this.width;
return n;
}
Copy the code
Finally, distinct() successfully filters the duplicate elements in the list, and it is ok to use either toMap() to convert a list to a map:
No Duplicated1:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
No Duplicated2:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
Copy the code
extended
Since distinct() is compared to equals(), it is my understanding that three elements in a list need to be compared at least three times. Is that three calls to equals()?
Add a print to equals() so you know. Equals () equals()
@Override
public boolean equals(Object obj) {
if (! (obj instanceof VideoInfo)) {
return false;
}
VideoInfo vi = (VideoInfo) obj;
System.out.println("<===> Invoke equals() ==> " + this.toString() + " vs. " + vi.toString());
return this.id.equals(vi.id) && this.width == vi.width && this.height == vi.height;
}
Copy the code
Results:
No Duplicated1:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
<===> Invoke equals() ==> ListToMap.VideoInfo(id=123, width=1, height=2) vs. ListToMap.VideoInfo(id=123, width=1, height=2)
No Duplicated2:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
Copy the code
Turns out equals() was only called once. Why not three times? If you think about it, hashCode() is the same as hashCode() only once, i.e. the first element of the list and the third element (both VideoInfo(id=123, width=1, height=2)).
So it’s safe to assume that equals() is called only if hashCode() returns the same hashCode. If even hashCode() returns a different hashCode, then you can assume that the two objects must be different!
Test the conjecture:
Change hashCode() to the following:
@Overridepublic
int hashCode() {
return 1;
}
Copy the code
In this way, all objects return the same hashCode() value. Of course, this is consistent with the Java specification, since Java only states that equals() objects must have the same hashCode, but different objects do not necessarily have different Hashcodes.
Results:
No Duplicated1:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
<===> Invoke equals() ==> ListToMap.VideoInfo(id=456, width=4, height=5) vs. ListToMap.VideoInfo(id=123, width=1, height=2)
<===> Invoke equals() ==> ListToMap.VideoInfo(id=456, width=4, height=5) vs. ListToMap.VideoInfo(id=123, width=1, height=2)
<===> Invoke equals() ==> ListToMap.VideoInfo(id=123, width=1, height=2) vs. ListToMap.VideoInfo(id=123, width=1, height=2)
No Duplicated2:
<123, ListToMap.VideoInfo(id=123, width=1, height=2)>
<456, ListToMap.VideoInfo(id=456, width=4, height=5)>
Copy the code
Sure enough, equals() was called three times! It does appear that equal() is called only when hashCode is the same to further determine whether two objects are identical; If the hashCode is different, the two objects are obviously different. The conjecture is correct.
Search the public account “AIO life”, reply “1024”, send you a Java treasure book ebook. PDF
conclusion
-
ToMap () is recommended for list toMap, and it is easy but beneficial to specify a trade-off rule for duplicates, regardless of whether duplicates occur.
-
Using distinct() for a custom class, be sure to override the equals() method;
-
Override equals(), be sure to override hashCode();
-
While it’s possible to design a hashCode() that simply returns 1 without violating Java rules, doing so can have a number of consequences. For example, when such objects are stored in a hashMap, all objects have the same hashCode, and eventually all objects are stored in the same bucket of the hashMap, directly deteriorating the hashMap into a linked list. Thus, the complexity of O(1) is rounded into O(n), which naturally degrades performance greatly.
-
A good book is a ladder to progress. — Gorky. Effecctive Java, for example.
Final reference procedure:
package example.mystream;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.ToString;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class ListToMap {
@AllArgsConstructor
@NoArgsConstructor
@ToString
private static class VideoInfo {
@Getter
String id;
int width;
int height;
public static void main(String [] args) {
System.out.println(new VideoInfo("123", 1, 2).equals(new VideoInfo("123", 1, 2)));
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof VideoInfo)) {
return false;
}
VideoInfo vi = (VideoInfo) obj;
return this.id.equals(vi.id)
&& this.width == vi.width
&& this.height == vi.height;
}
/**
* If equals() is override, hashCode() must be override, too.
* 1\. if a equals b, they must have the same hashCode;
* 2\. if a doesn't equals b, they may have the same hashCode;
* 3\. hashCode written in this way can be affected by sequence of the fields;
* 3\. 2^5 - 1 = 31\. So 31 will be faster when do the multiplication,
* because it can be replaced by bit-shifting: 31 * i = (i << 5) - i.
* @return
*/
@Override
public int hashCode() {
int n = 31;
n = n * 31 + this.id.hashCode();
n = n * 31 + this.height;
n = n * 31 + this.width;
return n;
}
}
public static void main(String [] args) {
List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2),
new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2));
// preferred: handle duplicated data when toMap()
Map<String, VideoInfo> id2VideoInfo = list.stream().collect(
Collectors.toMap(VideoInfo::getId, x -> x,
(oldValue, newValue) -> newValue)
);
System.out.println("No Duplicated1: ");
id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">"));
// handle duplicated data using distinct(), before toMap()
// Note that distinct() relies on equals() in the object
// if you override equals(), hashCode() must be override together
Map<String, VideoInfo> id2VideoInfo2 = list.stream().distinct().collect(
Collectors.toMap(VideoInfo::getId, x -> x)
);
System.out.println("No Duplicated2: ");
id2VideoInfo2.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">"));
}
}
Copy the code
To develop
Assume that the class belongs to someone else and cannot be modified
Above, VideoInfo makes it our own class to which we can add equals() and hashCode() methods. If VideoInfo is a class in a dependency we reference and we have no right to modify it, is there no way to use distinct() to custom filter objects based on whether some elements are the same?
Use the wrapper
One possible answer to StackOverflow is to use a wrapper.
Assuming that in a dependency (we have no right to modify the class), VideoInfo is defined as follows:
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class VideoInfo {
@Getter
String id;
int width;
int height;
}
Copy the code
Using the wrapper idea, write the program as follows (of course, for the sake of the program’s runnability, VideoInfo is still included, assuming that it cannot be modified and no methods can be added to it) :
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class DistinctByWrapper { private static class VideoInfoWrapper { private final VideoInfo videoInfo; public VideoInfoWrapper(VideoInfo videoInfo) { this.videoInfo = videoInfo; } public VideoInfo unwrap() { return videoInfo; } @Override public boolean equals(Object obj) { if (! (obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; return videoInfo.id.equals(vi.id) && videoInfo.width == vi.width && videoInfo.height == vi.height; } @Override public int hashCode() { int n = 31; n = n * 31 + videoInfo.id.hashCode(); n = n * 31 + videoInfo.height; n = n * 31 + videoInfo.width; return n; } } public static void main(String [] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // VideoInfo --map()--> VideoInfoWrapper ----> distinct(): VideoInfoWrapper --map()--> VideoInfo Map<String, VideoInfo> id2VideoInfo = list.stream() .map(VideoInfoWrapper::new).distinct().map(VideoInfoWrapper::unwrap) .collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); } } /** * Assume that VideoInfo is a class that we can't modify */ @AllArgsConstructor @NoArgsConstructor @ToString class VideoInfo { @Getter String id; int width; int height; }Copy the code
The whole wrapper idea is simply to construct another class VideoInfoWrapper and add hashCode() and equals() to the wrapper so that the Wrapper objects can be filtered according to custom rules.
Search the public account “AIO life”, reply “1024”, send you a Java treasure book ebook. PDF
We can’t custom filter VideoInfo, but we can custom filter VideoInfoWrapper.
All you need to do is convert all of VideoInfo to VideoInfoWrapper, filter out some Of the VideoInfoWrapper, and transfer the rest of the VideoInfoWrapper back to VideoInfo to filter VideoInfo. Very clever!
Replace distinct() with “filter() + custom functions”
A more subtle way to do this is to customize a function:
private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
Map<Object, Boolean> map = new ConcurrentHashMap<>();
return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
Copy the code
The input element is of type T and its superclass, keyExtracctor is a mapping function that returns Object, and the whole function passed in should be to extract keys. The distinctByKey function returns a Predicate function of type T.)
This function passes in a function (lambda) that fetches the key from the passed object and attempts to place the key in the concurrentHashMap. If the key is inserted, the function returns false. If it cannot be inserted, the key is the same as a previous key, and the function returns true.
This function ends up as an input parameter to filter(). According to the Java API, the filter(func) rule is as follows: If func is true, the filter is performed. Otherwise, the filter is not performed. Therefore, with filter() + custom function, any duplicate key returns true and is filtered out by filter(), leaving nothing duplicate.
The final implementation program is as follows:
package example.mystream;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.ToString;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
public class DistinctByFilterAndLambda {
public static void main(String[] args) {
List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2),
new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2));
// Get distinct only
Map<String, VideoInfo> id2VideoInfo = list.stream().filter(distinctByKey(vi -> vi.getId())).collect(
Collectors.toMap(VideoInfo::getId, x -> x,
(oldValue, newValue) -> newValue)
);
id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">"));
}
/**
* If a key could not be put into ConcurrentHashMap, that means the key is duplicated
* @param keyExtractor a mapping function to produce keys
* @param <T> the type of the input elements
* @return true if key is duplicated; else return false
*/
private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
Map<Object, Boolean> map = new ConcurrentHashMap<>();
return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
}
/**
* Assume that VideoInfo is a class that we can't modify
*/
@AllArgsConstructor
@NoArgsConstructor
@ToString
class VideoInfo {
@Getter
String id;
int width;
int height;
}
Copy the code
END
Original is not easy, burning hair output content, if there is a lost harvest, a praise to encourage it! There is also a little surprise, to help you sort out some technical e-books, follow the public account reply “1024” can be obtained ~