Recently, an interesting young colleague joined the team and submitted a great deal of code. The git history shows that much of it was refactored to use Java 8 syntax features, most commonly map and flatMap. Reactions were mixed: some felt the code had become easier to understand, but many found it more obscure. To them it was like taking off your pants to fart: an extra step that achieves nothing.
I think the places where these functions appear can be divided into three categories, depending on the level they operate at. They are everywhere.
Don’t overuse
I don’t know when these functions became popular, but they must be very closely related to functional programming. I think it started with Scala in 2004.
There is nothing magical about them. They are all syntactic sugar, and their purpose is to make your programs simpler. You could do the same thing with a little more code if you wanted to. Don't use them deliberately to show off your skills; used badly, they have a very negative effect. Java, for example, is not a functional programming language, so lambda is just an aid, and if you write Lisp-style code in Java, it ends up neither fish nor fowl.
But languages inevitably borrow from one another; that is the trend. Instead of looking at the design behind these functions, let's take a horizontal look at what they mean at the semantic surface of the API.
Let's start with what they have in common (note: these are logical commonalities and do not hold in every scenario), and then take a few typical implementations to see how programmers around the world put them to use.
These abstract concepts
The object these functions operate on is something called a stream. What exactly is a stream? Please forgive my unprofessional explanation.
Whether at the language level or as a distributed data structure, it is essentially a simple array. Sometimes it really is a simple array; sometimes it is a distributed array spread across multiple machines. For the rest of this article, we refer to them collectively as array streams.
They fall roughly into two categories.
Language level: for example, Java Stream
Distributed: for example, Spark RDD
They all have the following important points.
Functions can be arguments
C handles this fine, of course: you can pass functions in as pointers. But until not long ago, Java could only do it in a roundabout way (using the Java concept of a class to simulate a function, which is why you see lots of odd Java classes such as Func1 and Func0).
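To make the contrast concrete, here is a minimal sketch of the same function passed as an argument in both styles. The class and method names (FunctionArgDemo, applyTwice) are made up for illustration, and the anonymous class is written against java.util.function.Function only for brevity; older code bases used their own interfaces for this.

```java
import java.util.function.Function;

final class FunctionArgDemo {
    // Applies whatever function the caller passes in, twice in a row.
    static int applyTwice(Function<Integer, Integer> fn, int value) {
        return fn.apply(fn.apply(value));
    }

    public static void main(String[] args) {
        // Roundabout style: an anonymous class stands in for the function.
        Function<Integer, Integer> addThreeVerbose = new Function<Integer, Integer>() {
            @Override
            public Integer apply(Integer x) {
                return x + 3;
            }
        };
        // Java 8 style: the same function written as a lambda.
        Function<Integer, Integer> addThreeLambda = x -> x + 3;

        System.out.println(applyTwice(addThreeVerbose, 1)); // 7
        System.out.println(applyTwice(addThreeLambda, 1));  // 7
    }
}
```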
Being able to pass functions as arguments is a prerequisite for concise code. Our usual approach to programming is to do things step by step.
array = new Array()
array = func1(array)
if(func2(array)){
    array = func3(array)
}
array = func4(array)
If functions can be arguments, I can lay the operations out flat as a single chain. The code above then turns into something like this.
array = new Array()
array.stream()
    .map(func1)
    .filter(func2)
    .flatMap(func3)
    .sorted(func4)
    ...
The programming model changes completely, and each function name now carries meaning.
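As a runnable illustration of the two styles, the sketch below doubles each number, keeps the values greater than 4, and sorts the rest, first step by step and then as one chain. The class name PipelineDemo and the particular steps are assumptions chosen for the example, not something taken from the colleague's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class PipelineDemo {
    // Step-by-step style: every stage is a separate statement with its own intermediate list.
    static List<Integer> stepByStep(List<Integer> input) {
        List<Integer> doubled = new ArrayList<>();
        for (int x : input) {
            doubled.add(x * 2);
        }
        List<Integer> kept = new ArrayList<>();
        for (int x : doubled) {
            if (x > 4) {
                kept.add(x);
            }
        }
        Collections.sort(kept);
        return kept;
    }

    // Chained style: the same stages, each named by the function that performs it.
    static List<Integer> chained(List<Integer> input) {
        return input.stream()
                .map(x -> x * 2)
                .filter(x -> x > 4)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 3, 2);
        System.out.println(stepByStep(input)); // [6, 10]
        System.out.println(chained(input));    // [6, 10]
    }
}
```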
sequential & parallel
If our array stream is very large, then even on a single machine there are two ways to process it: sequentially and in parallel.
In general, you switch to parallel processing by calling the parallel function. For most local operations, parallel is not necessarily faster. Java uses ForkJoin for this, and as for the cost of coordinating threads, well, you know…
For distributed data streams, which are inherently parallel, this switch makes little sense.
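A small sketch of the switch in Java, assuming a hypothetical helper called sumOfSquares. It only shows that both modes give the same result; whether parallel is actually faster depends on the data size and the work per element, and it has to be measured.

```java
import java.util.stream.LongStream;

final class ParallelDemo {
    // Sums the squares of 1..n, either sequentially or via parallel().
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            // Same pipeline, but the work is split across the common ForkJoin pool.
            range = range.parallel();
        }
        return range.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // 333833500
        System.out.println(sumOfSquares(1000, true));  // 333833500
    }
}
```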
Function types
There are generally two types of functions that operate on a data stream.
Transformation
Action
Transformations. These are typically lazy: they only take part in the computation once an action is executed. So you can think of transformations as a queue of buffered operations. Typical examples are map and flatMap. They are strung together like kebabs on a skewer, waiting to be eaten.
Actions. An action is what actually triggers the code to run; the queued transformations then pour through like water through an opened floodgate. reduce is a typical example.
The description above does not always hold. Python's map, for example, used to run immediately and return a list (Python 2; in Python 3 it returns a lazy iterator). Rather awkward.
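The laziness is easy to observe in Java. The sketch below counts how often the function passed to map has been called before and after the action (collect) runs; LazyDemo and countCalls are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyDemo {
    // Returns {calls made before the action, calls made after the action}.
    static int[] countCalls(List<Integer> input) {
        AtomicInteger calls = new AtomicInteger();
        // Transformation: only recorded here, nothing runs yet.
        Stream<Integer> pipeline = input.stream().map(x -> {
            calls.incrementAndGet();
            return x * 10;
        });
        int before = calls.get();
        // Action: collect() pulls every element through the pipeline.
        List<Integer> result = pipeline.collect(Collectors.toList());
        int after = calls.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countCalls(Arrays.asList(1, 2, 3));
        System.out.println(counts[0] + " -> " + counts[1]); // 0 -> 3
    }
}
```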
map & reduce
When it comes to map and reduce, Hadoop comes to mind. However, they are not just a big data concept.
The following two lines describe the two concepts.
map
The passed function is applied to each element of the sequence in turn, and the result is returned as a new array stream.
reduce
Reduce is similar to recursion: it keeps combining elements until only a single value is left. Look at this formula, which holds when fn can be applied in any order and grouping (addition, for example) 🙂
reduce([p1,p2,p3,p4],fn) = reduce([fn(p2,p4),fn(p1,p3)],fn)
For details, see Google's classic paper.
MapReduce: Simplified Data Processing on Large Clusters AI.Google/Research/PU…
Can you access it? 🙂
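A local sketch of both ideas in Java: map turns each word into its length, and reduce folds the lengths into one value. The two helper methods leftToRight and regrouped are hypothetical names that spell out the two sides of the formula above; they agree for functions like addition or max, and they are not expected to agree for something like subtraction.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

final class MapReduceDemo {
    // map: word -> length, reduce: add all lengths into one number.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // Left side of the formula: reduce the four values in order.
    static int leftToRight(int p1, int p2, int p3, int p4, BinaryOperator<Integer> fn) {
        return Stream.of(p1, p2, p3, p4).reduce(fn).get();
    }

    // Right side of the formula: combine (p2, p4) and (p1, p3) first, then combine the two results.
    static int regrouped(int p1, int p2, int p3, int p4, BinaryOperator<Integer> fn) {
        return fn.apply(fn.apply(p2, p4), fn.apply(p1, p3));
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("map", "reduce", "flatMap"))); // 16
        System.out.println(leftToRight(1, 2, 3, 4, Integer::sum)); // 10
        System.out.println(regrouped(1, 2, 3, 4, Integer::sum));   // 10
        // Subtraction depends on order and grouping, so the two sides differ.
        System.out.println(leftToRight(1, 2, 3, 4, (a, b) -> a - b)); // -8
        System.out.println(regrouped(1, 2, 3, 4, (a, b) -> a - b));   // 0
    }
}
```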
map & flatMap
These two functions are often used. They have the following differences:
map
The provided function is applied to each value in the array stream, one by one. The result is an array stream with the same number of elements.
flatMap
Flat means flattened. Like map, it applies the provided function to each value in the array stream, one by one, except that each result is itself a sub-array stream. These sub-streams are then merged into a single array stream, so the number of elements will most likely differ from that of the original.
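The difference in element counts can be seen with a small Java sketch. Splitting lines into words is just an example workload, and FlatMapDemo, mapOnly, and flatMapped are names invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapDemo {
    // map: one output per input line, each output being an array of words.
    static List<String[]> mapOnly(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" "))
                .collect(Collectors.toList());
    }

    // flatMap: each line becomes a sub-stream of words, and the sub-streams are merged.
    static List<String> flatMapped(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c d e");
        System.out.println(mapOnly(lines).size()); // 2
        System.out.println(flatMapped(lines));     // [a, b, c, d, e]
    }
}
```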
Programmers in action
Java 8 Stream
Java 8 added a new abstraction called Stream. Combined with lambda syntax, it can make your code extremely clean (in case you haven't noticed, it looks almost like Scala).
A very good guide: stackify.com/streams-gui…
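As a taste of the API, here is a minimal word count on a local list using only the standard library. The class name LocalWordCount and the tokenizing rule (lower-case, split on non-word characters) are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LocalWordCount {
    // Splits every line into lower-case words and counts how often each word occurs.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() can yield an empty first token when a line starts with punctuation
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(Arrays.asList("To be or not", "to be"));
        System.out.println(counts.get("to"));  // 2
        System.out.println(counts.get("not")); // 1
    }
}
```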
Spark RDD operations
The core data model of Spark is the RDD (Resilient Distributed Dataset); the dependencies between RDDs form a directed acyclic graph. An RDD represents an immutable, partitioned collection whose elements can be computed in parallel. It is distributed, but the API looks much the same, as the following WordCount example shows.
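// sc is an existing JavaSparkContext
JavaRDD<String> textFile = sc.textFile("hdfs://...");
JavaPairRDD<String, Integer> counts = textFile
    // split each line on spaces into words
    .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
    // pair every word with an initial count of 1
    .mapToPair(word -> new Tuple2<>(word, 1))
    // add up the counts per word
    .reduceByKey((a, b) -> a + b);
counts.saveAsTextFile("hdfs://...");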
What a familiar API; you must have seen something like it in Hadoop.
Flink DataStream
Flink programs are regular programs that apply transformations to distributed collections (e.g., filtering, mapping, updating state, joining, grouping, defining windows, aggregating). A DataStream program in Flink implements such transformations on data streams.
Let’s also look at a piece of its code.
// text is a DataStream<String> of input lines; Tokenizer is a user-defined FlatMapFunction
DataStream<Tuple2<String, Integer>> counts =
    // split up the lines in pairs (2-tuples) containing: (word,1)
    text.flatMap(new Tokenizer())
    // group by the tuple field "0" and sum up tuple field "1"
    .keyBy(0).sum(1);
Kafka Streams operations
Kafka has become a distributed stream processing platform. It provides two abstractions, KStream and KTable, which are similar to Spark's RDD and offer similar operations.
A KStream can be thought of as the changelog of a KTable, where each record in the data stream corresponds to one update in the database.
Let’s take a look at a piece of code.
KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
    .groupBy((key, value) -> value)
    .count();
wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));
RxJava
RxJava is an asynchronous task framework based on the observer pattern. It is often seen in Android development (and increasingly on the server side).
RxJava has made some innovations at the language level and has a devoted following.
Lambda at the language level
Of course Haskell, a functional programming language by nature, carries a halo of its own. But other languages, both scripting and compiled, have absorbed these ideas too.
Collectively, they are called lambda.
Python
Python, the most popular scripting language, also has its lambda syntax. Basic functions such as map, reduce, and filter exist as well (in Python 3, reduce lives in functools).
JavaScript
We can't leave out JS either: just look at the Array.prototype.* methods. It has everything you need.
End
There are many, many more, too many to list. So, can these functions be patented? I like them very much, though I seldom use them.