Array.prototype.sort sort principle

Sort (array.prototype.sort ()); array.prototype.sort ();

arr.sort([compareFunction]);
Copy the code

The sort method takes an optional sort callback, compareFunction, which has two arguments for comparison. If omitted, the element is sorted by the Unicode loci of each character in the converted string. The whole sort method returns the sorted array, but the original array has been sorted in place.

Sort () : array.prototype.sort () : array.prototype.sort () : array.prototype.sort () : array.prototype.sort () : array.prototype.sort ()

ECMA262

ECMA262 defines the size relation as follows:

Size relation comparison text

Comparing x < y (where x and y are values) yields true, false, or undefined (indicating that at least one operand is a NaN). In addition to x and y, the algorithm takes a Boolean flag named LeftFirst as a parameter. This flag is used to control the order in which operations with potential side effects are performed on X and Y. This is necessary because ECMAScript specifies left-to-right evaluation of expressions. The default value of LeftFirst is true, indicating that the expression is evaluated first at the x parameter position. If LeftFirst is false, the opposite is true and the operation must now be performed on y. The comparison is performed as follows:

ifLeftFirstMark istrue,
1. letpx 为 ? ToPrimitive(x, number).
2. letpy 为 ? ToPrimitive(y, number).
Otherwise,
1. letpy 为 ? ToPrimitive(y, number).
2. letpx 为 ? ToPrimitive(x, number).
ifType(px) 为 StringandType(py) 为 String,
1. ifIsStringPrefix(py, px) 为 trueTo return tofalse.
2. ifIsStringPrefix(px, py) 为 trueTo return totrue.
3. makekIs the smallest non-negative integer that can makepx 上 kThe code unit at the index andpy 上 kThe code unit at the index is different.
4. letm 为 px 上 kThe code unit integer value at the index.
5. letn 为 py 上 kThe code unit integer value at the index.
6. ifm < nTo return totrueOtherwise, returnfalse.
Otherwise,
1. ifType(px) 为 BigIntandType(py) 为 String,
  1. letny 为 ! StringToBigInt(py).
  2. ifny 为 NaNTo return toundefined.
  3. returnBigInt::lessThan(px, ny).
2. ifType(px) 为 StringandType(py) 为 BigInt,
  1. letnx 为 ! StringToBigInt(px).
  2. ifnx 为 NaNTo return toundefined.
  3. returnBigInt::lessThan(nx, py).
3. letnx 为 ! ToNumeric(px).
4. letny 为 ! ToNumeric(py).
5. ifType(nx) 与 Type(ny)Same, returnType(nx)::lessThan(nx, ny).
6. Assertion:Type(nx) 为 BigIntandType(ny) 为 NumberOr,Type(nx) 为 NumberandType(ny) 为 BigInt.
7. ifnx 或 ny 为 NaNTo return toundefined.
8. ifnx 为 - up 𝔽 或 ny 为 + up 𝔽To return totrue.
9. ifnx 为 + up 𝔽 或 ny 为 - up 𝔽To return tofalse.
10. ifℝ (nx) < ℝ (ny)To return totrue; Otherwise returnsfalse.

Note: String comparisons use a simple lexicographical order of code unit values. There is no attempt to use the more complex, semantic-oriented definition of character or string equality and collation order defined in the Unicode specification. Therefore, string values that are equal according to the Unicode standard specification may be tested as unequal. In effect, the algorithm assumes that both strings are already in normalized form. Also note that for strings containing supplementary characters, the lexicographical order of the UTF-16 code unit value sequence is different from the lexicographical order of the code point value sequence.

Summary of size relation comparison

The steps above x < y are briefly summarized as follows:

ifx 和 yforStringThe type,
1. ifx 是 y 或 y 是 xThe prefix of, the short one is small
2. Otherwise the comparisonx 和 y 的 Code unitsThe size of the value
Otherwise,x 和 yAre converted todigitalType and then compare the size if it occurs during conversionNaN, it returnsundefined(But it isOther chaptersAs mentioned in theundefinedintofalse)

Note: the encoding format of JS is UTF-16, so all characters are represented by 16-bit binaries. When the code point is too large, such as some Chinese characters and various emoji characters, two 16-bit binaries are needed to represent, that is, two code units are used to represent. (Specific learning can be seen in Teacher Ruan’s blog).

With an understanding of the standards given by ECMA262, you can understand the size comparison in the following code:

/** * the prefix */ is first compared between strings
'lin' < 'linjingyi'; // if x is a prefix of y, true (3.2)
'linjingyi' < 'lin'; // y is a prefix of x, false (3.1)

/** * compare code units */ between strings
'm' < 'n'; // Code unit value, 006d < 006e is true (3.6, k is 0)
'm' < 'l'; // Code unit value, 006d < 006c is false (3.6, k is 0)
'you' < '我'; // code unit value, 4f60 < 6211 is true (3.6, k is 0)
'ml' < 'mn'; // Code unit value, 006d 006c < 006d 006e is true (3.6, k is 1)
'hello' < 'I'; // code unit value, 4f60 597d < 6211 597d is true (3.6, k is 1)
'hello' < '我'; 4f60 597d < 6211 = true (3.6, k = 1)
'10' < '2'; // This is also a string comparison, so it is also a code unit value comparison, 0031 0030 < 0032 is true (3.6, k is 0)
'🍎' < '😙'; D83c df4e < d83D de19 = true (3.6, k = 0)
'😍' < '😙'; D83d de0d < d83d de19 = true (3.6, k = 1)

/** * string and number comparison first type conversion */
'10' < 2; // 10 < 2 is false (4.1.3)
'm' < 2 // m is converted to the number NaN, NaN < 2 is false (4.1.2)


* undefined -> NaN * NULL -> 0 * true -> 1 * false -> 0 * String -> Corresponding number or NaN * Object -> call toString toString -> corresponding number or NaN */
null < 1 // 0 < 1, true (4.5)
false < 1 // 0 < 1, true (4.5)
undefined < 1 // NaN < 1, false (4.7)
true < 2 // 1 < 2, true (4.5)
true < '2' // 1 < 2, true (4.5)
null < true // 0 < 1, true (4.5)
({}) < ([]) // '[object object]' < ', false (4.5)
({}) < ([1.2.3]) // '[object object]' < '1,2,3', false (4.5)
({}) < (['hello'.2.3]) // '[object object]' < 'hello, 2,3' equivalent to '[' < 'you' equivalent to 005b < 4f60, true (4.5)
Copy the code

Now that you understand how the size comparison works, you can move on. The array.prototype.sort () section is also found in the ECMA262 standard. The prototype.sort() section is found in the ECMA262 standard.

Array.prototype.sort(comparefn)

Sort elements in an array. Sorting must be stable (that is, equal elements must retain their original order). If comparefn is not undefined, it should be a function that takes two arguments x and y. The function returns a negative number if x < y, a positive number if x > y, and 0 otherwise.

When the sort method is called, the following steps are performed:

ifcomparefnDon’t forundefinedAnd,IsCallable(comparefn)returnfalseThrown,TypeErrorError.
letobj 为 ? ToObject(this value).
letlen 为 ? LengthOfArrayLike(obj).
letitemsAs a new emptyList.
letk 为 0.
Cycle, whilek < len.
1. letPk 为 ! The ToString (𝔽 (k)).
2. letkPresent 为 ? HasProperty(obj, Pk).
3. ifkPresent 为 true,
  1. letkValue 为 ? Get(obj, Pk).
  2. addkValue 到 itemsIn the.
4. setk 为 k + 1.
letitemCount 为 itemsThe number of elements of.
By the specificThe engine implementation decides for itselfSort method and callSortCompareCome to the rightitemsSort. If the call returns oneAbnormal complete, stops further callsSortCompareOr stop the next step in the algorithm and return to completion.
letj 为 0.
Cycle, whilej < itemCount.
1. perform? Set(obj, ! The ToString (𝔽 (j)), the items [j], true).
2. setj 为 j + 1.
Cycle, whilej < len.
1. perform? DeletePropertyOrThrow(obj, ! The ToString (𝔽 (j))).
2. setj 为 j + 1.
returnobj.

Sort order refers to the order in which the obj attribute values whose integer index is less than len are sorted. The result of sort is determined as follows:

If any of the following conditions are true, the sort order is determined by the particular engine implementation itself:

If comparefn is not undefined and is not a consistent comparison function for the Items element (see below).
If comparefn is undefined and SortCompare is not a consistent comparison function.
If compareFn is undefined and all applications of ToString (passing any particular value argument to SortCompare) do not produce the same result.

Unless the sorting order is specified above as determined by the engine implementation, items must satisfy all of the following conditions after performing step 8 of the algorithm above:

There has to be some less thanitemCountA mathematical arrangement of the nonnegative integers ofPI., in the permutation, for each less thanitemCountIs a non-negative integerjElements,old[j] 与 New [PI (j)]The same.
And then, for all less thanitemCountIs a non-negative integerj and kIf theSortCompare(old[j], old[k]) < 0,"PI PI (j) (k).

Where old[j] refers to items before step 8 and new[j] refers to items after Step 8.

The comparefn function is a consistent comparison function, which is defined as follows:

Specify that a, b, and c are all values in set S (values may be equal), a

CF B Indicates compareFN (a, b) > 0.

Pass a specific pair of valuesa 和 bTo invoke thecomparefn(a, b)Always return the same valuev. In addition,Type(v) 为 NumberAnd,vDon’t forNaN. Note that this also means that for a given pair of valuesa 和 bOn the conditiona <CF b.a =CF b 和 a >CF bOne and only one of them must be true.
callcomparefn(a, b)Not modifyobj 或 objAny object on the prototype chain.
a =CF a(Reflexivity)
ifa =CF b,b =CF a(Symmetry)
ifa =CF b 且 b =CF c,a =CF c (=CFTransitivity of
ifa <CF b 且 b <CF c,a <CF c (<CFTransitivity of
ifa >CF b 且 b >CF c,a >CF c (>CFTransitivity of

NOTE 1 A sufficient and necessary condition for the above condition is that CompareFN divides the set S into equivalence classes and ensures that these equivalence classes are completely ordered.

NOTE 2 The sort function is intentionally generic; It doesn’t need its this value to be an Array object. Therefore, it can be moved as a method to other types of objects.

SortCompare(x, y)

The abstract operation SortCompare takes arguments x and y. It also has access to the comparefn argument that the sort method is currently calling. When called, it performs the following steps:

ifx 和 yforundefinedTo return to+ 0 𝔽.
ifx 为 undefinedTo return to1 𝔽.
ify 为 undefinedTo return to1 𝔽.
ifcomparefnDon’t forundefined,
1. letv 为 ? ToNumber(? Call(comparefn, undefined, « x, y »).
2. ifv 为 NaNTo return to+ 0 𝔽.
3. returnv.
letxString 为 ? ToString(x).
letyString 为 ? ToString(y).
letxSmallerTo carry outAbstract Relational Comparison xString < yStringResults.
ifxSmaller 为 trueTo return to1 𝔽.
letySmallerTo carry outAbstract Relational Comparison yString < xStringResults.
ifySmaller 为 trueTo return to1 𝔽.
return+ 0 𝔽.

NOTE 1 Because non-existent attributes are always greater than undefined and undefined is always greater than any other value, undefined is always at the end of the result when sorting, followed by non-existent attributes.

NOTE 2 the method calls performed by the ToString abstract operation in Steps 5 and 6 May cause SortCompare not to behave as a consistent comparison function.

Summary of sort specification

The text of the ECMA262 specification is complex, but the two sections as a whole highlight the following points:

SortCompare(x, y) SortCompare(x, y)

If onlyx 为 undefinedTo return to1; If onlyy 为 undefinedTo return to- 1; If isundefinedTo return to+ 0(No callback function is executed at allcomparefn)
If a callback function is passed incomparefnExecutes the callback function and returns after digitization (ifNaNreturn+ 0)
If no callback function is returnedcomparefn,x 和 yStringing, and then performing interstringSize comparisonIf thexLittle return1If theyLittle return- 1, equal returns+ 0

Array.prototype.sort(comparefn)

sortWill modify the original array
sortTo return the modified array
sortThe choice of sorting algorithm isJSThe engine takes care of itself
sortFunctions are generic and can be used in other array-like objects
comparefnIt has to be a uniform comparison function, reflexive, symmetric and transitive

If the value is undefined, the comparison will not be performed at all and the single comparison will be returned, and all undefined will be placed last after sorting.

V8 engine implementation

Since ECMA262 provides the engine with a choice when implementing sort, let’s dive into the source code and see how the most used V8 implements it.

V8 engine is now changed to Google research tQ language to achieve, sort page fully 1400 lines, the implementation of the core algorithm is timsort sorting method. Tim Sort, the fastest sort Used in V8 and Python

Tim sort, the fastest sort used in v8 and Python

Tim Sort was created in 2002 by Tim Peters, one of Python’s major contributors, to improve Python list sorting performance. It is a combination of algorithms implemented by analyzing real-world data. The best time complexity O(n) and the worst time complexity O(nlogn) are much better than the worst time complexity O(n²) of fast sorting.

Binary insertion sort and merge sort

In the implementation, binary insertion sort is used if the amount of data is small. The following figure shows the insertion sort and binary insertion sort:

When there is a large amount of data, the idea of merge sort is used to split the long list into smaller runs and then merge them as shown in the following figure:

Divided into runs

In the traditional merging algorithm, two data are usually divided into a set of RUN. However, Tim found in the experiment that properly increasing the initial size of run would lead to significant performance improvement, but each group should not be too large, and it is appropriate to set the group at about 32 ~ 64.

At the same time, he found that in real-world data, there are many pieces of data that are sorted in ascending or descending order, requiring no additional sorting. So he decided to divide runs with a “natural” size — the size of the initial runs is not a fixed number, but a minimum length requirement — “min run”.

Here are the guidelines:

By comparing the first twoitemSet up adescending flagIf there is only oneitemIs set tofalse.
Remainder of the cycleitemsAnd check if they are still inascending 或 Strictly descendingAlign until reversed.
Now we have oneascending 或 Strictly descendingArrangement ofrun. ifrunIf it is strictly descending, reverse it.strictIn order to maintain the stability of the algorithm.
And then check thisrunIs the length ofmin run“. If not, the rest is compensateditemsTo achieve the minimum length and perform binary insertion sort starting from the compensation item.

Here is the pseudo-code for the JS implementation:

let nremaining = list.length;
const runs = [];
const minrun = merge_compute_minrun(nremaining);

function count_run(list) {
  let descending = false;
  let n = 1;

  if (list.length === 1) return n;
  if (list[n] < list[n - 1]) {
    descending = true;
  }
  n++;
  while (n < list.length && (list[n] < list[n - 1) | |! desending)) { n++; }return {
    size: n,
    descending,
  };
}

do {
  const startIndex = list.length - nremaining;
  let { size, descending } = count_run();
  if (descending)
    reverse(
      list,
      // start index
      startIndex,
      // length to revert
      size
    );
  if (n < minrun) {
    const force = nremaining <= minrun ? nremaining : minrun;
    binarysort(
      list,
      // sort start index
      startIndex,
      // sort length
      force,
      // index to start binary search
      list.length - nremaining + size
    );
    size = force;
  }
  nremaining -= n;
  runs.push({ startIndex, size });
} while (nremaining);
Copy the code

To make the merge operation more balanced, it is better to have a power of 2 runs. So Tim keeps dividing the total length by 2 until the value is less than 64 and rounding up or down to get min run:

function merge_compute_minrun(n) {
  var r = 0; /* becomes 1 if any 1 bits are shifted off */
  while (n >= 64) {
    r |= n & 1;
    n >>= 1;
  }
  return n + r;
}
Copy the code

Balanced merge

The classic merge sort simply merges two neighboring runs. Because you need to compare the size of the elements in two ordered arrays one by one, and then store the combined list in memory, it takes up quite a bit of memory. Tim found that there are ways to deal with this unnecessary memory waste.

In order to achieve this goal, the first question that needs to be solved is which two runs of the following have the best combined performance?

Suppose there are three runs:

A: 1,000 elements
B: 2,000 elements
C: 1000 elements

Obviously merging A and C first would be more balanced. However, there is A problem. If there is A same element in A, B and C, the order of the elements will be changed when A and C are combined first, and then the order of the elements will be changed and the stability of the algorithm will be affected.

Therefore, Tim sort still only merges two neighboring runs, but in a more subtle way:

Compare the length of 3 adjacent runs. If they meet the following rules:

A > B + C
B > C
Copy the code

That means that B and C are relatively small runs. So merge the smaller runs first, then get closer to the length of A, and continue merging.

If there are still a few runs left after the above, merge in the normal order.

Merge using gallop mode

When you try to merge large runs, it can take a long time to find the right placement because they are all ordered. Such as:

A: [0, 2, 4, 6, 8, 10, …, 100]
B: [57, 77, 97, …]

Take B-57 and A-0, 57 will compare 58/2 = 29 times to find the correct position. And then the next one, B minus 77, I have to compare 78 minus 58 over 2 is equal to 10 times.

Not only is the comparison itself meaningless, it also wastes a lot of extra memory. So Tim Sort uses Gallop to speed up the process. While the complexity is the same as binary search, Gallop is better at finding data not far from where the search started.

It first selects a location (not necessarily a leading or trailing location) to start the search, and then selects the search direction based on the size of the starting element compared to the target data. If one search fails, the next time it tries a further location by multiplying by two.

For example, if the search starts from A[7] and A[7] is smaller than the target data, the next step is to compare A[9], then A[11], A[15], A[23], A[35]… Until an element greater than or equal to the target data is found. Assuming that A[35] is large, it is known that the target data must be somewhere between A[23] and A[35].

Once you have this rough range, then switch to binary search to find the exact location.

Here is the pseudocode:

function gallop_left(
  key, // the data to merge
  a, // the run to merge to
  startIndex
) {
  let lastofs = 0; // starting point for binary search
  let ofs = 1; // end point for binary search
  const n = a.length;
  if (a[hint] < key) {
    // compare from left to right
    const maxofs = n - hint;
    while (ofs < maxofs && a[ofs] < key) {
      lastofs = ofs;
      ofs = (ofs << 1) + 1;
    }
    if (ofs > maxofs) ofs = maxofs;
    lastofs += hint;
    ofs += hint;
  } else {
    // compare from right to left
    const maxofs = hint + 1;
    while (ofs < maxofs && a[hint - ofs] < key) {
      lastofs = ofs;
      ofs = (ofs << 1) + 1;
    }
    if (ofs > maxofs) ofs = maxofs;
    let tmp = lastofs;
    lastofs = hint - ofs;
    ofs = hint - tmp;
  }

  // do binary search at the end
  ++lastofs;
  while (lastofs < ofs) {
    let m = lastofs + ((ofs - lastofs) >> 1);
    if (a[m] < key) lastofs = m + 1;
    else ofs = m;
  }
  return ofs; // return the found index
}
Copy the code

When to use gallop mode

When you start merging, there may be a large gap between two runs. For example, there are still two arrays:

A: [0, 2, 4, 6, 8, 10, …, 100]
B: [57, 77, 97, …]

So here we can use gallop_left(b[0], a, 0) (starting somewhere on the left) to quickly locate where B[0] should be in A; Then use gallop_right(a[A.length-1], B, B.Length-1) (starting somewhere on the right) to locate where a[0] should be in B. This allows you to shorten the size of the data you need to compare before you actually need to compare and merge.

Gallop_left and gallop_right are implemented basically the same, with some different handling of the boundary cases.

Here is the pseudocode:

function merge(ms /* MergeState */, a, b) {
  let k = gallop_right(b[0], a, 0);
  if (k < 0) throw -1;
  let merged = a.splice(0, k);
  if (a.length == 0) return merged.concat(b);
  else merged.push(b.shift());

  let nb = gallop_left(a[a.length - 1], b, b.length - 1);
  if (nb <= 0) throw nb;
  const lastPart = b.splice(nb);
  lastPart.unshift(a.pop());

  if (a.length <= nb) merged = merged.concat(merge_lo(ms, a, b));
  else merged = merged.concat(merge_hi(ms, a, b));
  return merged.concat(lastPart);
}
Copy the code

Gallop mode is great, but only efficient when there is a large gap in the length of runs, otherwise it only degrades performance. So Tim Sort introduced some rules for using the Gallop pattern.

In Python, Gallop mode is only entered when a data is compared seven times and still not found where it should be placed. The more interesting thing is that if you find that after merging runs you continue into Gallop mode, the “7 times” constraint becomes smaller. If Gallop mode is not triggered, then the constraints are reset.

Here is the pseudocode:

const MIN_GALLOP = 7;

function merge_lo(
  ms, // MergeState
  a, // the shorter run
  b // the longer run
) {
      let min_gallop = ms.min_gallop;
    let merged = [];

    for (;;) {
        let acount = 0;          /* # of times A won in a row */
        let bcount = 0;          /* # of times B won in a row */

        // do sequential comparing first, util it fulfils the galloping constrain
        do {
            if (b[0] < a[0]) {
                merged.push(b.shift());
                ++bcount;
                acount = 0;
                if (b.length == 0) return merged.concat(a);
            } else {
                merged.push(a.shift());
                ++acount;
                bcount = 0;
                if (a.length == 0) returnmerged.concat(b); }}while ((acount || bcount) >= min_gallop))

        // galloping and merging
        ++min_gallop;
        do {
            ms.min_gallop = min_gallop -= min_gallop > 1;
            let k = gallop_right(b[0], a, 0);
            acount = k;
            if (k) {
                if (k < 0) return -1;
                merged = merged.concat(a.splice(0, k));
                if (a.length == 0) return merged.concat(b);
            }
            merged.push(b.shift());
            if (b.length == 0) return merged.concat(a);

            k = gallop_left(a[0], b, 0);
            bcount = k;
            if (k) {
                if (k < 0) return -1;
                merged = merged.concat(b.splice(0, k));
                if (b.length == 0) return merged.concat(a);
            }
              merged.push(a.shift());
            if (a.length == 0) return merged.concat(b);
        } while (acount >= MIN_GALLOP || bcount >= MIN_GALLOP);
        ms.min_gallop = ++min_gallop;
    }
    return merged;
 }
Copy the code

There’s actually a function called merge_hi. Because it is very similar to merge_lo, skip it.

conclusion

Array.prototype.sort () array.prototype.sort () array.prototype.sort ()

ECMA262In the specificationArray.prototype.sortWhat is the logic of? Start by defining a size comparison standardSortComparetheSortCompareReflexivity, symmetry and transitivity; To define thesortNeed to modify the original array and return the modified array; The final definition is up to the engine developer to decide which sorting algorithm to implement.
Array.prototype.sortWhat’s the sorting method? Not any of the usual sorting algorithms, but learning python’s sorting implementation, using TimTim sortAlgorithms that bring together various optimizations.
What’s the complexity? The best time complexity isO(n), average time complexityO(nlogn), the worst time complexity isO(nlogn), more than the worst time complexity of the most commonly used quicksortO (n squared)To be better; The space complexity isO(n)And for stable sorting algorithm.

Front-end notepad, not regularly updated, welcome to pay attention to!

Wechat official account: Lin Jingyi’s notepad
Blog: Lin Jingyi’s notebook
Nuggets column: Lin Jingyi’s notebook
Zhihu column: Lin Jingyi’s notebook
Github: MageeLin

ECMA262

Size relation comparison text

Summary of size relation comparison

Array.prototype.sort(comparefn)

SortCompare(x, y)

Summary of sort specification

V8 engine implementation

Tim sort, the fastest sort used in v8 and Python

Binary insertion sort and merge sort

Divided into runs

Balanced merge

Merge using gallop mode

When to use gallop mode

conclusion

Related Posts

Promise – Rip up the code by hand and understand promise once and for all

Comparison of core functions between BabylonJs and three.js

ES6 understanding (Updated ~)