Data Structures for Beginners: Arrays, HashMaps, and Lists


Proofreading: Volong

If you are not yet familiar with data structures, take this article slowly. I hope it helps you.

When developing programs, we (usually) need to store data in memory. Depending on how you operate on the data, different data structures may be appropriate. There are many commonly used data structures, such as Array, Map, Set, List, Tree, Graph, and so on. Choosing the right data structure for a task can be tricky, so hopefully this article helps you understand how these structures behave so that you can use them properly in your own work.

This article focuses on linear data structures: Arrays, HashMaps, Lists, Sets, Stacks, Queues, and so on.


This article is part of the following tutorial series (if you find it interesting, I will translate the whole series):

Data Structures and Algorithms (DSA) beginners should know

  1. The time complexity of algorithms and big O notation
  2. Eight time complexities every programmer should know
  3. Data structures beginners should know: Array, HashMap, and List 👈
  4. Data structures beginners should know: Graph
  5. Data structures beginners should know: Tree (stay tuned)
  6. Appendix I: Analysis of recursive algorithms

The time complexity of data structures

The following table summarizes what has been discussed in this article.

Bookmark, pin, or share this article in case you need it later.

* = amortized runtime

| Data structure | Insert | Access | Find | Delete | Notes |
| --- | --- | --- | --- | --- | --- |
| Array | O(n) | O(1) | O(n) | O(n) | Inserting at the last position is O(1). |
| (Hash)Map | O(1)* | O(1)* | O(1)* | O(1)* | Recalculating the hash (rehash) affects insert time. |
| Map (using a binary search tree) | O(log(n)) | - | O(log(n)) | O(log(n)) | Implemented with a binary search tree |
| Set (using a HashMap) | O(1)* | - | O(1)* | O(1)* | Implemented with a HashMap |
| Set (using a List) | O(n) | - | O(n) | O(n) | Implemented with a List |
| Set (using a binary search tree) | O(log(n)) | - | O(log(n)) | O(log(n)) | Implemented with a binary search tree |
| Linked List (singly) | O(n) | - | O(n) | O(n) | Adding or removing elements at the start is O(1) |
| Linked List (doubly) | O(n) | - | O(n) | O(n) | Adding or removing at the beginning or end is O(1); in the middle it is O(n). |
| Stack (Array implementation) | O(1) | - | - | O(1) | Insertions and deletions follow last in, first out (LIFO) |
| Queue (naive Array implementation) | O(n) | - | - | O(1) | The insert (Array.shift) operation is O(n) |
| Queue (improved Array implementation) | O(1)* | - | - | O(1) | Worst-case insert is O(n), but amortized it is O(1) |
| Queue (List implementation) | O(1) | - | - | O(1) | Uses a doubly linked list |

Note: binary search trees and other tree structures, as well as graphs, will be discussed in separate articles.

Primitive data types

Primitive data types are the most basic elements of a data structure. Here are some primitive data types:

  • Integers such as 1, 2, 3…
  • Characters such as a, b, “1”, “*”
  • Boolean values, true and false.
  • Floating-point numbers, such as 3.14159, 1483e-2.

Array

An array may consist of zero or more elements. Arrays are one of the most commonly used data structures because of their ease of use and superior retrieval performance.

You can think of an array as a drawer in which you can store data.

An array is like a drawer that stores things in numbered boxes.

When you want to find an element, you can simply open the box with the corresponding number (time complexity O(1)). However, if you forget what’s in the boxes, you have to open them one by one (O(n)) until you find what you need. The same is true of arrays.

There are some differences between arrays depending on the programming language. In dynamic languages like JavaScript and Ruby, arrays can contain different data types: numbers, strings, objects, and even functions. In strongly typed languages like Java, C, and C++, you must declare an array's length and element type before using it. JavaScript instead grows the array automatically as needed.
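For instance, here is a quick sketch of JavaScript's dynamic behavior (the variable names are just for illustration):

const mixed = [1, 'two', { three: 3 }, x => x * 4]; // mixed types are fine in JS
mixed[6] = 'seven'; // the array grows automatically
console.log(mixed.length); // 7 (indexes 4 and 5 are empty slots)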

Built-in methods for arrays

Array methods behave slightly differently from one programming language to another.

In JavaScript, for example, we can use unshift and push to add elements to the top or bottom of an array, and shift and pop to remove the first or last element of an array. Let’s define some of the common array methods used in this article.

Common JS array built-in functions

| Function | Complexity | Description |
| --- | --- | --- |
| array.push(element1[, ...[, elementN]]) | O(1) | Adds one or more elements to the end of an array |
| array.pop() | O(1) | Removes the last element of an array |
| array.shift() | O(n) | Removes the first element of an array |
| array.unshift(element1[, ...[, elementN]]) | O(n) | Adds one or more elements to the beginning of an array |
| array.slice([begin[, end]]) | O(n) | Returns a new array with a shallow copy of the elements from begin to end (not including end) |
| array.splice(start[, deleteCount[, item1[, ...]]]) | O(n) | Changes (inserts into or deletes from) an array |

Insert elements into an array

There are many ways to insert elements into an array. You can add new data to the end of the array or to the beginning of the array.

First look at how to add to the end:

function insertToTail(array, element) {
  array.push(element);
  return array;
}

const array = [1, 2, 3];
console.log(insertToTail(array, 4)); // => [1, 2, 3, 4]

According to the specification, the push operation simply adds a new element to the end of the array. As a result,

The time complexity of Array.push is O(1).

Now look at the example added to the beginning:

function insertToHead(array, element) {
  array.unshift(element);
  return array;
}

const array = [1, 2, 3];
console.log(insertToHead(array, 0)); // => [0, 1, 2, 3]

What do you think the time complexity of insertToHead is? It looks almost identical, except that it calls unshift instead of push. But there's a catch: unshift makes room for the new element at index 0 by moving every existing item one position to the right, so it traverses the whole array once.

The time complexity of Array.unshift is O(n).

Access the elements in an array

If you know the index of the element in the array, you can access the element directly by:

function access(array, index) {
  return array[index];
}

const array = [1, 'word', 3.14, { a: 1 }];
access(array, 0);// => 1
access(array, 3);// => {a: 1}

As you can see in the code above, the time to access the elements in the array is constant:

The time to access the elements in the array is O(1).

Note: The time it takes to change the array value by index is also constant.
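For instance:

const array = [1, 'word', 3.14];
array[1] = 'hello'; // writing by index is also O(1)
console.log(array); // => [ 1, 'hello', 3.14 ]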

Look for elements in an array

If you want to find an element and you don’t know the index, you have to go through every element in the array until you find it.

function search(array, element) {
  for (let index = 0;
       index < array.length;
       index++) {
    if (element === array[index]) {
      return index;
    }
  }
}

const array = [1, 'word', 3.14, {a: 1}];
console.log(search(array, 'word')); // => 1
console.log(search(array, 3.14)); // => 2

Since the for loop is used, then:

The time it takes to find an element in an array is O(n).

Delete an element from an array

What do you think is the time complexity of removing an element from an array?

Consider these two scenarios:

  1. Removing an element from the end of the array takes constant time, i.e. O(1).
  2. However, removing an element from the beginning or the middle of the array requires shifting the remaining elements over, so the complexity is O(n).

Talk is cheap, let’s do the code!

function remove(array, element) {
  const index = search(array, element);
  array.splice(index, 1);
  return array;
}

const array1 = [0, 1, 2, 3];
console.log(remove(array1, 1)); // => [0, 2, 3]

We use the search function defined above to find the element's index, which takes O(n). Then we use the built-in splice method, which is also O(n). Isn't the total O(n) + O(n) = O(2n)? Remember: we don't care about constants, so O(2n) is simply O(n).

For the two scenarios listed above, consider the worst case:

It takes O(n) to remove an element from an array.

The time complexity of array methods

The following table summarizes the time complexity of array methods:

The time complexity of array methods

| Operation | Worst case |
| --- | --- |
| Access (Array.[]) | O(1) |
| Insert at the beginning (Array.unshift) | O(n) |
| Insert at the end (Array.push) | O(1) |
| Search (by value, not by index) | O(n) |
| Delete (Array.splice) | O(n) |

HashMaps

HashMap goes by many names: HashTable, HashMap, Map, Dictionary, Associative Array, and so on. Conceptually they are all the same; the implementations differ slightly.

A hash table is a data structure that maps keys to values.

Think back to the drawer metaphor. Boxes now have labels and are no longer in numerical order.

A HashMap also stores things like a drawer does, but it uses labels rather than numbers to distinguish the boxes.

In this case, if you’re looking for a toy, you don’t have to open the first box, the second box, and the third box in turn to see if the toy is inside. Open the box labeled “toy” directly. This is a huge improvement. The time required to find elements is reduced from O(n) to O(1).

For arrays, the number is the index; for a HashMap, the label is the key under which the data is stored. Internally, the HashMap converts the key (the label) into an index using a hash function.

There are at least two ways to implement a HashMap:

  1. Array: a hash function maps the key to an index in the array. Worst case: O(n); average: O(1).
  2. Binary search tree: uses a self-balancing binary search tree to look up values (more on trees in another article). Worst case: O(log n); average: O(log n).

We'll discuss trees and binary search trees later, so don't worry about them for now. The most common way to implement a Map is with an array and a hash function, so that's what we'll do.

Implement a HashMap through an array

HashMap: hash function translates keys into bucket (array) indexes

As shown in the figure above, each key is first translated into a hash code. Since the array size is limited (10 in this example), the hash code is mapped to a bucket with the modulo operator, and we then iterate through that bucket. Each bucket stores key-value pairs; if a bucket holds more than one pair, they are stored as a collection.
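In code, the mapping looks like this (the example hash value comes from the typed hash function developed later in this article):

// bucketIndex = hash(key) % number of buckets
// e.g. hash('cat') = 3789411390, and with 10 buckets:
// 3789411390 % 10 = 0  →  'cat' goes into bucket #0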

As we walk through the components of a HashMap, let’s start with the hash function.

The hash function

The first step in implementing a HashMap is to write out a hash function. This function maps the key to the corresponding (index) value.

A perfect hash function maps to a different index for each different key.

With an ideal hash function, access and search take constant time. In practice, though, a perfect hash function is hard to achieve: you will most likely encounter cases where two different keys map to the same index, i.e. a collision.

Collisions are unavoidable when implementing a HashMap on top of an array. One way to resolve them is to store multiple values in the same bucket. Then, when we access a key whose bucket contains several key-value pairs, we must traverse them to find the matching key, which costs O(n). However, in most implementations the HashMap dynamically resizes its array to keep collisions scarce, so the amortized search time is O(1). Later in this article we'll show with an example what amortized means.

A simple implementation of HashMap

A simple (but bad) HashMap implementation could look like this:

class NaiveHashMap {

  constructor(initialCapacity = 2) {
    this.buckets = new Array(initialCapacity);
  }

  set(key, value) {
    const index = this.getIndex(key);
    this.buckets[index] = value;
  }

  get(key) {
    const index = this.getIndex(key);
    return this.buckets[index];
  }

  hash(key) {
    return key.toString().length;
  }

  getIndex(key) {
    const indexHash = this.hash(key);
    const index = indexHash % this.buckets.length;
    return index;
  }
}

The complete code

We use buckets rather than drawers and boxes, but you get the idea.

The HashMap's initial capacity is 2 (two buckets). When we store elements in it, the modulo operator % computes which bucket a key belongs to, and the data is stored in that bucket.
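For example, with this naive length-based hash and two buckets:

// hash('cat') = 3 ('cat'.length)  →  3 % 2 = 1  →  bucket #1
// hash('art') = 3                 →  3 % 2 = 1  →  bucket #1, so 'art' lands on top of 'cat'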

Note line 18 (return key.toString().length;). We'll come back to it shortly. First, let's use this new HashMap.

// Usage:
const assert = require('assert');
const hashMap = new NaiveHashMap();
hashMap.set('cat', 2);
hashMap.set('rat', 7);
hashMap.set('dog', 1);
hashMap.set('art', 8);

console.log(hashMap.buckets);
/*
  bucket #0: <1 empty item>,
  bucket #1: 8
*/
assert.equal(hashMap.get('art'), 8); // this one is ok
assert.equal(hashMap.get('cat'), 8); // got overwritten by art 😱
assert.equal(hashMap.get('rat'), 8); // got overwritten by art 😱
assert.equal(hashMap.get('dog'), 8); // got overwritten by art 😱

This HashMap lets us store a key-value pair with the set method and retrieve a value by passing the key to get. The core is the hash function; let's see how the HashMap behaves when we store several keys.

Can you name the problem with this simple implementation of HashMap?

1) The hash function maps too many keys to the same index. For example:

hash('cat') // 3
hash('dog') // 3

This creates a lot of collisions.

2) Collisions are not handled at all: cat and dog overwrite each other's values in the HashMap (both end up in bucket #1).

3) The array is too short. Even with a better hash function, because the array's length is 2 (smaller than the number of stored elements) there would still be many collisions. We want the HashMap's initial capacity to be well beyond the amount of data we store.

Improve the hash function

The main goal of HashMap is to reduce the time complexity of array lookup and access from O(n) to O(1).

To do this, we need:

  1. A proper hash function that produces as few collisions as possible.
  2. An array large enough to hold data.

Let’s redesign the hash function so that instead of using the length of the string as the hash code, we use the sum of the ASCII codes for each character in the string as the hash code.

hash(key) {
  let hashValue = 0;
  const stringKey = key.toString();
  for (let index = 0; index < stringKey.length; index++) {
    const charCode = stringKey.charCodeAt(index);
    hashValue += charCode;
  }
  return hashValue;
}

Try again:

hash('cat') // 312  (c=99 + a=97 + t=116)
hash('dog') // 314 (d=100 + o=111 + g=103)

This function is better than the previous one! This is because words of the same length consist of different letters, so the sum of ASCII codes is different.

However, there are still problems! The words "rat" and "art" both sum to 327. 💥
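The sums are identical because addition doesn't care about the order of the letters:

hash('rat') // 327 (r=114 + a=97 + t=116)
hash('art') // 327 (a=97 + r=114 + t=116)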

This can be solved by shifting each character's code according to its position in the string before summing:

hash(key) {
  let hashValue = 0;
  const stringKey = `${key}`;
  for (let index = 0; index < stringKey.length; index++) {
    const charCode = stringKey.charCodeAt(index);
    hashValue += charCode << (index * 8);
  }
  return hashValue;
}

Let's continue the experiment. The hexadecimal values are listed below so that the bit shifts are easy to see:

// r = 114 or 0x72; a = 97 or 0x61; t = 116 or 0x74

hash('rat'); // 7,627,122 (r: 114 * 1 + a: 97 * 256 + t: 116 * 65,536), or in hex: 0x746172 (r: 0x72 + a: 0x6100 + t: 0x740000)

hash('art'); // 7,631,457, or in hex: 0x747261

However, what happens if the data types are different?

hash(1); // 49
hash('1'); // 49

hash('1,2,3'); // 741485668
hash([1,2,3]); // 741485668

hash('undefined'); // 3402815551
hash(undefined); // 3402815551

Oh my god, there’s still a problem!! Different data types should not return the same hash code!

What can be done about it?

One way to do this is to use the type of the data as part of the conversion hash code in the hash function.

hash(key) {
  let hashValue = 0;
  const stringTypeKey = `${key}${typeof key}`;
  for (let index = 0; index < stringTypeKey.length; index++) {
    const charCode = stringTypeKey.charCodeAt(index);
    hashValue += charCode << (index * 8);
  }
  return hashValue;
}

Let's try again:

console.log(hash(1)); // 1843909523
console.log(hash('1')); // 1927012762
console.log(hash('1,2,3')); // 2668498381
console.log(hash([1,2,3])); // 2533949129
console.log(hash('undefined')); // 5329828264
console.log(hash(undefined)); // 6940203017

Yay!!! 🎉 We finally have a better hash function!

Meanwhile, we can also increase the HashMap's initial capacity to reduce collisions. Let's optimize our HashMap in the next section.

A more sophisticated HashMap implementation

With a well-optimized hash function, a HashMap can perform even better.

Although collisions can still occur, there are ways to manage them well.

For our HashMap, we want to make the following improvements:

  • A hash function that takes the key's type and characters into account (summing character codes) to reduce collisions.
  • Collision handling: append colliding values to a list in the bucket, and keep a counter to track the number of collisions.

A more complete HashMap implementation (complete code):

class DecentHashMap {
  constructor(initialCapacity = 2) {
    this.buckets = new Array(initialCapacity);
    this.collisions = 0;
  }
  set(key, value) {
    const bucketIndex = this.getIndex(key);
    if (this.buckets[bucketIndex]) {
      this.buckets[bucketIndex].push({key, value});
      if (this.buckets[bucketIndex].length > 1) {
        this.collisions++;
      }
    } else {
      this.buckets[bucketIndex] = [{key, value}];
    }
    return this;
  }
  get(key) {
    const bucketIndex = this.getIndex(key);
    for (let arrayIndex = 0; arrayIndex < this.buckets[bucketIndex].length; arrayIndex++) {
      const entry = this.buckets[bucketIndex][arrayIndex];
      if(entry.key === key) {
        return entry.value
      }
    }
  }
  hash(key) {
    let hashValue = 0;
    const stringTypeKey = `${key}${typeof key}`;
    for (let index = 0; index < stringTypeKey.length; index++) {
      const charCode = stringTypeKey.charCodeAt(index);
      hashValue += charCode << (index * 8);
    }
    return hashValue;
  }
  getIndex(key) {
    const indexHash = this.hash(key);
    const index = indexHash % this.buckets.length;
    return index;
  }
}

See how this HashMap behaves:

// Usage:
const assert = require('assert');
const hashMap = new DecentHashMap();
hashMap.set('cat', 2);
hashMap.set('rat', 7);
hashMap.set('dog', 1);
hashMap.set('art', 8);

console.log('collisions: ', hashMap.collisions); // 2
console.log(hashMap.buckets);
/*
  bucket #0: [ { key: 'cat', value: 2 }, { key: 'art', value: 8 } ]
  bucket #1: [ { key: 'rat', value: 7 }, { key: 'dog', value: 1 } ]
*/
assert.equal(hashMap.get('art'), 8); // this one is ok
assert.equal(hashMap.get('cat'), 2); // good, didn't get overwritten by art
assert.equal(hashMap.get('rat'), 7); // good, didn't get overwritten by art
assert.equal(hashMap.get('dog'), 1); // good, didn't get overwritten by art

The improved HashMap does its job well, but there are still problems. The modified hash function rarely produces duplicate hash codes, which is good. However, bucket #0 and bucket #1 each contain two values. Why is that?

Because the HashMap's capacity is 2: although the computed hash codes differ, taking the remainder to choose a bucket always yields either bucket #0 or bucket #1.

hash('cat') => 3789411390; bucketIndex => 3789411390 % 2 = 0
hash('art') => 3789415740; bucketIndex => 3789415740 % 2 = 0
hash('dog') => 3788563007; bucketIndex => 3788563007 % 2 = 1
hash('rat') => 3789411405; bucketIndex => 3789411405 % 2 = 1

It's natural to think we can solve this by increasing the HashMap's initial capacity. But to what? Let's first look at how capacity affects the performance of a HashMap.

If the initial capacity is 1, every key-value pair lands in the same bucket, bucket #0. A lookup is then no better than storing the data in a plain array: both are O(n).

Let’s say the initial capacity is 10:

const hashMapSize10 = new DecentHashMap(10);
hashMapSize10.set('cat', 2);
hashMapSize10.set('rat', 7);
hashMapSize10.set('dog', 1);
hashMapSize10.set('art', 8);

console.log('collisions: ', hashMapSize10.collisions); // 1
console.log('hashMapSize10\n', hashMapSize10.buckets);
/*
  bucket#0: [ { key: 'cat', value: 2 }, { key: 'art', value: 8 } ],
            <4 empty items>,
  bucket#5: [ { key: 'rat', value: 7 } ],
            <1 empty item>,
  bucket#7: [ { key: 'dog', value: 1 } ],
            <2 empty items>
*/

Look at it this way:

HashMap: hash function translates keys into bucket (array) indexes

As you can see, by increasing the size of a HashMap, you can effectively reduce the number of collisions.

How about a bigger one, like 💯:

const hashMapSize100 = new DecentHashMap(100);
hashMapSize100.set('cat', 2);
hashMapSize100.set('rat', 7);
hashMapSize100.set('dog', 1);
hashMapSize100.set('art', 8);

console.log('collisions: ', hashMapSize100.collisions); // 0
console.log('hashMapSize100\n', hashMapSize100.buckets);
/*
            <5 empty items>,
  bucket#5: [ { key: 'rat', value: 7 } ],
            <1 empty item>,
  bucket#7: [ { key: 'dog', value: 1 } ],
            <32 empty items>,
  bucket#41: [ { key: 'art', value: 8 } ],
            <49 empty items>,
  bucket#90: [ { key: 'cat', value: 2 } ],
            <9 empty items>
*/

Yay! 🎊 No collisions!

Increasing the initial capacity reduces collisions, but consumes more memory, and many buckets may end up unused.

Wouldn’t it be nice if our HashMap could automatically adjust its capacity as needed? This is called a rehash, and we’ll implement it in the next section!

An optimized HashMap implementation

If the size of the HashMap is large enough, there will be no collisions, so the time complexity of the lookup operation is O(1). But how do we know how big is enough? 100? 1000? Or a million?

Allocating lots of memory up front doesn't make sense. Instead, we can dynamically adjust the capacity based on the load factor. This operation is called a rehash.

The load factor is a measure of how full a HashMap is and can be obtained by dividing the number of key-value pairs stored by the HashMap’s capacity.
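In code it's a one-line computation (this matches the getLoadFactor method in the implementation below):

getLoadFactor() {
  return this.size / this.buckets.length; // e.g. 12 entries / 16 buckets = 0.75
}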

With this in mind, we will implement the final version of HashMap:

Optimal HashMap implementation

class HashMap {
  constructor(initialCapacity = 16, loadFactor = 0.75) {
    this.buckets = new Array(initialCapacity);
    this.loadFactor = loadFactor;
    this.size = 0;
    this.collisions = 0;
    this.keys = [];
  }

  hash(key) {
    let hashValue = 0;
    const stringTypeKey = `${key}${typeof key}`;
    for (let index = 0; index < stringTypeKey.length; index++) {
      const charCode = stringTypeKey.charCodeAt(index);
      hashValue += charCode << (index * 8);
    }
    return hashValue;
  }

  _getBucketIndex(key) {
    const hashValue = this.hash(key);
    const bucketIndex = hashValue % this.buckets.length;
    return bucketIndex;
  }

  set(key, value) {
    const {bucketIndex, entryIndex} = this._getIndexes(key);
    if (entryIndex === undefined) {
      // initialize array and save key/value
      const keyIndex = this.keys.push({content: key}) - 1; // keep track of the key index
      this.buckets[bucketIndex] = this.buckets[bucketIndex] || [];
      this.buckets[bucketIndex].push({key, value, keyIndex});
      this.size++;
      // optional: keep count of collisions
      if (this.buckets[bucketIndex].length > 1) {
        this.collisions++;
      }
    } else {
      // override existing value
      this.buckets[bucketIndex][entryIndex].value = value;
    }
    // check if a rehash is due
    if (this.loadFactor > 0 && this.getLoadFactor() > this.loadFactor) {
      this.rehash(this.buckets.length * 2);
    }
    return this;
  }

  get(key) {
    const {bucketIndex, entryIndex} = this._getIndexes(key);
    if (entryIndex === undefined) {
      return;
    }
    return this.buckets[bucketIndex][entryIndex].value;
  }

  has(key) {
    return !!this.get(key);
  }

  _getIndexes(key) {
    const bucketIndex = this._getBucketIndex(key);
    const values = this.buckets[bucketIndex] || [];
    for (let entryIndex = 0; entryIndex < values.length; entryIndex++) {
      const entry = values[entryIndex];
      if (entry.key === key) {
        // also expose keyIndex so `delete` can clean up `keys`
        return {bucketIndex, entryIndex, keyIndex: entry.keyIndex};
      }
    }
    return {bucketIndex};
  }

  delete(key) {
    const {bucketIndex, entryIndex, keyIndex} = this._getIndexes(key);
    if (entryIndex === undefined) {
      return false;
    }
    this.buckets[bucketIndex].splice(entryIndex, 1);
    delete this.keys[keyIndex];
    this.size--;
    return true;
  }

  rehash(newCapacity) {
    const newMap = new HashMap(newCapacity);
    this.keys.forEach(key => {
      if (key) {
        newMap.set(key.content, this.get(key.content));
      }
    });
    // update buckets
    this.buckets = newMap.buckets;
    this.collisions = newMap.collisions;
    // optional: both `keys` arrays have the same content, except the new one
    // has no empty slots left over from deletions
    this.keys = newMap.keys;
  }

  getLoadFactor() {
    return this.size / this.buckets.length;
  }
}

The complete code.

Notice lines 99 through 114 (the rehash function), where the rehash magic happens. We created a new HashMap that has twice the capacity of the original HashMap.

Test the new implementation above:

const assert = require('assert');
const hashMap = new HashMap();
assert.equal(hashMap.getLoadFactor(), 0);
hashMap.set('songs', 2);
hashMap.set('pets', 7);
hashMap.set('tests', 1);
hashMap.set('art', 8);
assert.equal(hashMap.getLoadFactor(), 4/16);

hashMap.set('Pineapple', 'Pen Pineapple Apple Pen');
hashMap.set('Despacito', 'Luis Fonsi');
hashMap.set('Bailando', 'Enrique Iglesias');
hashMap.set('Dura', 'Daddy Yankee');
hashMap.set('Lean On', 'Major Lazer');
hashMap.set('Hello', 'Adele');
hashMap.set('All About That Bass', 'Meghan Trainor');
hashMap.set('This Is What You Came For', 'Calvin Harris');
assert.equal(hashMap.collisions, 2);
assert.equal(hashMap.getLoadFactor(), 0.75);
assert.equal(hashMap.buckets.length, 16);

hashMap.set('Wake Me Up', 'Avicii'); // <--- triggers a REHASH
assert.equal(hashMap.collisions, 0);
assert.equal(hashMap.getLoadFactor(), 0.40625);
assert.equal(hashMap.buckets.length, 32);

Note that after the 12th element is stored, the load factor reaches exactly 0.75 (12/16); storing the 13th ('Wake Me Up') pushes it past 0.75, which triggers a rehash and doubles the HashMap's capacity (from 16 to 32). You can also see the collision count drop from 2 to 0.

This version of HashMap implements common operations such as insert, find, delete, edit, and so on with very low time complexity.

To summarize, the performance of a HashMap depends on:

  1. A hash function that outputs different values for different keys.
  2. The capacity of the HashMap.

We have finally tackled all the problems 🔨. We have a good hash function that returns different outputs for different inputs, and a rehash function that dynamically grows the HashMap's capacity as needed. That's great!

The time complexity of inserting elements into a HashMap

Inserting elements into a HashMap requires two things: a key and a value. You can use the optimized HashMap developed above or the built-in objects:

function insert(object, key, value) {
  object[key] = value;
  return object;
}
const hash = {};
console.log(insert(hash, 'word', 1)); // => { word: 1 }


In the new version of JavaScript, you can use Map.

function insertMap(map, key, value) {
  map.set(key, value);
  return map;
}
const map = new Map();
console.log(insertMap(map, 'word', 1)); // Map { 'word' => 1 }

Note: we'll use a Map rather than a plain object, because a Map's key can be anything, while an object's key can only be a string or a symbol. Also, a Map preserves insertion order.
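A quick sketch of the difference (objKey is a throwaway name for illustration):

const map = new Map();
const objKey = { id: 1 };
map.set(objKey, 'value');
console.log(map.get(objKey)); // 'value': any value works as a Map key

const obj = {};
obj[objKey] = 'value'; // the key gets coerced to the string '[object Object]'
console.log(Object.keys(obj)); // [ '[object Object]' ]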

Furthermore, Map.set just inserts an element into a bucket array (as in DecentHashMap.set above), similar to Array.push, so you can say:

To insert an element into a HashMap, the time is O(1). If rehash is required, then the complexity is O(n).

Rehashing minimizes the chance of collisions. The rehash operation takes O(n), but it does not run on every insert, only when it is needed.

The time complexity of finding or accessing elements in a HashMap

This is what HashMap.get does: we pass in a key and receive the corresponding value. Let's review the implementation of DecentHashMap.get:

get(key) {
  const bucketIndex = this.getIndex(key);
  const values = this.buckets[bucketIndex];
  for (let index = 0; index < values.length; index++) {
    const entry = values[index];
    if (entry.key === key) {
      return entry.value;
    }
  }
}

If there are no collisions, values holds a single entry and access is O(1). But we also know collisions happen: if the HashMap's initial capacity is too small or the hash function is poorly designed, most element accesses cost O(n).

The time complexity of a HashMap access operation is O(1) on average and O(n) in the worst case.

Note: another way to reduce the time complexity from O(n) to O(log n) is to use a binary search tree instead of an array as the underlying storage. In fact, Java's HashMap implementation switches a bucket from a list to a tree when the bucket stores more than eight elements.

The time complexity of modifying or deleting an element in a HashMap

To edit (HashMap.set) or delete (HashMap.delete) a key-value pair takes amortized O(1) time. With many collisions the worst case is O(n), but the rehash operation makes that worst case much less likely.

The time complexity of a HashMap modify or delete operation is O(1) on average and O(n) in the worst case.

The time complexity of the HashMap method

The following table summarizes the time complexity of HashMap methods:

The time complexity of the HashMap method

| Operation | Worst | Average | Notes |
| --- | --- | --- | --- |
| Access or search (HashMap.get) | O(n) | O(1) | O(n) is the extreme case with too many collisions |
| Insert or edit (HashMap.set) | O(n) | O(1) | O(n) happens only when the load factor exceeds 0.75, triggering a rehash |
| Delete (HashMap.delete) | O(n) | O(1) | O(n) is the extreme case with too many collisions |

Sets

Sets are very much like arrays; the difference is that a set's elements are unique.

How would we implement a set (an array with no duplicates)? Using an array, you could check whether an element exists before inserting it, but that check takes O(n). Can we do better? The Map we developed earlier has O(1) inserts!
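For contrast, here is a minimal sketch of the array-backed approach (ArraySet is a hypothetical name, not a built-in), where every insert pays for an O(n) existence check:

class ArraySet {
  constructor() {
    this.items = [];
  }
  add(value) {
    if (!this.items.includes(value)) { // O(n) scan on every insert
      this.items.push(value);
    }
  }
  has(value) {
    return this.items.includes(value); // O(n)
  }
}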

Implementing a Set

You could use JavaScript's built-in Set. However, implementing it yourself makes its time complexity easier to understand. We'll build it on top of the optimized HashMap (with rehash) from above.

const HashMap = require('../hash-maps/hash-map');
class MySet {
  constructor() {
    this.hashMap = new HashMap();
  }
  add(value) {
    this.hashMap.set(value);
  }
  has(value) {
    return this.hashMap.has(value);
  }
  get size() {
    return this.hashMap.size;
  }
  delete(value) {
    return this.hashMap.delete(value);
  }
  entries() {
    return this.hashMap.keys.reduce((acc, key) => {
      if (key !== undefined) {
        acc.push(key.content);
      }
      return acc;
    }, []);
  }
}

(Note: The has method of the HashMap has a problem, so the has method of the Set has a problem.)

We use HashMap.set to add elements to the collection without duplicates. The value to store serves as the HashMap's key, and since the hash function maps a given key to the same slot every time, duplicates are eliminated.

To check whether an element already exists in the set, we use HashMap.has, whose average time complexity is O(1). The amortized time complexity of most set methods is O(1), except for entries, which is O(n).

Note: with JavaScript's built-in Set, Set.has is O(n), since it uses a List internally and must check each element. You can check out the details here.

Here are some examples of how to use this collection:

const assert = require('assert');
// const set = new Set(); // Using the built-in
const set = new MySet(); // Using our own implementation
set.add('one');
set.add('uno');
set.add('one'); // should NOT add this one twice
assert.equal(set.has('one'), true);
assert.equal(set.has('dos'), false);
assert.equal(set.size, 2);
// assert.deepEqual(Array.from(set), ['one', 'uno']);
assert.equal(set.delete('one'), true);
assert.equal(set.delete('one'), false);
assert.equal(set.has('one'), false);
assert.equal(set.size, 1);

In this example, both MySet and the built-in Set in JavaScript can be used as containers.

Time complexity of the Set method

For a Set implemented with a HashMap, the time complexities are as follows (very similar to the HashMap's):

Time complexity of the Set method

| Operation | Worst | Average | Notes |
| --- | --- | --- | --- |
| Access or search (Set.has) | O(n) | O(1) | O(n) is the extreme case with too many collisions |
| Insert or edit (Set.add) | O(n) | O(1) | O(n) happens only when the load factor exceeds 0.75, triggering a rehash |
| Delete (Set.delete) | O(n) | O(1) | O(n) is the extreme case with too many collisions |

Linked Lists

A linked list is a data structure in which one node links to the next.

LinkedList

A linked list is the first data structure we will implement without an array. A linked list is built from nodes, each of which stores an element and points to the next node (or null if there is none).

class Node {
  constructor(value) {
    this.value = value;
    this.next = null;
  }
}

When each node points to its next node, we have a chain of nodes: a singly linked list.

Singly Linked Lists

In a singly linked list, we only care that each node has a reference to the next one.

We build it from the first node, the root:

class LinkedList {
  constructor() {
    this.root = null;
  }
  // ...
}

Each linked list has four basic operations:

  1. addLast: adds an element to the end of the list.
  2. removeLast: removes the last element of the list.
  3. addFirst: adds an element to the beginning of the list.
  4. removeFirst: removes the first element of the list.

Adding or removing an element from the end of a linked list

There are two cases when adding to the end: 1) if the list has no root node, the new node becomes the root; 2) if a root exists, we must follow next references until we reach the end of the list and attach the new node there.

addLast(value) { // similar Array.push
  const node = new Node(value);
  if(this.root) {
    let currentNode = this.root;
    while(currentNode && currentNode.next) {
      currentNode = currentNode.next;
    }
    currentNode.next = node;
  } else {
    this.root = node;
  }
}

What is the time complexity of the above code? If added as the root node, the time complexity is O(1), whereas the time complexity of finding the last node is O(n).

Removing the last node is similar to the above code.

removeLast() {
  let current = this.root;
  let target;
  if(current && current.next) {
    while(current && current.next && current.next.next) {
      current = current.next;
    }
    target = current.next;
    current.next = null;
  } else {
    this.root = null;
    target = current;
  }
  if(target) {
    return target.value;
  }
}

The time complexity is also O(n), because we have to traverse the list until we find the second-to-last node and point its next reference to null.

Adds or removes an element from the beginning of a linked list

To add an element to the beginning of the list looks like this:

addFirst(value) {
  const node = new Node(value);
  node.next = this.root;
  this.root = node;
}

The time it takes to add or delete to the beginning of the list is constant because we hold a reference to the root element:

removeFirst() {
  const target = this.root;
  this.root = target ? target.next : null;
  if (target) {
    return target.value;
  }
}

(Translator’s note: The code for removeFirst is misplaced, the translator implemented the code above)

As you can see, adding or removing an element at the beginning of a list always takes O(1).

Removes an element from any part of the list

To remove the first and last elements of a linked list, use either removeFirst or removeLast. However, if the node to be removed is in the middle of the list, it needs to be removed from the list by pointing the previous node to the next node:

remove(index = 0) {
  if(index === 0) {
    return this.removeFirst();
  }
  let current;
  let target = this.root;
  for (let i = 0; target; i++, current = target, target = target.next) {
    if (i === index) {
      if (!target.next) { // if there is no next node, it is the last one
        return this.removeLast();
      }
      current.next = target.next;
      return target.value;
    }
  }
}

(Translator’s note: the original implementation is a bit of a problem, the translator slightly modified. The call to removeLast is actually a waste of performance, but it increases readability, so I won’t change it here.)

Notice that index starts at 0: 0 is the first node, 1 is the second, and so on.
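For example, assuming the LinkedList class above (with addLast and remove defined):

const list = new LinkedList();
list.addLast('a');
list.addLast('b');
list.addLast('c');
console.log(list.remove(1)); // => 'b', the node at index 1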

Deleting a node at any point in the list takes O(n) time.

Look up elements in a linked list

Finding an element in a linked list is similar to deleting an element:

contains(value) {
  for (let current = this.root, index = 0; current; index++, current = current.next) {
    if (current.value === value) {
      return index;
    }
  }
}

This method finds the first node in the list that is equal to the given value.

To find an element in a linked list, the time complexity is order n.

Time complexity of singly linked list methods

The time complexity of singly linked list methods is summarized in the following table:

| Operation | Time complexity | Notes |
| --- | --- | --- |
| addFirst | O(1) | Inserts an element at the beginning of the list |
| addLast | O(n) | Inserts an element at the end of the list |
| add | O(n) | Inserts an element anywhere in the list |
| removeFirst | O(1) | Removes the first element of the list |
| removeLast | O(n) | Removes the last element of the list |
| remove | O(n) | Removes any element from the list |
| contains | O(n) | Searches for any element in the list |

Note that when we add or delete the last element of the list, the time complexity of the operation is O(n)…

But as long as we hold a reference to the last node, addLast and removeLast can drop from O(n) to O(1), matching the cost of adding or removing the first element!

We’ll implement this in the next section!

Doubly Linked Lists

When we have a chain of nodes, each holding a reference to the next one, we have a singly linked list. When each node has a reference to both the next node and the previous one, we have a doubly linked list.

Doubly Linked List

Nodes in a doubly linked list have two references (one to the previous node and one to the next), and we also need to keep track of the first and the last node.

class Node {
  constructor(value) {
    this.value = value;
    this.next = null;
    this.previous = null;
  }
}
class LinkedList {
  constructor() {
    this.first = null; // head/root element
    this.last = null; // last element of the list
    this.size = 0; // total number of elements in the list
  }
  // ...
}

(The complete code for the doubly linked list)

Adds or removes the first element of a linked list

Since it holds a reference to the first node, adding or removing the first element is simple:

addFirst(value) {
  const node = new Node(value);
  node.next = this.first;
  if(this.first) {
    this.first.previous = node;
  } else {
    this.last = node;
  }
  this.first = node; // update head
  this.size++;
  return node;
}

(LinkedList.prototype.addFirst complete code)

Note that we need to be very careful to update the nodes' previous references, the size of the list, and the reference to the last node of the list.

removeFirst() {
  const first = this.first;
  if(first) {
    this.first = first.next;
    if(this.first) {
      this.first.previous = null;
    }
    this.size--;
    return first.value;
  } else {
    this.last = null;
  }
}

(LinkedList.prototype.removeFirst complete code)

What is the time complexity?

Adding or removing the first node takes constant time in both singly and doubly linked lists: O(1).

Adds or removes the last element of the list

Adding or removing an element at the end of a doubly linked list is a little more involved. With a singly linked list, both operations are O(n), because the whole list must be traversed to reach the last element. A doubly linked list, however, holds a reference to the last node:

addLast(value) {
  const node = new Node(value);
  if(this.first) {
    node.previous = this.last;
    this.last.next = node;
    this.last = node;
  } else {
    this.first = node;
    this.last = node;
  }
  this.size++;
  return node;
}

The complete code (LinkedList.prototype.addLast)

Also, we need to be careful about updating references and dealing with special cases, such as when there is only one element in a linked list.

removeLast() {
  let current = this.first;
  let target;
  if(current && current.next) {
    target = this.last;
    current = target.previous;
    this.last = current;
    current.next = null;
  } else {
    this.first = null;
    this.last = null;
    target = current;
  }
  if(target) {
    this.size--;
    returntarget.value; }}Copy the code

The complete code (LinkedList.prototype.removeLast)

With a doubly linked list we no longer have to traverse the list looking for the second-to-last element; this.last.previous reaches it directly, so the time complexity is O(1).

The queue, described below, is implemented in this article with two arrays. It could instead be implemented with a doubly linked list, since adding the first element and removing the last one are both O(1).

Adds an element to any part of the list

With addFirst and addLast, you can add an element to any part of the list. The implementation is as follows:

add(value, index = 0) {
  if(index === 0) {
    return this.addFirst(value);
  }
  for (let current = this.first, i = 0; i <= this.size;  i++, current = (current && current.next)) {
    if(i === index) {
      if (i === this.size) { // no next node: insert at the end
        return this.addLast(value);
      }
      const newNode = new Node(value);
      newNode.previous = current.previous;
      newNode.next = current;
      current.previous.next = newNode;
      if (current.next) {
        current.next.previous = newNode;
      }
      this.size++;
      return newNode;
    }
  }
}

(Complete code for LinkedList.prototype.add)

If the element is added in the middle of the list, we must update the next and previous references of the nodes before and after the element.

The time required to add an element anywhere in the list is O(n).

Time complexity of doubly linked list methods

The time complexity of each method of a doubly linked list is shown below:

| Operation | Time complexity | Notes |
| --- | --- | --- |
| addFirst | O(1) | Inserts an element at the beginning of the list |
| addLast | O(1) | Inserts an element at the end of the list |
| add | O(n) | Inserts an element anywhere in the list |
| removeFirst | O(1) | Removes the first element of the list |
| removeLast | O(1) | Removes the last element of the list |
| remove | O(n) | Removes any element from the list |
| contains | O(n) | Searches for any element in the list |

Compared with the singly linked list, this is a great improvement! addLast and removeLast dropped from O(n) to O(1) because we:

  • Added a reference to the previous node in each node.
  • Kept a reference to the last node of the list.

Deleting the first or last node can be done in constant time, while deleting the intermediate node is still O(n) time.

Stacks

A stack is a data structure in which the last element added is the first element removed: last in, first out (LIFO).

Stack: push and pop

Let’s implement a stack from scratch!

class Stack {
  constructor() {
    this.input = [];
  }
  push(element) {
    this.input.push(element);
    return this;
  }
  pop() {
    return this.input.pop();
  }
}

As you can see, implementing a stack with the built-in Array.push and Array.pop is quite simple. Both methods run in O(1).

Here’s a look at how stacks are used:

const stack = new Stack();
stack.push('a');
stack.push('b');
stack.push('c');
stack.pop(); // c
stack.pop(); // b
stack.pop(); // a

The element a, added first, is not popped until the end. A stack can also be implemented with a linked list, and the corresponding methods have the same time complexity.
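A minimal sketch, assuming the singly linked list from earlier (addFirst and removeFirst are both O(1); LinkedListStack is a hypothetical name):

class LinkedListStack {
  constructor() {
    this.list = new LinkedList();
  }
  push(element) {
    this.list.addFirst(element); // O(1)
    return this;
  }
  pop() {
    return this.list.removeFirst(); // O(1)
  }
}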

That’s all there is to the stack!

Queues

A queue is a data structure where the first element added is the first element removed: first in, first out (FIFO). Just like a real-life line, whoever arrives first is served first.

Queue: enqueue and dequeue

You can implement a queue through an array, and the code is similar to the stack implementation.

Implement queues through arrays

Array.push and Array.shift give us a simple (though not optimal) queue:

class Queue {
  constructor() {
    this.input = [];
  }
  add(element) {
    this.input.push(element);
  }
  remove() {
    return this.input.shift();
  }
}

What are the time complexities of Queue.add and Queue.remove?

  • Queue.add uses Array.push, which runs in constant time. Great!
  • Queue.remove uses Array.shift, which runs in linear time (O(n)). Can we make Queue.remove cheaper?

Imagine implementing a queue using only Array.push and Array.pop:

class Queue {
  constructor() {
    this.input = [];
    this.output = [];
  }
  add(element) {
    this.input.push(element);
  }
  remove() {
    if (!this.output.length) {
      while (this.input.length) {
        this.output.push(this.input.pop());
      }
    }
    return this.output.pop();
  }
}

Now, we implement a queue using two arrays instead of one.

const queue = new Queue();
queue.add('a');
queue.add('b');
queue.remove() // a
queue.add('c');
queue.remove() // b
queue.remove() // c

When we performed the first dequeue, the output array was empty, so we popped the contents of input into output, leaving output as ['b', 'a']. The element was then popped from output. As you can see, with this trick the queue still releases elements in first-in, first-out (FIFO) order.

What is the time complexity?

If the output array already has elements, dequeuing is constant, O(1). When output must be refilled (i.e. it's empty), the operation costs O(n). Once refilled, dequeues are constant again, so the amortized cost is O(1).
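To make the amortization concrete, here is a trace of the example above (state shown after each operation):

// operation   input         output    returns
// add('a')    ['a']         []
// add('b')    ['a', 'b']    []
// remove()    []            ['b']     'a'  <- O(n): input emptied into output
// add('c')    ['c']         ['b']
// remove()    ['c']         []        'b'  <- O(1): popped straight from output
// remove()    []            []        'c'  <- transfers one element, then pops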

Queues can also be implemented through linked lists, and the related operation time is constant. The next section will cover the implementation.

Implement a queue through a doubly linked list

For the best queue performance, use a doubly linked list instead of an array.

const LinkedList = require('../linked-lists/linked-list');
class Queue {
  constructor() {
    this.input = new LinkedList();
  }
  add(element) {
    this.input.addFirst(element);
  }
  remove() {
    return this.input.removeLast();
  }
  get size() {
    return this.input.size;
  }
}

With a queue backed by a doubly linked list, we hold references to both the first and the last node, so enqueue and dequeue are both O(1). That is why choosing the right data structure for the problem at hand matters 💪.

Conclusion

We covered mostly linear data structures. As you can see, the same data structure can have different time complexities depending on how it is implemented.

The following is a summary of the discussion:

Time complexity

* = amortized runtime

| Data structure | Insert | Access | Find | Delete | Notes |
| --- | --- | --- | --- | --- | --- |
| Array | O(n) | O(1) | O(n) | O(n) | Inserting at the last position is O(1). |
| (Hash)Map | O(1)* | O(1)* | O(1)* | O(1)* | Recalculating the hash (rehash) affects insert time. |
| Map (using a binary search tree) | O(log(n)) | - | O(log(n)) | O(log(n)) | Implemented with a binary search tree |
| Set (using a HashMap) | O(1)* | - | O(1)* | O(1)* | Implemented with a HashMap |
| Set (using a List) | O(n) | - | O(n) | O(n) | Implemented with a List |
| Set (using a binary search tree) | O(log(n)) | - | O(log(n)) | O(log(n)) | Implemented with a binary search tree |
| Linked List (singly) | O(n) | - | O(n) | O(n) | Adding or removing elements at the start is O(1) |
| Linked List (doubly) | O(n) | - | O(n) | O(n) | Adding or removing at the beginning or end is O(1); in other positions it is O(n). |
| Stack (Array implementation) | O(1) | - | - | O(1) | Insertions and deletions follow last in, first out (LIFO) |
| Queue (naive Array implementation) | O(n) | - | - | O(1) | The insert (Array.shift) operation is O(n) |
| Queue (improved Array implementation) | O(1)* | - | - | O(1) | Worst-case insert is O(n), but amortized it is O(1) |
| Queue (List implementation) | O(1) | - | - | O(1) | Uses a doubly linked list |

Note: binary search trees and other tree structures, as well as graphs, will be discussed in separate articles.