This article is based on the source code to analyze the implementation principle of the standard template library vector and some special considerations.

To clarify, I am using the GCC7.1.0 compiler, and the standard library source code is also in this version.

Many years ago, I was asked about the underlying implementation of vector in STL for the first time in the interview. At that time, I was really low and could not answer the question. Later, when I came back from the interview, I searched some underlying implementation of vector on the Internet and learned that its underlying implementation is dynamic array, but knowing dynamic array is not enough. How to write full dynamic array, its implementation using c++ what technology, some special scenarios how to use vector more efficient and so on, these few people clearly, today I based on GCC inside the source to analyze these more in-depth problems.

First take a look at the mind map of this article, as follows:

I. Implementation principle of Vector

1. Introduction to vector base classes

Stl_vector.h: stl_vector.h: stl_vector.h: stl_vector.h: stl_vector.h: stl_vector.h: stl_vector.h

// Two template arguments, the first is a data type, and the second STD ::allocator is a library dynamic memory allocator, which ultimately calls the new operator
template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
    class vector : protected_Vector_base<_Tp, _Alloc> {... };Copy the code

Vector

is a derived class. Vector

is a derived class. It is a template class. Its base class is _Vector_base, so let’s first look at the implementation of this base class.

The public members of the base class also become protected members of the derived class. Protected members and private members of the base class have the same permissions on the derived class as the base class.

Again, the stl_vector.h header captures an implementation of the base class as follows:

  template<typename _Tp, typename _Alloc>
    struct _Vector_base
    {
      typedef typename __gnu_cxx::__alloc_traits<_Alloc>::template
	rebind<_Tp>::other _Tp_alloc_type;
      typedef typename __gnu_cxx::__alloc_traits<_Tp_alloc_type>::pointer
       	pointer;
      struct _Vector_impl
      : public _Tp_alloc_type
      {
	pointer _M_start;// Start position of container
	pointer _M_finish;// End position of container
	pointer _M_end_of_storage;// Next to the last dynamic memory location requested by the container

	_Vector_impl()
	: _Tp_alloc_type(), _M_start(), _M_finish(), _M_end_of_storage()
	{ }
	...  // Some code is not truncated with ellipsis, the same as the following
      };
    public:
      typedef_Alloc allocator_type; .// Omit some source code here
      _Vector_base()
      : _M_impl() { }
        _Vector_base(size_t__n) : _M_impl() { _M_create_storage(__n); }...// Omit some source code here
    public:
      _Vector_impl _M_impl;
	      pointer
      _M_allocate(size_t __n)
      {
	typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
	return__n ! =0 ? _Tr::allocate(_M_impl, __n) : pointer(a); }void
      _M_deallocate(pointer __p, size_t __n)
      {
	typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
	if (__p)
	  _Tr::deallocate(_M_impl, __p, __n);
      }

    private:
      void
      _M_create_storage(size_t __n)
      {
	this->_M_impl._M_start = this->_M_allocate(__n);
	this->_M_impl._M_finish = this->_M_impl._M_start;
	this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n; }};Copy the code

The base class declares a struct _Vector_impl and uses that structure to declare a public member variable _M_impl. In the case of the base class’s no-argument constructor, it simply calls the struct _Vector_impl constructor. The constructors of the individual member variables of the struct _Vector_impl are then called.

It is important to note that the three member variables of the _Vector_impl structure are important. They appear multiple times in the vector implementation, and their roles are stated in the comments. These three member variables hold the beginning and end of the vector, and the next location of the requested memory.

At this point, we are still confused by the fact that this base class does nothing. What does it do, in fact, for things like vector

vec; Vector calls the base class’s no-argument constructor. It does nothing, and does not apply for dynamic memory.

Why not apply for dynamic memory, no arguments structure here involves the principle of saving resource, hypothesis here to apply for a dynamic memory, but behind you didn’t use this vector, that this application for and release of this piece of the action of dynamic memory virtually leads to waste of time and space, this does not conform to the principle of the STL performance preference.

What is STL performance first? The c++ standard states that STL should take performance first, so other fault tolerance and more features can be discarded.

We can also see that if the vector is constructed by passing size n to the base class, the member function _M_create_storage is called to allocate dynamic memory and assign values to the member variables.

Here we get at least two functions of the base class:

  • Stores the start and end locations of the container and the next location of the requested memory space.
  • Apply for dynamic memory space.
2. What happens when the vector inserts elements from the very back
2.1 Insert an element into the empty vector

We stated in the previous section that if a vector is constructed with a container size specified, dynamic memory is allocated when it is declared. But if the construct is the default and dynamic memory is not allocated, what happens when we insert an element into a vector that has no space?

We found an implementation of vector push_back as follows:

void
push_back(const value_type& __x)
{
	if (this->_M_impl._M_finish ! =this->_M_impl._M_end_of_storage)
	{
		_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
             __x);
		++this->_M_impl._M_finish;
	}
	else
		_M_realloc_insert(end(), __x);
}
Copy the code

This function inserts elements directly into the vector when memory is not full. If memory is full, it calls the vector member function _M_realloc_insert. Obviously inserting an element into a vector that has no space calls _M_realloc_insert, which looks like this:

#if __cplusplus >= 201103L
  template<typename _Tp, typename _Alloc>
    template<typename. _Args>void
      vector<_Tp, _Alloc>::
      _M_realloc_insert(iterator __position, _Args&&... __args)
#else
  template<typename _Tp, typename _Alloc>
    void
    vector<_Tp, _Alloc>::
    _M_realloc_insert(iterator __position, const _Tp& __x)
#endif
{
      _M_check_len returns 1 each time it doubles the size of the current container, as long as it does not exceed the maximum memory size specified by the STL
      const size_type __len = _M_check_len(size_type(1), "vector::_M_realloc_insert");
      const size_type __elems_before = __position - begin(a);// According to section 1 above, _M_allocate allocates memory space based on the passed length
      pointer __new_start(this->_M_allocate(__len));
      pointer __new_finish(__new_start);
      __try
	{
      // write x to the corresponding position
	  _Alloc_traits::construct(this->_M_impl,
				   __new_start + __elems_before,
#if __cplusplus >= 201103Lstd::forward<_Args>(__args)...) ;#else
				   __x);
#endif
	  __new_finish = pointer(a);// Copy the original data into the new memory
	  __new_finish
	    = std::__uninitialized_move_if_noexcept_a
	    (this->_M_impl._M_start, __position.base(),
	     __new_start, _M_get_Tp_allocator());

	  ++__new_finish;
	  // Insert elements into the middle of vector
	  __new_finish
	    = std::__uninitialized_move_if_noexcept_a
	    (__position.base(), this->_M_impl._M_finish,
	     __new_finish, _M_get_Tp_allocator());
	}
      __catch(...)
	{
		...;
    }
        // This destroys the original memory and assigns new values to the member variables
        std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
		    _M_get_Tp_allocator());
      _M_deallocate(this->_M_impl._M_start,
		    this->_M_impl._M_end_of_storage
		    - this->_M_impl._M_start);
      this->_M_impl._M_start = __new_start;
      this->_M_impl._M_finish = __new_finish;
      this->_M_impl._M_end_of_storage = __new_start + __len;
};
Copy the code

From the above code, we can see the flow of inserting an element into a vector that has no space:

  • Get a length that is 1 on the first insert, and then multiply it by 2 if it exceeds the space requested by the container, and then apply for new memory.
  • Insert the element to be inserted into the corresponding position;
  • Copy all the elements in the old memory into the new memory.
  • Calls the destructor of all elements in the old memory and destroys the old memory;

According to the above logic, inserting an element into a vector with no space actually requires space for 1 element and inserts that element into the vector.

If we are certain that a vector will be used and has data, we should specify the number of elements in the declaration to avoid the initial dynamic memory consumption, which may affect performance.

2.2 Vector Is inserted when the current memory runs out

So if it runs out of memory is how, in fact, the existing memory space is actually finished with the first insert the first element of call route is consistent, that is to say, if you run out of existing space, will be based on the current space are superior to 2, then put the original copies all the data in the memory space to the new memory, Finally, the current data to be inserted is inserted at the next position after the last element.

3. What happens when vector inserts an element in the middle

We use the vector member function insert to insert elements. A basic implementation of insert is as follows:

  template<typename _Tp, typename _Alloc>
    typename vector<_Tp, _Alloc>::iterator
    vector<_Tp, _Alloc>::
#if __cplusplus >= 201103L
    insert(const_iterator __position, const value_type& __x)
#else
    insert(iterator __position, const value_type& __x)
#endif
    {
      const size_type __n = __position - begin(a);if (this->_M_impl._M_finish ! =this->_M_impl._M_end_of_storage)
	if (__position == end())
	  {
	    _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
				     __x);
	    ++this->_M_impl._M_finish;
	  }
	else
	  {
#if __cplusplus >= 201103L
	    const auto __pos = begin() + (__position - cbegin());
	    // __x could be an existing element of this vector, so make a
	    // copy of it before _M_insert_aux moves elements around.
	    _Temporary_value __x_copy(this, __x);
	    _M_insert_aux(__pos, std::move(__x_copy._M_val()));
#else
	    _M_insert_aux(__position, __x);
#endif
	  }
      else
#if __cplusplus >= 201103L
	_M_realloc_insert(begin() + (__position - cbegin()), __x);
#else
	_M_realloc_insert(__position, __x);
#endif

      return iterator(this->_M_impl._M_start + __n);
    }
Copy the code

Insert does exactly the same thing as push_back when it runs out of space. You can see the comment on _M_realloc_insert in section 2. The second call to STD :: __uninitialized_move_if_noEXCEPt_a is used to insert elements into the middle. If push_back is used, this second call has no effect.

So what happens if I insert it in the middle when I have enough space? Before c++11, we called the _M_insert_aux function directly.

#if __cplusplus >= 201103L
template<typename _Tp, typename _Alloc>
    template<typename _Arg>
      void
      vector<_Tp, _Alloc>::
      _M_insert_aux(iterator __position, _Arg&& __arg)
#else
  template<typename _Tp, typename _Alloc>
    void
    vector<_Tp, _Alloc>::
    _M_insert_aux(iterator __position, const _Tp& __x)
#endif
    {
      _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
			       _GLIBCXX_MOVE(*(this->_M_impl._M_finish
					       - 1)));
      ++this->_M_impl._M_finish;
#if __cplusplus < 201103L
      _Tp __x_copy = __x;
#endif
      _GLIBCXX_MOVE_BACKWARD3(__position.base(),
			      this->_M_impl._M_finish - 2.this->_M_impl._M_finish - 1);
#if __cplusplus < 201103L
      *__position = __x_copy;
#else
      *__position = std::forward<_Arg>(__arg);
#endif
    }
Copy the code

As you can see from the code, you simply move the element behind the current position and insert the element into the corresponding position.

4. Will memory be freed when vector deletes elements
4.1 Last Delete from container

To delete from the last container, call pop_back. Let’s look at the implementation of this function:

void
      pop_back(a) _GLIBCXX_NOEXCEPT
      {
	__glibcxx_requires_nonempty();
	--this->_M_impl._M_finish;
	_Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
      }
Copy the code

This is easier, just move the last element forward one bit, and then destroy the last element.

4.2 Deleting a Container From the Middle

To remove an element from the middle of a container is to remove an element at a specified position. This is done by erase. The simplest way to erase is as follows:

iterator
#if __cplusplus >= 201103L
      erase(const_iterator __position)
      { return _M_erase(begin() + (__position - cbegin())); }
#else
      erase(iterator __position)
      { return _M_erase(__position); }
#endif
Copy the code

Call _M_erase. Let’s look at the implementation of this function:

template<typename _Alloc>
    typename vector<bool, _Alloc>::iterator
    vector<bool, _Alloc>::
    _M_erase(iterator __position)
    {
      if (__position + 1! =end())
        std::copy(__position + 1.end(), __position);
      --this->_M_impl._M_finish;
      return __position;
    }
Copy the code

This function, without deleting the last element, moves all elements following the element forward one bit, which is a copy action, moves the end of the container forward one bit, and returns an iterator to the current position.

In summary, deleting elements does not free up existing allocated dynamic memory.

5. How does vector change the value of an element at a location

Vector does not support modifying elements at all. If it does not, it must delete and then insert elements.

6. How efficient is vector at reading the value of an element

To access elements directly, vector provides several functions. To access elements in a specified position, we can use operator[] and at functions.

const_reference
      operator[](size_type __n) const _GLIBCXX_NOEXCEPT
      {
	__glibcxx_requires_subscript(__n);
	return* (this->_M_impl._M_start + __n);
      }
      const_reference
      at(size_type __n) const
      {
	_M_range_check(__n);
	return (*this)[__n];
      }
Copy the code

As you can see, the at function is actually the operator[] function that is called with an extra action to check whether it is out of bounds, while the operator[] function directly jumps to the location to access elements, so the speed is very fast. In terms of time complexity, it is O(1).

7. What did c++11 add to vector

As you can see from the code above, full of precompiled options such as #if __cplusplus >= 201103L, which actually represents a version of c++. For example, the c++11 standard was released in March 2011. So this line of code means that if we specify __cplusplus as 201103 at compile time, then expand the code below it, otherwise expand the code after #else.

What has been added to vector since c++11? Let’s see:

  • For iterators, add cBEGIN functions that return constant iterators, namely read-only iterators.
  • Move constructors and move assignment functions have been added to almost all types of the standard library.
  • Added public member function shrink_to_fit to allow the release of unused memory.
  • Added public member functions emplace and emplace_back, which support the construction of elements at specified locations. Since they pass arguments by rvalue reference, they have one less copy action than functions such as push_back.
8. Summary of vector underlying implementation

In general, the vector is a dynamic array, it maintains a continuous dynamic memory space, and then there are three member variables respectively starting position, the current has saved using the position and application of dynamic memory of the last position of the next position, when the current application of dynamic memory has been used up, it shall, in accordance with the original space size double apply again, And copy all the original elements.

Based on the previous sections, the time complexity for vector operations is shown as follows:

  • Access elements, time complexity is O(1);
  • At the end of the insert or delete elements, the time complexity is O(1);
  • Inserting or deleting elements in the middle takes O(n) time.

Ii. Precautions when using vector

1. Use at instead of operator in uncertain situations []

In the previous access elements section, we said that at checks to see if the access action is out of bounds. Assuming we are not sure if the current access action is out of bounds, we should use the AT function.

2. What type cannot be used as a template type for vector

For vector template specializations, vector template types should have a public copy constructor and overloaded assignment operator functions because variables are often copied or assigned during vector implementation.

3. When does a vector iterator become invalid
  • That is, we call erase. The iterator is invalid because the current position will be overwritten by later elements. But we can retrieve the correct iterator from erase.
  • Second, iterator invalidation occurs when the vector needs to reapply for memory, such as expanding capacity, freeing unused memory, etc. Since the memory is changed, iterators need to be reacquired.

Some people say that inserting data into a vector invalidates iterators. This is not the case. As we can see from the insert implementation, the current iterator is reassigned after the insertion.

4. How does vector free memory quickly

We can call reserve(0) to free memory, because the reserve function will reallocate memory based on the specified size. This function will only act if the size of the memory passed in is larger than the original size.

Resize, clear, etc. These functions only destruct all elements currently stored in the container, but do not free the memory space in which the container itself resides.

4.1 Using the swap function

If a vector is empty, we can swap an empty vector with the current vector. If a vector is empty, we can swap an empty vector with the current vector. Swap (v) with a vector

().swap(v) swap the memory represented by v with an empty vector. The empty vector, since it is a temporary variable, is freed after the end of this line. Vector’s destructor is automatically called to free dynamic memory, and a vector’s dynamic memory is quickly freed.

4.2 Using shrink_to_fit

Shrink_to_fit = shrink_to_fit = shrink_to_fit = shrink_to_fit = shrink_to_fit = shrink_to_fit That frees up the vector.