How could I create my own string-type vector with new string instead of allocator?

I wrote a very simple code just to see if the allocator would work as allocating memory for string. And it works. However, I would like to know if I could achieve the same effect with new keyword. If I could, which method would be a better practice ? My code sample is as following:

#include<iostream>
#include<string>
#include<memory>
class MyVector{
private:
    int size;
    int capacity;
    std::allocator<std::string> str;
    std::string*a;
    void allocate(){
        capacity = (size-1)*2;
        std::string*temp = str.allocate(capacity);
        for(int i=0; i<this->getSize();++i){
            temp[i] = a[i];
        }
        str.deallocate(a, 1);
        a = temp;
    }
public:
    MyVector():size(0), capacity(1), a(str.allocate(1)){};
    int getSize(){return size;};
    int getCapacity(){return capacity;};
    void pushBack(std::string input){
        ++size;
        if(this->getSize()>this->getCapacity()){
            this->allocate();
        }
        a[this->getSize()-1]=input;
    }
    std::string at(int index){
        for(int i=0; i<this->getSize();++i){
            if(i==index){
                return a[i];
            }
        }
        return 0;
    }
};
int main(){
    MyVector v;
    v.pushBack("Sam");
    std::cout<<v.at(0)<<std::endl;
return 0;
}

1 answer

  • answered 2018-01-11 22:29 AliaumeM

    If I understand your question correctly, you're trying to allocate a bunch of strings dynamically, and would like to know whether you can use operator new() to perform the allocation, and in case you can, which method is the best.

    There are a few problems with your implementation I have to point out before being able to answer your question. So let's walk through it!

    • std::string* a doesn't get deallocated when MyVector is destroyed

      This is the most obvious one. There is no destructor, and you are manually managing a raw pointer, which therefore must be deallocated! Right now, when MyVector goes out of scope, the memory pointed to by a is going to become unreachable, and there will be no garbage collector to clean it up.

      Therefore we'd have to add a method like this:

      ~MyVector() { str.deallocate(a, capacity); }

    • What if this->allocate() can't allocate, and throws std::bad_alloc when you're doing a push_back?

      size has already been incremented, and the capacity has been updated but the string you were trying to push hasn't been copied, and the buffer hasn't grown, which leaves your container in an invalid state. You would probably end up accessing memory outside of the bounds of your buffer, which is undefined behavior (this could work, crash, or even get your male cat pregnant)

      Alright, this can be fixed easily by putting the assignments after the actual allocation has happened. No big deal. Right?

    • What if when trying to copy the strings from the old buffer to the new, one of them can't allocate its own internal buffer?

      Well it's going to throw std::bad_alloc in the middle of the loop. And your temp buffer is never going to be deallocated. Ouch.

      This is getting serious. We could put a catch clause in there, but this is starting to be a lot of code just for maintaining the pointer in a good state.

    • Isn't temp[i] = a[i]; calling the assignment operator on uninitialized memory, which would be another undefined behavior?

      I'm not 100% certain on this one, but my C++ instinct tells me this is pretty risky.

    So, how do I get out of this long list of issues?

    Use new instead? Maybe new[] because this is an array of strings? Well, this would be cleaner, especially if you're going to use the default std::allocator anyway.

    But wait, there's better!

    Looking at the C++ Core Guidelines, we can see under P.8 that we shouldn't leak any resources, it is advised to use RAII, and to look for "naked new". Basically, what that means is that you should avoid using new in normal code to allocate resources dynamically. Instead, the guidelines encourage you to use unique_ptr and to use make_unique() to construct objects owned by unique_ptrs

    As a reference, here is unique_ptr's page on cppreference. You can also read a bit more about it here, or watch one of the designers of the language explain the concepts I touched on much better than I could on YouTube

    Following these guidelines, your code could become much more modern-conforming. It would look like this:

    #include<string>
    #include<memory>
    class MyVector{
    
    private:
        int size;
        int capacity;
        std::unique_ptr<std::string[]> a;
        void allocate(){
            size_t new_capacity = size*2;
            auto temp = std::make_unique<std::string[]>(new_capacity);
            std::copy(a.get(), a.get()+size, temp.get());
            capacity = new_capacity; // We have finished all the operations that could throw!
            std::swap(a, temp); // Because this can't throw
        }
    public:
        MyVector():size(0), capacity(1) {}
        // Since unique_ptr<>'s destructor is called automatically
        // we don't need to do it explicitely!
        int getSize(){return size;};
        int getCapacity(){return capacity;};
        void pushBack(std::string input){
            if(this->getSize() == this->getCapacity()){ // We have to change the comparison
                this->allocate();
            }
            a[this->getSize()] = input; // This could throw too!
            ++size;
        }
        std::string at(int index){
            if(index >= size)
                throw std::out_of_range("Trying to access an element past the end in MyVector!");
    
            return a.get()[index];
        }
    };
    

    One final note

    This container is still pretty inefficient (a growth factor of 2 is not the theoretical best, although I don't know that much more about it), it has no notion of move semantics, it can't be copied, it can't be specialized for other types (although that wouldn't be too difficult), it doesn't have handy iterators to use with algorithms or a range-based-for loop, and so on.

    It is, however a very good learning exercise, and I applaud you for trying to improve on it by posting your result on StackOverflow :)

    Making a production-ready container is actually a lot of work, and requires quite deep hardware knowledge to get right, so as a conclusion I would advise you to stick to std::vector for when you actually need to use a vector somewhere ;)