C Strings in C++ and Deep Copies

I was told on Stack Overflow:

“A pointer is just a pointer.”

And that is true of course. To go beyond that we have structures and classes, which have explicit construction and operators. However, C++ has inherited C strings, which are just a pointer to the first char but come with a long history in which their construction and operations are well known. So, when a C string is given to a map or vector, it is tempting to think it will be deep copied. After all, it is known (by implicit rules) how this could be done. The size is defined, it is just not stored. The storage is clear too, it is just not managed.

So, what does this mean for a hash function? std::unordered_map requires the key type to provide two things: an equality comparison and a hash function. By default it uses std::equal_to and std::hash for the key type. These are built in for the fundamental types, as the comparison is part of the language and the hash is easy to implement because sizeof(type) is known. Classes like std::string are also implemented to supply these important functions.

This makes sense, but what about a C String?

Here the comparison is known, strcmp(), as is the hash, since strlen() gives the number of bytes to hash. So it should be possible; however, it has not been implemented. unordered_map will treat a C string as just a char*: it hashes and compares the pointer value, and it stores only the pointer. This means that if the string is changed after it is stored, the result will be unexpected.

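#include <cstdio>
#include <cstring>
#include <unordered_map>

int main() {
    char name[8];              // a char buffer, not an array of pointers
    const char* key = "john";  // one pointer reused, as keys compare by address
    std::unordered_map<const char*, char*> names;

    std::strcpy(name, "smith");
    names[key] = name;         // stores the pointer, not a copy of the text
    std::strcpy(name, "doe");
    std::printf("%s\n", names[key]);
    // prints "doe" and not "smith"
}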
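The strcmp() comparison and a hash over the characters can be supplied by hand through the Hash and KeyEqual template parameters of std::unordered_map. The following is a minimal sketch of that idea, not something the standard library provides: CStrHash, CStrEqual, CStrMap and content_lookup_demo are hypothetical names chosen for illustration, and the hash is a deliberately simple multiply-and-add over the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Hypothetical functor: hashes the characters, not the pointer value.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s != '\0'; ++s) {
            h = h * 31 + static_cast<unsigned char>(*s);
        }
        return h;
    }
};

// Hypothetical functor: two keys are equal when their text is equal.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

using CStrMap = std::unordered_map<const char*, int, CStrHash, CStrEqual>;

// Two separate buffers holding the same text refer to the same entry.
int content_lookup_demo() {
    char first[] = "john";
    char second[] = "john";

    CStrMap ages;
    ages[first] = 42;
    assert(ages.count(second) == 1);  // found by content, not by address
    return ages[second];
}
```

This only changes how keys are hashed and compared. The map still stores just the pointers, so the key buffers must outlive the map and must not be modified while they are in it. There is still no deep copy.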

It is clear that C++ would not continue with the pain that is C strings, so it makes sense that it would not implement a deep copy just for this case. After all, a char* could be a pointer to a single char and not a C string.
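For comparison, here is a short sketch of the same "smith" and "doe" sequence with std::string as the key and value type, which makes the deep copy explicit in the types. The function name deep_copy_demo is only an illustrative assumption.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <unordered_map>

// Same steps as the char* version, but the map owns copies of the text.
std::string deep_copy_demo() {
    char name[8];
    std::unordered_map<std::string, std::string> names;

    std::strcpy(name, "smith");
    names["john"] = name;      // std::string copies the characters here
    std::strcpy(name, "doe");  // changes only the buffer, not the map's copy

    return names["john"];      // still "smith"
}
```

The copy happens at the assignment into the map, because std::string's assignment from a const char* copies the characters into storage that the string owns.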

Is it possible to do this deep copy in a map class? Of course, but let’s look at what the pros and cons would be.

This is the smallest implementation and it exists only so the points can be raised. Clearly this is very hacky code.

The operator[] is implemented to return not a string but a reference to the whole map. It only captures the index for later use. The assignment is done in operator=, which uses the captured index to do the work. Finally, the cast operator char* lets the internal storage return the right char* pointer wherever the map is used.

This relies on the calling order of operator[], the assignment and the cast, and I can’t say it would work in all situations other than the simplest.

Note that, because of what the operators return, the cast operator is required to make this work. This is not ideal and makes using it ugly: (char*)map[key].
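The design described above can be sketched roughly as follows. This is an illustrative reconstruction and not the original code: the class name DeepCopyMap, its members and proxy_demo are hypothetical, and it assumes std::string for the internal storage to keep the sketch short, where a hand-written version would manage its own buffers.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical class: all names here are illustrative assumptions.
class DeepCopyMap {
public:
    // Capture the key for later use and hand back the map itself.
    DeepCopyMap& operator[](const char* key) {
        pending_key_ = key;  // copies the key's characters
        return *this;
    }

    // Copy the value's characters into storage owned by the map,
    // using the key captured by the preceding operator[].
    DeepCopyMap& operator=(const char* value) {
        storage_[pending_key_] = value;
        return *this;
    }

    // Return a pointer to the stored copy for the captured key.
    operator char*() {
        return &storage_[pending_key_][0];
    }

private:
    std::string pending_key_;
    std::unordered_map<std::string, std::string> storage_;
};

// The same sequence as before: the stored text survives the overwrite.
bool proxy_demo() {
    char name[8];
    DeepCopyMap names;

    std::strcpy(name, "smith");
    names["john"] = name;      // operator[] then operator=
    std::strcpy(name, "doe");

    return std::strcmp((char*)names["john"], "smith") == 0;
}
```

Because only one key is captured at a time, any expression that indexes the map more than once depends on the order in which the operators run, which is where the fragility comes from.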

It was a useful journey for someone coming to C++ from an entrenched C background.

To really get the point firmly worked out I implemented my own map and string classes. This drove home what is going on and, more importantly, why and when it helps.

Once I had worked through this, I realised it was the same thought process that is at the heart of C++. Being explicit, and being able to specify when a deep copy occurs, is very useful and the correct approach.

Running example of the code can be seen here.
