C++ object lifecycle adventure

Posted on 2024-06-17 In Coding

Introduction

Programmer: Huh. I wonder where my objects are?

C++: I don't know. Objects have their own lifecycles.

Adapted from: "Your characters also have their own lives." by anonymous Genshin Impact fans
Original: 《角色也有自己的生活》 ("Characters also have their own lives")

C++ objects, constructors, destructors, variable scope, and lifecycles.

Some Exploration

Let's start with a simple example. What is wrong with this piece of code?

#include <iostream>
#include <string>

void some_external_function(const char * cstr);

int main() {
    const char * cstr = std::string("Hello, World!").c_str();
    some_external_function(cstr);
    std::string secret_data = "secret data";
    std::cout << secret_data << std::endl;
    return 0;
}

It seems like it compiles and runs correctly. Everything works properly!


Or does it?

If we add one more line that prints the pointer, we would expect it to show "Hello, World!".

void some_external_function(const char * cstr);

int main() {
    const char * cstr = std::string( "Hello, World!").c_str();
    some_external_function(cstr);
    std::string secret_data = "secret data";
    std::cout << secret_data << std::endl;
    std::cout << cstr << std::endl;
    return 0;
}

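Instead of printing "Hello, World!", it prints "secret data" in this run. (Reading through a pointer whose string has already been destroyed is undefined behavior, so the exact result can differ between compilers and optimization levels.)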

Furthermore, if you are using a library built by malicious individuals, they can actually steal and modify your secrets.

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

static char * heheh = nullptr;

void some_external_function(const char * cstr) {
    printf("external function called with '%s'\n", cstr);
    heheh = (char *)cstr;
}

void some_evil_people() {
    printf("evil people got your secret '%s'\n", heheh);
    heheh[0] = 'H';
    heheh[1] = 'e';
    heheh[2] = 'h';
    heheh[3] = 'e';
    heheh[4] = 'h';
    heheh[5] = 'e';
}

void some_external_function(const char * cstr);

int main() {
    const char * cstr = std::string( "Hello, World!").c_str();
    some_external_function(cstr);
    std::string secret_data = "secret data";
    some_evil_people();
    std::cout << secret_data << std::endl;
    return 0;
}

What's going on here?!?!

This all comes down to object lifetimes in C++.
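The short version is that the std::string temporary on the right-hand side is destroyed as soon as that statement ends, so cstr points at memory that no longer belongs to any string. Here is a minimal sketch of two ways to avoid this. It assumes a hypothetical measure() function that stands in for some_external_function and only reads the pointer during the call:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for some_external_function: it reads the pointer
// during the call and does not keep it afterwards.
std::size_t measure(const char* cstr) {
    return std::strlen(cstr);
}

// Option 1: give the string a name. It now lives until the end of the
// function, so the pointer from c_str() stays valid for every use below.
std::size_t call_with_named_string() {
    std::string greeting = "Hello, World!";
    const char* cstr = greeting.c_str();
    return measure(cstr);
}

// Option 2: pass the temporary directly. The temporary is destroyed at the
// end of the full expression, which is after measure() has returned.
std::size_t call_with_temporary() {
    return measure(std::string("Hello, World!").c_str());
}
```

Both helpers return 13, the length of "Hello, World!". The difference is where the string lives: a named variable keeps it alive for the whole scope, while passing the temporary directly is only safe if the callee does not keep the pointer.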

Let's go through an object's lifecycle

Before talking about an object's lifecycle, let's first find out where an object's data actually lives.

The usual explanation online is that the stack holds local variables and stack frames, while the heap holds everything allocated with malloc.

Image credit: https://www.geeksforgeeks.org/memory-layout-of-c-program/

Is it the same for C++ with all those complicated objects and class stuff? (Because objects are essentially structs with magic pointers and values.)

Not exactly.

Let's take a look at this program:

int main() {
    init_heap_address();
    init_stack_address();
    std::string s1 = "Hello! World!";
    std::string s2 = "Heeeeeeeeeeeeeeeeeeeeeeeeeeeeeeello, World!";
    print_address_location("s1", (void *)&s1);
    print_address_location("s1 cstr", (void *)s1.data());
    print_address_location("s2", (void *)&s2);
    print_address_location("s2 cstr", (void *)s2.data());
    return 0;
}

We can see that although the object itself is indeed on the stack (both the s1 and s2 addresses are on the stack), its internal data can be somewhere else: either on the stack (s1.data) or on the heap (s2.data).

What's going on here? How can an object have its memory sometimes in the heap and sometimes in the stack?

This is because std::string is a class, and its objects are set up by a constructor.

As in other OOP languages, the constructor is called when an object is created.

Inside the constructor, the class author can do whatever they want, such as allocating memory with malloc or new, or performing some calculations.

The GCC implementation of std::string (libstdc++) has two storage modes, commonly called a short string and a long string.

A short string is stored in a small buffer inside the string object itself, so it lives wherever the object lives (here, on the stack). In current 64-bit libstdc++ builds this buffer holds up to 15 characters; the 22 or 23 character limit that is often quoted applies to Clang's libc++. Anything longer makes the constructor allocate memory on the heap and store the actual string content there.
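One way to check this without reading /proc/self/maps is to test whether data() points inside the string object itself. This is a rough sketch, not part of the original source: stored_inline is a hypothetical helper, and its result depends on the standard library in use.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true when the character buffer lies inside
// the std::string object itself (the "short string" case).
bool stored_inline(const std::string& s) {
    // Compare addresses as integers so the comparison is well defined
    // even when the two pointers refer to unrelated memory.
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}
```

With libstdc++, libc++, and MSVC's library, I would expect stored_inline to return true for "Hello! World!" (13 characters) and false for the long s2 string from the example above, but the C++ standard does not guarantee either result.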

To verify this, we can add an allocation tracker to our program.

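#include <cstdio>
#include <cstdlib>
#include <new>

void * operator new(size_t size) {
    void * mem = malloc(size);
    if (mem == nullptr) {
        throw std::bad_alloc();  // operator new must not return nullptr
    }
    // %zu is the portable format specifier for size_t
    printf("[Allocation Tracker] allocating %zu bytes at %p\n", size, mem);
    return mem;
}

void operator delete(void * mem) noexcept {
    printf("[Allocation Tracker] free memory from %p\n", mem);
    free(mem);
}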

int main() {
    init_heap_address();
    init_stack_address();
    puts("init s1 string");
    std::string s1 = "Hello! World!";
    puts("init s2 string");
    std::string s2 = "Heeeeeeeeeeeeeeeeeeeeeeeeeeeeeeello, World!";
    print_address_location("s1", (void *)&s1);
    print_address_location("s1 cstr", (void *)s1.data());
    print_address_location("s2", (void *)&s2);
    print_address_location("s2 cstr", (void *)s2.data());
    return 0;
}

Run the program again, and we can see that the string object allocates and frees heap memory behind the scenes when the string is long.
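If you only care about how many allocations happen rather than where, a counter is enough. The sketch below is a variation on the tracker above, under my own assumptions: g_allocations and allocations_for are names made up for illustration, and operator new is kept free of I/O.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical global counter, bumped on every call to operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    // malloc(0) may return nullptr, so always request at least one byte.
    if (void* mem = std::malloc(size == 0 ? 1 : size)) {
        return mem;
    }
    throw std::bad_alloc();
}

void operator delete(void* mem) noexcept { std::free(mem); }
void operator delete(void* mem, std::size_t) noexcept { std::free(mem); }

// Hypothetical helper: how many heap allocations does it take to build
// a std::string from the given text?
std::size_t allocations_for(const char* text) {
    const std::size_t before = g_allocations;
    std::string s(text);
    const std::size_t after = g_allocations;
    // Reading s here keeps the string in use until the count is taken.
    return s.empty() ? 0 : after - before;
}
```

On the common standard libraries I would expect allocations_for("Hello! World!") to report zero allocations and the long s2 string to report one, which matches what the printing tracker shows. An aggressive optimizer is allowed to remove allocations, so treat the numbers as an observation rather than a guarantee.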


But where does the program free the memory?

Here comes the destructor.

Stack Objects

For objects on the stack, the destructor is called "automatically" when the variable goes out of scope. (There are also other cases where the destructor will be called!!!)

Let's see some examples.

Let's start with a class that tracks object construction and destruction.

class Nanami{
    std::string name;
public:
    Nanami(const std::string& name) {
        this->name = name;
        std::cout << name <<": owu I got created" << std::endl;
    }
    ~Nanami() {
        std::cout << name <<": owo I got destroyed" << std::endl;
    }
    int lol() {
       return 1+1;
    }
};

For local variables, objects are constructed when they are initialized and destroyed when the function returns. Any memory the object allocated internally is freed by its destructor at that point, so the object cleans up after itself just like a plain stack variable.


Objects declared inside a block behave like local variables. When execution leaves the block, the destructor runs and frees the object's memory.
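To make the order of events easier to see than with interleaved console output, here is a small sketch that records construction and destruction in a log. Tracer and scope_demo are hypothetical names for this illustration, not part of the original source files.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: writes a line to the log when it is created and
// another one when it is destroyed.
struct Tracer {
    std::string name;
    std::vector<std::string>& log;

    Tracer(std::string n, std::vector<std::string>& l)
        : name(std::move(n)), log(l) {
        log.push_back(name + " created");
    }
    ~Tracer() { log.push_back(name + " destroyed"); }
};

// Returns the sequence of events produced by two nested blocks.
std::vector<std::string> scope_demo() {
    std::vector<std::string> log;  // declared first, so it outlives the tracers
    {
        Tracer outer("outer", log);
        {
            Tracer inner("inner", log);
            log.push_back("leaving inner block");
        }  // inner is destroyed here
        log.push_back("leaving outer block");
    }  // outer is destroyed here
    return log;
}
```

Calling scope_demo() yields, in order: "outer created", "inner created", "leaving inner block", "inner destroyed", "leaving outer block", "outer destroyed". Each object is destroyed at the closing brace of the block that declared it.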

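For temporary objects (what the source code below calls "xvalue scope"), the destructor runs at the end of the full expression that created them, which in practice means when the statement finishes.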
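The same logging trick shows how short a temporary's life is. This is a sketch with made-up names (Temp, temporary_demo), loosely modeled on the temporary_nanami function in the source code at the end of the post.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: records its own lifetime in a shared log.
struct Temp {
    std::vector<std::string>& log;

    explicit Temp(std::vector<std::string>& l) : log(l) {
        log.push_back("temporary created");
    }
    ~Temp() { log.push_back("temporary destroyed"); }

    int lol() const { return 1 + 1; }
};

std::vector<std::string> temporary_demo() {
    std::vector<std::string> log;
    // The unnamed Temp exists only for this statement: it is destroyed
    // at the semicolon, after lol() has returned its value.
    int val = Temp(log).lol();
    log.push_back("next statement, val = " + std::to_string(val));
    return log;
}
```

The log comes back as "temporary created", "temporary destroyed", "next statement, val = 2". This is the same rule that caused the bug in the first example: std::string("Hello, World!") was destroyed at the end of its statement, which left cstr pointing at a dead string.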

However, things are very different when you allocate objects on the heap.

Heap Objects

For objects on the heap, their lifecycle is fully controlled by the programmer. This means the object is constructed when the programmer uses new and destructed when the programmer uses delete.

The destructor only runs when delete is invoked.

So normally, if you create a new object and then delete it at the end, everything is fine.

void good_procedure() {
    Nanami * mynanami = new Nanami("mynanami");
    delete mynanami;
}

But if the pointer goes out of scope before you call delete, you lose the only way to reach the object: its destructor never runs and its memory is leaked.

void block_memory_leak() {
    {
        Nanami * onheap = new Nanami("leak inside block");
    }
}
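If you do not want to manage delete by hand, the standard library's std::unique_ptr ties a heap object's lifetime to a stack variable. The following is a sketch under assumptions, not something from the original source files: Counted and live_objects are hypothetical names used to make the leak observable, and std::make_unique needs C++14 or later.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical counter of objects that are currently alive.
static int live_objects = 0;

struct Counted {
    std::string name;
    explicit Counted(std::string n) : name(std::move(n)) { ++live_objects; }
    ~Counted() { --live_objects; }
};

// Same shape as block_memory_leak: the raw pointer disappears at the
// closing brace, the object does not, so the destructor never runs.
void leaky_block() {
    {
        Counted* onheap = new Counted("leak inside block");
        (void)onheap;  // silence the unused-variable warning
    }
}

// The unique_ptr is a stack object. Its destructor runs at the closing
// brace and calls delete on the heap object for us.
void owned_block() {
    {
        auto onheap = std::make_unique<Counted>("owned inside block");
    }
}
```

After owned_block() returns, live_objects is back to 0. After leaky_block() returns, it stays at 1 because nobody called delete.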

Conclusion

In conclusion:

Local (stack) objects, which are allocated within a function on the stack, have their lifetimes managed by the compiler. Their destruction time is deterministic: the destructor runs when execution leaves the scope in which the object was declared.
Heap objects, generally allocated via new, have lifetimes that are fully controlled by the programmer. They are destroyed only when the corresponding delete is called.

There is also another case I didn't mention: objects inside objects. Maybe I will cover it later, but for now, just stick with these two concepts.

Reference

Source Code

check_location.cpp

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>


void print_memory_mapping() {
    FILE * f = fopen("/proc/self/maps", "r");
    if (f == nullptr) {
        printf("Failed to open /proc/self/maps\n");
        return;
    }
    char buffer[1024];
    while (fgets(buffer, 1024, f)) {
        printf("%s", buffer);
    }
    fclose(f);
}

void get_self_address(const char * name, void **addr_start, void **addr_end) {
    FILE *maps_file = fopen("/proc/self/maps", "r");
    if (!maps_file) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    char line[256];
    while (fgets(line, sizeof(line), maps_file)) {
        if (strstr(line, name)) {
            // The line describes the requested segment (e.g. [stack] or [heap])
            sscanf(line, "%p-%p", addr_start, addr_end);
            break;
        }
    }
    fclose(maps_file);
}

static void *stack_start, *stack_end;
void init_stack_address() {
    get_self_address("[stack]", &stack_start, &stack_end);
    printf("Stack segment: %p-%p\n", stack_start, stack_end);
}
bool is_stack_address(void *addr) {
    return addr >= stack_start && addr < stack_end;  // end address is exclusive
}

static void *heap_start, *heap_end;
void init_heap_address() {
    get_self_address("[heap]", &heap_start, &heap_end);
    printf("Heap segment: %p-%p\n", heap_start, heap_end);
}
bool is_heap_address(void *addr) {
    return addr >= heap_start && addr < heap_end;  // end address is exclusive
}

void print_address_location(const char * name, void *addr) {
    if (is_stack_address(addr)) {
        printf("%s (%p) is in [stack]\n",  name, addr);
    } else if (is_heap_address(addr)) {
        printf("%s (%p) is in [heap]\n", name, addr);
    } else {
        printf("%s (%p) is in other segment\n", name, addr);
    }
}


void some_external_function(const char * cstr);


//void * operator new(size_t size) {
//    void * mem = malloc(size);
//    printf("[Allocation Tracker] allocating %zu bytes at %p\n", size,mem);
//    return mem;
//}
//
//void operator delete(void * mem) {
//    printf("[Allocation Tracker] free memory from %p\n", mem);
//    free(mem);
//}


int main() {
    init_heap_address();
    init_stack_address();
    puts("init s1 string");
    std::string s1 = "Hello! World!";
    puts("init s2 string");
    std::string s2 = "Heeeeeeeeeeeeeeeeeeeeeeeeeeeeeeello, World!";
    print_address_location("s1", (void *)&s1);
    print_address_location("s1 cstr", (void *)s1.data());
    print_address_location("s2", (void *)&s2);
    print_address_location("s2 cstr", (void *)s2.data());
    return 0;
}

obj_on_heap.cpp

#include <iostream>
#include <string>

class Nanami{
    std::string name;
    void * ptr;
public:
    Nanami(const std::string & name) {
        this->name = name;
        std::cout << this->name <<": owu I got created" << std::endl;
    }
    ~Nanami() {
        std::cout << this->name <<": owo I got destroyed" << std::endl;
    }
    int lol() {
        return 1+1;
    }
};


void block_memory_leak() {
    {
        Nanami * onheap = new Nanami("leak inside block");
    }
}

void good_procedure() {
    Nanami * mynanami= new Nanami("mynanami");
    delete mynanami;
}


int main() {
    good_procedure();
    block_memory_leak();
    return 0;
}

obj_on_stack.cpp

#include <cstdio>
#include <iostream>
#include <string>

class Nanami{
    std::string name;
    void * ptr;
public:
    Nanami(const std::string & name) {
        this->name = name;
        std::cout << this->name <<": owu I got created" << std::endl;
    }
    ~Nanami() {
        std::cout << this->name <<": owo I got destroyed" << std::endl;
    }
    int lol() {
       return 1+1;
    }
};

void local_nanami_() {
    Nanami lcl("local scope");
    lcl.lol();
    puts("before exit local");
}

void local_nanami() {
    puts(">>>>> before local");
    local_nanami_();
    puts("<<<<< after exit local");
}

void block_nanami() {
    puts(">>>>> before block");
    {
        Nanami bls("block scope");
        bls.lol();
        puts("before exit block");
    }
    puts("<<<<< after exit block");
}

void temporary_nanami() {
    puts(">>>>> before xvalue scope");
    int val = Nanami("xvalue scope").lol();
    puts("<<<<< after xvalue scope");
}

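// Why "second initialization" is reported destroyed twice: the temporary
// Nanami("second initialization") is copy-assigned into nanami and destroyed
// at the end of that statement; nanami, which now holds the same name, is
// destroyed when the function returns. "first initialization" never prints
// a destruction message because its name was overwritten by the assignment.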
void reassign_nanami() {
    puts(">>>>> reconstruct_nanami");
    Nanami nanami("first initialization");
    nanami = Nanami("second initialization");
    nanami.lol();
    puts("<<<<< reconstruct_nanami");
}

void block_memory_leak() {
    {
        Nanami * onheap = new Nanami("leak inside block");
    }

}

Nanami glb("Global scope");


int main() {
    puts("enter main");


    local_nanami();
    block_nanami();
    temporary_nanami();
//    reassign_nanami();

    block_memory_leak();

    puts("before exit main");
    return 0;
}

Allocation Tracker Snippet

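#include <cstdio>
#include <cstdlib>
#include <new>

void * operator new(size_t size) {
    void * mem = malloc(size);
    if (mem == nullptr) {
        throw std::bad_alloc();  // operator new must not return nullptr
    }
    // %zu is the portable format specifier for size_t
    printf("[Allocation Tracker] allocating %zu bytes at %p\n", size, mem);
    return mem;
}

void operator delete(void * mem) noexcept {
    printf("[Allocation Tracker] free memory from %p\n", mem);
    free(mem);
}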