MEMORY LAYOUT OF C++ OBJECT

In Part 1 of the All about virtual keyword C++ series, we discussed virtual functions. In this article, we will discuss “How does a virtual base class work internally?”. As I mentioned in the earlier article, the implementation of the virtual mechanism is purely compiler dependent: the C++ standard does not define how it must be implemented. What I describe here is the general approach.

As usual, before learning anything new, we have to ask “Why is it needed?”

Why do we need a virtual base class?

  • When we use inheritance, we are basically extending a derived class with the base class’s functionality. In simple words, the base class object is treated as a subobject of the derived class.
  • As a result, multiple inheritance creates a problem when two base classes share the same class as a subobject higher up in the hierarchy and you want to access its members. I know this statement is a bit complex, so let’s see an example.
class Top {public: int t; };
class Left : public Top {public: int l; };
class Right : public Top {public: int r; };
class Bottom : public Left, public Right {public: int b; };
  • The above class hierarchy/inheritance results in the “diamond” which looks like this:
    Top
   /   \
Left   Right
   \   /
   Bottom
  • An instance of Bottom is made up of Left, which includes Top, and Right, which also includes Top. So we have two subobjects of Top. This creates an ambiguity as follows:
Bottom *bot = new Bottom;
bot->t = 5; // is this Left's t variable or Right's t variable ??
  • This is by far the simplest reason for needing a virtual base class. Also consider the following scenarios:
Top   *t_ptr1 = new Left;
Top   *t_ptr2 = new Right; 

Both of these work fine because the memory layout of a Left or Right object contains a Top subobject. The memory layout of the Bottom object below makes this clearer.

|                      |
|----------------------|  <------ Bottom bot;   // Bottom object 
|    Left::Top::t      |
|----------------------|
|    Left::l           |
|----------------------|
|    Right::Top::t     |
|----------------------|
|    Right::r          |
|----------------------|
|    Bottom::b         |
|----------------------|
|                      |

Now, what happens when we upcast a Bottom pointer?

Left  *left = new Bottom;

This works fine because the memory layout of a Bottom object starts with the Left subobject.
However, what happens when we upcast to Right?

Right  *right = new Bottom;

For this to work, we have to adjust the right pointer value to make it point to the corresponding section of the Bottom layout:

|                      |
|----------------------|
|    Left::Top::t      |
|----------------------|
|    Left::l           |
|----------------------|  <------ right;
|    Right::Top::t     |
|----------------------|
|    Right::r          |
|----------------------|
|    Bottom::b         |
|----------------------|
|                      |
|                      |

After this adjustment, we can access the Bottom through the right pointer as a normal Right object.
But what would happen if we do the following?

Top *top = new Bottom;

This statement is ambiguous, so the compiler will complain:

error: `Top' is an ambiguous base of `Bottom'

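You can, however, force the conversion with an explicit cast through one of the intermediate classes (here bot is a Bottom pointer, as in the earlier example):

Top *topL = (Left*) bot;   // the Top inside Left
Top *topR = (Right*) bot;  // the Top inside Right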
Solution
  • Virtual inheritance is there to solve this problem. When you specify virtual when inheriting your classes, you’re telling the compiler that you want only a single, shared instance of that base class.
class Top {public: int t; };
class Left : virtual public Top {public: int l; };
class Right : virtual public Top {public: int r; };
class Bottom : public Left, public Right {public: int b; };
  • This means that there is only one “instance” of Top included in the hierarchy. Hence
Bottom *bot = new Bottom;
bot->t = 5; // no longer ambiguous
  • This may seem more obvious and simpler from a programmer’s point of view, but from the compiler’s point of view it is vastly more complicated.
  • An interesting question is how bot->t will be addressed and handled by the compiler. It is time to move on to the next point.

How does virtual base class addressing work internally?

  • A class containing one or more virtual base class subobjects, such as Bottom, is divided into two regions: 1) an invariant region and 2) a shared region.
  • Data within the invariant region remains at a fixed offset (decided at compile time) from the start of the object, regardless of subsequent derivations, so members within the invariant region can be accessed directly. In our case, these are the members of Left, Right and Bottom.
  • The shared region holds the virtual base class subobjects, whose location changes with the order of derivation and with each subsequent derivation. So members within the shared region need to be accessed indirectly.
  • The invariant region is placed at the start of the object’s memory layout and the shared region at the end.
  • The offsets of the shared-region subobjects are stored in the virtual table, and the compiler augments the constructor with the code needed to set this up. See the layout below for reference (the offsets shown assume a 4-byte int and 4-byte pointers).
|                        |          
|------------------------| <------ Bottom bot;   // Bottom object           
|    Left::l             |          
|------------------------|               |------------------| 
|    Left::_vptr_Left    |-------|       |  offset of Top   | // offset starts 
|------------------------|       |-------|------------------|       // from left subobject = 20
|    Right::r            |               |    ...           |
|------------------------|               |------------------|  
|    Right::_vptr_Right  |-------|        
|------------------------|       |         
|    Bottom::b           |       |       |------------------| 
|------------------------|       |       |  offset of Top   | // offset starts 
|    Top::t              |       |-------|------------------|       // from right subobject = 12                       
|------------------------|               |    ...           |                                    
|                        |               |------------------|                                            
|                        |  
  • Now back to our interesting question: “How will bot->t be addressed?”
Bottom *bot = new Bottom;
bot->t = 5; // no longer ambiguous

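The above code will probably be transformed into something like this:

Bottom *bot = new Bottom;
// pseudocode: read Top's byte offset through Left's vptr and add it to the object's address
((Top*)((char*)bot + bot->_vptr_Left[-1]))->t = 5;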

This is how a virtual base class works internally.

Handling of virtual functions in a virtual base class

  • Virtual functions in a virtual base class are handled the same way as we discussed in the previous article for multiple inheritance. There is nothing special about it.

That is it for this topic, but if you want to learn more about the complications of using a virtual base class, then here it is.
