Serialized Objects - UDataRecord

Prerequisites

The UObject Base Class
Variants

Article

Now that we've implemented a variant class that lets us store many data types in one object, we can implement serialization of objects in a straightforward and flexible way.

The Serializable class itself will be fairly simple, but first let's talk about a few design decisions.

Design Decision #1

How structured should our serialized data types be? I've gone back and forth between two approaches:

1. Predefine the fields and their types in a schema. If you've ever done any web development, you'll know that most ORM (Object-Relational Mapping) systems make you define a schema or class type up front for serialization.
2. Simply use whatever fields are set at run time.

Option 1 has a few advantages: when you implement your game editor, each object type has a predefined set of fields you can collect, and the code that serializes and deserializes objects knows in advance what will be included. A schema also lets the system itself handle schema changes and upgrades. The biggest disadvantage, of course, is that you need to maintain those definitions somewhere, which means more code and two or more places to update whenever the schema changes.

Option 2's advantage is that you only write and maintain the code once; its disadvantage is that you know nothing about an object's structure until you create one.

I have mostly settled on approach #2. Since each object implements its own Serialize and Deserialize functions, the values it writes effectively are its schema; the two can only diverge if we start adding conditionals (if this, then set that value).

In all of the cases where we might want some preexisting knowledge of an object's schema, we generally have an object to query. Once you get into things like only serializing the deltas or fields that have changed, a predefined schema starts to lose a lot of value. Finally, you typically need some sort of upgrade script for schema changes, to set reasonable defaults or migrate data. In summary, our approach to serialization into a generic container class should give us enough insight and reflection information without needing to pre-define and maintain a schema.
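To make "we generally have an object to query" concrete, here is a minimal sketch of approach #2. It assumes a hypothetical HealthComponent and uses a std::map of std::variant values as a stand-in for DataRecord and UVariant. Because the component's Serialize function is the only place its fields are listed, serializing a default-constructed instance is enough to recover its field names at run time.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in for a DataRecord (assumption): field name -> value.
using Value = std::variant<int, float, std::string>;
using Record = std::map<std::string, Value>;

// Hypothetical component: Serialize is the only place its fields are
// listed, so it doubles as the component's schema.
struct HealthComponent {
    int hitPoints = 100;
    float regenRate = 0.5f;

    void Serialize(Record& record) const {
        record["hitPoints"] = hitPoints;
        record["regenRate"] = regenRate;
    }
};

// Recover a type's field names at run time by serializing a default instance.
template <typename T>
std::vector<std::string> SchemaOf() {
    Record record;
    T().Serialize(record);

    std::vector<std::string> fields;
    for (const auto& entry : record) {
        fields.push_back(entry.first);  // std::map keeps keys sorted
    }
    return fields;
}
```

Adding a field to Serialize adds it to the recovered schema automatically; there is no second definition to keep in sync.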

Design Decision #2

Will we store "nested" objects, or will we only store plain old data types? For example, when storing a MeshComponent, we'll have (at least) a mesh file and a Transform (offset of the MeshComponent from an Entity's root Transform). We can store this in two different ways:

1. A link to an external Resource object and an external Transform object that contain the actual plain old data values (likely a filename and vectors, respectively). Our MeshComponent is now just two lookup or foreign key values. This approach would be akin to normalizing a database.
2. "Flatten" our object, and store the plain old data types directly in the MeshComponent object or table. For example, our MeshComponent is now a string with a filename, and a set of vectors and quaternions to re-create the Transform.

I have tried both systems and found that #1 quickly leads down a rabbit hole of nested objects: it becomes hard to visualize and debug an object by looking at its serialized values, and the result is a huge set of tables or objects, which adds complexity and increases the time it takes to load dependencies. Again, I will focus on approach #2 here, assuming each object is as "flat" as possible rather than optimizing for references to other objects.
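A minimal sketch of the flattened approach, assuming hypothetical MeshComponent and Transform types, plain float arrays in place of glm vectors and quaternions, and a std::variant map standing in for UVariant. The mesh filename and the transform's position, rotation, and scale are written directly into one record instead of being stored as IDs that point at separate Resource and Transform records.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Stand-ins (assumptions): plain arrays for glm::vec3 / glm::quat,
// and std::variant in place of UVariant.
using Vec3 = std::array<float, 3>;
using Quat = std::array<float, 4>;  // x, y, z, w
using Value = std::variant<std::string, Vec3, Quat>;
using FlatRecord = std::map<std::string, Value>;

// Hypothetical component types, for illustration only.
struct Transform {
    Vec3 position{0.0f, 0.0f, 0.0f};
    Quat rotation{0.0f, 0.0f, 0.0f, 1.0f};  // identity
    Vec3 scale{1.0f, 1.0f, 1.0f};
};

struct MeshComponent {
    std::string meshFile;
    Transform offset;  // offset from the entity's root transform
};

// Flattened storage: plain values copied into a single record,
// instead of foreign keys to separate Resource/Transform records.
FlatRecord Flatten(const MeshComponent& mesh) {
    FlatRecord record;
    record["mesh"] = mesh.meshFile;
    record["position"] = mesh.offset.position;
    record["rotation"] = mesh.offset.rotation;
    record["scale"] = mesh.offset.scale;
    return record;
}
```

Everything needed to rebuild the component sits in one record, which is what makes flat records easy to inspect while debugging.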

The DataRecord class

Now that we've made our design decisions, we can create a class that holds all of our serialized values. Since this class contains a data representation of our object, we will call this class a DataRecord:

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // UObject and UVariant come from the previous articles.
    class UDataRecord : public UObject {
    public:
        UDataRecord();
        UDataRecord(const UDataRecord& rhs);
        ~UDataRecord();

        /**
         * Read and write named values
         */
        UVariant Get(std::string key);
        void Set(std::string key, UVariant value);

        /**
         * Whether the record has been updated
         */
        bool Updated() const { return m_updated; }

        /**
         * Get the names of all stored values
         */
        std::vector<std::string> GetKeys();

        /**
         * Serialize/deserialize as binary. For Serialize, length holds the
         * buffer's capacity on input and the number of bytes written on output.
         */
        void Serialize(unsigned char* bytes, unsigned int& length);
        void Deserialize(unsigned char* bytes, unsigned int length);

    public:
        // Integer ID (e.g. a database primary key; 0 if unused)
        uint32_t id = 0;

        // Mark as delete(d)
        bool deleted = false;

    protected:
        // The values serialized into the record
        std::map<std::string, UVariant> m_variants;

        // Record has been updated
        bool m_updated = false;
    };

A couple of things about this class are worth noting. First, because values are stored as variants, the Get and Set functions are easy to use; for example, you can write:

    record->Set("mesh", "meshfile.fbx");
    record->Set("position", glm::vec3(0, 0, 0));

    std::string filename = record->Get("mesh").AsString();
    glm::vec3 position = record->Get("position").AsVector3();

Easy to write and easy to read!

We also keep track of an ID. This field is useful when storing objects in a database with a numeric primary key; I'm a big fan of SQL databases (SQLite, MySQL, Postgres, SQL Server). If we're serializing to JSON, XML, or another file format, the ID can be left at zero and ignored by that serializer.

Let's take a look at the Get and Set functions:

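    UVariant UDataRecord::Get(std::string key) {
        auto it = m_variants.find(key);
        if (it != m_variants.end()) {
            return it->second;
        }

        // Missing key: flag it in debug builds, but still return a value so
        // builds where the assert compiles out don't fall off the end of a
        // non-void function (undefined behavior).
        UASSERT(false, "Key not found.");
        return UVariant();
    }

    void UDataRecord::Set(std::string key, UVariant value) {
        m_updated = true;
        m_variants[key] = value;
    }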

As you might expect, Get looks up a key in our m_variants std::map and returns its value, while Set inserts a key/value pair or overwrites an existing one. If the key is missing, Get raises an assertion.
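Asserting on a missing key catches bugs during development, but a record loaded from an older save file may legitimately lack a newer field. One possible alternative, sketched here under the assumption of a hypothetical Record class that stores ints instead of UVariants, is a Has check plus a TryGet that returns std::optional (C++17):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical record storing ints in place of UVariant, to keep the sketch small.
class Record {
public:
    void Set(const std::string& key, int value) { m_values[key] = value; }

    bool Has(const std::string& key) const {
        return m_values.find(key) != m_values.end();
    }

    // Returns std::nullopt for a missing key instead of asserting,
    // so callers can fall back to a default value.
    std::optional<int> TryGet(const std::string& key) const {
        auto it = m_values.find(key);
        if (it == m_values.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<std::string, int> m_values;
};
```

A caller can then write record.TryGet("mp").value_or(50) to fall back to a default for a field that was never written.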

We also track whether the record has been updated. Set simply marks the m_updated flag as true on every call, since including an unchanged record in a network packet or save file is fairly cheap. If bandwidth were a concern, you could extend Set to compare against the existing value and only set the flag when the value actually changes.
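The change-detection idea above could look something like this minimal sketch. It assumes std::variant in place of UVariant (std::variant already provides operator==, which UVariant may not) and adds a hypothetical ClearUpdated method for resetting the flag once a record has been sent:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// std::variant stands in for UVariant (assumption) because it already
// provides operator== for comparing old and new values.
using Value = std::variant<int, float, std::string>;

class Record {
public:
    void Set(const std::string& key, const Value& value) {
        auto it = m_values.find(key);
        if (it != m_values.end() && it->second == value) {
            return;  // same value as before: leave the flag untouched
        }
        m_values[key] = value;
        m_updated = true;
    }

    bool Updated() const { return m_updated; }

    // Hypothetical: called once the record has been sent or saved.
    void ClearUpdated() { m_updated = false; }

private:
    std::map<std::string, Value> m_values;
    bool m_updated = false;
};
```

The cost is one extra map lookup and comparison per Set, in exchange for not flagging records whose values didn't actually change.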

We also have a GetKeys function that returns the list of keys present in our DataRecord:

    std::vector<std::string> UDataRecord::GetKeys() {
        std::vector<std::string> keys;
        keys.reserve(m_variants.size());  // one allocation up front
        for (const auto& entry : m_variants) {
            keys.push_back(entry.first);
        }

        return keys;
    }

At the time of writing, std::map has no built-in way to return its keys or values as a std::vector. There are plenty of creative answers on the web, but ultimately you need to copy the keys or values into a std::vector and return it. (C++20's std::views::keys gives a view over a map's keys, and C++23's std::ranges::to can collect that view into a vector, but a copy is still made.)
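For comparison, one of the common variations replaces the hand-written loop with std::transform and std::back_inserter. This generic sketch (KeysOf is a hypothetical name) works for any std::map from C++11 on, but it still copies every key:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Copy the keys of any std::map into a std::vector.
template <typename K, typename V>
std::vector<K> KeysOf(const std::map<K, V>& m) {
    std::vector<K> keys;
    keys.reserve(m.size());  // one allocation up front
    std::transform(m.begin(), m.end(), std::back_inserter(keys),
                   [](const typename std::map<K, V>::value_type& entry) {
                       return entry.first;
                   });
    return keys;  // in the map's sorted key order
}
```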

Finally, our record has its own Serialize and Deserialize functions. But wait, isn't a DataRecord already a serialized version of an object? It is, but these functions convert a DataRecord to and from raw binary data. The primary use case is transmission over the network, but if disk space were a concern you could also roll your own compact binary save files.

Let's start with the Serialize function:

    void UDataRecord::Serialize(unsigned char* bytes, unsigned int& length) {
        /**
         * Format:
         *  - number of fields - uint8_t
         *  for each field:
         *    - length of field name (null terminator included) - uint8_t
         *    - field name, null-terminated - nameLength bytes
         *    - field type - uint8_t
         *    - data size - uint32_t
         *    - field data - data size bytes
         *
         * On input, length is the capacity of bytes; on output, it is the
         * number of bytes actually written.
         */
        UASSERT(m_variants.size() <= UINT8_MAX, "Too many fields.");

        // Calculate size
        size_t binarySize = sizeof(uint8_t);            // Number of fields
        for (auto& field : m_variants) {
            UASSERT(field.first.length() < UINT8_MAX, "Field name too long.");
            binarySize += sizeof(uint8_t);              // Length of field name
            binarySize += field.first.length() + 1;     // Field name + null byte
            binarySize += sizeof(uint8_t);              // Field type
            binarySize += sizeof(uint32_t);             // Data size
            binarySize += field.second.Size();          // Field data
        }

        UASSERT(binarySize <= length, "Buffer overflow.");
        size_t offset = 0;

        uint8_t numberOfFields = (uint8_t)m_variants.size();
        memcpy(bytes + offset, &numberOfFields, sizeof(uint8_t));
        offset += sizeof(uint8_t);

        for (auto& field : m_variants) {
            // +1 so the null terminator from c_str() is written too
            uint8_t nameLength = (uint8_t)(field.first.length() + 1);
            memcpy(bytes + offset, &nameLength, sizeof(uint8_t));
            offset += sizeof(uint8_t);

            memcpy(bytes + offset, field.first.c_str(), nameLength);
            offset += nameLength;

            uint8_t type = (uint8_t)field.second.Type();
            memcpy(bytes + offset, &type, sizeof(uint8_t));
            offset += sizeof(uint8_t);

            uint32_t size = field.second.Size();
            memcpy(bytes + offset, &size, sizeof(uint32_t));
            offset += sizeof(uint32_t);

            memcpy(bytes + offset, field.second.GetPtr(), size);
            offset += size;
        }

        length = (unsigned int)offset;
    }

At the top, you can see our binary format. We start with the total number of fields, stored as an 8-bit integer (unsigned char). For each field, we then store the length of the field name, including its terminating null byte, as another 8-bit integer; the null-terminated field name itself; the UVariant type (as another 8-bit integer); the size of the data as a 32-bit integer; and finally the data itself. Because the counts are 8-bit, a record can hold at most 255 fields, and each field name can be at most 254 characters long. Later, we'll use the UVariant::FromBytes function to read the data back in.

The first loop calculates the total size of the binary data in bytes. The function then checks that the provided buffer is large enough to hold all of it and writes the first value: the number of fields. A local variable called offset keeps track of how many bytes we've written so far.

The second loop writes the data structure from above into the buffer for each field. Finally, we set the length to the actual number of written bytes.
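To make the size calculation concrete, here is a small sketch that applies the same per-field arithmetic, counting each name's terminating null byte as the write loop does. EncodedSize and FieldSizes are hypothetical names, and the payload sizes are assumptions (4 bytes for an int, 12 for a vector of three floats; the real values come from UVariant::Size). A single field named "hp" costs 1 (name length) + 3 ("hp" plus its null byte) + 1 (type) + 4 (data size) + 4 (data) = 13 bytes, so the whole record, including the 1-byte field count, takes 14 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input: (field name, payload size in bytes) for each field,
// standing in for the record's map of UVariants.
using FieldSizes = std::vector<std::pair<std::string, std::uint32_t>>;

// Bytes needed to encode a record in the binary format described above.
std::size_t EncodedSize(const FieldSizes& fields) {
    std::size_t size = sizeof(std::uint8_t);      // number of fields
    for (const auto& field : fields) {
        size += sizeof(std::uint8_t);             // name length
        size += field.first.length() + 1;         // name + null terminator
        size += sizeof(std::uint8_t);             // variant type
        size += sizeof(std::uint32_t);            // data size
        size += field.second;                     // payload
    }
    return size;
}
```

Adding a "position" field with a 12-byte payload adds 1 + 9 + 1 + 4 + 12 = 27 bytes, for a total of 41.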

Since this function will mostly be called by the networking system, we don't want to allocate a new buffer on every call. Instead, the caller provides one and passes its capacity in length; when the function returns, length holds the number of bytes actually written. For example, the networking system can allocate one large buffer (say, 100 KB) at startup and pass it in every time.
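Because the main consumer of this format is the network, byte order is worth a thought: memcpy copies the uint32_t data size in the host CPU's byte order, so a big-endian and a little-endian machine would read the same packet differently. If your targets can differ, a minimal sketch with hypothetical WriteU32LE and ReadU32LE helpers fixes the order to least significant byte first on every platform. The payloads copied from UVariant::GetPtr would need the same treatment for multi-byte types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: store a 32-bit value least significant byte first,
// independent of the host CPU's byte order.
inline void WriteU32LE(unsigned char* out, std::uint32_t value) {
    out[0] = static_cast<unsigned char>(value & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFFu);
}

// Reassemble the value from the same little-endian layout.
inline std::uint32_t ReadU32LE(const unsigned char* in) {
    return static_cast<std::uint32_t>(in[0]) |
           (static_cast<std::uint32_t>(in[1]) << 8) |
           (static_cast<std::uint32_t>(in[2]) << 16) |
           (static_cast<std::uint32_t>(in[3]) << 24);
}
```

In Serialize, WriteU32LE(bytes + offset, size) would take the place of the memcpy for the data size, with ReadU32LE doing the same in Deserialize.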

Let's look at Deserialize:

    void UDataRecord::Deserialize(unsigned char* bytes, unsigned int length) {
        size_t offset = 0;

        // Read in number of fields
        UASSERT(offset + sizeof(uint8_t) <= length, "Truncated record.");
        uint8_t numFields = 0;
        memcpy(&numFields, bytes + offset, sizeof(uint8_t));
        offset += sizeof(uint8_t);

        for (int i = 0; i < numFields; i++) {
            // Read field name length (includes the null terminator)
            UASSERT(offset + sizeof(uint8_t) <= length, "Truncated record.");
            uint8_t nameLength = 0;
            memcpy(&nameLength, bytes + offset, sizeof(uint8_t));
            offset += sizeof(uint8_t);

            // Read field name without its null terminator, so the key
            // matches the one originally passed to Set
            UASSERT(nameLength > 0 && offset + nameLength <= length, "Bad field name.");
            std::string name((char*)(bytes + offset), nameLength - 1);
            offset += nameLength;

            // Read field type
            UASSERT(offset + sizeof(uint8_t) + sizeof(uint32_t) <= length, "Truncated record.");
            uint8_t type = 0;
            memcpy(&type, bytes + offset, sizeof(uint8_t));
            offset += sizeof(uint8_t);

            // Read data size
            uint32_t size = 0;
            memcpy(&size, bytes + offset, sizeof(uint32_t));
            offset += sizeof(uint32_t);

            // Read field data
            UASSERT(offset + size <= length, "Truncated record.");
            UVariant v;
            v.FromBytes(bytes + offset, type, size);
            offset += size;

            m_variants[name] = v;
        }
    }

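Deserialize does the opposite: it reads the total number of fields, then for each field reads the name length, the field name, the UVariant type, the data size, and the data, calling UVariant::FromBytes to rebuild the value before storing it under the field's name. Because the stored name length includes the null terminator, the name has to be read back without it; otherwise the key would end in a stray '\0' and would no longer match the key passed to Set.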
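Deserialize trusts the incoming bytes; at best, debug assertions stop a truncated packet from reading past the end of the buffer, and those may be compiled out in release builds. A sketch of a release-safe alternative, assuming a hypothetical ByteReader class, checks the remaining length before every read so that a truncated or corrupt packet makes a read fail instead:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical cursor over a received buffer. Every read checks how many
// bytes remain before copying, so bad input fails instead of overrunning.
class ByteReader {
public:
    ByteReader(const unsigned char* bytes, std::size_t length)
        : m_bytes(bytes), m_length(length) {}

    bool ReadU8(std::uint8_t& out) { return ReadRaw(&out, sizeof(out)); }

    // Host byte order, matching the memcpy-based format above.
    bool ReadU32(std::uint32_t& out) { return ReadRaw(&out, sizeof(out)); }

    // Reads nameLength bytes that must end in a null terminator and
    // returns the name without it.
    bool ReadName(std::uint8_t nameLength, std::string& out) {
        if (nameLength == 0 || Remaining() < nameLength) {
            return false;
        }
        const char* start = reinterpret_cast<const char*>(m_bytes + m_offset);
        if (start[nameLength - 1] != '\0') {
            return false;
        }
        out.assign(start, nameLength - 1);
        m_offset += nameLength;
        return true;
    }

    // Returns a pointer to the next size bytes and skips past them,
    // or nullptr if fewer than size bytes remain.
    const unsigned char* Take(std::uint32_t size) {
        if (Remaining() < size) {
            return nullptr;
        }
        const unsigned char* start = m_bytes + m_offset;
        m_offset += size;
        return start;
    }

    std::size_t Remaining() const { return m_length - m_offset; }

private:
    bool ReadRaw(void* out, std::size_t size) {
        if (Remaining() < size) {
            return false;
        }
        std::memcpy(out, m_bytes + m_offset, size);
        m_offset += size;
        return true;
    }

    const unsigned char* m_bytes;
    std::size_t m_length;
    std::size_t m_offset = 0;
};
```

Deserialize could route each step through a reader like this and discard the packet on the first false.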

A Serializable base class

Let's tie this back to our engine by implementing another base class for all serializable objects to inherit from:

    class USerializable {
    public:
        USerializable() = default;
        virtual ~USerializable() = default;

        virtual void Serialize(UDataRecord* record) = 0;
        virtual void Deserialize(UDataRecord* record) = 0;
    };

That's it. Any object that needs to be transmitted over the network or saved to disk can now inherit from USerializable and implement Serialize and Deserialize, which store the object's data into a DataRecord and read it back out, respectively, using Get and Set calls like the ones shown earlier.
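As a sketch of what an implementation might look like, assuming a hypothetical LightComponent and a small std::variant-backed Record class standing in for UDataRecord (the real class would use UVariant and its As... accessors), Serialize writes each field under a name and Deserialize reads the same names back:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>

// Minimal stand-in for UDataRecord (assumption): std::variant replaces
// UVariant and only Get/Set are provided.
using Value = std::variant<float, std::string>;

class Record {
public:
    void Set(const std::string& key, Value value) { m_values[key] = std::move(value); }
    const Value& Get(const std::string& key) const { return m_values.at(key); }

private:
    std::map<std::string, Value> m_values;
};

// Same shape as USerializable above, but taking the stand-in Record.
class USerializable {
public:
    virtual ~USerializable() = default;
    virtual void Serialize(Record* record) = 0;
    virtual void Deserialize(Record* record) = 0;
};

// Hypothetical component, for illustration only.
class LightComponent : public USerializable {
public:
    float intensity = 1.0f;
    std::string cookieTexture;

    void Serialize(Record* record) override {
        record->Set("intensity", intensity);
        record->Set("cookie", cookieTexture);
    }

    void Deserialize(Record* record) override {
        intensity = std::get<float>(record->Get("intensity"));
        cookieTexture = std::get<std::string>(record->Get("cookie"));
    }
};
```

A round trip like this, serializing into a fresh record and deserializing into a new object, also makes a cheap unit test for each component.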

In the next article, we'll put this into practice by writing a serialization system in SQLite.
