Serialization with SQLite

Prerequisites

The UObject Base Class
Variants
Serializable Objects

Article

We have a solid foundation in place: all objects in our game are derived from and can be passed around as UObjects, we have a generic container class (Variant) that can hold UObjects or other plain old data types (plus a few game-specific extras like vectors and strings), and we have a way to serialize those objects into a data-based key / value store, called a DataRecord. Now it's time to save our game objects into a file or database.

We'll start by looking at SQLite, which I love as a local resource file format. It's not quite as easy to view and edit as JSON (but pretty close with tools like DB Browser), but it makes up for it with built in indexes and fast searching. Also, as we'll see, if you've implemented SQLite for local storage, it's really easy to port that over to MySQL (or other SQL systems) for remote and networked storage.

All of these systems need to do two basic functions: load records and save records. We can define a base class that all other data loading systems can inherit from and then our applications can be backend agnostic - it won't matter if the storage is JSON, XML, a database, or more. Let's call our base class a DataLoader:

    class UDataLoader : public USystem {
    public:
        UDataLoader() = default;
        virtual ~UDataLoader() = default;

        /**
         * Open / close this data source
         */
        virtual bool Open(std::string location) = 0;
        virtual void Close() = 0;

        /**
         * Get all objects of a specific type
         */
        virtual std::vector<UDataRecord*> GetRecords(
            std::string type, 
            std::map<std::string, UVariant> search = std::map<std::string, UVariant>()
        ) = 0;

        /**
         * Get one object of a specific type and ID
         */
        virtual UDataRecord* GetRecord(std::string type, uint32_t id) = 0;

        /**
         * Save records
         */
        virtual void SaveRecords(std::string type, std::vector<UDataRecord*> records) = 0;

    protected:
        // Location that is opened
        std::string m_location;
    };

You can see that our data loader inherits from our USystem class - if you haven't read the article on ECS design, that's okay, it's not a prerequisite here. We inherit from this class because the systems in our game will use a Locator pattern - we have a global function called GetUSystem<>() that lets us find single instances of our systems from anywhere in the engine. We'll leverage that pattern with our DataLoader class as well.

Next, you can see the Open and Close functions - these will open and close the file or connection. If it's a local file, this will just be a filename; if it's a network connection, this will be a URL or connection string.

Finally, we have our 3 primary functions: GetRecords, GetRecord, and SaveRecords:

GetRecords is the most interesting, and takes a type and some search criteria as an std::map of strings and returns a vector of DataRecord objects. In our SQLite and MySQL implementations, these will become a part of the WHERE clause in our SQL string; if this was a local binary or JSON file, you would need to filter your returned objects in memory.
GetRecord returns a single DataRecord, given a type and a numeric ID. This is useful if you choose to normalize your objects and / or database and need to load dependencies (eg. a MeshComponent has a Resource file (for the Mesh) that could be loaded from a separate Resource table).
SaveRecords takes a type and set of already serialized DataRecord objects to be saved into our storage system.

You'll see that all of these functions take and return DataRecords. In the past, I have played with making these return objects that are already deserialized from their data representations. However, that requires a significant amount of reflection or "registration" of types to constructors in the engine, which leads to many more questions: Where do you put the registration functions? In each logical area? In a single central file? Does that make the IO / serialization library dependent on reflection? How does it integrate with scripting libraries? Most importantly, I have found many cases where I only need the data, and not an actual object. We'll leave deserialization to the part of the application that is requesting the records. More discussion on these approaches can be found at the end of the article.

Let's put this into practice by looking at a real implementation using SQLite.

SQLite Serialization

SQLite describes itself as "a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine". For all of those reasons and more, it is the most widely deployed database system in the world. If you've ever stored data in an Android or iOS application, it was probably to a SQLite database. It is used by Windows 10 as a core component, by the Chrome and Firefox browsers, and by tens of thousands of companies to store local file data.

I greatly enjoy using a SQLite database as a local storage method. It creates a single, portable file that can be dropped into a directory and have all of your game content in it. It can even be used to bundle and store resources, though there are better formats (with compression) out there for those. The one downside (compared to a text format like JSON or XML) is that version control can be difficult and you mostly end up overwriting data. However, for a solo developer, or a set of developers working across different scenes, SQLite is a great choice.

Let's look at our SQLiteDataLoader class:

    class USQLiteDataLoader : public UDataLoader {
    public:
        USQLiteDataLoader() = default;
        virtual ~USQLiteDataLoader() = default;

        /**
         * Open / close this data source
         */
        bool Open(std::string location);
        void Close();

        /**
         * Get all objects of a specific type
         */
        std::vector<UDataRecord*> GetRecords(
            std::string type, 
            std::map<std::string, UVariant> search = std::map<std::string, UVariant>()
        );

        /**
         * Get one object of a specific type and ID
         */
        UDataRecord* GetRecord(std::string type, uint32_t id);

        /**
         * Save records
         */
        void SaveRecords(std::string type, std::vector<UDataRecord*> records);

    protected:
        static int InternalDataCallback(void* instance, int count, char** data, char** cols);

    protected:
        // sqlite handle
        sqlite3* m_handle;

        // Record list
        std::vector<UDataRecord*> m_currentRecords;

        // Current table name
        std::string m_tmpTableName;
    };

Not too much new here from our DataLoader base class. Let's discuss what is new:

We have a new function: InternalDataCallback. The SQLite library will call this once for each row loaded. Unfortunately, once you're in the function, you just have a row or record of data - not even the name of the table you're loading from.
We've added two variables to our class to accommodate this approach: m_currentRecords will store all of the DataRecords we've loaded, and m_tmpTableName will store the name of the table or type we're loading from.
Finally, we have a sqlite3 pointer, which is the internal file handle for the sqlite library.

Everything else is inherited from our base class.

Let's look at how to open and close a SQLite file:

    bool USQLiteDataLoader::Open(std::string location) {
        // Save location
        m_location = location;

        // Open SQLite file
        int result = sqlite3_open(location.c_str(), &m_handle);
        if (result != 0) {
            UASSERT(false, "Unable to open SQLite location.");

            sqlite3_close(m_handle);
            m_handle = 0;

            return(false);
        }

        return(true);
    }

    void USQLiteDataLoader::Close() {
        if (m_handle) {
            sqlite3_close(m_handle);
            m_handle = 0;
        }
    }

These are fairly straightforward functions and mostly window dressing around the sqlite3_open and sqlite3_close functions, which simply take a string file path and an internal handle respectively.

Easy right? What's not to like? Well, one small thing: SQLite has less types than other databases. And the easiest interface, using sqlite_exec, returns the data in all columns as strings. You could check the type of each column and convert it, but our Variant class actually does that for us using the FromString function.

That means the easiest approach is for us to store all of our values as strings as well. And to make it easier for us to convert, we can store the Variant::Type identifier alongside the data itself. We'll store all values in our SQLite database as "type:value".

This approach can always be optimized later if it becomes a bottleneck. For now, it makes it very simple for us to write a way to query for records:

    std::vector<UDataRecord*> USQLiteDataLoader::GetRecords(std::string type, std::map<std::string, UVariant> search) {
        // Start by constructing a query
        std::string query = "SELECT * FROM " + type;

        // Add where clauses
        std::string primaryKey = type + "_id";
        if (search.size()) {
            query += " WHERE 1=1 ";
            std::map<std::string, UVariant>::iterator it = search.begin();
            for (; it != search.end(); it++) {
                if(it->first == primaryKey)
                    query += " AND " + it->first + " = '" + it->second.ToString() + "'";
                else
                    query += " AND " + it->first + " = '" + std::to_string(it->second.Type()) + ":" + it->second.ToString() + "'";
            }
        }

        // Execute
        m_tmpTableName = type;
        m_currentRecords.clear();
        if (sqlite3_exec(m_handle, query.c_str(), InternalDataCallback, this, 0) != 0) {
            UASSERT(false, "Unable to get record list from SQLite:" + (char*)sqlite3_errmsg(m_handle));
        }

        return(m_currentRecords);
    }

In the GetRecords function, we construct our query, starting with all rows and columns in a table, and then adding a WHERE clause if we have search filters. You'll see this starts with "WHERE 1=1". This is an old trick that evaluates to true for all records and allows us to treat all other clauses as AND statements. Since we take a map of key/value pairs as Variants, we convert them to the format we'll store data in (more on this below), except in the case of the primary key, which we just use as an (integer) value. Finally, we store our table name, clear our current record list, and send it off to the SQLite library, which will call our InternalDataCallback function:

    int USQLiteDataLoader::InternalDataCallback(void* instance, int count, char** data, char** cols) {
        // Create a new data record
        UDataRecord* record = new UDataRecord();

        // Access our SQLite class
        USQLiteDataLoader* loader = (USQLiteDataLoader*)instance;

        // Set our primary key field (since we control table structure, always [type]_id
        std::string primaryKey = loader->m_tmpTableName + "_id";
        for (int i = 0; i < count; i++) {
            // If this is an empty pointer, skip it
            if (data[i] == 0) continue;

            // Special case for primary key
            if (strcmp(cols[i], primaryKey.c_str()) == 0) {
                // Set the ID of the DataRecord
                record->ID((uint32_t)atoi(data[i]));
                continue;
            }

            // Read type and value (always stored as type:value)
            std::string fieldValue = data[i];
            int type = atoi(fieldValue.substr(0, fieldValue.find(":")).c_str());
            std::string value = fieldValue.substr(fieldValue.find(":") + 1);

            // Column name
            std::string col = cols[i];

            // Handle object case (store DataRecord)
            if (type == UVariant::Type::VAR_OBJECT) {
                // Split the type and unique ID
                std::string objectType = value.substr(0, value.find(":"));
                uint32_t uid = atoi(value.substr(value.find(":") + 1).c_str());

                // Case where it points to an empty object
                if (uid == 0) {
                    record->Set(col, (UDataRecord*)0);
                }
                else {
                    // If this is a nested object, go fetch the actual object as a DataRecord
                    UDataRecord* subrecord = loader->GetRecord(objectType, uid);
                    record->Set(col, subrecord);
                }

                continue;
            }

            // For all other types, create a new variant
            UVariant v;
            v.FromString(value, type);

            record->Set(col, v);
        }

        // Add to our list
        loader->m_currentRecords.push_back(record);
        return(0);
    }

This function will be called once per returned record, so it starts by creating a new DataRecord and casting our user pointer back to the instance of our USQLiteDataLoader class. Next it checks if this column is the primary key, which is always (in this system) className_id. If it is the primary key, it sets the ID of the DataRecord. This ID will be used in the SaveRecords statement later to do updates instead of inserts.

Next we see the format of our values. Because our Variant class has a FromString() function which takes a data type and a string, we store all values with a format of "type:value". For example "8:(0, 1, 0)" will be split into "8", giving us the type name for our Variant (vector3 is 8 in our Type enum in the Variant class), and "(0, 1, 0)", the value to be parsed. Likewise, an integer value would be stored as "1:123" - "1" for the type (int32) and "123" for the value.

The only exception to that is our object type. Even though we made the design decision to store everything as flat, we might change our minds later (this code is from a time when I was doing it in a more normalized way). If it is an object, we will store it in another table, and this table's column value will take the form "12:ClassName:key", with the ClassName being equal to the table name and the key being the primary key value of the record in the other table.

Of course, this implementation of fetching sub-objects leaves a lot of room for improvement. If you are fetching many objects with many sub-objects, you probably just want to keep a running list of all of the table names and IDs and load them all at the end. You could also lazy load them, but a game or game engine generally wants to load as much data from disk at one time as possible, since it's very expensive and you likely need the data to make the object complete.

Finally, we call our Variant::FromString() function and call the Set function on our DataRecord. After processing all columns, we add our DataRecord to our result set.

The GetRecords function gives us the ability to load records, possibly recursively, possibly containing a set of lookup values, from a SQLite database. Next, let's take a look at the fairly meaty effort of saving records:

    void USQLiteDataLoader::SaveRecords(std::string type, std::vector<UDataRecord*> records) {
        // Make sure we have records in the list
        if (records.size() == 0) return;

        // Make sure the table exists and spec matches
        std::string query = "CREATE TABLE IF NOT EXISTS ";
        query += type + " (";

        // Add a primary key
        query += type + "_id INTEGER";

        // Use the first record as a model
        UDataRecord* record = records[0];
        std::vector<std::string> keys = record->GetKeys();
        for (auto fi = keys.begin(); fi != keys.end(); fi++) {
            // Add all columns here as TEXT type
            query += "," + (*fi) + " TEXT";
        }

        // Specify primary key
        query += ", PRIMARY KEY(" + type + "_id ASC))";

        if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
            UASSERT(false, "Unable to create SQLite table.");
            return;
        }

        // Look for columns that may not exist (but the table does)
        query = "PRAGMA table_info(" + type + ")";
        if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
            UASSERT(false, "Unable to get SQLite schema.");
            return;
        }

        // Put the column names into a vector
        sqlite3_stmt* stmt = 0;
        std::vector<std::string> existingCols;
        int result = sqlite3_prepare(m_handle, query.c_str(), (int)query.length() + 1, &stmt, 0);
        while ((result = sqlite3_step(stmt))) {
            if (result != SQLITE_ROW) {
                break;
            }

            std::string colname = (const char*)sqlite3_column_text(stmt, 1);
            existingCols.push_back(colname);
        }

        // Add any we can't find
        auto fi = keys.begin();
        for (; fi != keys.end(); fi++) {
            auto it = std::find(existingCols.begin(), existingCols.end(), *fi);
            if (it == existingCols.end()) {
                query = "ALTER TABLE " + type + " ADD COLUMN " + (*fi) + " TEXT";

                if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
                    UASSERT(false, "Unable to alter SQLite DB.");
                    return;
                }
            }
        }

        // Iterate over each record
        auto it = records.begin();
        for (; it != records.end(); it++) {
            // First, check for new records (ID == 0)
            if ((*it)->ID() == 0) {
                std::string query = "INSERT INTO " + type + " (";
                std::string values = ") VALUES (";

                // Use the "data model" from the first record
                auto fi = keys.begin();
                for (; fi != keys.end(); fi++) {
                    UVariant v = (*it)->Get(*fi);
                    std::string value = std::to_string(v.Type()) + ":" + v.ToString();

                    if (v.IsObject()) {
                        // Serialize this object
                        UDataRecord record;
                        USerializable* serializable = v.AsObject<USerializable*>();
                        UObject* obj = v.AsObject();

                        // If this is both a serializable object and a UObject...
                        if (obj && serializable) {
                            // Serialize object
                            serializable->Serialize(&record);

                            // Get class name
                            std::string subtype = obj->UType().className;

                            // Save record
                            SaveRecords(subtype, { &record });
                            value = std::to_string(v.Type()) + ":" + subtype + ":" + std::to_string(record.ID());
                        }
                    }

                    query += "`" + *fi + "`,";
                    values += "'" + value + "',";
                }

                // Replace the last character with a closing bracket
                values = values.substr(0, values.length() - 1);

                // Complete query
                query = query.substr(0, query.length() - 1) + values;
                query += ")";

                if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
                    UASSERT(false, "Unable to insert object in SQLite DB.");
                }

                unsigned int newID = (unsigned int)sqlite3_last_insert_rowid(m_handle);
                if (newID == 0) {
                    UASSERT(false, "Unable to insert object in SQLite DB.");
                }

                (*it)->ID(newID);
                continue;
            }

            // We'll need the ID for the rest
            std::string primaryKey = std::to_string((*it)->ID());

            // Process deletes
            if ((*it)->Deleted()) {
                // Construct query
                std::string query = "DELETE FROM " + type + " WHERE " + type + "_id = " + primaryKey;

                if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
                    UASSERT(false, "Unable to delete object from SQLite.");
                }

                continue;
            }

            // Finally process updates
            std::string query = "UPDATE " + type + " SET ";
            auto f = keys.begin();
            for (; f != keys.end(); f++) {
                UVariant v = (*it)->Get(*f);
                std::string value = std::to_string(v.Type()) + ":" + v.ToString();

                // Handle object case
                if (v.IsObject()) {
                    // Serialize this object
                    UDataRecord record;
                    USerializable* serializable = v.AsObject<USerializable*>();
                    UObject* obj = v.AsObject();

                    // If this is both a serializable object and a UObject...
                    if (obj && serializable) {
                        // Serialize object
                        serializable->Serialize(&record);

                        // Get class name
                        std::string subtype = obj->UType().className;

                        // Save record
                        SaveRecords(subtype, { &record });
                        value = std::to_string(v.Type()) + ":" + subtype + ":" + std::to_string(record.ID());
                    }
                }

                if (f != keys.begin()) query += ",";
                query += "`" + *f + "` = '" + value + "'";
            }

            query += " WHERE " + type + "_id = " + primaryKey;

            if (sqlite3_exec(m_handle, query.c_str(), 0, 0, 0) != 0) {
                UASSERT(false, "Unable to update object in SQLite DB.");
            }
        }
    }

Let's walk through this function step-by-step... first, a quick check we have any records. This is important because we're going to assume uniformity across objects of the same type and use the first record in the set as a model for our table.

That brings us to our table creation... if this is the first time saving records, it will create a new table for us based on the schema of the first object in the list. We add a column called className_id as the primary key column, and then each key in the DataRecord is added as a TEXT column.

Next, we look for any columns that have been added to the DataRecord and need to be added to the table. This is handy, especially when dealing with scripted classes later on. If you've ever used Unity, you know you can just add a [SerializeField] tag to anything and now that value needs to be saved. We want to ensure our game developers can have the same flexibility.

Getting the existing column names calls a special SQLite query called PRAGMA table_info(table-name). This query gives us a result set with one row per column name. We store those into a vector of strings called existingCols and then in the next loop check each key in the DataRecord to see if it's in the existingCols vector. If not, we call ALTER TABLE and add it.

You'll notice deletes of column data are not handled here - the reasons for this are two-fold... first, deleting column data usually involves moving it somewhere else or transforming it or archiving it, all of which is better suited to a migration script. When we talk about deploying game updates, we'll talk about how to bundle and run those. The second reason is that when we create a deployment package from our editor later, we will create a "clean" version of our SQLite database - that is, it will only contain the columns and rows of data that are going to "ship". That makes the problem of having extra clutter or bloat in our database an "inside the box" problem (one that only exists for our engine users, not end game users). We can always create a function in our editor later to clean up the database.

Finally, we actually iterate over our DataRecords, processing them in 3 parts: new objects (inserts), objects marked for deletion, and existing objects (updates). We use the ID property of the object being set to determine if this is an insert or delete (primary key / ID == 0 tells us it is a new object). Both the insert and update sections look similar; we iterate over each key/value pair in the DataRecord and add them to our INSERT or UPDATE query respectively. For nested objects, we serialize those and also save them, then update the primary key reference and store it in the format discussed earlier ("12:className:primaryKey"). To get the className, we call the reflection function UType() on our UObject class, as detailed in the meta programming and reflection articles. We process any deleted objects in between inserts and updates so we don't waste time updating a record we're going to delete anyways.

Last, let's look at our GetRecord implementation, which just creates a special search parameter list and sends it to GetRecords:

    UDataRecord* USQLiteDataLoader::GetRecord(std::string type, uint32_t id) {
        // Form our search string
        std::map<std::string, UVariant> search;
        search[type + "_id"] = id;

        // Get records
        std::vector<UDataRecord*> records = this->GetRecords(type, search);

        // Return first record (if available)
        if (records.size()) {
            return(records[0]);
        }

        return(0);
    }

Data Ownership

As you have gone through these functions, you may have noticed a few potential issues. First, we rely heavily on the ID columns being populated to determine what type of record this is (new or existing); second, when serializing nested objects, we always create a new DataRecord object to serialize into, which means the ID will always be zero and we will constantly add new records and never update old references. Not ideal.

All of these issues come down to data ownership - who is responsible for the objects and the associated DataRecord relationships? I've gone a few different ways on this over the years and wanted to discuss the approaches and the pros and cons I've found so far:

UObject - DataRecord relationships are owned by the serializer (in this case the USQLiteDataLoader class). You could have an std::map inside of your serializer of UObject and UDataRecord pairs and for each UObject, look up whether it has an existing DataRecord, then use that instead of creating a new one. This approach has the advantage that any systems integrating with the serializer are now speaking in UObjects instead of DataRecords, which leaves DataRecord processing and handling generally inside the serialization system. Unfortunately this gets messy in a hurry... since the serializer now needs an awareness of objects, it makes sense to rewrite the GetRecords and SaveRecords functions to return and take lists of UObjects, instead of DataRecords. While fine for the SaveRecords function, that means the serializer is now creating objects in GetRecords. This approach becomes very difficult to integrate with object pooling. Consider the case of MeshInstance objects versus Resource objects... each MeshInstance can just be a new object, but if many objects share the same Resource, you likely want to check with a ResourceSystem to see if that Resource is already loaded.
You can add a DataRecord object to each USerializable. In this approach, the Serialize and Deserialize functions would simply update their own DataRecord object, which would be passed into or retrieved by the serialization system. I have had some success with this approach, especially where I have leaned into nested objects and normalization. However, like the first approach, you end up in a scenario where the serialization system needs to create objects in GetRecords in order to have awareness of the UObjects.
UObject - DataRecord relationships are maintained in the system that is requesting them. This is the approach we've taken here: our serializer only speaks in DataRecords and has no knowledge about what happens with them after they've been loaded or before they are saved - it is up to the editor, other systems, and game programmers to maintain a list. From a design perspective, this is the simplest approach. However, it leaves room for programmer error, bloat, and leaves us with the problem of nested objects.

In general, approaches #1 and #2 lead us down a sometimes fun but challenging rabbit hole of creating objects in our serialization systems. This can lean heavily on our reflection and meta systems to create objects, so it's not impossible to solve the object pooling issues. However, as seen in the example of MeshInstances vs. Resource objects, there are a number of edge cases that don't fit the mold. Speaking from experience, you'll soon start to run across enough of these edge cases that it's easier to leave the unpacking of the DataRecords to the system that is requesting them.

Approach #3 can be mitigated in our editor by having our save logic remove unused records where no reference exists - if we need to use nested objects at all. Our preference is to use a flat data model, which mitigates the problem itself.

Conclusion and Next Steps

We can now create a local SQLite database in which to store and retrieve our game objects, serialized into UDataRecords - key / value pairs using our flexible UVariant system.

Next we'll build a MySQL serialization system, which will allow us to read and write from a remote database. Of course, you can extend this model to write an XML or JSON serializer for other local file types, or even an API or socket-based serializer that facilitates multiple people editing and making saves to scenes at the same time. We'll also use this method of serializing and deserializing objects in our editor to help populate our properties panel.

Serialized Objects- UDataRecord Serialization with MySQL