Is there a canonical way to handle JSON data format changes?

by piojo   Last Updated April 15, 2019 08:05 AM

Problem

Say we have a C# class which is serialized to JSON (currently with Newtonsoft's Json.NET) and stored in a database:

public class User
{
    public string authInfo;
}

If the class definition changes, the old data will no longer load correctly. Even if we try to update the database by hand, we risk data being loaded incorrectly during the conversion.

public class User
{
    public string username;
    public string token;
}
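
How the failure shows up depends on the serializer settings. The sketch below assumes Json.NET; the sample JSON string and the Demo class are made up for illustration. As far as I know, the default settings skip unknown members silently, so the old value is dropped rather than rejected, and strict handling has to be opted into:

```csharp
using System;
using Newtonsoft.Json;

public class User
{
    public string username;
    public string token;
}

public static class Demo
{
    public static void Main()
    {
        // Hypothetical row written by the old version of the class.
        string oldJson = "{\"authInfo\":\"alice/abc123\"}";

        // Default settings: the unknown "authInfo" member is skipped and
        // the new fields keep their default value (null).
        User user = JsonConvert.DeserializeObject<User>(oldJson);
        Console.WriteLine(user.username == null); // True
        Console.WriteLine(user.token == null);    // True

        // Opting in to strict handling turns the silent loss into an exception.
        var strict = new JsonSerializerSettings
        {
            MissingMemberHandling = MissingMemberHandling.Error
        };
        try
        {
            JsonConvert.DeserializeObject<User>(oldJson, strict);
        }
        catch (JsonSerializationException e)
        {
            Console.WriteLine("Rejected: " + e.Message);
        }
    }
}
```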

Solution (my attempt)

We may use a callback which is run after deserialization that converts the old data to the new data format. The attribute and parameters need to be adapted based on which serialization framework is being used:

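using System;
using System.Runtime.Serialization;

public class User
{
    public string username;
    public string token;
    [Obsolete] public string authInfo; // old format: "username/token"

    // Json.NET requires serialization callbacks to take a StreamingContext.
    [OnDeserialized]
    public void FixData(StreamingContext context)
    {
        // Only old rows have authInfo set and username missing.
        if (username == null && authInfo != null)
        {
            var parts = authInfo.Split('/');
            username = parts[0];
            token = parts.Length > 1 ? parts[1] : null;
            authInfo = null;
        }
    }
}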

If a field's type needs to change from a list to an object (or number) or vice versa, the new field should get a numeric suffix (for example authInfo_2), and the suffix should be incremented each time the type changes again. If a field needs to change from a list of one type to a list of another type, a new field must also be created.

public class User
{
    [Obsolete] public List<string> address;
    public List<AddressLine> address_2;
    // FixData() will convert from address to address_2
}

Problem: If null is a valid value for the old or new data, we can't determine whether the data has been migrated to the newer format. The following is a workaround that will track whether new data has been added:

public class User
{
    [Obsolete] public List<string> name; // serialized old data
    [JsonProperty] private string _familyName; // serialized (Json.NET skips private fields unless marked)
    [JsonProperty] private bool _isFamilyNameSet; // serialized

    [JsonIgnore] // not serialized
    public string familyName
    {
        get { return _familyName; }
        set { _familyName = value; _isFamilyNameSet = true; }
    }
    // FixData() will convert from name to familyName
}

Question

This procedure is a bunch of rules I made up, and I've probably missed something important. Is there an accepted best practice that deals with versioning in serialized data? (Including a version number seems like it would lead to a lot of problems.)

Tags : serialization


Answers 1


I would avoid having the new class know about the old class.

If the class name changes you can have

public interface OldRepository
{
    List<OldUser> GetAll();
}

public interface Converter
{
    NewUser Convert(OldUser oldUser);
}

public interface NewRepository
{
    void Add(NewUser newUser);
}

You can then convert the whole DB to the new format with a script, or do on-the-fly conversion, without the new class having a dependency on the old class.
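
As a rough illustration of the conversion step, here is a minimal sketch using the two User shapes from the question. The UserConverter name, the ConvertAll helper, and the error handling are hypothetical, and splitting at the first slash is an assumption about the authInfo format:

```csharp
using System;
using System.Collections.Generic;

// Old stored shape, kept only so existing rows can still be read.
public class OldUser
{
    public string authInfo;
}

// Current shape used by the rest of the application.
public class NewUser
{
    public string username;
    public string token;
}

public static class UserConverter
{
    // Splits "username/token" at the first slash (assumed format).
    public static NewUser Convert(OldUser old)
    {
        if (old == null) throw new ArgumentNullException(nameof(old));
        if (old.authInfo == null) throw new FormatException("authInfo is missing");

        int slash = old.authInfo.IndexOf('/');
        if (slash < 0) throw new FormatException("authInfo has no '/' separator");

        return new NewUser
        {
            username = old.authInfo.Substring(0, slash),
            token = old.authInfo.Substring(slash + 1)
        };
    }

    // One-off migration: convert every old record; the caller stores the results.
    public static List<NewUser> ConvertAll(IEnumerable<OldUser> oldUsers)
    {
        var result = new List<NewUser>();
        foreach (var old in oldUsers) result.Add(Convert(old));
        return result;
    }
}
```

Because the conversion lives in its own class, NewUser never has to carry the obsolete authInfo field, and bad rows are rejected loudly instead of being loaded half-converted.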

Generally, if you have to store serialised data in a DB like this rather than splitting out the fields, you should include some sort of data versioning, so that you know which version of the data is stored in a particular row.
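
One way versioning could look, as a sketch only: it assumes Json.NET, a "version" member stored inside each JSON document, and that rows without that member are version 1. The UserMigrations class and the member names are hypothetical:

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class UserMigrations
{
    public const int CurrentVersion = 2;

    // Upgrades a stored JSON document to the current shape, one version at a time.
    public static JObject Upgrade(string json)
    {
        JObject doc = JObject.Parse(json);

        // Rows written before versioning existed have no "version" member;
        // treating them as version 1 is an assumption of this sketch.
        int version = (int?)doc["version"] ?? 1;

        if (version > CurrentVersion)
            throw new NotSupportedException("Data is newer than this code: version " + version);

        if (version == 1)
        {
            // v1 -> v2: split "authInfo" ("username/token") into two members.
            string authInfo = (string)doc["authInfo"];
            if (authInfo != null)
            {
                string[] parts = authInfo.Split(new[] { '/' }, 2);
                doc["username"] = parts[0];
                if (parts.Length > 1) doc["token"] = parts[1];
            }
            doc.Remove("authInfo");
            version = 2;
        }

        // Later steps (2 -> 3, ...) would be added here in the same style.
        doc["version"] = version;
        return doc;
    }
}
```

The upgraded document can then be turned into the current class with doc.ToObject<User>(). Since the version is explicit, null can stay a valid value in any field without making it ambiguous whether a row has been migrated.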

Ewan
April 15, 2019 07:42 AM
