Lately I’ve been designing and writing a serialization system for the CUBOS. game engine. I had the following goals in mind:
- keep it simple: the goal isn’t to implement deep serialization, but to have a ‘predictable’ (de)serializer which would be easy to use but not too hard to develop.
- make it as generic as possible: serializing to raw binary data, to JSON or to YAML should be the same and require as little extra effort as possible.
- make it flexible: objects may require extra context to be (de)serialized, and providing this context should be easy.
- stream agnostic: the (de)serializer shouldn’t care about where the data is coming from/going to.
Making the (de)serializer stream agnostic meant that I needed an abstract stream class, which provided the interface required for reading and writing data. I considered using the C++ STL streams, but I wanted to implement my own streams later on (compressed streams, for example) and the STL is too hard/obscure to extend. This, plus the fact that implementing a tiny streams library seemed fun, pushed me down the ‘I’ll do it myself’ route.
So, I studied some stream libraries (including the STL) and decided that the Stream class should provide the following abstract methods:
- read(data, size): reads data from the stream, and returns the number of bytes actually read.
- write(data, size): writes data to the stream, and returns the number of bytes actually written.
- tell(): returns the current position in the stream.
- seek(offset, origin): seeks to a position in the stream.
- eof(): used to check if there’s no more data to read from the stream.
- peek(): used to read the next byte of the stream without removing it.
Although these methods are technically sufficient and you can perfectly well implement a binary serializer with them, any kind of text processing, such as that required in a JSON serializer, would be very painful to implement. To solve this, I wrote some utility methods in the Stream class, such as readUntil, which call the other abstract methods.
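To show how a utility method can be layered on the abstract ones, here is a minimal sketch of a readUntil. Its signature and semantics (the terminator is consumed but not stored) are assumptions, the Stream class is cut down to the two methods the helper needs, and StringSource is a hypothetical helper that exists only to exercise it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Reduced version of the abstract class: only what readUntil needs.
class Stream {
public:
    virtual ~Stream() = default;
    virtual std::size_t read(void* data, std::size_t size) = 0;
    virtual bool eof() const = 0;

    // Utility built purely on the abstract methods: appends bytes to `out`
    // until `terminator` is found or the stream ends. The terminator is
    // consumed but not stored. Returns true if the terminator was found.
    bool readUntil(std::string& out, char terminator) {
        char c;
        while (!eof() && read(&c, 1) == 1) {
            if (c == terminator) return true;
            out.push_back(c);
        }
        return false;
    }
};

// Hypothetical read-only stream over a string, used only for the example.
class StringSource : public Stream {
public:
    explicit StringSource(std::string text) : text_(std::move(text)) {}

    std::size_t read(void* data, std::size_t size) override {
        std::size_t n = std::min(size, text_.size() - pos_);
        std::memcpy(data, text_.data() + pos_, n);
        pos_ += n;
        return n;
    }

    bool eof() const override { return pos_ == text_.size(); }

private:
    std::string text_;
    std::size_t pos_ = 0;
};
```

Because readUntil only talks to read() and eof(), it works unchanged for any stream implementation, which is the point of putting such helpers in the base class.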
I also wrote two implementations of this class:
- StdStream is just a wrapper around a FILE* from stdio.h, which allows me to write to files and, for example, stdout, with my streams.
- BufferStream is used to write data to/read data from a buffer.
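The two implementations could look roughly like the sketch below. The class names come from the post, but the constructors, the signatures, the fixed-size buffer and the lack of error handling are all assumptions made to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

enum class SeekOrigin { Begin, Current, End };

// Abstract interface, with assumed signatures.
class Stream {
public:
    virtual ~Stream() = default;
    virtual std::size_t read(void* data, std::size_t size) = 0;
    virtual std::size_t write(const void* data, std::size_t size) = 0;
    virtual std::size_t tell() const = 0;
    virtual void seek(std::int64_t offset, SeekOrigin origin) = 0;
    virtual bool eof() const = 0;
    virtual char peek() = 0;
};

// Wraps a FILE* (a file, stdout, ...). Does not own or close the handle.
// Errors reported by the C library are ignored in this sketch.
class StdStream : public Stream {
public:
    explicit StdStream(std::FILE* file) : file_(file) {}
    std::size_t read(void* data, std::size_t size) override {
        return std::fread(data, 1, size, file_);
    }
    std::size_t write(const void* data, std::size_t size) override {
        return std::fwrite(data, 1, size, file_);
    }
    std::size_t tell() const override { return static_cast<std::size_t>(std::ftell(file_)); }
    void seek(std::int64_t offset, SeekOrigin origin) override {
        int whence = origin == SeekOrigin::Begin ? SEEK_SET
                   : origin == SeekOrigin::Current ? SEEK_CUR : SEEK_END;
        std::fseek(file_, static_cast<long>(offset), whence);
    }
    // Note: feof only reports true after a read has hit the end of the file.
    bool eof() const override { return std::feof(file_) != 0; }
    char peek() override {
        int c = std::fgetc(file_);
        if (c != EOF) std::ungetc(c, file_);  // put the byte back
        return c == EOF ? '\0' : static_cast<char>(c);
    }
private:
    std::FILE* file_;
};

// Reads from/writes to a fixed-size memory buffer owned by the caller.
class BufferStream : public Stream {
public:
    BufferStream(void* buffer, std::size_t size)
        : data_(static_cast<char*>(buffer)), size_(size) {}
    std::size_t read(void* data, std::size_t size) override {
        std::size_t n = std::min(size, size_ - pos_);
        std::memcpy(data, data_ + pos_, n);
        pos_ += n;
        return n;
    }
    std::size_t write(const void* data, std::size_t size) override {
        std::size_t n = std::min(size, size_ - pos_);  // never grows the buffer
        std::memcpy(data_ + pos_, data, n);
        pos_ += n;
        return n;
    }
    std::size_t tell() const override { return pos_; }
    void seek(std::int64_t offset, SeekOrigin origin) override {
        std::int64_t end = static_cast<std::int64_t>(size_);
        std::int64_t base = origin == SeekOrigin::Begin ? 0
                          : origin == SeekOrigin::Current ? static_cast<std::int64_t>(pos_) : end;
        // Clamp the target position to the bounds of the buffer.
        std::int64_t target = std::max<std::int64_t>(0, std::min(base + offset, end));
        pos_ = static_cast<std::size_t>(target);
    }
    bool eof() const override { return pos_ == size_; }
    char peek() override { return eof() ? '\0' : data_[pos_]; }
private:
    char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

With this in place, code written against Stream does not care which one it gets: `StdStream out(stdout); out.write("hi\n", 3);` and a BufferStream over a local array go through the same calls.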
Now that I had streams ready to be used, delving into actual serialization was next. I decided to split the serialization functionality into two classes: Serializer and Deserializer. It doesn’t make sense to serialize and deserialize from the same stream at the same time, and a single class would become too large. Both the Serializer and the Deserializer are associated with a stream when constructed, and write to/read from that stream exclusively.
The Serializer class contains abstract methods for writing trivial types (e.g. double), and also strings. The same goes for the Deserializer, but for reading instead of writing.
This seems okay, but there’s one problem: it would be sufficient for sequential binary data serialization, but how would the values be serialized to, for example, JSON?
In order to solve this, I added a name argument to every write function on the Serializer class. This way, I could set names for the fields while serializing. How would you differentiate between different objects then? I decided to add beginObject(name) and endObject(), both abstract methods. This way, the Serializer knows how to group the values being written into objects.
I also added beginObject() and endObject() to the Deserializer.
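To make the role of the name argument and of beginObject/endObject more concrete, here is a small sketch of a Serializer interface together with a toy text implementation. The signatures are assumed, JsonLikeSerializer is a hypothetical name, and it appends to a std::string instead of a stream purely to stay short; it is not the engine's actual JSON serializer.

```cpp
#include <string>

// Abstract interface with named fields and object grouping.
// Signatures are assumed; only a few write overloads are shown.
class Serializer {
public:
    virtual ~Serializer() = default;
    virtual void write(int value, const char* name) = 0;
    virtual void write(double value, const char* name) = 0;
    virtual void write(const char* value, const char* name) = 0;
    virtual void beginObject(const char* name) = 0;
    virtual void endObject() = 0;
};

// Hypothetical toy implementation that emits JSON-like text.
// No string escaping or pretty printing is done.
class JsonLikeSerializer : public Serializer {
public:
    void write(int value, const char* name) override { field(name, std::to_string(value)); }
    void write(double value, const char* name) override { field(name, std::to_string(value)); }
    void write(const char* value, const char* name) override {
        field(name, "\"" + std::string(value) + "\"");
    }
    void beginObject(const char* name) override {
        field(name, "{");
        needComma_ = false;  // the first field of the new object gets no comma
    }
    void endObject() override {
        out_ += "}";
        needComma_ = true;  // anything that follows is a sibling of this object
    }
    const std::string& str() const { return out_; }

private:
    // Appends `"name":text`, preceded by a comma unless it is the first
    // field of the enclosing object.
    void field(const char* name, const std::string& text) {
        if (needComma_) out_ += ",";
        out_ += "\"" + std::string(name) + "\":" + text;
        needComma_ = true;
    }

    std::string out_;
    bool needComma_ = false;
};
```

A binary implementation of the same interface could simply ignore the names and make beginObject/endObject do nothing, which is what lets the calling code stay identical across formats.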
Still, this approach wasn’t perfect: what if I didn’t know the number of values I would be serializing/deserializing? This would be a problem while trying to deserialize arrays and dictionaries. My solution was to add the beginDictionary(length, name) and endDictionary() abstract methods, along with their array counterparts, to the Serializer. I added the equivalent methods to the Deserializer but, instead of taking the length as an argument, they return the length of the current array/dictionary. The dictionary ‘mode’ assumes that values will be written in ‘key value’ order.
These new methods allowed me to implement methods such as write(map) and the equivalent deserialization methods. Here is an example of how you could use the serializer and deserializer as they stand:
```cpp
Serializer* s = ...;
s->beginObject("npc1");
s->write("John", "name");
s->write(43, "age");
s->write(75.6, "weight");
s->beginDictionary(2, "inventory");
s->write("apple");
s->write(3);
s->write("sword");
s->write(1);
s->endDictionary();
// If you have a std::unordered_map, you can also just
// write it directly:
// s->write(map, "inventory");
s->endObject();
```
```cpp
Deserializer* s = ...;
std::string name;
int age;
double weight;
std::unordered_map<std::string, int> inventory;
s->beginObject();
s->read(name);
s->read(age);
s->read(weight);
s->read(inventory);
s->endObject();
```
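To illustrate how write(map) and its deserialization counterpart can be built on the dictionary methods, here is a minimal round-trip sketch. BinarySerializer and BinaryDeserializer are hypothetical, non-virtual classes that store raw bytes in a std::vector instead of a stream, and the 32-bit length prefix in host byte order is an assumption of this example rather than the engine's format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

using Inventory = std::unordered_map<std::string, std::int32_t>;

// Hypothetical sequential binary writer. Names are accepted but ignored.
class BinarySerializer {
public:
    explicit BinarySerializer(std::vector<char>& out) : out_(out) {}

    void write(std::int32_t value, const char* = nullptr) { raw(&value, sizeof(value)); }
    void write(const std::string& value, const char* = nullptr) {
        write(static_cast<std::int32_t>(value.size()));  // length prefix
        raw(value.data(), value.size());
    }
    void beginDictionary(std::size_t length, const char* = nullptr) {
        write(static_cast<std::int32_t>(length));  // entry count goes first
    }
    void endDictionary() {}

    // write(map) on top of the dictionary mode: key, value, key, value...
    void write(const Inventory& map, const char* name = nullptr) {
        beginDictionary(map.size(), name);
        for (const auto& entry : map) {
            write(entry.first);
            write(entry.second);
        }
        endDictionary();
    }

private:
    void raw(const void* data, std::size_t size) {
        const char* bytes = static_cast<const char*>(data);
        out_.insert(out_.end(), bytes, bytes + size);
    }
    std::vector<char>& out_;
};

// Hypothetical reader for the format above. No bounds checking is done.
class BinaryDeserializer {
public:
    explicit BinaryDeserializer(const std::vector<char>& in) : in_(in) {}

    void read(std::int32_t& value) { raw(&value, sizeof(value)); }
    void read(std::string& value) {
        std::int32_t size = 0;
        read(size);
        value.assign(in_.data() + pos_, static_cast<std::size_t>(size));
        pos_ += static_cast<std::size_t>(size);
    }
    // Returns the entry count instead of taking it as an argument.
    std::size_t beginDictionary() {
        std::int32_t length = 0;
        read(length);
        return static_cast<std::size_t>(length);
    }
    void endDictionary() {}

    void read(Inventory& map) {
        std::size_t length = beginDictionary();
        for (std::size_t i = 0; i < length; ++i) {
            std::string key;
            std::int32_t value = 0;
            read(key);
            read(value);
            map[key] = value;
        }
        endDictionary();
    }

private:
    void raw(void* data, std::size_t size) {
        std::memcpy(data, in_.data() + pos_, size);
        pos_ += size;
    }
    const std::vector<char>& in_;
    std::size_t pos_ = 0;
};
```

The reader never needs to know the size of the dictionary in advance: beginDictionary() hands it the count that the writer stored, and the loop consumes exactly that many key/value pairs.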
In the next post I will write about how I implemented the serialization methods on serializable/deserializable types, and how context is passed to them. We still need to provide an actual Serializer/Deserializer implementation, since so far we have only defined abstract classes.