Serialization

Serialization is the act of reversibly converting run-time values to external representations. In simpler terms, it is saving and loading values in a given format. Serialization can also be called pickling or, simply, input/output.

Serialization problems in OCaml

Because the compiler only accepts programs that cannot have type errors, type information is erased at run-time, which improves performance. Generally, OCaml values have a uniform representation (with some exceptions), which permits run-time polymorphism. However, the conjunction of polymorphism and run-time type erasure means that it is often not possible to reconstruct the type of a value at run-time. Thus, it is not possible to dispatch on run-time types to write, for instance, a generic pretty-printing or serialization function that works in all cases. The toplevel uses some tricks to reconstruct type information for pretty-printing values, but these tricks do not always work and are not available in native code anyway.
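The effect of type erasure can be observed with a small experiment. The following is a minimal sketch that uses the unsafe Obj module from the standard library; the helper name same_representation is made up for this illustration. It shows that values of unrelated types can be indistinguishable once the program is running, which is why a generic printer cannot tell them apart.

```ocaml
(* Compare the run-time representations of two values of unrelated types.
   Obj is an unsafe standard-library module; use it for exploration only. *)
let same_representation (a : 'a) (b : 'b) : bool =
  Obj.repr a = Obj.repr b

type point = { x : int; y : int }

let () =
  (* 0, false, [], () and None are all the same immediate value at run-time. *)
  assert (same_representation 0 false);
  assert (same_representation 0 []);
  assert (same_representation 0 ());
  assert (same_representation 0 None);
  (* A pair, a record and an int array can all be the same two-field block. *)
  assert (same_representation (1, 2) { x = 1; y = 2 });
  assert (same_representation (1, 2) [| 1; 2 |])
```

Given only the run-time value, a serializer has no way to know whether it should write "0", "false" or "[]".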

Serialization solutions in OCaml

Marshal
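As a point of reference, here is a minimal round trip through the standard library's Marshal module. The helper names save and load are chosen for this sketch only. Note that the type of the loaded value is picked by the caller and is not verified against the data.

```ocaml
type point = { x : int; y : int }

(* Serialize a value to a string. With an empty flag list, values that
   contain closures are rejected with an exception. *)
let save (v : 'a) : string = Marshal.to_string v []

(* Deserialize. The result type is chosen by the caller and is NOT checked
   against the contents of the string. *)
let load (s : string) : 'a = Marshal.from_string s 0

let () =
  let data = save [ { x = 12; y = 34 }; { x = 0; y = -1 } ] in
  (* The annotation below is the only thing that gives the result a type. *)
  let (points : point list) = load data in
  assert (points = [ { x = 12; y = 34 }; { x = 0; y = -1 } ])
```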

But

Preprocessor-based solutions

Sexplib (S-expressions)
Config_file
json-wheel

json-static does automatic marshalling. Here's a simple example:

type json point = { x : int; y : int }  (* an OCaml record *)

It creates functions with the following signatures (the toplevel session after them shows the functions in use):

val json_of_point : point -> Json_type.t (* Json_type.t is the JSON syntax tree that you can serialize using Json_io.string_of_json. *)
val point_of_json : Json_type.t -> point
# let j = json_of_point { x = 12; y = 34 };;
val j : Json_type.t = Json_type.Object [("x", Json_type.Int 12); ("y", Json_type.Int 34)]
# Json_io.string_of_json j;;
- : string = "{ \"x\": 12, \"y\": 34 }"

Json-static does not support parametrized types other than a few pervasive ones (lists, arrays, hash tables, options, ...). Like the other syntax extensions that deal with types, it uses the type names to determine the JSON type to use; that is, if two names refer to the same OCaml type, they can use two different JSON representations. A common, predefined example is the "assoc" type, which is defined as "type 'a assoc = (string * 'a) list" and is represented as a JSON object, the common usage, rather than as a JSON array of arrays.
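To make the name-based choice of representation concrete, here is a hand-written sketch of what such converters could look like. It is not the code json-static generates: the json type below is a local stand-in for Json_type.t, and json_of_pair_list and json_of_assoc are hypothetical names used only for this illustration.

```ocaml
(* Local stand-in for a JSON syntax tree; NOT the real Json_type.t. *)
type json =
  | Int of int
  | String of string
  | Array of json list
  | Object of (string * json) list

type point = { x : int; y : int }

(* Hand-written analogue of json_of_point. *)
let json_of_point (p : point) : json =
  Object [ ("x", Int p.x); ("y", Int p.y) ]

(* Hand-written analogue of point_of_json; fails on unexpected input. *)
let point_of_json (j : json) : point =
  match j with
  | Object fields ->
    (match List.assoc_opt "x" fields, List.assoc_opt "y" fields with
     | Some (Int x), Some (Int y) -> { x; y }
     | _ -> failwith "point_of_json: missing or ill-typed field")
  | _ -> failwith "point_of_json: expected an object"

(* Two names for the same OCaml type, mapped to different JSON shapes. *)
type 'a assoc = (string * 'a) list

(* Under the plain list name: a JSON array of two-element arrays. *)
let json_of_pair_list (f : 'a -> json) (l : (string * 'a) list) : json =
  Array (List.map (fun (k, v) -> Array [ String k; f v ]) l)

(* Under the assoc name: a JSON object. *)
let json_of_assoc (f : 'a -> json) (l : 'a assoc) : json =
  Object (List.map (fun (k, v) -> (k, f v)) l)
```

Both converters accept exactly the same OCaml values; only the name used in the type definition would tell a syntax extension which one to pick.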

XML
XDR

Types:

Performance comparison to XML: http://et.redhat.com/~rjones/secure_rpc/ (tiny table at the end). The OCaml implementation is slower than the C code generated by rpcgen. (However, the company I'm currently working for uses this implementation for a high-performance cluster of servers, and we never even thought about the XDR speed. It never mattered.)

The data type is defined in a special XDR notation, which ocamlrpcgen (from Ocamlnet) then uses to generate the OCaml type and the (de)serialisation functions. One writes a '*.x' file, which is converted to C by rpcgen or to OCaml by ocamlrpcgen. Currently, ocamlrpcgen understands only a few annotations that modify the OCaml type the XDR type is mapped to. Very lengthy example: http://git.et.redhat.com/?p=libvirt.git;a=blob_plain;f=qemud/remote_protocol.x;hb=HEAD

Hydro

A library for another RPC protocol, called ICE. It is possible to annotate an ICE type with an OCaml function that converts it into a more convenient representation.

The author wouldn't recommend Hydro for storing values, because its model is OO-centric, and there is some impedance mismatch between the OO approach and OCaml's type system.

ICE can represent cyclic values.

Combinator-based solutions

I/O combinator library
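The general idea behind a combinator-based approach can be sketched as follows. This is a generic illustration, not the API of any particular library: the codec record, the text format (decimal integers terminated by ';', length-prefixed strings) and all function names are assumptions made for the example. The user builds a serializer for a compound type by composing serializers for its parts, so no run-time type information is needed.

```ocaml
(* A codec pairs a writer with a reader for the same type.
   The reader takes the input string and a mutable current position. *)
type 'a codec = {
  write : Buffer.t -> 'a -> unit;
  read : string -> int ref -> 'a;
}

(* Integers as decimal text terminated by ';'. *)
let int_codec : int codec = {
  write = (fun b n -> Buffer.add_string b (string_of_int n); Buffer.add_char b ';');
  read = (fun s pos ->
    let stop = String.index_from s !pos ';' in
    let n = int_of_string (String.sub s !pos (stop - !pos)) in
    pos := stop + 1;
    n);
}

(* Strings as a length followed by the raw bytes. *)
let string_codec : string codec = {
  write = (fun b s -> int_codec.write b (String.length s); Buffer.add_string b s);
  read = (fun s pos ->
    let len = int_codec.read s pos in
    let v = String.sub s !pos len in
    pos := !pos + len;
    v);
}

(* Combinator: a codec for pairs from codecs for the components. *)
let pair (a : 'a codec) (b : 'b codec) : ('a * 'b) codec = {
  write = (fun buf (x, y) -> a.write buf x; b.write buf y);
  read = (fun s pos ->
    let x = a.read s pos in
    let y = b.read s pos in
    (x, y));
}

(* Combinator: a codec for lists, written as a count followed by the items. *)
let list (c : 'a codec) : 'a list codec = {
  write = (fun buf l -> int_codec.write buf (List.length l); List.iter (c.write buf) l);
  read = (fun s pos ->
    let n = int_codec.read s pos in
    List.init n (fun _ -> c.read s pos));
}

let encode (c : 'a codec) (v : 'a) : string =
  let buf = Buffer.create 64 in
  c.write buf v;
  Buffer.contents buf

let decode (c : 'a codec) (s : string) : 'a = c.read s (ref 0)

let () =
  let c = list (pair string_codec int_codec) in
  let v = [ ("x", 12); ("y", 34) ] in
  assert (decode c (encode c v) = v)
```

The type of the codec expression mirrors the type of the data, so the type checker ensures that the reader and the writer agree.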

Compiler-based solutions

Safe unmarshal
GCaml

Desirable properties of serialization solutions

Speed

Serialization and deserialization should be reasonably fast: at most, say, ten times slower than the Marshal module.

Universality

The solution should apply to values of all types, including functions, objects, polymorphic variants and native data.

Portability

The serialization format should work across all platforms, byte orders and compilation modes (native code or bytecode). This can be achieved by making it platform-independent or by providing transparent conversion.
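One way to make a format platform-independent is to fix the byte order in the format itself rather than inheriting it from the machine. The sketch below writes and reads a 32-bit integer in big-endian order using only shifts and masks; the function names are made up for this example.

```ocaml
(* Write a 32-bit integer as 4 bytes, most significant byte first,
   regardless of the byte order of the host machine. *)
let write_int32_be (b : Buffer.t) (n : int32) : unit =
  for i = 3 downto 0 do
    let byte = Int32.to_int (Int32.shift_right_logical n (8 * i)) land 0xff in
    Buffer.add_char b (Char.chr byte)
  done

(* Read 4 bytes starting at [pos] back into a 32-bit integer. *)
let read_int32_be (s : string) (pos : int) : int32 =
  let acc = ref 0l in
  for i = 0 to 3 do
    let byte = Int32.of_int (Char.code s.[pos + i]) in
    acc := Int32.logor (Int32.shift_left !acc 8) byte
  done;
  !acc

let () =
  let b = Buffer.create 4 in
  write_int32_be b 0x01020304l;
  assert (Buffer.contents b = "\001\002\003\004");
  assert (read_int32_be (Buffer.contents b) 0 = 0x01020304l)
```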

Integrity

The solution should reject corrupt or malformed data.
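A simple way to detect accidental corruption is to store a checksum alongside the payload. The sketch below uses the standard library's Digest module (MD5, 16 bytes); the names seal and unseal are hypothetical. It guards against accidental damage only, not against deliberate tampering, and it does not check that the payload is well-formed for any particular type.

```ocaml
(* Prefix the payload with its 16-byte MD5 digest. *)
let seal (payload : string) : string = Digest.string payload ^ payload

(* Return the payload only if the stored digest matches its contents. *)
let unseal (data : string) : string option =
  if String.length data < 16 then None
  else
    let digest = String.sub data 0 16 in
    let payload = String.sub data 16 (String.length data - 16) in
    if String.equal (Digest.string payload) digest then Some payload else None

let () =
  let sealed = seal "some serialized bytes" in
  assert (unseal sealed = Some "some serialized bytes");
  (* Truncated data is rejected. *)
  assert (unseal (String.sub sealed 0 (String.length sealed - 1)) = None)
```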

Type-safety

Successfully deserialized values should be well-formed values of the expected type.

Overridability

The user should be able to define their own serializer/deserializer for some data types.

Off-line type checking

There should be a mechanism for quickly checking that serialized data is well-formed with respect to an interface.

Migration

Mechanisms should be provided for migrating data serialized from older types to newer types.
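One possible migration mechanism is to tag stored data with the version of the type it was written with, and to provide an upgrade function to the current version. The following is a minimal sketch under that assumption; the modules V1 and V2, the stored type and the default value chosen for the new field are all illustrative.

```ocaml
(* Two successive versions of the same record type. *)
module V1 = struct
  type point = { x : int; y : int }
end

module V2 = struct
  type point = { x : int; y : int; z : int }
end

(* Stored data carries a tag saying which version produced it. *)
type stored =
  | Point_v1 of V1.point
  | Point_v2 of V2.point

(* Upgrade any stored value to the current version.
   Old data gets a default for the field it did not have. *)
let upgrade (s : stored) : V2.point =
  match s with
  | Point_v1 p -> { V2.x = p.V1.x; y = p.V1.y; z = 0 }
  | Point_v2 p -> p

let () =
  let old_data = Point_v1 { V1.x = 12; y = 34 } in
  assert (upgrade old_data = { V2.x = 12; y = 34; z = 0 })
```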

Editability

It should be possible to view, manually edit and automatically manipulate serialized data without having access to the modules or interfaces that produced it.

TODO

Do a table summarizing the features of the different systems.

Discussion

Please see the discussion page.