Thursday, August 18, 2016

Avro - schema change compatibility

I was under the impression that I could add an optional field to an Avro schema and consumers with the old schema would still be able to read the message, whether or not the newly added optional field is set in the record.

So I added an optional field to an Avro schema, generated an Avro record from it, and was able to read it from consumers using both the old and the new schema (this might not work if the schema changes are not at the end).

But this is not how it should work: Avro binary data that is read without the writer's schema is not forward compatible. And that is what happened: one of the consumers started to fail.

So I was surprised that my code was working while the other code (using the older schema) was throwing errors when reading the Avro record created from the new schema. This is where it was failing:


    BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(is, null);
    // The reader only knows the old schema compiled into MyClass.
    DatumReader<MyClass> datumReader = new SpecificDatumReader<MyClass>(MyClass.class);
    // Keep reading records until the input stream is exhausted.
    while (!binaryDecoder.isEnd()) {
      datumReader.read(null, binaryDecoder); // failure here
    }

The difference was that I didn't have the while loop in my code, since I assumed the input stream represented a single record, while the other consumer was handling a batch of records.

So the above code would read the first part of the Avro byte array as per the old schema in the first iteration, and then try to read the remaining bytes, which belong to the new field, as another record in the second iteration. Even when the optional field is not set, it still takes up a byte for the union branch index. Since the leftover bytes were not a complete record, the read reported errors.
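To make the leftover-byte behaviour concrete, here is a minimal sketch that mimics Avro's binary encoding by hand using only the JDK, not the Avro library. The class name, the `id`/`name` record and the optional `note` field are all made up for illustration, and the sketch throws an `IllegalStateException` where the real decoder would report its own error. It only assumes the encoding rules from the Avro specification: longs are zig-zag varints, strings are a length followed by UTF-8 bytes, and a union is a branch index followed by the value (nothing at all for null).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class LeftoverBytesDemo {

    // Avro encodes int/long values as zig-zag varints.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteArrayInputStream in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new IllegalStateException("ran out of bytes mid-record");
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Strings are a length (as a long) followed by that many UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static String readString(ByteArrayInputStream in) {
        int len = (int) readLong(in);
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) throw new IllegalStateException("ran out of bytes mid-record");
            off += n;
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Hypothetical new schema: id (long), name (string), note (["null", "string"]).
    // The note field is left unset, so only the union branch index 0 is written.
    static byte[] writeWithNewSchema(long id, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, id);
        writeString(out, name);
        writeLong(out, 0); // branch 0 = null; the null value itself takes no bytes
        return out.toByteArray();
    }

    // Single-record consumer with the old schema (id, name): returns the unread byte count.
    static int readOneWithOldSchema(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        readLong(in);
        readString(in);
        return in.available();
    }

    // Batch consumer with the old schema: loops until the input is exhausted.
    static int readBatchWithOldSchema(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int records = 0;
        while (in.available() > 0) { // plays the role of !binaryDecoder.isEnd()
            readLong(in);
            readString(in);
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] bytes = writeWithNewSchema(42L, "abc");
        System.out.println("leftover bytes after one record: " + readOneWithOldSchema(bytes));
        try {
            readBatchWithOldSchema(bytes);
        } catch (IllegalStateException e) {
            System.out.println("batch reader failed: " + e.getMessage());
        }
    }
}
```

The single-record reader stops after the fields it knows about and never notices the extra byte, which is why my code appeared to work. The batch reader sees that the stream is not yet empty, starts decoding a second record from that one byte and runs out of input.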

2 comments:

Calvin said...

Welcome back to the blog-o'sphere pa! :)
BTW, what are you using Avro for? Some work pertaining to Big-Data, Map-reduce, etc.?

What about alternatives like Google's protocol buffers?

Anyways, very happy to see you sharing a sneak peek into your Work(shop)! :)

Cheers & all the best!

Amod Pandey said...

Hello Rags!!! Nice to hear from you. In our project we have used Protocol Buffers, Avro, Thrift and SBE. We use Avro to serialize messages before putting them on Kafka. Avro is the most compact form, as it does not add any schema-related information to the data.

How are you doing?