Ciofeca Forensics - Revisiting Apple Notes (6): The Protobuf

TL;DR: This post explains portions of two protobufs used by Apple, one for the Note format itself and another for embedded objects. More importantly, it explains how you can figure out the structure of protobufs.

Background

Previous entries in this series covered how to deal with Apple Notes and the embedded objects in them, including embedded tables and galleries. Throughout these posts, I have referred to the fact that Apple uses protocol buffers (protobufs) to store the information for both notes and the embedded objects within them. What I have not yet done is actually provide the .proto file that was used to generate the Ruby output, or explain how you can develop the same for your app of interest. If you only care about the first part of that, you can view the .proto file or the config I use for protobuf-inspector. Both of these files are just a start to pull out the important parts for processing and can certainly be improved.

As with previous entries, I want to make sure I give credit where it is due. After pulling apart the Note protobuf and while I was trying to figure out the table protobuf, I came across dunhamsteve’s work. As a result, I went back and modified some of my naming to better align to what he had published and added in some fields like version which I did not have the data to discover.

What is a Protocol Buffer?

To quote directly from the source,

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

What does that mean? It means a protocol buffer is a way you can write a specification for your data and use it in many projects and languages with one command. The end result is source code for whatever language you are writing in. For example, Sean Ballinger’s Alfred Search Notes App used my notestore.proto file to compile to Go instead of Ruby to interact with Notes on macOS. When you use it in your program, the data which you save will be a raw data stream which won’t look like much, but will be intelligible to any code with that protobuf definition.

The definition is generally a .proto file which would look something like:

syntax = "proto2";

// Represents an attachment (embedded object)
message AttachmentInfo {
   optional string attachment_identifier = 1;
   optional string type_uti = 2;
}

This definition would have just one message type (AttachmentInfo), with two fields (attachment_identifier and type_uti), both optional. This is using the proto2 syntax.

Why Care About Protobufs

Protobufs are everywhere, especially if you happen to be working with or looking at Google-based systems, such as Android. Apple also uses a lot of them in iOS, and for people that have to support both operating systems, using a protobuf makes the pain of maintaining two different code bases slightly less annoying because you can compile the same definition to different languages. If you are in forensics, you may come across something that looks like it isn’t plaintext and discover that you’re actually looking at a protobuf. When it comes specifically to Apple Notes, protobufs are used both for the Note itself and the attachments.

How to Use a .proto file

Assuming you have a .proto file, either from building one yourself or from finding one from your favorite application, you can compile it to your target language using protoc. The resulting file can then be included in your project using whatever that language’s include statement is to create the necessary classes for the data. For example, when writing Apple Cloud Notes Parser in Ruby, I used protoc --ruby_out=. ./proto/notestore.proto to compile it and then require_relative 'notestore_pb.rb' in my code to include it.

If I wanted instead to add in support for Python, I would only have to make this change: protoc --ruby_out=. --python_out=. ./proto/notestore.proto

How Can You Find a Protobuf Definition File?

If you come up against a protobuf in an application you are looking at, you might be able to find the .proto protobuf definition file in the application itself or somewhere on the forensic image. I ended up going through an iOS 13 forensic image earlier this year and found that Apple still had some of theirs on disk:

[notta@cuppa iOS13_logical]$ find | grep '\.proto$'
./System/Library/Frameworks/MultipeerConnectivity.framework/MultipeerConnectivity.proto
./System/Library/PrivateFrameworks/ActivityAchievements.framework/ActivityAchievementsBackCompat.proto
./System/Library/PrivateFrameworks/ActivityAchievements.framework/ActivityAchievements.proto
./System/Library/PrivateFrameworks/CoreLocationProtobuf.framework/Support/Harvest/CLPCollectionRequest.proto
./System/Library/PrivateFrameworks/ActivitySharing.framework/ActivitySharingDatabaseCodables.proto
./System/Library/PrivateFrameworks/ActivitySharing.framework/ActivitySharingDomainCodables.proto
./System/Library/PrivateFrameworks/ActivitySharing.framework/ActivitySharingInvitationCodables.proto
./System/Library/PrivateFrameworks/ActivitySharing.framework/ActivitySharingCloudKitCodables.proto
./System/Library/PrivateFrameworks/CloudKitCode.framework/RecordTransport.proto
./System/Library/PrivateFrameworks/RemoteMediaServices.framework/RemoteMediaServices.proto
./System/Library/PrivateFrameworks/CoreDuet.framework/knowledge.proto
./System/Library/PrivateFrameworks/HealthDaemon.framework/Statistics.proto
./System/Library/PrivateFrameworks/AVConference.framework/VCCallInfoBlob.proto
./System/Library/PrivateFrameworks/AVConference.framework/captions.proto

Some of these are really interesting when you look at them, particularly if you care about their location data and pairing. You don’t even have to have an iOS forensic image sitting around, as all of the same files are included in your copy of macOS 10.15.6 as well, if you run sudo find /System/ -iname "*.proto". I am not including any interesting snippets of those because they are copyrighted by Apple, and I would explicitly note that none are related to Apple Notes or the contents of this post.
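If you would rather script that search than rely on find and grep, the same walk can be sketched in Python. This is only an illustration: find_proto_files is a made-up helper name, and like the grep above it matches the .proto extension case-sensitively.

```python
import os


def find_proto_files(root):
    """Return the sorted paths under root whose file names end in .proto."""
    matches = []
    # os.walk visits every directory below root; unreadable ones are skipped
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".proto"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Search the current directory, e.g. the top of an extracted forensic image
for path in find_proto_files("."):
    print(path)
```

Point the function at the root of your extracted image (or at /System/ on a Mac) to get the same kind of listing shown above.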

In general, you should not expect to find these definitions sitting around since the definition file isn’t needed once the code is generated. For more open source applications, you might be interested in some Google Dorks, especially when looking at Android artifacts, as you might still find them.

How Can You Rebuild The Protobuf?

But what if you can’t find the definition file? How can you rebuild it yourself? This was the most interesting part of rewriting Apple Cloud Notes Parser as I had no knowledge of how Apple typically represents data, nor protobufs, so it was a fun learning adventure.

If you have nothing else, the protoc --decode_raw command can give you an initial look at what is in the data. However, this amounts to not much more than pretty printing a JSON object; it doesn’t do a great job of telling you what might be in there. I made heavy use of mildsunrise’s protobuf-inspector, which at least makes an attempt to tell you what you might be looking at. Another benefit to using this is that it lets you incrementally build up your own definition by editing a file named protobuf_config.py in the protobuf-inspector folder.
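Both tools expect raw protobuf bytes, so a gzip-compressed note blob has to be gunzipped before either one can read it. A minimal Python sketch of that step follows. The function names are hypothetical, and it assumes you have already pulled the compressed blob out of the database by some other means.

```python
import gzip


def gunzip_blob(blob):
    """Return the decompressed bytes of a gzip-compressed blob."""
    # Every gzip stream starts with the magic bytes 0x1f 0x8b
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("blob does not start with the gzip magic bytes")
    return gzip.decompress(blob)


def save_gunzipped(blob, out_path):
    """Write the gunzipped blob to out_path and return its size in bytes."""
    data = gunzip_blob(blob)
    with open(out_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

The file written by save_gunzipped is what you would then feed to protobuf-inspector on standard input.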

For example, below is the output from protobuf-inspector when I ran it on the gunzipped contents of one of the first notes in my test database.

[notta@cuppa protobuf-inspector]$ python3 main.py < ~/note_18.blob 
root:
    1 <varint> = 0
    2 <chunk> = message:
        1 <varint> = 0
        2 <varint> = 0
        3 <chunk> = message:
            2 <chunk> = "Pure blob title"
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                2 <varint> = 0
                3 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                5 <varint> = 1
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                2 <varint> = 5
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                5 <varint> = 2
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 5)
                2 <varint> = 5
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 8)
                4 <varint> = 1
                5 <varint> = 3
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 10)
                2 <varint> = 4
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                4 <varint> = 1
                5 <varint> = 4
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 14)
                2 <varint> = 10
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                5 <varint> = 5
            3 <chunk> = message:
                1 <chunk> = message:
                    1 <varint> = 0
                    2 <varint> = 4294967295
                2 <varint> = 0
                3 <chunk> = message:
                    1 <varint> = 0
                    2 <varint> = 4294967295
            4 <chunk> = message:
                1 <chunk> = message:
                    1 <chunk> = bytes (16)
                        0000   EE FE 10 DA 5A 79 43 25 88 BA 6D CA E2 E9 B7 EC                          ....ZyC%..m.....
                    2 <chunk> = message(1 <varint> = 24)
                    2 <chunk> = message(1 <varint> = 9)
            5 <chunk> = message:
                1 <varint> = 5
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
            5 <chunk> = message:
                1 <varint> = 5
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
            5 <chunk> = message:
                1 <varint> = 5
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)

There is a lot in here for a note that just says “Pure blob title”! Because we know that protobufs are made up of messages and fields, as we look through this we are going to try to figure out what the messages are and what types of fields they have. To do that, you want to pay attention to the field types (such as “varint”) and numbers (1, 2, 3, you know what numbers are).

In a protobuf, each field number corresponds to exactly one field, so when you see many of the same field number, you know that is a repeated field. In the above example, there are a lot of repeated field 5, which is a message that contains two things, a varint and another message. You also want to pay attention to the values given and look for magic numbers that might correspond to things like timestamps, the length of a string, the length of a substring, or an index within the overall protobuf.
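To make the field numbers and types concrete, here is a rough Python sketch of how one level of a protobuf message can be walked by hand. It relies on the documented protobuf wire format, where every field starts with a varint tag that packs the field number and a wire type. The function names are made up for illustration and are not part of protobuf-inspector, and the deprecated group wire types are not handled.

```python
from collections import Counter


def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not byte & 0x80:                # high bit clear marks the last byte
            return result, pos
        shift += 7


def iter_fields(data):
    """Yield (field_number, wire_type, value) for one message level."""
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                 # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:               # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:               # length-delimited ("chunk")
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:               # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def repeated_field_numbers(data):
    """Return the field numbers that occur more than once at this level."""
    counts = Counter(number for number, _type, _value in iter_fields(data))
    return sorted(number for number, count in counts.items() if count > 1)


# Hand-built sample: field 2 holds a string, field 5 appears three times
sample = b"\x12\x0fPure blob title" + b"\x2a\x02\x08\x05" * 3
print(repeated_field_numbers(sample))
```

On that hand-built sample, field 5 is reported as the only repeated field, which is the same observation made by eye in the output above.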

Breaking Down an Example

Looking at the very start of this, we see that this protobuf has one root object. That root object has two fields which we know about: 1 and 2. However, we don’t have enough information to say anything meaningful about them, other than that field 2 is clearly a message type that contains everything else.

root:
    1 <varint> = 0
    2 <chunk> = message:
      ...

Looking within field 2, we see a very similar issue. It has three fields, two of which (1 and 2) we don’t know enough about to deduce their purpose. Field 3, however, again is a clear message with a lot more inside of it.

...
    2 <chunk> = message:
        1 <varint> = 0
        2 <varint> = 0
        3 <chunk> = message:
            ...

Field 3 is where it gets interesting. We see some plaintext in field 2, which contains the entire text of this particular note. We see repeated fields 3 and 5, so those messages clearly can apply more than once. We see only one field 4, which is a message that has a 16-byte value and two integers.

    ...
        3 <chunk> = message:
            2 <chunk> = "Pure blob title"
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                2 <varint> = 0
                3 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                5 <varint> = 1
            3 <chunk> = message:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                2 <varint> = 5
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                5 <varint> = 2
            ... [3 repeats a few times]
            4 <chunk> = message:
                1 <chunk> = message:
                    1 <chunk> = bytes (16)
                        0000   EE FE 10 DA 5A 79 43 25 88 BA 6D CA E2 E9 B7 EC                          ....ZyC%..m.....
                    2 <chunk> = message(1 <varint> = 24)
                    2 <chunk> = message(1 <varint> = 9)
            5 <chunk> = message:
                1 <varint> = 5
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
            ... [5 repeats a few times]

An Example protobuf-Inspector Config

At this point, we need more data to test against. To make that test meaningful, I would first save the information we’ve seen above into a new definition file for protobuf-inspector. That way when we run this on other notes, anything that is new will stand out. Even though we don’t know much, this could be your initial definition file, saved in the folder you run protobuf-inspector from as protobuf_config.py.

types = {
  # Main Note Data protobuf
  "root": {
    # 1: unknown?
    2: ("document"),
  },

  # Related to a Note
  "document": { #
    # 1: unknown?
    # 2: unknown?
    3: ("note", "Note"),
  },

  "note": { # 
    2: ("string", "Note Text"),
    3: ("unknown_chunk", "Unknown Chunk"),
    4: ("unknown_note_stuff", "Unknown Stuff"),
    5: ("unknown_chunk2", "Unknown Chunk 2"),
  },

  "unknown_chunk": {
    # 1:
    2: ("varint", "Unknown Integer 1"),
    # 3:
    5: ("varint", "Unknown Integer 2"),
  },

  "unknown_note_stuff": {
    # 1: unknown message
  },

  "unknown_chunk2": {
    1: ("varint", "Unknown Integer 1"),
  },

}

Then when we run this against the next note in our database, we see many of the fields we have “identified”. Notice, for example, that the more complex field 3 we considered before is now clearly called a “Note” in the below output. That makes it much easier to understand as you walk through it.

[notta@cuppa protobuf-inspector]$ python3 main.py < ~/note_19.blob
root:
    1 <varint> = 0
    2 <document> = document:
        1 <varint> = 0
        2 <varint> = 0
        3 Note = note:
            2 Note Text = "Pure bold italic title"
            3 Unknown Chunk = unknown_chunk:
                1 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                2 Unknown Integer 1 = 0
                3 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                5 Unknown Integer 2 = 1
            3 Unknown Chunk = unknown_chunk:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 4)
                2 Unknown Integer 1 = 1
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                5 Unknown Integer 2 = 2
            3 Unknown Chunk = unknown_chunk:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                2 Unknown Integer 1 = 4
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 8)
                4 <varint> = 1
                5 Unknown Integer 2 = 3
            3 Unknown Chunk = unknown_chunk:
                1 <chunk> = message(1 <varint> = 1, 2 <varint> = 5)
                2 Unknown Integer 1 = 21
                3 <chunk> = message(1 <varint> = 1, 2 <varint> = 0)
                5 Unknown Integer 2 = 4
            3 Unknown Chunk = unknown_chunk:
                1 <chunk> = message:
                    1 <varint> = 0
                    2 <varint> = 4294967295
                2 Unknown Integer 1 = 0
                3 <chunk> = message:
                    1 <varint> = 0
                    2 <varint> = 4294967295
            4 Unknown Stuff = unknown_note_stuff:
                1 <chunk> = message:
                    1 <chunk> = bytes (16)
                        0000   EE FE 10 DA 5A 79 43 25 88 BA 6D CA E2 E9 B7 EC                          ....ZyC%..m.....
                    2 <chunk> = message(1 <varint> = 26)
                    2 <chunk> = message(1 <varint> = 9)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 22
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                5 <varint> = 3

Building Up the Config

Editing that protobuf_config.py file lets you quickly recheck the blobs you previously exported and you can build your understanding up iteratively over time. But how do you build your understanding up? In this case I looked at the fact that the plaintext string didn’t have any of the fancy bits that I saw in Notes and assumed that some parts of either the repeated 3, or the repeated 5 sections dealt with formatting.

Because there are a lot of fancy bits that could be used, I tried to generate a lot of test examples which had only one change in each. So I started with what you see above, just a title and generated notes that iteratively had each of the formatting possibilities in a title. To make it really easy on myself to recognize string offsets, I always styled the word which represented the style. For example, any time I had the word bold it was bold and if I used italics it was italics.

As I generated a lot of these, and started generating content in the body of the note, not just the title, I noticed a pattern emerging in field 5. The lengths of all of the messages in field 5 always added up to the length of the text. In the example above from Note 19, “Unknown Integer 1” is value 22, and the length of “Note Text” is 22. In the previous example from Note 18, “Unknown Integer 1” would add up to 15 (there are three entries, each with the value 5), and the length of “Note Text” is 15. Based on this, I started attacking field 5 assuming it contained the formatting information to know how to style the entire string.
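That arithmetic is easy to check mechanically as you collect more test notes. The sketch below uses the two notes shown so far; run_lengths_match is a hypothetical helper, and len() is only a safe stand-in here because the test strings are plain ASCII. How the lengths are counted for other text is not established by these examples.

```python
def run_lengths_match(note_text, run_lengths):
    """True when the field 5 lengths add up to the length of the note text."""
    return sum(run_lengths) == len(note_text)


# Note text and the "Unknown Integer 1" values from each field 5 entry
observations = {
    "note 18": ("Pure blob title", [5, 5, 5]),
    "note 19": ("Pure bold italic title", [22]),
}

for name, (text, lengths) in observations.items():
    print(name, len(text), sum(lengths), run_lengths_match(text, lengths))
```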

Here, for example, are the relevant note texts and that unknown chunk #5 for three more notes which show interesting behavior as you compare the substrings. Pay attention to the spaces between words and newlines, as compared to the assumed lengths in field 5.

[notta@cuppa protobuf-inspector]$ python3 main.py < ~/note_21.blob 
        3 Note = note:
            2 Note Text = "Pure bold underlined strikethrough title"
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 40
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                5 <varint> = 3
                6 <varint> = 1
                7 <varint> = 1

[notta@cuppa protobuf-inspector]$ python3 main.py < ~/note_32.blob 
        3 Note = note:
            2 Note Text = "Title\nHeading\n\nSubheading\nBody\nMono spaced\n\n"
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 6
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 8
                2 <chunk> = message(1 <varint> = 1, 3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 1
                2 <chunk> = message(3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 11
                2 <chunk> = message(1 <varint> = 2, 3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 5
                2 <chunk> = message(3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 13
                2 <chunk> = message(1 <varint> = 4, 3 <varint> = 1)

[notta@cuppa protobuf-inspector]$ python3 main.py < ~/note_33.blob 
        3 Note = note:
            2 Note Text = "Not bold title\nBold title\nBold body\nBold italic body\nItalic body"
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 4
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                3 <chunk> = message:
                    1 <chunk> = ".SFUI-Regular"
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 11
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                3 <chunk> = message:
                    1 <chunk> = ".SFUI-Regular"
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 11
                2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 10
                2 <chunk> = message(3 <varint> = 1)
                5 <varint> = 1
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 17
                2 <chunk> = message(3 <varint> = 1)
                5 <varint> = 3
            5 Unknown Chunk 2 = unknown_chunk2:
                1 Unknown Integer 1 = 11
                2 <chunk> = message(3 <varint> = 1)
                5 <varint> = 2

Inside the “Unknown Chunk 2” message’s field #2, we see a message that has at least two fields, 1 and 3. As we compare the text in note 32, which has each of the types of headings (title, heading, subheading, etc.), to the other two notes, we see that every time there is a title, the first field of the message in field 2 is always 0. When it is a heading, the value is 1, and for a subheading the value is 2. Body text has no entry in that field, but monospaced text does (4). This makes it seem like field #2 tells us the style of the text.

Then when we compare note 33’s types of text (bold, bold italic, and italic), we can see that everything stays the same except for field #5. In this case, when text is bold, the value in that field is 1, and when it is italic, it is 2. When it is both bold and italic, the value is 3. In note 21, we can see that fields 6 and 7 only show up in that message when something is underlined or struck through, which would make those seem like boolean flags.
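One way to sanity-check these guesses is to slice the note text by the consecutive lengths and label each slice with the guessed meaning. The Python sketch below does that for note 33, with the run values transcribed by hand from the output above. The names and the tuple layout are hypothetical, and the value tables only contain what was observed in these test notes.

```python
# Values observed in the test notes; anything else is reported as unknown
PARAGRAPH_STYLES = {0: "title", 1: "heading", 2: "subheading", 4: "monospaced"}
FONT_WEIGHTS = {1: "bold", 2: "italic", 3: "bold italic"}


def label(value, names, default):
    """Name a field value; None means the field was absent from the run."""
    if value is None:
        return default
    return names.get(value, f"unknown({value})")


def describe_runs(note_text, runs):
    """Slice note_text by consecutive run lengths and label each slice.

    runs is a list of (length, style_value, weight_value) tuples, where
    style_value is field 1 inside field 2 and weight_value is field 5.
    """
    described, offset = [], 0
    for length, style, weight in runs:
        chunk = note_text[offset:offset + length]
        offset += length
        described.append((chunk,
                          label(style, PARAGRAPH_STYLES, "body"),
                          label(weight, FONT_WEIGHTS, "regular")))
    return described


note_33_text = ("Not bold title\nBold title\nBold body\n"
                "Bold italic body\nItalic body")
note_33_runs = [(4, 0, None), (11, 0, None), (11, 0, None),
                (10, None, 1), (17, None, 3), (11, None, 2)]

for chunk, style, weight in describe_runs(note_33_text, note_33_runs):
    print(repr(chunk), style, weight)
```

If a guess is right, every slice lines up with the words that describe its own styling, which is exactly why the test notes were written that way.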

I created many more tests like this, but the general theory is the same: try to create situations where the change in the protobuf is as small as possible. This took a lot of different notes, using literally all of the available features in many of the needed combinations, to be able to isolate what was set when. As I thought I figured out what a field was, I would add it to the protobuf_config.py file and keep going until something did not make sense, at which point I would back out that specific change. I did not try to figure out the entire structure, as my goal was purely to be able to recreate the display of the note in HTML.

Formatting Notes

Although Apple does not directly document their Notes formats, the Developer Documents do provide insight into what you might expect to find. For example, Core Text is how text is laid out, which sounds a lot like what we were trying to find out in field 5. Reading these documents helped me understand some of the general ideas to be watching for.

What is in the Notes Protobuf Config?

Now that you know how you can iteratively build up a definition, I want to walk through the notestore.proto file which Apple Cloud Notes Parser uses. This could be easily imported to other projects in other languages besides Ruby and I am taking sections of the file out of order to build up a common understanding.

Note Protobuf

syntax = "proto2";

//
// Classes related to the overall Note protobufs
//

// Overarching object in a ZNOTEDATA.ZDATA blob
message NoteStoreProto {
  required Document document = 2;
}

// A Document has a Note within it.
message Document {
  required int32 version = 2;
  required Note note = 3;
}

// A Note has both text, and then a lot of formatting entries.
// Other fields are present and not yet included in this proto.
message Note {
  required string note_text = 2;
  repeated AttributeRun attribute_run = 5;
}

It seemed like what I found in poking at the protobufs fit the proto2 syntax better than the proto3 syntax, so that’s what I’m using. The NoteStoreProto, Document, and Note messages represent what we were looking at in the examples above, the highest level messages in the protobuf. As you can see, we don’t do much with the NoteStoreProto or Document and I would not be surprised to learn these have different names and a more general use in Apple. For the Note itself, the only two fields this .proto definition concerns itself with are 2 (the note text) and 5 (the attribute runs for formatting and the like).
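If you want to check those field numbers against a blob without compiling anything, the path through the three messages can be followed by hand. The Python below is only a sketch: it is a hand-rolled reader written for illustration, not the protoc-generated code that Apple Cloud Notes Parser uses, and all of the function names are hypothetical. It walks NoteStoreProto field 2, Document field 3, and then Note fields 2 and 5.

```python
def read_varint(data, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def collect_fields(data):
    """Group one message level into {field_number: [values, ...]}."""
    fields, pos = {}, 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:                    # string, bytes, or message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type in (1, 5):               # fixed 64-bit or 32-bit
            width = 8 if wire_type == 1 else 4
            value, pos = data[pos:pos + width], pos + width
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(number, []).append(value)
    return fields


def parse_note_store(blob):
    """Return (note_text, [run lengths]) from a gunzipped note blob."""
    root = collect_fields(blob)                 # NoteStoreProto
    document = collect_fields(root[2][0])       # Document document = 2
    note = collect_fields(document[3][0])       # Note note = 3
    note_text = note[2][0].decode("utf-8")      # string note_text = 2
    run_lengths = [collect_fields(run)[1][0]    # int32 length = 1
                   for run in note.get(5, [])]  # AttributeRun attribute_run = 5
    return note_text, run_lengths
```

Reading a gunzipped blob from disk and passing its bytes to parse_note_store gives you the note text and the list of run lengths, which is enough to confirm the definition lines up with what protobuf-inspector showed.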

// Represents a "run" of characters that need to be styled/displayed/etc
message AttributeRun {
  required int32 length = 1;
  optional ParagraphStyle paragraph_style = 2;
  optional Font font = 3;
  optional int32 font_weight = 5;
  optional int32 underlined = 6;
  optional int32 strikethrough = 7;
  optional int32 superscript = 8; //Sign indicates super/sub
  optional string link = 9;
  optional Color color = 10;
  optional AttachmentInfo attachment_info = 12;
}

//Represents a color
message Color {
  required float red = 1;
  required float green = 2;
  required float blue = 3;
  required float alpha = 4;
}

// Represents an attachment (embedded object)
message AttachmentInfo {
   optional string attachment_identifier = 1;
   optional string type_uti = 2;
}

// Represents a font
message Font {
   optional string font_name = 1;
   optional float point_size = 2;
   optional int32 font_hints = 3;
}

// Styles a "Paragraph" (any run of characters in an AttributeRun)
message ParagraphStyle {
    optional int32 style_type = 1 [default = -1];
    optional int32 alignment = 2;
    optional int32 indent_amount = 4;
    optional Checklist checklist = 5;
}

// Represents a checklist item
message Checklist {
  required bytes uuid = 1;
  required int32 done = 2;
}

Speaking of the AttributeRun, these are the messages which are needed to put it back together. Each of the AttributeRun messages has a length (field 1). They optionally have a lot of other fields, such as a ParagraphStyle (field 2), a Font (field 3), the various formatting booleans we saw above, a Color (field 10), and AttachmentInfo (field 12). The Color is pretty straightforward, taking red, green, blue, and alpha values. The AttachmentInfo is simple enough, just keeping the ZIDENTIFIER value and the ZTYPEUTI value. The Font isn’t something I actually take advantage of yet, but there are placeholders for the values which appear.

The ParagraphStyle is one of the more important messages for displaying a note as it helps to style a run of characters with information such as the indentation. It also contains within it a Checklist message, which holds the UUID of the checklist and whether or not it has been completed.

With the protobuf definition so far, you should be able to correctly render the text, although you will need a cheat sheet for the formatting found in ParagraphStyle’s first field. I originally had this in the protobuf definition, but I do not believe it is a true enum, so I moved it to the AppleNote class’ code as constants.

class AppleNote

  # Constants to reflect the types of styling in an AppleNote
  STYLE_TYPE_DEFAULT = -1
  STYLE_TYPE_TITLE = 0
  STYLE_TYPE_HEADING = 1
  STYLE_TYPE_SUBHEADING = 2
  STYLE_TYPE_MONOSPACED = 4
  STYLE_TYPE_DOTTED_LIST = 100
  STYLE_TYPE_DASHED_LIST = 101
  STYLE_TYPE_NUMBERED_LIST = 102
  STYLE_TYPE_CHECKBOX = 103

  # Constants that reflect the types of font weighting
  FONT_TYPE_DEFAULT = 0
  FONT_TYPE_BOLD = 1
  FONT_TYPE_ITALIC = 2
  FONT_TYPE_BOLD_ITALIC = 3
  ...

end
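To show how a cheat sheet like this gets used when rebuilding a note as HTML, here is a small Python sketch that mirrors a few of the constants above and wraps one run of text in tags. The HTML tags chosen here are my own assumption for illustration and are not taken from Apple Cloud Notes Parser, and render_run is a hypothetical function.

```python
import html

# Python mirrors of some of the Ruby constants above
STYLE_TYPE_TITLE = 0
STYLE_TYPE_HEADING = 1
STYLE_TYPE_SUBHEADING = 2
STYLE_TYPE_MONOSPACED = 4

FONT_TYPE_BOLD = 1
FONT_TYPE_ITALIC = 2
FONT_TYPE_BOLD_ITALIC = 3

# Assumed tag for each style type; any other style gets no block tag
BLOCK_TAGS = {
    STYLE_TYPE_TITLE: "h1",
    STYLE_TYPE_HEADING: "h2",
    STYLE_TYPE_SUBHEADING: "h3",
    STYLE_TYPE_MONOSPACED: "pre",
}


def render_run(text, style_type=-1, font_weight=0, underlined=0, strikethrough=0):
    """Wrap one run of note text in HTML according to its attributes."""
    out = html.escape(text)  # never emit raw note text into HTML
    if font_weight in (FONT_TYPE_BOLD, FONT_TYPE_BOLD_ITALIC):
        out = f"<b>{out}</b>"
    if font_weight in (FONT_TYPE_ITALIC, FONT_TYPE_BOLD_ITALIC):
        out = f"<i>{out}</i>"
    if underlined:
        out = f"<u>{out}</u>"
    if strikethrough:
        out = f"<s>{out}</s>"
    tag = BLOCK_TAGS.get(style_type)
    return f"<{tag}>{out}</{tag}>" if tag else out


print(render_run("Pure bold title", style_type=STYLE_TYPE_TITLE,
                 font_weight=FONT_TYPE_BOLD))
```

A real renderer also has to deal with lists, checkboxes, links, and runs that split a paragraph, none of which this sketch attempts.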

MergeableData protobuf

//
// Classes related to embedded objects
//

// Represents the top level object in a ZMERGEABLEDATA cell
message MergableDataProto {
  required MergableDataObject mergable_data_object = 2;
}

// Similar to Document for Notes, this is what holds the mergeable object
message MergableDataObject {
  required int32 version = 2; // Asserted to be version in https://github.com/dunhamsteve/notesutils
  required MergeableDataObjectData mergeable_data_object_data = 3;
}

// This is the mergeable data object itself and has a lot of entries that are the parts of it 
// along with arrays of key, type, and UUID items, depending on type.
message MergeableDataObjectData {
  repeated MergeableDataObjectEntry mergeable_data_object_entry = 3;
  repeated string mergeable_data_object_key_item = 4;
  repeated string mergeable_data_object_type_item = 5;
  repeated bytes mergeable_data_object_uuid_item = 6;
}

// Each entry is part of the object. For example, one entry might be identifying which 
// UUIDs are rows, and another might hold the text of a cell.
message MergeableDataObjectEntry {
  required RegisterLatest register_latest = 1;
  optional Dictionary dictionary = 6;
  optional Note note = 10;
  optional MergeableDataObjectMap custom_map = 13;
  optional OrderedSet ordered_set = 16;
}

Similar to the Note protobuf definition above, the MergableDataProto and MergableDataObject messages are likely larger objects that Notes just doesn’t give enough data to fully understand. MergeableDataObjectData (I know, the naming could use some work, that’s a future improvement) is really the embedded object found in the ZMERGEABLEDATA column. It is made up of a lot of MergeableDataObjectEntry messages (field 3), and the example from embedded tables is that an entry might tell the user which other entries are rows or columns. The MergeableDataObjectData also has strings which represent the key (field 4) or the type of item (field 5), and a set of 16 bytes which represent a UUID to identify this object (field 6).

MergeableDataObjectEntry is where things get more complicated. So far five of its fields seem relevant, with the Note message in field 10 already having been explained above. The RegisterLatest (field 1), Dictionary (field 6), MergeableDataObjectMap (field 13), and OrderedSet (field 16) objects are explained below, but will make the most sense if you read about embedded tables at the same time.

// ObjectIDs are used to identify objects within the protobuf, offsets in an array, or 
// a simple String.
message ObjectID {
  required uint64 unsigned_integer_value = 2;
  required string string_value = 4;
  required int32 object_index = 6;
}

// Register Latest is used to identify the most recent version
message RegisterLatest {
  required ObjectID contents = 2;
}

The RegisterLatest object has one ObjectID within it (field 2). This message is used to identify which ObjectID is the latest version. This is needed because Notes can have more than one source, between your local device, shared iCloud accounts, and a web editor in iCloud. As updates are merged, you can have older edits present, which you don’t want to use.

The ObjectID itself is useful in more places. It is used heavily in embedded tables and has three different possible pointers, one for unsigned integers (field 2), one for strings (field 4), and one for objects (field 6). It should point to one of those three, as will be seen below.

// The Object Map uses its type to identify what you are looking at and 
// then a map entry to do something with that value.
message MergeableDataObjectMap {
  required int32 type = 1;
  repeated MapEntry map_entry = 3;
}

// MapEntries have a key that maps to an array of key items and a value that points to an object.
message MapEntry {
  required int32 key = 1;
  required ObjectID value = 2;
}

Now that the ObjectID message is defined, we can look at the MergeableDataObjectMap. This message has a type (field 1) and potentially a lot of MapEntry messages (field 3). The type will be meaningful when looked up from another place.

The MapEntry message has an integer key (field 1) and an ObjectID value (field 2). The ObjectID will point to something that is indicated by the key, either as an integer, string, or object.

// A Dictionary holds many DictionaryElements
message Dictionary {
  repeated DictionaryElement element = 1;
}

// Represents an object that has pointers to a key and a value, asserting 
// somehow that the key object has to do with the value object.
message DictionaryElement {
  required ObjectID key = 1;
  required ObjectID value = 2;
}

The Dictionary message has a lot of DictionaryElement messages (field 1) within it. Each DictionaryElement has a key (field 1) and a value (field 2), both of which are ObjectIDs. For example, the key might be an ObjectID which has an ObjectIndex of 20 and the value might be an ObjectID with an ObjectIndex of 19. That would say that whatever is contained in index 20 is how we understand what we do with whatever is in index 19.
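The index 20 and index 19 example can be sketched in a few lines of Python. The dataclasses here are hypothetical stand-ins for the generated protobuf classes, and the sketch assumes that object_index is a zero-based offset into the list of MergeableDataObjectEntry messages, which is my reading of the description rather than something documented by Apple.

```python
from dataclasses import dataclass


@dataclass
class ObjectID:
    """Stand-in for the ObjectID message; only object_index is used here."""
    unsigned_integer_value: int = 0
    string_value: str = ""
    object_index: int = 0


@dataclass
class DictionaryElement:
    """Stand-in for the DictionaryElement message."""
    key: ObjectID
    value: ObjectID


def resolve_dictionary(elements, entries):
    """Pair up the entries that each element's key and value point at."""
    return [(entries[element.key.object_index],
             entries[element.value.object_index])
            for element in elements]


# Placeholder entries so the lookups are easy to follow
entries = [f"entry {index}" for index in range(21)]
elements = [DictionaryElement(key=ObjectID(object_index=20),
                              value=ObjectID(object_index=19))]
print(resolve_dictionary(elements, entries))
```

In a real embedded table the entries would be MergeableDataObjectEntry messages rather than strings, and the key entry would tell you how to interpret the value entry.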

// An ordered set is used to hold structural information for embedded tables
message OrderedSet {
  required OrderedSetOrdering ordering = 1;
  required Dictionary elements = 2;
}


// The ordered set ordering identifies rows and columns in embedded tables, with an array 
// of the objects and contents that map lookup values to originals.
message OrderedSetOrdering {
  required OrderedSetOrderingArray array = 1;
  required Dictionary contents = 2;
}

// This array holds both the text to replace and the array of UUIDs to tell what
// embedded rows and columns are.
message OrderedSetOrderingArray {
  required Note contents = 1;
  repeated OrderedSetOrderingArrayAttachment attachment = 2;
}

// This array identifies the UUIDs that are embedded table rows or columns
message OrderedSetOrderingArrayAttachment {
  required int32 index = 1;
  required bytes uuid = 2;
}

Finally, we have a set of messages related to OrderedSets. These are really key in tables (as are most of these more complicated messages we discuss) and kind of wrap around the messages we saw above (i.e. an ObjectID is likely pointing to an index in an OrderedSet). An OrderedSet message has an OrderedSetOrdering message (field 1) and a Dictionary (field 2). The OrderedSetOrdering message has an OrderedSetOrderingArray (field 1) and another Dictionary (field 2). The OrderedSetOrderingArray interestingly has a Note (field 1) and potentially many OrderedSetOrderingArrayAttachment messages (field 2). Finally, the OrderedSetOrderingArrayAttachment has an index (field 1) and a 16-byte UUID (field 2).

I would highly recommend checking out the blog post about embedded tables to get through these last three sections of the protobuf with an example to follow along.

Conclusion

Protobufs are an efficient way to store data, particularly when you have to interact with that same data or data schema from different languages. My understanding of the Apple Notes protobuf is certainly not complete, but at this point is generally good enough to support recreating the look of a note after parsing it. Most of the protobuf is straightforward; it is really when you get into embedded tables that things get crazy. At this point, you should have a good enough understanding to compile the Apple Cloud Notes Parser’s proto file for your target language and start playing with it yourself!