[Imap-protocol] Parsing, part numbering and BODYSTRUCTURE

Dave Cridland dave at cridland.net
Mon Jun 29 01:35:47 PDT 2015

On 29 June 2015 at 07:11, David Harris <David.Harris at pmail.gen.nz> wrote:

> For reasons that aren't relevant here, I'm in the process of rewriting my
> parser for about the fifth time in twenty-five years. Each time I do this
> I find I spend a lot of time trying to reconcile the way I do my parsing
> with the demands of IMAP. I should probably keep notes each time, but I
> never do. *sigh*.


> A lot of the trouble I have comes from the paucity of detail in RFC3501
> over two key issues - part numbering, and BODYSTRUCTURE. This is not
> helped by what appears to me to be an erratum - the sample numbering
> scheme shown on page 56, which appears to suggest that the bare part
> number for any part of a message references the first byte of the part
> INCLUDING any MIME headers it might have (if you look at 4.1, it is
> *followed* by 4.1.MIME, which appears to suggest that the MIME headers
> are a subset of 4.1).


> So here's my first question: could someone confirm for me that a bare part
> number (such as "4.1") refers to the part starting at the first byte
> *following* the CRLF at the end of its MIME headers?



Yes. Well, for leaf parts, anyway.
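
To make the leaf-part case concrete, here is a minimal sketch using Python's standard `email` package. The helper names `number_parts` and `fetch_section` and the `SAMPLE` message are made up for illustration. It only covers multipart containers and leaf parts; message/rfc822 parts, which additionally take the HEADER and TEXT specifiers, are deliberately left out. The header fields it returns are parsed values, not the exact octets a server would send.

```python
from email import message_from_bytes
from email.message import Message
from typing import Dict


def number_parts(msg: Message, prefix: str = "") -> Dict[str, Message]:
    """Map IMAP-style part numbers to parts (multipart containers and leaves only)."""
    if msg.get_content_maintype() != "multipart" or not msg.is_multipart():
        # A non-multipart message is addressed as part "1".
        return {prefix or "1": msg}
    parts: Dict[str, Message] = {}
    for i, child in enumerate(msg.get_payload(), start=1):
        number = f"{prefix}.{i}" if prefix else str(i)
        parts[number] = child
        if child.get_content_maintype() == "multipart":
            parts.update(number_parts(child, number))
    return parts


def fetch_section(parts: Dict[str, Message], section: str):
    """Return header fields for 'N.MIME', or the undecoded body for a bare leaf 'N'."""
    if section.endswith(".MIME"):
        return parts[section[: -len(".MIME")]].items()
    part = parts[section]
    if part.is_multipart():
        raise ValueError("this sketch only handles bare numbers for leaf parts")
    # The payload starts after the blank line that ends the part's MIME headers.
    return part.get_payload()


# Hypothetical sample message, used only to exercise the sketch.
SAMPLE = (
    b"From: a@example.org\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="outer"\r\n'
    b"\r\n"
    b"--outer\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--outer\r\n"
    b'Content-Type: multipart/alternative; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"plain\r\n"
    b"--inner\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>html</p>\r\n"
    b"--inner--\r\n"
    b"--outer--\r\n"
)

parts = number_parts(message_from_bytes(SAMPLE))
print(sorted(parts))                         # part numbers found
print(fetch_section(parts, "2.2.MIME"))      # MIME headers of part 2.2
print(repr(fetch_section(parts, "2.2")))     # body of part 2.2, headers excluded
```

The point of the split is that "2.2" and "2.2.MIME" are disjoint: the bare number never includes the header fields.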

> Next, in a BODYSTRUCTURE, do the line and octet counts for such a part
> include the MIME headers, or not? I believe the correct answer is "not",
> but would like to know for sure.



I'd agree: the octet counts there are expected to be those for the part
itself, not the headers.
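
As a rough illustration of counting only the body, the sketch below takes the raw octets of a single part (headers, blank line, body) and returns the size and line count of the body alone. The function name `body_size_fields` is hypothetical. It assumes CRLF line endings and counts a final unterminated line as a line, which is a choice of this sketch rather than anything RFC 3501 spells out. The size is taken on the body as transferred, before any content-transfer-encoding is undone.

```python
from typing import Tuple


def body_size_fields(raw_part: bytes) -> Tuple[int, int]:
    """Return (octets, lines) for the body of one raw MIME part, headers excluded."""
    if raw_part.startswith(b"\r\n"):
        # No header fields at all: the body starts right after the blank line.
        body = raw_part[2:]
    else:
        header_end = raw_part.find(b"\r\n\r\n")
        # Assumption: a part with no blank line has headers only and an empty body.
        body = b"" if header_end == -1 else raw_part[header_end + 4:]
    lines = body.count(b"\r\n")
    if body and not body.endswith(b"\r\n"):
        lines += 1  # assumption: count a trailing line that lacks a CRLF
    return len(body), lines


# Hypothetical part used only to exercise the sketch.
raw_part = b"Content-Type: text/plain\r\n\r\nline one\r\nline two\r\n"
octets, lines = body_size_fields(raw_part)
print(octets, lines, len(raw_part))
```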

> This leads to my next question, which is "is BODYSTRUCTURE reversible"?
> That is, if you parse a message, build a BODYSTRUCTURE from the parsed
> data, then re-parse the BODYSTRUCTURE, will the two parses be the same?
> I have to clarify here, because this question depends on context: if
> you're parsing for an IMAP server, it's quite reasonable to assume that
> your parser will build two entries for each part, the first tracking the
> offset of the MIME headers for the part, the second tracking the offset
> of the part itself: this allows you to do a simple lookup to satisfy
> fetches for both <partnumber> and <partnumber>.MIME... Yet it seems to me
> that you cannot reconstruct this information from a BODYSTRUCTURE - you
> would lose the offset to the MIME headers. Why am I asking this? I'm
> trying to work out if it's possible to use BODYSTRUCTURE as a way of
> storing a parse between invocations, since it's always going to be far
> quicker to parse a BODYSTRUCTURE than it is to read the entire message
> again.



So by "is BODYSTRUCTURE reversible", I thought you meant something else.

But no, BODYSTRUCTURE itself doesn't contain the offsets into the message,
and may have normalized other parts of the data. A server would need more
data, and as I recall it's not quite a superset either - there are items
you need for the BODYSTRUCTURE which aren't otherwise useful for a server.

But - also as I recall - Cyrus IMAP does a single parse which extracts both
a server-side structure and the BODYSTRUCTURE.
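
For what it's worth, the "two offsets per part" index from the question can be sketched in a few lines. This is not how Cyrus or any particular server does it; `PartEntry`, `index_flat_multipart` and `fetch` are hypothetical names, and the scanner only handles a single-level multipart with a known boundary and CRLF line endings. It also treats any occurrence of the delimiter string as a delimiter, and it includes the terminating blank line in the ".MIME" slice, both of which are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartEntry:
    number: str
    header_start: int  # first octet of the part's MIME headers
    body_start: int    # first octet after the blank line ending the headers
    body_end: int      # one past the last octet of the body


def index_flat_multipart(raw: bytes, boundary: bytes) -> List[PartEntry]:
    """Record header and body offsets for each part of a single-level multipart."""
    delim = b"--" + boundary
    entries: List[PartEntry] = []
    pos = raw.find(delim)
    while pos != -1:
        after = pos + len(delim)
        if raw.startswith(b"--", after):      # closing delimiter: stop
            break
        eol = raw.find(b"\r\n", after)        # end of the delimiter line
        if eol == -1:
            break
        header_start = eol + 2
        nxt = raw.find(b"\r\n" + delim, header_start)
        part_end = nxt if nxt != -1 else len(raw)
        if raw.startswith(b"\r\n", header_start):
            body_start = header_start + 2     # part has no header fields
        else:
            blank = raw.find(b"\r\n\r\n", header_start, part_end)
            body_start = blank + 4 if blank != -1 else part_end
        entries.append(PartEntry(str(len(entries) + 1), header_start, body_start, part_end))
        pos = nxt + 2 if nxt != -1 else -1
    return entries


def fetch(raw: bytes, entries: List[PartEntry], section: str) -> bytes:
    """Serve '<n>' or '<n>.MIME' by a simple lookup in the offset index."""
    want_mime = section.endswith(".MIME")
    number = section[: -len(".MIME")] if want_mime else section
    for e in entries:
        if e.number == number:
            if want_mime:
                return raw[e.header_start:e.body_start]
            return raw[e.body_start:e.body_end]
    raise KeyError(section)


# Hypothetical message used only to exercise the sketch.
RAW = (
    b"Content-Type: multipart/mixed; boundary=XYZ\r\n\r\n"
    b"--XYZ\r\nContent-Type: text/plain\r\n\r\nHello\r\n"
    b"--XYZ\r\nContent-Type: text/html\r\n\r\n<p>Hi</p>\r\n"
    b"--XYZ--\r\n"
)
entries = index_flat_multipart(RAW, b"XYZ")
print(fetch(RAW, entries, "1.MIME"))
print(fetch(RAW, entries, "2"))
```

None of the three offsets in `PartEntry` appears in a BODYSTRUCTURE, which is why an index like this has to be stored alongside it rather than derived from it.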

> Finally, is there a detailed discussion of part numbering and BODYSTRUCTURE
> anywhere? I had a look through the RFC index and couldn't see any other
> documents that might expand on these subjects, and google didn't yield
> anything helpful either. And in a similar vein, is there a repository
> anywhere of sample messages with matching canonical part number listings
> and bodystructures? This would be extremely helpful in testing parsers and
> bodystructure generators.



I vaguely recall a lengthy discussion on a mailing list (either this one or
imapext) a few years back, but I can't find it immediately either.

> I'm sure this has all been asked a billion times before, and I apologize
> for that, but any guidance would be gratefully received.


> Cheers!

> -- David --

> ------------------ David Harris -+- Pegasus Mail ----------------------
> Box 5451, Dunedin, New Zealand | e-mail: David.Harris at pmail.gen.nz
> Phone: +64 3 453-6880 | Fax: +64 3 453-6612

> Thought for the day:
> A diplomat is a man who can convince his wife she'd look
> stout in a fur coat.





