[Imap-protocol] Parsing, part numbering and BODYSTRUCTURE
David.Harris at pmail.gen.nz
Sun Jun 28 23:11:34 PDT 2015
For reasons that aren't relevant here, I'm in the process of rewriting my MIME
parser for about the fifth time in twenty-five years. Each time I do this I find I spend
a lot of time trying to reconcile the way I do my parsing with the demands of IMAP. I
should probably keep notes each time, but I never do. *sigh*.
A lot of the trouble I have comes from the paucity of detail in RFC 3501 on two key
issues: part numbering and BODYSTRUCTURE. This is not helped by what
appears to me to be an erratum: the sample numbering scheme shown on page
56 appears to suggest that the bare part number for any part of a message
references the first byte of the part INCLUDING any MIME headers it might have (if
you look at 4.1, it is *followed* by 4.1.MIME, which appears to suggest that the
MIME headers are a subset of 4.1).
So here's my first question: could someone confirm for me that a bare part number
(such as "4.1") refers to the part starting at the first byte *following* the CRLF at the
end of its MIME headers?
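For reference, the dotted numbering itself can be sketched in a few lines. This is only an illustration using Python's standard email package, not a statement of what RFC 3501 requires: the function name number_parts and the sample message are made up for the example, and it deliberately does not descend into message/rfc822 parts, whose numbering has extra rules.

```python
from email import message_from_string

def number_parts(msg, prefix=""):
    """Return [(part_number, content_type), ...] for a parsed message.

    Assumption: only multipart nesting is handled; message/rfc822 parts
    are listed but not descended into.
    """
    out = []
    if msg.get_content_maintype() == "multipart" and msg.is_multipart():
        # Children of a multipart are numbered 1, 2, 3, ... under the prefix.
        for i, child in enumerate(msg.get_payload(), start=1):
            num = f"{prefix}{i}"
            out.append((num, child.get_content_type()))
            if child.get_content_maintype() == "multipart":
                out.extend(number_parts(child, prefix=num + "."))
    else:
        # A non-multipart message has a single part, numbered 1.
        out.append((f"{prefix}1", msg.get_content_type()))
    return out

# Hypothetical sample: multipart/mixed holding a text part and a
# nested multipart/alternative.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="outer"\r\n'
    "\r\n"
    "--outer\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello\r\n"
    "--outer\r\n"
    'Content-Type: multipart/alternative; boundary="inner"\r\n'
    "\r\n"
    "--inner\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "a\r\n"
    "--inner\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>a</p>\r\n"
    "--inner--\r\n"
    "--outer--\r\n"
)

parts = number_parts(message_from_string(RAW))
for num, ctype in parts:
    print(num, ctype)
```

The sketch only assigns numbers; it says nothing about which byte each number points at, which is the question above.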
Next, in a BODYSTRUCTURE, do the line and octet counts for such a part include
the MIME headers, or not? I believe the correct answer is "not", but would like to
know for sure.
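To make the "not" reading concrete, here is a minimal sketch that counts octets and lines for the body only, with the MIME headers and the blank line that ends them left out. The name body_counts is hypothetical, and counting one line per CRLF is an assumption; a real server may treat a final unterminated line differently.

```python
def body_counts(part: bytes):
    """Return (octets, lines) for the body of a raw MIME part.

    Assumptions: headers end at the first blank line (CRLF CRLF), the
    counts cover the body as stored (no transfer decoding), and a line
    is counted for each CRLF in the body.
    """
    if part.startswith(b"\r\n"):
        # No headers at all: the body starts right after the blank line.
        body = part[2:]
    else:
        _headers, sep, body = part.partition(b"\r\n\r\n")
        if not sep:
            # No blank line found: treat everything as headers.
            body = b""
    return len(body), body.count(b"\r\n")

sample = b"Content-Type: text/plain\r\n\r\nline one\r\nline two\r\n"
octets, lines = body_counts(sample)
print(octets, lines)
```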
This leads to my next question, which is "is BODYSTRUCTURE reversible"? That
is, if you parse a message, build a BODYSTRUCTURE from the parsed data, then
re-parse the BODYSTRUCTURE, will the two parses be the same? I have to clarify
here, because this question depends on context: if you're parsing for an IMAP
server, it's quite reasonable to assume that your parser will build two entries for
each part, the first tracking the offset of the MIME headers for the part, the second
tracking the offset of the part itself: this allows you to do a simple lookup to satisfy
fetches for both <partnumber> and <partnumber>.MIME. Yet it seems to me that
you cannot reconstruct this information from a BODYSTRUCTURE - you would
lose the offset to the MIME headers. Why am I asking this? I'm trying to work out if
it's possible to use BODYSTRUCTURE as a way of storing a parse between
invocations, since it's always going to be far quicker to parse a BODYSTRUCTURE
than it is to read the entire message again.
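The two-entries-per-part idea can be sketched as a small lookup table. Everything here is illustrative: PartEntry, fetch_section and the hand-built index are hypothetical names, and returning the blank line as part of the .MIME result is an assumption rather than something taken from the RFC.

```python
from dataclasses import dataclass

@dataclass
class PartEntry:
    mime_offset: int  # first byte of the part's MIME headers
    body_offset: int  # first byte after the blank line ending the headers
    end_offset: int   # one past the last byte of the part body

def fetch_section(raw: bytes, index: dict, section: str) -> bytes:
    """Satisfy a fetch for <partnumber> or <partnumber>.MIME by lookup."""
    if section.upper().endswith(".MIME"):
        entry = index[section[:-len(".MIME")]]
        # Assumption: the slice includes the terminating blank line.
        return raw[entry.mime_offset:entry.body_offset]
    entry = index[section]
    return raw[entry.body_offset:entry.end_offset]

# A single-part example with the index built by hand.
raw = b"Content-Type: text/plain\r\n\r\nhello\r\n"
body_start = raw.index(b"\r\n\r\n") + 4
index = {"1": PartEntry(0, body_start, len(raw))}

print(fetch_section(raw, index, "1"))
print(fetch_section(raw, index, "1.MIME"))
```

A BODYSTRUCTURE records sizes for each part but no mime_offset, so under this design the offsets would have to be stored alongside it if the parse is to be restored without re-reading the message.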
Finally, is there a detailed discussion of part numbering and BODYSTRUCTURE
anywhere? I had a look through the RFC index and couldn't see any other
documents that might expand on these subjects, and Google didn't yield anything
helpful either. And in a similar vein, is there a repository anywhere of sample
messages with matching canonical part number listings and bodystructures? This
would be extremely helpful in testing parsers and bodystructure generators.
I'm sure this has all been asked a billion times before, and I apologize for that, but
any guidance would be gratefully received.
-- David --
------------------ David Harris -+- Pegasus Mail ----------------------
Box 5451, Dunedin, New Zealand | e-mail: David.Harris at pmail.gen.nz
Phone: +64 3 453-6880 | Fax: +64 3 453-6612
Thought for the day:
A diplomat is a man who can convince his wife she'd look
stout in a fur coat.