[Imap-protocol] Where to start?

Mark Crispin mrc+imap at panda.com
Sun Jun 5 16:58:40 PDT 2011


Most of what others have posted here is good advice. I will not duplicate
what they said.

If you enjoy beating your head against a wall incessantly, then by all
means ignore what I am about to say. Over the years, I have learned that
there are individuals who delight in beating their heads against the wall,
and that it is useless for me to attempt to deprive them of their fun.

If you don't enjoy an aching head, then listen up.

First and foremost, the Formal Syntax section of RFC 3501 should be your
holy book. If any part of RFC 3501 distracts you from the Formal Syntax,
ignore it in favor of the Formal Syntax.

Your jaw will drop when you first see the Formal Syntax. Your eyes will
glaze over. You will start saying "no, no, no." Just work through that
stage. It's a steep hill to climb, but once you make it to the top you
will see everything with crystal clarity.

IMAP is a very subtle protocol in places, and only the Formal Syntax
relates the full subtlety. When a space is required, then exactly ONE
space (not ZERO, not TWO) MUST be there; and when a space is not required
then it is FORBIDDEN.
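To make the one-space rule concrete, here is a minimal Python sketch of reading the start of a command, tag SP command, the way the Formal Syntax spells it (a tag is one or more ASTRING-CHARs other than "+"). The function names and the ParseError exception are hypothetical, and only the first two tokens are handled; arguments are left to a real parser.

```python
class ParseError(Exception):
    """Hypothetical error type; a server would answer it with a tagged BAD."""

# atom-specials from RFC 3501 (SP and CTLs are excluded by the range check below)
ATOM_SPECIALS = set(b'(){%*"\\]')

def is_atom_char(c: int) -> bool:
    # CHAR minus CTL and SP leaves 0x21..0x7E; then drop the atom-specials.
    return 0x21 <= c <= 0x7E and c not in ATOM_SPECIALS

def is_tag_char(c: int) -> bool:
    # tag = 1*<any ASTRING-CHAR except "+">; ASTRING-CHAR adds "]" back in.
    return (is_atom_char(c) or c == ord("]")) and c != ord("+")

def expect_sp(buf: bytes, pos: int) -> int:
    # Exactly ONE space: zero is an error, and so is a second one.
    if pos >= len(buf) or buf[pos] != 0x20:
        raise ParseError("expected SP")
    if pos + 1 < len(buf) and buf[pos + 1] == 0x20:
        raise ParseError("extra SP")
    return pos + 1

def parse_tag_and_command(line: bytes):
    """Split 'tag SP command ... CRLF' into (tag, COMMAND, remainder)."""
    if not line.endswith(b"\r\n"):
        raise ParseError("missing CRLF")
    body = line[:-2]
    pos = 0
    while pos < len(body) and is_tag_char(body[pos]):
        pos += 1
    if pos == 0:
        raise ParseError("empty tag")
    tag = body[:pos]
    pos = expect_sp(body, pos)
    start = pos
    while pos < len(body) and is_atom_char(body[pos]):
        pos += 1
    if pos == start:
        raise ParseError("missing command name")
    # Command names are case-insensitive; the remainder is left for a full parser.
    return tag, body[start:pos].upper(), body[pos:]
```

Note that "a1  NOOP" and "a1NOOP" are both rejected; a lenient parser that splits on whitespace would accept them and hide client bugs.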

Whatever you do, DO NOT ATTEMPT TO IMPLEMENT ANY COMMAND OR RESPONSE BY
LOOKING AT THE EXAMPLES! You are guaranteed to screw up if you use the
examples as a model (that means YOU, Microsoft!). Use only the Formal
Syntax. The result should indeed look like the examples; and if it
doesn't then study the Formal Syntax to understand why.

Do I still have your attention? Good. You apparently don't want to bang
your head against the wall. I just saved you from one of the major causes
of head-banging.

Second, mind the transition between literal and non-literal modes. You
are either outputting a line, which is a set of octets terminated by CRLF;
or you are outputting a literal, which is a precisely counted number of
octets with no termination. However, in all cases in IMAP, there is a
line after a literal (even if it is just a CRLF to end the command or
response).
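As an illustration of the two modes, here is a hedged Python sketch of a server emitting a FETCH response whose body travels as a literal. The helper name is hypothetical; the point is the shape on the wire: a line ending in {n} CRLF, then exactly n octets with nothing appended, then more line (here just ")" and CRLF).

```python
def fetch_body_response(msgno: int, body: bytes) -> bytes:
    """Build '* msgno FETCH (BODY[] {n}' CRLF <n octets> ')' CRLF."""
    # Line mode: everything up to and including the "{n}" CRLF announcement.
    head = b"* %d FETCH (BODY[] {%d}\r\n" % (msgno, len(body))
    # Literal mode: exactly len(body) octets, with no terminator of our own.
    # Then back to line mode: the rest of the response, ending in CRLF.
    return head + body + b")\r\n"
```

The body need not end in CRLF; the count, not any terminator, tells the client where the literal stops.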

There are some deep insights in the previous paragraph that will be of
great use to you if you implement your server in a network I/O
infrastructure in which you are doing either "line" or "buffered" mode.
Go ahead. Ask me how I know... :)

Some commands may have multiple literals. So, if a command has two
arguments, both of which come in as literals, you must: read line, read
sized buffer, read line, read sized buffer, read line. The SEARCH command
can have quite a few literals. Or it may have none. Be prepared.
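The read loop described above might look like this minimal Python sketch. It assumes a file-like stream with readline() and read(), a hypothetical send_continuation callback, synchronizing literals only (LITERAL+ "{n+}" is not recognized), and it spots literals by a trailing {n} instead of fully tokenizing the line. There are no line-length limits either; a real server needs them.

```python
import io
import re

# A synchronizing literal is announced by "{n}" at the very end of a line.
LITERAL_AT_EOL = re.compile(rb"\{(\d+)\}\Z")

def read_command(stream, send_continuation):
    """Read line, sized buffer, line, sized buffer, ..., line.

    Returns the pieces in order: lines (CRLF stripped) alternating with literals.
    """
    pieces = []
    while True:
        line = stream.readline()
        if not line.endswith(b"\r\n"):
            raise EOFError("connection closed mid-line")
        line = line[:-2]
        pieces.append(line)
        match = LITERAL_AT_EOL.search(line)
        if match is None:
            return pieces  # a line with no literal announcement ends the command
        size = int(match.group(1))
        # The client waits for this continuation before sending the octets.
        send_continuation(b"+ Ready for literal data\r\n")
        data = stream.read(size)  # exactly `size` octets; any CRLFs inside are data
        if len(data) != size:
            raise EOFError("connection closed mid-literal")
        pieces.append(data)

# Usage: LOGIN with both arguments sent as literals.
demo = io.BytesIO(b"a1 LOGIN {4}\r\nfred {6}\r\nsecret\r\n")
sent = []
pieces = read_command(demo, sent.append)
```

Here pieces comes back as the five parts line, literal, line, literal, line, and two continuation requests were sent.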

Third, you are best off parsing commands into a struct, following the
Formal Syntax. Think about extensibility. You'll be glad you did if you
ever implement the MULTIAPPEND extension.
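One way to get that extensibility, sketched in Python with hypothetical class and field names: make the parsed APPEND hold a list of messages from the start, so that MULTIAPPEND (RFC 3502) only changes how many entries the parser fills in.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AppendMessage:
    flags: List[str] = field(default_factory=list)  # optional flag list
    internal_date: Optional[str] = None              # optional date-time
    message: bytes = b""                             # the literal itself

@dataclass
class AppendCommand:
    tag: str
    mailbox: str
    # A list, not a single message: plain APPEND yields one entry, while
    # MULTIAPPEND yields several, with no change to this struct.
    messages: List[AppendMessage] = field(default_factory=list)

# Usage: what a parser might build for a two-message MULTIAPPEND.
cmd = AppendCommand(tag="a1", mailbox="Saved", messages=[
    AppendMessage(flags=["\\Seen"], message=b"Subject: one\r\n\r\n1\r\n"),
    AppendMessage(message=b"Subject: two\r\n\r\n2\r\n"),
])
```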

Something else is important: there is ABSOLUTELY NOTHING that a command
can do, as part of the parsing of the command, that changes anything in
the mailbox or the server session. If a command responds with either NO
or BAD, then the command is guaranteed to have done nothing. Thus,
commands have three phases: parse, action, response; and there is an
action if and only if the response is OK.
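A minimal Python sketch of that discipline, with hypothetical names throughout: parsing and the decision about NO happen before anything is touched, so a BAD or NO reply guarantees no change. It assumes the action itself cannot fail; a real server whose action can fail (disk full, say) must undo it before answering NO.

```python
from typing import Callable, Optional, Set

def run_command(tag: str, args: str,
                parse: Callable[[str], object],
                check: Callable[[object], Optional[str]],
                act: Callable[[object], None]) -> str:
    # Phase 1, parse: pure. A syntax error means BAD and nothing has changed.
    try:
        cmd = parse(args)
    except ValueError as err:
        return f"{tag} BAD {err}"
    # Still side-effect free: decide whether the command can succeed at all.
    problem = check(cmd)
    if problem is not None:
        return f"{tag} NO {problem}"
    # Phase 2, action: reached only when the answer is going to be OK.
    act(cmd)
    # Phase 3, response.
    return f"{tag} OK completed"

# Usage: a toy CREATE over a hypothetical in-memory set of mailbox names.
mailboxes: Set[str] = {"INBOX"}

def parse_create(args: str) -> str:
    # args is what follows the command name, e.g. " Drafts": one SP, one name.
    if len(args) < 2 or args[0] != " " or " " in args[1:]:
        raise ValueError("expected SP mailbox")
    return args[1:]

def check_create(name: str) -> Optional[str]:
    return "mailbox already exists" if name in mailboxes else None

reply = run_command("a1", " Drafts", parse_create, check_create, mailboxes.add)
```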

Fourth, implement responses by a struct, again following the Formal
Syntax. You'll have lots of enums for types to identify which union member to
follow. But it's possible to define every IMAP response in terms of a
single struct (truly the Struct From Hell... but once you get it, it will be
a delight). Now you just need an engine to poot it out. Guess what? I
just saved you a HUGE amount of time in writing the IMAP response
generator.

Writing a separate routine for each response is obsolete technology. I know because I have
done it both ways. It is so much faster to have a single response engine
that poots out a response struct tree.
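The struct described above is a C-style struct with enums and unions; the following Python sketch swaps in a small tagged tree (Atom, Literal, int, str, None for NIL, and lists) to show the idea of one engine serializing any response. All names are hypothetical, and it covers only the basic data types, not every choice a real generator must make.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    text: str    # emitted verbatim: *, FETCH, \Seen, ...

@dataclass
class Literal:
    data: bytes  # emitted as {n} CRLF followed by n octets

def emit(node) -> bytes:
    """Serialize one node: None -> NIL, int -> number, str -> quoted string
    (or literal if it cannot be quoted), list -> parenthesized list."""
    if node is None:
        return b"NIL"
    if isinstance(node, Atom):
        return node.text.encode("ascii")
    if isinstance(node, Literal):
        return b"{%d}\r\n" % len(node.data) + node.data
    if isinstance(node, int):
        return str(node).encode("ascii")
    if isinstance(node, str):
        raw = node.encode("utf-8")
        # A quoted string cannot hold NUL, CR, LF, or 8-bit octets; use a literal.
        if any(b in (0x00, 0x0A, 0x0D) or b > 0x7F for b in raw):
            return emit(Literal(raw))
        return b'"' + raw.replace(b"\\", b"\\\\").replace(b'"', b'\\"') + b'"'
    if isinstance(node, list):
        return b"(" + b" ".join(emit(item) for item in node) + b")"
    raise TypeError(f"cannot emit {node!r}")

def emit_response(items: list) -> bytes:
    # A response is space-separated items terminated by CRLF.
    return b" ".join(emit(item) for item in items) + b"\r\n"
```

For example, emit_response([Atom("*"), 1, Atom("FETCH"), [Atom("FLAGS"), [Atom("\\Seen")], Atom("UID"), 42]]) produces the line * 1 FETCH (FLAGS (\Seen) UID 42) followed by CRLF. Spacing, quoting, and literal counts are decided in one place instead of in every response routine.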

Fifth, keep in mind that pretty much all of RFC 3501 is mandatory to
implement. You CAN save yourself some time by not implementing non-INBOX
mailboxes (thus the LIST command only poots a \NoInferiors mailbox called
INBOX, and CREATE/DELETE/RENAME all respond with NO). But you'll probably
need the non-INBOX mailboxes sooner rather than later.
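A hedged Python sketch of that INBOX-only shortcut, with hypothetical function names. It reports "/" as the delimiter, following the Ninth point's preference for slash, and it only handles an empty reference; the special empty-pattern request (LIST "" "") is not handled.

```python
import re
from typing import List, Optional

def list_matches(pattern: str, name: str = "INBOX") -> bool:
    # "*" matches anything, "%" anything except the "/" delimiter; with a flat,
    # INBOX-only store the difference never shows. INBOX is case-insensitive.
    regex = "".join(
        ".*" if c == "*" else "[^/]*" if c == "%" else re.escape(c)
        for c in pattern)
    return re.fullmatch(regex, name, flags=re.IGNORECASE) is not None

def inbox_only(tag: str, command: str, args: List[str]) -> Optional[List[str]]:
    command = command.upper()
    if command == "LIST":
        reference, pattern = args
        lines = []
        if reference == "" and list_matches(pattern):
            lines.append('* LIST (\\NoInferiors) "/" INBOX')
        lines.append(f"{tag} OK LIST completed")
        return lines
    if command in ("CREATE", "DELETE", "RENAME"):
        return [f"{tag} NO only INBOX is supported"]
    return None  # other commands are outside this sketch
```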

When you do implement other mailboxes, don't try to be magic about
subscriptions. Just treat it like server-based bookmarks.
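Treated as bookmarks, a subscription list is just a per-user set of names, as in this Python sketch (the class is hypothetical). Deleting or renaming a mailbox does not touch it; RFC 3501 says the server must not unilaterally remove a name from the subscription list just because the mailbox no longer exists.

```python
from typing import List, Set

class Subscriptions:
    """Per-user subscription list treated as plain bookmarks (names only)."""

    def __init__(self) -> None:
        self._names: Set[str] = set()

    def subscribe(self, name: str) -> None:
        # No link to the mailbox itself; it is just a remembered name.
        self._names.add(name)

    def unsubscribe(self, name: str) -> bool:
        # False lets the caller answer NO for a name that was never subscribed.
        if name not in self._names:
            return False
        self._names.discard(name)
        return True

    def names(self) -> List[str]:
        return sorted(self._names)
```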

Modified UTF-7 is a royal pain, and I had hoped that it would be
extinct by now. But you can defer doing anything about it until you
implement non-INBOX mailboxes.
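When that day comes, the conversion itself is short. This Python sketch follows RFC 3501 section 5.1.3: printable US-ASCII stands for itself, "&" becomes "&-", and everything else goes out as UTF-16BE in BASE64 with "," instead of "/" and no padding, between "&" and "-". The function names are hypothetical, and the decoder assumes well-formed input.

```python
import base64

def encode_mutf7(name: str) -> str:
    """Encode a mailbox name in modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # run of characters that must go out in modified BASE64

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable US-ASCII stands for itself
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

def decode_mutf7(name: str) -> str:
    """Inverse of encode_mutf7; assumes well-formed input (no validation)."""
    out = []
    i = 0
    while i < len(name):
        if name[i] != "&":
            out.append(name[i])
            i += 1
            continue
        end = name.index("-", i)
        if end == i + 1:
            out.append("&")  # "&-" is a literal ampersand
        else:
            b64 = name[i + 1:end].replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the padding that was dropped
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)
```

The RFC's own example, ~peter/mail/台北/日本語, encodes to ~peter/mail/&U,BTFw-/&ZeVnLIqe-.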

Sixth, you probably will want a vector of messages in the selected
mailbox, indexed by message sequence number. In this vector you will
want, at a minimum, the internal date, size, flags, and UID. You'll
probably want some pointers to message contents, and perhaps preparsed
ENVELOPE and BODYSTRUCTURE. Do UID lookup via binary search through this
vector (not linear search).
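A Python sketch of that vector and the binary search, with hypothetical field names. It relies on UIDs being assigned in strictly ascending order, so message sequence number order is also UID order; an expunge removes an entry and thereby renumbers everything after it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Message:
    uid: int
    internal_date: str  # hypothetical representation
    size: int           # RFC822.SIZE, which must be exact
    flags: Set[str] = field(default_factory=set)

def msn_for_uid(messages: List[Message], uid: int) -> Optional[int]:
    """Return the 1-based message sequence number for uid, or None."""
    lo, hi = 0, len(messages)
    while lo < hi:  # find the first index whose UID is >= uid
        mid = (lo + hi) // 2
        if messages[mid].uid < uid:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(messages) and messages[lo].uid == uid:
        return lo + 1
    return None

# Usage: a mailbox whose UIDs have gaps from earlier expunges.
mailbox = [Message(uid=u, internal_date="05-Jun-2011 16:58:40 -0700", size=100)
           for u in (3, 7, 8, 15)]
```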

Seventh, this reminds me about something else. Those BODYSTRUCTURE things
have lots of size counts. These, and the RFC822.SIZE, must all be EXACT
(do you hear me, Novell? Microsoft?). Clients that do partial fetching
depend upon those counts being exact. No, it is not OK to guess. These
are NOT advisory.
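One common way to get these counts wrong is to report sizes of the stored form rather than of the octets actually sent. This Python sketch assumes a hypothetical store that keeps bare-LF line endings, while IMAP transmits messages with CRLF; the counts must be taken from the wire form, and partial fetches sliced from that same form.

```python
def to_wire(stored: bytes) -> bytes:
    # Normalize any existing CRLF first so the conversion is idempotent.
    return stored.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def rfc822_size(stored: bytes) -> int:
    # Count the octets the client will actually receive, not the bytes on disk.
    return len(to_wire(stored))

def partial_body(stored: bytes, start: int, count: int) -> bytes:
    # BODY[]<start.count> is sliced from the same wire form the size counts.
    return to_wire(stored)[start:start + count]

# Usage: 18 bytes on disk, but 21 octets on the wire.
stored = b"Subject: hi\n\nbody\n"
```

A client that trusts a size of 18 here and fetches in chunks would silently lose the last three octets.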

Eighth, implementing flags as booleans is common, and that's quite all
right. However, you'll need some way to record keyword names. Unless
you are providing access to a non-owned mailbox, all flags should be
permanent.
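A Python sketch of booleans plus keyword names: system flags as bits, and a hypothetical per-mailbox table that gives each keyword name its own bit number, so each message stores only two integers. \Recent is left out because it is per-session state that the client cannot alter.

```python
from enum import IntFlag
from typing import List

class SystemFlag(IntFlag):
    SEEN = 1
    ANSWERED = 2
    FLAGGED = 4
    DELETED = 8
    DRAFT = 16

class KeywordTable:
    """Hypothetical per-mailbox table giving each keyword name a bit number."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def bit_for(self, name: str) -> int:
        if name not in self.names:
            self.names.append(name)  # new keyword: assign the next bit
        return self.names.index(name)

def flag_list(system: SystemFlag, keywords: int, table: KeywordTable) -> str:
    """Render a FLAGS list such as (\\Seen $Forwarded)."""
    names = ["\\" + f.name.capitalize() for f in SystemFlag if system & f]
    names += [n for i, n in enumerate(table.names) if keywords & (1 << i)]
    return "(" + " ".join(names) + ")"
```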

Ninth, slash should be the One And Only True Hierarchy Delimiter. Almost
everybody agrees that it was a mistake to allow others.

Tenth, OK, I lied. I will repeat something that someone else said. Mind
that COPY requires atomicity. So does the MULTIAPPEND extension, if you
ever implement that. If you turn out to be talented enough to implement
MULTIAPPEND, you can claim that you truly grok IMAP.
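Atomicity here means all or nothing: if any message fails to copy, the destination must end up as it was. This in-memory Python sketch stages every copy first and commits with a single step; the names and the capacity-based failure are hypothetical stand-ins. On real storage the same idea would take the form of writing everything first and then committing once (for example a transaction or a final rename), which is not shown.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass
class Message:
    uid: int
    data: bytes

def copy_atomic(source: List[Message], msns: List[int], target: List[Message],
                next_uid: int, capacity: int) -> Tuple[str, int]:
    """All-or-nothing COPY over in-memory lists.

    Copies are built up in `staged`; `target` is touched only by the single
    extend() at the end, so a failure part-way through leaves it as it was.
    """
    staged: List[Message] = []
    uid = next_uid
    for msn in msns:
        # Hypothetical failure condition standing in for disk full, quota, etc.
        if len(target) + len(staged) >= capacity:
            return "NO", next_uid
        staged.append(replace(source[msn - 1], uid=uid))
        uid += 1
    target.extend(staged)  # commit point
    return "OK", uid
```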

Eleventh, a server session only needs one mailbox open at a time. If you
have a multithreaded server that supports multiple sessions, consider a
design that allows you to spawn off additional server processes when the
current process fills up. A 32-bit address space simply is not enough for
a modern multi-session server.
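The placement policy can be as simple as this Python sketch: fill the current process up to a per-process session cap, then start another. The capacity parameter is a hypothetical tuning knob, and appending to the list merely stands in for spawning a new server process with its own address space; the OS-specific spawning is not shown.

```python
from typing import List

def place_session(loads: List[int], capacity: int) -> int:
    """Pick a server process for a new session, "spawning" one when all are full.

    loads[i] is the number of sessions currently in process i.
    """
    for i, load in enumerate(loads):
        if load < capacity:
            loads[i] += 1
            return i
    loads.append(1)  # stands in for starting a fresh server process
    return len(loads) - 1
```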

Finally, pat yourself on the back. You decided to ask for advice before
you started implementing, instead of diving in, making all the newbie
mistakes, and only then asking questions. You're already ahead of the game.

Regards,

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.


