[Imap-protocol] Character encoding question

Jeff Mckay jeff.mckay at comaxis.com
Wed Nov 2 10:53:21 PDT 2011


Thanks for your comments. I'm still a bit confused. Let me clarify what I
am seeing in these two examples. In the first, one of the characters in
question is "lower case o with acute", which is supposed to be xF3 in
ISO-8859-2 and xC3 xB3 in UTF-8. The imap server represents this as
ampersand followed by APM followed by a dash (I am writing out the
description so it does not get interpreted incorrectly somewhere). If I
take the APM and run it through a base64 decoder, I get xF3. So far so
good.
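A quick check of that decode step suggests the result is actually two bytes, x00 xF3, not one: the code unit for U+00F3 in big-endian UTF-16. A decoder or hex viewer that hides the leading zero byte would make it look like the single ISO-8859 byte xF3. This minimal Python sketch assumes the padding rule from RFC 3501, section 5.1.3 (modified base64 drops the trailing "=", so it has to be restored before a standard decoder will accept the text).

```python
import base64

# "APM" is the text between "&" and "-" in "h&APM-val".
chunk = "APM"
# Modified base64 omits "=" padding; restore it for the standard decoder.
raw = base64.b64decode(chunk + "=" * (-len(chunk) % 4))
print(raw.hex(" "))              # 00 f3
print(raw.decode("utf-16-be"))   # ó
```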

In the second example, we have the letters Temp/New followed by a couple
of Chinese characters that I don't know the names of. The two Chinese
characters are represented in imap by ampersand followed by bUuL1Q and
the closing dash. When I base64 decode this I end up with x6D x4B x8B
xD5. This appears to be big-endian UTF-16. I have to byte-reverse each
2-byte sequence, but then I can convert it to UTF-8 (my target) and see
the Chinese characters. I could also take the original data and stick a
+ in front of it (ending up with +bUuL1Q) and convert this from UTF-7 to
UTF-8 and end up with valid characters. This last part I really don't
understand: if it is base64 encoded, how is that valid UTF-7? Anyway, I
don't seem to have an algorithm that will work on both of these
examples, and no way to detect which one I should use. Obviously I am
totally confused about what I am doing, but any further insight would
be appreciated.
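A single decoder that follows the modified UTF-7 rules in RFC 3501, section 5.1.3 appears to handle both names, which suggests there is only one encoding in play: text between "&" and "-" is base64 with "," in place of "/" and no "=" padding, and the decoded bytes are UTF-16 in big-endian order. The sketch below is Python; the function name decode_modified_utf7 is hypothetical, not part of any IMAP library. Decoding the bytes explicitly as UTF-16BE should make the manual byte swap unnecessary (presumably the swap was needed only because the converter in use assumed little-endian input, which is an assumption). The "+bUuL1Q" trick works because ordinary UTF-7 (RFC 2152) also encodes non-ASCII runs as base64 of UTF-16BE, only delimited by "+" instead of "&" and using "/" instead of ",".

```python
import base64

def decode_modified_utf7(name: str) -> str:
    """Decode an IMAP mailbox name from modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    i = 0
    while i < len(name):
        if name[i] != "&":
            out.append(name[i])          # printable ASCII stands for itself
            i += 1
            continue
        end = name.index("-", i + 1)     # raises ValueError if "-" is missing
        chunk = name[i + 1:end]
        if not chunk:
            out.append("&")              # "&-" encodes a literal "&"
        else:
            b64 = chunk.replace(",", "/")        # modified base64 uses ","
            b64 += "=" * (-len(b64) % 4)         # restore omitted padding
            raw = base64.b64decode(b64)          # UTF-16BE bytes
            out.append(raw.decode("utf-16-be"))
        i = end + 1
    return "".join(out)

# Both mailbox names from this thread go through the same function.
print(decode_modified_utf7("test/A hegyek h&APM-val bor&AO0-tott"))
print(decode_modified_utf7("Temp/New&bUuL1Q-"))

# Ordinary UTF-7 uses "+" where modified UTF-7 uses "&", so the same
# base64 run also decodes with Python's built-in "utf-7" codec.
print(b"+bUuL1Q-".decode("utf-7"))
```

The first call prints the Hungarian name with ó and í restored; the second and third both yield the two characters U+6D4B U+8BD5.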

Michael M Slusarz wrote:

> Quoting Jeff Mckay <jeff.mckay at comaxis.com>:

>

>> I am dealing with a Sun Java imap server that seems a little screwy

>> in regards to

>> encoding certain non-English character strings - hopefully this is my

>> problem but

>> I'm not sure what is going on. Here are a couple examples of folder

>> names

>> from this server:

>>

>> visible in client: test/A hegyek hóval borított

>> encoded by imap: "test/A hegyek h&APM-val bor&AO0-tott\"

>>

>> visible in client: Temp/New??

>> encoded by imap: "Temp/New&bUuL1Q-"

>

> Both of those mailbox names look fine.

>

>> In the first case, it is necessary to take the &AMP- part and base64

>> decode it,

>> then treat the result as modified UTF7.

>

> I am assuming that you are intending to convert the IMAP server stored

> mailbox name to a displayable representation on the client side. If

> so, your description is incorrect. The mailbox name on the server

> **is** modified UTF-7. Once you base64 decode (and remove the & and -

> delimiters), the resulting mailbox string is now in the charset of the

> MUA (e.g. UTF-8).

>

>> In the second case, the base64 decode

>> step is unnecessary, it is already in UTF7 format.

>

> Mailbox names on the IMAP server are ALWAYS modified UTF-7. So not

> sure what you mean by "unnecessary".

>

>> So my question is, when do I do a base64 decode and when not?

>

> Generally, IMHO, it will be easiest to work with mailbox names in the

> native charset on the MUA side. So you only need to convert to/from

> modified UTF-7 when either sending or parsing an IMAP command.

>

> michael
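Converting only at the protocol boundary, as suggested above, also needs the outgoing direction, for example before sending a CREATE or SELECT. A minimal Python sketch, assuming the RFC 3501, section 5.1.3 rules: printable US-ASCII (0x20 to 0x7E) represents itself except "&", which becomes "&-"; any other run of characters is encoded as UTF-16BE, base64 with "," for "/", padding stripped, and wrapped in "&" ... "-". The function name encode_modified_utf7 is hypothetical.

```python
import base64

def encode_modified_utf7(name: str) -> str:
    """Encode a mailbox name to modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    pending = []  # current run of characters that need base64 encoding

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()                              # close any open base64 run
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

print(encode_modified_utf7("test/A hegyek hóval borított"))
```

For the Hungarian name this reproduces the server's form, test/A hegyek h&APM-val bor&AO0-tott. Characters outside the BMP come out as UTF-16 surrogate pairs, which is what the RFC expects.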
