[Imap-protocol] Character encoding question
jeff.mckay at comaxis.com
Wed Nov 2 10:53:21 PDT 2011
Thanks for your comments. I'm still a bit confused. Let me clarify what I
am seeing in these two examples. In the first, one of the characters in
question is "lower case o with acute", which is supposed to be xF3 in
ISO-8859-2 and xC3 xB3 in UTF-8. The server represents this as an
ampersand followed by APM followed by a dash (I am writing out the
description so it does not get interpreted incorrectly somewhere). If I
take the APM and run it through a base64 decoder, I get xF3. So far so good.
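A minimal Python sketch of this first step, assuming Python 3.8+ (the helper name `mbase64_decode` is hypothetical). Decoding the three characters APM with the modified BASE64 alphabet of RFC 3501 yields two bytes, x00 xF3, which is UTF-16 big-endian for U+00F3; the leading x00 is easy to miss, and it makes the result look like a bare ISO-8859 xF3.

```python
import base64

def mbase64_decode(payload: str) -> bytes:
    """Decode the text between '&' and '-' (RFC 3501 modified BASE64).

    Modified BASE64 uses ',' where standard base64 uses '/', and omits the
    '=' padding, so both are restored before calling the stdlib decoder.
    """
    std = payload.replace(",", "/")
    std += "=" * (-len(std) % 4)
    return base64.b64decode(std)

raw = mbase64_decode("APM")
print(raw.hex(" "))                      # 00 f3  (two bytes, not one)
text = raw.decode("utf-16-be")           # UTF-16BE code unit 0x00F3
print(f"U+{ord(text):04X}")              # U+00F3, lower case o with acute
print(text.encode("utf-8").hex(" "))     # c3 b3
```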
In the second example, we have the letters Temp/New followed by a couple
of characters that I don't know the names of. The two Chinese characters
are represented in IMAP by an ampersand followed by bUuL1Q and the closing
dash. When I base64 decode this I end up with x6D x4B x8B xD5. This
appears to be big-endian UTF-16. I have to byte-reverse each 2-byte
sequence, but then I can convert it to UTF-8 (my target) and see the
Chinese characters. I could also take the original data and stick a + in
front of it (ending up with +bUuL1Q), convert this from UTF-7 to UTF-8,
and end up with valid characters. This last part I really don't
understand: if it is base64 encoded, how is it UTF-7? Anyway, I don't seem
to have an algorithm that will work on both of these examples, and no way
to detect which one I should use. Obviously I am totally confused about
what I am doing, but any further insight would be appreciated.
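A sketch, again assuming Python 3.8+ and a hypothetical helper `mbase64_decode`, showing that one pipeline covers both examples: modified BASE64 decode, then interpret the bytes as UTF-16BE. Asking the codec for `utf-16-be` handles the byte order, so no manual byte reversal is needed. The last line uses Python's standard `utf-7` codec (RFC 2152), which also base64-encodes UTF-16 but opens a run with `+` rather than `&`; that is why prefixing `+` happened to work here. It is not a general substitute, since modified UTF-7 uses `,` where UTF-7 uses `/`, and a literal `+` in a mailbox name would start a shift sequence for a UTF-7 decoder.

```python
import base64

def mbase64_decode(payload: str) -> bytes:
    # Modified BASE64: ',' replaces '/', and '=' padding is omitted.
    std = payload.replace(",", "/")
    return base64.b64decode(std + "=" * (-len(std) % 4))

def code_points(text: str) -> str:
    # Show characters as U+XXXX so the output is plain ASCII.
    return " ".join(f"U+{ord(c):04X}" for c in text)

# The same two steps handle both examples from the thread.
for payload in ("APM", "AO0", "bUuL1Q"):
    raw = mbase64_decode(payload)
    # The bytes are always UTF-16 big-endian; the "utf-16-be" codec deals
    # with byte order, so there is no need to swap bytes by hand.
    print(payload, raw.hex(" "), code_points(raw.decode("utf-16-be")))

# Standard UTF-7 (RFC 2152) also base64-encodes UTF-16 but opens a run
# with '+' instead of '&', which is why prefixing '+' gave the same result.
print(code_points(b"+bUuL1Q-".decode("utf-7")))   # U+6D4B U+8BD5
```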
Michael M Slusarz wrote:
> Quoting Jeff Mckay <jeff.mckay at comaxis.com>:
>> I am dealing with a Sun Java IMAP server that seems a little screwy
>> in regard to encoding certain non-English character strings; hopefully
>> this is my problem, but I'm not sure what is going on. Here are a
>> couple of examples of folder names from this server:
>> visible in client: test/A hegyek hóval borított
>> encoded by imap: "test/A hegyek h&APM-val bor&AO0-tott"
>> visible in client: Temp/New??
>> encoded by imap: "Temp/New&bUuL1Q-"
> Both of those mailbox names look fine.
>> In the first case, it is necessary to take the &...- part and base64
>> decode it, then treat the result as modified UTF-7.
> I am assuming that you intend to convert the mailbox name stored on
> the IMAP server to a displayable representation on the client side. If
> so, your description is incorrect. The mailbox name on the server
> **is** modified UTF-7. Once you base64 decode the &...- runs (after
> removing the & and - delimiters), you get UTF-16BE, which you then
> convert to the charset of the MUA (e.g. UTF-8).
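A sketch of the whole decoding step in Python, following the mailbox-name rules in RFC 3501 section 5.1.3: printable ASCII other than `&` stands for itself, `&-` is a literal ampersand, and `&...-` is modified BASE64 of UTF-16BE. `decode_mailbox_name` is a hypothetical name, and the sketch assumes well-formed input; it does not reject every malformed form the RFC forbids.

```python
import base64

def decode_mailbox_name(name: str) -> str:
    """Decode an IMAP mailbox name from modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            out.append(ch)              # printable ASCII stands for itself
            i += 1
            continue
        # '-' is not in the modified BASE64 alphabet, so it ends the run.
        end = name.find("-", i + 1)
        if end == -1:
            raise ValueError("unterminated '&' shift sequence")
        payload = name[i + 1:end]
        if payload == "":
            out.append("&")             # "&-" is a literal ampersand
        else:
            b64 = payload.replace(",", "/")   # modified alphabet -> standard
            b64 += "=" * (-len(b64) % 4)      # restore omitted padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)
```

Applied to the two names from the original message, it returns `test/A hegyek hóval borított` and `Temp/New` followed by U+6D4B U+8BD5; calling `.encode("utf-8")` on the result gives the MUA-side UTF-8 bytes.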
>> In the second case, the base64 decode step is unnecessary; it is
>> already in UTF-7 format.
> Mailbox names on the IMAP server are ALWAYS modified UTF-7. So I'm not
> sure what you mean by "unnecessary".
>> So my question is, when do I do a base64 decode and when not?
> Generally, IMHO, it will be easiest to work with mailbox names in the
> native charset on the MUA side. So you only need to convert to/from
> modified UTF-7 when either sending or parsing an IMAP command.
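For the other direction, when building an IMAP command from a native-charset name, a matching encoder sketch in Python (`encode_mailbox_name` is hypothetical, and well-formed Unicode input is assumed): characters outside 0x20..0x7E are collected into runs, each run is encoded as UTF-16BE, base64-encoded with `,` in place of `/` and the `=` padding stripped, and wrapped in `&...-`; a literal `&` becomes `&-`.

```python
import base64

def encode_mailbox_name(name: str) -> str:
    """Encode a Unicode mailbox name as modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    run = []                            # pending characters needing BASE64

    def flush():
        # Emit the pending run as &<modified BASE64 of UTF-16BE>-
        if run:
            raw = "".join(run).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            run.clear()

    for ch in name:
        if "\x20" <= ch <= "\x7e":      # printable ASCII
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            run.append(ch)
    flush()
    return "".join(out)
```

It reproduces the server's strings from the thread, e.g. `h&APM-val` for `hóval` and `&bUuL1Q-` for the two Chinese characters.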
> Imap-protocol mailing list
> Imap-protocol at u.washington.edu