Hey guys, S.O.S!
I'm developing a program that connects to my company's support email (Gmail) using Winsock
(System.Net.Sockets). I am new to this subject. Anyway, I managed to connect to the
mailbox and get through the SSL handshake. I can see how many messages are in the inbox and how many are unread.
Now, the problem is the encoding of the text. I tried UTF-7, UTF-8, UTF-32 and so on, and still nothing. My main problem is that the emails are written in Hebrew and the decoding returns all kinds of weird letters (gibberish). Here is the function that handles the reading, including the StreamReader. Please help, thanks!

VB
Dim m_buffer() As Byte
Dim m_sslStream As SslStream

    Sub GetEmails(ByVal Server_Command As String)
        'Dim m_buffer() As Byte = System.Text.Encoding.ASCII.GetBytes(Server_Command.ToCharArray())
        Dim m_buffer() As Byte = System.Text.Encoding.GetEncoding("iso-8859-8").GetBytes(Server_Command.ToCharArray())
        Dim stream_Reader As StreamReader
        Dim TxtLine As String = ""
        Try
            m_sslStream.Write(m_buffer, 0, m_buffer.Length)
            stream_Reader = New StreamReader(m_sslStream)
            Do While stream_Reader.Peek() <> -1
                TxtLine += "***********" & vbNewLine & stream_Reader.ReadLine() & vbNewLine
            Loop
            TextBox1.Text = TxtLine
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub


[After long discussions below, I think comprehensive answers are finally ready by now, Solutions 1-2 — SA]
Updated 12-Aug-14 9:04am
Comments
Sergey Alexandrovich Kryukov 10-Aug-14 14:10pm    
Who told you that the input is in such an outdated encoding as ISO-8859-8? It's not very likely. You need to know the input encoding exactly, or just try all that could apply. As to the UTFs, encodings such as UTF-16 come in two variants, UTF-16LE and UTF-16BE (different endianness)...
Besides, Hebrew text in some encoding could be further encoded to be represented as ASCII (does the input visually look like ASCII?)... And so on...
—SA
oronsultan 10-Aug-14 14:15pm    
The "ISO-8859-8" is only one of my attempts. I tried "Windows-1255" too. Can you tell me how to encode it correctly?
Sergey Alexandrovich Kryukov 10-Aug-14 14:31pm    
Why "to encode"? We don't know how to decode; that's the problem...
—SA
oronsultan 10-Aug-14 16:01pm    
So I understand that you are the expert of the month and a legend on CodeProject, but so far you haven't answered my question (although you don't have to). All I am asking is how I can read my Hebrew emails in a clear way. Please answer this and not around it. If you can attach some code, that would be great.
Sergey Alexandrovich Kryukov 10-Aug-14 16:52pm    
I cannot. The only problem is that we don't know how the data was encoded in the first place. With e-mail, this kind of thing used to happen: different encoding layers, one on top of another, because some agent in the middle assumes some encoding and recodes the text based on that assumption. It creates a big mess. Even if you sent me some sample, it would be difficult for me because I don't read Hebrew.

However... could you read and post a sample of the mail headers? In the case of multi-part mail, the headers of the parts containing text...

—SA

Solution 1

This is pretty simple. First, split the message into parts. This header gives you a unique delimiter string:
Content-Type: multipart/alternative; boundary=001a11c361f6384c8e050046d84b

Each part is introduced by a delimiter line: two hyphens ("--") followed by the value of "boundary". After the last part, the same line appears with two more hyphens appended.
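The splitting step can be sketched roughly as follows. This is only an illustration, not what OpenPOP.NET or any other library does: `MultipartSplitter` and `SplitParts` are hypothetical names, the boundary value is assumed to have been extracted from the Content-Type header already, and nested multipart messages are not handled. It relies on the RFC 2046 rule that a delimiter line is "--" followed by the boundary value, with an extra "--" appended on the closing delimiter.

```csharp
using System;
using System.Collections.Generic;

static class MultipartSplitter
{
    // Hypothetical helper: splits a raw multipart body into its parts.
    // Each returned part still contains its own headers, a blank line, and its body.
    public static List<string> SplitParts(string body, string boundary)
    {
        string delimiter = "--" + boundary;      // opens every part
        string closing = delimiter + "--";       // follows the last part
        var parts = new List<string>();
        List<string> current = null;             // null until the first delimiter is seen

        // Normalize line endings so that both CRLF and bare LF input work.
        foreach (string line in body.Replace("\r\n", "\n").Split('\n'))
        {
            string trimmed = line.TrimEnd();
            if (trimmed == delimiter || trimmed == closing)
            {
                if (current != null)
                    parts.Add(string.Join("\r\n", current));
                if (trimmed == closing)
                {
                    current = null;
                    break;                       // everything after this is epilogue
                }
                current = new List<string>();
            }
            else if (current != null)
            {
                current.Add(line);               // lines before the first delimiter are skipped
            }
        }

        // Tolerate a truncated message that lacks the closing delimiter.
        if (current != null)
            parts.Add(string.Join("\r\n", current));
        return parts;
    }
}
```

For the sample message, the call would be `MultipartSplitter.SplitParts(rawBody, "001a11c361f6384c8e050046d84b")`, where `rawBody` stands for whatever text your reading code has collected.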

Now, some header values carry UTF-8 text in an encoded, ASCII-only form, for example:
Subject: =?UTF-8?B?15HXk9eZ16fXlA==?=

Once decoded, the three headers a mail client usually shows read:
From: אורון סולטן
To: oron sultan
Subject: בדיקה

Does it make sense to you?

Now, the two parts at the end don't show any readable text, but here is how to read them. Look at these headers:
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: base64

It means that the content of this part is HTML, written in UTF-8 and then base64-encoded to form ASCII text: not human-readable, but tolerated by legacy mail systems, for extra reliability. Indeed, this is what is usually done instead of sending UTF-8 directly. To get the original text, just base64-decode it and interpret the resulting bytes as UTF-8. This is how: http://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx.
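As a minimal sketch of that decoding step, assuming the part's body has already been separated from its headers: `BodyDecoder` and `DecodeBase64Body` are hypothetical names, while `Convert.FromBase64String` and `Encoding.GetEncoding` are the standard .NET calls. The charset name is passed in from the part's Content-Type header rather than guessed.

```csharp
using System;
using System.Text;

static class BodyDecoder
{
    // Hypothetical helper: turns a base64 transfer-encoded body into text,
    // using the charset named in the part's Content-Type header.
    public static string DecodeBase64Body(string base64Body, string charset)
    {
        // Convert.FromBase64String ignores white space, so the line breaks
        // that mail systems insert into long base64 bodies do no harm.
        byte[] raw = Convert.FromBase64String(base64Body);

        // Interpret the bytes with the declared charset, e.g. "UTF-8".
        return Encoding.GetEncoding(charset).GetString(raw);
    }
}
```

With the headers shown above, the call would be `BodyDecoder.DecodeBase64Body(partBody, "UTF-8")`, where `partBody` is a placeholder for the base64 text of that part. One caveat: on newer .NET runtimes, legacy code pages such as windows-1255 may need an extra encoding provider to be registered before `Encoding.GetEncoding` accepts them; UTF-8 always works.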

Before resorting to trial and error, just inspect the message and check for any terms you don't know. Here is what happened: the misleading information in your question was the story of your attempts with different encodings, which suggested that you had no information on the encoding and could not see what was in the messages. In fact, you have all the headers, which makes the problem quite trivial.

If you want a comprehensive interpretation of each and every detail of a message, you can use the open-source library OpenPOP.NET:
http://sourceforge.net/projects/hpop.

This library is not very well written, but it thoroughly follows many standards such as MIME, whose details are difficult to piece together.

That's all.

—SA
 
Solution 2

Oron Sultan asked:
…however, in a weird way, the "subject" part is a pickle. All I get in the subject, instead of Hebrew, is this:
"=?UTF-8?B?15HXk9eZ16fXlA==?="…
Yes, this is the missing part. The remaining problem is header values like "=?UTF-8?B?15HXk9eZ16fXlA==?=" (thanks for reminding me). Let me explain them.

This is the encoding defined by RFC 2047, used to represent any non-ASCII data as ASCII in the value of a mail header: http://tools.ietf.org/html/rfc2047.

In this format, the whole string is sandwiched between "=?" and "?=", and '?' delimits the following fields:
1) charset (UTF-8), 2) encoding (B, which stands for base64; the other option is Q, which resembles quoted-printable), 3) encoded text ("15HXk9eZ16fXlA=="). In your case, it means the same encoding technique as I described for a part in your multi-part sample: UTF-8 text is base64-encoded. For other details, please see RFC 2047.
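For illustration, here is a rough sketch of decoding such a value by hand. The names `EncodedWord`, `Decode` and `DecodeQ` are hypothetical, and the sketch deliberately leaves out finer points of RFC 2047, such as dropping the white space between adjacent encoded words and language suffixes in the charset field. Malformed input makes it throw instead of falling back to the raw text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class EncodedWord
{
    // One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
    static readonly Regex Pattern =
        new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    // Hypothetical helper: replaces every encoded word in a header value
    // with its decoded text and leaves everything else untouched.
    public static string Decode(string headerValue)
    {
        return Pattern.Replace(headerValue, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] raw = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)   // "B": base64
                : DecodeQ(text);                   // "Q": quoted-printable-like
            return charset.GetString(raw);
        });
    }

    // "Q" encoding: '_' stands for a space and "=XX" for a byte given in hex.
    static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '_')
            {
                bytes.Add((byte)' ');
            }
            else if (text[i] == '=' && i + 2 < text.Length)
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;                            // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)text[i]);          // plain ASCII character
            }
        }
        return bytes.ToArray();
    }
}
```

For the subject in question, `EncodedWord.Decode("=?UTF-8?B?15HXk9eZ16fXlA==?=")` should give back the Hebrew subject line.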

Decoding of this form is available in OpenPOP.NET, referenced above. But you can avoid third-party code, and also avoid doing it all yourself (using Encoding and base64 decoding after splitting the header value by '=' and '?', which is not so hard either), and do it in a very simple way. Here is my "secret weapon": in the .NET FCL, this string can be decoded as an "attachment name". For example:
C#
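// Attachment.Name returns the decoded value; the attachment is disposed afterwards.
string headerValue;
using (var attachment = System.Net.Mail.Attachment.CreateAttachmentFromString(
    string.Empty,
    "=?UTF-8?B?15HXk9eZ16fXlA==?="))
    headerValue = attachment.Name;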


Please see: http://msdn.microsoft.com/en-us/library/system.net.mail.attachment.createattachmentfromstring%28v=vs.110%29.aspx.

—SA
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


