Hi.
I want to get the Unicode code points of Arabic or Persian characters in their contextual (presentation) forms in C#.
For example, for "فارسي" I want this result (the same codes that Windows Character Map shows):
ufed3 ufe8e ufead ufeb3 ufef2 (I want this)
That is: Arabic letter Feh initial form, Arabic letter Alef final form, Arabic letter Reh isolated form, Arabic letter Seen initial form, Arabic letter Yeh final form.
I want to get these Unicode codes from the text.
Instead, C# gives me these codes for "فارسي":
u0641 u0627 u0631 u0633 u064a (I don't want this)
with this code:
C#
foreach (char c in text)
{
    textBox2.Text += string.Format("u{0:x4}", (int)c);
}

How can I detect whether a character is in its initial, medial, or final form?
Please help me.

If you look these codes up in Windows Character Map, you will understand.
Updated 5-Jul-12 12:59pm
Comments
Sergey Alexandrovich Kryukov 5-Jul-12 19:13pm    
Not a big problem, but what is that "ufed, ugeb" notation? You need to define it.
--SA

1 solution

The notion of initial/medial/final/isolated forms is specific to Perso-Arabic script (and maybe to some historically related scripts; I'm not sure), not universal, so I don't know of any ready-to-use method that does what you want; I doubt there is one. (Please see: http://en.wikipedia.org/wiki/Perso-Arabic_script.) You can easily do it yourself.

Something like this:
C#
enum ArabicCharacterClass {
    Digit,
    Letter,
    Punctuation,
    Symbol, //?
    //something else?
}

enum ArabicContextualForm {
   None, //?
   End,
   Middle,
   Beginning,
   Isolated
}

struct ArabicCharacterDescriptor {
   // ": this()" initializes the auto-properties first; older C# compilers require it in a struct constructor
   public ArabicCharacterDescriptor(char codePoint) : this() {
       CodePoint = codePoint;
       //can throw exception if the code point is not Arabic (Perso-Arabic)
       //calculate the other members using the dictionaries (see below) if required
   }
   public char CodePoint { get; private set; }
   public ArabicCharacterClass CharacterClass { get; private set; }
   public ArabicContextualForm ContextualForm { get; private set; }
   public string Name { get; private set; }
   public string Din31635 { get; private set; }
   public string IPA { get; private set; }
   //something else?
   public override string ToString() {
       // whatever you want as a default ASCII string representation; could be your notation,
       // e.g. "ufed3" for U+FED3; it can also be calculated from other members
       return string.Format("u{0:x4}", (int)CodePoint);
   }
}


Find out the Perso-Arabic subset of the Unicode code points (there are many; be patient):
http://unicode.org/,
http://www.unicode.org/charts/PDF/U0750.pdf,
http://www.unicode.org/charts/PDF/U08A0.pdf,
http://www.unicode.org/charts/PDF/UFB50.pdf,
http://www.unicode.org/charts/PDF/UFE70.pdf.
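
As a rough illustration of what "the Perso-Arabic subset" could mean in code, here is a minimal sketch of a block-range check. The class name ArabicBlocks and the method IsArabic are hypothetical names chosen for this example. The ranges are the Unicode blocks covered by the charts listed above, plus the basic Arabic block U+0600..U+06FF, which is an addition of this sketch because the base letters such as U+0641 live there.

```csharp
static class ArabicBlocks
{
    // Block boundaries (inclusive). The first range is an assumption added for this sketch;
    // the other four correspond to the charts U0750, U08A0, UFB50 and UFE70.
    static readonly int[][] Ranges =
    {
        new[] { 0x0600, 0x06FF }, // Arabic
        new[] { 0x0750, 0x077F }, // Arabic Supplement
        new[] { 0x08A0, 0x08FF }, // Arabic Extended-A
        new[] { 0xFB50, 0xFDFF }, // Arabic Presentation Forms-A
        new[] { 0xFE70, 0xFEFF }, // Arabic Presentation Forms-B
    };

    // Returns true if the character falls inside one of the blocks above.
    // Note: a block can contain unassigned code points, so this is only a coarse filter.
    public static bool IsArabic(char c)
    {
        foreach (int[] range in Ranges)
        {
            if (c >= range[0] && c <= range[1])
                return true;
        }
        return false;
    }
}
```

A check like this could be what the ArabicCharacterDescriptor constructor uses to decide whether to throw for a non-Arabic code point.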

Traverse all this data, create an instance of ArabicCharacterDescriptor for each character, classify them all, and put them in a collection. It is best to put them all in a collection based on key-value pairs, for fast search. You may need two or more containers with the same values (say, the values are instances of ArabicCharacterDescriptor) but indexed by different keys, for fast search by code point, name, IPA, or whatever else.
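
The idea of several containers sharing the same values can be sketched roughly as follows. To keep the example self-contained it uses a small stand-in class, CharacterInfo, instead of the full ArabicCharacterDescriptor; CharacterInfo, ContextualForm and ArabicCatalog are hypothetical names, and only the five presentation forms from the question are registered. A real table would be filled from the Unicode charts.

```csharp
using System;
using System.Collections.Generic;

enum ContextualForm { None, Isolated, Final, Initial, Medial }

// Simplified stand-in for ArabicCharacterDescriptor (hypothetical, for illustration only).
sealed class CharacterInfo
{
    public CharacterInfo(char codePoint, string name, ContextualForm form)
    {
        CodePoint = codePoint;
        Name = name;
        Form = form;
    }
    public char CodePoint { get; private set; }
    public string Name { get; private set; }
    public ContextualForm Form { get; private set; }
}

static class ArabicCatalog
{
    // Two indexes over the same CharacterInfo instances: one by code point, one by name.
    public static readonly Dictionary<char, CharacterInfo> ByCodePoint =
        new Dictionary<char, CharacterInfo>();
    public static readonly Dictionary<string, CharacterInfo> ByName =
        new Dictionary<string, CharacterInfo>(StringComparer.OrdinalIgnoreCase);

    static ArabicCatalog()
    {
        // Only the five characters from the question; extend from the charts as needed.
        Add('\uFED3', "ARABIC LETTER FEH INITIAL FORM", ContextualForm.Initial);
        Add('\uFE8E', "ARABIC LETTER ALEF FINAL FORM", ContextualForm.Final);
        Add('\uFEAD', "ARABIC LETTER REH ISOLATED FORM", ContextualForm.Isolated);
        Add('\uFEB3', "ARABIC LETTER SEEN INITIAL FORM", ContextualForm.Initial);
        Add('\uFEF2', "ARABIC LETTER YEH FINAL FORM", ContextualForm.Final);
    }

    static void Add(char codePoint, string name, ContextualForm form)
    {
        var info = new CharacterInfo(codePoint, name, form);
        ByCodePoint.Add(codePoint, info); // same instance goes into both dictionaries
        ByName.Add(name, info);
    }
}
```

With this in place, ArabicCatalog.ByCodePoint['\uFED3'].Form and ArabicCatalog.ByName["arabic letter feh initial form"].CodePoint both resolve to the same entry.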

Please see: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx.

The collections based on key-value pairs are:
http://msdn.microsoft.com/en-us/library/f7fta44c.aspx,
http://msdn.microsoft.com/en-us/library/ms132319.aspx,
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx.

The one most usually used is System.Collections.Generic.Dictionary<TKey, TValue>.
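
To show how such a dictionary could answer the original question, here is a minimal sketch that maps base letters to their presentation forms. It is not a full shaping implementation: the table holds only the five letters of "فارسي", the class and method names (ArabicShapingSketch, ToPresentationForms) are hypothetical, and it assumes a simplified joining rule that ignores diacritics and ligatures such as lam-alef. Letters with four forms are treated as joining on both sides; letters with two forms (alef, reh) join only to the preceding letter.

```csharp
using System.Collections.Generic;
using System.Text;

static class ArabicShapingSketch
{
    // Presentation forms per base letter, in the order: isolated, final, initial, medial.
    // Letters that never connect to the following letter have only isolated and final forms.
    static readonly Dictionary<char, char[]> Forms = new Dictionary<char, char[]>
    {
        { '\u0641', new[] { '\uFED1', '\uFED2', '\uFED3', '\uFED4' } }, // feh
        { '\u0627', new[] { '\uFE8D', '\uFE8E' } },                     // alef
        { '\u0631', new[] { '\uFEAD', '\uFEAE' } },                     // reh
        { '\u0633', new[] { '\uFEB1', '\uFEB2', '\uFEB3', '\uFEB4' } }, // seen
        { '\u064A', new[] { '\uFEF1', '\uFEF2', '\uFEF3', '\uFEF4' } }, // yeh
    };

    // True if the letter is in the table and can connect to the letter that follows it.
    static bool ConnectsToNext(char c)
    {
        char[] forms;
        return Forms.TryGetValue(c, out forms) && forms.Length == 4;
    }

    public static string ToPresentationForms(string text)
    {
        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char[] forms;
            if (!Forms.TryGetValue(text[i], out forms))
            {
                result.Append(text[i]); // unknown characters pass through unchanged
                continue;
            }
            bool joinsPrev = i > 0 && ConnectsToNext(text[i - 1]);
            bool joinsNext = forms.Length == 4
                             && i + 1 < text.Length
                             && Forms.ContainsKey(text[i + 1]);
            // 0 = isolated, 1 = final, 2 = initial, 3 = medial
            int index = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
            result.Append(forms[index]);
        }
        return result.ToString();
    }
}
```

Feeding the result of ArabicShapingSketch.ToPresentationForms("فارسي") through the foreach loop from the question should then give ufed3 ufe8e ufead ufeb3 ufef2. Extending the table to the whole alphabet is the "take patience" part.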

Basically, that's it.

Good luck,
—SA
 
Comments
Wonde Tadesse 5-Jul-12 21:39pm    
Good references. 5+
Sergey Alexandrovich Kryukov 5-Jul-12 22:21pm    
Thank you, Wonde.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


