Click here to Skip to main content
15,889,462 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I have a csv file which it has a Polish characters too. The content of this is here:

ID_WORKER;FNAME;LNAME;WORKERS_GROUP;POSITION;
1;Paweł;ĄąĆćĘꣳŃńÓ󌜏źŻż;IT;IT Specialist;
6;Dawid;ĄąĆćĘꣳŃńÓ󌜏źŻż;Technologists;Technologists;
8;Maciej;ĄąĆćĘꣳŃńÓ󌜏źŻż;Storekeepers;Storekeeper;


As you see it has these characters like "Ąą Ćć Ęę Łł Ńń Óó Śś Źź Żż". Then I import csv file with below code:

private DataTable ImportFile()
{
    DataTable imported_data = new DataTable();

    OpenFileDialog ofd = new OpenFileDialog();
    ofd.Title = "Open csv file";
    ofd.DefaultExt = "*.csv";
    ofd.Filter = "Documents (*.csv)|*.csv";
    ofd.ShowDialog();

    FileInfo fi = new FileInfo(ofd.FileName);
    string FileName1 = ofd.FileName;
    string excel = fi.FullName;

    using(StreamReader sr = new StreamReader(excel, Encoding.Default))
    {
        string header = sr.ReadLine();
        if (string.IsNullOrEmpty(header))
        {
            MessageBox.Show("Not found or loaded not correct file.");
            return null;
        }

        string[] header_columns = header.Split(',');
        foreach(string header_column in header_columns)
        {
            imported_data.Columns.Add(header);
        }

        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();

            if (string.IsNullOrEmpty(linia)) continue;

            string[] fields = line.Split(',');
            DataRow imported_row = imported_data.NewRow();

            for (int i = 0; i < fields.Count(); i++)
            {
                imported_row[i] = fields[i];
            }

            imported_data.Rows.Add(imported_row);
        }
    }
    return imported_data;
}


And when I insert that imported data with all content from csv file to database with that code in below:

private void save_modules(DataTable imported_data)
{
    string connection = "datasource=localhost;port=3306;username=root;password=";
    using (MySqlConnection conn = new MySqlConnection(connection))
    {
        conn.Open();
        foreach (DataRow importRow in imported_data.Rows)
        {
            string query = @"INSERT IGNORE INTO try1.workers (ID_WORKER, FNAME, LNAME,
            WORKERS_GROUP, POSITION) VALUES (@ID_WORKER, @FNAME, @LNAME,
            @WORKERS_GROUP, @POSITION);";

            MySqlCommand cmd = new MySqlCommand(query, conn);

            cmd.Parameters.AddWithValue("@ID_WORKER", importRow["ID_WORKER"]);
            cmd.Parameters.AddWithValue("@FNAME", importRow["FNAME"]);
            cmd.Parameters.AddWithValue("@LNAME", importRow["LNAME"]);
            cmd.Parameters.AddWithValue("@WORKERS_GROUP", importRow["WORKERS_GROUP"]);
            cmd.Parameters.AddWithValue("@POSITION", importRow["POSITION"]);

            cmd.ExecuteNonQuery();
        }
        conn.Close();
    }  
    MessageBox.Show("Imported to database");
}


I see in mysql database in "LNAME" column i see NOT all Polish characters: "Aa Cc Ee Ll Nn Óó Ss Zz Zz". And that's not good enough.

As for as mysql database is concerned i have set utf8_polish_ci method of comparing subtitles.

I use phpmyadmin 4.8.4 and mysql database version is 10.1.37-MariaDB

Now, how can i import csv with the Polish characaters. Maybe is required to change the line of code in first piece of importing csv file code:

using(StreamReader sr = new StreamReader(excel, Encoding.Default)) 


Any ideas

What I have tried:

I've tried to import csv file with other encoding formats:

1) Encoding.Default - then it shows like as i shew in this example: "Aa Cc Ee Ll Nn Óó Ss Zz Zz".

2) Encoding.ASCII - then it shows all '?' characters

3) Encoding.UTF8 - but it shows all '?' characters too.

4) Encoding.GetEncoding(1252) - didn't help too much.
Posted
Updated 12-Jul-19 1:58am

Assuming that you properly set Character Sets and Collations[^]...

Solution #1 - using StreamReader Constructor (System.IO) | Microsoft Docs[^] with utf-8 encoding
C#
using(StreamReader sr = new StreamReader(excel, System.Text.Encoding.GetEncoding("utf-8")))
{
	    string headerline = sr.ReadLine();
        if (string.IsNullOrEmpty(headerline))
        {
            Console.WriteLine("Not found or loaded not correct file.");
        }

        string[] headers = headerline.Split(new char[]{';'}, StringSplitOptions.RemoveEmptyEntries);
        imported_data.Columns.AddRange(headers.Select(h=>new DataColumn(h)).ToArray());

        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
			imported_data.Rows.Add(line.Split(new char[]{';'}, StringSplitOptions.RemoveEmptyEntries));
		}
}


Solution #2 - using File.ReadAllLines Method (System.IO) | Microsoft Docs[^] with Linq

C#
string[] lines= File.ReadAllLines(excel);
string[] headers = lines[0].Split(new char[]{';'}, StringSplitOptions.RemoveEmptyEntries); 
imported_data.Columns.AddRange(headers.Select(h=>new DataColumn(h)).ToArray());
imported_data = lines.Skip(1)
	.Select(line=> imported_data.LoadDataRow(line.Split(new char[]{';'}, StringSplitOptions.RemoveEmptyEntries), false))
	.CopyToDataTable();


Both return proper data.
 
Share this answer
 
v2
Comments
Member 10696161 11-Jul-19 7:44am    
I've just tested both alternatives solutions and they don't work yet because instead Polish Characters i get same '?' signs. Maybe should I change method of comparing subtitles.
Maciej Los 11-Jul-19 10:15am    
Where? In a datatable or in a database?
Note: i have tested both and they work as well. My regional settings are the same as yours, so it should works fine for you too, unless your database character set and collation was set inproperly.
I don't know but i found that solution: in my connection string wasn't added a "charset".
If i look at "cmd.ExecuteNonQuery" is not sending Unicode to the MySQL database.

So i added to this code line and it looks like this:

string connection = "datasource=localhost;port=3306;username=root;password=;CharSet=utf8mb4;";


It means: "CharSet=utf8mb4;" is added

When i tested this then it worked. But anyway thank you very much for giving to me the solutions! :)
 
Share this answer
 
v3
The first thing to do is to find out which encoding is used for the csv file in the first place. Could be UTF-16 or UTF-32, or it could be any of the encodings supported on the platform which produced the csv file. Could also be an issue by the byte-ordering of the file. So get back to the source for the csv file, find out which encoding/byte ordering is used; only then you will be able to properly read and display its contents.
 
Share this answer
 
Comments
Member 10696161 11-Jul-19 6:29am    
As for as csv file is concerned I save this file form MS Excel (Office 365) comma seperated. So possible is that it can encoded in ANSI.
phil.o 11-Jul-19 6:57am    
Maybe this thread[^] will help you schedding some lights on your issues. Here[^] is another post which discusses specific issues with Office and .csv handling.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900