Sorry this is a bit complex to explain but its really a very simple question....just point me at the right web-site.
I am using SSIS to write a set of n data files from a database. Simplifying a hell of lot I divide very big SQL data sets comprising different related objects into smaller chunks and write them to text files. So for example I might have 17 customer files, 17 aligned customer address files and so on. At a set point in the process I ask SSIS to work through a ForEach loop which configures the SQL for that file and then writes the file. This is all complicated by other requirements but it works well. So today I ask for customers, tomorrow I ask for vendors, on Monday its materials, Tuesday Service History etc etc. Brilliant application if I say so myself, one problem...
The "writes the file" bit using C# has an issue with Unicode. When I hit it with volume, it has difficulty detecting if a file needs to be saved in UNICODE or not.
So for example the first 1 of 17 files for customers all have names which are in the standard base character set (they do its old UK and NW European customers) but when it gets to file 5 it find lots of our far eastern customers. The C# code seems to be stuck not in Unicode and translates the characters as ????s. Since there are lots of customers in one place like this it stands out, but checking back I've had errors previously just not in file 1. Moreover if I adjust the order of my extract to start with the far-eastern customers its fine again. I'm a SQL coder with lots of C# experience but I am missing something fundamentally important here. A short simple explanation now will prevent future disaster - thank-you guys...
My code is below, I think what is happening is when I use
DataTable dtD = new DataTable();
it isn't actually giving me a new DataTable its just clearing down the old one. So if it decided to be not Unicode at the first dataset it continues to be that way. Remember the script is called through SSIS which controls the next iteration.
The supplied parameters are:
nTiledExtractHeaders: A set of headers for the file, its an SQL query that returns a single row. We create the file with this and append the data underneath. So could be something like
CustomerID, CustomerName
ID, Name
nTiledExtractData: This is our data-set a simple set of made up names might be
ID, Name
C000001, Ace Limited
C000002, Acme Inc
...
C045123, 中國客戶
...
C050001, 日本のお客様
...
C092345, Zyppy Zip Company
(using a comma in each list to represent the end of each data field)
nTiledFileName: Just a file name
public void Main()
{
string pos = "00";
try
{
pos = "01";
string filename = Dts.Variables["User::nTiledFileName"].Value.ToString();
pos = "02";
pos = "03";
DataTable dtH = new DataTable();
DataTable dtD = new DataTable();
pos = "04";
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter();
pos = "05";
adapter.Fill(dtH, Dts.Variables["User::nTiledExtractHeaders"].Value);
pos = "06";
adapter.Fill(dtD, Dts.Variables["User::nTiledExtractData"].Value);
pos = "07";
pos = "08";
System.Text.Encoding e = System.Text.Encoding.Unicode;
System.Collections.Generic.List<string> header = null;
pos = "09";
if (dtD.ExtendedProperties.ContainsKey("Unicode") && (bool)dtD.ExtendedProperties["Unicode"])
{
pos = "10";
e = System.Text.Encoding.Unicode;
}
pos = "11";
using (System.IO.StreamWriter file = new System.IO.StreamWriter(System.IO.File.Open(filename, System.IO.FileMode.Create), e))
{
pos = "12";
header = new System.Collections.Generic.List<string>(dtH.Columns.Count);
pos = "13";
foreach (System.Data.DataColumn heading in dtH.Columns)
{
pos = "14";
header.Add(heading.ColumnName);
}
pos = "15";
file.WriteLine(string.Join("\t", header));
pos = "16";
foreach (System.Data.DataRow headingRow in dtH.Rows)
{
pos = "17";
System.Collections.Generic.IEnumerable<string> hFields = (headingRow.ItemArray).Select(Field => Field.ToString());
pos = "18";
file.WriteLine(string.Join("\t", hFields));
}
foreach (System.Data.DataRow dataRow in dtD.Rows)
{
pos = "19";
System.Collections.Generic.IEnumerable<string> dFields = (dataRow.ItemArray).Select(Field => Field.ToString());
pos = "20";
file.WriteLine(string.Join("\t", dFields));
}
}
pos = "21";
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception ex)
{
Dts.Events.FireError(0, "Write TDF Files - " + pos, ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
Dts.TaskResult = (int)ScriptResults.Failure;
}
}
What I have tried:
The unicode or not test is a nice to have I don't really need so I commented it out and just let everything get encoded as Unicode. My solution works, and this is a good enough work-around for this application. The fact that all the files will be Unicode isn't a problem but I am worried about why its happening...and for potential similar situations in the future how could I avoid this.
if (dtD.ExtendedProperties.ContainsKey("Unicode") && (bool)dtD.ExtendedProperties["Unicode"])
{
pos = "10";
e = System.Text.Encoding.Unicode;
}