Hello,
Appreciate thoughts on importing from a CSV flat file whose structure changes over the course of a yearly cycle. The CSV file has a variable number of columns, which grows as the applicant pool grows.

** Specifically, I want to create a table with all the fields I will need at the end of the cycle, and import with an "IF... ELSE" or "CASE WHEN..." statement that checks whether the source CSV flat file has each field or not.

A little more explanation:
Early in the cycle, the School columns in the CSV file might look like this:

Applicant  School_0   School_1  School_2  School_3
101        Dartmouth  Duke      null      null
102        St. Johns  Auburn    Clemson   Auburn


Later in the cycle, this same source CSV file will have grown more columns, like this:

Applicant  School_0   School_1  School_2  School_3  …  School_11
101        Dartmouth  Duke      null      null      …  null
102        St. Johns  Auburn    Clemson   Auburn    …  null
12034      Yale       BYU       Rice      Baylor    …  Cornell


This same variable column dimension applies to many other columns as well; I am just using the "School" columns as an example. I never know how many columns will have been added as the cycle matures.

Therefore, I will make the target table (the one being imported into) contain all the columns it will have at the end of the cycle, and just import those that are present in the earlier source files.
Any thoughts on how to code this once at the beginning of the cycle, so I do not have to hand-manipulate the myriad of added columns throughout the cycle?

**Note: After importing the source CSV file each week, I then create a relational table with a CROSS JOIN to get all the School_0 … School_?? columns into one School column in a SQL table. I get errors if columns do not exist (naturally), so I want to force their creation at the beginning of the cycle and just let them hold null values until they are populated later in the cycle.

Appreciate any thoughts on how I can have one set of import queries that imports based on an "IF EXISTS...", a "CASE WHEN...", or another method.
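To sketch the shape of what I am after (all table and column names here are hypothetical), something like this, though I realise a reference to a missing column will not even compile unless the statement is dynamic:

<pre lang="SQL">-- Hypothetical sketch: copy School_5 only if this week's import supplied it.
-- COL_LENGTH returns NULL when the column does not exist in the table.
IF COL_LENGTH('Csv_Import', 'School_5') IS NOT NULL
    EXEC('UPDATE t SET School_5 = s.School_5
          FROM Target_Table t JOIN Csv_Import s ON s.Applicant = t.Applicant')
-- otherwise skip it; the target column simply stays NULL until later in the cycle</pre>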

Thank you for any help,
Tony

What I have tried:

Microsoft SQL Server help.
Searching this and other help sites.
Google searches for expert articles.
I have not found my exact circumstance addressed anywhere.
Posted
Updated 13-May-16 4:16am
Comments
Sergey Alexandrovich Kryukov 12-May-16 13:01pm    
Do you mean that your database schema is supposed to be extended by some of these imports? Oh... then you had better review your architecture... :-)
—SA
mis777 12-May-16 16:36pm    
Sorry Sergey, not sure what you are indicating. We have no control over the source CSV file from the clearing-house vendor. I know what it will grow to be; I just want to build to the final file format with a script that imports into that final format but skips columns that do not yet exist, even though those columns will exist later in the cycle.
TenmanS14 13-May-16 5:27am    
By the look of that you just need to create 3 tables to import into:

Applicant, which will hold ApplicantId and other applicant details if you have them;
School, which will hold SchoolId, the school name, and other details;

and a 3rd table to join them, which will hold ApplicantId and SchoolId.

To import from that CSV I'd import it into a temp table and then run through that with a counter to add records to the other tables.
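A minimal sketch of that three-table layout (all names are only suggestions):

<pre lang="SQL">-- Sketch of the normalised layout; adjust names and types to suit
CREATE TABLE Applicant (ApplicantId INT PRIMARY KEY);
CREATE TABLE School    (SchoolId INT IDENTITY PRIMARY KEY, SchoolName VARCHAR(100));
CREATE TABLE ApplicantSchool (
    ApplicantId INT REFERENCES Applicant (ApplicantId),
    SchoolId    INT REFERENCES School (SchoolId),
    PRIMARY KEY (ApplicantId, SchoolId)
);</pre>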
mis777 13-May-16 10:18am    
Thank you Tenman,

I think I would run into the same problem I currently get when I go to CROSS JOIN.

Currently I CROSS JOIN to turn the CSV flat-file data into a relational table in SQL, so my CSV file's School_1, School_2, ... School_x columns are all imported into my one SQL table column "School".

I have been doing this with a script that basically creates the Relational SQL table with this general flow:

<pre lang="SQL">SELECT Applicant, School        -- my relational SQL columns and field types
INTO Relational_Table           -- creates the relational SQL table
FROM (
    SELECT Applicant,
           CASE n WHEN 0 THEN School_0
                  WHEN 1 THEN School_1
                  -- ... one WHEN per School_n column in the CSV
           END AS School
    FROM Csv_Import             -- the downloaded CSV file
    CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 /* ... */) AS Numbers
) AS Xtemp
WHERE School IS NOT NULL        -- filter out empty columns and whatever</pre>

Works wonderfully, with the exception of needing to have all the columns known and specified, particularly in the CASE n statements.

If only I could have an "IF EXISTS... THEN... ELSE" around each line of the CASE statement, so it would skip the attempt to import a column that does not exist and just keep going to the next line.
Then I could build the CASE n statements to cover all the columns that will exist at the end of the cycle, when this variably dimensioned CSV flat-file data source reaches its greatest width, and in the meantime have the script skip those import lines while the columns do not yet exist in the CSV flat file (the number of columns grows throughout the year).

Is there any way to import columns conditionally in the scenario above?

I want to avoid having to manually comment out portions of the script because columns do not yet exist. I want a script that can handle the changing column dimensions over the life of this CSV file.

I appreciate all the help.
Thank you,
Tony

1 solution

Similar to the suggestion from @TenmanS14

Use a "staging table" to receive the details from the CSV, and make sure that this table is dropped each time. I've mimicked that behaviour using some dummy tables:
SQL
-- This mimics loading the staging table from a CSV
-- Make sure that staging_table is dropped each time
if OBJECT_ID('staging_table') is not null DROP TABLE staging_table
select * into staging_table from csv2

So staging_table now contains as many columns as the clearing house has provided this time around. We can generate a comma-separated list of those columns thus:
SQL
-- Derive a list of columns in the current version of the staging_table
DECLARE @listCols VARCHAR(MAX)
SELECT @listCols = COALESCE(@listCols+',' ,'') + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME  = 'Staging_Table'

Which for my sample looks like Applicant,School_0,School_1,School_2,School_3,School_4,School_5
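If the vendor's file could ever contain a column that the target table does not (yet) have, the list can be narrowed to columns present in both tables. A sketch, assuming the target is named [target] as in the INSERT below:
SQL
-- Optional safety net (a sketch): keep only columns that also exist in [target],
-- so an unexpected vendor column cannot break the INSERT
DECLARE @safeCols VARCHAR(MAX)
SELECT @safeCols = COALESCE(@safeCols + ',', '') + s.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS s
INNER JOIN INFORMATION_SCHEMA.COLUMNS t
        ON t.COLUMN_NAME = s.COLUMN_NAME
       AND t.TABLE_NAME  = 'target'
WHERE s.TABLE_NAME = 'staging_table'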

You can then use that to generate the dynamic SQL needed to populate the target table
SQL
-- Generate the sql to populate the target table ... NB Nvarchar
DECLARE @sql nvarchar(max)
SET @sql = 'INSERT INTO [target] (' + @listCols + ') SELECT ' + @listCols + ' FROM staging_table'

-- Load the data into the target
exec sp_executesql @sql

7 lines of code that should be able to handle all of your files
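Since the target now always carries the full end-of-cycle set of columns (later ones simply stay NULL until the vendor supplies them), the follow-on step of collapsing School_0 ... School_n into a single School column can also be written once. A sketch using CROSS APPLY (VALUES ...), assuming the target ends up holding School_0 through School_11:
SQL
-- Sketch: unpivot School_0..School_11 into one School column;
-- slots that are still empty are filtered out
SELECT t.Applicant, v.School
FROM [target] t
CROSS APPLY (VALUES (t.School_0), (t.School_1), (t.School_2),
                    /* ... */ (t.School_11)) AS v(School)
WHERE v.School IS NOT NULL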
 
 
Comments
mis777 19-May-16 17:07pm    
Fabulous... CHill60 !!!

Now I can "Chill".
-- I added the "staging table" step for each of my import files (flat files from the vendor) using code from your elegant solution.
-- This enables me to import into a final "end of cycle" larger data schema on each import.
-- Your solution also lets me flow to the next step, where I build my CROSS JOIN once, against the end-of-cycle column specs, without erroring out as the vendor's flat-file data constantly adds column names (expands).

I apologize for not getting back sooner, but was pulled off this project with another project needing immediate attention.

Thanks again CHill60.. only 7 lines of code and Well Done!!
Tony
CHill60 19-May-16 20:07pm    
I'm pleased it works!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


