Hello,
Appreciate thoughts on importing from a CSV flat file whose structure changes over the course of a yearly cycle. The CSV file has a variable number of columns, which grows as the applicant pool grows.

** Specifically, I want to create a table with all the fields I will need at the end of the cycle, and import with an "IF... ELSE" or "CASE WHEN..." statement that checks whether the source CSV flat file has each field or not.

A little more explanation:
Early in the cycle, the School columns in the CSV file might look like this:

Applicant  School_0   School_1  School_2  School_3
101        Dartmouth  Duke      null      null
102        St. Johns  Auburn    Clemson   Auburn


Later in the cycle, this same source CSV file will have grown more columns, like this:

Applicant  School_0   School_1  School_2  School_3  …  School_11
101        Dartmouth  Duke      null      null      …  null
102        St. Johns  Auburn    Clemson   Auburn    …  null
12034      Yale       BYU       Rice      Baylor    …  Cornell


This same variable column dimension applies to many other columns as well; I am just using the "School" columns as an example. I never know how many columns will have been added as the cycle matures.

Therefore, I will make the target table (the one being imported into) contain all the columns it will have at the end of the cycle, and just import those that are present in the earlier source files.
Any thoughts on how to code this once at the beginning of the cycle, so I do not have to hand-manipulate the myriad of added columns throughout the cycle?

**Note: After importing the source CSV file each week, I then create a relational table with a CROSS JOIN to get all the School_0 … School_?? columns into one School column in a SQL table. I get errors if columns do not exist (naturally), so I want to force their creation at the beginning of the cycle and just let them hold null values until they are populated later in the cycle.

Appreciate any thoughts on how I can have one set of import queries that imports based on an "IF EXISTS...", a "CASE WHEN...", or another method.
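To sketch the shape of what I am after (all table and column names here are hypothetical), something like this, though I realise a reference to a missing column will not even compile unless the statement is dynamic:

<pre lang="SQL">-- Hypothetical sketch: copy School_5 only if this week's import supplied it.
-- COL_LENGTH returns NULL when the column does not exist in the table.
IF COL_LENGTH('Csv_Import', 'School_5') IS NOT NULL
    EXEC('UPDATE t SET School_5 = s.School_5
          FROM Target_Table t JOIN Csv_Import s ON s.Applicant = t.Applicant')
-- otherwise skip it; the target column simply stays NULL until later in the cycle</pre>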

Thank you for any help,
Tony

What I have tried:

Microsoft SQL Server help.
Searching this and other help sites.
Google searches for expert articles.
I have not found my exact circumstance addressed anywhere.
Posted
Updated 13-May-16 4:16am
Comments
Sergey Alexandrovich Kryukov 12-May-16 13:01pm    
Do you mean that your database schema is supposed to be extended by some of these imports? Oh... then you had better review your architecture... :-)
—SA
mis777 12-May-16 16:36pm    
Sorry Sergey, not sure what you are indicating. We have no control over the source CSV file from the clearing-house vendor. I know what it will grow to be; I just want to build to the final file format with a script that imports into that final format but skips columns that do not yet exist, even though those columns will exist later in the cycle.
TenmanS14 13-May-16 5:27am    
By the look of that you just need to create 3 tables to import into:

Applicant, which will hold ApplicantId and other applicant details if you have them;
School, which will hold SchoolId, the school name, and other details;

and a 3rd table to join them, which will hold ApplicantId and SchoolId.

To import from that CSV I'd import it into a temp table and then run through that with a counter to add records to the other tables.
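A minimal sketch of that three-table layout (all names are only suggestions):

<pre lang="SQL">-- Sketch of the normalised layout; adjust names and types to suit
CREATE TABLE Applicant (ApplicantId INT PRIMARY KEY);
CREATE TABLE School    (SchoolId INT IDENTITY PRIMARY KEY, SchoolName VARCHAR(100));
CREATE TABLE ApplicantSchool (
    ApplicantId INT REFERENCES Applicant (ApplicantId),
    SchoolId    INT REFERENCES School (SchoolId),
    PRIMARY KEY (ApplicantId, SchoolId)
);</pre>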
mis777 13-May-16 10:18am    
Thank you Tenman,

I think I would run into the same problem I currently get when I go to CROSS JOIN.

Currently I CROSS JOIN to turn the CSV flat-file data into a relational table in SQL, so my CSV file's School_1, School_2, ... School_x columns are all imported into my one SQL table column "School".

I have been doing this with a script that basically creates the Relational SQL table with this general flow:

<pre lang="SQL">SELECT Applicant, School        -- my relational SQL columns and field types
INTO Relational_Table           -- creates the relational SQL table
FROM (
    SELECT Applicant,
           CASE n WHEN 0 THEN School_0
                  WHEN 1 THEN School_1
                  -- ... one WHEN per School_n column in the CSV
           END AS School
    FROM Csv_Import             -- the downloaded CSV file
    CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 /* ... */) AS Numbers
) AS Xtemp
WHERE School IS NOT NULL        -- filter out empty columns and whatever</pre>

Works wonderfully, with the exception of needing to have all the columns known and specified, particularly in the CASE n statements.

If only I could have an "IF EXISTS... THEN... ELSE" around each line of the CASE statement, so it would skip the attempt to import a column that does not exist and just keep going to the next line.
Then I could build the CASE n statements to cover all the columns that will exist at the end of the cycle, when this variably dimensioned CSV flat-file data source reaches its greatest width, and in the meantime have the script skip those import lines while the columns do not yet exist in the CSV flat file (the number of columns grows throughout the year).

Is there any way to import columns conditionally in the scenario above?

I want to avoid having to manually comment out portions of the script because columns do not yet exist. I want a script that can handle the changing column dimensions over the life of this CSV file.

I appreciate all the help.
Thank you,
Tony

1 solution

Similar to the suggestion from @TenmanS14

Use a "staging table" to receive the details from the CSV, and make sure that this table is dropped each time. I've mimicked that behaviour using some dummy tables:
SQL
-- This mimics loading the staging table from a CSV
-- Make sure that staging_table is dropped each time
if OBJECT_ID('staging_table') is not null DROP TABLE staging_table
select * into staging_table from csv2

So staging_table now contains as many columns as the clearing house has provided this time around. We can generate a comma-separated list of those columns thus:
SQL
-- Derive a list of columns in the current version of the staging_table
DECLARE @listCols VARCHAR(MAX)
SELECT @listCols = COALESCE(@listCols+',' ,'') + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME  = 'Staging_Table'

Which for my sample looks like Applicant,School_0,School_1,School_2,School_3,School_4,School_5
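If the vendor's file could ever contain a column that the target table does not (yet) have, the list can be narrowed to columns present in both tables. A sketch, assuming the target is named [target] as in the INSERT below:
SQL
-- Optional safety net (a sketch): keep only columns that also exist in [target],
-- so an unexpected vendor column cannot break the INSERT
DECLARE @safeCols VARCHAR(MAX)
SELECT @safeCols = COALESCE(@safeCols + ',', '') + s.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS s
INNER JOIN INFORMATION_SCHEMA.COLUMNS t
        ON t.COLUMN_NAME = s.COLUMN_NAME
       AND t.TABLE_NAME  = 'target'
WHERE s.TABLE_NAME = 'staging_table'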

You can then use that to generate the dynamic SQL needed to populate the target table
SQL
-- Generate the sql to populate the target table ... NB Nvarchar
DECLARE @sql nvarchar(max)
SET @sql = 'INSERT INTO [target] (' + @listCols + ') SELECT ' + @listCols + ' FROM staging_table'

-- Load the data into the target
exec sp_executesql @sql

7 lines of code that should be able to handle all of your files
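Since the target now always carries the full end-of-cycle set of columns (later ones simply stay NULL until the vendor supplies them), the follow-on step of collapsing School_0 ... School_n into a single School column can also be written once. A sketch using CROSS APPLY (VALUES ...), assuming the target ends up holding School_0 through School_11:
SQL
-- Sketch: unpivot School_0..School_11 into one School column;
-- slots that are still empty are filtered out
SELECT t.Applicant, v.School
FROM [target] t
CROSS APPLY (VALUES (t.School_0), (t.School_1), (t.School_2),
                    /* ... */ (t.School_11)) AS v(School)
WHERE v.School IS NOT NULL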
 
 
Comments
mis777 19-May-16 17:07pm    
Fabulous... CHill60 !!!

Now I can "Chill".
-- I added the "staging table" step for each of my import files (flat files from the vendor) using code from your elegant solution.
-- This enables me to import into a final "end of cycle" larger data schema on each import.
-- Your solution also lets me flow to the next step, where I build my CROSS JOIN once, against the end-of-cycle column specs, without erroring out as the vendor's flat-file data constantly adds column names (expands).

I apologize for not getting back sooner, but was pulled off this project with another project needing immediate attention.

Thanks again CHill60.. only 7 lines of code and Well Done!!
Tony
CHill60 19-May-16 20:07pm    
I'm pleased it works!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


