I'm more familiar with SQL, so I thought I would reach out for help using C#.

My objective is to call a C# script from a SQL Server SSIS package. The script should parse a webpage for download links that start and end with two substrings that I know will not change.

The webpage is here: PatentsView Data Download

I'd like to find every instance in the HTML that starts with "http://www.patentsview.org/data/" and ends with ".tsv.zip". For the moment this is my main challenge. The next challenges will be 1) saving these as variables or something of the sort in SSIS, 2) downloading them, 3) unzipping them, and 4) loading them into a SQL Server database. At this point, though, I'm mainly focused on parsing the HTML.

Does anyone have input on how to do this? Please keep in mind that I have never used C# before, but I have a moderate amount of experience coding in other languages.

Best
Nico

What I have tried:

I have tried using third party SSIS components, but I believe using script tasks is the best way.
Posted
Updated 15-Jul-18 22:16pm
Comments
gggustafson 15-Jul-18 13:48pm    
I'm not sure what you mean by "C# script". C# is a programming language, not a scripting language. Did you mean a JavaScript script?
#realJSOP 16-Jul-18 4:05am    
He's talking about a script task in an SSIS package.
gggustafson 16-Jul-18 10:26am    
Is that type of task referred to as a "C# script"?
#realJSOP 16-Jul-18 10:46am    
Well, starting with SQL Server 2008, you could choose between VB.NET and C# for script tasks. He did mention he was writing an SSIS package in his question. Technically, it's a "script task", and I'm not really sure on this, but I think the code in a script task is interpreted on the fly, so it might be more accurate to call it a "C# script".

I'm not an SSIS expert, but I did stay at a Holiday Inn Express last night. :)
gggustafson 16-Jul-18 10:50am    
Thanks for the insight.

1 solution

Scraping a web page is fraught with danger, because the page format could change at any time in the future, thus breaking your package. Also keep in mind that HTML only looks like XML: real-world pages are often not well-formed, so a strict XML parser may reject them. An HTML-aware library such as Html Agility Pack (HAP) tolerates that kind of markup and can make your parsing life much easier.
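
Since the links you want have a fixed prefix and suffix, a plain regular expression over the page text may be enough to get started, without HAP. What follows is only a minimal sketch under that assumption. The prefix and suffix are the ones from the question; the class and method names (LinkExtractor, ExtractLinks) are made up for illustration, and the code assumes the links appear literally in the HTML rather than being built by JavaScript.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not part of SSIS or any library.
public static class LinkExtractor
{
    // Prefix and suffix taken from the question.
    private const string Prefix = "http://www.patentsview.org/data/";
    private const string Suffix = ".tsv.zip";

    public static List<string> ExtractLinks(string html)
    {
        // Match the literal prefix, then the shortest run of characters that
        // cannot occur inside a quoted attribute value (quotes, angle brackets,
        // whitespace), then the literal suffix.
        string pattern = Regex.Escape(Prefix) + @"[^""'<>\s]*?" + Regex.Escape(Suffix);

        return Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()   // drop links that appear more than once on the page
                    .ToList();
    }
}
```

To get the html string in the first place, something like WebClient.DownloadString(pageUrl) from System.Net should work inside a script task, where pageUrl is whatever variable holds the address of the download page. If the site ever switches the links to https or to relative paths, the prefix will need adjusting.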

Once you've scraped the file names, download the files and unzip them in the script task, then create an importer package to load the data into your database.
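
The download-and-unzip step could look roughly like this. Treat it as a sketch, not a finished script task: it assumes .NET Framework 4.5 or later with a reference to System.IO.Compression.FileSystem added to the script project, and the class, method, and folder names (ZipDownloader, DownloadAndExtract, workingDirectory) are hypothetical. Error handling and retries are left out.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;

// Hypothetical helper; the names are illustrative only.
public static class ZipDownloader
{
    // Builds a local file path from the last segment of the URL,
    // e.g. ".../something.tsv.zip" becomes "<directory>/something.tsv.zip".
    public static string LocalPathFor(string url, string directory)
    {
        string fileName = Path.GetFileName(new Uri(url).LocalPath);
        return Path.Combine(directory, fileName);
    }

    // Downloads one archive and extracts it into a folder named after it.
    // Returns the folder that now contains the extracted file(s).
    public static string DownloadAndExtract(string url, string workingDirectory)
    {
        Directory.CreateDirectory(workingDirectory);
        string zipPath = LocalPathFor(url, workingDirectory);

        using (var client = new WebClient())
        {
            client.DownloadFile(url, zipPath);
        }

        string extractDir = Path.Combine(
            workingDirectory, Path.GetFileNameWithoutExtension(zipPath));

        // ExtractToDirectory throws if a target file already exists,
        // so clear out any folder left over from a previous run.
        if (Directory.Exists(extractDir))
        {
            Directory.Delete(extractDir, true);
        }
        ZipFile.ExtractToDirectory(zipPath, extractDir);
        return extractDir;
    }
}
```

You would call DownloadAndExtract once per scraped link and hand the returned folder to the import step, for example through an SSIS variable consumed by a Foreach Loop container.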
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


