|
I have a csv file in which there are 20 million records, there can be duplicate records also in file.
There is a table with 17-18 million records. now i have to read from file and match records against table and if they match then store in other table.
I was thinking of creating a window service which will do these operation. But performance is of highest priority.
what can possibly be the best approach and fastest way to read and filter 20 mil records.
|
|
|
|
|
With that volume you are going to suffer but I would suggest using an ELT approach and an SSIS package
Extract - you have done
Load - I would BCP or bulk copy the lot into a staging table in SQL Server the staging data should exactly reflect your csv and all fields should be varchar.
Transform - write a stored proc that does the transforms (delete, match, validate formats, dedupe, copy to production table).
NEVER do your transforms in code, SQL is designed to chew through bulk data doing the transforms in c# is just wrong!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
that's good idea but then one concern would be error messages would be a problem plus to read file. Also would service execute the bcp. Because files will come every hour so we need someone to watch the files and pass latest for bcp.
|
|
|
|
|
You can schedule a job to run as often as required, you can either use a timed process or I think you can use a filesystemwatcher.
Error messages should go to a log file and emailed to an operator. How is the operator supposed to respond to the errors!
What you are looking for is a full blown ELT process, the last one I built had a winforms UI that displayed the log results, the files processed with timing, any error messages.
It archived the data files to a daily folder and allowed the operator to repeat a process.
It was not a trivial project.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
i am planning to have asp.net web forms which will show logs and reports.
can you share architecture of the application you built.
do you have some performance stats as to how much time was consumed, transfer process etc..
|
|
|
|
|
nitin_ion wrote: you share architecture of the application you built
It was many years ago and proprietary to the organisation I work for, sharing would get me crucified!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
no problem and thanks anyways for providing direction.
But only concern is that we'll then be mixing business logic with database layer as there is some business rules to be run so if we go with ELT then our rules will be in procedures.
Is there a way we can segregate?
We are also planning reports from this, i know SSRS but not sure if MS SQL can handle such volumes of data.
Do you think MS can handle it 20 million every month?
|
|
|
|
|
I see no problem with having business logic in the database, especially when dealing with the transform of such volumes. Id hate to loop 20m records in c#.
I would get a professional DBA to answer the volume question and then get them to set it up. Designing the DB infrastructure for such a volume is not a trivial matter. I would also consider Oracle, as much as I loathe the database it does a better job of serious volume. Caveat Oracle will cost - a bomb!
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I agree with Mycroft. Write an import module of some sort that inserts the data from the CSV into the database.
If it is a possibility remove the succesfully imported rows from CSV afterwards (in case of updates later on). Then perform the magic through SQL queries instead.
I do something similar with 1.8 million to 8 million records in a file that is inserted in a table now counting >2x107 rows. For this the memory of the object I use to hold the data is too large for the java framework I use (don't ask ) you need to read in the file and start inserting while you go. I usually do that in batch (1000 inserts at once) instead of reading it in at once or sending to the database one by one.
Hope this helps.
|
|
|
|
|
At least we do ours using connect direct so it is server to server, the UI only initiates the process, never actually touches the data.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I have 2 identical models:
public class CredentialAPIModel
{
public string UserName { get; set; }
public string Password { get; set; }
}
and
public class CredentialEntity
{
public string UserName { get; set; }
public string Password { get; set; }
}
and my mappings
Mapper.CreateMap<CredentialEntity, CredentialAPIModel>();
Mapper.CreateMap<CredentialAPIModel, CredentialEntity>();
And this is where it fails:
public LoginAPIResponse Login(CredentialAPIModel credentials)
{
CredentialEntity credentialEntity = Mapper.Map<CredentialAPIModel, CredentialEntity>(credentials);
LoginResponse appSecurityResponse = BO.Login(credentialEntity);
.
.
.
}
I get this:
Missing type map configuration or unsupported mapping.
Mapping types:
CredentialAPIModel -> CredentialEntity
FMG.WebServiceAPI.Models.AppSecurity.Models.CredentialAPIModel -> FMG.Entities.AppSecurity.Entities.CredentialEntity
Destination path:
CredentialEntity
Source value:
FMG.WebServiceAPI.Models.AppSecurity.Models.CredentialAPIModel
Mapper.AssertConfigurationIsValid() does not find any errors.
Anyone see what's wrong here?
Thanks
If it's not broken, fix it until it is
modified 28-Oct-13 21:37pm.
|
|
|
|
|
Are you sure, your method where you create mapping is being called ?
just for testing, try to create mapping in Login method before you map as shown below :
public LoginAPIResponse Login(CredentialAPIModel credentials)
{
Mapper.CreateMap<CredentialEntity, CredentialAPIModel>();
Mapper.CreateMap<CredentialAPIModel, CredentialEntity>();
CredentialEntity credentialEntity = Mapper.Map<CredentialAPIModel, CredentialEntity>(credentials);
LoginResponse appSecurityResponse = BO.Login(credentialEntity);
}
Regards,
CodeBlack
|
|
|
|
|
Good day everyone. I am developing an application that uses Microsoft Access as database. I want that during installation, the database file should be installed in the C:\Users\<username>\Documents folder. Please, how do I go about this. Also, how can I use this path in my application when operating on the database. Any help will be appreciated. Thanks
|
|
|
|
|
string documentsPath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
The difficult we do right away...
...the impossible takes slightly longer.
|
|
|
|
|
Thanks alot @Richard. But I dont quite get how to include the above code in connecting to my database.
My code looks like this
string dbconnect = @"Provider = Microsoft.Jet.OleDB.4.0; Data Source = C:\Users\elfico\Documents\accounts.mdb";
Please, any help will be appreciated.
|
|
|
|
|
The setup program is likely executed by a different user. Hence it is not easy to determine the Documents folder of a specific user. If the databse could be shared among all users of that computer, I'd suggest to use a folder below ProgramData instead.
|
|
|
|
|
Thanks a lot @Bernard. I had tried to have the database file install in the Program Files and i found out that I could only read from the database, but cannot write to it. Will using the Program Data folder enable me to read/write my database. Your reply is highly appreciated.
|
|
|
|
|
Exactly that's the point. ProgramFiles is "read-only" for users, for security reasons. PrgramData (which corresponds to XP's C:\Documents and Settings\All Users) is the place where all users of a computer can access the files with read and write access.
Normally, you'll create a folder with your company name there, and below that a folder for your specific program. I.e. C:\ProgramData\YourCompany\YourFirstProgram. Use the Environment.GetSpecialFolder function, as suggested by Richard.
|
|
|
|
|
Thank you very much. Your reply was very helpful. Talking about the other method mentioned by Richard, I need a little clarification on how to include it in my code. Am a bit of a newbie. Thanks.
|
|
|
|
|
I am developing a .net windows application to fetch HTML from a particular web page. I am using WebBrowser control to launch the web page and trying to fetch the HTML source. But, what i am getting is HTML from View Source. There is difference between HTML source in View Source and in Developer Tool. please help me in fetching the HTML source from Developer Tool.
|
|
|
|
|
It's not possible to answer your question. What do you mean by "Developer Tool"?? WHat do you mean by "getting the source from the developer tool" as opposed to the HTML source the brwoser gets? How do you know the difference?? Is this a website that you control and have the source for??
|
|
|
|
|
I mean in IE if I click on f12 I get a developer tool which will have a html code. When right click on a webpage and select view source I can see the html code.
There is difference between view source html and what I see in developer tool html code.
Below is the code for accessing view source html code:
HtmlElementCollection objHtmlElementCollection = null;
HtmlDocument objHtmlDocument = webBrowser1.Document;
similarly I want to access developer tool's html code.
Please help..
|
|
|
|
|
There is no "developer tools source". What you're seeing is a debugger analysis of the same source you see in "View Source". There is no different "source" you can possibly get from the browser.
|
|
|
|
|
The code is different in view source and developer tool source.
I have a scenario where a button is being displayed as in view source, whereas in developer tool, it is displayed as . Also, when i tried accessing a web page with frames, in View Source, i could not able to see the control tags appearing inside the frame, whereas i could see those in developer tools.
|
|
|
|
|