|
Sander Rossel wrote: Last week, it took 17 tries to import one file.
The funny thing is, it ultimately works every time.
Would dumping it into a blank temp table, then doing the work from there after it is all in make things better? I would not be concerned about 5 to 10 (or even more) seconds for something that takes a lot of background checking.
|
David O'Neil wrote: Would dumping it into a blank temp table, then doing the work from there after it is all in make things better? Yeah, something like that would be my "instant" solution.
The UX would be better, but the use case is that they import the file and then immediately work with the transformed data, so they'd be waiting for that in any case.
|
I wouldn't waste another minute on this. If one of my DBAs was working on it, I would redirect them. Find something that runs 10,000 times a day.
|
And if you were me, I'd be out of a job for such short-sightedness.
That they don't use it much does not mean it's not important.
One CSV can result in 16 invoices; no invoices, no money...
Now, 16 invoices five times a day equals 80 invoices, and 80 invoices is enough to keep you busy for the day.
Uploading five files should take about five minutes, but right now it takes them half an hour or even longer, and that means other people are waiting too...
So yeah, I am going to "waste" minutes on this vital task that they perform every day and which doesn't work most of the time!
In fact, the client called me twice to tell me this is a top priority and should be fixed ASAP.
|
ha! Always keep the factory running! Always.
Charlie Gilley
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
Has never been more appropriate.
|
You're consistently getting a timeout on the import? That is the problem that needs to be fixed.
From your description, the upload worked until you added a lot of processing to it. Timeout errors can be fixed by changing the environment or changing the process. Since you probably can't change the environment, fix the process by segregating the upload (insert into a temp table) and then processing the file.
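A minimal sketch of that split, assuming SQL Server (the table and column names are made up for illustration): bulk copy the raw rows into a staging table first, then do the heavy lifting in one set-based statement inside the database.

using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static async Task ImportAsync(DataTable csvRows, string connectionString)
{
    await using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    // Phase 1: get the raw rows into the database as fast as possible.
    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "dbo.ImportStaging"; // hypothetical staging table
        bulk.BulkCopyTimeout = 120;                      // seconds; generous, since this is the slow part
        await bulk.WriteToServerAsync(csvRows);
    }

    // Phase 2: transform and move the rows in one set-based statement,
    // so the processing happens inside the database engine.
    const string transform = @"
INSERT INTO dbo.InvoiceLines (OrderDate, Amount /* , ... */)
SELECT OrderDate, Amount /* , ... */
FROM dbo.ImportStaging;
TRUNCATE TABLE dbo.ImportStaging;";
    using var command = new SqlCommand(transform, connection);
    await command.ExecuteNonQueryAsync();
}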
|
5 seconds is not terrible. If you really wanted to make it faster, you could try inserting it into a smaller table, and then merging those new records into the existing table with a background job?
To err is human. Fortune favors the monsters.
|
honey the codewitch wrote: If you really wanted to make it faster, you could try inserting it into a smaller table, and then merging those new records into the existing table with a background job? I'd rather insert the raw data and handle everything else in a background job.
The UX would be better, but the use case is that they import the file and then immediately work with the transformed data, so they'd be waiting for that in any case.
I'm not a fan of having the same table twice and using the one as some sort of staging for the other (if that's what you're suggesting).
|
Whenever I'm dealing with a fragile system (i.e. the web, an international WAN), I eliminate it first. As others have said, load the data into a temp table just to get it off the web, process the crap out of it, and then dump it into the destination table in its final form.
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP
|
My feeling is that the database schema needs redesign - specifically, adding indexes and keys.
If you have a field which is supposed to be unique, define it as a unique key in the database table. The database will then ensure that the field is unique in the table, with no extra work required.
An index maintains a sorted structure over selected columns of a table. Searching an index is much faster than scanning an unindexed table, and best of all, multiple search criteria may be served by multiple indexes.
IOW, let the database engine do what it's best at.
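As a sketch of both points in EF Core's fluent API (I'm assuming EF Core since that's what the thread uses; the context, entity, and property names are placeholders):

using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<ImportLine> ImportLines => Set<ImportLine>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // The database now rejects duplicates on its own; no manual check needed.
        modelBuilder.Entity<ImportLine>()
            .HasIndex(l => l.ExternalReference)
            .IsUnique();

        // A plain index to speed up a common lookup, e.g. all lines for a date.
        modelBuilder.Entity<ImportLine>()
            .HasIndex(l => l.OrderDate);
    }
}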
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
Ditto to Dan's words.
When you do the redesign, consider having some sort of self-balancing binary tree as records are added, to keep search times fast.
"A little time, a little trouble, your better day"
Badfinger
|
jmaida wrote: some sort of self-balancing binary tree as records are added
That's the database engine's worry.
Another thing that would help maintain the database's consistency is normalizing it. Normalization uses separate tables to (ideally) ensure that each piece of data is stored only once, using unique keys to link instances. For example, you would have a table containing {country id, country name}, and any other table needing to store a country name would refer to it by the country id. The results of a query are then constructed by joining multiple tables.
Note that this redesign can be complex, and can in some cases lead to slower retrieval.
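The country example as EF Core entities, as a sketch (the names are made up): the country name is stored exactly once, and every other table refers to it by id.

public class Country
{
    public int CountryId { get; set; }
    public string Name { get; set; } = "";
}

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";

    // Foreign key instead of repeating the country name on every row.
    public int CountryId { get; set; }
    public Country Country { get; set; } = null!;
}

Queries then join the tables back together, e.g. with .Include(c => c.Country) in EF Core.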
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
Daniel Pfeffer wrote: An index maintains a sorted structure over selected columns of a table. Searching an index is much faster than scanning an unindexed table, and best of all, multiple search criteria may be served by multiple indexes. Bit of mansplaining there
But yeah, I actually checked my indexes and found one with a very high impact on performance.
Removed an unused one too.
I'm not going to add a unique key, as there are around 20 fields that, together, should be unique.
I feel like that would hurt performance more than it would add to it.
That's why I'm checking it manually: I can search for potentially duplicate data on a specific index, which I need to do anyway to let the user know they're uploading duplicate data (which, since last month, is actually allowed).
Removed some data analysis at this step too.
I'm storing some redundant data so I don't have to analyze it later.
Turns out, with a well placed index, the analysis is instant later on.
|
Sander Rossel wrote: Bit of mansplaining there
Please accept my apologies.
I know that you are a developer of some years' experience, but you pressed my "teaching" button, and I tend to over-explain at times.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
Daniel Pfeffer wrote: and I tend to over-explain at times
Better than the opposite, which I tend to do at times.
|
A few other ideas:
* Check for index fragmentation. If it's high, that will hurt performance. In SSMS, right-click an index (or the Indexes folder) and select Rebuild or Reorganize; the dialog shows the fragmentation values, and OK performs the action.
* Even if you need 20 columns to make a unique index, do it as a composite key (see the sketch after this list). That will still perform better than you doing the check manually. Manual checks may also have race conditions between the check and the insert.
* When creating indexes, don't forget about included columns. These are columns that are not part of the index key, but are stored alongside it. That keeps the index small and fast while still letting it return the data you need without a table lookup.
* Use the Execution Plan in SSMS to see where your bottlenecks are on the database side. Sometimes it will also offer index suggestions.
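A sketch of the second and third points in EF Core's fluent API (SQL Server provider; the names are placeholders, and the real index would list all ~20 columns):

// Inside the DbContext's OnModelCreating:
modelBuilder.Entity<ImportLine>()
    // Composite unique index: the database enforces uniqueness over the
    // whole column combination, with no check-then-insert race.
    .HasIndex(l => new { l.OrderDate, l.CustomerId, l.ProductCode })
    .IsUnique()
    // Included columns: not part of the key, but stored with the index,
    // so common queries can be answered from the index alone.
    .IncludeProperties(l => new { l.Amount, l.Description });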
Enjoy!
Bond
Keep all things as simple as possible, but no simpler. -said someone, somewhere
|
That's impressive
|
Never tried it, as we don't use SQL Server anymore, but it looks very promising
|
I'd like to know more technical details. Assuming 2500 lines, 40 values per line, these values are text? numeric? Just how large is the overall file? What's the connection speed between the web site and the Azure-based system? Does your Azure system have sufficient resources?
Since you have a lot of error handling, per your own admission, I'd think you could add some logging into the mix to see where you are spending your time (a sketch follows at the end of this post).
But this: "Last week, it took 17 tries to import one file" is a smoking gun. Solve the timeout issue, and I'd give it 50/50 your performance issues go away. You're not moving that much data. One other suggestion: double the resources (temporarily) for the Azure system to make sure it's not under-resourced.
Off the cuff thoughts.
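On the logging idea, a rough sketch: a Stopwatch around each phase of the import. The logger and the ParseCsv / InsertLinesAsync / TransformAsync names are hypothetical stand-ins for whatever the real steps are.

using System.Diagnostics;

var stopwatch = Stopwatch.StartNew();
var lines = ParseCsv(stream);          // phase 1: parse the CSV (hypothetical helper)
logger.LogInformation("Parse: {Ms} ms", stopwatch.ElapsedMilliseconds);

stopwatch.Restart();
await InsertLinesAsync(lines);         // phase 2: insert the rows (hypothetical helper)
logger.LogInformation("Insert: {Ms} ms", stopwatch.ElapsedMilliseconds);

stopwatch.Restart();
await TransformAsync();                // phase 3: post-processing (hypothetical helper)
logger.LogInformation("Transform: {Ms} ms", stopwatch.ElapsedMilliseconds);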
Charlie Gilley
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
Has never been more appropriate.
|
charlieg wrote: these values are text? numeric? Text, numbers, decimals, dates.
charlieg wrote: Just how large is the overall file? I have an initial file of 248 KB.
charlieg wrote: What's the connection speed between the web site and the Azure-based system? They both run in Azure in the same region, so I suspect the connection is fast.
charlieg wrote: Does your Azure system have sufficient resources? Yeah, we don't have the fastest database (50 DTUs), but it's plenty sufficient for everything else.
charlieg wrote: see where you are spending your time. Inserting 2500 lines.
await context.BulkInsertAsync(lines); is the exact line
Although I suspect there may be some other long-running queries in production.
Possibly getting all the lines for a specific date.
I'm looking at my new code now (and I added an index), and the inserting takes the longest by far (a rewrite of this functionality was necessary for other reasons too; I just gave it more priority because of this).
charlieg wrote: Solve the timeout issue, and I'd give it 50/50 your performance issues go away. The timeout issue is the performance issue
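For reference, EFCore.BulkExtensions accepts an optional BulkConfig; if I read its API right, BatchSize and BulkCopyTimeout are the first knobs to try when the insert itself is the slow step (the values below are guesses, not recommendations):

using EFCore.BulkExtensions;

var bulkConfig = new BulkConfig
{
    BatchSize = 2500,       // send all ~2500 lines in a single batch
    BulkCopyTimeout = 120,  // seconds; raise it well past the default
};
await context.BulkInsertAsync(lines, bulkConfig);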
|
I suspect your problem is that Entity Framework tries to be "intelligent".
My first guess is that turning off AutoDetectChangesEnabled would solve your problem.
But seriously, I would skip the whole BulkInsert thingy and go directly to SqlBulkCopy instead.
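The AutoDetectChangesEnabled idea in code, as a sketch (the entity name is a placeholder): with change tracking off, a loop of Add calls stops re-scanning the change tracker on every entity.

context.ChangeTracker.AutoDetectChangesEnabled = false;
try
{
    foreach (var line in lines)
        context.Add(line); // no DetectChanges sweep per entity now
    await context.SaveChangesAsync();
}
finally
{
    context.ChangeTracker.AutoDetectChangesEnabled = true;
}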
|
Jörgen Andersson wrote: I suspect your problem is that Entity Framework tries to be "intelligent". Yeah, using "vanilla" EF takes minutes to insert 2500 rows, so that's not an option.
I'm using the EFCore.BulkExtensions library for this one.
Jörgen Andersson wrote: and go directly to SqlBulkCopy instead Wouldn't I have to insert before I can copy?
|
No, you just need to read the CSV file into an IEnumerable<T> of sorts and connect it to an EntityDataReader that you use as input to SqlBulkCopy.
EntityDataReader is part of System.Data.EntityClient.
Or you can use a CSV-Reader[^] that you connect directly to SqlBulkCopy.
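The simplest version of that idea goes through a DataTable; a rough sketch (the parsing is hand-rolled for brevity, and a real CSV reader should handle quoting and escaping; table and column names are placeholders):

using System;
using System.Data;
using System.IO;
using System.Linq;
using Microsoft.Data.SqlClient;

string connectionString = "<your connection string>";

var table = new DataTable();
table.Columns.Add("OrderDate", typeof(DateTime));
table.Columns.Add("Amount", typeof(decimal));
// ... one column per CSV field

foreach (var line in File.ReadLines("import.csv").Skip(1)) // skip the header row
{
    var fields = line.Split(',');
    table.Rows.Add(DateTime.Parse(fields[0]), decimal.Parse(fields[1]));
}

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();
using var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.ImportStaging" };
await bulk.WriteToServerAsync(table);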
|
The BulkInsert is using SqlBulkCopy internally.
Using SqlBulkCopy directly is about equally fast.