|
My feeling is that the database schema needs redesign - specifically, adding indexes and keys.
If you have a field which is supposed to be unique, define it as a unique key in the database table. The database will then ensure that the field is unique in the table, with no extra work required.
Indexes maintain a sorted structure over one or more columns of a table. Searching an index is much faster than scanning an unindexed table, and best of all - multiple search criteria can be covered by multiple indexes.
IOW, let the database engine do what it's best at.
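To make this concrete, a minimal sketch of declaring a unique key with EF Core's fluent API (assuming an EF Core model; the Customer/Email names are invented for illustration):

using Microsoft.EntityFrameworkCore;

// Illustrative sketch only - entity and property names are invented.
public class Customer
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Customer> Customers => Set<Customer>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // The database now enforces uniqueness of Email - no manual check needed.
        modelBuilder.Entity<Customer>()
            .HasIndex(c => c.Email)
            .IsUnique();
    }
}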
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Ditto to Dan's words.
When you do the redesign, consider having some sort of self-balancing binary tree as records are added, to keep search times fast.
"A little time, a little trouble, your better day"
Badfinger
|
|
|
|
|
jmaida wrote: some sort of self-balancing binary tree as records are added
That's the database engine's worry.
Another thing that would help maintain the database's consistency would be normalizing the database. This uses separate tables to (ideally) ensure that all data is stored only once, using unique keys to link instances. For example, you would have a table containing {country id, country name}, and any other table needing to store a country name would refer to it by the country id. The results of a query must be constructed by reading from multiple tables.
Note that this redesign can be complex, and can in some cases lead to slower retrieval.
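As a rough sketch of that country example (assuming EF Core entities; the names are only illustrative):

// Country names are stored once; other tables reference them by id.
public class Country
{
    public int CountryId { get; set; }
    public string Name { get; set; } = "";
}

public class Supplier
{
    public int SupplierId { get; set; }
    public string Name { get; set; } = "";

    // Foreign key instead of repeating the country name in every row.
    public int CountryId { get; set; }
    public Country Country { get; set; } = null!;
}

A query that needs a supplier together with its country name then joins the two tables (or uses Include in EF).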
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: Indexes maintain a sorted structure over one or more columns of a table. Searching an index is much faster than scanning an unindexed table, and best of all - multiple search criteria can be covered by multiple indexes. Bit of mansplaining there
But yeah, I actually checked my indexes and found one with a very high impact on performance.
Removed an unused one too.
I'm not going to add a unique key, as there are around 20 fields that, together, should be unique.
I feel like that would hurt performance more than it would add to it.
That's why I'm checking it manually: I can search for potentially duplicate data on a specific index, which I need to do anyway to let the user know they're uploading duplicate data (which is actually allowed now, since last month).
Removed some data analysis at this step too.
I'm storing some redundant data so I don't have to analyze it later.
Turns out, with a well placed index, the analysis is instant later on.
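For what it's worth, the duplicate check from above boils down to something like this sketch (entity and column names are made up; it assumes an index over those columns):

// Hypothetical pre-insert duplicate check; fast because (OrderNumber, LineDate)
// is covered by an index.
bool alreadyImported = await context.ImportLines
    .AnyAsync(l => l.OrderNumber == orderNumber && l.LineDate == importDate);

if (alreadyImported)
{
    // Warn the user that (part of) this file has been uploaded before.
}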
|
|
|
|
|
Sander Rossel wrote: Bit of mansplaining there
Please accept my apologies.
I know that you are a developer of some years' experience, but you pressed my "teaching" button, and I tend to over-explain at times.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: and I tend to over-explain at times
Better than the opposite, which I tend to do at times.
|
|
|
|
|
A few other ideas:
* Check for index fragmentation. If it's high, that will hurt performance. In SSMS, right-click an index (or the Indexes folder) and select either Rebuild or Reorganize to see the fragmentation values, then OK to perform the action.
* Even if you need 20 columns to make a unique index, do it as a composite key. That will still perform better than checking manually, and a manual check can also have a race condition between the check and the insert.
* When creating indexes, don't forget about included columns. These are columns that are not part of the index key but are stored with it, so the index stays small and fast while still covering the data you need (see the sketch after this list).
* Use the Execution Plan in SSMS to see where your bottlenecks are on the database side. Sometimes it will also offer index suggestions.
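To make the composite key and included columns concrete, a sketch with EF Core's SQL Server provider (column names are invented; IncludeProperties comes from Microsoft.EntityFrameworkCore.SqlServer):

// Sketch only - lives in your DbContext.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Unique composite index - extend the anonymous type to all columns
    // that together must be unique.
    modelBuilder.Entity<ImportLine>()
        .HasIndex(l => new { l.OrderNumber, l.LineDate, l.ProductCode })
        .IsUnique();

    // Narrow index key, with extra columns included so common queries
    // are covered without touching the table.
    modelBuilder.Entity<ImportLine>()
        .HasIndex(l => l.LineDate)
        .IncludeProperties(l => new { l.Quantity, l.Price });
}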
Enjoy!
Bond
Keep all things as simple as possible, but no simpler. -said someone, somewhere
|
|
|
|
|
|
That's impressive
|
|
|
|
|
Never tried it, as we don't use SQL Server anymore, but it looks very promising
|
|
|
|
|
I'd like to know more technical details. Assuming 2500 lines and 40 values per line: these values are text? numeric? Just how large is the overall file? What's the connection speed between the web site and the Azure-based system? Does your Azure system have sufficient resources?
Since you have a lot of error handling, per your own admission, I'd think you could add some logging into the mix to see where you are spending your time.
But this: "Last week, it took 17 tries to import one file" is a smoking gun. Solve the timeout issue, and I'd give it 50/50 your performance issues go away. You're not moving that much data. One other suggestion, double the resources (temporarily) for the azure system to make sure it's not under-resourced.
Off the cuff thoughts.
Charlie Gilley
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
Has never been more appropriate.
|
|
|
|
|
charlieg wrote: these values are text? numeric? Text, numbers, decimals, dates.
charlieg wrote: Just how large is the overall file? I have an initial file of 248 KB.
charlieg wrote: What's the connection speed between the web site and the Azure-based system? They both run in Azure in the same region, so I suspect the connection is fast.
charlieg wrote: Does your Azure system have sufficient resources? Yeah, we don't have the fastest database (50 DTUs), but it's plenty sufficient for everything else.
charlieg wrote: see where you are spending your time. Inserting 2500 lines.
await context.BulkInsertAsync(lines); is the exact line
Although I suspect there may be some other long running queries in production.
Possibly getting all the lines for a specific date.
I'm looking at my new code now (and added an index) and the inserting takes the longest by far (a rewrite of this functionality was necessary for other reasons too, I just gave it more priority because of this).
charlieg wrote: Solve the timeout issue, and I'd give it 50/50 your performance issues go away. The timeout issue is the performance issue
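For reference, the call with the library's options looks roughly like this (the BulkConfig values are just examples, not a tested recommendation):

// EFCore.BulkExtensions - option values below are illustrative only.
var config = new BulkConfig
{
    BatchSize = 2000,            // rows per round trip
    SetOutputIdentity = false,   // skip reading generated ids back when not needed
    PreserveInsertOrder = false
};

await context.BulkInsertAsync(lines, config);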
|
|
|
|
|
I suspect your problem is that Entity Framework tries to be "intelligent".
My first guess is that turning off AutoDetectChangesEnabled would solve your problem.
But seriously, I would skip the whole BulkInsert thingy and go directly to SqlBulkCopy instead.
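The first guess is a one-liner on the context; a sketch of how it's typically wrapped:

// Sketch: stop EF from re-scanning every tracked entity on each Add.
context.ChangeTracker.AutoDetectChangesEnabled = false;
try
{
    context.AddRange(lines);
    await context.SaveChangesAsync();
}
finally
{
    context.ChangeTracker.AutoDetectChangesEnabled = true;
}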
|
|
|
|
|
Jörgen Andersson wrote: I suspect your problem is that Entity Framework tries to be "intelligent". Yeah, using "vanilla" EF takes minutes to insert 2500 rows, so that's not an option.
I'm using the EFCore.BulkExtensions library for this one.
Jörgen Andersson wrote: and go directly to SqlBulkCopy instead Wouldn't I have to insert before I can copy?
|
|
|
|
|
No, you just need to read the CSV file into an IEnumerable<T> of sorts and connect it to an EntityDataReader that you use as input to SqlBulkCopy.
EntityDataReader is part of System.Data.EntityClient.
Or you can use a CSV-Reader[^] that you connect directly to SqlBulkCopy.
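A minimal sketch of the SqlBulkCopy route with a DataTable in between (table and column names are made up; an IDataReader over the objects would avoid even that intermediate step):

using System.Data;
using Microsoft.Data.SqlClient;

// Sketch: copy the parsed CSV rows straight into the table.
var table = new DataTable();
table.Columns.Add("OrderNumber", typeof(string));
table.Columns.Add("Quantity", typeof(int));

foreach (var line in lines)              // 'lines' are the parsed CSV objects
    table.Rows.Add(line.OrderNumber, line.Quantity);

using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

using var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.ImportLines" };
await bulk.WriteToServerAsync(table);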
|
|
|
|
|
The BulkInsert is using SqlBulkCopy internally.
Using SqlBulkCopy directly is about equally fast.
|
|
|
|
|
Sander Rossel wrote: Yeah, we don't have the fastest database (50 DTUs), but it's plenty sufficient for everything else.
I ran some testing of our application with Azure SQL some time ago and found it to be very poor for an OLAP-style IO-bound workload, and checking a big index for uniqueness is probably similar. 50 DTUs on the Basic or Standard tier is no more than 200 IOPS - DTU-based purchasing model - Azure SQL Database | Microsoft Learn[^]
For us it was better to use a SQL server on an Azure virtual machine; then we could do optimisations like striping the database across multiple smaller (unmanaged) HDDs using Windows Storage Spaces. Nowadays there are probably better managed disk options with SSDs etc. to get you a decent IO performance level. Configure storage for SQL Server VMs - SQL Server on Azure VMs | Microsoft Learn[^]
|
|
|
|
|
50 DTUs on the Standard tier is more than enough for everything else we have.
It's not an OLAP application, it's more administrative.
It has some master data, you can import and edit orders (it's this importing that's causing me headaches) and some one-off reports and other functionality.
The import lines (about 2500 a file) are grouped, some prices are calculated and those prices are added per group in code, which is super fast.
Other than that, the lines are printed on a report.
All in all it's a pretty small application that ties the production and the financial software together.
The database currently stores 3.38 GB of data of which this import table is about 2/3rds (and that's all we do with those lines)
|
|
|
|
|
Inspired by this question, I went to take a more detailed look at where the time to load a 1.2 million line, 0.5 gigabyte, CSV file goes in a particular application that I'm working on. That process takes about 5 seconds too, by some coincidence.
There are three steps: reading the file (all at once), converting the encoding, and parsing the CSV into objects.
- reading the file (File.ReadAllBytes) takes about 150 milliseconds. (so we're reading the file at 3.3GB/s, which is reasonable, near the maximum sequential read speed of the SSD)
- converting the encoding (aka Encoding.GetString) takes about 900 milliseconds. (at 5GHz, this comes down to 9 cycles per byte, should it take 9 cycles per byte to do encoding conversion? sounds slow to me)
- parsing / creating objects takes the rest of the time. (this code is crap and takes at least 10 times as long as it should)
There's no database involved. Loading 2k rows in the same time basically comes down to saying that using a database makes loading the data over 500 times as slow - compared to code that is slow already. Is that reasonable? Databases are infamous for being pure molasses, but that sounds extreme.
On the other hand, it's only 5 seconds, not a big deal.
|
|
|
|
|
harold aptroot wrote: load a 1.2 million line, 0.5 gigabyte, CSV file
harold aptroot wrote: Databases are infamous for being pure molasses, but that sounds extreme. Well, at 2500 lines and 40 values per line, it comes down to storing 100,000 values in the correct place.
Storing the CSV directly takes a few hundred milliseconds.
The database actions are the only slow lines of code here.
Reading the CSV and putting it to objects is milliseconds (and that's using a general CSV parser that uses reflection).
harold aptroot wrote: using a database makes loading the data over 500 times as slow I guess that sounds about right.
Depending on how fast your DB server is, of course
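(For anyone curious, such a reflection-based reader boils down to something like this sketch; real code also needs culture-aware conversion and error handling.)

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch of a generic CSV-to-object mapper; assumes the header row matches
// the property names and ';' as separator.
static List<T> ReadCsv<T>(string path) where T : new()
{
    var rows = File.ReadAllLines(path);
    var props = rows[0].Split(';')
        .Select(h => typeof(T).GetProperty(h.Trim()))
        .ToArray();

    var result = new List<T>();
    foreach (var row in rows.Skip(1))
    {
        var values = row.Split(';');
        var item = new T();
        for (int i = 0; i < props.Length; i++)
            props[i]?.SetValue(item, Convert.ChangeType(values[i], props[i]!.PropertyType));
        result.Add(item);
    }
    return result;
}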
|
|
|
|
|
What David said. I have similar requirements for updating a huge table (or set of tables) with a relatively small CSV dataset. I put the CSV into a separate table, do the data validation (like duplicate checks) in SQL with indexed columns on the duplicate criteria, then perform the resulting updates/inserts in C# and email the customer with the change log. Very, very fast.
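A sketch of that pattern (table names and the duplicate criteria are placeholders; 'bulk' is a SqlBulkCopy pointed at the staging table):

// Sketch: bulk-load the CSV into a staging table, then let SQL do the
// duplicate check and the final insert in one set-based statement.
await bulk.WriteToServerAsync(stagingTable);

const string sql = @"
    INSERT INTO dbo.ImportLines (OrderNumber, LineDate, Quantity)
    SELECT s.OrderNumber, s.LineDate, s.Quantity
    FROM dbo.ImportStaging AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM dbo.ImportLines AS t
        WHERE t.OrderNumber = s.OrderNumber
          AND t.LineDate = s.LineDate);";

await context.Database.ExecuteSqlRawAsync(sql);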
|
|
|
|
|
How do you handle validation towards your users?
A user imports a file and wants to know why it failed (e.g. some value is invalid or the file was already (partly) imported).
|
|
|
|
|
Well, since these updaters run as an automated process, the customer gets an email.
Internally, all errors, warnings, and performed actions are logged, so if it were a manually initiated process, we could provide a full report to the user on the website as well.
|
|
|
|
|
Your answer is right here on CP: A Fast CSV Reader[^]
That article is not written by me, but I've used this component since sometime way back in 2008 or so.
It has been updated along the way.
It is a fantastic and amazingly fast component (see the perf stats in the article).
It's super easy to use and will solve your problems fast. I remember getting it to work in about 15 minutes. Import the component into your project and the API calls are intuitive.
Seriously try it and I'm sure it'll make you happy.
Let me know what you think.
Here's a snippet of the performance from the article:
From article: To give more down-to-earth numbers, with a 45 MB CSV file containing 145 fields and 50,000 records, the reader was processing about 30 MB/sec. So all in all, it took 1.5 seconds! The machine specs were P4 3.0 GHz, 1024 MB.
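From memory, usage is roughly this (the namespace is LumenWorks.Framework.IO.Csv; field and file names are placeholders):

using System.IO;
using LumenWorks.Framework.IO.Csv;

// Sketch from memory: the reader streams the file and implements IDataReader,
// so it can even be handed straight to SqlBulkCopy if you skip the object step.
using var csv = new CsvReader(new StreamReader("orders.csv"), true);  // true = file has headers
while (csv.ReadNextRecord())
{
    string orderNumber = csv["OrderNumber"];   // access fields by header name
}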
|
|
|
|
|
I'm pretty sure it's a great tool, but it's not the CSV I'm having trouble with.
I get all my values in objects in mere milliseconds with my own generic CSV reader.
Besides, this project uses .NET Framework 2.0, which is like 15 versions, two complete overhauls and twenty years behind my version of .NET
My problem is getting all those values into a database in an acceptable time frame (and 30+ seconds is not acceptable).
|
|
|
|
|