Hello!
I've been studying Python for the last few weeks and I'm trying to find a way to use Python to speed up SQL queries using the GPU (CUDA). I was talking with an Oracle engineer at a convention and he said that this kind of improvement was possible with Python, as it has libraries ready to work with the GPU. What I need is to improve query time, for a select or a view. Right now, say, a select we have takes 1 minute to return all rows from our database. I want to use Python to exploit GPU parallelism and improve this query time.

I read that you can use PyCUDA for vector and matrix calculations. What I don't know is whether you can use it, or any other library, to make the SQL query run on the GPU instead of the CPU, as the database engine usually does. The Python layer would sit over the database, integrated with our software, if that helps. I Googled around but didn't find anything specific. Can anyone confirm this? Does anyone have resources I can read, watch, or study on the matter? Any information is helpful! Thank you very much!
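To make the idea concrete, here is a minimal sketch of the pattern GPU array libraries support: the SQL engine still does the scan on the CPU, and only the numeric aggregation is offloaded to a device array. The table and column names are made up for illustration, and NumPy stands in for CuPy so the sketch runs on any machine; on a CUDA box you would swap the import.

```python
# Sketch only: offloading an aggregation (not the SQL scan itself) to the GPU.
# NumPy stands in for CuPy here so this runs without CUDA; on a GPU machine,
# replace the import with: import cupy as xp
# Table/column names (invoices.amount) are hypothetical.
import sqlite3
import numpy as xp  # swap for: import cupy as xp  (requires CUDA + cupy)

def gpu_style_sum(db_path: str = ":memory:") -> float:
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS invoices (amount REAL)")
    con.executemany("INSERT INTO invoices VALUES (?)",
                    [(i * 0.5,) for i in range(10)])
    # 1. Fetch the raw column once -- this part is still CPU/disk bound.
    rows = con.execute("SELECT amount FROM invoices").fetchall()
    # 2. Move the column into a device array and aggregate in parallel.
    amounts = xp.asarray([r[0] for r in rows], dtype=xp.float64)
    total = amounts.sum()   # with cupy, this reduction runs on the GPU
    return float(total)     # float() copies the scalar back to the host

print(gpu_style_sum())  # 0.0 + 0.5 + ... + 4.5 = 22.5
```

Note that the expensive part (the scan and the row transfer) never touches the GPU, which is why whole-engine products exist for this rather than a thin Python layer.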

What I have tried:

I saw that there are a few products that offer this kind of acceleration, like MapD, BlazingDB and Kinetica (I haven't read about them in depth yet, though).
Comments
David_Wimbley 22-Jan-18 16:21pm    
It sounds like you might be using Oracle, since you talked with an Oracle engineer? Unless you are aggregating data for reporting purposes, if you are trying to show massive amounts of data in a UI all at once without pagination, I would rethink what you are trying to do.

But anyway, you can try to speed up your SQL results all day long, but if response time is that big of an issue that you need to run against the GPU, I would be inclined to look at the schema first and see whether the DB server has hardware problems, or review the indexes on the tables to see whether some need to be added or removed.

How many rows does your table have? You indicate a select statement took 1 minute to return all rows in your table... For it to take a full minute it had better have a few million rows. To put this in perspective, I've got a poorly indexed table with over 7 million rows that takes 1 minute to return those rows.

SQL is designed to handle huge amounts of data, but if you are trying to run reports or aggregate that data in a production database, I would either look into setting up a reporting database that lets you do your work with no impact on production, or look into making your indexes more efficient. There is a lot out there about indexes, and since I don't know your exact DB I won't go too far into that, but indexes will help your query speeds greatly.
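The difference an index makes can be illustrated in plain Python: a query on an unindexed column is a full scan that touches every row, while an index is essentially a precomputed lookup structure keyed on the column. This is a toy model, not how a real B-tree index is built, but the asymptotic effect is the same.

```python
# Toy model of what an index buys you.
# An unindexed WHERE clause forces a full scan; an index is a precomputed
# mapping from key -> row, so the lookup stops depending on table size.
rows = [(i, f"customer-{i}") for i in range(100_000)]

def scan_lookup(key):
    # Like "SELECT * FROM t WHERE id = key" with no index: check every row.
    for row in rows:
        if row[0] == key:
            return row
    return None

# Built once up front, like CREATE INDEX (a real DB uses a B-tree, not a dict).
index = {row[0]: row for row in rows}

def indexed_lookup(key):
    # With the index: one probe instead of up to 100,000 comparisons.
    return index.get(key)

assert scan_lookup(99_999) == indexed_lookup(99_999) == (99_999, "customer-99999")
```

The trade-off, as in a real database, is that the index costs memory and must be maintained on every insert.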

All this to say: rather than going and trying to implement SQL GPU acceleration, it is important to evaluate what the true cause of your slowness is, rather than building something on top of it as a band-aid.

The other suggestion is, if you need this kind of speed, maybe look into NoSQL db options.
HSakamoto 22-Jan-18 17:06pm    
First of all, thanks for the reply David!
I would say that I'm 98% sure that our queries are well tuned. To give you some context: we have tax calculation software, so we pretty much need to compute sums and work with views spanning many tables. Most of the time we get results in an acceptable time, but some of our clients insert 5 million rows/day into some tables, and for those with a huge amount of data we would like to at least try to solve this problem with GPU parallelism, instead of beefing up their hardware in general.
Of course this data is used for reports, but also for bookkeeping and for sending to the IRS.
Our software uses C#, but I'm looking for a Python solution, as I was told by the engineer that Python could do that easily (and now I know that it isn't that easy).
And if you know any other way to use the GPU with an SQL database, I would like to hear it, so I can try to implement the solution and run some tests.
Again, thanks for your help and time!
an0ther1 22-Jan-18 20:21pm    
I'm not sure you would see any improvement unless the entire DB engine was running on the GPU. When a query is executed there is the client processing time and the server processing time; since the client passes the query off to the database server for the actual execution, you need to look at where the slowdown is.
High client processing time can be due to a slow network, a poorly performing or low-spec client machine, or the presentation layer itself.
High server time can be caused by insufficient resources to handle a large number of concurrent requests, poor disk performance, inappropriate indexing or poor query design.
I have a table that contains ~2.9 million rows and is approx 2GB in size. A select * on this table takes ~1 minute.
The client processing time is the majority of that; the server wait time is ~9 seconds.

Do some performance monitoring - I typically run a tuning pass every 3-4 months and log all queries that take more than 'x' seconds to complete. These are then tracked down to determine the why.
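The "log every query slower than x seconds" practice above can be sketched with a small wrapper around a DB-API connection. The threshold, logger name, and helper are hypothetical; real deployments would usually use the database's own slow-query log instead.

```python
# Sketch of logging queries that exceed a threshold, via a DB-API wrapper.
# SLOW_THRESHOLD_S and timed_query are made-up names for illustration;
# production systems normally rely on the DB server's slow-query log.
import logging
import sqlite3
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("slow-queries")

SLOW_THRESHOLD_S = 0.5  # the 'x' seconds cutoff

def timed_query(con, sql, params=()):
    start = time.perf_counter()
    result = con.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_THRESHOLD_S:
        # These entries are what you later track down to determine the why.
        log.warning("slow query (%.3fs): %s", elapsed, sql)
    return result

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
print(timed_query(con, "SELECT n FROM t WHERE n > ?", (2,)))  # [(3,), (4,)]
```

Reviewing the accumulated log on a schedule, as described above, is what turns this from noise into a tuning tool.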

Kind Regards

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
