I am importing a huge amount of data from CSV files into MS SQL Server 2008. I am using core JDBC (without any ORM framework) and communicating with the DB through the 'sqljdbc4.jar' driver provided by Microsoft.

As of now I am importing the records one by one, which takes a considerable amount of time even though I am using a PreparedStatement.

I am not using batch import since I need to log the exact error for each record into the error file.
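
(For what it's worth, JDBC batching does not necessarily mean giving up per-record error details: if executeBatch() fails, BatchUpdateException.getUpdateCounts() reports which statements in the batch failed, although whether the driver keeps processing after the first failure is driver-dependent. A minimal sketch of that idea, assuming a single-table INSERT; the column names and the logError helper are illustrative, not part of the original code:)

// Hedged sketch, not the original code: batch the inserts and map failures back to CSV rows.
// Requires java.sql.* and java.util.*; logError(...) is a hypothetical helper writing to error.csv.
List<Map<String, String>> pending = new ArrayList<Map<String, String>>();
for (Map<String, String> csvRecord : csvAsList) {
   pstmt.setInt(1, Integer.parseInt(csvRecord.get("FIELD1")));   // illustrative columns
   pstmt.setString(2, csvRecord.get("FIELD2"));
   pstmt.addBatch();
   pending.add(csvRecord);
   if (pending.size() == 500) {                     // flush every 500 rows
      flushBatch(pstmt, new ArrayList<Map<String, String>>(pending));
      pending.clear();
   }
}
flushBatch(pstmt, pending);                         // flush the remainder

private void flushBatch(PreparedStatement pstmt, List<Map<String, String>> rows) {
   try {
      pstmt.executeBatch();
   } catch (BatchUpdateException bue) {
      int[] counts = bue.getUpdateCounts();         // one entry per statement the driver processed
      for (int i = 0; i < counts.length; i++) {
         if (counts[i] == Statement.EXECUTE_FAILED) {
            logError(rows.get(i), bue);             // the exact CSV row that failed
         }
      }
      // Rows beyond counts.length were never attempted; they could be retried one by one.
   } catch (SQLException e) {
      for (Map<String, String> row : rows) {        // driver aborted the whole chunk
         logError(row, e);
      }
   }
}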

Kindly suggest an idea to improve the performance without sacrificing the accurate error logging. And note that I am forced to do this without any ORM tools.

Here is the sample code:

public void importCSV(){
   // Create a DB connection if it is null or closed.
   // Create PreparedStatement objects for the selects and inserts if null or closed.

   for (Map<String, String> csvRecord : csvAsList) {
      // Prepare a Category object from csvRecord.
      // Import files can have up to 100,000 records, so tracking errors is critical.
      Category existing = null;
      try{
         // Check whether the category already exists in the DB.
         existing = categoryDAO.findByName(categoryName,<PreparedStatement object>);
      }
      catch(Exception exp){
         // log this to the error.csv file
      }
      // If it is a new category, import it into the DB.
      if (existing == null) {
         try{
            categoryDAO.insert(category,<PreparedStatement object>);
         }
         catch(Exception exp){
            // log this to the error.csv file
         }
      }
   }
   // Close the PreparedStatement objects
   // Close the DB connection
}

public Category insert(Category category, PreparedStatement pstmt) throws SQLException{

   if (category == null) {
      return null;
   }
   ResultSet rs = null;
   try {
      // Bind the parameters and run the INSERT.
      pstmt.setInt(1, category.getField1());
      pstmt.setString(2, category.getField2());
      int result = pstmt.executeUpdate();
      if (result < 1) {
         return null;
      }
      // Read back the generated identity value.
      rs = pstmt.getGeneratedKeys();
      if (rs.next()) {
         category.setId(rs.getInt(1));
      }
   } finally {
      if (rs != null)
         rs.close();
   }
   return category;
}
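
(A side note on the code above: for getGeneratedKeys() to return the new identity value, the PreparedStatement has to be created with generated-key retrieval enabled. A minimal sketch of how the statement is presumably prepared elsewhere; the SQL itself is illustrative:)

// Assumption: this is how the insert PreparedStatement is created elsewhere in the code.
// The table and column names are illustrative.
PreparedStatement insertPstmt = connection.prepareStatement(
      "INSERT INTO Category (Field1, Field2) VALUES (?, ?)",
      Statement.RETURN_GENERATED_KEYS);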


Note:
The time taken for inserting 42,390 records plus (8 * 42,390) select operations is about 6.30 minutes. Is that a good figure, or is there still a way to improve the performance further?
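
(One thing that might reduce the select load, assuming the same category names repeat across records and that findByName returns null when a name is not found: keep the categories already found or inserted in an in-memory map, so each distinct name causes at most one SELECT. A rough sketch reusing the DAO names from the code above; the cache itself is an addition:)

// Hedged sketch: an in-memory cache in front of findByName/insert.
// categoryDAO, findByName and insert follow the names used above; the map is new.
private final Map<String, Category> categoryCache = new HashMap<String, Category>();

private Category resolveCategory(String categoryName, Category candidate,
      PreparedStatement selectPstmt, PreparedStatement insertPstmt) throws SQLException {
   Category cached = categoryCache.get(categoryName);
   if (cached != null) {
      return cached;                                // no database round trip at all
   }
   Category existing = categoryDAO.findByName(categoryName, selectPstmt);
   if (existing == null) {
      existing = categoryDAO.insert(candidate, insertPstmt);
   }
   categoryCache.put(categoryName, existing);
   return existing;
}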


Thanks in advance.
Posted
Updated 21-Sep-12 3:11am
v2
Comments
Ashraff Ali Wahab 21-Sep-12 10:05am    
You can split the list into chunks of a configurable size and use threads to insert the data.
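
(A rough sketch of that suggestion; the chunk size, the thread count and the importChunk helper are illustrative, and each worker needs its own Connection and PreparedStatements because a JDBC Connection is not safe for concurrent use:)

// Hedged sketch of the "split the list and use threads" idea.
// Requires java.util.concurrent.*; importChunk(...) is a hypothetical helper that runs the
// existing per-record logic with its own Connection and its own error log.
int chunkSize = 5000;
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<?>> futures = new ArrayList<Future<?>>();

for (int start = 0; start < csvAsList.size(); start += chunkSize) {
   final List<Map<String, String>> chunk =
         csvAsList.subList(start, Math.min(start + chunkSize, csvAsList.size()));
   futures.add(pool.submit(new Runnable() {
      public void run() {
         importChunk(chunk);      // each thread uses its own Connection
      }
   }));
}
for (Future<?> f : futures) {
   try {
      f.get();                    // wait for all chunks; surfaces worker exceptions
   } catch (Exception e) {
      // log and continue with the remaining chunks
   }
}
pool.shutdown();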

1 solution

 
 
Comments
pasztorpisti 21-Sep-12 14:06pm    
+5, bulk insert is nice magic. However, I would mention a funny story: on one machine the regional/language settings used a comma (',') as the decimal point. For this reason Excel saved the floating point data with ',' as the decimal point in the CSV file (just like the field separators, of course). The saved data was corrupted so badly that even Excel was unable to load it back correctly from the file, and we managed to mess up the database as well. Plus a few hours of manual work to clean up. :-)
Mehdi Gholam 21-Sep-12 14:15pm    
Cheers!
leejoyprakash 26-Sep-12 8:38am    
Thank you Mehdi Gholam for your answer. This is really a useful piece of information.

Sorry that I forgot to explain my CSV file structure; in my case, the contents of one CSV file do not belong to a single table.
The CSV file is as follows:

COL 1,CAT 1,CAT 2,CAT 3,CAT 4,ITEM,DESCRIPTION
Val 1,Val 2,Val 3,Val 4,Val 5,Val 6,Val 7

The requirements are as follows:

1. COL 1 belongs to table1. First I need to check whether it exists; if not, skip the record and track this in Error.csv.
2. CAT 1 to CAT 4 belong to table2. Check whether each of these exists; if not, import it into table2.
3. ITEM & DESCRIPTION belong to table3. Check whether they exist; if not, import them into table3.

So I assume that it is not possible to do a bulk import for a CSV whose values belong to more than one table. Please correct me if I am wrong.

Thanks once again for your answer which really was a new idea for me.

Thanks
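
(On the multi-table question above, one pattern that is sometimes used with SQL Server: bulk-load the raw CSV into a single staging table first, then distribute the columns into the real tables with INSERT ... SELECT and do the existence checks in SQL. A rough sketch of one possible approach, not necessarily what the accepted answer suggested; the table and column names are purely illustrative, and BULK INSERT needs the file path to be visible to the SQL Server machine:)

// Hedged sketch: load everything into a staging table, then split it out in SQL.
// StagingImport, table2 and the column names are illustrative, not from the thread.
Statement stmt = connection.createStatement();

// 1. Bulk-load the raw CSV (the path must be accessible to the SQL Server instance).
stmt.executeUpdate(
      "BULK INSERT StagingImport FROM 'C:\\imports\\data.csv' " +
      "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)");

// 2. Insert categories that do not exist yet (CAT 1 shown; repeat for CAT 2..CAT 4).
stmt.executeUpdate(
      "INSERT INTO table2 (CategoryName) " +
      "SELECT DISTINCT s.Cat1 FROM StagingImport s " +
      "WHERE NOT EXISTS (SELECT 1 FROM table2 t WHERE t.CategoryName = s.Cat1)");

// 3. Rows whose COL 1 value is unknown can be copied to an error table or error.csv instead.
stmt.close();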

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


