Click here to Skip to main content
15,885,365 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Originally I made a compare statement that works well. It Full joins my tables based on their table ID's. then I import an excel sheet into its own table and then compare both. However, I have not imported all my data yet. When doing research I found that if i am filling a table or tables in a data set then each table could only consist of 16,777,216 rows. However, the issue is my database is for genealogy and the SSDI (Social Security Death Index) is over 90 million names alone.

I need to be able to Full join all my tables that are in the database so i can make a true comparison when matching. mainly, I need to take the external table, the excel import, and compare this against the entire database. How can I compare my data up against hundreds of millions of individuals?

temptable.Reset()
               'first make a connection string to your excel file, if you want to search for multiple excel files to pick from.
               Dim excelConnection As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + files.FileName + ";Extended Properties=Excel 12.0;")
               Dim excmd As New OleDbCommand("Select * from [Sheet1$]", excelConnection)
               excelConnection.Open()

               Dim sheet1 As String = excelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing).Rows(0)("TABLE_NAME").ToString()
               'Dim dtExcelData As New DataTable()
               'dtExcelData.Reset()
               Dim oda As New OleDbDataAdapter((Convert.ToString("SELECT * FROM [") & sheet1) + "]", excelConnection)
               oda.Fill(temptable)
               excelConnection.Close()

               'dipose
               excelConnection.Dispose()
               oda.Dispose()
               excmd.Dispose()


then I insert using sqlbulkinsert from the temp table, but looking at this now since its a temp table I do not think I need this. I would just use the in memory table and not make this part of the database.

next I Full join all my tables

'connect to
      Dim sqlconn1 As New SqlConnection("SQLCON")
      Dim sqladaptor = New SqlDataAdapter

      sqlconn1.Open()
      sqladaptor.SelectCommand = New SqlCommand("Select Profile.Individual_ID, First_name, Middle_name, Last_name, Sex, Race, Place_of_birth, County_of_birth, State_of_birth, Date_of_birth, death.Death_ID, Place_of_death, State_of_death, County_of_death, Date_of_death, Cause_of_death, Cemetery_reference.Cemetery_ID, Cemetery_name, Cemetery_state, Cemetery_county, Cemetery_address, Cemetery_Section, Cemetery_Row, Cemetery_Lot, Grave, Burial_date from Profile full join Death On profile.Individual_ID = death.individual_ID full join Burial On Profile.Individual_ID = Burial.Individual_ID full join Cemetery_reference On Burial.Cemetery_ID = Cemetery_reference.Cemetery_ID ORDER BY Last_name ASC, First_name ASC, Date_of_birth ASC", sqlconn1)
      sqladaptor.SelectCommand.CommandTimeout = False
      sqladaptor.SelectCommand.ExecuteNonQuery()
      sqladaptor.Fill(dt)
      sqlconn1.Close()
      sqlconn1.Dispose()
      sqladaptor.Dispose()



next I rearrange my view of the temp table

Dim view As New DataView(temptable)
       view.Sort = "Last_name ASC, First_name ASC, Date_of_birth ASC"
       Dim temptab As DataTable = view.ToTable
       temptable.Clear()
       temptable = temptab



then i do my comparisons row by row

For I = 0 to temptable.rows.count
    For i2 = 0 to dt.rows.count
       'match code
    next
next


one issue that arises is the fact that a person can be buried multiple times. I am working on showing these exhumations and reburials but each of these with an inner join would create a new row. however, the burial date will now be different than the original burial date because they were reburied unless cremated after.

so far the only constants are Date of birth and date of death. you can only be dead once and be alive once.

First_name	Middle_name	Last_name	Gender	Date_of_death	Date_of_birth	Place_of_birth
WILLIAM	THAMES	HAMMES	M	1/1/1957	1/1/1900	New York
JAMES	W	ODONNELL	M	1/2/1957	1/2/1900	New York



the code will determine if the temp table has any relevant column to any of the tables that are used in the join statement. If a record exists for that individual in the table it will update missing information. If a record does not exist then a new row will be added with the relevant information related to each table.

hopefully this additional information will help out.

What I have tried:

As of right now I have done a inner join all database tables and then compared them to a bulksql inserted temp excel table. I used sqldataadapter.fill method for both the database tables and the external data. I just need help being pushed in the right direction that would show the best method to achieve what I am looking for.
Posted
Updated 16-Jun-20 10:01am
v7
Comments
Garth J Lancaster 15-Jun-20 1:30am    
I think you should modify (use 'Improve question') your question and give a few more details- eg, 'which' DB server are you using, MySql, SQL Server etc, Maybe 'abridged' schemas etc. It is very hard for 'us out here' to give you an answer based on what you have so far.

I could say 'indexing'/partitioning and/or creating views over your data as a very general technique but likely wrong approach because there isnt enough information in what you have supplied to refine an answer
Maciej Los 15-Jun-20 3:25am    
What database?
Maciej Los 17-Jun-20 4:50am    
This line (underlined part) is very strange:
Dim oda As New OleDbDataAdapter((Convert.ToString("SELECT * FROM [") & sheet1) + "]", excelConnection)

You don't need to convert string to string, because it is a string already!

What data is stored in an Excel file?

1 solution

Because you have mentioned both "bulksql" and "sqldataadapter" I'm hoping it is safe to assume that you are using a SQL Server database.

If that is the case, then there is no hard and fast rule for the size of a table, it will depend on many factors - see Maximum capacity specifications for SQL Server - SQL Server | Microsoft Docs[^]

The only place I have seen the number you quote is in regard to MongoDb BSON objects.

You say you want to be pushed in the right direction but haven't given any clear guidance on what you are comparing or joining so at best you will receive vague answers that probably won't help you at all.

- You are referring to multiple tables in your question - if you are creating tables to overcome the perceived size limitation then that is the wrong design.

- Load the data into the database in batches if that is easier - I would use bulksql for the reference data as that is a one-off exercise. Use sqldataadapter.fill for the Excel data, or bulksql again.

- When querying the database let the SQL Server take the strain - the results you are getting back to VB.net should just be a small subset of the actual data.

- Consider writing stored procedures to handle the matching process.

Beyond this, there simply isnt' enough information here for us to help you properly
 
Share this answer
 
Comments
Member 11856456 15-Jun-20 19:47pm    
I appreciate the post, could you describe what would be needed to help give clarification? I will do my best to update if I knew what was needed.
CHill60 16-Jun-20 6:14am    
As per the comments from Garth and Maciej ...
- Confirm that you are using a SQL Server database (or tell what you are using).
- What do the tables you are trying join look like - just include the relevant columns and any indexes you have set up
- What does the "temp" table look like
- Share the code that you have for the "join"
- Perhaps some sample data (not too much) and expected results to help us understand what you are expecting to see
Member 11856456 16-Jun-20 15:57pm    
I have updated the information including the sections asked about. Hopefully, this will give some better clarification.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900