|
Although it used SQL as the backend, I remember a Customer Relationship Management system called Maximiser that took a similar approach. There were, ISTR just two tables, one to hold all the relatively constant client data itself and one to hold the collection of notes linked to that.
In the Maximiser app there were complex joins on one table producing 'subtables' that held various views on the data. Some columns contained numbers that indicated what other columns actually held! I was given the job of moving all the data held in this system to another SQL based program.
It took ages (in the absence of any database schema documentation) to unravel the various actual combinations of joins required to get what we wanted. Here's just one query to extract a little of the info: All the tables named as a, b, c, d etc duplicate joins used in 'built-in' queries on the maximiser database.
I thought you might find an example of the stuff I had to build mildly amusing 8)
-- Build the View of the Maximiser data that shows what we want and store it
SELECT
CASE
WHEN c.Record_Type = 1 THEN c.Name
WHEN c.Record_Type = 31 THEN d.Name + ' - ' + c.First_Name + ' ' + c.Name
WHEN c.Record_Type = 2 AND len(c.Firm) > 0 THEN c.Firm
WHEN c.Record_Type = 2 AND len(c.Firm) < 1 THEN c.First_Name + ' ' + c.Name
WHEN c.Record_Type = 32 THEN
(
CASE
WHEN len(d.Firm) > 0 THEN d.Firm + ' - ' + c.First_Name + ' ' + c.Name
WHEN len(d.Firm) < 1 THEN d.First_Name + ' ' + d.Name + ' - ' + c.First_Name + ' ' + c.Name
END
)
ELSE c.Name
END AS Company,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Address_Line_1
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Address_Line_1
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Address_Line_1
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Address_Line_1
ELSE c.Address_Line_1
END AS Address_1,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Address_Line_2
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Address_Line_2
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Address_Line_2
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Address_Line_2
ELSE c.Address_Line_2
END AS Address_2,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.City
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.City
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.City
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.City
ELSE c.City
END AS City,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.State_Province
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.State_Province
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.State_Province
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.State_Province
ELSE c.State_Province
END AS State,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Zip_Code
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Zip_Code
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Zip_Code
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Zip_Code
ELSE c.Zip_Code
END AS Zip,
CASE
WHEN c.Address_Id > 0 AND c.Record_Type IN (1, 31) THEN e.Country
WHEN c.Address_Id < 1 AND c.Record_Type = 31 THEN g.Country
WHEN c.Address_Id > 0 AND c.Record_Type IN (2, 32) THEN f.Country
WHEN c.Address_Id < 1 AND c.Record_Type = 32 THEN g.Country
ELSE c.Country
END AS Country,
CASE
WHEN n.Type = 0 THEN 'Manual Note'
WHEN n.Type = 1 THEN 'Mail - Out'
WHEN n.Type = 2 THEN 'Phone Call'
WHEN n.Type = 3 THEN 'Timed Note'
WHEN n.Type = 4 THEN 'Transfer'
WHEN n.Type = 5 THEN 'Task'
WHEN n.Type = 6 THEN 'Reserved'
WHEN n.Type = 7 THEN 'Reserved'
WHEN n.Type = 8 THEN 'Opportunity'
WHEN n.Type = 12 THEN 'Customer Service'
ELSE 'Unknown'
END AS Activity_Type,
n.DateCol, n.TextCol, n.Owner_Id, n.Client_Id, n.Contact_Number, n.Note_Type,
' ' AS sndex
INTO BookerNotes
FROM
dbo.AMGR_Client_Tbl AS c
LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS d ON c.Client_Id = d.Client_Id AND d.Contact_Number = 0
LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS e ON c.Client_Id = e.Client_Id AND c.Address_Id = e.Contact_Number
LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS f ON c.Client_Id = f.Client_Id AND c.Address_Id = f.Contact_Number
LEFT OUTER JOIN dbo.AMGR_Client_Tbl AS g ON c.Client_Id = g.Client_Id AND g.Contact_Number = 0
RIGHT OUTER JOIN dbo.AMGR_Notes_Tbl AS n ON c.Client_Id = n.Client_Id AND c.Contact_Number = n.Contact_Number
WHERE c.Record_Type IN (1, 2, 31, 32)
GO
|
|
|
|
|
Ooof, that would be nasty to maintain!
|
|
|
|
|
I just saw something similar the other day with additional complications.
The developer that inherited was trying to figure out how to make it a little more maintainable by removing some conditions that were nonsensical and others that were just bad hard codes.
|
|
|
|
|
old posting, but reddit had Key Value, then moved to what sounds like a few tables which are thing/data, so basically instead of just 1 key/value table, its many more key/value tables
|
|
|
|
|
yes, it has limited documentation ....
diligent hands rule....
|
|
|
|
|
I will double check the table column number and have very limited documents...
diligent hands rule....
|
|
|
|
|
Southmountain wrote: it has hundreds of fields
Presumably you mean 'columns'...
Southmountain wrote: what is the best way to get understanding of this table quickly?
It is unlikely there is a way to do it quickly. The number of columns suggest it is probably overloaded so there are multiple uses. Best you might be able to do quickly is determine how the data is created in the first place. And that would only be true if it is just a batch load.
|
|
|
|
|
I found MS Access and Excel, with some SQL management studio, good enough for "data analysis".
Access and Excel can connect to SQL server. You can then tap into their analytics and query ability.
There's also MS Power BI (Desktop), to top it off.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
following your ideas, I will try to load it into an Excel pivot table and play around with it...
diligent hands rule....
|
|
|
|
|
Having done this sort of thing in the past (and yes it was for the banking industry) you are going to need someone with domain knowledge, making an incorrect assumption on the relevance/relationship of a column can lead you down some nasty cul de sacs.
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP
|
|
|
|
|
Delete a column, see who complains and then get them to explain what it's for.
// TODO: Insert something here Top ten reasons why I'm lazy
1.
|
|
|
|
|
see if you can get an input screen (or a few) and some reports and open/run it for a specific record. Next you need to try and see if you can match the data for a specific record to fields on the input screens or reports. That will give you a good understanding of how some of the fields fit together.
|
|
|
|
|
Southmountain wrote: what is the best way to get understanding of this table quickly?
There isn't one. There are tools and scripts for most DBs that can create some type of analysis for you, but you really don't need to understand the fields, you need to understand the data.
We can all grouse and speculate about the "100s of fields", but let's assume there is a valid reason for them even though I'm hard pressed to come up with one.
What type of understanding are you trying to achieve? Data is data and the question is if and where it is used. I'd suspect there could be a lot of drop columns in your future, but that requires a detailed look at your recordset objects in the code that is using the DB. Honestly, it's a flat table so despite the crazy column count, it should be clear to understand. If it has a bunch of relations, that could take a lot of caffeine or alcohol, or both
What's your scope of work in relation to this monster? Crazy as it seems, the DB could be oddly efficient depending on the use of the data. You know, Select * (perish the thought!) from tablename where id=x is pretty simple, lol. If your task is to clean up and reduce the size of the database, that is one thing. If you're stuck with it, it is what it is and how the data is used is of the utmost importance.
|
|
|
|
|
Understand ‘quickly?’: probably not possible.
Step back. Look at the application and the interfaces that update the table. Depending on the database tech, there’ll be a way to search procedures for the table name. Study these procedures.
Then (or while doing the above) look at a subset of the data, such as the last day’s worth of records.
Good luck.
Time is the differentiation of eternity devised by man to measure the passage of human events.
- Manly P. Hall
Mark
Just another cog in the wheel
|
|
|
|
|
This mostly applies if it's a relational table. There a lot of excellent responses here, but one topic that I didn't see explicitely mentioned was "Normalization".
Normalize the data. You might need to have a sample of data to work(play) with to understand.
There are better resources to explain normalization further, and *some* SQL books hit the topic early (the good ones, anyway).
Normalization is at the root of what many here have indicated are problems - like too many columns in individual tables! You likely have key fields and data fields repeating all through your data. Even if you can't change the tables, understanding how it isn't normalized will help you understand the data better. (Maybe a future task will let you fix this)
> You'll need to understand the domain. It would be great if you have some people that can get you pointed in the best area. It sounds like there is a lot more complexity than can be dealt with quickly.
> Maybe there's a program that uses that table that might reveal columns content or purpose?
> Review a small sample of data in a dev/work database.
> For longer-term goals
>try to use the right terms too, especially when consulting your database expert. e.g. It might be habit to say 'fields', but in the relational database world it's a 'column'. Stuff like that will make communication more consistent.
> Maintain a group of documents, references for yourself of what you're finding - and share it.
> Topics and terms to be familiar with, or review:
> the different between a data table versus a reference table
> A Many-to-Many relationship and a "Join Table" (sometimes called a "Pin Table" etc. ...but not by me)
I didn't intend to write a book here! You have too much to read here already! I wish you success!
|
|
|
|
|
Others have mentioned researching input screens/forms and reports.
If you can capture the queries/views/procedures from the database side that are involved that could help out.
|
|
|
|
|
I work for a finance institution, and our tables are highly un-normalized with all the problems this incurs.
we have many tables with hundreds of columns.
But people in finance are used to excel like spreadsheet and they want to see everything at one place.
I came to that place with a background in databases and I was horrified and still am.
But it is hard to change legacy code AND mentalities. So for better or worse I'm stuck with it.
The best advice I can give you: find the queries people/programs make the most often, this often reveal the «subtables» within the big one.
Also if you have self join that often indicate two tables (or more) sharing the same table data.
Once you have that draw a diagram as if those tables really exist.
If you can create views that represent those tables.
This way you will better understand the «schema»
Quote: «Il faut imaginer Sisyphe heureux»
Albert Camus
|
|
|
|
|
the engineer job description. I loved it.
Also the thought that jumped in my head.
Data Scientist.
Forecasts the future using neither data nor science.
See also, Weather Forecaster
Project Manager
Not a Manager of people nor do they have only one project
To err is human to really elephant it up you need a computer
|
|
|
|
|
data scientist is a fancy term now...
diligent hands rule....
|
|
|
|
|
I bet you really like the titles:
1. DevRel (Developer Relations)
2. Developer Advocate
Here's the meme I created for DevRel (snapshot)[^].
|
|
|
|
|
Development Manager:
Safety device that operates to provide deadline-overpressure protection of a VP and CEO. If project fails because of the high circle's stupidity, those melts.
Advertise here – minimum three posts per day are guaranteed.
|
|
|
|
|
Yes, but ... weather forecasts use heavy computer models etc
it seems to me that a lot of real science is in there actually
so this metaphor could be a bit stretched ...
|
|
|
|
|
They are calling me (phone) and sending me e-mails every day.
Opening their first mail using Notepad++ and removing all \0 chars, I've seen they are trying to make me participate in a free training course to learn how to sell their products better.
A guy calling from Washington USA told me I have been so good selling their products I deserve to participate in their PARTNER SALES ACCELERATION SERVICES!
I declined his invitation during that first call, but they keep calling and sending mails.
I HAVE 2 OFFICE LICENCES!!!
|
|
|
|
|
Are you sure that it is Microsoft, and not some scammer trying to get your money? The latter being much more likely.
|
|
|
|
|
Why can't it be both?
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|