Click here to Skip to main content
15,887,302 members
Articles / Database Development / MySQL
Tip/Trick

Get a Customized Table Structure With Non Null Fields Using Dynamic SQL

Rate me:
Please Sign up or sign in to vote.
4.00/5 (1 vote)
9 Apr 2020CPOL3 min read 4.4K   3  
In Data Engineering, supporting Data Science, Data Mining and Reporting tasks, it is useful to get only fields that have data. We don't mind nulls values and we are grateful if this field's structure is dynamic. These ones might be a stat in the set of empty's kingdom.
How I can get back from query only not null fields? Yes, you can do it easily with a modern programming language, but you may lose resources by getting redundant information from tables and, many times, it could add redundant code. This task is only oriented to solve in MySQL since version 5.3.

Introduction

You have a lot of information with a tons of spreadsheets, csv, xml, bin, logs or plain text files and after a lite oversight, there are several columns inside the tables that have no values or, of course, NULL values. For desktop users, it could be only an annoyance. On another hand, for Cloud users, you may not have cheap resources and costs only pay low bill. And for DBA, those NULLS are a lot of space of cent 'spents' that have no data. But the fact is that ETL jobs only bring data, not guilty here. The present article solved this situation using a very high popular MySQL database engine, for purpose of didactical way.

Background

I present an example of the situation with the help of a database called cinemashop, based on a streaming video store service company. It is running on MySQL, but it would be functional also for MariaDB and Percona forks. Work further could be advanced from here for these ones.

Using the Code

Inside cinemashop database, there is a customer table with several columns as the row size can allowed put there (at date, over 1000 columns).

Image 1

MySQL ETL jobs put the information there and only it is necessary to take data from columns with no null values. The next figure shows a small content of the first registers:

Image 2

So, it is worthy checking if all the registers in the field have at least solely one data. This one data alone could change the model of the datalake / datawarehouse / reporting service input, may be 1% of the population. Due to this situation, the following procedure called check_field_null generates this behaviour. The procedure uses PREPARE, EXECUTE and DEALLOCATE statements to manage SQL dynamically. It stores the result of the COUNT scalar function in QN variable. Thus, it is possible to get the value of how many register are NULL values.

SQL
/*======================================================
    Classroom:    SQL Level 3
    Database:    cinemashop
  ======================================================*/
use cinemashop;
drop procedure if exists check_field_null;

set delimiter //

create procedure check_field_null(col varchar(64), schemaname varchar(255), _
                                  tablename varchar(255), out QN int)
BEGIN
    SET @sSQL = concat('SELECT  @N := COUNT(*) FROM ', schemaname, '.', _
                        tablename , ' WHERE (', col, ' <=> NULL);');
    prepare stm from @sSQL;
    execute stm;
    set QN =@N;
    deallocate prepare stm;
    
END
//

set delimiter ;

The following code illustrates the action of the above mentioned function. With respect to customer_id field, it shows that there are values.

SQL
set @p = 0; 
call check_field_null('customer_id', 'cinemashop', 'customer', @p); 

/* This field customer_id has NOT null values */ 

select @p; 

Image 3

By another hand, if a field like gender has only NULL values, the expected result is checked by the counter:

SQL
set @p = 0;
call check_field_null('gender', 'cinemashop', 'customer', @p);

/* This field gender has null values */

select @p;

Image 4

Finally, the work consists of getting the field name of each column in the table. Today, it could be a data structure, but tomorrow, it could have changed on demand, and, of course, NULL values may be presented there. This way uses CURSOR technique inside a stored procedure called cur_cs_customer. It only works if information_schema is allowed to fetch, so, some permissions will be required there. Note that Count_Null variable is used in the same way of our previous p variable. Also, allcols variable stores all fields from each column in the MYCOL variable that do not have any NULL values.

SQL
/*======================================================
    Classroom:    SQL Level 3
    Database:    cinemashop
    Table name:    customer
  ======================================================*/
use cinemashop;
drop procedure if exists cur_cs_customer;

set delimiter //

create procedure cur_cs_customer(inout allcols varchar(255))
BEGIN
    DECLARE Count_Null  int default 0;
    DECLARE initial INT DEFAULT 0;
    DECLARE MYCOL   char(64);
    DECLARE ch_done INT DEFAULT 0;
    DECLARE cs_cur1 CURSOR FOR SELECT C.COLUMN_NAME
                               FROM information_schema.COLUMNS C
                               WHERE C.TABLE_SCHEMA = 'cinemashop' _
                               AND C.TABLE_NAME ='customer';
    
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET ch_done = true;
    
    open cs_cur1;
    
    read_cs_cur1:
    LOOP
        FETCH cs_cur1 INTO MYCOL;
    
        IF (ch_done ) THEN
            LEAVE read_cs_cur1;
        END IF;
    
        IF NOT isnull(MYCOL) THEN
            
            call check_field_null(MYCOL, 'cinemashop', 'customer', Count_Null);
            
            if Count_Null = 0 then
                /* Only it includes fields with not null values */
                set initial = initial + 1;
            
                if initial = 1 then
                    SET allcols =  MYCOL;
                else                
                    SET allcols  = concat( cast(allcols as char(255)), ',', MYCOL);
                end if;
            end if;
        END IF;
        
    END LOOP read_cs_cur1;
    
    close cs_cur1;
    
    select allcols;
END
//

Put all together, we can invoke with non interactive statements using @my_args variable like a string of all columns with no NULL values.

SQL
set delimiter ; 
call cur_cs_customer(@my_args); 
select @my_args; 
set @stm = concat('SELECT ', @my_args, ' FROM cinemashop.customer;'); 
PREPARE stmt1 FROM @stm; 
execute stmt1; 
deallocate prepare stmt1;

When the process runs, it gets the following output:

Image 5

Of course, cur_cs_customer procedure can be created with dynamic SQL for generic use, but we may check this adventure later. My purpose for this moment is only to get a primer solution.

Points of Interest

MySQL is not my favorite database engine, but, a lot of large websites have this RDBMS due to simplicity with the use of many programming languages from web development. Also, my job is focused on platforms integrations where the cookbook has some tasks of ETL, Datawarehouse, Data Pipelines, Data lakes, Data hubs and, of course, Databases for Data Mining & Data Science tasks.

History

  • 9th April, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Systems Engineer
Colombia Colombia
Systems Engineer. Computer Scientist. Data Development Consultant currently supporting on mission critical systems. Current academic researcher for application of Numerical Analysis and Genetic Algorithms on Data Mining techniques for Cloud Computing's Database As A Service (DaaS) in Computational Physics area.

Comments and Discussions

 
-- There are no messages in this forum --