Squeeze Sort: A New Sorting Technique

Mohammad Ashraful Alam

3.19/5 (32 votes)

Mar 18, 2006

CPOL

7 min read

65510

This article presents a new sorting technique. According to a certain characteristic of the distribution of data elements, performance of sorting can be highly improved.

1. Introduction

1.1 A New Sorting Algorithm

One of the most common and fundamental problems in computer science is ordering a list of items which is well known as ‘Sorting’. Considering information processing performed by millions of computers along the whole world, ‘Sorting’ takes a very important role.
As it gained its importance in real world problems, it also gets the attention of computer scientists. As a matter of fact, various types of sorting algorithms have been developed. Some well known sorting algorithms are Bubble sort, Heap sort, Insertion sort, Merge sort, Quick sort, Selection sort, Shell sort, etc. [1-4]. An implementer uses a certain algorithm depending on the characteristics of distribution of the data elements or on some other context.
My algorithm, however, can provide optimized performance in a context where it is known that the data elements are distributed in an order that, there exists a considerable low amount of sorted subsets. Computer applications, that contain data with the characteristic like this and need to be sorted, can be highly benefited using ‘Squeeze Sort’ algorithm.

1.2 The Concept

The concept beneath the algorithm is, say, given a global data array as d [0: n-1] with n number of unsorted data and another global data array as s [0: n-1], which initially doesn't contain any data elements. On each call of the algorithm with one argument ‘t’ (where t<=n), it basically does two things.

First, it forms a sorted sequence p [0…k-1] (where k<=t<=n) from d [0: t-1]. Thus, the unsorted sequence d [0: t-1] is compressed as a new unsorted sequence as d [0: t-k-1]. As the name ‘Squeeze Sort’ implies, the amount of data elements keep getting Squeezed.

Second, the sorted sequence p [0: k-1] is then merged to an existing sorted sequence s [0: j-1] (where j<=n), and thus a new sorted sequence will be formed as s [0: j+k-1].
The algorithm will be recursively called until the unsorted sequence becomes (conceptually) empty. Finally there is a sorted sequence s [0: n-1], which is actually the desired sorted data.

2. Understanding the Detail

We have two global arrays named data [0: n-1] and sorted [0: n-1], where n indicates the number of elements to be sorted. We have another 2 global data named sorted_so_far and unsorted_so_far which indicate the number of elements sorted so far and the number of elements unsorted so far respectively.

Initially, data [0: n-1] contain n unsorted elements, whereas sorted [0: n-1] is conceptually empty. My algorithm will finally keep the desired sorted data into sorted [0: n-1].
The algorithm Squeeze_Sort has an argument named ‘size’, which indicates the number of elements to be sorted.

Here is the algorithm:

/*global data */
n, sorted_so_far=0,unsorted_so_far=n;
data[n];sorted[n];

/*merge function */
merge (start, end) {

/*
This function merges the newly found sorted sequence 
(sorted [start: n-1] and sorted [sorted_so_far: end-1] together consecutively) 
with the existing sorted array item sorted [0: sorted_so_far ], 
thus increasing the number of sorted elements. */

h=0;j=start;i=unsorted_so_far;
mid=sorted_so_far-1;

while(h<=mid) {
 if(sorted[h]>=sorted[j]) 
  data[i++]=sorted[h++];
  else { 
  data[i++]=sorted[j++];
    if(j==n)j=mid+1;
  if(j==end) break;
   }  //else ends
}  //while ends

if(h>mid)//copy second s
 for(k=j;;) {
   data[i++]=sorted[k++]; //problem here
   if(k==n)k=mid+1;
   if(k==end)break;
 } //for ends
else for( k=h; k<=mid;) 
 data[i++]=sorted[k++];

sorted_so_far=0;

for(k=unsorted_so_far<n; k++)
 sorted[sorted_so_far++]=data[k];

}// merge function ends 

/* squeeze sort */
Squeeze_Sort(size)
{
/*This algorithm sorts the data array containing ‘size’ number of elements. */
start=n; end=sorted_so_far;
max=data[0];min=data[0];
unsorted_so_far=0;

for(i=0;i<size; i++)
 if(data[i] >= max ) 
  sorted[--start]=max=data[i];
 else if (data[i] < min)   
  sorted[end++]=min=data[i];
 else 
  data[unsorted_so_far++]=data[i];
merge(start, end);

if (unsorted_so_far>0)
 Squeeze_Sort(unsorted_so_far);

} // squeeze sort ends

2.1 Compressing the Unsorted Data

The first portion of the algorithm checks consecutive maximum and minimum with respect to the first element of data [0: size-1] (i, e, data [0]) and stores the result in a sorted sequential manner. This results in compression of data as data [0:unsorted_so_far-1] with unsorted data and two array items sorted [start: n-1] and sorted [sorted_so_far: end-1] together, consecutively form a sorted sequence.

2.2 Extending the Number of Sorted Elements

The second portion of the algorithm merges the newly found sorted sequence (sorted [start: n-1] and sorted [sorted_so_far: end-1] together consecutively) with the existing sorted array item sorted [0: sorted_so_far], thus increasing the number of sorted elements.

2.3 Recursion Until All the Elements are Sorted

The algorithm Squeeze_Sort will be recursively called until the number of unsorted elements becomes not greater than zero.

3. Simulation

The data blocks below show the simulation process of the sorting, as described in this section.

Initial Data:

data [0:unsorted_so_far]: 2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3
sorted: <empty>

First Execution of Squeeze Sort:

sorted [start: end]: 1, [2], 40, 100, 102
sorted [0:sorted_so_far]: 1, 2, 40, 100, 102
data [0:unsorted_so_far]: 9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3

Second Execution of Squeeze Sort:

sorted [start: end]: 3, 5, 7, [9], 22, 54, 71, 79
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102
data [0:unsorted_so_far]: 8,11,53,66

Finally:

sorted [start: end]: [8], 11, 53, 66
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102
data [0:unsorted_so_far] : <empty>

To simulate the execution process of Squeeze Sort simply, let’s go through the example.
We initialize the data [0: n-1] array with some random data.

data [0:16]: {2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3}

sorted [0:16] :{< empty>}.

When the execution of Squeeze Sort begins, based on the first element in data [0:16], a sorted sequence is generated and stored with respect to the sorted sequence as sorted [start: n-1] and sorted [sorted_so_far: end-1] consecutively. The term start indicates the starting index, whereas the term end indicates the ending tag, of newly found sorted array. The remaining data elements that can’t be involved are stored in data [0:unsorted_so_far] (i.e., the unsorted data getting compressed!). As the name implies, the term sorted_so_far indicates the number of elements that have been sorted so far and stored in sorted[0: sorted_so_far-1], whereas the term unsorted_so_far indicates the number of elements that have not been sorted so far and stored in data[0: unsorted_so_far-1].

During the first execution, we get a new sorted sequence as sorted [start: end] = {1, [2], 40, 100, 102} which was generated based on the first element of the previous data [0:unsorted_of_far-1]. This sequence then merged to the existing sorted array sorted [0:sorted_so_far] (which is empty now). As well as, we get the remaining unsorted data in data [0:unsorted_so_far] = {9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3}.
Since there is some data remaining as unsorted, a recursion of Squeeze Sort will be happen.

During the second execution of Squeeze Sort, we get the further compressed data as data [0:unsorted_so_far] = {8, 11, 53, 66} and a more sorted sequence as sorted [0:sorted_so_far] = {1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102}.

The third execution of the Squeeze Sort gives us the desired result as sorted [0:16] = {1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102}.

4. Complexity

4.1 Time Complexity

As usual, the time complexity of the Squeeze Sort depends on the characteristics of the distribution of the data element. However, the actual time complexity of this algorithm is proportional to the number of times its recursion occurred. Without considering the recursion, the general time complexity is O (n). So, it is clear to say that, considering recursion, if the number of recursion is k, then the time complexity can be defined as T (n) = k.O (n).

As the value of k can be decreased, more performance can be achieved.
So, the best case time complexity can be achieved (as shown in the Figure 1), if k=1.
That is, T (n) =1.O (n).

We have to consider the worst case as well.
The data set, containing the characteristics, as shown in Figure 2, will produce a worst case time complexity. For example, if the data elements are distributed as the manner below,
data [0: n-1] = {1, 100, 2, 99, 3, 98, 4, 97 …}, then for each time the algorithm is executed (considering recursion), the data array will be compressed into only 2 elements. Then, the complexity will be,

T (n) = 2 + 4 + …………+ (n-4) + (n-2) + n

=O (n²).

A situation that depicts the worse case complexity O (n²) of Squeeze Sort

Figure 1: A situation that depicts the best case complexity O (n) of Squeeze Sort

A situation that depicts the worse case complexity O (n²) of Squeeze Sort

Figure 2: A situation that depicts the worse case complexity O (n²) of Squeeze Sort

Comparing with two well known algorithms, we can get a rough estimate about the time complexity of Squeeze Sort. In the table below, I compared the time complexity of Quick Sort (average case time complexity is O (nlogn)) [1-4] and Bubble Sort (average case time complexity is O (n²)) [2] with Squeeze Sort.

Table: Comparative Results (Time Complexity)

Data Size (000)	Quick Sort	Squeeze Sort	Bubble Sort
30	.054945	.21978	3.901099
35	.054945	.32967	5.10989
70	.10989	.604396	9.450549
77	.10989	.659341	10.384615
105	.164835	.879121	13.846154
112	.164835	.989011	14.835165
154	.21978	1.208791	20.27475
200	.274725	1.273626	28.813188

4.2 Space Complexity

Since Squeeze Sort requires an extra array with the same size of the data array, 2n memory space is needed, so the space complexity is S (n) =O (n).

5. Conclusion

Although the sorting problem is old enough, it still retains the attention of the researchers. According to the various characteristics of the distribution of the real world data element and different situations that are faced by the programmers, more effective and creative ideas considering sorting usually provide a broader way.

I hope, researchers in this area will find my algorithm ‘Squeeze Sort’ useful to expand their better and creative ideas.

Reference

[1] Horowitz E. and Sahni S., “Fundamentals of Computer Algorithms”, Galgotia Pubs Ltd., New Delhi, 1995.
[2] Edward M. Reingold, “Data Structures”: Page 384, Little Brown and Company, Boston, 1999.
[3] Robert L. Kruse, “Data Structures and Program Design in C”, Prentice-Hall, Inc., 1991.
[4] Thomas H. Cormen, “Introduction to Algorithms”, MIT Press, 1990.

History

18^th March, 2006: Initial post