Click here to Skip to main content
15,881,898 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
See more:
Hi Guys,
I have a string of hex characters in a column on my SQL Server.
Each row in the database is a backup of an email from my Exchange server.

One column is basically the entire contents of an email.
I need to decoded the hex back into readable characters so I can perform other analysis on the email.

I'm part way to achieving the decode part. I have figured out that the hex string is also compressed. I have turned off the compression so I only have to solve one thing at a time.

I have a simple console application that takes the string from the standard input and writes to standard output.

So far so good. It works when I turn compression off.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace HexToText
{
    class Program
    {
        static void Main(string[] args)
        {
            TextReader textReader = Console.In;
            string input = textReader.ReadToEnd();
            string result = hexStringToString(input);
            Console.WriteLine(result);
        }

        static string hexStringToString(string hexString)
        {
            if (hexString == null || (hexString.Length & 1) == 1)
            {
                throw new ArgumentException();
            }
            var sb = new StringBuilder();
            for (var i = 0; i < hexString.Length; i += 2)
            {
                var hexChar = hexString.Substring(i, 2);
                if (hexChar != "0x")
                {
                    sb.Append((char)Convert.ToByte(hexChar, 16));
                }
            }
            return sb.ToString();
        }
    }
}


The uncompressed hex string starts like this...
0x52656365697665643A2066726F6D20434F4D50414E595352562E63616D70736965<snip>
Decoded Result = Received: from COMPANYSRV.campsie

The compressed hex string starts like this... (not the same email as above but it is going to be very similar)
0xCD396B8C23C959E599BB7DCCED240B977B26DCD47995DBDBBB69BBDBAFB17B6<snip>

I figure that I must first decode it back into a binary file then decompress it. Either my little program doesn't decode it properly back into a binary file or I haven't yet guessed what the backup software has used to compress the text.

I'm Using Visual Studio 2008 and SQL Server 2008 R2 and the backup software is Policy Patrol.

The database table also has a column named "ZIPCOMPRESSION" and is 1 or 0 depending on wether compression is on or off.

Does anyone have any ideas how I can read the compressed text.

Thanks in advance for any suggestions.
David

What I have tried:

I have tried to open the resulting file with 7zip and Windows built in ZIP.
Posted
Updated 2-Jun-16 20:42pm
v2
Comments
Garth J Lancaster 1-Jun-16 3:30am    
I'm pretty sure MS use a proprietary format/algorithm for compression in SQL Server - I've seen a UDF built that does 'standard' compression, but thats not going to help you

Use SQL to do it: DECOMPRESS (Transact-SQL)[^] - since it compressed it, the safest way to decompress it is to get it to do it...
 
Share this answer
 
Comments
David_Pollard 1-Jun-16 20:39pm    
Hi OriginalGriff, That link points to SQL 2016 and I'm using 2008 R2. It is only one column in my table that is compressed. Not the whole table or Index. Let me know if I'm missing something.
I have sniffed the SQL using a profiler when the application writes to the database and it is a straight SQL Insert command with nothing special about any column. This is why I assumed that the application is doing the compression. Good idea but. :)
David_Pollard 1-Jun-16 20:45pm    
I also sniffed the SQL when reading the data back from the database into the application and no sign of any decompression there either. Just two normal looking select statements.
OriginalGriff 2-Jun-16 2:49am    
So how is it being inserted originally?
David_Pollard 3-Jun-16 0:04am    
Sorry, I haven't explained that part very well. I'm using an application named Policy Patrol that creates a backup in SQL Server of every email coming and going in the organisation. There is an option within that software that compresses the data in one column only, named RFC822_msg. If I turn the compression off it would be a breeze but it would also reduce the amount of history I can store by around half.
OK,
I think I have come up with a workaround solution.
The RFC822_msg column that I'm trying to decode is actually a FileStream column.
That is to say it is a varbinary(max) column with FileStream enabled and the actual data is stored directly in the NTFS file system outside of the database.mdb file.

After much reading of best practices and "never do this's" I have enabled NTFS file compression on the FileStream data folder and turned off compression in the Backup application.

I know this could potentially have an unknown performance impact as I have moved the compression / decompression from one place to another but who knows it may get better :) Overall performance isn't a big deal as there is almost no online user interaction with the backup database. I plan to do the reporting as a scheduled task and if someone really wants to find and restore an archived email guess who they are going to ask! The search routing for locating messages in the backup doesn't look through this compressed data in any case.

There could be an initial growth in my overall storage size as the file system tried to compress already compressed data.

I can't report on historic data as it is still compressed but this is no big deal as the users are interested in reporting on trends over the last week or so.

No I have to put my money where my mouth is and make it happen but at least I have a workable plan now.

Thanks to all those who thought about this problem :)

I'll mark this as an answer once I get the project to the next stage.


David

Update
This approached worked. The only thing that was still confusing was that I needed to used a convert SQL clause to retrieve the data as text although it was still Hex encoded just no longer binary.

SQL
CONVERT(VARCHAR(MAX),dbo.PP4_MAILBACKUP.RFC822_msg, 1) AS RFC822_txt


Then the data that was returned now was simply hex encoded text. I could then use my hexToString code above to convert it into readable text and perform my analysis. I'd expect there is a built in way to do this conversion somewhere but I tend to stop looking once I find a solution that works :)

My data size didn't explode either with disk usage about the same.

I hope this helps someone else.
David
 
Share this answer
 
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900