Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Steganography 14 - What Text Lists, GIF Images, and HTML Pages have in common

0.00/5 (No votes)
22 Jan 2005 1  
A simple way to hide binary data in any kind of list

Introduction

What do a GIF image, an HTML page, and a shopping list have in common? They all are or contain lists that don't require a specific order of elements. The colours in the GIF image's palette can have any order, just as the attributes in an HTML tag, or the words on your shopping list on the kitchen table.

In every list with count elements, we can hide count-1 bits just by sorting the items. No tricks, no complicated formulas. Let us begin with a simple list of words, then proceed to GIF image palettes, and at last to HTML documents. The algorithm is always the same, because all lists are the same.

The first step is to force the list items into a specific order, which can be alphabetically, or a customized sorting. Then the sorted list is re-sorted, depending on the bits of the secret message.

If you know the default sorting, you can sort the list again and compare the stegano-list with the sorted list. The differences tell you everything about the message-bits that produced the word order:

Line by Line

You don't really need a computer for list-steganography. Take a piece of paper and a pencil, and write down nine animals. No, this is not the beginning of a psycho test, it is a carrier document. The words need a standard sorting we can refer to when re-sorting, so we sort them alphabetically:

  1. Bird
  2. Cat
  3. Dinosaur
  4. Dog
  5. Fish
  6. Horse
  7. Rabbit
  8. Sheep
  9. Unicorn

Without any mathematical tricks, nine list entries can hide 9-1 = 8 bits of information, for example, the ASCII character 'c' (99 or 01100011). Each list item (except the last one) represents one bit. For each 1 bit, we move the first item from the original list into the new list, for 0 bits, we move one of the other list items into the new list. Starting with the highest bit, the process works like that:

Original List        New List

Bird                 ---
Cat                  ---
Dinosaur             ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

Bird                 Dinosaur
Cat                  ---
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---


Hide '1' - Move the first item

---                  Dinosaur
Cat                  Bird
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  Rabbit
Fish                 ---
Horse                ---
---                  ---
Sheep                ---
Unicorn              ---

... ... and so on ... ...

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
Sheep                Horse
---                  ---

No not-first item for a zero-bit left,
that means, no capacity for more bits.
Copy the last item, anyway.

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
---                  Horse
---                  Sheep

The new list contains the same items as the original list and an additional sub-content which only we know about. The hidden byte can be read again, if we work through the same process the other way round.

Sorted List          Carrier List

Bird                 Dinosaur
Cat                  Bird
Dinosaur             Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Dinosaur' is not the first item in the sorted list, so the hidden bit was '0'.
Note that down, and remove the item from both lists.

Bird                 ---
Cat                  Bird
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Bird' from the carrier list is the first item in the sorted list => '01'.

---                  ---
Cat                  ---
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Cat' from the carrier list is the first item in the sorted list => '011'.

---                  ---
---                  ---
---                  ---
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Dinosaur' is not the first item in the sorted list => '0110'.
... ... and so on for all animals ... ...

The Latin alphabet is not a good default sorting, because everybody would try that one first. I suggest you mix up the letters and write a customized alphabet, before you sort your list. We can do the same sorting thing in C#:

public StringCollection Hide(String[] lines, Stream message, String alphabet)
{
 //sort the lines according to a custom alphabet

 StringCollection originalList = Utilities.SortLines(lines, alphabetFileName);
 StringCollection resultList = new StringCollection();

 int messageByte = message.ReadByte();
 bool messageBit = false;
 int listElementIndex = 0;
 Random random = new Random();

 //for each byte of the message

 while (messageByte > -1) {
       //for each bit

       for (int bitIndex=0; bitIndex<8; bitIndex++) {

           //decide which line is going to be the next one in the new list


           messageBit = ((messageByte & (1 << bitIndex)) > 0) ? true : false;

           if (messageBit) {
              //pick the first line from the remaining original list

              listElementIndex = 0;
           }else{
              //pick any line but the first one

              listElementIndex = random.Next(1, originalList.Count);
           }

           //move the line from old list to new list


           resultList.Add(originalList[listElementIndex]);
           originalList.RemoveAt(listElementIndex);
       }

       //repeat this with the next byte of the message

       messageByte = message.ReadByte();
 }

 //copy unused list elements, if there are any

 foreach (String s in originalList) {
         resultList.Add(s);
 }

 return resultList;
}

Given a list and an alphabet, reverting the process is easy:

public Stream Extract(String[] lines, String alphabet)
{
 //initialize empty writer for the message

 BinaryWriter messageWriter = new BinaryWriter(new MemoryStream());

 StringCollection carrierList = new StringCollection();
 carrierList.AddRange(lines);
 carrierList.RemoveAt(carrierList.Count - 1);

 //sort -> get original list

 StringCollection originalList = Utilities.SortLines(lines, alphabetFileName);
 String[] unchangeableOriginalList = new String[originalList.Count];
 originalList.CopyTo(unchangeableOriginalList, 0);

 int messageBit = 0;
 int messageBitIndex = 0;
 int messageByte = 0;

 foreach (String s in carrierList) {

         //decide which bit the entry's position hides


         if (s == originalList[0]) {
            messageBit = 1;
         }else{
            messageBit = 0;
         }

         //remove the item from the sorted list

         originalList.Remove(s);

         //add the bit to the message

         messageByte += (byte)(messageBit << messageBitIndex);

         messageBitIndex++;
         if (messageBitIndex > 7) {
            //append the byte to the message

            messageWriter.Write((byte)messageByte);
            messageByte = 0;
            messageBitIndex = 0;
         }
 }

 //return message stream

 messageWriter.Seek(0, SeekOrigin.Begin);
 return messageWriter.BaseStream;
}

Colourful Bits

Every indexed bitmap contains a list that can be abused just that way. These two palettes belong to the same GIF picture, one is the original, the other one carries a hidden text of 31 characters:

Again, the first thing we need is a palette with a default sorting. In this example, we will sort the colours by their ARGB values.

public Bitmap Hide(Bitmap image, Stream message)
{
 //list the palette entries an integer values

 int[] colors = new int[image.Palette.Entries.Length];
 for (int n = 0; n < colors.Length; n++) {
     colors[n] = image.Palette.Entries[n].ToArgb();
 }

 //initialize empty list for the resulting palette

 ArrayList resultList = new ArrayList(colors.Length);

 //initialize and fill list for the sorted palette

 ArrayList originalList = new ArrayList(colors);
 originalList.Sort();

Many pixels are linked to the palette entries, and we will have to change those, too. So, whenever we move a colour to the new palette, we should map the old index to the new index.

 //initialize list for the mapping of old indices to new indices

 SortedList oldIndexToNewIndex = new SortedList(colors.Length);

Now that the lists are finished, we can dive into the message and move one colour for each bit...

 Random random = new Random();
 int listElementIndex = 0;
 bool messageBit = false;
 int messageByte = message.ReadByte();

 //for each byte of the message

 while (messageByte > -1) {
       //for each bit

       for (int bitIndex = 0; bitIndex < 8; bitIndex++) {

           //decide which color is going to be the next one in the new palette


           messageBit = ((messageByte & (1 < bitIndex)) > 0) ? true : false;

           if (messageBit) {
              listElementIndex = 0;
           }else{
              listElementIndex = random.Next(1, originalList.Count);
           }

...but don't forget from which position in the original palette the entries came! There are thousands of pixels waiting for updated colour indices.

           //log change of index for this color

           int originalPaletteIndex = Array.IndexOf(colors, 
                           originalList[listElementIndex]);
           if( ! oldIndexToNewIndex.ContainsKey(originalPaletteIndex)) {
               //add mapping, ignore if the original palette 

               //contains more than one entry for this color

               oldIndexToNewIndex.Add(originalPaletteIndex, resultList.Count);
           }

           //move the color from old palette to new palette

           resultList.Add(originalList[listElementIndex]);
           originalList.RemoveAt(listElementIndex);
     }

     //repeat this with the next byte of the message

     messageByte = message.ReadByte();
     }
 //copy unused palette entries

 foreach (object obj in originalList) {
    int originalPaletteIndex = Array.IndexOf(colors, obj);
    oldIndexToNewIndex.Add(originalPaletteIndex, resultList.Count);
    resultList.Add(obj);
 }

 //create new image

 Bitmap newImage = CreateBitmap(image, resultList, oldIndexToNewIndex);
 return newImage;
}

The corresponding Extract method works very similar to a combination of the Hide and the Extract methods for text lists. It sorts the palette, reconstructs the message, and then removes the message from the image (returns an image with a palette, in default sorting). This is the palette from the example above the code, after the message has been extracted and removed:

Tags full of Text Lists

An HTML document itself is no sortable list, the tags must stay in their order. But inside the tags, there are attributes, and they can be sorted. That means, every tag in an HTML page can store Attributes.Count-1 bits. Most tags have just enough attributes for one or two bits, but we can distribute the message's bits over all tags of the page.

public void Hide(String sourceFileName, String destinationFileName, 
                                 Stream message, String alphabet) {
       //initializations skipped

       // ... ... ...


       //list the HTML tags

       HtmlTagCollection tags = FindTags(htmlDocument);

       //loop over tags

       foreach (HtmlTag tag in tags) {

       //write beginning of the tag

       insertTextBuilder.Remove(0, insertTextBuilder.Length);
       insertTextBuilder.AppendFormat("<{0}", tag.Name);

       //list attribute names

       String[] attributeNames = new String[tag.Attributes.Count];
       for (int n = 0; n < attributeNames.Length; n++) {
          attributeNames[n] = tag.Attributes[n].Name;
       }

From here, the code is nearly the same as in the other Hide methods:

       //get default sorting

       StringCollection originalList = Utilities.SortLines(attributeNames, alphabet);
       StringCollection resultList = new StringCollection();

       if (tag.Attributes.Count > 1) {
          //the tag has capacoty for one or more bits


          for (int n = 0; n < attributeNames.Length - 1; n++) {

              //get next bit of the message


              bitIndex++;
              if (bitIndex == 8) {
                 bitIndex = 0;
                 if (messageByte > -1) {
                    messageByte = message.ReadByte();
                 }
              }

              if (messageByte > -1) {

                 //decide which attribute is going 

                 //to be the next one in the new tag


                 messageBit = 
                   ((messageByte & (1 << bitIndex)) > 0) ? true : false;

                 if (messageBit) {
                    listElementIndex = 0;
                 }else{
                    listElementIndex = random.Next(1, originalList.Count);
                 }
              }else{
                 listElementIndex = 0;
              }

              //move the attribute from old list to new list

              resultList.Add(originalList[listElementIndex]);
              originalList.RemoveAt(listElementIndex);
          }
       }

       if (originalList.Count > 0) {
          //add the last element - it never hides data

          resultList.Add(originalList[0]);
       }

The sorted attributes have to be written back into the document. Most of the code has been copied from the previous article in this series, the one only about HTML attributes.

       HtmlTag.HtmlAttribute attribute;
       foreach (String attributeName in resultList) {
              attribute = tag.Attributes.GetByName(attributeName);

              insertTextBuilder.Append(" ");
              if (attribute.Value.Length > 0) {
                 insertTextBuilder.AppendFormat("{0}={1}", 
                         attribute.Name, attribute.Value);
              }else{
                 insertTextBuilder.Append(attributeName);
              }
       }

       //replace old tag with new tag

       //.. ... ...

}

Have you noticed that the three Hide methods do nearly the same? The Extract methods are just as similar. Please look them up in the complete source code, if you don't have a binary allergy, yet. ;-)

The Demo Application

The application consists of three "half duplex dialogs". You can either enter an empty carrier as a message for hiding, or a carrier for extracting. For text lists and images, you can see the result immediately.

In the dialogs for text lists and HTML documents, you can select an "alphabet file". That is a text file with a customized alphabet. See testdata/demo.txt for a possible custom alphabet.

For any other IDE than Visual Studio 2005/C# Express or SharpDevelop, please unzip the .NET 1.1 version, open an empty project, and add all code files.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here