Steganography 14 - What Text Lists, GIF Images, and HTML Pages have in common

Corinna John

22 Jan 2005, CPOL license

A simple way to hide binary data in any kind of list

Introduction

What do a GIF image, an HTML page, and a shopping list have in common? They all are or contain lists that don't require a specific order of elements. The colours in the GIF image's palette can have any order, just as the attributes in an HTML tag, or the words on your shopping list on the kitchen table.

In every list with count elements, we can hide count-1 bits just by sorting the items. No tricks, no complicated formulas. Let us begin with a simple list of words, then proceed to GIF image palettes, and at last to HTML documents. The algorithm is always the same, because all lists are the same.

The first step is to force the list items into a specific order, which can be alphabetical or any customized sorting. Then the sorted list is re-ordered, depending on the bits of the secret message.

If you know the default sorting, you can sort the list again and compare the stegano-list with the sorted list. The differences tell you everything about the message bits that produced the word order.

Line by Line

You don't really need a computer for list-steganography. Take a piece of paper and a pencil, and write down nine animals. No, this is not the beginning of a psycho test, it is a carrier document. The words need a standard sorting we can refer to when re-sorting, so we sort them alphabetically:

Bird
Cat
Dinosaur
Dog
Fish
Horse
Rabbit
Sheep
Unicorn

Without any mathematical tricks, nine list entries can hide 9-1 = 8 bits of information, for example, the ASCII character 'c' (99 or 01100011). Each list item (except the last one) represents one bit. For each 1 bit, we move the first item of the original list into the new list; for each 0 bit, we move any one of the other items. Starting with the highest bit, the process works like this:

Original List        New List

Bird                 ---
Cat                  ---
Dinosaur             ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

Bird                 Dinosaur
Cat                  ---
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---


Hide '1' - Move the first item

---                  Dinosaur
Cat                  Bird
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  Rabbit
Fish                 ---
Horse                ---
---                  ---
Sheep                ---
Unicorn              ---

... ... and so on ... ...

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
Sheep                Horse
---                  ---

Only one item is left, so a zero-bit could no longer be hidden.
That means there is no capacity for more bits.
Copy the last item anyway.

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
---                  Horse
---                  Sheep

The new list contains the same items as the original list and an additional sub-content which only we know about. The hidden byte can be read again, if we work through the same process the other way round.

Sorted List          Carrier List

Bird                 Dinosaur
Cat                  Bird
Dinosaur             Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Dinosaur' is not the first item in the sorted list, so the hidden bit was '0'.
Note that down, and remove the item from both lists.

Bird                 ---
Cat                  Bird
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Bird' from the carrier list is the first item in the sorted list => '01'.

---                  ---
Cat                  ---
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Cat' from the carrier list is the first item in the sorted list => '011'.

---                  ---
---                  ---
---                  ---
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Rabbit' is not the first item in the sorted list => '0110'.
... ... and so on for all animals ... ...
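
The paper walk-through can be checked with a few lines of code. This is only a sketch: the class and method names are made up for this example, and plain ordinal sorting stands in for the default sorting. It reads the bits highest bit first, like the walk-through above, whereas the article's C# methods store the lowest bit first.

```csharp
using System;
using System.Collections.Generic;

public static class PaperListDemo
{
    // Reads up to eight bits from the order of 'carrier', highest bit first,
    // following the paper walk-through.
    public static int ExtractByte(IList<string> carrier)
    {
        // Assumption: ordinal sorting is the agreed default sorting.
        List<string> sorted = new List<string>(carrier);
        sorted.Sort(StringComparer.Ordinal);

        int value = 0;
        // The last item never carries a bit, so stop one item early.
        for (int i = 0; i < carrier.Count - 1 && i < 8; i++)
        {
            // First item of the remaining sorted list means 1, anything else 0.
            int bit = (carrier[i] == sorted[0]) ? 1 : 0;
            value = (value << 1) | bit;

            // Remove the item so the next comparison sees the shorter list.
            sorted.Remove(carrier[i]);
        }
        return value;
    }
}
```

Called with the carrier list from the walk-through (Dinosaur, Bird, Cat, Rabbit, Unicorn, Fish, Dog, Horse, Sheep), ExtractByte returns 99, the ASCII code of 'c'.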

The Latin alphabet is not a good default sorting, because everybody would try that one first. I suggest you mix up the letters and write a customized alphabet, before you sort your list. We can do the same sorting thing in C#:

public StringCollection Hide(String[] lines, Stream message, String alphabet)
{
 //sort the lines according to a custom alphabet
 StringCollection originalList = Utilities.SortLines(lines, alphabet);
 StringCollection resultList = new StringCollection();

 int messageByte = message.ReadByte();
 bool messageBit = false;
 int listElementIndex = 0;
 Random random = new Random();

 //for each byte of the message
 while (messageByte > -1) {
       //for each bit
       for (int bitIndex=0; bitIndex<8; bitIndex++) {

           //decide which line is going to be the next one in the new list

           messageBit = ((messageByte & (1 << bitIndex)) > 0) ? true : false;

           if (messageBit) {
              //pick the first line from the remaining original list
              listElementIndex = 0;
           }else{
              //pick any line but the first one
              listElementIndex = random.Next(1, originalList.Count);
           }

           //move the line from old list to new list

           resultList.Add(originalList[listElementIndex]);
           originalList.RemoveAt(listElementIndex);
       }

       //repeat this with the next byte of the message
       messageByte = message.ReadByte();
 }

 //copy unused list elements, if there are any
 foreach (String s in originalList) {
         resultList.Add(s);
 }

 return resultList;
}
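
Utilities.SortLines is not listed in this article. The following is a hypothetical stand-in, not the code from the download. It assumes the alphabet is passed as a string of characters in the desired order, and that characters missing from the alphabet sort after all listed ones.

```csharp
using System;
using System.Collections.Generic;

public static class CustomAlphabetSort
{
    // Hypothetical stand-in for Utilities.SortLines: orders strings by the
    // position of each character in 'alphabet' instead of by character code.
    public static List<string> SortLines(IEnumerable<string> lines, string alphabet)
    {
        List<string> result = new List<string>(lines);
        result.Sort((a, b) => Compare(a, b, alphabet));
        return result;
    }

    private static int Compare(string a, string b, string alphabet)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            int diff = Rank(a[i], alphabet) - Rank(b[i], alphabet);
            if (diff != 0) return diff;
        }
        // Equal prefix: the shorter string comes first.
        return a.Length - b.Length;
    }

    // Assumption of this sketch: unlisted characters come after all listed
    // ones and are ordered by their character code.
    private static int Rank(char c, string alphabet)
    {
        int index = alphabet.IndexOf(c);
        return index >= 0 ? index : alphabet.Length + c;
    }
}
```

With the reversed alphabet "zyxwvutsrqponmlkjihgfedcba", the words cat, bird, and dog come out as dog, cat, bird. Sender and receiver must use exactly the same alphabet, otherwise the default sorting differs and the bits cannot be read.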

Given a list and an alphabet, reverting the process is easy:

public Stream Extract(String[] lines, String alphabet)
{
 //initialize empty writer for the message
 BinaryWriter messageWriter = new BinaryWriter(new MemoryStream());

 StringCollection carrierList = new StringCollection();
 carrierList.AddRange(lines);
 carrierList.RemoveAt(carrierList.Count - 1);

 //sort -> get original list
 StringCollection originalList = Utilities.SortLines(lines, alphabet);
 String[] unchangeableOriginalList = new String[originalList.Count];
 originalList.CopyTo(unchangeableOriginalList, 0);

 int messageBit = 0;
 int messageBitIndex = 0;
 int messageByte = 0;

 foreach (String s in carrierList) {

         //decide which bit the entry's position hides

         if (s == originalList[0]) {
            messageBit = 1;
         }else{
            messageBit = 0;
         }

         //remove the item from the sorted list
         originalList.Remove(s);

         //add the bit to the message
         messageByte += (byte)(messageBit << messageBitIndex);

         messageBitIndex++;
         if (messageBitIndex > 7) {
            //append the byte to the message
            messageWriter.Write((byte)messageByte);
            messageByte = 0;
            messageBitIndex = 0;
         }
 }

 //return message stream
 messageWriter.Seek(0, SeekOrigin.Begin);
 return messageWriter.BaseStream;
}
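
A compact round trip shows how the two methods fit together. This is a sketch under a few assumptions: the names are invented for this example, ordinal sorting replaces the custom alphabet, the list items are distinct, and the message length is passed to Extract explicitly. It also adds a capacity check that the listings above leave out.

```csharp
using System;
using System.Collections.Generic;

public static class ListStego
{
    // Hides 'message' in the order of 'items'. Bit 0 of each byte goes
    // first, as in the article's Hide method. Assumes distinct items.
    public static List<string> Hide(IEnumerable<string> items, byte[] message, Random random)
    {
        List<string> remaining = new List<string>(items);
        remaining.Sort(StringComparer.Ordinal);   // assumed default sorting

        // A list of n items carries n - 1 bits.
        if (message.Length * 8 > Math.Max(0, remaining.Count - 1))
            throw new ArgumentException("The list is too short for this message.");

        List<string> result = new List<string>(remaining.Count);
        foreach (byte b in message)
        {
            for (int bitIndex = 0; bitIndex < 8; bitIndex++)
            {
                bool bit = (b & (1 << bitIndex)) != 0;
                // 1: take the first item; 0: take any other item.
                int index = bit ? 0 : random.Next(1, remaining.Count);
                result.Add(remaining[index]);
                remaining.RemoveAt(index);
            }
        }
        result.AddRange(remaining);   // unused items keep their sorted order
        return result;
    }

    // Recovers 'byteCount' bytes from the order of 'carrier'.
    public static byte[] Extract(IList<string> carrier, int byteCount)
    {
        if (byteCount * 8 > Math.Max(0, carrier.Count - 1))
            throw new ArgumentException("The list cannot hold that many bytes.");

        List<string> sorted = new List<string>(carrier);
        sorted.Sort(StringComparer.Ordinal);

        byte[] message = new byte[byteCount];
        for (int i = 0; i < byteCount * 8; i++)
        {
            if (carrier[i] == sorted[0])
                message[i / 8] |= (byte)(1 << (i % 8));
            sorted.Remove(carrier[i]);
        }
        return message;
    }
}
```

The explicit byte count is a design choice of this sketch. Unused items are appended in sorted order, so each of them is the first item of the remaining list and would be read as a 1 bit if the reader did not know where the message ends.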

Colourful Bits

Every indexed bitmap contains a list that can be abused in just that way. These two palettes belong to the same GIF picture: one is the original, the other one carries a hidden text of 31 characters.

Again, the first thing we need is a palette with a default sorting. In this example, we will sort the colours by their ARGB values.

public Bitmap Hide(Bitmap image, Stream message)
{
 //list the palette entries as integer values
 int[] colors = new int[image.Palette.Entries.Length];
 for (int n = 0; n < colors.Length; n++) {
     colors[n] = image.Palette.Entries[n].ToArgb();
 }

 //initialize empty list for the resulting palette
 ArrayList resultList = new ArrayList(colors.Length);

 //initialize and fill list for the sorted palette
 ArrayList originalList = new ArrayList(colors);
 originalList.Sort();

Many pixels are linked to the palette entries, and we will have to change those, too. So, whenever we move a colour to the new palette, we should map the old index to the new index.

//initialize list for the mapping of old indices to new indices
SortedList oldIndexToNewIndex = new SortedList(colors.Length);

Now that the lists are finished, we can dive into the message and move one colour for each bit...

Random random = new Random();
int listElementIndex = 0;
bool messageBit = false;
int messageByte = message.ReadByte();

//for each byte of the message
while (messageByte > -1) {
      //for each bit
      for (int bitIndex = 0; bitIndex < 8; bitIndex++) {

          //decide which color is going to be the next one in the new palette

          messageBit = ((messageByte & (1 << bitIndex)) > 0) ? true : false;

          if (messageBit) {
             listElementIndex = 0;
          }else{
             listElementIndex = random.Next(1, originalList.Count);
          }

...but don't forget from which position in the original palette the entries came! There are thousands of pixels waiting for updated colour indices.

           //log change of index for this color
           int originalPaletteIndex = Array.IndexOf(colors, 
                           originalList[listElementIndex]);
           if( ! oldIndexToNewIndex.ContainsKey(originalPaletteIndex)) {
               //add mapping, ignore if the original palette 
               //contains more than one entry for this color
               oldIndexToNewIndex.Add(originalPaletteIndex, resultList.Count);
           }

           //move the color from old palette to new palette
           resultList.Add(originalList[listElementIndex]);
           originalList.RemoveAt(listElementIndex);
     }

     //repeat this with the next byte of the message
     messageByte = message.ReadByte();
     }
 //copy unused palette entries
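 foreach (object obj in originalList) {
    int originalPaletteIndex = Array.IndexOf(colors, obj);
    if( ! oldIndexToNewIndex.ContainsKey(originalPaletteIndex)) {
        //same guard as above: skip duplicate colours,
        //because SortedList.Add throws on an existing key
        oldIndexToNewIndex.Add(originalPaletteIndex, resultList.Count);
    }
    resultList.Add(obj);
 }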

 //create new image
 Bitmap newImage = CreateBitmap(image, resultList, oldIndexToNewIndex);
 return newImage;
}
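
CreateBitmap is not listed here. Its essential job is to rewrite every pixel so that it points to the new position of its colour. The following sketch shows only that step. It assumes an 8-bit indexed image whose pixel indices have already been copied into a byte array, and a mapping flattened into an int array; both names are hypothetical.

```csharp
using System;

public static class PaletteRemap
{
    // 'pixels' holds one palette index per pixel; 'oldToNew[i]' is the new
    // position of the colour that used to sit at palette index i.
    public static byte[] RemapPixels(byte[] pixels, int[] oldToNew)
    {
        byte[] result = new byte[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
        {
            // Look up where this pixel's colour ended up in the new palette.
            result[i] = (byte)oldToNew[pixels[i]];
        }
        return result;
    }
}
```

For example, if the colours at indices 0, 1, and 2 moved to positions 2, 0, and 1, the pixel row 0, 1, 2, 2, 0 becomes 2, 0, 1, 1, 2. The picture looks the same as before, because every pixel still refers to the same colour.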

The corresponding Extract method works much like a combination of the Hide and the Extract methods for text lists. It sorts the palette, reconstructs the message, and then removes the message from the image, that is, it returns an image with the palette in default sorting. This is the palette from the example above the code, after the message has been extracted and removed:

Tags full of Text Lists

An HTML document itself is not a sortable list; the tags must stay in their order. But inside the tags, there are attributes, and they can be sorted. That means every tag in an HTML page can store Attributes.Count-1 bits. Most tags have just enough attributes for one or two bits, but we can distribute the message's bits over all tags of the page.
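
Before hiding anything, it helps to know how much a carrier can hold. This small sketch applies the count-1 rule to a single list and to a whole page; the class and method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class StegoCapacity
{
    // A list of 'count' items hides count - 1 bits; empty lists hide none.
    public static int BitsInList(int count)
    {
        return Math.Max(0, count - 1);
    }

    // Sums the capacity of all tags, given the attribute count of each tag.
    public static int BitsInTags(IEnumerable<int> attributeCounts)
    {
        int bits = 0;
        foreach (int count in attributeCounts)
        {
            bits += BitsInList(count);
        }
        return bits;
    }
}
```

Five tags with 3, 1, 2, 0, and 4 attributes offer 2 + 0 + 1 + 0 + 3 = 6 bits, which is not even one byte. For comparison, a palette of 256 colours would offer 255 bits, or 31 whole bytes, which is consistent with the 31-character example in the GIF section.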

public void Hide(String sourceFileName, String destinationFileName, 
                                 Stream message, String alphabet) {
       //initializations skipped
       // ... ... ...

       //list the HTML tags
       HtmlTagCollection tags = FindTags(htmlDocument);

       //loop over tags
       foreach (HtmlTag tag in tags) {

       //write beginning of the tag
       insertTextBuilder.Remove(0, insertTextBuilder.Length);
       insertTextBuilder.AppendFormat("<{0}", tag.Name);

       //list attribute names
       String[] attributeNames = new String[tag.Attributes.Count];
       for (int n = 0; n < attributeNames.Length; n++) {
          attributeNames[n] = tag.Attributes[n].Name;
       }

From here, the code is nearly the same as in the other Hide methods:

//get default sorting
StringCollection originalList = Utilities.SortLines(attributeNames, alphabet);
StringCollection resultList = new StringCollection();

if (tag.Attributes.Count > 1) {
   //the tag has capacity for one or more bits

   for (int n = 0; n < attributeNames.Length - 1; n++) {

       //get next bit of the message

       bitIndex++;
       if (bitIndex == 8) {
          bitIndex = 0;
          if (messageByte > -1) {
             messageByte = message.ReadByte();
          }
       }

       if (messageByte > -1) {

          //decide which attribute is going
          //to be the next one in the new tag

          messageBit =
            ((messageByte & (1 << bitIndex)) > 0) ? true : false;

          if (messageBit) {
             listElementIndex = 0;
          }else{
             listElementIndex = random.Next(1, originalList.Count);
          }
       }else{
          listElementIndex = 0;
       }

       //move the attribute from old list to new list
       resultList.Add(originalList[listElementIndex]);
       originalList.RemoveAt(listElementIndex);
   }
}

if (originalList.Count > 0) {
   //add the last element - it never hides data
   resultList.Add(originalList[0]);
}

The sorted attributes have to be written back into the document. Most of the code has been copied from the previous article in this series, the one that deals only with HTML attributes.

       HtmlTag.HtmlAttribute attribute;
       foreach (String attributeName in resultList) {
              attribute = tag.Attributes.GetByName(attributeName);

              insertTextBuilder.Append(" ");
              if (attribute.Value.Length > 0) {
                 insertTextBuilder.AppendFormat("{0}={1}", 
                         attribute.Name, attribute.Value);
              }else{
                 insertTextBuilder.Append(attributeName);
              }
       }

       //replace old tag with new tag
       //.. ... ...
}

Have you noticed that the three Hide methods do nearly the same thing? The Extract methods are just as similar. Please look them up in the complete source code, if you don't have a binary allergy yet. ;-)

The Demo Application

The application consists of three "half duplex dialogs". You can enter either an empty carrier and a message for hiding, or a carrier for extracting. For text lists and images, you can see the result immediately.

In the dialogs for text lists and HTML documents, you can select an "alphabet file". That is a text file with a customized alphabet. See testdata/demo.txt for a possible custom alphabet.

For any IDE other than Visual Studio 2005/C# Express or SharpDevelop, please unzip the .NET 1.1 version, open an empty project, and add all code files.