Introduction
The last twelve articles dealt only with hiding binary data in binary files. It's getting boring, isn't it? Let's take the first text format that comes to mind and hide binary data in such a document. You are reading an HTML page right now, so HTML is our file format for this article!
Find a Hiding Place
We cannot simply insert extra data into an HTML document. Whatever we insert would be visible either in the browser or in the source text as useless clutter. The order of a tag's attributes, however, can be changed without changing the visible document or the file's size.
<span class="bigText" style="color:#0088ff">
Text with a CSS class and special color
</span>
<span style="color:#0088ff" class="bigText">
Do you see the difference?
</span>
The example above shows two variations of the same content. Let's define a very simple key from it:
Key Attribute | Corresponding Attribute
class | style
if( class-attribute before style-attribute ){
the tag encodes a "1"-bit
}
else{
the tag encodes a "0"-bit
}
With this key, every combination of class and style stands for one bit. We need 80 such text spans to hide 10 characters of secret text. That is a lot of carrier text for very little secret text. Fortunately, HTML documents contain more common attribute combinations, especially if we use old HTML with inline formatting instead of CSS. Here are a few examples. Key attribute first means "1", corresponding attribute first means "0".
Key Attribute | Corresponding Attribute
width | height
src | alt
align | valign
href | target
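To get a feeling for how much a page can carry, the capacity can be estimated by counting the key couples in every tag. The following is only a sketch under stated assumptions: `CapacityEstimator`, `CountBits` and the list-based key are hypothetical names that do not appear in the article's source, tags are assumed to be already parsed into lists of lower-case attribute names, and every attribute is used for at most one couple per tag.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityEstimator
{
    // Counts how many key couples (and therefore message bits) the tags offer.
    // Each tag is given as the list of its attribute names.
    // The key maps a key attribute (Key) to its corresponding attribute (Value).
    public static int CountBits(
        IEnumerable<IList<string>> tags,
        IList<KeyValuePair<string, string>> key)
    {
        int bits = 0;
        foreach (IList<string> attributes in tags)
        {
            // Attributes already used for a couple in this tag.
            var used = new HashSet<string>();
            foreach (KeyValuePair<string, string> couple in key)
            {
                bool bothPresent = attributes.Contains(couple.Key)
                                && attributes.Contains(couple.Value);
                bool bothFree = !used.Contains(couple.Key)
                             && !used.Contains(couple.Value);
                if (bothPresent && bothFree)
                {
                    used.Add(couple.Key);
                    used.Add(couple.Value);
                    bits++;
                }
            }
        }
        return bits;
    }
}
```

Dividing the result by eight gives the number of whole message bytes that fit into the document.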
A Short Example
The carrier documents must be quite long, because every tag can only hide a few bits. The home page of pc-errors.de contains just enough attributes to hide 16 ASCII characters. Anyway, a short example document with hiding places for three bytes should be enough. Would you expect secrets in that page?
Above, you see a typical homepage of a bird fanatic who has never heard of HTML 4 and uses a WYSIWYG editor he found on an old magazine CD. The page begins like this:
<html>
<head>
<title>Canary Birds</title>
<meta name="author" content="Peter Miller">
<style>
.bigText{ font-size:14px; font-weight:bold; }
</style>
</head>
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
alink="#FF0000" vlink="#FF0000">
<div align="center" width="50%">
<h1>Canaries</h1>
<span class="bigText" style="color:#0088ff">
The Finches who got their Name from Islands
which got their Name from Dogs
</span>
</div>
There are five useful attribute couples:
Key Attribute | Corresponding Attribute
name | content
text | bgcolor
alink | vlink
align | width
class | style
Each couple occurs only once, so the first part of the document can hide only five bits. Let's go on with the rest of the page:
<table width="60%" height="100" cellpadding="4" cellspacing="0"
bgcolor="white" align="center">
<tr>
<td align="right" valign="middle">
<img src="exampleImage.jpg" width="164" height="116"
alt="Yellow Bird" title="Yellow Bird" border="0">
</td>
<td align="left" valign="top">
The most canaries are yellow, even though they can have
all thinkable patterns of
<span class="bigText"
style="color:#ffffff; background:#000000">white</span>,
<span class="bigText" style="color:#bb0000">red</span> and
<span class="bigText" style="color:#888888">grey</span>.
<a href="#" target="_blank">click here to see photos.</a>
</td>
</tr>
<tr>
<td align="right" valign="top">
Male birds are great singers.
<a href="#" target="_blank">click here to listen to a sample.</a>
</td>
<td align="left" valign="middle">
<img src="exampleImage2.jpg" width="164" height="176"
alt="Singing Bird" title="A Canary is singing" border="0">
</td>
</tr>
<tr>
<td align="left" valign="top">
You cannot keep canaries in a cage all day long.
They can get sick, if you don't let them fly.
</td>
<td align="left" valign="top">
Another big mistake is to keep one canary alone.
Every birds need at least one partner,
loneliness can lead to bad disorders.
</td>
</tr>
<tr>
<td colspan="2">
<img src="exampleImage3.jpg" width="194" height="35"
alt="Feather" title="A Canary Feather" border="0">
</td>
</tr>
</table>
</body>
</html>
In this part of the document, additional attribute couples are possible:
Key Attribute | Corresponding Attribute
width | height
src | alt
title | border
cellspacing | cellpadding
bgcolor | align
align | valign
href | target
The combination of width and height occurs four times, that's a capacity of four bits. src and alt appear three times, that's a capacity of three bits. Three more bits come from title and border. cellpadding/cellspacing occurs only once, just as bgcolor/align, that's another two bits. align/valign adds capacity for six bits, href/target adds three bits. Together with the five bits from above, the document has enough capacity to hide 26 bits, that's three characters and two unused bits.
Three characters are not enough for a long letter, but enough to say "no!", or, in ASCII values, "110 111 033" ("01101110 01101111 00100001"). Let's go through the document and find the first tag with a usable attribute couple...
<meta name="author" content="Peter Miller">
name/content is "1", content/name is "0".
We have to re-order the attributes, to hide a value of "0":
<meta content="Peter Miller" name="author">
One bit is done. Next bit...
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
alink="#FF0000" vlink="#FF0000">
text/bgcolor is "1", bgcolor/text is "0".
alink/vlink is "1", vlink/alink is "0".
We want to hide "1" and "1", so no changes to this line are required.
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
alink="#FF0000" vlink="#FF0000">
... and so on. For every bit, we check the order of two attributes and swap them if necessary. Image tags can carry up to three bits if the deprecated attributes are also present:
<img src="exampleImage.jpg" width="164" height="116" alt="Yellow Bird"
title="Yellow Bird" border="0">
We want to hide "010".
The first key attribute in this tag is "src",
so we take the corresponding attribute "alt".
The bit to hide is "0", the combination for "0" is alt/src,
so we place the "alt"-attribute before the "src"-attribute.
<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
title="Yellow Bird" border="0">
The next key attribute is "width", the corresponding attribute is "height".
Now, the bit to hide is "1", so we put "height" after "width".
The third key attribute is "title", and its corresponding attribute is "border".
To hide a "0", we move "title" behind "border".
<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
border="0" title="Yellow Bird">
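The bit string used in the walk-through above ("no!" as "01101110 01101111 00100001") can be reproduced with a few lines of code. This is a minimal sketch; `BitStrings` and `ToBitString` are hypothetical names that are not part of the article's source, and the text is assumed to be plain ASCII.

```csharp
using System;
using System.Text;

public static class BitStrings
{
    // Converts an ASCII text into a string of bits, most significant bit
    // first, with a space between the bytes.
    public static string ToBitString(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        var parts = new string[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Base-2 representation, padded with zeros to eight digits.
            parts[i] = Convert.ToString(bytes[i], 2).PadLeft(8, '0');
        }
        return String.Join(" ", parts);
    }
}
```

Calling `BitStrings.ToBitString("no!")` yields the three bytes in the notation used above, so the bits can be ticked off one by one while re-ordering the attributes.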
No more Examples, Show the Implementation!
Alright, first we need two classes to store HTML tags and their attributes. Attributes don't have many properties; they only have a name and a value. Each attribute in a tag can be used for only one message bit, so the program has to mark it as already handled.
public class HtmlAttribute {
    private String name;
    private String value;
    private bool handled;

    public String Name {
        get { return name; }
    }

    public String Value {
        get { return this.value; }
        set { this.value = value; }
    }

    // True once the attribute has been used for a message bit or copied.
    public bool Handled {
        get { return handled; }
        set { this.handled = value; }
    }

    public HtmlAttribute(String name) {
        this.name = name.ToLower();
        this.value = String.Empty;
        handled = false;
    }
}
An HTML tag has a name and a number of attributes. The constructor searches the tag's text for attributes and their values.
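public class HtmlTag {
    private int beginPosition;
    private int endPosition;
    private String name;
    private HtmlAttributeCollection attributes;

    // Position of the tag's first character in the document.
    public int BeginPosition {
        get { return beginPosition; }
        set { beginPosition = value; }
    }

    // Position of the tag's end in the document.
    public int EndPosition {
        get { return endPosition; }
        set { endPosition = value; }
    }

    public String Name {
        get { return name; }
    }

    public HtmlAttributeCollection Attributes {
        get { return attributes; }
    }

    public HtmlTag(String text, int beginPosition, int endPosition) {
        this.beginPosition = beginPosition;
        this.endPosition = endPosition;
        // The search for the tag's name and attributes in the text is not shown here.
    }
}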
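The article does not show how the constructor finds the attributes in the tag's text. One possible approach is a regular expression, as in the following sketch. `TagParser`, `ParseName` and `ParseAttributes` are hypothetical names, the pattern covers only the simple `name="value"`, `name='value'` and `name=value` forms, and the values are assumed to keep their quotes so that they can be written back unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagParser
{
    // Attribute name, equals sign, then a double-quoted, single-quoted
    // or unquoted value. The quotes stay part of the captured value.
    private static readonly Regex AttributePattern = new Regex(
        @"([A-Za-z][\w:\-]*)\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)");

    private static readonly Regex NamePattern = new Regex(
        @"^<\s*([A-Za-z][\w:\-]*)");

    // Returns the lower-case tag name, or an empty string if none is found.
    public static string ParseName(string tagText)
    {
        Match match = NamePattern.Match(tagText);
        return match.Success ? match.Groups[1].Value.ToLower() : String.Empty;
    }

    // Returns the attributes in document order as (name, value) pairs.
    public static List<KeyValuePair<string, string>> ParseAttributes(string tagText)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributePattern.Matches(tagText))
        {
            result.Add(new KeyValuePair<string, string>(
                match.Groups[1].Value.ToLower(),
                match.Groups[2].Value));
        }
        return result;
    }
}
```

Attributes without a value, such as `nowrap`, are skipped by this pattern; since they cannot form a couple with a value anyway, a real implementation would only have to copy them through.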
The Hide method lists all HTML tags, and then loops over the tags and their attributes. Attributes that have already been handled are ignored. If an attribute is still fresh and unused, the method looks it up in the key table...
public void Hide(String sourceFileName,
                 String destinationFileName,
                 Stream message,
                 DataTable keyTable)
{
    // Read the whole carrier document.
    String htmlDocument;
    using (StreamReader reader = new StreamReader(sourceFileName, Encoding.Default)) {
        htmlDocument = reader.ReadToEnd();
    }
    message.Position = 0;

    HtmlTagCollection tags = FindTags(htmlDocument);
    StringBuilder insertTextBuilder = new StringBuilder();
    DataRow[] rows;
    HtmlAttribute secondAttribute;
    int offset = 0;      // length difference caused by rewritten tags
    int bitIndex = 7;    // 7 forces the first message byte to be read
    int messageByte = 0;

    foreach (HtmlTag tag in tags) {
        // Start the rewritten tag with its name.
        insertTextBuilder.Remove(0, insertTextBuilder.Length);
        insertTextBuilder.AppendFormat("<{0}", tag.Name);

        foreach (HtmlAttribute attribute in tag.Attributes) {
            if (!attribute.Handled) {
                // Is this attribute a key attribute?
                rows = keyTable.Select(
                    String.Format("firstAttribute = '{0}'", attribute.Name));
... If the program finds the attribute's name in the first key column, it is a primary key attribute, and its secondary key attribute is looked up in the attribute collection of the current tag. If the secondary key attribute exists, we have found a key attribute couple and are able to hide one bit.
                if (rows.Length > 0) {
                    secondAttribute = FindAttribute(
                        rows[0]["secondAttribute"].ToString(),
                        tag.Attributes);
                    if (secondAttribute != null) {
                        // Move on to the next bit; read a new byte after eight bits.
                        if (bitIndex == 7) {
                            bitIndex = 0;
                            messageByte = message.ReadByte();
                        } else {
                            bitIndex++;
                        }
                        HideBit(messageByte,
                                bitIndex,
                                attribute,
                                secondAttribute,
                                insertTextBuilder);
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
                    }
                }
If the attribute was not a primary key attribute, it can be a secondary key attribute. That means it will be handled later on, together with its primary key attribute. If the attribute is not found in any key column, it is not meant to be used and must be copied into the new tag as it is.
                if (!attribute.Handled) {
                    bool copyAttribute = false;
                    // Is this attribute the second half of a key couple?
                    rows = keyTable.Select(
                        String.Format("secondAttribute = '{0}'", attribute.Name));
                    if (rows.Length > 0) {
                        HtmlAttribute firstAttribute = FindAttribute(
                            rows[0]["firstAttribute"].ToString(),
                            tag.Attributes);
                        if (firstAttribute == null) {
                            // No partner in this tag, so there is nothing to wait for.
                            copyAttribute = true;
                        } else {
                            // Copy only if the partner has already been used otherwise.
                            copyAttribute = firstAttribute.Handled;
                        }
                    } else {
                        // Not part of the key at all.
                        copyAttribute = true;
                    }
                    if (copyAttribute) {
                        insertTextBuilder.AppendFormat(
                            @" {0}={1}",
                            attribute.Name, attribute.Value);
                        attribute.Handled = true;
                    }
                }
            }
        }
At this point, you see why we saved the start and end positions with every tag. When we're finished with a tag's attributes, we have to replace the old tag with the new one. In case a few white spaces got lost on the way, we compare the old length with the new length and add the difference to an offset. That way, all following tags are still found, even though they have moved.
        // Earlier tags may have changed length, so shift the positions.
        tag.BeginPosition += offset;
        tag.EndPosition += offset;
        String insertText = insertTextBuilder.ToString();
        int newLength = insertText.Length;
        if (newLength > 0) {
            int oldLength = tag.EndPosition - tag.BeginPosition;
            htmlDocument = htmlDocument.Remove(tag.BeginPosition, oldLength);
            htmlDocument = htmlDocument.Insert(tag.BeginPosition, insertText);
            offset += (newLength - oldLength);
        }
        // ReadByte returns -1 at the end of the message.
        if (messageByte < 0) {
            break;
        }
    }

    // Write with the same encoding that was used for reading.
    using (StreamWriter writer =
               new StreamWriter(destinationFileName, false, Encoding.Default)) {
        writer.Write(htmlDocument);
    }
}
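The `HideBit` method called above is not listed in the article. The following sketch shows what such a method could look like. It is an assumption, not the original implementation: it takes plain strings instead of `HtmlAttribute` objects to stay self-contained, it treats `bitIndex` 0 as the least significant bit, and `BitHider` is a hypothetical class name.

```csharp
using System;
using System.Text;

public static class BitHider
{
    // Appends the two attributes of a key couple to the new tag text.
    // Key attribute first encodes a 1, corresponding attribute first a 0.
    // Assumption: bitIndex 0 addresses the least significant bit.
    public static void HideBit(
        int messageByte, int bitIndex,
        string keyName, string keyValue,
        string secondName, string secondValue,
        StringBuilder tagText)
    {
        bool bit = (messageByte & (1 << bitIndex)) != 0;
        if (bit)
        {
            tagText.AppendFormat(" {0}={1} {2}={3}",
                keyName, keyValue, secondName, secondValue);
        }
        else
        {
            tagText.AppendFormat(" {0}={1} {2}={3}",
                secondName, secondValue, keyName, keyValue);
        }
    }
}
```

Which bit order is used does not matter for the technique, as long as hiding and extracting agree on it.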
How to Reconstruct the Message
Extracting a message is much easier, because we need not care about unused attributes. Loop through the tags and attributes, find a primary key attribute, get its corresponding attribute, and compare the positions, that's all.
public void Extract(String sourceFileName, Stream message, DataTable keyTable) {
    foreach (HtmlTag tag in tags) {
        foreach (HtmlAttribute attribute in tag.Attributes) {
            if (!attribute.Handled) {
                // Is this attribute a key attribute?
                rows = keyTable.Select(
                    String.Format("firstAttribute = '{0}'", attribute.Name));
                if (rows.Length > 0) {
                    secondAttribute = FindAttribute(
                        rows[0]["secondAttribute"].ToString(),
                        tag.Attributes);
                    if (secondAttribute != null) {
                        // Compare the positions of both attributes in the document.
                        attributePosition = htmlDocument.IndexOf(
                            attribute.Name,
                            tag.BeginPosition);
                        secondAttributePosition = htmlDocument.IndexOf(
                            secondAttribute.Name,
                            tag.BeginPosition);
                        messageByte = ExtractBit(
                            attributePosition,
                            secondAttributePosition,
                            messageByte,
                            bitIndex,
                            message);
Like in the previous articles, the Extract method expects to find the message's length before the actual message begins. Because of a document's limited capacity, the length value is only one byte long, not four.
                        if (bitIndex == 7) {
                            bitIndex = 0;
                            if ((message.Length == 1) && (messageLength == 0)) {
                                // The first byte is the length of the message.
                                message.Position = 0;
                                BinaryReader binaryReader =
                                    new BinaryReader(message);
                                messageLength = binaryReader.ReadByte();
                                reader = null;
                                message.SetLength(0);
                                message.Position = 0;
                            }
                            else if ((messageLength > 0) &&
                                     (message.Length == messageLength)) {
                                // The whole message has been extracted.
                                break;
                            }
                        } else {
                            bitIndex++;
                        }
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
                    }
                }
            }
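The `ExtractBit` method is not listed in the article either. A minimal sketch, under the same assumptions as for hiding (key attribute first means "1", `bitIndex` 0 is the least significant bit), could look like this. `BitExtractor` is a hypothetical class name, and writing the finished byte to the stream inside the method is a guess based on how the calling code checks `message.Length`.

```csharp
using System;
using System.IO;

public static class BitExtractor
{
    // Sets the current bit if the key attribute stands before its
    // corresponding attribute. After the eighth bit, the finished byte
    // is written to the message stream and a fresh byte is started.
    public static int ExtractBit(
        int keyPosition, int secondPosition,
        int messageByte, int bitIndex,
        Stream message)
    {
        if (keyPosition < secondPosition)
        {
            messageByte |= (1 << bitIndex);
        }
        if (bitIndex == 7)
        {
            message.WriteByte((byte)messageByte);
            messageByte = 0;
        }
        return messageByte;
    }
}
```

The caller keeps the returned value and passes it in again with the next bit index, so a byte is assembled over eight consecutive attribute couples.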
Building a Key
The key is no longer a binary file; it is a table of attributes. You should build your key with the key editor and save it to an XML file. The *.zip archive contains two example files that may be useful as key templates.
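For readers who want to build a key table in code, the following sketch creates a `DataTable` with the two columns that the `Select` calls above rely on (`firstAttribute` and `secondAttribute`) and round-trips it through XML. `KeyTableFactory` and its methods are hypothetical names, and the XML written here is whatever `DataTable.WriteXml` produces; it is not claimed to match the file format of the article's key editor.

```csharp
using System;
using System.Data;
using System.IO;

public static class KeyTableFactory
{
    // Builds a small key table from couples mentioned in the article.
    public static DataTable Create()
    {
        var table = new DataTable("key");
        table.Columns.Add("firstAttribute", typeof(string));
        table.Columns.Add("secondAttribute", typeof(string));
        table.Rows.Add("class", "style");
        table.Rows.Add("width", "height");
        table.Rows.Add("src", "alt");
        table.Rows.Add("href", "target");
        return table;
    }

    // Serializes the table together with its schema.
    public static string ToXml(DataTable table)
    {
        using (var writer = new StringWriter())
        {
            table.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Restores a table from XML that contains an inline schema.
    public static DataTable FromXml(string xml)
    {
        var table = new DataTable();
        using (var reader = new StringReader(xml))
        {
            table.ReadXml(reader);
        }
        return table;
    }
}
```

Both sides need the same table, including the row order, because only the first matching row is used when an attribute is looked up.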
History
- 14th November, 2004: Initial post
- 13th March, 2008: Article updated - bug fixed in source archive