Ole and Accessing Files Embedded in Access - Part #2

mrjvdveen

0/5 (0 vote)

Aug 18, 2009

CPOL

3 min read

13266

Ole and accessing file embedded in Access

This series seems to be plagued by delays. Unfortunately, I don't have much time lately. Work is really busy and I'm moving to a new home! So please be patient with me. I'll try to post as often as I can.

In the last post in this series, I've talked about some of the theoretical background of opening files that were stored by Access. We've looked at the different headers and at metafilepict blocks. Today, it's time to look at some code.

Before we do dive into the code, I would like to point out that part of this code is the library written by Eduardo Morcillo. My code wouldn't run without it.

First order of business is to have structs that can hold my header information:

internal struct PackageHeader
{
public short Signature;
public short HeaderSize;
public uint ObjectType;
public short FriendlyNameLen;
public short ClassNameLen;
public short FrNameOffset;
public short ClassNameOffset;
public int ObjectSize;
public string FriendlyName;
public string ClassName;
}

internal struct OleHeader
{
public uint OleVersion;
public uint Format;
public int ObjectTypeNameLen;
public string ObjectTypeName;
}

As you can see, we have two structs that contain information about the package header and the OLE header, based on the information we gathered in the last article. The types are based on the number of bytes each entry can store.

The next order of business is to define some constant values we need during the process.

private const int FixedPackageHeaderSize = 20;
private const int FixedOleHeaderSize = 12;
private const int MetaFileHeaderSize = 45;
private const int BufferSize = 1024;
private const string ContentsEntryName = "CONTENTS";
private const string WorkBookEntryName = "Workbook";
private const string MSPhotoFriendlyName = "MSPhotoEd.3\0";

As you can see, some numbers are here for fixed sizes of headers, a buffer size which is arbitrary, and some string constants we need to identify what type of data we are dealing with.

Finally, we also need some private fields in our class that can hold some data for us:

private System.IO.Stream _input;
private long _endOfHeaderPosition;
private int _dataLength;
private PackageHeader _packageHeader;
private OleHeader _oleHeader;

The constructor of our class actually has a parameter of type Stream that is the input of the class. From the constructor, a method ReadHeader is called:

private void ReadHeader()
{
if (_input.Position > 0 && _input.CanSeek)
{
_input.Seek(0, SeekOrigin.Begin);
}

byte[] fixedPackageHeaderData = new byte[FixedPackageHeaderSize];
_input.Read(fixedPackageHeaderData, 0, FixedPackageHeaderSize);

PackageHeader packageHeader = new PackageHeader();
packageHeader.Signature = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[0],
    fixedPackageHeaderData[1] });
packageHeader.HeaderSize = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[2],
    fixedPackageHeaderData[3] });
packageHeader.ObjectType = CalcUIntFromBytes(new byte[] { fixedPackageHeaderData[4],
    fixedPackageHeaderData[5], fixedPackageHeaderData[6], fixedPackageHeaderData[7] });
packageHeader.FriendlyNameLen = CalcShortFromBytes(new byte[] { 
    fixedPackageHeaderData[8], fixedPackageHeaderData[9] });
packageHeader.ClassNameLen = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[10],
    fixedPackageHeaderData[11] });
packageHeader.FrNameOffset = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[12],
    fixedPackageHeaderData[13] });
packageHeader.ClassNameOffset = CalcShortFromBytes(new byte[] {
    fixedPackageHeaderData[14], fixedPackageHeaderData[15] });
packageHeader.ObjectSize = CalcIntFromBytes(new byte[] { fixedPackageHeaderData[16],
    fixedPackageHeaderData[17], fixedPackageHeaderData[18], fixedPackageHeaderData[19] });

byte[] friendlyNameData = new byte[packageHeader.FriendlyNameLen];
_input.Read(friendlyNameData, 0, packageHeader.FriendlyNameLen);
packageHeader.FriendlyName = Encoding.UTF8.GetString(friendlyNameData);

byte[] classNameData = new byte[packageHeader.ClassNameLen];
_input.Read(classNameData, 0, packageHeader.ClassNameLen);
packageHeader.ClassName = Encoding.UTF8.GetString(classNameData);

_packageHeader = packageHeader;

byte[] fixedOleHeaderData = new byte[FixedOleHeaderSize];
_input.Read(fixedOleHeaderData, 0, FixedOleHeaderSize);

OleHeader oleHeader = new OleHeader();
oleHeader.OleVersion = CalcUIntFromBytes(new byte[] { fixedOleHeaderData[0],
    fixedOleHeaderData[1], fixedOleHeaderData[2], fixedOleHeaderData[3] });
oleHeader.Format = CalcUIntFromBytes(new byte[] { fixedOleHeaderData[4],
    fixedOleHeaderData[5], fixedOleHeaderData[6], fixedOleHeaderData[7] });
oleHeader.ObjectTypeNameLen = CalcIntFromBytes(new byte[] { fixedOleHeaderData[8],
    fixedOleHeaderData[9], fixedOleHeaderData[10], fixedOleHeaderData[11] });

byte[] objectTypeNameData = new byte[oleHeader.ObjectTypeNameLen];
_input.Read(objectTypeNameData, 0, oleHeader.ObjectTypeNameLen);
oleHeader.ObjectTypeName = Encoding.UTF8.GetString(objectTypeNameData);

_oleHeader = oleHeader;

for (int index = 0; index < 8; index++)
{
_input.ReadByte();
}

byte[] lengthData = new byte[4];
_input.Read(lengthData, 0, 4);
_dataLength = BitConverter.ToInt32(lengthData, 0);

_endOfHeaderPosition = _input.Position;
}

This method reads the header and decomposes it into entries. This allows us to get to the variable bits and read them correctly as well. It also gives us some important information, being the length of the data block and the end position of the header.

So on to the business end of this class, it's GetStrippedStream method:

public System.IO.Stream GetStrippedStream()
{
if (_input.Position != _endOfHeaderPosition && _input.CanSeek)
{
_input.Seek(_endOfHeaderPosition, SeekOrigin.Begin);
}
if (_packageHeader.ClassName.Equals(MSPhotoFriendlyName, StringComparison.OrdinalIgnoreCase))
{
_input.Seek(_dataLength + MetaFileHeaderSize, SeekOrigin.Current);
}

string tempFileName = Path.GetTempFileName();
FileStream tempFileStream = File.OpenWrite(tempFileName);

byte[] buffer = new byte[BufferSize];
int loadedBytes = _input.Read(buffer, 0, BufferSize);
while (loadedBytes > 0)
{
tempFileStream.Write(buffer, 0, loadedBytes);
loadedBytes = _input.Read(buffer, 0, BufferSize);
}
tempFileStream.Close();

System.IO.Stream outputStream;
bool isCompoundFile = Storage.IsCompoundStorageFile(tempFileName);
if (isCompoundFile)
{
Storage storage = new Storage(tempFileName);
Storage.StorageElementsCollection elements = storage.Elements();
// element.Name.Equals(WorkBookEntryName, StringComparison.OrdinalIgnoreCase)
var result = from StatStg element in elements
where (element.Name.Equals(ContentsEntryName, StringComparison.OrdinalIgnoreCase)
element.Name.Equals(WorkBookEntryName, StringComparison.OrdinalIgnoreCase))
&& element.Type == StatStg.ElementType.Stream
select element;
if (result.Any())
{
outputStream = storage.OpenStream(result.First().Name);
}
else
{
storage.Close();
outputStream = File.OpenRead(tempFileName);
}
}
else
{
outputStream = File.OpenRead(tempFileName);
}
return outputStream;
}

As you can see, we first make sure we set the position of the stream to the right position, based on the header information. We then write the stream to a temp file. The temp file is used to work with Eduardo's library. We use it to determine if this file is actually a structured storage and if so, we extract only the stream we need. If this is a Microsoft Word document, then it will not have the elements we look for in the structured storage. In this case, we want to send the complete file as a result. If this was not a structured storage, then we want to send the complete file as well.

Warning: The code may suggest this works for Microsoft Excel as well, but unfortunately it doesn't. The reason for this is that whenever Access embeds an Excel file, it will change the structured storage completely, until the point that the original file can not be recovered. I have managed to get the data out of it, but I've not managed to get a properly working Excel file. If anyone can provide me with some more insight into this, I would be very grateful.

Below, you can find the .cs file with the complete class.

This concludes the series on Ole. I hope you have found it helpful. Please leave any comments and/or questions below. I'm always happy to read and reply.