Click here to Skip to main content
15,867,986 members
Articles / Productivity Apps and Services / Microsoft Office / Microsoft Access

Ole and Accessing Files Embedded in Access - Part #2

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
18 Aug 2009CPOL3 min read 12.8K   4  
Ole and accessing file embedded in Access

This series seems to be plagued by delays. Unfortunately, I don't have much time lately. Work is really busy and I'm moving to a new home! So please be patient with me. I'll try to post as often as I can.

In the last post in this series, I've talked about some of the theoretical background of opening files that were stored by Access. We've looked at the different headers and at metafilepict blocks. Today, it's time to look at some code.

Before we do dive into the code, I would like to point out that part of this code is the library written by Eduardo Morcillo. My code wouldn't run without it.

First order of business is to have structs that can hold my header information:

C++
internal struct PackageHeader
{
public short Signature;
public short HeaderSize;
public uint ObjectType;
public short FriendlyNameLen;
public short ClassNameLen;
public short FrNameOffset;
public short ClassNameOffset;
public int ObjectSize;
public string FriendlyName;
public string ClassName;
}

internal struct OleHeader
{
public uint OleVersion;
public uint Format;
public int ObjectTypeNameLen;
public string ObjectTypeName;
}

As you can see, we have two structs that contain information about the package header and the OLE header, based on the information we gathered in the last article. The types are based on the number of bytes each entry can store.

The next order of business is to define some constant values we need during the process.

C++
private const int FixedPackageHeaderSize = 20;
private const int FixedOleHeaderSize = 12;
private const int MetaFileHeaderSize = 45;
private const int BufferSize = 1024;
private const string ContentsEntryName = "CONTENTS";
private const string WorkBookEntryName = "Workbook";
private const string MSPhotoFriendlyName = "MSPhotoEd.3\0";

As you can see, some numbers are here for fixed sizes of headers, a buffer size which is arbitrary, and some string constants we need to identify what type of data we are dealing with.

Finally, we also need some private fields in our class that can hold some data for us:

C++
private System.IO.Stream _input;
private long _endOfHeaderPosition;
private int _dataLength;
private PackageHeader _packageHeader;
private OleHeader _oleHeader;

The constructor of our class actually has a parameter of type Stream that is the input of the class. From the constructor, a method ReadHeader is called:

C++
private void ReadHeader()
{
if (_input.Position > 0 && _input.CanSeek)
{
_input.Seek(0, SeekOrigin.Begin);
}

byte[] fixedPackageHeaderData = new byte[FixedPackageHeaderSize];
_input.Read(fixedPackageHeaderData, 0, FixedPackageHeaderSize);

PackageHeader packageHeader = new PackageHeader();
packageHeader.Signature = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[0],
    fixedPackageHeaderData[1] });
packageHeader.HeaderSize = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[2],
    fixedPackageHeaderData[3] });
packageHeader.ObjectType = CalcUIntFromBytes(new byte[] { fixedPackageHeaderData[4],
    fixedPackageHeaderData[5], fixedPackageHeaderData[6], fixedPackageHeaderData[7] });
packageHeader.FriendlyNameLen = CalcShortFromBytes(new byte[] { 
    fixedPackageHeaderData[8], fixedPackageHeaderData[9] });
packageHeader.ClassNameLen = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[10],
    fixedPackageHeaderData[11] });
packageHeader.FrNameOffset = CalcShortFromBytes(new byte[] { fixedPackageHeaderData[12],
    fixedPackageHeaderData[13] });
packageHeader.ClassNameOffset = CalcShortFromBytes(new byte[] {
    fixedPackageHeaderData[14], fixedPackageHeaderData[15] });
packageHeader.ObjectSize = CalcIntFromBytes(new byte[] { fixedPackageHeaderData[16],
    fixedPackageHeaderData[17], fixedPackageHeaderData[18], fixedPackageHeaderData[19] });

byte[] friendlyNameData = new byte[packageHeader.FriendlyNameLen];
_input.Read(friendlyNameData, 0, packageHeader.FriendlyNameLen);
packageHeader.FriendlyName = Encoding.UTF8.GetString(friendlyNameData);

byte[] classNameData = new byte[packageHeader.ClassNameLen];
_input.Read(classNameData, 0, packageHeader.ClassNameLen);
packageHeader.ClassName = Encoding.UTF8.GetString(classNameData);

_packageHeader = packageHeader;

byte[] fixedOleHeaderData = new byte[FixedOleHeaderSize];
_input.Read(fixedOleHeaderData, 0, FixedOleHeaderSize);

OleHeader oleHeader = new OleHeader();
oleHeader.OleVersion = CalcUIntFromBytes(new byte[] { fixedOleHeaderData[0],
    fixedOleHeaderData[1], fixedOleHeaderData[2], fixedOleHeaderData[3] });
oleHeader.Format = CalcUIntFromBytes(new byte[] { fixedOleHeaderData[4],
    fixedOleHeaderData[5], fixedOleHeaderData[6], fixedOleHeaderData[7] });
oleHeader.ObjectTypeNameLen = CalcIntFromBytes(new byte[] { fixedOleHeaderData[8],
    fixedOleHeaderData[9], fixedOleHeaderData[10], fixedOleHeaderData[11] });

byte[] objectTypeNameData = new byte[oleHeader.ObjectTypeNameLen];
_input.Read(objectTypeNameData, 0, oleHeader.ObjectTypeNameLen);
oleHeader.ObjectTypeName = Encoding.UTF8.GetString(objectTypeNameData);

_oleHeader = oleHeader;

for (int index = 0; index < 8; index++)
{
_input.ReadByte();
}

byte[] lengthData = new byte[4];
_input.Read(lengthData, 0, 4);
_dataLength = BitConverter.ToInt32(lengthData, 0);

_endOfHeaderPosition = _input.Position;
}

This method reads the header and decomposes it into entries. This allows us to get to the variable bits and read them correctly as well. It also gives us some important information, being the length of the data block and the end position of the header.

So on to the business end of this class, it's GetStrippedStream method:

C++
public System.IO.Stream GetStrippedStream()
{
if (_input.Position != _endOfHeaderPosition && _input.CanSeek)
{
_input.Seek(_endOfHeaderPosition, SeekOrigin.Begin);
}
if (_packageHeader.ClassName.Equals(MSPhotoFriendlyName, StringComparison.OrdinalIgnoreCase))
{
_input.Seek(_dataLength + MetaFileHeaderSize, SeekOrigin.Current);
}

string tempFileName = Path.GetTempFileName();
FileStream tempFileStream = File.OpenWrite(tempFileName);

byte[] buffer = new byte[BufferSize];
int loadedBytes = _input.Read(buffer, 0, BufferSize);
while (loadedBytes > 0)
{
tempFileStream.Write(buffer, 0, loadedBytes);
loadedBytes = _input.Read(buffer, 0, BufferSize);
}
tempFileStream.Close();

System.IO.Stream outputStream;
bool isCompoundFile = Storage.IsCompoundStorageFile(tempFileName);
if (isCompoundFile)
{
Storage storage = new Storage(tempFileName);
Storage.StorageElementsCollection elements = storage.Elements();
// element.Name.Equals(WorkBookEntryName, StringComparison.OrdinalIgnoreCase)
var result = from StatStg element in elements
where (element.Name.Equals(ContentsEntryName, StringComparison.OrdinalIgnoreCase)
element.Name.Equals(WorkBookEntryName, StringComparison.OrdinalIgnoreCase))
&& element.Type == StatStg.ElementType.Stream
select element;
if (result.Any())
{
outputStream = storage.OpenStream(result.First().Name);
}
else
{
storage.Close();
outputStream = File.OpenRead(tempFileName);
}
}
else
{
outputStream = File.OpenRead(tempFileName);
}
return outputStream;
}

As you can see, we first make sure we set the position of the stream to the right position, based on the header information. We then write the stream to a temp file. The temp file is used to work with Eduardo's library. We use it to determine if this file is actually a structured storage and if so, we extract only the stream we need. If this is a Microsoft Word document, then it will not have the elements we look for in the structured storage. In this case, we want to send the complete file as a result. If this was not a structured storage, then we want to send the complete file as well.

Warning: The code may suggest this works for Microsoft Excel as well, but unfortunately it doesn't. The reason for this is that whenever Access embeds an Excel file, it will change the structured storage completely, until the point that the original file can not be recovered. I have managed to get the data out of it, but I've not managed to get a properly working Excel file. If anyone can provide me with some more insight into this, I would be very grateful.

Below, you can find the .cs file with the complete class.

This concludes the series on Ole. I hope you have found it helpful. Please leave any comments and/or questions below. I'm always happy to read and reply.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) KnowledgePlaza
Netherlands Netherlands
Since early 2001 I've been working full time as a software developer and since 2004 I've been working mostly with Microsoft technology.
I started out as a product developer, but after a few years I switched to a project company where my roles ranged from developer up to consultant and from team lead and coach to manager.
Eventually I switched jobs and focused on the consultant part and then I got back to building a product once again. Now I work in a job where I get to do both.

Comments and Discussions

 
-- There are no messages in this forum --