SimplePack Library

23 Nov 2010
A simple way to pack data into one file

Introduction

SimplePack is a library for storing many files in one. It resembles an ordinary TAR archive in that both store a directory structure in a single file without compression. SimplePack provides the following basic functionality:

Archive.cs

  • Add a file or directory to the archive
  • Delete a file or directory from the archive
  • Extract a file or directory from the archive
  • Perform these operations synchronously or asynchronously (with progress information)

ArchiveInfo.cs

  • Calculate basic statistics about an archive

ArchiveFileStream.cs

  • Read-only file stream that reads a file’s content directly, without extracting it first

Note: Archive.cs can perform only one asynchronous operation at a time. Changing this behavior would be tricky and would decrease the performance of the archive itself. All asynchronous methods in this class have the suffix Async.

Note 2: The SimplePack source code contains the library documentation.

Advantages and Disadvantages

Advantages

  • Working without temporary files
  • Synchronous and asynchronous way to work (no additional thread required)
  • No data compression (faster archiving/extracting of data, possible direct access to specific file)

Disadvantages

  • No data compression

Background

A SimplePack archive physically stores only the files and a serialized Footer. The directory structure inside the archive is represented only by objects, and this hierarchical structure is serialized at the end of the archive. The root “directory” is represented by an object of type Footer. Class Footer implements the interface IArchiveStructure, as does class DirectoryArchive. There are several differences between Footer and DirectoryArchive (e.g. Footer has no parent directory), which is why Footer does not inherit from DirectoryArchive.

Footer and DirectoryArchive each hold two collections. The first contains the FileArchive objects that the Footer or directory directly contains. The second contains DirectoryArchive objects (the nested directories of the current directory or Footer). Class FileArchive holds information about a file stored in the archive. FileArchive and DirectoryArchive also keep the attributes of the original files or directories; these attributes are restored when a file or directory is extracted. FileArchive and DirectoryArchive also refer to their parent IArchiveStructure object (a Footer or a DirectoryArchive).

Because the hierarchy is represented by nested objects, recursion is often used to walk it, but I don’t think this slows the archive down. SimplePack also stores a flat list of all files in the archive to speed up file operations (the main reason I did this: I’m too lazy to write a few additional recursive methods).

Deleted files are replaced with unused space (a gap). Files are stored in the archive sequentially, one after another; wherever that is not the case, there is a gap between two files. Detecting gaps is simple: sort the list of all files by start position in the archive (StartSeek) and check whether each file begins where the previous one ends.

The objects stored in the Footer (of types FileArchive and DirectoryArchive) know their parentDirectory. This reference is not serialized into the Footer (if you use the XML serializer), because it would create a cycle; it is recalculated from the hierarchical structure instead. Each directory also holds its size. After a file or directory is added or removed, this Size is updated up the hierarchy to the root (another reason why it is useful to keep the parentDirectory reference).
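The gap check described above can be sketched in a few lines. This is only an illustration, not SimplePack's actual code: `StoredFile` and `GapFinder` are hypothetical names, `StoredFile` models just a start offset and a length, and the sketch assumes that file data starts at offset 0 of the archive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FileArchive: only the two values needed
// for gap detection are modelled here.
public sealed class StoredFile
{
    public long StartSeek { get; }   // offset of the file's first byte in the archive
    public long Length { get; }      // number of bytes the file occupies

    public StoredFile(long startSeek, long length)
    {
        StartSeek = startSeek;
        Length = length;
    }
}

public static class GapFinder
{
    // Returns a (start, length) pair for every unused region that lies
    // before the end of the last stored file.
    public static List<(long Start, long Length)> FindGaps(IEnumerable<StoredFile> files)
    {
        var gaps = new List<(long Start, long Length)>();
        long expectedStart = 0;   // where the next file begins if there is no gap (assumed: data starts at 0)

        foreach (StoredFile file in files.OrderBy(f => f.StartSeek))
        {
            if (file.StartSeek > expectedStart)
                gaps.Add((expectedStart, file.StartSeek - expectedStart));

            expectedStart = file.StartSeek + file.Length;
        }
        return gaps;
    }
}
```

Unused space between the last file and the Footer is not reported by this sketch; detecting it would additionally require the Footer's start position.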

I’ll explain how SimplePack works with a few examples. The examples work only with files; working with directories is analogous. All information about files and directories is stored in the Footer part of the archive. The Footer is in fact a serialized object of class Footer. The default serializer for the Footer is the XML serializer, but you can implement your own serializer and pass it to the constructor. This serializer must implement the IFooterStorage interface. The Footer is written to the archive only when the Close() method is called. This behavior can be changed by setting the Atomic attribute to true; the Footer is then written after every write operation on the archive, which also slows the archive down a little. You can specify the virtual path of files and directories in a SimplePack archive. It is possible to have several copies of the same file, provided each copy is stored under a different virtual path (this is also demonstrated in the examples). The root of the archive is arch:\. You cannot delete the root directory (you can try, but you will be rewarded with an exception).
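Since the serialized Footer carries no parent references, they have to be rebuilt once the Footer has been loaded, and size changes then travel along those references to the root. A minimal sketch of both steps follows. `DirNode` is a hypothetical, simplified node type invented for this illustration; the real library uses Footer, DirectoryArchive and FileArchive, whose members are not shown in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified directory node (not a SimplePack type).
public sealed class DirNode
{
    public string Name { get; }
    public DirNode Parent { get; private set; }   // not serialized; rebuilt after loading
    public List<DirNode> Directories { get; } = new List<DirNode>();
    public long Size { get; private set; }        // total bytes stored in this subtree

    public DirNode(string name)
    {
        Name = name;
    }

    // Walks the tree once and restores the Parent link of every nested directory.
    public void RestoreParents()
    {
        foreach (DirNode child in Directories)
        {
            child.Parent = this;
            child.RestoreParents();
        }
    }

    // Applies a size change (positive when adding, negative when removing)
    // to this directory and to every ancestor up to the root.
    public void ApplySizeDelta(long delta)
    {
        for (DirNode node = this; node != null; node = node.Parent)
            node.Size += delta;
    }
}
```

Keeping the parent link in memory means a size update costs one step per directory level instead of a search through the whole tree.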

Add File

add_file.jpg

Figure 1.A: The archive before the new file is inserted. The Footer is at the end of the file.
Figure 1.B: The Footer is removed from the archive (it is still present in memory), because the new file will be written at the end of the archive.
Figure 1.C: The new file is inserted at the end of the archive.
Figure 1.D: The updated Footer is written at the end of the archive file.

Note: An archive can have only one Footer!
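The four steps in Figure 1 map onto a handful of stream operations. The following is a sketch under assumptions, not the library's implementation: `AppendSketch.AppendFile` is a hypothetical helper, the caller is assumed to know where the current Footer starts, and the updated Footer is assumed to be available as already serialized bytes.

```csharp
using System;
using System.IO;

public static class AppendSketch
{
    // Appends fileData where the old footer started and writes the new footer
    // behind it. Returns the offset at which the new footer begins.
    public static long AppendFile(Stream archive, long footerStart, byte[] fileData, byte[] newFooter)
    {
        archive.SetLength(footerStart);                  // Figure 1.B: drop the old footer
        archive.Seek(footerStart, SeekOrigin.Begin);
        archive.Write(fileData, 0, fileData.Length);     // Figure 1.C: new file at the end
        long newFooterStart = archive.Position;
        archive.Write(newFooter, 0, newFooter.Length);   // Figure 1.D: updated footer goes last
        archive.Flush();
        return newFooterStart;
    }
}
```

Because new data always lands at the end, no existing file has to be moved and no temporary file is needed.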

Remove File

remove_file.jpg

Figure 2.A: The archive before the file is removed.
Figure 2.B: The removed file is replaced with empty data; the Footer is updated and written at the end of the file.

Note: The size of the archive remains unchanged (not counting the small change in the Footer record). Removing files this way does not require temporary files, because the deleted file is overwritten with empty bytes (0x00). SimplePack provides methods (synchronous and, of course, asynchronous) that remove the unused space from the archive.
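Overwriting a removed file in place can be sketched as below. `ZeroFillSketch.OverwriteWithZeros` is a hypothetical helper written for this illustration; the buffer size is an arbitrary choice, and how SimplePack itself performs the overwrite is not shown in the article.

```csharp
using System;
using System.IO;

public static class ZeroFillSketch
{
    // Overwrites the range [start, start + length) with 0x00 bytes, in chunks,
    // so that removing a large file does not require a large buffer.
    public static void OverwriteWithZeros(Stream archive, long start, long length)
    {
        var zeros = new byte[81920];   // a new byte array is already filled with 0x00
        archive.Seek(start, SeekOrigin.Begin);

        long remaining = length;
        while (remaining > 0)
        {
            int chunk = (int)Math.Min(zeros.Length, remaining);
            archive.Write(zeros, 0, chunk);
            remaining -= chunk;
        }
        archive.Flush();
    }
}
```

The stream length never changes here, which matches the note above: the archive keeps its size until the unused space is vacuumed.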

Removing Unused Space from Archive

Removing a file from the archive leaves unused space behind. This space is not reused when you add a new file to the archive, because the data would then become fragmented and slow down the whole of SimplePack. Instead, I implemented the VacuumArchive method (and VacuumArchiveAsync), which is very similar to VACUUM in SQLite. The unused space is shifted to the end of the archive, and the archive is then truncated.

vacuum_archive.jpg

Figure 3.A: An archive with two unused spaces before the VacuumArchive method is called.
Figure 3.B: File 2 is moved right behind file 1, so the unused space moves behind file 2.
Figure 3.C: The archive is truncated to the sum of all file sizes.
Figure 3.D: The Footer is written at the end of the archive.

Note: As you can see, the Footer is unchanged, because it does not hold any information about unused space.
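The compaction shown in Figure 3 can be sketched as follows. This is an illustration under assumptions rather than the VacuumArchive implementation: `VacuumEntry` and `VacuumSketch.Compact` are hypothetical names, file data is assumed to start at offset 0, and the sketch updates a start offset per file, which is bookkeeping the article does not describe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record of one stored file; StartSeek is updated when the file moves.
public sealed class VacuumEntry
{
    public long StartSeek { get; set; }
    public long Length { get; set; }
}

public static class VacuumSketch
{
    // Moves every file towards the start so that no gaps remain, truncates the
    // stream to the sum of all file lengths and returns that length.
    // The caller would write the footer afterwards (Figure 3.D).
    public static long Compact(Stream archive, IEnumerable<VacuumEntry> files)
    {
        var buffer = new byte[81920];
        long writePos = 0;   // next free position in the compacted archive

        foreach (VacuumEntry file in files.OrderBy(f => f.StartSeek).ToList())
        {
            if (file.StartSeek != writePos)
            {
                long readPos = file.StartSeek;
                long target = writePos;
                long remaining = file.Length;

                // Copy front to back; the target always lies before the source,
                // so bytes that are still unread are never overwritten.
                while (remaining > 0)
                {
                    archive.Seek(readPos, SeekOrigin.Begin);
                    int read = archive.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                    if (read <= 0)
                        throw new EndOfStreamException("Archive is shorter than the file list claims.");

                    archive.Seek(target, SeekOrigin.Begin);
                    archive.Write(buffer, 0, read);
                    readPos += read;
                    target += read;
                    remaining -= read;
                }
                file.StartSeek = writePos;
            }
            writePos += file.Length;
        }

        archive.SetLength(writePos);   // Figure 3.C: cut off everything behind the last file
        return writePos;
    }
}
```

Like the approach in the figures, this works inside the archive file itself and needs no temporary file.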

Using the Code

The first example demonstrates how to use base class Archive in a synchronous way.

C#
using (Archive simplePackTestArchive = new Archive(@"c:\myTestArchive.smp"))
{
    // open the archive
    simplePackTestArchive.Open();

    // add files; the second argument is the virtual path inside the archive
    simplePackTestArchive.AddFile(@"c:\test1.txt", @"arch:\nameOfFileInArchive");
    // same source file under a different virtual path
    simplePackTestArchive.AddFile(@"c:\test1.txt", @"arch:\nameOfFileInArchive2");
    // the virtual directory is created automatically
    simplePackTestArchive.AddFile(@"c:\test2.txt", @"arch:\archiveDirectory\myArchivedFile");
    simplePackTestArchive.AddFile(@"c:\test3.txt", @"arch:\archiveDirectory\myArchivedFile2");

    // add a directory; a virtual directory path MUST end with \
    simplePackTestArchive.AddDirectory(@"c:\testDirectory\",
        @"arch:\someDirInsideArchive\Additional Directory\myArchivedDirectory\");

    // remove files, identified by virtual path; each removal leaves unused space
    simplePackTestArchive.RemoveFile(@"arch:\nameOfFileInArchive2");
    simplePackTestArchive.RemoveFile(@"arch:\archiveDirectory\myArchivedFile2");

    // remove a directory (here the grandparent of myArchivedDirectory)
    simplePackTestArchive.RemoveDirectory(@"arch:\someDirInsideArchive\");
    // this would throw an exception (the root path of the archive is mandatory):
    // simplePackTestArchive.RemoveDirectory(@"arch:\");

    // vacuum the archive to remove the unused space
    simplePackTestArchive.VacuumArchive();
}   // the archive is closed and disposed here

This example shows how to use base class Archive asynchronously, from an ordinary Form class. All methods have synchronous and asynchronous versions. Most asynchronous methods also provide progress information (OpenAsync and VacuumArchiveAsync do not). See the documentation for more information.

C#
private Archive testArchive;   // kept in a field so that all handlers can reach it

private void Form2_Load(object sender, EventArgs e)
{
    testArchive = new Archive(@"c:\testArchive");
    // method to call when opening of the archive has completed
    testArchive.OpenCompleted += testArchive_OpenCompleted;
    testArchive.OpenAsync();    // start the asynchronous operation
}

void testArchive_OpenCompleted(Archive sender,
    SimplePack.Events.OpenCompletedEventArgs openCompletedEventArgs)
{
    if (openCompletedEventArgs.Error == null)
    {
        // no error occurred while opening the archive
        MessageBox.Show("Archive was opened correctly");
    }
    else
    {
        // an error occurred; show its message
        MessageBox.Show(openCompletedEventArgs.Error.Message);
    }
}

private void Form2_FormClosing(object sender, FormClosingEventArgs e)
{
    // do not let the form close while an operation is still running
    if (!testArchive.IsBusy)
        return;
    e.Cancel = true;
    MessageBox.Show("Cannot close the form, an operation is in progress");
}

How to get basic statistics about an archive:

C#
// create an ArchiveInfo object; the constructor takes the path to the SimplePack archive
ArchiveInfo archiveInfo = new ArchiveInfo(@"c:\testArchive");
// read and display one of the statistics
MessageBox.Show(archiveInfo.BiggestFile.Length.ToString());

How to read data directly from an archive (without extraction):

C#
private void Form1_Load(object sender, EventArgs e)
{
    // create the SimplePack archive object
    using (Archive testArchive = new Archive(@"c:\testArchive"))
    {
        testArchive.Open();   // open the archive properly
        // expose the file arch:\myPicture.jpg as a Stream (READ ONLY!)
        using (ArchiveFileStream archiveFileStream =
            new ArchiveFileStream(testArchive, @"arch:\myPicture.jpg"))
        using (Image streamImage = new Bitmap(archiveFileStream))
        {
            // A Bitmap created from a stream needs that stream for its whole
            // lifetime, so display a copy that no longer depends on it.
            pictureBox1.Image = new Bitmap(streamImage);
        }
    }
}

Points of Interest

The reason why I created this library is that I needed to store several directories in one file. The asynchronous approach keeps the code easy to read, and you can display the progress of an operation with only a few lines of code. You don’t need to know anything about threads, and you don’t have to re-invoke events on the correct thread, so the resulting code is more readable. Data compression is not supported, and the archive will never be extended in this direction (but you can do it yourself if you want :)). I decided this because the archive is intended mainly for storing multimedia data, where zipping can make the source data even bigger. SimplePack is limited only by the capabilities of the file system.

Benchmark

I made a simple benchmark of three libraries that support directory packing (SimplePack, ChilkatDotNet2 for TAR packing, and SharpZipLib for ZIP packing with compression level 0). The test program creates the same archive 20 times and records the minimum, maximum and average time in ms. The test directory contains 5230 files and 1004 subdirectories with a total size of 237,799,409 bytes. The results are shown in the following graph. For SimplePack, the latest version 1.0.3 was used; its custom binary serialization of the Footer is faster than XML serialization and uses fewer resources.

statistic.gif
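A measurement loop of the kind described above could look like the following sketch. It is not the original test program, which is not included in the article; `BenchmarkSketch.Measure` is a hypothetical helper, and the action passed to it would be the code that creates one archive.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BenchmarkSketch
{
    // Runs the action the given number of times and returns the minimum,
    // maximum and average wall-clock time in milliseconds.
    public static (double Min, double Max, double Average) Measure(Action action, int runs)
    {
        if (runs <= 0)
            throw new ArgumentOutOfRangeException(nameof(runs));

        var timings = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            timings[i] = watch.Elapsed.TotalMilliseconds;
        }
        return (timings.Min(), timings.Max(), timings.Average());
    }
}
```

For a fair comparison, each run should start from the same state, for example by deleting the archive created by the previous run before the timer starts.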

History

  • Version 1.0.3 – Custom binary serialization/deserialization of Footer
  • Version 1.0.2.2 – Base exception ArchiveException was implemented + some small improvements
  • Version 1.0.2 – ArchiveFileStream was implemented
  • Version 1.0.1 – File and directory attributes are preserved
  • Version 1.0.0 – Implementation of Asynchronous method call

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Slovakia
