Hello everyone, I'm trying to create my own data compressor and decompressor. I looked at some posts here and tried to adapt them, but there is a problem.

Let's say I create a file trial.txt. The program compresses it but adds a lot of 00 00 00 00 bytes, which makes the compressed version bigger than the original.
So what I want is good compression logic. The program should also save the compressed and decompressed versions as txt files. I tried the Huffman algorithm, but it didn't work either.

My code is below. The "ncurses.h" library may give an error with your compiler; if it doesn't work, you can delete it and add "conio.h" instead.

+++ Update: I fixed my code as shown below. Now I have only one problem: the Decompression function doesn't recover the last letter of Message. For example:

Message: Hello World
Compressed A0 B3 DD ...
Decompressed: Hello Worl


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <ncurses.h>
#include <errno.h>


void Compression(unsigned char *sizeOut, const char *Message_size) {  
    
	unsigned long long Buffer = 0;  
	char Bits = 0;
	
	while (*Message_size != 0) {
		Buffer |= (unsigned long long)(*Message_size++) << Bits;
		Bits += 7;
		if (Bits == 7 * 8) { 
			while (Bits > 0) {
				*sizeOut++ = Buffer;
				Buffer >>= 8;
				Bits -= 8;
			}
			Bits = 0;
			Buffer = 0;
		}
	}
	while (Bits > 0) {
		*sizeOut++ = Buffer;
		Buffer >>= 8;
		Bits -= 8;
	}
}


void Decompression (char *Out_size, const unsigned char *Compressed_size, unsigned CompressedLength) {
    
	unsigned long long Buffer = 0;
	char Bits = 0;
	while (CompressedLength) {
		while (CompressedLength && Bits < 7 * 8) {
			Buffer |= (unsigned long long)*Compressed_size++ << Bits;
			Bits += 8;
			--CompressedLength;
		}
		while (Bits > 0) {
			*Out_size++ = Buffer & 0x7F;
			Buffer >>= 7;
			Bits -= 7;
		}
		
		Bits = 0;
		Buffer = 0;
	}
}


int main(void) {
    int letters;

    printf("Creator: Ceyhun Kivanc Demir, data compressor and decompressor\n\n");

    FILE *file;
    char filename[100] = "";
    printf("\n\nPlease Enter the name of file: ");
    scanf("%99s", filename);
    file = fopen(filename, "r");
    if (file == NULL) {
        /* Print the file name together with the reason fopen failed. */
        printf("\n%s: File not found (%s).\n\n", filename, strerror(errno));
        exit(1);
    }

    fseek(file, 0, SEEK_END);
    long count = ftell(file);
    fseek(file, 0, SEEK_SET);
    
    
    
    char Message[count];
    
    
    fread(Message, strlen(Message)+1, count, file);  
    Message[count]='\0';
    
	
	unsigned CompressedSize = sizeof(Message)*7/8; 
	unsigned char CompressedBytes[CompressedSize]; 
	char DecompressedSize[sizeof(Message)];
	
	
	
	printf("\nMessage: %s\n", Message);
	Compression(CompressedBytes, Message);
	printf("char number of message: %zu\n", strlen(Message));
	printf("\nCompressed version: ");
	for (int Byte = 0; Byte < CompressedSize; ++Byte) {
		printf("%02X ", CompressedBytes[Byte]);
	}
	printf("\n");
	
	
	Decompression(DecompressedSize, CompressedBytes, CompressedSize);
	DecompressedSize[sizeof(Message)] = 0; 
	
	printf("\nDecompressed version: %s\n", DecompressedSize);
	printf("char number of message: %zu\n", strlen(DecompressedSize));
	
	if (strcmp(Message, DecompressedSize) == 0) {
		printf("\nCompression done.\n");
	} else {
		printf("\nCompression crushed!\n");
	}
	fclose(file);
	return 0;
}


What I have tried:

Huffman algorithm, saving to a file, compressing, decompressing
Updated 6-Jan-23 20:37pm
Comments
jeron1 22-Dec-21 13:02pm    
char Message[count];
fseek(file, 0, SEEK_SET);
fread(Message, strlen(Message)+1, count, file); // what does strlen() return? as Message is an uninitialized array
Message[count]='\0'; // is count a valid index for this array?
Lones 22-Dec-21 13:20pm    
strlen(Message) returns 27 which is equal to character numbers in .txt file

Message[count]='\0'; I added this because compression was failing without it: it was adding some strange characters such as U*? at the end of the file (I assume the program was not seeing the end of the file).
jeron1 22-Dec-21 13:31pm    
"strlen(Message) returns 27 which is equal to character numbers in .txt file"
That can't be consistent: no data has been put into the Message array at the point strlen is called, so the uninitialized array may hold garbage values.

The max index for "char Message[count]" is count - 1. Writing outside of the bounds (0 to count - 1) is going to cause problems.
Dave Kreskowiak 22-Dec-21 19:26pm    
Did you say your text file is 27 characters long?
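
The points raised in these comments can be sketched in code. This is a minimal illustration, not the poster's program: the helper name read_text is hypothetical. It passes 1 as the element size so that fread returns a byte count, uses that return value instead of calling strlen on an uninitialized buffer, and writes the terminator at an index that is guaranteed to be inside the array.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: read at most cap - 1 bytes from fp into buf and
   NUL-terminate the result. Returns the number of bytes read. */
size_t read_text(FILE *fp, char *buf, size_t cap) {
    if (cap == 0) {
        return 0;
    }
    /* Element size 1, element count cap - 1: fread returns a byte count. */
    size_t got = fread(buf, 1, cap - 1, fp);
    buf[got] = '\0';  /* got <= cap - 1, so this index is always valid */
    return got;
}
```

With a buffer declared as char Message[count + 1], a call such as read_text(file, Message, sizeof Message) would leave room for the terminator.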

1 solution

Look at this:
C
int count;
/* assign a value to count */
char Message[count];  /* creates a buffer of  size count */

/* ... a bit later */
unsigned CompressedSize = sizeof(Message)*7/8;

Creating an array on the stack with a size determined at runtime makes it a variable-length array (VLA). VLAs are standard in C99 and an optional feature since C11; GCC also accepts them as an extension in C90 mode and in C++. Which isn't to say that it's wrong, but there are some things to be aware of:
1) if count is larger than maximum stack size, you can corrupt other variables already on the stack. It might be better to use malloc/free to create your Message variable.
2) The size of the Message variable is set at runtime, based on the size of the input file. For a variable-length array the sizeof operator is evaluated at runtime, so sizeof(Message) yields count here; it would yield sizeof(char *) only if Message were a pointer, for example after switching to malloc. The value of CompressedSize is still going to be wrong, because the integer division in sizeof(Message)*7/8 rounds down: 27 characters give 23 bytes, where 24 are needed.
3) It's not portable. I think clang probably supports this, but I'm fairly sure that MSVC does not. That may or may not be an issue for you, either now or later.
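
However the buffer ends up being declared, the *7/8 expression deserves a second look, because integer division rounds down. A minimal sketch, assuming a hypothetical helper packed_size that is not part of the original code: packing n characters at 7 bits each needs the bit count rounded up to whole bytes.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold n characters at
   7 bits each. Adding 7 before dividing rounds up instead of down. */
size_t packed_size(size_t n) {
    return (n * 7 + 7) / 8;
}
```

For the 27-character message, 27 * 7 / 8 gives 23 while packed_size(27) gives 24. For "Hello World" (11 characters) the two values are 9 and 10.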

As an aside, this code
C
while( (letters = fgetc(file)) != EOF) {
  count++;
} 
fseek(file, 0, SEEK_SET);
could be replaced with
C
fseek(file, 0, SEEK_END);
long count = ftell(file);
fseek(file, 0, SEEK_SET);
This way, you don't have to spend time reading each byte in the file, and you can save yourself a (small) amount of space, since the variable "letters" is no longer needed.
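
A slightly more defensive version of that replacement, as a sketch. The helper name file_size is an assumption, and it checks return values because fseek and ftell can fail. It also assumes the file was opened in binary mode ("rb"), since for text streams the C standard does not guarantee that the value returned by ftell is a byte count.

```c
#include <stdio.h>

/* Hypothetical helper: return the size of an open file in bytes, or -1 on
   error. On success the read position is back at the start of the file. */
long file_size(FILE *fp) {
    if (fseek(fp, 0, SEEK_END) != 0) {
        return -1;
    }
    long size = ftell(fp);  /* ftell returns -1L on failure */
    if (size < 0) {
        return -1;
    }
    if (fseek(fp, 0, SEEK_SET) != 0) {
        return -1;
    }
    return size;
}
```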
 
 
Comments
Lones 22-Dec-21 15:32pm    
Thanks a lot for the answer. Sorry, but what did you mean in 1) by creating Message with malloc?

char Message[count] = (long*)malloc(count *sizeof(long));

I tried it like this, but the compiler says "error: variable-sized object may not be initialized". Even if it were accepted, I think it would be pointless, because you said "if count is larger than maximum stack size, you can corrupt other variables already on the stack". I couldn't figure out how to create the Message array with malloc while still limiting it to count.
Sorry, my questions might be bad or ridiculous. I'm very new to programming (it has only been 2 months) and have only used C and C++ so far.
k5054 22-Dec-21 18:42pm    
malloc and free are essential tools for the C programmer. You should google for some malloc tutorials if you don't know about them, yet.
Basically malloc returns a block of memory, suitably aligned for the storage of any data, so a call to malloc(1024) returns a pointer to a block of memory of 1024 bytes. In your case you would do
char *Message = (char *)malloc(count);
and then later call free(Message)
I'm not sure why you think you need to multiply count by sizeof(long). count is the size of the file, so you don't need to multiply anything here.
In C, you don't need to cast the return value of malloc to the type of the object you are assigning it to. Whether you do or not is a matter of style, and/or coding standards. For C++ a cast is required, but for C++ you should use one of the C++ cast operators.
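
Put together, the allocation might look like the sketch below. The function name load_message is hypothetical. It allocates count + 1 bytes so that there is room for a terminating NUL, and the caller releases the buffer with free.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to count bytes from fp into a heap buffer
   and NUL-terminate it. Returns NULL if the allocation fails; the caller
   must free() the result. */
char *load_message(FILE *fp, size_t count) {
    char *message = malloc(count + 1);  /* +1 for the terminating NUL */
    if (message == NULL) {
        return NULL;
    }
    size_t got = fread(message, 1, count, fp);
    message[got] = '\0';                /* got <= count, so in bounds */
    return message;
}
```

Because the memory comes from the heap, a large file no longer risks overflowing the stack. Note that sizeof(message) would now be the size of a pointer, so the length has to be carried in a separate variable such as count.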
Lones 22-Dec-21 15:50pm    
Also, changing my code to this

fseek(file, 0, SEEK_END);
long count = ftell(file);
fseek(file, 0, SEEK_SET);

solved the 00 00 00 problem, but now decompression fails

Message: hello world this is a trial
Compressed version: E8 32 9B FD 06 DD DF 72 36 19 44 47 A7 E7 A0 F4 1C 14 06 D1 E5 E9 30
Decompressed version: hello world this is a tria
Compression crushed!

Decompression doesn't recover the last letter.
k5054 22-Dec-21 18:48pm    
You should probably investigate how to use the debugger. It will help you understand why either your compress or decompress routines are failing. I have not looked into your code too deeply, but an "off by one" error is a common mistake, even for experienced programmers!

In the previous reply, I should have pointed out that you might need to add 1 to the amount of memory you malloc() to allow for a NUL ('\0') byte at the end of the string. Not allowing for the fact that C strings have a NUL terminating char is one of the many sources of the 'off by one' bug.
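
For the missing last letter specifically, the size arithmetic is worth checking as well as the terminator: 27 characters at 7 bits each occupy 189 bits, which is 24 bytes once rounded up, while the output above shows only 23. The sketch below is one possible rewrite, not the original poster's code. The names pack7 and unpack7 are hypothetical, and both take explicit lengths so that neither side depends on a NUL terminator or on the padding bits.

```c
#include <stddef.h>

/* Pack n 7-bit characters from in into out. Returns the number of bytes
   written, which is (n * 7 + 7) / 8. Only the low 7 bits of each input
   character are kept. */
size_t pack7(unsigned char *out, const char *in, size_t n) {
    unsigned buffer = 0;  /* pending bits, least significant first */
    int bits = 0;         /* number of pending bits in buffer */
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        buffer |= ((unsigned)in[i] & 0x7Fu) << bits;
        bits += 7;
        while (bits >= 8) {  /* flush every complete byte */
            out[written++] = (unsigned char)(buffer & 0xFFu);
            buffer >>= 8;
            bits -= 8;
        }
    }
    if (bits > 0) {          /* final partial byte, zero padded */
        out[written++] = (unsigned char)(buffer & 0xFFu);
    }
    return written;
}

/* Unpack exactly n characters from in into out and NUL-terminate.
   out needs room for n + 1 bytes. */
void unpack7(char *out, const unsigned char *in, size_t n) {
    unsigned buffer = 0;
    int bits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (bits < 7) {      /* refill when fewer than 7 bits remain */
            buffer |= (unsigned)(*in++) << bits;
            bits += 8;
        }
        out[i] = (char)(buffer & 0x7Fu);
        buffer >>= 7;
        bits -= 7;
    }
    out[n] = '\0';
}
```

Passing the character count to unpack7 means the original length has to be stored next to the compressed data, for example at the start of the output file.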

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


