I am trying to merge binary files

Jan 5, 2013 at 7:42pm
I am trying to merge two binary files, but to no avail: the program keeps segfaulting. I want to concatenate the buffers the files are read into and then write the result to disk. Here is my code; any help is greatly appreciated.

Main.cpp:
#include "getsize.h"
long lSize;
char * buffer;
size_t result;
FILE * pFile;
FILE * pFile2;
FILE * pFile3;

void read1()
{
    pFile = fopen ( "uTorrent.exe", "rb");
    fseek (pFile , 0 , SEEK_END);
    lSize = ftell (pFile);
    rewind (pFile);
    buffer = (char*) malloc (sizeof(char)*lSize);
    result = fread (buffer,1,lSize,pFile);
}
void read2()
{
    pFile2 = fopen ( "CCleaner.exe", "rb");
    fseek (pFile2 , 0 , SEEK_END);
    lSize = ftell (pFile2);
    rewind (pFile2);
    buffer = (char*) malloc (sizeof(char)*lSize);
    result = fread (buffer,1,lSize,pFile);
}
void write()
{
    pFile3 = fopen ( "test.exe", "a+");
    FILE * buffer[] = {pFile2, pFile}; // It would not let me compile with "char * buffer[] = {pFile2, pFile};"
    fwrite (buffer , 1 , z , pFile3 );
}
int main()
{
    calcsize();
    read1();
    fclose (pFile);
    read2();
    fclose (pFile2);
    write();
    fclose (pFile3);
    free (buffer);
    return 0;
}


getsize.h:
#include <stdio.h>
#include <stdlib.h>

FILE * file1;
FILE * file2;
long long x, y, z,a ,b;
long fSize, fSize2;
int calcsize()
{
 file1 = fopen ( "uTorrent.exe", "rb");
 file2 = fopen ( "CCleaner.exe", "rb");
 fseek (file1, 0, SEEK_END);
 fSize = ftell (file1);
 fSize2 = ftell (file2);
x = sizeof(file1);
y = sizeof(file2);
b = x * fSize;
a = y * fSize2;
return z = a + b;
}
Last edited on Jan 5, 2013 at 9:47pm
Jan 6, 2013 at 3:21am
You have written functions that can potentially lead to memory leaks. Having said that, you have some problems with your code logic. read1() loads the file into buffer. read2() creates a new heap space which buffer will point to, then loads another file into it. Now, how will write() know where the buffer created by read1() is?

On top of that, do not put your clean-up code outside the scope in which you created your resources. This is very bad practice! Your read1() function obtains a file handle, yet this handle is only closed after read1() returns. For a small program like yours it may not be a big deal, but correcting these coding practices early on is important.

In any case, your write() function needs to be re-written. You can actually improve this program by getting rid of the write() function. The logic will go like this:

1. Create a new file for writing
2. Open first file, read contents, write to the new file, close first file
3. Open second file, read contents, append to the new file, close second file
4. Close new file.


Also, your calcsize() function (if the purpose is to calculate the sum of the number of octets of two files), has incorrect logic/code. Just do something like:
/* code is in C */
long calcsize()
{
    FILE* fp;
    long size = 0;
    fp = fopen("file1", "rb");
    if (fp != NULL) {
        fseek(fp, 0, SEEK_END);
        size += ftell(fp);
        fclose(fp); /* do not forget this! */
    }
    fp = fopen("file2", "rb");
    if (fp != NULL) {
        fseek(fp, 0, SEEK_END);
        size += ftell(fp);
        fclose(fp); /* do not forget this! */
    }
    return (size);
}


Some ideas:
1. Do not use heap memory unless necessary. Furthermore, do not dynamically allocate memory based on the size of the file. Consider a 16 GiB file: can you guarantee you have enough memory to load it all?

2. If you can work with C++, then do so. And when doing so, follow the RAII idiom.
Jan 6, 2013 at 10:48pm
Hello again, here is my revised code. I took into account the advice given to me in the previous post. Also, thank you for the help.

Main.cpp
#include "getsize.h"
#include "read.h"

int main()
{
    read();
    free (buffer);
    free (buffer2);
    return 0;
}


read.h:
long lSize;
long lSize2;
char * buffer;
char * buffer2;
FILE * pFile;
FILE * pFile2;
FILE * pFile3;
void read()
{
    pFile3 = fopen ( "test.exe", "ab+");
    pFile = fopen ( "uTorrent.exe", "rb");
    pFile2 = fopen ( "CCleaner.exe", "rb");
    fseek (pFile , 0 , SEEK_END);
    fseek (pFile2 , 0 , SEEK_END);
    lSize = ftell (pFile);
    lSize2 = ftell (pFile);
    rewind (pFile);
    rewind (pFile2);
    buffer = (char*) malloc (sizeof(char)*lSize);
    buffer2 = (char*) malloc (sizeof(char)*lSize2);
    fwrite (buffer , 1 , calcsize() , pFile3 );
    fwrite (buffer2 , 1 , calcsize() , pFile3 );
    fclose (pFile3);
    fclose (pFile2);
    fclose (pFile);
}


getsize.h:
#include <stdio.h>
#include <stdlib.h>

FILE * file1;
FILE * file2;
long long size;
long long x, y;
long long calcsize()
{
    file1 = fopen ( "uTorrent.exe", "rb");
    file2 = fopen ( "CCleaner.exe", "rb");
    fseek (file1, 0, SEEK_END);
    fseek (file2, 0, SEEK_END);
    x = sizeof(file1);
    y = sizeof(file2);
    size += x + y;
    return(size);
}


It no longer has any segmentation faults. However, it does not write out the whole file: it writes 80 bytes out of 4.041320801 MiB.
Last edited on Jan 6, 2013 at 10:53pm
Jan 6, 2013 at 11:52pm
1. Your read() function does not read the contents of pFile and pFile2 to buffer and buffer2, respectively, yet you are writing them to pFile3. Take a look at fread() function here (http://www.cplusplus.com/reference/cstdio/fread/) for more information.

2. Your calcsize() function is still incorrect. You will need to use the ftell() function to know the offset of the final octet at the end of file (which is essentially the size of the file). See here: http://www.cplusplus.com/reference/cstdio/fseek/
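
To make the second point concrete: `sizeof(file1)` is the size of the `FILE*` pointer itself and has nothing to do with the file's contents, whereas `ftell` after seeking to the end reports the byte offset of the end of the file. A minimal sketch, where the helper name `file_size` is an assumption for illustration:

```cpp
#include <cstdio>

// Returns the size of `path` in bytes, or -1 if it cannot be determined.
long file_size(const char* path)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return -1;

    long size = -1;
    if (std::fseek(fp, 0, SEEK_END) == 0)
        size = std::ftell(fp);   // offset of the end == number of bytes
    std::fclose(fp);
    return size;
}
```

Note that `long` is 32 bits wide on Windows, so this approach cannot report sizes of 2 GiB or more there.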
Jan 7, 2013 at 12:40am
If you had my C++ library, it would be something like:
StreamA Data;
LFile FirstFile(L"uTorrent.exe");
LFile SecondFile(L"CCleaner.exe");
LFile ThirdFile(L"test.exe",FILEMODE_WRITE);
FirstFile.DumpToStream(Data);
ThirdFile.DumpFromStream(Data);
SecondFile.DumpToStream(Data);
ThirdFile.DumpFromStream(Data);

And that's done.
Using standard C:
/* Fragment: place inside a function; needs <stdio.h> and <stdlib.h>. */
FILE * pFirst = fopen("uTorrent.exe","rb");
FILE * pSecond = fopen("CCleaner.exe","rb");
FILE * pThird = fopen("test.exe","wb");
long Length = 0;
unsigned char * Data = NULL;

// Read First File
fseek(pFirst,0,SEEK_END);
Length = ftell(pFirst);
Data = (unsigned char *) malloc(Length);
fseek(pFirst,0,SEEK_SET);
fread(Data,1,Length,pFirst);
fclose(pFirst);

// And write it
fwrite(Data,1,Length,pThird);
free(Data);

// Read second file (reuse Data; declaring it twice would not compile)
fseek(pSecond,0,SEEK_END);
Length = ftell(pSecond);
Data = (unsigned char *) malloc(Length);
fseek(pSecond,0,SEEK_SET);
fread(Data,1,Length,pSecond);
fclose(pSecond);

// And write it
fwrite(Data,1,Length,pThird);
free(Data);

// Close the output so everything is flushed to disk
fclose(pThird);


No error checking, beware.
Jan 7, 2013 at 3:30am
Just out of curiosity, what do you expect to happen when you concatenate two binaries? I doubt test.exe is going to be runnable.
Jan 7, 2013 at 3:48am
It's not going to be usable (maybe the first one? But I don't know if there's any bounds checking or CRC check), but my first 'valid' thought is that he's planning a file packer?
Last edited on Jan 7, 2013 at 3:49am
Jan 7, 2013 at 7:20am
> If you had my C++ library, it was something like....

With the standard C++ library, it is:
std::ifstream first_file( "uTorrent.exe", std::ios::binary ) ;
std::ifstream second_file( "CCleaner.exe", std::ios::binary ) ;

std::ofstream output_file( "test.exe", std::ios::binary ) ;
output_file << first_file.rdbuf() << second_file.rdbuf() ;



If I had to do this in C:
FILE* output_file = fopen( "Test.exe", "wb" ) ;
const char* const input_files[] = { "uTorrent.exe", "CCleaner.exe" } ;

if( output_file )
{
    for( size_t i = 0 ; i < sizeof(input_files) / sizeof( input_files[0] ) ; ++i )
    {
        FILE* input_file = fopen( input_files[i], "rb" ) ;
        if( input_file )
        {
            int c ;
            while( ( c = fgetc(input_file) ) != EOF ) fputc( c, output_file ) ;
            fclose(input_file) ;
        }
    }

    fclose(output_file) ;
}
Jan 7, 2013 at 8:37am
while( ( c = fgetc(input_file) ) != EOF ) fputc( c, output_file ) ;


This works, but I would recommend using fread and fwrite (with a buffer size close to the one the libc implementation uses). Unless you need to process each byte individually, fgetc/fputc is less efficient than fread/fwrite.
Jan 7, 2013 at 9:02am
C file streams are fully buffered by default. The size of the memory buffer used in fread or fwrite has no effect on the internal buffering done by the stream. To control the buffering of the stream, use setvbuf.

That fgetc/fputc could be less efficient than fread/fwrite is because these functions may not be inlined, and there would be extra function-call overhead. It has got nothing to do with the buffer sizes.
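
A minimal sketch of the setvbuf point, assuming a byte-at-a-time copy like the one quoted above. The function name `copy_bytewise` and the 64 KiB size are assumptions for illustration; when the buffer argument is null, the standard only says the size may be used for the buffer the library allocates, so treat it as a request.

```cpp
#include <cstdio>

// Copies `in` to `out` one byte at a time and returns the number of bytes copied.
// Both streams must be freshly opened: setvbuf has to be called before any
// other operation is performed on a stream.
long copy_bytewise(std::FILE* in, std::FILE* out)
{
    // Request full buffering with a 64 KiB buffer allocated by the library.
    std::setvbuf(in,  nullptr, _IOFBF, 64 * 1024);
    std::setvbuf(out, nullptr, _IOFBF, 64 * 1024);

    long count = 0;
    int c;
    while ((c = std::fgetc(in)) != EOF) {
        if (std::fputc(c, out) == EOF) break;   // stop on write error
        ++count;
    }
    return count;
}
```

The loop itself never touches the disk directly; fgetc and fputc work against the stream's buffer, which is refilled or flushed in larger blocks.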
Jan 7, 2013 at 1:15pm
Thanks, I will try these suggestions when I get home from school.
Jan 7, 2013 at 4:55pm
JLBorges wrote:
That fgetc/fputc could be less efficient than fread/fwrite is because these functions may not be inlined, and there would be extra function-call overhead. It has got nothing to do with the buffer sizes.

Let's not forget this is implementation-dependent, as for example, the MSVS has a threadsafe option, which locks/unlocks the stream every time it is accessed, slowing down every operation for single-threaded programs.
Jan 7, 2013 at 5:20pm
This is an issue with VS 2012; the earlier versions of Visual Studio had both single-threaded and multi-threaded versions of the library.

Regardless of the version, if you have a single-threaded program, its make file should have
-D_CRT_DISABLE_PERFCRIT_LOCKS.
With that, the functions map to the _xxx_nolock versions (fgetc to _fgetc_nolock etc.). Otherwise, there is going to be a performance hit *everywhere*. For instance in malloc(); which is typically more critical for performance than disk i/o.
Jan 7, 2013 at 6:52pm
JLBorges wrote:
This is an issue with VS 2012
EssGeEich wrote:
the MSVS has a threadsafe option

I was talking about VS08/VS10 anyways.

JLBorges wrote:
For instance in malloc(); which is typically more critical for performance than disk i/o.

+1, or you could simply use the Windows-dependent options (HeapAlloc/HeapFree)

EDIT: Uhm, we're going OT.
Last edited on Jan 7, 2013 at 6:54pm
Jan 7, 2013 at 9:04pm

This is an issue with VS 2012; the earlier versions of Visual Studio had both single-threaded and multi-threaded versions of the library.

Regardless of the version, if you have a single-threaded program, its make file should have
-D_CRT_DISABLE_PERFCRIT_LOCKS.
With that, the functions map to the _xxx_nolock versions (fgetc to _fgetc_nolock etc.). Otherwise, there is going to be a performance hit *everywhere*. For instance in malloc(); which is typically more critical for performance than disk i/o.


I am using Code::Blocks 12.11, Windows 8 pro x64 and mingw-64. So, I am not concerned with -D_CRT_DISABLE_PERFCRIT_LOCKS as that does not apply to my setup.
Jan 7, 2013 at 9:12pm

Just out of curiosity, what do you expect to happen when you concatenate two binaries? I doubt test.exe is going to be runnable.

It is practice with file handling. Also, "test.exe" is runnable.

It's not going to be usable (maybe the first one? But I don't know if there's any bounds checking or CRC check), but my first 'valid' thought is that he's planning a file packer?


Actually, I am trying to figure out how to handle files in C++. That way, learning to use zlib will be easier.
Last edited on Jan 7, 2013 at 9:17pm
Jan 8, 2013 at 1:51am
Also, "test.exe" is runnable.
Really?
That surprises me; my test using JLBorges' C++ example didn't run. I don't have uTorrent, so I couldn't test with that.
Jan 8, 2013 at 7:34am
> Also, "test.exe" is runnable.
>> my test using JLBorges c++ example didn't run.

The PE header has a checksum field which can contain a checksum of the bytes in the image file. If this is set to zero, there is no problem. If not, on recent versions of Windows, the loader verifies the checksum prior to loading the image.
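
To see whether a given executable has that field set, one could read it straight out of the file's bytes. The sketch below is an illustration only: the helper names are made up, and the offsets (the PE header offset stored at 0x3C of the DOS header, and CheckSum sitting 64 bytes into the optional header) are taken from the published PE layout as I understand it, so verify them against the specification before relying on this.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value at `pos`; the caller guarantees the bounds.
static std::uint32_t read_u32(const std::vector<unsigned char>& d, std::size_t pos)
{
    return  std::uint32_t(d[pos])
         | (std::uint32_t(d[pos + 1]) << 8)
         | (std::uint32_t(d[pos + 2]) << 16)
         | (std::uint32_t(d[pos + 3]) << 24);
}

// Hypothetical helper: stores the CheckSum field of a PE image held in memory
// in `checksum`. Returns false if the data does not look like a PE file.
bool pe_checksum(const std::vector<unsigned char>& image, std::uint32_t& checksum)
{
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return false;

    const std::size_t pe = read_u32(image, 0x3C);       // offset of "PE\0\0"
    // signature (4) + COFF header (20) + offset of CheckSum in optional header (64)
    const std::size_t field_offset = 4 + 20 + 64;
    if (pe > image.size() || image.size() - pe < field_offset + 4)
        return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0)
        return false;

    checksum = read_u32(image, pe + field_offset);
    return true;
}
```

A result of zero would correspond to the "no problem" case described above.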
Jan 8, 2013 at 11:53am
So, does the checksum check vary by executable?
Jan 8, 2013 at 12:07pm
It varies depending on a file's content.