Binary file processing with Java

Published on Wednesday, February 28th, 2007 at 6:55 pm

A task arose recently when one of my clients asked me to help process some binary files. They needed a new file created from each input file, in which every occurrence of a specific byte value is replaced with a different specific byte value. No problem, I thought, but deciding which method to use for reading the data and replacing the values proved less obvious than I expected.

There are several approaches to reading and writing files, and several more to replacing values; in fact, Java almost always offers a multitude of solutions to a problem.

So I decided to embark on a little experiment to find the most efficient method.

First, I created my own test data: a file containing many random bytes. I chose random values between 0 and 32 (inclusive), as that most closely matched my client's original data. CreateTest.java will create a 10 MB file.
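If you want every run of the experiment to use exactly the same bytes, one option is to seed the random number generator. The sketch below is a variation on CreateTest, not the code used for the timings: the class name SeededTestData and the seed value are hypothetical. It also wraps the output in a BufferedOutputStream so that the more than ten million single-byte writes are not each passed straight to the file, and it assumes Java 7 or later for try-with-resources.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

// Hypothetical CreateTest variant: a fixed seed makes every generated file identical.
class SeededTestData {

    static final long SEED = 42L; // hypothetical seed; any fixed value works

    // Writes 'size' bytes, each chosen uniformly from 0..32 inclusive.
    static void write(OutputStream out, long size, long seed) throws IOException {
        Random rn = new Random(seed);
        for (long i = 0; i < size; i++) {
            out.write(rn.nextInt(33)); // nextInt(33) returns 0..32
        }
    }

    public static void main(String[] args) throws IOException {
        // The buffered wrapper batches the single-byte writes; closing it flushes them.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("testfile.dat"))) {
            write(out, 10L * 1024 * 1024, SEED); // 10 MB
        }
        System.out.println("testfile.dat created.");
    }
}
```

Running it twice produces byte-for-byte identical files, so differences between the methods cannot come from differences in the input data.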

I wrote four small tests during the experiment, trying String replacement, byte comparison and replacement, and buffered input/output streams. The results surprised me.

Method 1

For my first method, I read the data into a 4 KB buffer using a FileInputStream. For each chunk read, I create a new String object from the bytes and replace the old value with the new value using String.replace(). The bytes of the resulting String are then written to a FileOutputStream.
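One caveat with the String approach: new String(byte[]) and String.getBytes() use the platform's default charset. Decoding the 0-32 test data is harmless, since those values are plain ASCII, but the replacement value 0xFF is not: under UTF-8 (the default since Java 18, and common on Linux) the char U+00FF is encoded as the two bytes 0xC3 0xBF. More generally, arbitrary binary data only survives the round trip with a single-byte charset that maps all 256 values one-to-one, such as ISO-8859-1. Here is a minimal sketch of the Method 1 replace step with the charset pinned, assuming Java 7+ for StandardCharsets; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: the String-based replace with an explicit single-byte charset.
class Latin1Replace {

    // Decode the first 'len' bytes as ISO-8859-1, swap the chars, encode back.
    // ISO-8859-1 maps byte values 0x00-0xFF to chars U+0000-U+00FF and back.
    static byte[] replaceViaString(byte[] buf, int len, char oldValue, char newValue) {
        String s = new String(buf, 0, len, StandardCharsets.ISO_8859_1);
        return s.replace(oldValue, newValue).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 to 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] out = replaceViaString(all, all.length, (char) 0x00, (char) 0xFF);
        System.out.println(out.length);    // 256: still one byte per input byte
        System.out.println(out[0] & 0xFF); // 255: 0x00 became 0xFF
    }
}
```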

Method 2

For the second method, I simply wondered whether I could save any processing time by performing all of Method 1's steps (conversion, replacement and writing) in as few lines of code as possible. As I expected, this brought no improvement; in the timings below it was actually slower.

Method 3

For Method 3, I read the file into a buffer using FileInputStream in the same way, but rather than using String.replace(), I compare each byte and write either the original or the new value into a second buffer, which is then written out. The result of this change was encouraging: processing the file took only about 20-40% of the time it took using the String object.
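Method 3 copies every byte into a second buffer, cBuffer. Since each byte is inspected exactly once, the replacement can also be done in place in the read buffer. The sketch below shows that variant; it has not been timed here, and the class name InPlaceReplace is hypothetical. It accepts any InputStream/OutputStream and takes the buffer size as a parameter, which makes it easy to repeat the buffer-size comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical Method 3 variant: replace bytes directly in the read buffer.
class InPlaceReplace {

    static void replaceAll(InputStream in, OutputStream out,
                           byte oldValue, byte newValue, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        int read;
        // read() returns -1 at end of stream
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == oldValue) {
                    buffer[i] = newValue; // overwrite in place, no second buffer
                }
            }
            out.write(buffer, 0, read); // only the bytes actually read
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 5, 0, 32, 7};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A tiny 2-byte buffer forces several read/write rounds.
        replaceAll(new ByteArrayInputStream(data), out, (byte) 0x00, (byte) 0xFF, 2);
        for (byte b : out.toByteArray()) {
            System.out.print((b & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

For the real files you would pass new FileInputStream("testfile.dat") and new FileOutputStream("outputfile.dat"), then close both streams afterwards.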

Method 4

Encouraged by the results of Method 3, I thought that if I used the BufferedInputStream and BufferedOutputStream classes Java gives us, I was sure to reap extra benefits. Instead, the time taken rose dramatically.

Timing Results.

To produce these results, I ran each class from the command prompt 10 times and recorded the average time taken. Methods 1 to 3 used a 4 KB buffer for these runs.

Method1 : 500 ms
Method2 : 700 ms
Method3 : 150 ms
Method4 : 2800 ms
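The averages above come from running each class 10 times from the command prompt. If you would rather average inside a single JVM, a small harness along these lines is one option. It uses System.nanoTime(), which is intended for measuring elapsed time, instead of Date. The class and method names are hypothetical and it assumes Java 8+ for lambdas. Keep in mind that repeated runs in one JVM benefit from JIT warm-up after the first iteration, so the numbers will not be directly comparable with separate command-prompt runs.

```java
// Hypothetical timing harness: run a task several times and report the mean.
class AverageTimer {

    // Runs 'task' 'runs' times and returns the mean duration in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0; // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        // Replace the empty body with a call that processes testfile.dat.
        double ms = averageMillis(() -> { }, 10);
        System.out.printf("average: %.3f ms%n", ms);
    }
}
```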

I wondered whether tweaking the buffer size in Method 3 would make a difference when processing the files. To be honest, I don't know why I picked 4 KB in the first place; it just happens to be my habit when defining buffer sizes.

I tried 1 KB, 16 KB, 32 KB, 100 KB and 512 KB buffers. Average times:

1 KB : 280 ms
4 KB : 150 ms (the original Method 3 result)
16 KB : 90 ms
32 KB : 78 ms
100 KB : 125 ms
512 KB : roughly 200 ms

So a 32 KB buffer gave me the best results.

Conclusion.

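I was a little surprised that the time taken rose so dramatically when using the buffered input and output streams. I must admit my understanding of the buffering classes is not at an expert level, but I did expect to see an improvement; otherwise, why use them if controlling the buffer myself proves so much more efficient? A likely explanation is that Method 4 also changed the access pattern: it calls read() and write() once per byte, which for a 10 MB file means more than ten million calls in each direction, whereas Method 3 moves a whole buffer per call. The buffered streams reduce the number of reads and writes on the underlying file streams, but they cannot remove the cost of a method call for every byte.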
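To separate the effect of the buffered streams from the effect of processing one byte per call, the sketch below keeps BufferedInputStream and BufferedOutputStream but reads and writes in chunks, as Method 3 does. It is an untimed illustration, not a measured result; the class name and the 32 KB chunk size are assumptions. Note that in the JDK implementation, BufferedInputStream hands a read at least as large as its internal buffer straight to the underlying stream, so with chunked access the wrappers add little. They mainly pay off for code that makes many small reads or writes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical variant of Method 4: same buffered streams, chunked access.
class BufferedChunkReplace {

    static final int CHUNK_SIZE = 32 * 1024; // assumption: reuse Method 3's best buffer size

    // Copies rawIn to rawOut, replacing every oldValue byte with newValue.
    // Closing the buffered wrappers also closes rawIn and rawOut.
    static void replaceAll(InputStream rawIn, OutputStream rawOut,
                           byte oldValue, byte newValue) throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(rawIn);
             BufferedOutputStream out = new BufferedOutputStream(rawOut)) {
            byte[] chunk = new byte[CHUNK_SIZE];
            int read;
            while ((read = in.read(chunk, 0, chunk.length)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (chunk[i] == oldValue) {
                        chunk[i] = newValue;
                    }
                }
                out.write(chunk, 0, read);
            }
        } // close() flushes the BufferedOutputStream
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 0, 2, 0};
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        replaceAll(new ByteArrayInputStream(data), result, (byte) 0x00, (byte) 0xFF);
        System.out.println(result.size()); // 4
    }
}
```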

Below you will find all the code used in these tests. Method3 is shown with the 32 KB buffer that performed best; the first round of timings used 4 KB.

/*
 * CreateTest.java
 *
 * Created on 04 February 2007, 11:19
 *
 * Purpose: To create a binary test file used in file processing tests.
 */

import java.io.FileOutputStream;
import java.util.Random;

/**
 *
 * @author DAVE
 */
public class CreateTest {

    static final long fileSize = 10 * 1024 * 1024; // Let's make it 10 MB
    static final String outFile = "testfile.dat";
    private static Random rn = new Random();

    /** Creates a new instance of CreateTest */
    public CreateTest() {
    }

    public static void main(String[] args) throws Exception {
        // Open file for output
        FileOutputStream out = new FileOutputStream(outFile);

        for (int i = 0; i < fileSize; i++) {
            // Get a random int between 0 and 32 (inclusive)
            int idx = rand(0, 32);
            out.write(idx);
        }
        out.close();
        System.out.println("testfile.dat created.");
    }

    // Get a random number in the range lo..hi (inclusive).
    // nextInt(n) returns a uniformly distributed value in 0..n-1.
    private static int rand(int lo, int hi) {
        return lo + rn.nextInt(hi - lo + 1);
    }
}

/*
 * Method1.java
 *
 * Created on 02 February 2007, 17:49
 *
 * Process binary file in 4 KB chunks, performing the replace
 * using Strings & the replace() function.
 *
 */

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Date;

public class Method1 {

    public Method1() {
    }

    public static void main(String[] args) throws Exception {
        Date startTime = new Date();
        Date endTime;

        String inFile = "testfile.dat";
        String outFile = "outputfile.dat";

        char oldValue = 0x00;
        char newValue = 0xFF;
        final int bufferSize = 4 * 1024; // 4 KB buffer
        byte[] buffer = new byte[bufferSize];

        // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char and back, so
        // binary data survives the String round trip (the platform default
        // charset, e.g. UTF-8, does not guarantee this).
        final String charset = "ISO-8859-1";

        FileInputStream in = new FileInputStream(inFile);
        FileOutputStream out = new FileOutputStream(outFile);

        String s = null;
        int read = in.read(buffer);
        while (read >= 0) {
            if (read > 0) {
                s = new String(buffer, charset);
                // The last read may not fill the buffer; drop the stale tail
                if (s.length() != read) {
                    s = s.substring(0, read);
                }
                s = s.replace(oldValue, newValue);
                out.write(s.getBytes(charset));
            }
            read = in.read(buffer);
        }

        out.close();
        in.close();
        endTime = new Date();

        System.out.println(" Method 1 - time taken (ms) : " + (endTime.getTime() - startTime.getTime()));

    }
}

/*
 * Method2.java
 *
 * Created on 02 February 2007, 17:49
 *
 * Process binary file in 4 KB chunks, performing the replace
 * using Strings & the replace() function in a single statement.
 */

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Date;

public class Method2 {

    public Method2() {
    }

    public static void main(String[] args) throws Exception {
        Date startTime = new Date();
        Date endTime;

        String inFile = "testfile.dat";
        String outFile = "outputfile.dat";

        char oldValue = 0x00;
        char newValue = 0xFF;
        final int bufferSize = 4 * 1024; // 4 KB buffer
        byte[] buffer = new byte[bufferSize];

        // ISO-8859-1 maps bytes 0x00-0xFF one-to-one to chars, so the
        // String round trip does not alter the binary data.
        final String charset = "ISO-8859-1";

        FileInputStream in = new FileInputStream(inFile);
        FileOutputStream out = new FileOutputStream(outFile);

        int read = in.read(buffer);
        while (read >= 0) {
            if (read > 0) {
                out.write(new String(buffer, 0, read, charset).replace(oldValue, newValue).getBytes(charset));
            }
            read = in.read(buffer);
        }

        out.close();
        in.close();

        endTime = new Date();
        System.out.println(" Method 2 - time taken (ms) : " + (endTime.getTime() - startTime.getTime()));

    }
}

/*
 * Method3.java
 *
 * Created on 02 February 2007, 17:49
 *
 * Process binary file in chunks, performing the replace
 * using byte comparison only
 */

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Date;

public class Method3 {

    public Method3() {
    }

    public static void main(String[] args) throws Exception {
        Date startTime = new Date();
        Date endTime;

        String inFile = "testfile.dat";
        String outFile = "outputfile.dat";

        byte oldValue = (byte) 0x00;
        byte newValue = (byte) 0xFF;

        final int bufferSize = 32 * 1024; // 32 KB buffer (fastest in the buffer-size tests)

        byte[] buffer = new byte[bufferSize];  // bytes as read from the file
        byte[] cBuffer = new byte[bufferSize]; // bytes after replacement

        FileInputStream in = new FileInputStream(inFile);
        FileOutputStream out = new FileOutputStream(outFile);

        int read = in.read(buffer);
        while (read >= 0) {
            if (read > 0) {
                // Copy each byte, substituting newValue wherever oldValue occurs
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == oldValue)
                        cBuffer[i] = newValue;
                    else
                        cBuffer[i] = buffer[i];
                }
                out.write(cBuffer, 0, read);
            }
            read = in.read(buffer);
        }

        out.close();
        in.close();
        endTime = new Date();
        System.out.println(" Method 3 - time taken (ms) : " + (endTime.getTime() - startTime.getTime()));
    }
}

/*
 * Method4.java
 *
 * Created on 02 February 2007, 17:49
 *
 * Process binary file, byte by byte performing a replace
 * using buffered streams
 */

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Date;

public class Method4 {

    public Method4() {
    }

    public static void main(String[] args) throws Exception {
        Date startTime = new Date();
        Date endTime;

        String inFile = "testfile.dat";
        String outFile = "outputfile.dat";

        byte oldValue = (byte) 0x00;
        byte newValue = (byte) 0xFF;

        BufferedInputStream bis = new BufferedInputStream(new FileInputStream(inFile));
        BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(outFile));

        // read() returns each byte as an int in 0..255 (or -1 at end of file),
        // so compare it with the unsigned value of oldValue.
        int oldUnsigned = oldValue & 0xFF;

        int theByte;
        while ((theByte = bis.read()) != -1) {
            if (theByte != oldUnsigned)
                bos.write(theByte);
            else
                bos.write(newValue);
        }

        bos.close();
        bis.close();
        endTime = new Date();
        System.out.println(" Method 4 - time taken (ms) : " + (endTime.getTime() - startTime.getTime()));
    }
}

About Me

Welcome to my blog. Here you'll find mainly work-related content, a repository for all my notes that would otherwise end up on a Post-it note.