<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBWS.NET &#187; file</title>
	<atom:link href="http://www.dbws.net/blog/tag/file/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbws.net/blog</link>
	<description>Software development mutterings and maybe a little something about myself.</description>
	<lastBuildDate>Wed, 09 Nov 2011 16:56:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Binary file processing with Java</title>
		<link>http://www.dbws.net/blog/2007/02/28/binary-file-processing-with-java/</link>
		<comments>http://www.dbws.net/blog/2007/02/28/binary-file-processing-with-java/#comments</comments>
		<pubDate>Wed, 28 Feb 2007 18:55:18 +0000</pubDate>
		<dc:creator>Bolo</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[binary]]></category>
		<category><![CDATA[file]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://www.dbws.net/blog/?p=8</guid>
		<description><![CDATA[A task arose recently when one of my clients asked me to help process some binary files. They needed a new file created from their input file, with all occurrences of a specific byte replaced by a different specific byte. No problem I thought, but deciding which method to [...]]]></description>
			<content:encoded><![CDATA[<p>A task arose recently when one of my clients asked me to help process some binary files. They needed a new file created from their input file, with all occurrences of a specific byte replaced by a different specific byte. No problem, I thought, but deciding which method to use for reading the file and replacing the values proved not to be immediately obvious.</p>
<p>There are several approaches to reading and writing files, as well as several approaches to replacing values. In fact, Java almost always offers a multitude of solutions to a problem.</p>
<p>So I decided to embark upon a little experiment to see which method I could find to be the most efficient.</p>
<p>First I created my own test data: a file containing many random bytes. Each byte is a random number between 0 and 32, as that most closely matched my client's original requirement. CreateTest.java creates a 10 MB file.</p>
<p>I wrote 4 little tests during the experiment, trying String replacement, byte comparison and replacement, and buffered streams. The result surprised me.</p>
<p><strong>Method 1</strong></p>
<p>For my first method I read the data into a 4 KB buffer using a FileInputStream. I then create a new String object from the bytes read and replace the old value with the new value using String.replace(). The bytes from the resulting String are then written to a FileOutputStream.</p>
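<p>One caveat with this approach: new String(byte[]) and getBytes() use the platform's default charset, so whether arbitrary bytes survive the trip through a String depends on the machine. The sketch below is not one of the timed tests. It assumes Java 7 or later for StandardCharsets, and the class and method names are made up for illustration. It performs the same replace with an explicit ISO-8859-1 charset, which maps every byte to exactly one char, and compares the output length with a UTF-8 write-back.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original tests.
class StringReplaceDemo {

    // Replaces oldValue with newValue by round-tripping through a String.
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char and back.
    static byte[] replaceViaString(byte[] data, int length, char oldValue, char newValue) {
        String s = new String(data, 0, length, StandardCharsets.ISO_8859_1);
        return s.replace(oldValue, newValue).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] input = { 0x00, 0x01, 0x20, 0x00 };
        byte[] latin1 = replaceViaString(input, input.length, (char) 0x00, (char) 0xFF);

        // Same replace, but encoded back with UTF-8: the char 0xFF becomes two bytes.
        byte[] utf8 = new String(input, StandardCharsets.ISO_8859_1)
                .replace((char) 0x00, (char) 0xFF)
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + latin1.length);
        System.out.println("UTF-8 length: " + utf8.length);
    }
}
```

<p>With ISO-8859-1 the output is still 4 bytes. With UTF-8 each replaced byte is written as two bytes, so the output grows to 6 bytes and no longer lines up with the input.</p>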
<p><strong>Method 2</strong></p>
<p>For the second method I simply wondered whether I could save any processing time by performing all the steps of Method 1 (conversion, comparison and replacement) in as few lines of code as possible. As I expected, this brought no real benefit.</p>
<p><strong>Method 3</strong></p>
<p>For Method 3 I read the file the same way, into a buffer using FileInputStream, but rather than use the String replace function I compare each byte and then write either the original or the new value into a second buffer. The result of this change was encouraging: processing the file took about 20-40% of the time it took using the String object.</p>
<p><strong>Method 4</strong></p>
<p>Feeling encouraged by the results of Method 3, I thought that if I used the BufferedInputStream and BufferedOutputStream classes Java gives us, reading and writing one byte at a time, I was sure to reap extra benefits. Instead, the time taken rose dramatically.</p>
<p>Timing Results.</p>
<p>To come up with these results, I ran the classes from the command prompt 10 times each and recorded the average time taken.</p>
<p>Method1 : 500 ms<br />
Method2 : 700 ms<br />
Method3 : 150 ms<br />
Method4 : 2800 ms</p>
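<p>The averaging above was done by hand across separate runs from the command prompt. As a rough alternative, here is a minimal sketch of an in-process helper that runs a task several times and reports the mean. The class name, method name and stand-in workload are all hypothetical, and because everything runs inside one JVM the numbers would not be directly comparable with the figures above.</p>

```java
// Hypothetical timing helper; the original figures were averaged by hand.
class AverageTimer {

    // Runs the task the given number of times and returns the mean time in ms.
    static double averageMillis(Runnable task, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1000000.0 / runs;
    }

    public static void main(String[] args) {
        Runnable task = new Runnable() {
            public void run() {
                // Stand-in workload; a real run would call one of the methods.
                byte[] data = new byte[1024 * 1024];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) (i % 33);
                }
            }
        };
        System.out.println("average (ms): " + averageMillis(task, 10));
    }
}
```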
<p>I wondered what difference tweaking the buffer size in Method 3 would make when processing the files. To be honest, I don't know why I picked 4 KB in the first place; it just happens to be habit when defining buffer sizes.</p>
<p>I tried 1k, 16k, 32k, 100k, and 512k.</p>
<p>With a 1k buffer the average time was 280 ms.<br />
A 16k buffer gave me an average of 90 ms.<br />
A 32k buffer gave me an average of 78 ms.<br />
A 100k buffer averages out at 125 ms.<br />
And the 512k buffer was back to roughly 200 ms.</p>
<p>So using a 32k buffer gave me the best results.</p>
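<p>For anyone wanting to repeat the buffer-size comparison, here is a minimal sketch that loops over the same sizes in one program. It is not the code used for the figures above: the class and method names are hypothetical, it replaces bytes in place rather than using a second buffer, and it takes one measurement per size instead of an average.</p>

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical harness for comparing buffer sizes; not the original test code.
class BufferSweep {

    // Method 3 style replace (0x00 becomes 0xFF) with a caller-supplied
    // buffer size. Returns the elapsed time in milliseconds.
    static long timedReplace(File inFile, File outFile, int bufferSize) throws IOException {
        long start = System.nanoTime();
        FileInputStream in = new FileInputStream(inFile);
        try {
            FileOutputStream out = new FileOutputStream(outFile);
            try {
                byte[] buffer = new byte[bufferSize];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    for (int i = 0; i < read; i++) {
                        if (buffer[i] == (byte) 0x00) {
                            buffer[i] = (byte) 0xFF;
                        }
                    }
                    out.write(buffer, 0, read);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) throws IOException {
        File inFile = new File("testfile.dat"); // as produced by CreateTest
        File outFile = new File("outputfile.dat");
        if (!inFile.exists()) {
            System.out.println("testfile.dat not found; run CreateTest first.");
            return;
        }
        int[] sizesKb = { 1, 4, 16, 32, 100, 512 };
        for (int i = 0; i < sizesKb.length; i++) {
            long ms = timedReplace(inFile, outFile, sizesKb[i] * 1024);
            System.out.println(sizesKb[i] + "k buffer: " + ms + " ms");
        }
    }
}
```

<p>The first size in the list may be penalised by the file not yet being in the operating system's cache, so the order of the sizes is worth shuffling between runs.</p>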
<p>Conclusion.</p>
<p>I was a little surprised that the time taken rose so dramatically when using the buffered input and output stream objects. I must admit my understanding of the buffering classes is not at an expert level, but I did expect to see an improvement. Otherwise, why use them if controlling the buffer myself proves to be so much more efficient?</p>
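<p>One difference between the two tests that may matter: Method 4 calls read() and write() once for every byte, whereas Method 3 moves whole arrays per call. The following is an untimed sketch of a variant that keeps the buffered streams but works in blocks. The class name, method signature and 4 KB block size are assumptions for illustration, and I make no claim about how it would compare with the figures above.</p>

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical variant of Method 4; not one of the timed tests.
class Method4Block {

    // Copies in to out, replacing oldValue with newValue, one block at a time.
    static void replace(InputStream in, OutputStream out, byte oldValue, byte newValue)
            throws IOException {
        BufferedInputStream bis = new BufferedInputStream(in);
        BufferedOutputStream bos = new BufferedOutputStream(out);
        byte[] block = new byte[4 * 1024]; // block size is an arbitrary choice
        int read;
        while ((read = bis.read(block)) != -1) {
            for (int i = 0; i < read; i++) {
                if (block[i] == oldValue) {
                    block[i] = newValue; // replace in place, no second buffer
                }
            }
            bos.write(block, 0, read);
        }
        bos.flush(); // push buffered bytes through; the caller closes the streams
    }

    public static void main(String[] args) throws IOException {
        byte[] input = { 0x00, 0x05, 0x00, 0x20 };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        replace(new ByteArrayInputStream(input), out, (byte) 0x00, (byte) 0xFF);
        byte[] result = out.toByteArray();
        for (int i = 0; i < result.length; i++) {
            System.out.print((result[i] & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

<p>The demo uses in-memory streams so it runs without any files; to try it on the test data, pass a FileInputStream and FileOutputStream instead.</p>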
<p>Below you will find all the code used in these tests:</p>
<div class="source">
<p>/*<br />
* CreateTest.java<br />
*<br />
* Created on 04 February 2007, 11:19<br />
*<br />
* Purpose: To create a binary test file used in file processing tests.<br />
*/</p>
<p>import java.io.FileOutputStream;<br />
import java.util.Random;</p>
<p>/**<br />
*<br />
* @author DAVE<br />
*/<br />
public class CreateTest {</p>
<p>static final long fileSize = 10 * 1024 * 1024; // Let's make it 10 MB<br />
static final String outFile = "testfile.dat";<br />
private static Random rn = new Random();</p>
<p>/** Creates a new instance of CreateTest */<br />
public CreateTest() {<br />
}</p>
<p>public static void main(String[] args) throws Exception{<br />
// Open file for output<br />
FileOutputStream out = new FileOutputStream(outFile);</p>
<p>for (int i = 0; i &lt; fileSize; i++) {<br />
// Get Random int between 0 &amp; 32<br />
int idx = rand(0,32);<br />
out.write(idx);<br />
}<br />
out.close();<br />
System.out.println("testfile.dat created.");<br />
}</p>
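<p>// Get a uniformly distributed random number between lo and hi, inclusive<br />
private static int rand(int lo, int hi) {<br />
int n = hi - lo + 1;<br />
// nextInt(n) is uniform over 0..n-1; nextInt() % n with the sign flipped makes 0 half as likely as the other values<br />
return lo + rn.nextInt(n);<br />
}<br />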
}</p>
</div>
<div class="source">/*<br />
* Method1.java<br />
*<br />
* Created on 02 February 2007, 17:49<br />
*<br />
* Process binary file, byte by byte performing a replace<br />
* using Strings &amp; replace() function.<br />
*<br />
*/</p>
<p>import java.io.FileInputStream;<br />
import java.io.FileOutputStream;<br />
import java.util.Date;</p>
<p>public class Method1 {</p>
<p>public Method1() {<br />
}</p>
<p>public static void main(String[] args) throws Exception{<br />
Date startTime = new Date();<br />
Date endTime;</p>
<p>String inFile = "testfile.dat";<br />
String outFile = "outputfile.dat";</p>
<p>char oldValue = 0x00;<br />
char newValue = 0xFF;<br />
final int bufferSize = 4 * 1024; // 4kb buffer<br />
byte[] buffer = new byte[bufferSize];</p>
<p>FileInputStream in = new FileInputStream(inFile);<br />
FileOutputStream out = new FileOutputStream(outFile);</p>
<p>String s = null;<br />
int read = in.read(buffer);<br />
while (read &gt;= 0) {<br />
if (read &gt; 0) {<br />
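// Caution: new String(byte[]) and getBytes() use the platform default charset, so arbitrary<br />
// bytes survive the round trip only if that charset maps every byte value to its own char.<br />
s = new String(buffer);<br />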
if (s.length() != read) {<br />
s = s.substring(0,read);<br />
}<br />
s = s.replace(oldValue,newValue);<br />
out.write(s.getBytes());<br />
}<br />
read = in.read(buffer);<br />
}</p>
<p>out.close();<br />
in.close();<br />
endTime = new Date();</p>
<p>System.out.println(" Method 1  - time taken (ms) : "+(endTime.getTime() - startTime.getTime()));</p>
<p>}<br />
}</p>
</div>
<div class="source">/*<br />
* Method2.java<br />
*<br />
* Created on 02 February 2007, 17:49<br />
*<br />
* Process binary file, byte by byte performing a replace<br />
* using Strings &amp; replace() function.<br />
*/</p>
<p>import java.io.FileInputStream;<br />
import java.io.FileOutputStream;<br />
import java.util.Date;</p>
<p>public class Method2 {</p>
<p>public Method2() {<br />
}</p>
<p>public static void main(String[] args) throws Exception{<br />
Date startTime = new Date();<br />
Date endTime;</p>
<p>String inFile = "testfile.dat";<br />
String outFile = "outputfile.dat";</p>
<p>char oldValue = 0x00;<br />
char newValue = 0xFF;<br />
final int bufferSize = 4 * 1024; // 4kb buffer<br />
byte[] buffer = new byte[bufferSize];</p>
<p>FileInputStream in = new FileInputStream(inFile);<br />
FileOutputStream out = new FileOutputStream(outFile);</p>
<p>int read = in.read(buffer);<br />
while (read &gt;= 0) {<br />
if (read &gt; 0) {<br />
out.write(new String(buffer,0,read).replace(oldValue,newValue).getBytes());<br />
}<br />
read = in.read(buffer);<br />
}</p>
<p>out.close();<br />
in.close();</p>
<p>endTime = new Date();<br />
System.out.println(" Method 2  - time taken (ms) : "+(endTime.getTime() - startTime.getTime()));</p>
<p>}<br />
}</p>
</div>
<div class="source">/*<br />
* Method3.java<br />
*<br />
* Created on 02 February 2007, 17:49<br />
*<br />
* Process binary file, byte by byte performing a replace<br />
* using byte comparison only<br />
*/</p>
<p>import java.io.FileInputStream;<br />
import java.io.FileOutputStream;<br />
import java.util.Date;</p>
<p>public class Method3 {</p>
<p>public Method3() {<br />
}</p>
<p>public static void main(String[] args) throws Exception{<br />
Date startTime = new Date();<br />
Date endTime;</p>
<p>String inFile = "testfile.dat";<br />
String outFile = "outputfile.dat";</p>
<p>byte oldValue = (byte)0x00;<br />
byte newValue = (byte)0xFF;</p>
<p>final int bufferSize = 32 * 1024; // 32kb buffer</p>
<p>byte[] buffer = new byte[bufferSize];<br />
byte[] cBuffer = new byte[bufferSize];</p>
<p>FileInputStream in = new FileInputStream(inFile);<br />
FileOutputStream out = new FileOutputStream(outFile);</p>
<p>int read = in.read(buffer);<br />
while (read &gt;= 0) {<br />
if (read &gt; 0) {<br />
for (int i = 0; i &lt; read; i++) {<br />
if (buffer[i] == oldValue)<br />
cBuffer[i] = newValue;<br />
else<br />
cBuffer[i] = buffer[i];<br />
}<br />
out.write(cBuffer,0,read);<br />
}<br />
read = in.read(buffer);<br />
}</p>
<p>out.close();<br />
in.close();<br />
endTime = new Date();<br />
System.out.println(" Method 3  - time taken (ms) : "+(endTime.getTime() - startTime.getTime()));<br />
}<br />
}</p>
</div>
<div class="source">/*<br />
* Method4.java<br />
*<br />
* Created on 02 February 2007, 17:49<br />
*<br />
* Process binary file, byte by byte performing a replace<br />
* using buffered streams<br />
*/</p>
<p>import java.io.BufferedInputStream;<br />
import java.io.BufferedOutputStream;<br />
import java.io.FileInputStream;<br />
import java.io.FileOutputStream;<br />
import java.util.Date;</p>
<p>public class Method4 {</p>
<p>public Method4() {<br />
}</p>
<p>public static void main(String[] args) throws Exception{<br />
Date startTime = new Date();<br />
Date endTime;</p>
<p>String inFile = "testfile.dat";<br />
String outFile = "outputfile.dat";</p>
<p>byte oldValue = (byte)0x00;<br />
byte newValue = (byte)0xFF;</p>
<p>BufferedInputStream bis = null;<br />
BufferedOutputStream bos = null;</p>
<p>bis = new BufferedInputStream(new FileInputStream(inFile));<br />
bos = new BufferedOutputStream(new FileOutputStream(outFile));</p>
<p>int theByte;<br />
while ((theByte = bis.read()) != -1) {<br />
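// read() returns 0-255, so mask oldValue to its unsigned value before comparing<br />
if (theByte != (oldValue &amp; 0xFF))<br />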
bos.write(theByte);<br />
else<br />
bos.write(newValue);<br />
}</p>
<p>bos.close();<br />
bis.close();<br />
endTime = new Date();<br />
System.out.println(" Method 4  - time taken (ms) : "+(endTime.getTime() - startTime.getTime()));<br />
}<br />
}</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dbws.net/blog/2007/02/28/binary-file-processing-with-java/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

