Monday, December 8, 2008

PERL: Approaches for FileSize

Hi,
I’ve been a great fan of all scripting languages, especially PERL (Practical Extraction and Report Language). My objective with this blog is to point out some surprising behaviors that can make your program deviate from the expected output.

Here’s one.
Consider a situation where you need to find out the size of a file present on the disk, say C:\test.txt.

The content of test.txt is:
-------------------------------------------------------------
Hi,”\n”
How about some interesting facts in PERL”\n”
Look at this!!!”\n”
Interesting!!”\n”
-Chetan”\n”
“\n”
-----------------------------------------------------------
Now, there are two ways to approach this situation.

1. open(FH, "< C:\\test.txt") or die $!; # open the file in read mode, or exit with the error message
print "Size of file (in bytes) is: ";
print -s FH; # -s returns the size (in bytes) of the file behind the filehandle
The output of the above snippet is: Size of file (in bytes) is: 90

2. open(FH, "< C:\\test.txt") or die $!; # open the file in read mode, or exit with the error message
my @filecontent = <FH>; # read the contents of the file into an array, one line per element
my $count = 0; # initialize a counter
foreach (@filecontent) # walk through the file contents line by line
{
$count += length($_); # get the length of each line and add it to the counter
}
print "Size of file (in bytes) is: $count"; # print the computed size
The output here is: Size of file (in bytes) is: 84

Oh!! What’s the difference here? How come the same file gives two different sizes? Is anything wrong with the way either program is run? Which is the correct approach?

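Explanation: The hitch here is not length($_) itself, which does count the “\n” at the end of each line as one character. On Windows, each line of test.txt actually ends with two bytes on disk, a carriage return followed by a line feed (“\r\n”), and when the file is read in the default text mode, Perl hands each pair back as a single “\n”. So each of the 6 line endings that I have pointed out in the content of test.txt is counted as 1 character instead of 2 bytes, and the results differ by 6 (84 versus 90).
But the correct size is given by the first approach, as both bytes of every line ending are very much part of the file. Now you know which approach you can bank on the next time you need to find the size of a file!!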
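
If you want to see the gap for yourself on any platform, here is a minimal sketch. It assumes the 6-byte difference comes from Windows line endings: a file whose lines end in “\r\n” on disk, read through Perl's text-mode translation (the :crlf layer), which returns each pair as a single “\n”. The temporary file and the helper summed_length are made up for this illustration and are not part of the original snippets.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The six lines of test.txt, written with Windows-style CRLF endings.
# binmode keeps Perl from altering the bytes on the way out.
my @lines = ('Hi,', 'How about some interesting facts in PERL',
             'Look at this!!!', 'Interesting!!', '-Chetan', '');
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "$_\r\n" for @lines;
close($tmp_fh) or die "close failed: $!";

# Hypothetical helper: sum the length of every line read through a given layer.
sub summed_length {
    my ($file, $layer) = @_;
    open(my $fh, "<$layer", $file) or die "Cannot open $file: $!";
    my $count = 0;
    $count += length($_) while <$fh>;
    close($fh);
    return $count;
}

my $on_disk   = -s $path;                       # size reported by the file system
my $text_mode = summed_length($path, ':crlf');  # CRLF handed back as "\n"
my $raw_mode  = summed_length($path, ':raw');   # bytes exactly as stored

print "-s:        $on_disk\n";    # 90
print "text mode: $text_mode\n";  # 84
print "raw mode:  $raw_mode\n";   # 90
```

So if you really need to count by reading, opening the file with the :raw layer (or calling binmode on the filehandle) makes the total agree with -s. For a plain size check, -s is still the simpler choice because it does not read the file at all.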

3 comments:

Rishi said...

Very good point to be noted and very nicely explained also. I think this may be very useful when dealing with files and especially while troubleshooting something related to file sizes.

But looks like in the 2nd point you have missed something in the code ... :)

Anonymous said...

Yes. It should be -
my @filecontent = FH;

Rishi said...

You got it correct, it is:
my @filecontent = <FH>;
:D