Perl: File Functions
Table of Contents:
- Reading Directories
- Reading and Writing Files
- Binary Files
- Getting File Statistics
- Printing Revisited
File Functions
The following file functions are available in Perl:
- binmode(FILE_HANDLE) This function puts FILE_HANDLE
into a binary mode.
- chdir(DIR_NAME) Causes your program to use DIR_NAME
as the current directory. It will return true if the change was successful,
false if not.
- chmod(MODE, FILE_LIST) This UNIX-based
function changes the permissions for a list of files. A count of the number
of files whose permissions were changed is returned. There is no DOS
equivalent for this function.
- chown(UID, GID, FILE_LIST) This
UNIX-based function changes the owner and group for a list of files. A count
of the number of files whose ownership was changed is returned. There is no
DOS equivalent for this function.
- close(FILE_HANDLE) Closes the connection between your
program and the file opened with FILE_HANDLE.
- closedir(DIR_HANDLE) Closes the connection between
your program and the directory opened with DIR_HANDLE.
- eof(FILE_HANDLE) Returns true if the next read on
FILE_HANDLE will result in hitting the end of the file or if the file
is not open. If FILE_HANDLE is not specified the status of the last
file read is returned. All input functions return the undefined value when
the end of file is reached, so you'll almost never need to use eof().
- fcntl(FILE_HANDLE, FUNCTION, SCALAR) Implements the fcntl()
function, which lets you perform various
file control operations. Its use is beyond the scope of this course.
- fileno(FILE_HANDLE) Returns the file descriptor for
the specified FILE_HANDLE.
- flock(FILEHANDLE, OPERATION) This function
will place a lock on a file so that multiple users or programs can't
simultaneously use it. The flock() function is beyond the scope of
this book.
- getc(FILE_HANDLE) Reads the next character from
FILE_HANDLE. If FILE_HANDLE is not specified, a character will
be read from STDIN.
- glob(EXPRESSION) Returns a
list of files that match the specification of EXPRESSION, which can
contain wildcards. For instance, glob("*.pl") will return
a list of all Perl program files in the current directory.
- ioctl(FILE_HANDLE, FUNCTION, SCALAR) Implements the ioctl()
function, which lets you perform various
file control operations. Its use is beyond the scope of this book. For a more
in-depth discussion of this function see Que's Special Edition Using Perl
for Web Programming.
- link(OLD_FILE_NAME, NEW_FILE_NAME) This UNIX-based function creates a
new file name that is linked to the old file name.
It returns true for success and false for failure. There is no DOS
equivalent for this function.
- lstat(FILE_HANDLE_OR_FILE_NAME) Returns
file statistics in a 13-element array. lstat()
is identical to stat() except that it can also return information
about symbolic links.
- mkdir(DIR_NAME, MODE) Creates a directory
named DIR_NAME. If you try to create a subdirectory, the parent
must already exist. This function returns false if the directory can't be
created. The special variable $! is assigned the error message.
- open(FILE_HANDLE, EXPRESSION) Creates a link
between FILE_HANDLE and a file specified by EXPRESSION.
- opendir(DIR_HANDLE, DIR_NAME) Creates a link
between DIR_HANDLE and the directory specified by DIR_NAME.
opendir() returns true if successful, false otherwise.
- pipe(READ_HANDLE, WRITE_HANDLE) Opens a pair of connected pipes
like the corresponding system call. Its use is beyond
the scope of this book. For more on this function see Que's Special
Edition Using Perl for Web Programming.
- print FILE_HANDLE (LIST)
Sends a list of strings to FILE_HANDLE. If FILE_HANDLE is
not specified, then STDOUT is used.
- printf FILE_HANDLE (FORMAT, LIST) Sends a list of strings in a format specified
by FORMAT to FILE_HANDLE.
If FILE_HANDLE is not specified, then STDOUT is used.
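- read(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Reads LENGTH bytes from
FILE_HANDLE, starting at the current file position, into the scalar
variable called BUFFER. The optional OFFSET is the position in BUFFER
(not in the file) at which the data is stored.
It returns the number of bytes read, 0 at end of file, or the undefined value on error.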
- readdir(DIR_HANDLE) Returns the next directory entry
from DIR_HANDLE when used in a scalar context. If used in an array
context, all of the file entries in DIR_HANDLE will be returned in
a list. If there are no more entries to return, the undefined value or a
null list will be returned depending on the context.
- readlink(EXPRESSION) This UNIX-based function returns
the value of a symbolic link. If an error occurs, the undefined value is
returned and the special variable $! is assigned the error message.
The $_ special variable is used if EXPRESSION is not
specified.
- rename(OLD_FILE_NAME, NEW_FILE_NAME) Changes the name of a file. You
can use this function to change the directory where
a file resides, but not the disk drive or volume.
- rewinddir(DIR_HANDLE) Resets DIR_HANDLE so
that the next readdir() starts at the beginning of the directory.
- rmdir(DIR_NAME) Deletes an empty directory. If the
directory cannot be deleted it returns false and $! is assigned the
error message. The $_ special variable is used if DIR_NAME
is not specified.
- seek(FILE_HANDLE, POSITION, WHENCE) Moves to
POSITION in the file connected to FILE_HANDLE.
The WHENCE parameter determines if POSITION is an offset
from the beginning of the file (WHENCE=0), the current position in
the file (WHENCE=1), or the end of the file (WHENCE=2).
- seekdir(DIR_HANDLE, POSITION) Sets the
current position for readdir(). POSITION must be a value
returned by the telldir() function.
- select(FILE_HANDLE) Sets the default FILE_HANDLE
for the write() and print() functions. It returns the
currently selected file handle so that you may restore it if needed.
- sprintf(FORMAT, LIST) Returns a string whose
format is specified by FORMAT.
- stat(FILE_HANDLE_OR_FILE_NAME) Returns file statistics in a
13-element array.
- symlink(OLD_FILE_NAME, NEW_FILE_NAME) This UNIX-based function
creates a new file name symbolically linked to the
old file name. It returns false if the NEW_FILE_NAME cannot be
created.
- sysread(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Reads LENGTH
bytes from FILE_HANDLE into the scalar variable called
BUFFER. The optional OFFSET is the position in BUFFER at which the data
is stored. It returns the number of bytes read or the undefined value.
- syswrite(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Writes
LENGTH bytes from the scalar variable called BUFFER to
FILE_HANDLE. The optional OFFSET is the position in BUFFER
from which writing starts. It returns the number of bytes written or
the undefined value.
- tell(FILE_HANDLE) Returns the current file position
for FILE_HANDLE. If FILE_HANDLE is not specified, the file
position for the last file read is returned.
- telldir(DIR_HANDLE) Returns the current position for
DIR_HANDLE. The return value may be passed to seekdir() to
access a particular location in a directory.
- truncate(FILE_HANDLE, LENGTH) Truncates the
file opened on FILE_HANDLE to be LENGTH bytes long.
- unlink(FILE_LIST) Deletes a list of files. If
FILE_LIST is not specified, then $_ will be used. It returns
the number of files successfully deleted. Therefore, it returns false or 0
if no files were deleted.
- utime(ATIME, MTIME, FILE_LIST) This UNIX-based function sets the
access and modification times on each file in FILE_LIST to ATIME and MTIME.
- write(FILE_HANDLE) Writes a formatted record to
FILE_HANDLE.
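As a rough illustration of how seek(), tell(), read() and eof() from the list above fit together, here is a minimal sketch. The scratch file comes from the core File::Temp module and all variable names are assumptions made for the example, not part of the original notes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file (the name is generated by File::Temp).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "0123456789";
close($fh) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot open $path: $!";

# WHENCE = 0: POSITION is counted from the start of the file.
seek($in, 4, 0) or die "seek failed: $!";
my $pos_after_seek = tell($in);          # 4

# read() pulls 3 bytes starting at the current file position.
my $chunk = '';
my $count = read($in, $chunk, 3);        # $chunk is "456"
my $pos_after_read = tell($in);          # 7

# WHENCE = 2: POSITION is counted from the end of the file.
seek($in, -2, 2) or die "seek failed: $!";
my $tail = '';
read($in, $tail, 2);                     # $tail is "89"
my $at_end = eof($in) ? 1 : 0;           # 1: nothing left to read

close($in);
print "chunk=$chunk tail=$tail\n";
```

Note that the positions passed to seek() refer to the file, while the optional fourth argument of read() (not used here) refers to the buffer.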
Reading Directories
Perl has several functions that operate on directories. The opendir(), readdir()
and closedir() functions are a common way to read the contents of a directory.
opendir(DIR_HANDLE,"directory") returns a Directory handle
-- just an identifier (no $) -- for a given directory to be opened for
reading.
Note that the directory may be given either as a full path or as a path
relative to the current directory.
BE WARNED: Macintosh directory paths are separated by : whereas
UNIX directory paths are separated by /.
readdir(DIR_HANDLE) returns a scalar (string) holding the base name
of the next entry, with no directory part (no : or /).
closedir(DIR_HANDLE) simply closes the directory.
Therefore to list all files in a given directory we can do the following,
readdir.pl:
opendir(DIR,"Maclab:Internet") || die "NO SUCH Directory: Internet";
while (defined($file = readdir(DIR))) { print " $file\n"; }
closedir(DIR);
The above reads a folder Internet on the top level of the Maclab
hard disk.
On UNIX we may do:
opendir(DIR,"./Internet") || die "NO SUCH Directory: Internet";
while (defined($file = readdir(DIR))) { print " $file\n"; }
closedir(DIR);
The above reads a sub-directory Internet assumed to be located in
the same directory from where the Perl script has been run.
One further example to alphabetically list files is alpha.pl:
opendir(DIR,"./Internet") || die "NO SUCH Directory: Internet";
foreach $file ( sort readdir(DIR) ) { print " $file\n"; }
closedir(DIR);
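The listings above use bareword directory handles. A minimal sketch of the same alphabetical listing with a lexical handle is shown here; the temporary directory and the three file names are invented so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with three files so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(zebra.txt apple.txt mango.txt)) {
    open(my $fh, '>', "$dir/$name") or die "cannot create $name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "cannot open directory $dir: $!";
# List context returns every entry; grep drops the "." and ".." entries.
my @files = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

print " $_\n" for @files;
```

Filtering out "." and ".." is a design choice for this sketch; readdir() itself returns them along with the ordinary entries.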
Reading and Writing Files
We have just introduced the concept of a Directory Handle for referring to a
Directory on disk.
We now introduce a similar concept of File Handle for referring to a File on
disk from which we can read data and to which we can write data.
Similar ideas of opening and closing the files exist.
You use the open() operator to open a file (for reading):
open(FILEHANDLE,"file_on_device");
To open a file for writing you must use the ">" symbol in the
open() operator:
open(FILEHANDLE,">outfile");
Writing always starts at the beginning of the file. If the file
already exists and contains data, the file will be opened and the data
overwritten.
To open a file for appending you must use the ">>" symbol in
the open() operator:
open(FILEHANDLE,">>appendfile");
The close() operator closes a file handle:
close(FILEHANDLE);
To read from a file you simply use the <FILEHANDLE> operator,
which reads one line at a time from a FILEHANDLE and, when used as the
condition of a while loop, stores it in a special Perl variable $_.
For example, read.pl:
open(FILE,"myfile") || die "cannot open file";
while (<FILE>) {
    print $_; # echo line read
}
close(FILE);
To write to a file you use the print function and simply name the
FILEHANDLE before the output string:
print FILEHANDLE "Output String\n";
Therefore to read from one file infile and copy line by line to
another outfile we could do readwrite.pl:
open(IN,"infile") || die "cannot open input file";
open(OUT,">outfile") || die "cannot open output file";
while (<IN>) {
    print OUT $_; # echo line read
}
close(IN);
close(OUT);
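The read, write and append modes described above can also be written with the three-argument form of open() and lexical file handles. The following is only a sketch under that assumption; the file names live in a temporary directory created for the example and are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $infile  = "$dir/infile";    # hypothetical file names
my $outfile = "$dir/outfile";

# ">" creates the file, or empties it if it already exists.
open(my $w, '>', $infile) or die "cannot write $infile: $!";
print {$w} "line one\nline two\n";
close($w) or die "close failed: $!";

# Copy line by line; <$in> returns undef at end of file, ending the loop.
open(my $in,  '<', $infile)  or die "cannot open $infile: $!";
open(my $out, '>', $outfile) or die "cannot open $outfile: $!";
while (my $line = <$in>) {
    print {$out} $line;
}
close($in);
close($out) or die "close failed: $!";

# ">>" keeps the existing contents and adds to the end.
open(my $app, '>>', $outfile) or die "cannot append to $outfile: $!";
print {$app} "line three\n";
close($app) or die "close failed: $!";

open(my $check, '<', $outfile) or die "cannot open $outfile: $!";
my @copied = <$check>;
close($check);
print @copied;
```

Including $! in the die message reports why the open failed, which the shorter messages in the listings above do not.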
Binary Files
When you need to work with data files, you will need to know what binary mode
is. There are two major differences between binary mode and text mode:
- In DOS and Windows, line endings are indicated by two characters: the
carriage return and newline characters. When in text mode, these characters
are input as a single character, the newline character. In binary mode, both
characters can be read by your program. UNIX systems only use one character,
the newline, to indicate line endings.
- In DOS and Windows, the end of file character is 26. When a byte with
this value is read in text mode, the file is considered ended and your
program cannot read any more information from the file. UNIX considers the
end-of-file character to be 4. For both operating systems, binary mode will
let the end-of-file character be treated as a regular character.
Note: The examples in this section relate to the DOS operating system.
In order to demonstrate these differences, we'll use a data file called
BINARY.DAT with the following contents:
01 02 03
First, we'll read the file in the default text mode.
We proceed as follows:
- Initialize a buffer variable.
- Both read() and sysread() need their
buffer variables to be initialized before the function call is executed.
- Open the BINARY.DAT file for reading.
- Read the first 20 characters of the file using the read()
function.
- Close the file.
- Create an array out of the characters in the $buffer
variable and iterate over that array using a foreach loop.
- Print the value of the current array element in hexadecimal format.
- Print a newline character if the current array element is a newline
character.
The Perl to do this is, binary1.pl:
$buffer = "";
open(FILE, "<binary.dat");
read(FILE, $buffer, 20, 0);
close(FILE);
foreach (split(//, $buffer)) {
printf("%02x ", ord($_));
print "\n" if $_ eq "\n";
}
This program displays:
30 31 0a
30 32 0a
30 33 0a
This example does a couple of things that haven't been met before. The
read() function is used as an alternative to the line-by-line input done
with the diamond operator. It will read a specified number of bytes from the
input file and assign them to a buffer variable. The fourth parameter specifies
an offset in the buffer variable at which the bytes are stored. In this example,
they were stored at the beginning of the buffer.
The split() function in the foreach loop breaks a string
into pieces and places those pieces into an array. The double slashes indicate
that each character in the string should be an element of the new array.
Once the array of characters has been created, the foreach loop
iterates over the array. The printf() statement converts the ordinal
value of the character into hexadecimal before displaying it. The ordinal
value of a character is the value of the ASCII representation of the character.
For example, the ordinal value of '0' is 0x30 or 48.
The next line, the print statement, forces the output onto a new line if the
current character is a newline character. This was done simply to make the
output display look a little like the input file.
Now, let's read the file in binary mode and see how the output is changed.
The new code is as follows, binary2.pl:
$buffer = "";
open(FILE, "<binary.dat");
binmode(FILE);
read(FILE, $buffer, 20, 0);
close(FILE);
foreach (split(//, $buffer)) {
printf("%02x ", ord($_));
print "\n" if $_ eq "\n";
}
This program displays:
30 31 0d 0a
30 32 0d 0a
30 33 0d 0a
When the file is read in binary mode, you can see that there are really two
characters at the end of every line: the carriage return and newline characters.
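The two listings above depend on DOS text-mode translation, so they behave differently on UNIX. The following sketch is an assumption-laden variant that gives the same bytes on any system: it writes DOS-style line endings to a scratch file in binary mode and reads them back in binary mode. The file contents and variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);                       # write the bytes exactly as given
print {$fh} "01\r\n02\r\n";
close($fh) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot open $path: $!";
binmode($in);                       # no line-ending translation on read
my $buffer = '';
my $got = read($in, $buffer, 20);   # asks for 20 bytes, file holds only 8
close($in);

# unpack "H2" turns each byte into two hex digits.
my $hex = join ' ', map { unpack 'H2', $_ } split //, $buffer;
print "$got bytes: $hex\n";
```

read() returns the number of bytes actually read, which is why asking for 20 bytes from a shorter file is harmless.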
Getting File Statistics
The file test operators can tell you a lot about a file, but sometimes you
need more. In those cases, you use the stat() or lstat()
function. The stat() returns file information in a 13-element array.
You can pass either a file handle or a file name as the parameter. If the file
can't be found or another error occurs, the null list is returned. The listing
below shows how to use the stat() function to find out information
about the EOF.DAT file used earlier in the chapter.
The Perl code, stat.pl, is:
($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat("eof.dat");
print("dev = $dev\n");
print("ino = $ino\n");
print("mode = $mode\n");
print("nlink = $nlink\n");
print("uid = $uid\n");
print("gid = $gid\n");
print("rdev = $rdev\n");
print("size = $size\n");
print("atime = $atime\n");
print("mtime = $mtime\n");
print("ctime = $ctime\n");
print("blksize = $blksize\n");
print("blocks = $blocks\n");
In the DOS environment, this program displays:
dev = 2
ino = 0
mode = 33206
nlink = 1
uid = 0
gid = 0
rdev = 2
size = 13
atime = 833137200
mtime = 833195316
ctime = 833194411
blksize =
blocks =
Some of this information is specific to the UNIX environment and is not
displayed here. One interesting piece of information is the $mtime
value: the date and time of the last modification made to the file. You can
interpret this value by using the following line of code:
($sec, $min, $hr, $day, $month, $year, $day_Of_Week, $julianDate, $dst) =
localtime($mtime);
If you are only interested in the modification date, you can use the array
slice notation to just grab that value from the 13-element array returned by
stat().
For example:
$mtime = (stat("eof.dat"))[9];
Notice that the stat() function is surrounded by parentheses so that
the return value is evaluated in an array context. Then the tenth element is
assigned to $mtime. You can use this technique whenever a function
returns a list.
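To make the slice technique concrete, here is a small sketch that takes two elements of the stat() list at once and turns the modification time into a readable date. The scratch file and the use of utime() to set a known timestamp are assumptions added so the example is repeatable; the timestamp itself is the mtime value from the sample output earlier in this section.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "thirteen byte";         # exactly 13 bytes
close($fh) or die "close failed: $!";

# Set a known access/modification time so the result is predictable.
my $stamp = 833195316;               # the mtime value from the sample output
utime($stamp, $stamp, $path) or die "utime failed: $!";

# Elements 7 and 9 of the stat list are the size and the mtime.
my ($size, $mtime) = (stat($path))[7, 9];

# localtime counts months from 0 and years from 1900.
my ($sec, $min, $hr, $day, $month, $year) = localtime($mtime);
my $date = sprintf("%04d-%02d-%02d", $year + 1900, $month + 1, $day);
print "size = $size, modified on $date\n";
```

The exact day printed depends on the local time zone, since localtime() converts the timestamp to local time.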
Printing Revisited
We've been using the print() function throughout this book without
really looking at how it works. Let's remedy that now.
The print() function is used to send output to a file handle. Most
of the time, we've been using STDOUT as the file handle. Because
STDOUT is the default, we did not need to specify it. The syntax for the
print() function is: print FILE_HANDLE (LIST)
You can see from the syntax that print() is a list operator because
it's looking for a list of values to print. If you don't specify a list, then $_
will be used. You can change the default file handle by using the select()
function. Let's take a look at this:
open(OUTPUT_FILE, ">testfile.dat");
$oldHandle = select(OUTPUT_FILE);
print("This is line 1.\n");
select($oldHandle);
print("This is line 2.\n");
This program displays:
This is line 2.
and creates the TESTFILE.DAT file with a single line in it:
This is line 1.
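The same idea can be sketched with a lexical file handle. This version also shows the alternative of naming the handle in each print call, which leaves the default handle untouched. The scratch file and variable names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);

# select() returns the previously selected handle so it can be restored.
my $old_handle = select($out);
print "This is line 1.\n";          # goes to the temporary file
select($old_handle);
print "This is line 2.\n";          # goes to STDOUT again

# Naming the handle explicitly avoids changing the default at all.
print {$out} "This is line 3.\n";
close($out) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot open $path: $!";
my @lines = <$in>;
close($in);
```

After this runs, @lines holds lines 1 and 3, while line 2 appeared on the screen.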
Perl also has the printf() function which lets you be more precise
in how things are printed out. The syntax for printf() looks like this:
printf FILE_HANDLE (FORMAT_STRING, LIST)
Like print(), the default file handle is STDOUT. The
FORMAT_STRING parameter controls what is printed and how it looks. For
simple cases, the formatting parameter looks identical to the list that is
passed to printf(). For example:
$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January = \$$januaryCost\n");
printf("February = \$$februaryCost\n");
This program displays:
January = $123.34
February = $23345.45
In this example, only one parameter is passed to the printf()
function: the formatting string. Because the formatting string is enclosed in
double quotes, variable interpolation will take place just like for the
print() function.
This display is not good enough for a report because the decimal points of
the numbers do not line up. You can use the formatting specifiers shown below:
| Specifier | Description |
| c | Indicates that a single character should be printed. |
| s | Indicates that a string should be printed. |
| d | Indicates that a decimal number should be printed. |
| u | Indicates that an unsigned decimal number should be printed. |
| x | Indicates that a hexadecimal number should be printed. |
| o | Indicates that an octal number should be printed. |
| e | Indicates that a floating point number should be printed in scientific notation. |
| f | Indicates that a floating point number should be printed. |
| g | Indicates that a floating point number should be printed using the most space-saving format, either e or f. |
The formats can be modified as follows:
| Modifier | Description |
| - | Indicates that the value should be printed left-justified. |
| # | Forces octal numbers to be printed with a leading zero. Hexadecimal numbers will be printed with a leading 0x. |
| + | Forces signed numbers to be printed with a leading + or - sign. |
| 0 | Pads the displayed number with zeros instead of spaces. |
| . | Separates the minimum field width (written before the dot) from the precision (written after it). |
An example use of . is %10.3f,
which means that the value will be at least 10 positions wide. And because
f is used for floating point, exactly 3 digits to the right of the
decimal point will be displayed. %.10s will print a string at most 10 characters
long.
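A quick way to see what each modifier does is to capture the formatted text with sprintf(). The values below are arbitrary and chosen only for illustration; the comments show the string each call produces.

```perl
use strict;
use warnings;

my $left   = sprintf("[%-6s]", "ab");      # [ab    ]
my $octal  = sprintf("%#o", 8);            # 010
my $hex    = sprintf("%#x", 255);          # 0xff
my $signed = sprintf("%+d", 42);           # +42
my $zeros  = sprintf("%05d", 42);          # 00042
my $float  = sprintf("[%10.3f]", 3.14159); # [     3.142]
my $short  = sprintf("%.10s", "abcdefghijklmnop"); # abcdefghij

print "$_\n" for $left, $octal, $hex, $signed, $zeros, $float, $short;
```

The square brackets in two of the formats are ordinary text, added so that the padding spaces are visible.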
Returning to our above example, to print the cost variables using format
specifiers, we may write print.pl:
$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January = \$%8.2f\n", $januaryCost);
printf("February = \$%8.2f\n", $februaryCost);
This program displays:
January = $  123.34
February = $23345.45
This example uses the f format specifier to print a floating point
number with two decimal places in a field 8 positions wide. The February
amount is printed right next to the dollar sign because $februaryCost,
printed as 23345.45, fills all 8 positions.
If you did not know the width of the numbers that you need to print in
advance, you could use the following technique:
- Create two variables to hold costs for January and February.
- Find the length of the largest number.
- Print the cost variables using variable interpolation to determine the
width of the numbers to print.
- Define the max() function.
In Perl we would do, printdemo.pl:
$januaryCost = 123.34;
$februaryCost = 23345.45;
$maxLength = length(max($januaryCost, $februaryCost));
printf("January = \$%$maxLength.2f\n", $januaryCost);
printf("February = \$%$maxLength.2f\n", $februaryCost);
sub max {
my($max) = shift(@_);
foreach $temp (@_) {
$max = $temp if $temp > $max;
}
return($max);
}
This program displays:
January = $  123.34
February = $23345.45
While taking the time to find the longest number is more work, the result is
worth it.
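An alternative sketch of the same technique uses the * width specifier, which takes the field width from the argument list instead of interpolating it into the format string. It also measures the formatted text rather than the raw number, which is an assumption about what "length" should mean here. The hash, the use of max() from the core List::Util module and the variable names are all choices made for this example.

```perl
use strict;
use warnings;
use List::Util qw(max);

my %cost = (January => 123.34, February => 23345.45);

# Format each number first, then measure the formatted text.
my $width = max(map { length sprintf("%.2f", $_) } values %cost);

# "%*.2f" takes the field width from the argument list.
my @report;
for my $month (qw(January February)) {
    push @report, sprintf("%-8s = \$%*.2f", $month, $width, $cost{$month});
}
print "$_\n" for @report;
```

Measuring the formatted text guards against values such as 123.3, whose raw length would not account for the second decimal place.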