
Text Files



As you know from Java, we can use files stored on a secondary storage device for both input and output. For example, to produce a printed report, we write the output to a text file and then print that file. The data needed to produce the report may itself be stored in a file on disk, in which case we must extract the data from the file, process it, and then produce the report.

File access in Python is much simpler than in Java. Files are represented by objects and are actually a built-in type. This means no additional modules are required in order to access a file from within your program.

Opening a File

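Whether we are reading from an input file or writing to an output file, we must first open the file and create a Python file object. A file is opened using the built-in file() constructor. (The built-in open() function does the same job, and it is the only choice in Python 3, which removed file().)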

infile = file( 'records.txt', 'r' )
outfile = file( 'report.txt', 'w' )
 

The constructor takes two string arguments. The first is the name of the file and the second is the mode, where 'r' means open for reading and 'w' means open for writing. Opening a file in 'w' mode creates the file if it does not exist and erases its contents if it does. The file() constructor returns a reference to the newly created file object, which is then used to access the file.

Opening the same files in Java requires the following statements:

// Java files
Scanner infile = new Scanner( new File( "records.txt" ) );
PrintWriter outfile = new PrintWriter( new File( "report.txt" ) );
 

When you are finished using a file, it should be closed. Closing the file flushes any buffered output and unlinks the external file from the internal Python object.

infile.close()
outfile.close()
 

As in Java, a file cannot be used once it has been closed. To reuse the file, you must first reopen it.
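A minimal sketch of the closed-file rule, written for Python 3 (so it uses open() in place of file()). The file name closed_demo.txt is a placeholder created in a temporary directory. Any read after close() raises ValueError, while reopening produces a fresh, usable file object.

```python
import os
import tempfile

# Placeholder file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "closed_demo.txt")

outfile = open(path, "w")
outfile.write("first line\n")
outfile.close()            # flushes the buffer and releases the file

infile = open(path, "r")
infile.close()

try:
    infile.readline()      # any I/O on a closed file object fails
    reuseFailed = False
except ValueError:
    reuseFailed = True

# Reopening creates a new file object that can be read again.
infile = open(path, "r")
first = infile.readline()
infile.close()

print(reuseFailed)
print(repr(first))
```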

Writing to Files

Python provides several methods for outputting data to a file. In this chapter we are only working with text files, though you can also use binary files in Python. The easiest way to write text to a text file is with the write() method:

outfile.write( "Student Report\n" )
outfile.write( "-" * 40 + "\n" )
 

The write() method writes the given string to the output file represented by the file object. To output values of other types, you must first convert them to strings. To format the output written to the file, you can use the string format operator (%):

outfile.write( "%4d  %6.2f\n" % ( idNum, avgGrade ) )
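The sketch below, assuming sample values for idNum and avgGrade and using open() so that it runs under Python 3, writes one value converted with str() and one line built with the format operator, then reads the file back to show exactly what was written. When more than one value is formatted, the values must be grouped in a tuple on the right of %.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "report.txt")   # placeholder file

idNum = 208          # hypothetical sample values
avgGrade = 88.05

outfile = open(path, "w")
outfile.write(str(idNum) + "\n")                    # explicit conversion with str()
outfile.write("%4d  %6.2f\n" % (idNum, avgGrade))   # values grouped in a tuple
outfile.close()

infile = open(path, "r")
contents = infile.read()
infile.close()
print(contents)
```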
 

Python also allows you to output the entire contents of a string list with a single call. Consider the following example:

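strList = [ "Line 1\n", "Line 2\n", "and yet more\n" ]
txtfile = file( "sample.txt", "w" )
txtfile.writelines( strList )
txtfile.close()
 

The writelines() method writes each string exactly as given and does not add line separators, which is why each string above ends with a newline. The result is one string per line in the text file: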

Line 1
Line 2
and yet more
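Because writelines() adds no line separators of its own, a list of bare strings has to be given newlines before it is written. A minimal sketch (Python 3, open() in place of file(); the file name is a placeholder) that appends them with a list comprehension:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")   # placeholder file
strList = ["Line 1", "Line 2", "and yet more"]          # no newlines yet

txtfile = open(path, "w")
# writelines() writes the strings back to back, so add "\n" to each one.
txtfile.writelines([s + "\n" for s in strList])
txtfile.close()

txtfile = open(path, "r")
contents = txtfile.read()
txtfile.close()
print(contents)
```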

Reading from Files

Python provides two methods for extracting data from a text file, readline() and readlines(), both of which return the data as strings. If you need other data types, you must explicitly convert the extracted strings.

Extracting Strings

In the following example, the readline() method extracts an entire line from the text file and returns its contents as a string:

line = infile.readline()
 

The end of the file is reached when there is no more data to extract. Python signals this by having readline() return an empty string (""). A blank line inside the file is returned as "\n", so it is never mistaken for the end of the file.

infile = file( "data.txt" )
line = infile.readline()
while line != "" :
   print line
   line = infile.readline()
infile.close()
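To make the end-of-file test concrete, this sketch (Python 3 syntax, placeholder file in a temporary directory) writes a small file containing a blank line and no final newline, then collects every line with the readline() loop. The blank line comes back as "\n", so only the true end of the file stops the loop.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")    # placeholder file
f = open(path, "w")
f.write("alpha\n\nbeta\ngamma")    # blank second line, no newline at the end
f.close()

lines = []
infile = open(path, "r")
line = infile.readline()
while line != "":                  # "" is returned only at end of file
    lines.append(line)
    line = infile.readline()
infile.close()

print(lines)
```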
 

The readline() method leaves the newline character at the end of the extracted string. The rstrip() string method can be used to strip it, along with any other trailing white space:

line = infile.readline()
stripped = line.rstrip()
 

If the last line of the file does not end with a newline, readline() returns it without one; rstrip() then has no newline to remove, although it still strips any trailing spaces or tabs.
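The difference between stripping all trailing white space and stripping only the newline can be seen with a hypothetical line that also ends in spaces; passing "\n" to rstrip() limits what is removed to newline characters.

```python
line = "Smith, John  \n"              # hypothetical line with trailing spaces

allStripped = line.rstrip()           # 'Smith, John'   (spaces and newline removed)
newlineOnly = line.rstrip("\n")       # 'Smith, John  ' (only the newline removed)
lastLine = "Green, Patrick".rstrip()  # nothing to strip: unchanged

print(repr(allStripped))
print(repr(newlineOnly))
print(repr(lastLine))
```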

Extracting Characters

To read individual characters from a text file, pass an integer to the readline() method indicating the maximum number of characters to extract:

ch = infile.readline( 1 )  # read a single character
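A closely related file method is read(n), which returns at most n characters regardless of line boundaries and, like readline(), returns an empty string at the end of the file. The following sketch (Python 3, placeholder file) uses read(1) to read a file one character at a time; note that the newline comes back as a character too.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chars.txt")   # placeholder file
f = open(path, "w")
f.write("ab\ncd")
f.close()

chars = []
infile = open(path, "r")
ch = infile.read(1)              # at most one character per call
while ch != "":                  # "" signals end of file
    chars.append(ch)
    ch = infile.read(1)
infile.close()

print(chars)
```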
 

Extracting Multiple Lines

Python provides a convenient method for extracting the entire contents of a file as a list of strings, one string per line:

lines = infile.readlines()
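The sketch below (Python 3, placeholder file) shows that readlines() keeps the newline at the end of each string, so the strings usually need rstrip() before further processing. Since the whole file is read into memory at once, this approach suits files of modest size.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myreport.txt")   # placeholder file
f = open(path, "w")
f.write("first\nsecond\nthird\n")
f.close()

infile = open(path, "r")
lines = infile.readlines()       # every line, newline characters included
infile.close()

stripped = [line.rstrip() for line in lines]
print(lines)
print(stripped)
```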
 

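Consider the following example program, which produces a double-spaced version of the text file myreport.txt by inserting a blank line after each existing line. Because readlines() keeps the newline on each string, writing one more "\n" produces the blank line.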

Program: dblspace.py
# dblspace.py
#
# Create a double spaced version of a given text file.

# Open the input file and extract all of the lines.
infile = file( "myreport.txt", "r" )
fileInput = infile.readlines()
infile.close()

# Open the output file and print each string appended with a \n.
outfile = file( "double.txt", "w" )
i = 0
while i < len( fileInput ) :
   outfile.write( fileInput[ i ] + "\n" )
   i = i + 1
   
outfile.close()
 

File Iterator

An alternative approach to processing the entire contents of a text file is with the use of the file iterator. Python provides an iterator that can be used as part of a for loop. The following is a modified version of the dblspace.py program presented in the previous section

# Open the input and output files.
infile = file( "myreport.txt", "r" )
outfile = file( "double.txt", "w" )

# Iterate over each line in the file.
for line in infile:
   outfile.write( line + "\n" )

# Close the two files.
infile.close()
outfile.close()
 

In this example, each iteration of the loop extracts the next line in the file and stores it in the line variable.
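The file iterator also combines well with the built-in enumerate() function when each line needs a number. A minimal sketch (Python 3, placeholder file; the numbering format is an arbitrary choice) that labels each line of a file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myreport.txt")   # placeholder file
f = open(path, "w")
f.write("first\nsecond\n")
f.close()

numbered = []
infile = open(path, "r")
for lineNum, line in enumerate(infile, 1):   # next line on every pass, counting from 1
    numbered.append("%3d: %s" % (lineNum, line.rstrip()))
infile.close()

for entry in numbered:
    print(entry)
```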

Processing Mixed Types

All of our previous examples dealt with the extraction of strings from a text file. But what if we need to extract numeric values? Python only provides methods for extracting strings, so to extract other data types we must handle the conversions explicitly. Consider the following sample text file containing student data for a course.

Computer Programming I
100 
Smith, John 
92.4
208 
Roberts, Jane
88.05
334 
Green, Patrick
76.35

The first line in the file is the name of the course. The remaining lines contain three student records. Each record is spread over three lines: the first contains the student’s identification number, the second the student’s name, and the third the student’s average grade for the course. Suppose we want to extract this data and produce a report similar to the following:

STUDENT REPORT
Computer Programming I
----------------------------------------
 100  Smith, John            92.40
 208  Roberts, Jane          88.05
 334  Green, Patrick         76.35
----------------------------------------
Average Grade                85.60

We have no alternative but to extract each line of the data file as a string. But we need to treat the grades as real values in order to compute the average grade for the course. We can accomplish this the same way we did with user input: convert each string to the appropriate type, in this case with float().
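Both int() and float() ignore leading and trailing white space, so a line returned by readline() can be converted without first calling rstrip(); text that is not a number raises ValueError. A short sketch with hypothetical strings:

```python
idText = "100 \n"        # hypothetical lines as returned by readline()
gradeText = "92.4\n"

studentId = int(idText)          # surrounding white space is ignored
studentGrade = float(gradeText)  # likewise for float()

try:
    float("Smith, John\n")       # not a number
    badValueRejected = False
except ValueError:
    badValueRejected = True

print(studentId)
print(studentGrade)
print(badValueRejected)
```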

The following program is an implementation of a solution which extracts the student data to produce the report illustrated above.

# gradereport.py
#
# Extracts data from a text file containing student records and
# produces a grade report.

# Use constants for the filenames.
INPUT_FILE = "coursegrades.txt"
OUTPUT_FILE = "gradereport.txt"

# Open the two files.
studentFile = file( INPUT_FILE, "r" )
reportFile = file( OUTPUT_FILE, "w" )

# Extract the course name.
courseName = studentFile.readline()

# Print the report header.
reportFile.write( "STUDENT REPORT\n" )
reportFile.write( courseName )
reportFile.write( "-" * 40 + "\n" )

# Initialize the two running totals.
gradeTotal = 0
numStudents = 0

# Process each record in the grade file.
studentId = studentFile.readline()
while studentId != "" :

   # Extract the other two parts of the record.
   studentName = studentFile.readline()
   studentGrade = float( studentFile.readline() )

   # Add this grade to the running total.
   gradeTotal += studentGrade
   numStudents += 1

   # Print the output for this student.
   output = "%4s  %-20s  %6.2f\n" % \
            ( studentId.rstrip(),
              studentName.rstrip(),
              studentGrade )
   reportFile.write( output )

   # Extract the next record.
   studentId = studentFile.readline()

# Compute the average grade.
avgGrade = gradeTotal / float( numStudents )

# Print the footer.
reportFile.write( "-" * 40 + "\n" )
reportFile.write( "%-26s  %6.2f\n" % ("Average Grade", avgGrade) )

# Close the two files.
studentFile.close()
reportFile.close()
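As an alternative to the record-by-record readline() loop in gradereport.py, all of the lines can be read with readlines() and the three parts of each record picked out by slicing with a step of 3. This sketch (Python 3, recreating the sample data in a temporary file) assumes the file has no trailing blank lines, since an extra line would shift the grouping.

```python
import os
import tempfile

# Recreate the sample course file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "coursegrades.txt")
f = open(path, "w")
f.write("Computer Programming I\n100\nSmith, John\n92.4\n"
        "208\nRoberts, Jane\n88.05\n334\nGreen, Patrick\n76.35\n")
f.close()

infile = open(path, "r")
lines = [line.rstrip() for line in infile.readlines()]
infile.close()

courseName = lines[0]
# Starting at lines 1, 2 and 3, every third line is an id, a name or a grade.
records = list(zip(lines[1::3], lines[2::3], lines[3::3]))

grades = [float(grade) for (studentId, name, grade) in records]
avgGrade = sum(grades) / len(grades)

for studentId, name, grade in records:
    print("%4s  %-20s  %6.2f" % (studentId, name, float(grade)))
print("%-26s  %6.2f" % ("Average Grade", avgGrade))
```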
 

To extract mixed-type data stored on the same input line, we must first split, or tokenize, the string into individual parts using the split() string method. Consider the following code segment:

aString = "12 45.5 abc 9"
strList = aString.split()
print strList
 

which produces

['12', '45.5', 'abc', '9']
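Each token is still a string, so the values are converted individually once their types are known. A minimal sketch, assuming the line always holds an integer, a real number, a word and another integer in that order (the variable names are hypothetical):

```python
aString = "12 45.5 abc 9"
strList = aString.split()

# Assumed layout: int, float, string, int.
first = int(strList[0])
second = float(strList[1])
third = strList[2]
fourth = int(strList[3])

print("%d %.1f %s %d" % (first, second, third, fourth))   # 12 45.5 abc 9
```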

The split() method splits, or tokenizes, a string into substrings and returns them as a list of strings. By default, the string is split on runs of white space. You can also pass a separator string as an argument, in which case the string is split wherever that exact separator occurs:

ssn = "412-45-8900"
parts = ssn.split( "-" )
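The sketch below shows the result of the separator split and how it differs from the default white-space split: with an explicit separator, adjacent separators produce empty strings, whereas the default collapses runs of blanks. The separator may also be longer than one character.

```python
ssn = "412-45-8900"
parts = ssn.split("-")
print(parts)                          # ['412', '45', '8900']

emptyFields = "a,,b".split(",")       # adjacent separators keep an empty field
print(emptyFields)                    # ['a', '', 'b']

collapsed = "a   b".split()           # default split merges runs of blanks
print(collapsed)                      # ['a', 'b']

multi = "x::y::z".split("::")         # multi-character separator
print(multi)                          # ['x', 'y', 'z']
```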
 




© 2006 - 2008: Rance Necaise - Page last modified on September 17, 2006, at 06:32 PM