ASCII delimited files

Jul 5th, 2000 10:02
Nathan Wallace, Hans Nowak, Snippet 275, Thomas A. Bryan

Packages: text.delimited_files
> Is there any function or module available for parsing ASCII delimited files
> before I go and reinvent the wheel by writing my own?

I'm not sure exactly what you're looking for, so I've appended something 
that I was playing with one day.  It is a way to easily create an object 
that can parse and validate ASCII delimited files.  
It might be terribly slow: I never timed it.  

Basically, you create a DelimFldParser object with a list of instances 
of DelimParserField subclasses and a delimiter.  Each DelimParserField 
instance knows how to handle one specific "column" of the ASCII file.  
The DelimFldParser is then handed a file object (anything with a 
readline() method, really), and it returns a list of lists.  Each inner 
list holds the values returned by the DelimParserField objects for one 
line.  I also assume that every line of the file has the same number 
of "columns."  
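
The data flow described above can be sketched in a few lines.  This is only 
an illustration in present-day Python 3, not the appended code: the function 
name parse_delimited and the sample data are made up for the example, and 
plain callables (str, float) stand in for the field objects.

```python
import io


def parse_delimited(file_obj, converters, delimiter=None):
    """Return one list of converted values per line of file_obj."""
    rows = []
    line = file_obj.readline()
    while line:
        # delimiter=None splits on runs of whitespace
        raw = line.rstrip("\r\n").split(delimiter)
        # Every line must supply exactly one value per column.
        if len(raw) != len(converters):
            raise ValueError("expected %d fields, got %d: %r"
                             % (len(converters), len(raw), line))
        rows.append([conv(text) for conv, text in zip(converters, raw)])
        line = file_obj.readline()
    return rows


# io.StringIO stands in for a real file: all that is needed is readline().
sample = io.StringIO("a 10\nb 3.5\n")
rows = parse_delimited(sample, [str, float])
print(rows)  # [['a', 10.0], ['b', 3.5]]
```

The result is a list of lists, one inner list per line, and a line with the 
wrong number of columns is rejected rather than silently padded or truncated.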

I implemented three sample DelimParserField subclasses.  One converts 
ASCII values to floats and checks them against a range.  Another checks 
that the field value is in a specified list of values.  The last verifies 
field values against a regular expression.

I wrote this to read and verify files before importing them into a 
database, though I never really had much chance to use it.  
I would love to see someone optimize it, because it makes the 
task of building a parser for a new ASCII file format very 
simple.  It would be great, for example, for dealing with delimited 
data exported from a database or for parsing a delimited file 
for import into a database.
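
As a rough sketch of what supporting a new format might involve, here is one 
way to add a field type.  It is written in present-day Python 3 and is 
self-contained, so Field and parse_line are simplified stand-ins for the 
classes in the listing, and IntRangeField, the "|" delimiter, and the sample 
line are all hypothetical.

```python
class Field:
    """Minimal stand-in for the base field: pass through, accept anything."""

    def __init__(self, name):
        self.name = name

    def convert(self, value):
        return value

    def verify(self, value):
        pass


class IntRangeField(Field):
    """Hypothetical new field type: whole numbers within [low, high]."""

    def __init__(self, name, low, high):
        super().__init__(name)
        self.low = low
        self.high = high

    def convert(self, value):
        return int(value)  # raises ValueError for non-integer text

    def verify(self, value):
        if not self.low <= value <= self.high:
            raise ValueError("%s: %r is not between %r and %r"
                             % (self.name, value, self.low, self.high))


def parse_line(line, fields, delimiter=None):
    """Convert, then verify, each column of one line."""
    raw = line.rstrip("\r\n").split(delimiter)
    if len(raw) != len(fields):
        raise ValueError("wrong number of fields: %r" % line)
    values = []
    for field, text in zip(fields, raw):
        value = field.convert(text)
        field.verify(value)
        values.append(value)
    return values


fields = [Field("city"), IntRangeField("year", 1900, 2100)]
row = parse_line("Oslo|1999\n", fields, delimiter="|")
print(row)  # ['Oslo', 1999]
```

The sketch raises ValueError instead of using assert, because assertions are 
skipped when Python runs with the -O option.  That is a design choice of this 
example, not something the appended code does.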

import re


class DelimFldParser:
    def __init__(self, fields, delimiter=None):
        """fields is an ordered list of DelimParserField instances"""
        self.delimiter = delimiter  # None splits on runs of whitespace
        self.fields = fields
        self.numCols = len(fields)
        self.cols = []
        for el in fields:
            # Assumed loop body: record the column names in order.
            self.cols.append(el.name)

    def parseLine(self, line):
        # Drop the line ending so it cannot end up in the last field.
        values = line.rstrip('\r\n').split(self.delimiter)
        assert len(values) == self.numCols, \
            "The following line has the wrong number of fields.\n%s" % line
        for idx in range(self.numCols):
            # Convert the raw string, then validate the converted value.
            values[idx] = self.fields[idx].convert(values[idx])
            self.fields[idx].verify(values[idx])
        return values

    def parseFile(self, fileObj):
        data = []
        line = fileObj.readline()
        while line:
            data.append(self.parseLine(line))
            line = fileObj.readline()
        return data

    def __str__(self):
        names = [el.name for el in self.fields]
        return '<DelimFldParser: %s >' % ', '.join(names)

class DelimParserField:
    def __init__(self, name):
        self.name = name

    def convert(self, value):
        return value

    def verify(self, value):
        # Subclasses override this to assert that the value is acceptable.
        pass

class EnumField(DelimParserField):
    def __init__(self, name, validValues):
        DelimParserField.__init__(self, name)  # sets self.name
        self.validValues = validValues[:]

    def verify(self, value):
        assert value in self.validValues, \
            "%s not in %s" % (value, self.validValues)

class NumericRngField(DelimParserField):
    def __init__(self, name, start, stop):
        DelimParserField.__init__(self, name)  # sets self.name
        self.min = start
        self.max = stop

    def convert(self, value):
        return float(value)

    def verify(self, value):
        # value is expected to be the float returned by convert()
        assert self.min <= value <= self.max, \
            "%s is not between %s and %s" % (value, self.min, self.max)

class RegexpField(DelimParserField):
    def __init__(self, name, regexp, flags=None):
        DelimParserField.__init__(self, name)  # sets self.name
        if flags:
            self.re = re.compile(regexp, flags)
        else:
            self.re = re.compile(regexp)

    def verify(self, value):
        assert self.re.search(value), \
            "%s does not match the pattern '%s'" % (value, self.re.pattern)
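
if __name__ == '__main__':
    fh = open('delimParser.test', 'w')
    fh.write("""a 10 9/10/1999
b 3.5 10/11/1974
c 5.7 09/10/1974
""")
    fh.close()
    fh = open('delimParser.test', 'r')
    # Assumed arguments: the field names 'Num' and 'Date', the numeric
    # range, and the date pattern are illustrative choices.
    myParser = DelimFldParser((EnumField('Enum', ('a', 'b', 'c')),
                               NumericRngField('Num', 0, 10),
                               RegexpField('Date', r'^\d{1,2}/\d{1,2}/\d{4}$')))
    print(myParser)
    output = myParser.parseFile(fh)
    fh.close()
    print(output)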