Example of subclassing from a type, iterators, generators

Dec 14th, 2001 13:00
Wolfgang Lipp, Michael Chermside,


"""
Here is a script that provides a simple class, cycStr, that acts just like a string, except that
when you iterate over it, you cycle, possibly several times. The loop
    for c in cs:
        print c,
will give the following results for these instances:
                                # five elements:
    cs = cycStr( 'ab|', 5 )     # --> a b | a b
                                # five full cycles:
    cs = cycStr( 'ab|', -5 )    # --> a b | a b | a b | a b | a b |
                                # one full cycle:
    cs = cycStr( 'ab|' )        # --> a b |
A special case is the infinitely long cyclic string, which results from passing None to the
constructor:
    cs = cycStr( 'ab|', None )      # --> a b | a b | a b | a b | a ...
Therefore, the second argument passed to the constructor, the 'cyclic length', controls how long
the sequence appears to be when iterated over.
The definition of cycStr is very short; the main trick sits in the module-level function
cycle() (see below):
    class cycStr( str ):
        def __new__( cls, data, cyclen = NA ):              #(1)
            return str.__new__( cls, data )
        def __init__( self, data, cyclen = NA ):            #(2)
            if cyclen is NA:
                cyclen = len( data )
            self.cyclen = cyclen
        def __iter__( self ):                               #(3)
            return cycle( str( self ), self.cyclen )
The following points may be noted:
    (1) Method __new__() has the same signature as __init__() and is called *before* it. Its
        first argument is not the instance (which doesn't exist at this point) but the class
        cycStr, and it returns the result of calling __new__() of str.
    (2) NA in __init__() is an instance of an empty class and only serves as a magic value to
        distinguish a missing second argument from an explicit None. -- cyclen, the cyclic
        length, is detailed below.
    (3) __iter__() is called whenever an iterator is wanted from the instance. Let cs be an
        instance of cycStr; then __iter__() is called either implicitly (in a for-x-in-cs
        situation) or explicitly (when iter(cs) is called). The __iter__() method is then
        responsible for returning an iterator; since cycle() is a generator function and the
        generator it returns is a kind of iterator, the result of cycle() is valid here.
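Points (1) and (2) can be checked with a small sketch. It is written for Python 3 rather than
the Python 2.2 of this script, and the names _MISSING, calls and CycStrSketch are made up for
illustration; object() stands in for the NA instance:

```python
# Python 3 sketch; _MISSING, calls and CycStrSketch are hypothetical names.
_MISSING = object()   # unique sentinel, plays the role of NA

calls = []            # records the order in which the two hooks run


class CycStrSketch(str):
    def __new__(cls, data, cyclen=_MISSING):
        calls.append('__new__')
        # str is immutable, so its value has to be fixed here
        return str.__new__(cls, data)

    def __init__(self, data, cyclen=_MISSING):
        calls.append('__init__')
        # only a truly omitted argument falls back to the real length
        if cyclen is _MISSING:
            cyclen = len(data)
        self.cyclen = cyclen


omitted = CycStrSketch('ab|')              # cyclen left out
explicit_none = CycStrSketch('ab|', None)  # cyclen given as None
```

Under these assumptions, omitted.cyclen is 3 while explicit_none.cyclen stays None, which is
the distinction a default of None could not express.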
Now, the interesting part is really the generator function. We want a generator to iterate over
v elements of sequence s -- if v is greater than the length of s, we want to start over from the
first element in s whenever we pass the last one. -- We start out like this:
    def cycle1( s, v ):
        count = 0
        it = iter( s )
        while v is None or count < v:
            count += 1
            yield it.next()
    s = 'abcdef'
    v = 10
    for e in cycle1( s, v ):
        print e,
This function gives us at most as many elements as there actually are in s. What we can do,
then, is to catch the StopIteration (which is raised once the iterator is exhausted and goes
unnoticed, since the for-in loop silently stops when it encounters the exception) and 'rewind'
the iterator, like this:
    def cycle2( s, v ):
        count = 0
        it = iter( s )
        while v is None or count < v:
            try:
                count += 1
                yield it.next()
            except StopIteration:
                it.rewind()
This solution, however, is not possible, since iterators generally do not have a rewind()
method. It would be nice if we could retrieve the original sequence from the iterator and
build a new iterator from it, but I don't see that this is possible. In this concrete
situation, since we know s anyway, it's possible to get away with this:
    def cycle3( s, v ):
        count = 0
        it = iter( s )
        while v is None or count < v:
            try:
                count += 1
                yield it.next()
            except StopIteration:
                it = iter( s )
This gives us 'a b c d e f a b c' -- almost perfect, except for the one missing element. The solution
is either to count only successful calls to method next(), or to subtract one from count in case
of an exception; I think the first solution is better:
    def cycle4( s, v ):
        count = 0
        it = iter( s )
        while v is None or count < v:
            try:
                R = it.next()
                count += 1
                yield R
            except StopIteration:
                it = iter( s )
This, in fact, gives 'a b c d e f a b c d', ten elements, when called with
    for e in cycle4( s, 10 ):
        print e,
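For readers on Python 3, where it.next() is spelled next(it), cycle4 might be ported roughly
as follows. The name cycle4_py3 and the guard for an empty sequence are additions made for
illustration, not part of the original script; without such a guard, an empty s combined with
a nonzero v would make the loop rewind forever without yielding anything:

```python
# Python 3 sketch of cycle4; cycle4_py3 is a hypothetical name.
def cycle4_py3(s, v):
    count = 0
    it = iter(s)
    while v is None or count < v:
        try:
            item = next(it)      # Python 3 spelling of it.next()
        except StopIteration:
            if len(s) == 0:
                return           # added guard: nothing to cycle over
            it = iter(s)         # 'rewind' by building a fresh iterator
        else:
            count += 1           # count only successful fetches
            yield item


result = ' '.join(cycle4_py3('abcdef', 10))
```

Here result is 'a b c d e f a b c d', the same ten elements as above.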
The actual code used below adds two small refinements. Catching UnboundLocalError as well means
that a single line suffices to define the local variable it: the very first it.next() fails
because it is not yet bound, and the handler then creates the iterator. The call to
cyclen2virlen() means we can pass the more general concept of a cyclic length instead of a
virtual length to the generator.
The transition from cyclic length cl of a sequence s to its virtual length vl and the
relationship to real length rl (the result of calling len(sequence)) of a cyclic sequence is
defined as follows:
    --  If a sequence's cl is negative, then its vl equals minus cl times the sequence's rl,
        resulting in -cl full cycles over the *entire* sequence.
    --  If a sequence's cl is positive or zero, then its vl equals its cl, resulting in a cycle
        over exactly cl *elements* (starting over from the first element after passing the
        last (real) element).
    --  If a sequence's cl is None, then its vl is interpreted as being indefinitely large,
        yielding the same result as a -- practically impossible -- infinitely large positive or
        negative cl.
This is the conversion done by cyclen2virlen():
    cyclen2virlen('abc',-5) --> 15      # 5 full cycles of 3 elements: 5 * 3 == 15
    cyclen2virlen('abc',-1) --> 3       # one full cycle: the real length of 'abc'
    cyclen2virlen('abc',5) --> 5        # 5 elements
    cyclen2virlen('abc',None) --> None  # symbol for infinitely many cycles of 'abc'
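The three rules above, together with the cycling itself, can also be sketched in Python 3 with
the standard itertools module. This swaps in itertools.cycle and itertools.islice for the
hand-written generator of this script; the names virlen, cycled and endless are hypothetical:

```python
# Python 3 sketch; virlen, cycled and endless are hypothetical names.
from itertools import cycle as endless, islice


def virlen(seq, cyclen):
    # negative cyclic length: that many full cycles over the sequence
    if cyclen is not None and cyclen < 0:
        return -cyclen * len(seq)
    # zero, positive, or None: passed through unchanged
    return cyclen


def cycled(seq, cyclen):
    # islice treats a stop value of None as 'no limit'
    return islice(endless(seq), virlen(seq, cyclen))


examples = [virlen('abc', -5), virlen('abc', 5), virlen('abc', None)]
five = ''.join(cycled('ab|', 5))
```

Here examples is [15, 5, None] and five is 'ab|ab'. Note that itertools.cycle keeps a copy of
the elements it has seen, whereas the generator in this script re-reads the sequence on every
pass.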
"""
#   from __future__ import nested_scopes
from __future__ import generators
#   from __future__ import division
def cyclen2virlen( seq, cyclen = 1 ):
    """Compute 'virtual length' (length of a sequence in terms of its elements) from 'cyclic length',
    and return it."""
    if cyclen is not None and cyclen < 0:
        return cyclen * -len( seq )
    return cyclen
def cycle( seq, cyclen = 1 ):
    """Generator that, given a sequence and a cyclic length, yields successive elements of seq,
    starting over from the start of the sequence after passing its end, until the virtual
    length is reached (if ever)."""
    #   Start count with zero elements;
    #   convert cyclic length to virtual length:
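    count = 0
    virlen = cyclen2virlen( seq, cyclen )
    #   An empty sequence has nothing to cycle over; without this guard
    #   the loop below would rewind forever without yielding anything:
    if len( seq ) == 0:
        return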
    while virlen is None or count < virlen:
        try:
            #   Try to fetch next element from sequence; if successful,
            #   count that element, and yield it:
            R = it.next()
            count += 1
            yield R
        except ( UnboundLocalError, StopIteration ):
            #   Create a fresh iterator whenever the iterator does not exist yet
            #   or we have run off the end of the sequence:
            it = iter( seq )
#   Helper class for default arguments:
class NA: pass
NA = NA()
class cycStr( str ):
    """Cyclic string class that has +cycle()+ as its iterator. Creating an instance and omitting
    explicit cyclic length +cyclen+ is equal to creating it with the real length of the sequence
    as second argument."""
    def __init__( self, data, cyclen = NA ):
        if cyclen is NA:
            cyclen = len( data )
        self.cyclen = cyclen
        #   A call to str.__init__( self, data ) is not needed here: str is
        #   immutable, so its value has already been set by __new__().
    def __new__( cls, data, cyclen = NA ):
        return str.__new__( cls, data )
    def __iter__( self ):
        return cycle( str( self ), self.cyclen )
if __name__ == '__main__':
    def show( instance ):
        """This would be a simple for-c-in-instance loop most of the time, but we want to catch
        infinitely long cyclic things here."""
        print '-' * 25
        print '~.cyclen == %s' % instance.cyclen
        it = iter( instance )
        maxcount = len( instance ) * 500
        count = 0
        while 1:
            try:
                assert count < maxcount
                print it.next(),
                count += 1
            except StopIteration:
                break
            except AssertionError:
                print '... and so on and on...'
                break
        print
    show( cycStr( 'ab|',    5 ) )
    show( cycStr( 'ab|',   -5 ) )
    show( cycStr( 'ab|'       ) )
    show( cycStr( 'ab|', None ) )
    s = cycStr( 'foo', None )