Tagged: Python

  • trung 11:14 pm on March 17, 2011
    Tags: Concurrency, Python   

    Quick concurrent programming in Python 

    Download: github

    Even though multithreading in Python is quite inefficient for CPU-bound work because of the GIL, it is helpful in some cases, such as scraping data. While writing some scrapers for my PhD work, I created a lightweight module that lets you quickly implement the parallel worker pattern. The design is sketched as follows:

    [Figure: a Distributor containing four Workers, an input queue, an output queue, and a lock.]

    A distributor has many workers (threads), one input queue and one output queue. It also has a lock which the workers can use for synchronization.
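The design described above can be sketched in a few lines. This is a minimal Python 3 illustration, not the module's actual code: the class name `MiniDistributor`, its attributes, and the use of `Queue.join()` to detect an empty backlog are all assumptions made for the example. Tasks here are plain functions rather than Task subclasses.

```python
import queue
import threading


class MiniDistributor:
    """Hypothetical stand-in for the module's distributor, not its real code."""

    def __init__(self, num_workers):
        self.inputs = queue.Queue()    # tasks waiting to be run
        self.outputs = queue.Queue()   # results produced by the workers
        self.lock = threading.Lock()   # shared lock workers could use to synchronize
        for _ in range(num_workers):
            # Daemon threads do not keep the interpreter alive at exit.
            threading.Thread(target=self._work, daemon=True).start()

    def add_task(self, func, inp):
        self.inputs.put((func, inp))

    def _work(self):
        while True:
            func, inp = self.inputs.get()
            try:
                self.outputs.put(func(inp))
            finally:
                # Mark the task finished so run() can tell when the backlog is empty.
                self.inputs.task_done()

    def run(self):
        # Block until every queued task has been processed.
        self.inputs.join()
        results = []
        while not self.outputs.empty():
            results.append(self.outputs.get())
        return results


distributor = MiniDistributor(4)
for i in range(10):
    distributor.add_task(lambda x: x ** 2, i)
squares = sorted(distributor.run())
print(squares)
```

Results arrive in whatever order the workers finish, which is why the example sorts them before printing.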

    The following command initializes a distributor with 20 workers:

    distributor = ThreadDistributor(20)

    To add a task to the distributor, use distributor.add_task(task_type, input), where task_type is a class that inherits from the Task class. You must implement the run method of this class before the task can run. Note that run must be a generator, so it has to yield something. Make it yield None if you don’t want it to return any value to the output queue.
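To illustrate the generator requirement, here is a small sketch of how a worker might drain a task's run method. The `Task` base class, the two example tasks, and the `collect` helper are hypothetical and written only for this illustration; the real module's Task may look different.

```python
class Task(object):
    """Hypothetical base class; the real module's Task may differ."""

    def __init__(self, inp=None):
        self.inp = inp

    def run(self):
        raise NotImplementedError("subclasses must implement run()")


class SplitTask(Task):
    def run(self):
        # A generator may yield several results for a single input.
        for word in self.inp.split():
            yield word


class SilentTask(Task):
    def run(self):
        # Yielding None signals "nothing to report".
        yield None


def collect(task):
    """Drain a task's generator, skipping None the way a worker might."""
    return [value for value in task.run() if value is not None]


print(collect(SplitTask("a b c")))
print(collect(SilentTask()))
```

Because run is a generator, one task can produce zero, one, or many results without the worker needing to know which in advance.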

    The following lines define a squaring task class:

    class ComputationTask(Task):
        def run(self):
            # Yield the square of this task's input to the output queue.
            yield self.inp ** 2

    All you need to do is add tasks to the distributor. It automatically distributes them to its workers and, when no tasks remain, automatically stops all workers.

    This is a simple example of computing squares of 1,000,000 integers:

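    from ThreadDistributor import ThreadDistributor, Task


    class ComputationTask(Task):
        def run(self):
            # Yield the square of this task's input to the output queue.
            yield self.inp ** 2


    class InitializeTask(Task):
        def run(self):
            # Queue one squaring task per integer (xrange is Python 2; use range on Python 3).
            for i in xrange(1000000):
                self.distributor.add_task(ComputationTask, i)
            # This task itself has nothing to report.
            yield None


    if __name__ == "__main__":
        distributor = ThreadDistributor(20)
        distributor.add_task(InitializeTask)
        distributor.run()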

    Other functions:

    stop(self): stops all workers by putting a STOP signal at the end of the work queue. Threads therefore stop only after every task queued before the STOP signal is done.

    results(self): yields results from the output queue.

    add_worker(self, name=None): adds one more worker to the ThreadDistributor instance; name is the name of the new worker.
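The shutdown behaviour described for stop() can be sketched with a sentinel object placed at the end of a FIFO queue. This is a Python 3 illustration under assumptions: the `STOP` object, the `worker` function, and the choice to re-queue the signal so every worker sees it are mine, and the module may implement this differently.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel; the module's actual signal may differ


def worker(work_queue, done):
    while True:
        item = work_queue.get()
        if item is STOP:
            # Re-queue the signal so the other workers also see it.
            work_queue.put(STOP)
            break
        done.append(item)  # list.append is thread-safe in CPython


work_queue = queue.Queue()
done = []
for i in range(5):
    work_queue.put(i)
work_queue.put(STOP)  # everything queued before this is still processed

threads = [threading.Thread(target=worker, args=(work_queue, done)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))
```

Because the queue is first-in, first-out, no worker can receive the sentinel until all earlier items have been taken.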

     
  • trung 11:20 am on March 17, 2011
    Tags: Functional Programming, Python   

    Functional programming in Python cheatsheet 

    Apply a function to a value:

    >>> (lambda x: x*2)(3)
    6

    Conditional (if/else) expression in Python:

    >>> (lambda x: "even" if x%2==0 else "odd")(5)
    'odd'

    or, with the older and/or idiom (which picks the wrong branch when the first result is falsy):

    >>> (lambda x: x%2==0 and "even" or "odd")(5)
    'odd'
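The two forms above are not always interchangeable. A short example, with hypothetical names `label_andor` and `label_ifelse`, shows the and/or idiom returning the wrong value when the "true" result is an empty string:

```python
# The and/or idiom falls through to the second value when the first one is falsy.
label_andor = lambda x: x % 2 == 0 and "" or "odd"
label_ifelse = lambda x: "" if x % 2 == 0 else "odd"

print(repr(label_andor(4)))   # 'odd', even though 4 is even
print(repr(label_ifelse(4)))  # ''
```

For that reason the if/else expression is usually the safer choice.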

    Map a function to a list:

    >>> map(lambda x: x*2, [1, 2, 3])
    [2, 4, 6]
    >>> [x*2 for x in [1, 2, 3]]
    [2, 4, 6]

    Filter a list:

    >>> filter(lambda x: x % 2 == 1, [1, 2, 3])
    [1, 3]
    >>> [x for x in [1, 2, 3] if x % 2 == 1]
    [1, 3]

    Reduce a list:

    >>> reduce(lambda x, y: x + y, [1, 2, 3])
    6
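The map, filter, and reduce sessions above show Python 2 behaviour. On Python 3, map and filter return lazy iterators and reduce lives in functools, so an equivalent sketch would look like this (the variable names are only for illustration):

```python
from functools import reduce  # reduce is no longer a builtin on Python 3

# Wrap map and filter in list() to materialize their lazy results.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
odds = list(filter(lambda x: x % 2 == 1, [1, 2, 3]))
total = reduce(lambda x, y: x + y, [1, 2, 3])

print(doubled, odds, total)
```

The list comprehension forms behave the same way on both versions.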

    Find the maximum value in a list using a key function:

    >>> max([1, 2, 3], key=lambda x: x % 3)
    2

    Sort a list using a key function:

    >>> sorted([1, 2, 3], key=lambda x: x % 3)
    [3, 1, 2]
     