[gt-users] gt python with (multi)processing
Sascha Steinbiss
steinbiss at zbh.uni-hamburg.de
Fri Jan 22 18:59:51 CET 2010
On 01/20/2010 06:12 PM, Brent Pedersen wrote:
>> Sorry to come back to this, but I cannot see why this script uses
>> multithreading. It looks very sequential as there are no tasks
>> distributed to the workers in the pool.
> hi, yes it is just a dummy script to demonstrate the problem. that's the
> minimum required to cause problems on my machine.
Hmm. I can't reproduce that. This is what I get with a version before
the threadsafety patches (commit 02345ac73f9b...):
$ cat seq_fi_test.py
#!/usr/bin/env python
import processing
import gt
p = processing.Pool(4)
f = gt.FeatureIndexMemory()
f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')
f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')
print f.get_seqids()
$ ./seq_fi_test.py
['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16',
'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr5', 'chr6',
'chr7', 'chr8', 'chr9', 'chrX']
This works reliably and repeatedly with several input files.
> regardless, i'll wait as the new threadsafe stuff
> progresses.
There is some news on this front. The following examples all use my
thread-safe FeatureIndex class.
- Firstly, I am afraid that the Python processing package is not the
right tool to test the thread-safety of GenomeTools, since it does not
wrap threads, but rather full-fledged processes
(http://pypi.python.org/pypi/processing).
This means that sharing the same FeatureIndex object transparently across
multiple pool workers will not work as intended: each worker process
modifies its own copy of the object, so nothing arrives in the parent's
index, as I had to find out:
$ cat mp_fi_test.py
#!/usr/bin/env python
import processing
import gt
import sys
numthreads = 4
f = gt.FeatureIndexMemory()
p = processing.Pool(numthreads)
print p.map(f.add_gff3file, [sys.argv[1] for i in range(numthreads)])
print f.get_seqids()
$ ./mp_fi_test.py testdata/encode_known_genes_Mar07.gff3
[None, None, None, None]
[]
- Secondly, I used the following script to test whether threading works
from the Python bindings:
$ cat mp_fi_test2.py
#!/usr/bin/env python
import threading
import gt
import sys
f = gt.FeatureIndexMemory()

def get_nof_features_in_index(index):
    return \
        reduce(lambda acc, id: acc + len(index.get_features_for_seqid(id)),
               index.get_seqids(), 0)

class TestThread(threading.Thread):
    def __init__(self, index, file, number):
        threading.Thread.__init__(self)
        self.fi = index
        self.file = file
        self.number = number

    def run(self):
        self.fi.add_gff3file(self.file)
        print ("%d finished, index now has %d features " + \
               "in %d sequences") % \
              (self.number,
               get_nof_features_in_index(self.fi),
               len(self.fi.get_seqids()))

threads = []
for i in range(4):
    t = TestThread(f, sys.argv[1], i)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()
print f.get_seqids()
print "%d features in index altogether" % get_nof_features_in_index(f)
$ ./mp_fi_test2.py testdata/encode_known_genes_Mar07.gff3
2 finished, index now has 2991 features in 20 sequences
1 finished, index now has 7790 features in 20 sequences
3 finished, index now has 10624 features in 20 sequences
0 finished, index now has 11964 features in 20 sequences
['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16',
'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr5', 'chr6',
'chr7', 'chr8', 'chr9', 'chrX']
11964 features in index altogether
which is correct (11964 = 4 x 2991, as expected when the same file is
added four times) and now also works reliably and well with the new C
patches (in my 'mt-featureindex' branch on github).
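
Regarding the first point: if the Pool-style map() interface is wanted but
with real threads instead of processes, the standard library's
multiprocessing.pool.ThreadPool may be an option. The following is only a
minimal sketch under assumptions: FakeIndex and its add_file() method are
hypothetical stand-ins (not part of gt), used so that the example runs
without GenomeTools installed. It only illustrates that a thread-backed pool
operates on the one shared object.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeIndex(object):
    """Hypothetical stand-in for a thread-safe feature index (not part of gt)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = []

    def add_file(self, filename):
        # Serialize writers so concurrent calls cannot corrupt the list.
        with self._lock:
            self._files.append(filename)

    def count(self):
        with self._lock:
            return len(self._files)


def fill_with_thread_pool(index, filename, numthreads=4):
    """Call index.add_file(filename) numthreads times via a thread pool."""
    pool = ThreadPool(numthreads)
    try:
        # Threads share the caller's memory, so 'index' itself is modified.
        results = pool.map(index.add_file, [filename] * numthreads)
    finally:
        pool.close()
        pool.join()
    return results


index = FakeIndex()
results = fill_with_thread_pool(index, "example.gff3")
print(results)        # [None, None, None, None]
print(index.count())  # 4
```

Unlike the process pool in mp_fi_test.py, the shared object is not empty
afterwards, because no copies are made. Whether this is safe with a real
gt index depends entirely on the thread-safety of the underlying C code.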
I will complete the unit tests (which are quite tedious to write in
order not to miss anything) and then push a new version into this branch
(and announce it here too).
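
The check made by hand above (the final feature count equals the per-file
count times the number of threads) could be automated roughly as follows.
This is a sketch only, not the actual GenomeTools unit tests: FakeIndex is a
hypothetical lock-protected stand-in whose get_seqids() and
get_features_for_seqid() merely mimic the method names used in
mp_fi_test2.py, and add_records() is an invented helper that replaces
add_gff3file().

```python
import threading


class FakeIndex(object):
    """Hypothetical lock-protected index; stands in for a real gt index."""

    def __init__(self):
        self._lock = threading.Lock()
        self._features = {}  # seqid -> list of features

    def add_records(self, records):
        # Invented helper: add (seqid, feature) pairs one at a time.
        for seqid, feature in records:
            with self._lock:
                self._features.setdefault(seqid, []).append(feature)

    def get_seqids(self):
        with self._lock:
            return sorted(self._features)

    def get_features_for_seqid(self, seqid):
        with self._lock:
            return list(self._features[seqid])


def count_features(index):
    """Total number of features over all sequence ids."""
    return sum(len(index.get_features_for_seqid(s))
               for s in index.get_seqids())


def run_concurrent_adds(index, records, nthreads):
    """Start nthreads threads that each add all records, then wait for them."""
    threads = [threading.Thread(target=index.add_records, args=(records,))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


records = [("chr1", "gene_a"), ("chr1", "gene_b"), ("chrX", "gene_c")]
index = FakeIndex()
run_concurrent_adds(index, records, nthreads=4)
print(index.get_seqids())     # ['chr1', 'chrX']
print(count_features(index))  # 12
```

A passing run of such a test does not prove thread-safety, since races may
only show up occasionally; repeating it many times with larger inputs makes
a failure more likely to surface.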
> also, i notice in recent commits, you use the @function.setter
> decorator for the
> range module. that's cool syntax i hadn't used, but it is only
> available in >=python 2.6
> i'm attaching a patch that works with (at least) 2.4 and 2.5
> as well.
Thank you very much. It's now in the master.
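
For readers unfamiliar with the issue: the setter decorator syntax
(@name.setter) was added in Python 2.6, while the plain property() call
works in older versions too. The following sketch is not the patch that was
applied; the Interval class and its attributes are hypothetical and only
show the general pattern.

```python
class Interval(object):
    """Hypothetical class, for illustration only (not the gt range module)."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    def _get_start(self):
        return self._start

    def _set_start(self, value):
        if value > self._end:
            raise ValueError("start must not exceed end")
        self._start = value

    # Same effect as @property plus @start.setter, but without the
    # decorator syntax that requires Python 2.6 or later.
    start = property(_get_start, _set_start)


iv = Interval(10, 20)
iv.start = 15
print(iv.start)  # 15
```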
> -brent
Sascha
--
Sascha Steinbiss
Center for Bioinformatics
University of Hamburg
Bundesstr. 43
20146 Hamburg
Germany
Email: steinbiss at zbh.uni-hamburg.de
URL: http://www.zbh.uni-hamburg.de/steinbiss
Phone: +49 (40) 42838 7322
FAX: +49 (40) 42838 7312