[gt-users] gt python with (multi)processing
Sascha Steinbiss
steinbiss at zbh.uni-hamburg.de
Wed Jan 20 16:07:17 CET 2010
On 01/16/2010 04:43 PM, Brent Pedersen wrote:
>>> #import multiprocessing as processing
>>> import processing
>>> import gt
>>> p = processing.Pool(4)
>>>
>>> f = gt.FeatureIndexMemory()
>>> f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')
>>> f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')
Sorry to come back to this, but I cannot see where this script actually
does anything in parallel. It looks entirely sequential: the pool is
created, but no tasks are ever distributed to its workers.
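For comparison, here is a minimal sketch of what handing tasks to the pool
looks like. It is only an illustration under a few assumptions: it uses the
standard library multiprocessing module on Python 3 instead of processing,
it requests the "fork" start method (so it assumes a POSIX system), and
record_pid and run_in_pool are hypothetical names that stand in for real
work. No gt calls are involved.

```python
import multiprocessing
import os


def record_pid(path):
    """Hypothetical stand-in for per-file work: report who handled the path."""
    return (path, os.getpid())


def run_in_pool(paths, workers=4):
    """Distribute one task per path to a pool of worker processes."""
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(workers) as pool:
        # map() is the call that actually sends tasks to the workers;
        # creating the pool alone does not run anything in parallel.
        return pool.map(record_pid, paths)
```

Calling run_in_pool(["a.gff3", "b.gff3"]) returns one (path, pid) pair per
input, in input order, and each pid belongs to a worker rather than to the
calling process. Without a call such as map() on the pool, the workers simply
sit idle.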
To reproduce your problem, I ran the following script, which uses the
pool's parallelized map() function with 4 workers, on both
testdata/encode_known_genes_Mar07.gff3 and a much larger D. melanogaster
annotation. Neither run produced a segfault:
$ cat mp_fi_test.py
#!/usr/bin/env python
import sys

import processing
import gt

numthreads = 4  # number of worker processes in the pool

f = gt.FeatureIndexMemory()
p = processing.Pool(numthreads)

# Submit numthreads tasks, each calling f.add_gff3file on the file in argv[1].
print p.map(f.add_gff3file, [sys.argv[1]] * numthreads)
print "done."
$ ./mp_fi_test.py testdata/encode_known_genes_Mar07.gff3
[None, None, None, None]
done.
This seems to work without problems across repeated runs, even with a
GenomeTools version from before the MT patches were added. Very strange.
Sascha
--
Sascha Steinbiss
Center for Bioinformatics
University of Hamburg
Bundesstr. 43
20146 Hamburg
Germany
Email: steinbiss at zbh.uni-hamburg.de
URL: http://www.zbh.uni-hamburg.de/steinbiss
Phone: +49 (40) 42838 7322
FAX: +49 (40) 42838 7312