[gt-users] gt python with (multi)processing

Sascha Steinbiss steinbiss at zbh.uni-hamburg.de
Wed Jan 20 16:07:17 CET 2010


On 01/16/2010 04:43 PM, Brent Pedersen wrote:
>>> #import multiprocessing as processing
>>> import processing
>>> import gt
>>> p = processing.Pool(4)
>>>
>>> f = gt.FeatureIndexMemory()
>>> f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')
>>> f.add_gff3file('./testdata/encode_known_genes_Mar07.gff3')

Sorry to come back to this, but I cannot see where this script actually
does any parallel work. It looks entirely sequential: the pool is
created, but no tasks are ever distributed to its workers.

To reproduce your problem, I tried the following script, which uses the
pool's parallelized map() function with 4 workers. I ran it on
testdata/encode_known_genes_Mar07.gff3 and on a much larger
D. melanogaster annotation, and got no segfault in either case:

$ cat mp_fi_test.py
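#!/usr/bin/env python
# 'processing' is the standalone predecessor of the multiprocessing module
import processing
import gt
import sys

numthreads = 4  # number of pool workers (these are processes, not threads)
f = gt.FeatureIndexMemory()
p = processing.Pool(numthreads)

# hand the same GFF3 file to every worker via the pool's parallel map()
print p.map(f.add_gff3file, [sys.argv[1] for i in range(0, numthreads)])
print "done."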

$ ./mp_fi_test.py testdata/encode_known_genes_Mar07.gff3
[None, None, None, None]
done.

This seems to work without problems across repeated runs, even with a
GenomeTools version from before the MT patches were added. Very strange.
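
For readers without the old processing package, the same dispatch
pattern can be sketched on Python 3 with the standard-library
multiprocessing module. This is only an illustration under stated
assumptions: count_feature_lines() and run_in_pool() are hypothetical
helpers that stand in for f.add_gff3file and do not touch the gt
bindings, so the sketch says nothing about whether GenomeTools itself
segfaults. It only shows that work reaches the workers through map(),
not through creating the Pool.

```python
#!/usr/bin/env python3
# Sketch only: count_feature_lines() is a hypothetical stand-in for
# FeatureIndexMemory.add_gff3file(); the gt bindings are not used here.
import multiprocessing
import sys

NUM_WORKERS = 4


def count_feature_lines(path):
    """Count non-empty, non-comment lines in a GFF3-style text file."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


def run_in_pool(path, num_workers=NUM_WORKERS):
    """Hand the same file to every worker and collect one result each."""
    with multiprocessing.Pool(num_workers) as pool:
        # map() is what actually distributes tasks to the worker
        # processes; creating the Pool alone only starts idle workers.
        return pool.map(count_feature_lines, [path] * num_workers)


# The guard keeps worker processes from re-running this part on
# platforms that start workers by re-importing the main module.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(run_in_pool(sys.argv[1]))
    print("done.")
```

Run with the annotation file as its only argument, this prints a list
with one count per worker. Each worker is a separate process, so
anything a worker builds stays in that process unless it is returned
through map().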

Sascha

-- 
Sascha Steinbiss
Center for Bioinformatics
University of Hamburg
Bundesstr. 43
20146 Hamburg
Germany

Email:  steinbiss at zbh.uni-hamburg.de
URL:    http://www.zbh.uni-hamburg.de/steinbiss
Phone:  +49 (40) 42838 7322
FAX:    +49 (40) 42838 7312


