[gt-users] gt python with (multi)processing
Gordon Gremme
gremme at gmail.com
Wed Jan 20 14:03:56 CET 2010
>> libgenometools is not thread-safe yet.
>
> However, this raises the question of how we deal with that. I understand
> this is not something very urgent, but in the long run could it be
> rewarding to have a thread-safe GenomeTools version?
Yes, definitely ;-)
> For the scripting language bindings, one could handle such issues using
> the calling language's synchronization support. However, when a lot of
> 'action' happens on the C side, using big locks on the high-level caller
> side can also result in low performance gain. It may also not always
> work reliably (e.g. in the Java/JNA bindings).
I agree that this has to be solved on the C side.
> Any ideas or opinions? Could making the C library thread-safe be worth
> the effort (and how much would that be)? And does anyone have experience
> with that?
I started making libgenometools multithreading-safe (MT-safe), thanks
for pushing the issue ;-)
The following pointers should make it easy to make the FeatureIndex MT-safe.
I haven't looked into the details, but it would probably be best to
use the mutexes (or read/write-locks) at the interface level, so that
the locking does not have to be reimplemented for every FeatureIndex
implementation.
- I added abstractions for threads, mutexes and read/write-locks to
core/thread.[ch]. It is all based on pthreads. This isn't complete, as
I added stuff on an as-needed basis.
- If I didn't miss something, everything with global state in
libgenometools should be MT-safe now (the file allocator, symbol and
random number generator modules). The memory allocator is MT-safe as
long as the memory bookkeeping is not used. Making the memory
bookkeeping MT-safe is not possible without a major redesign, which I
consider low-priority.
- Testing is even more important for multithreaded code! Twice I
thought the memory allocator with bookkeeping was MT-safe before I
realized that this is not possible with the current design.
- As a testing strategy, I added a -j development option to the gt
binary which can be used in the unit tests to start several threads
that stress test the corresponding data structure in parallel
(gt_multithread() starts as many threads as given with -j, see
core/symbol.c for an example).
- If possible, store the mutexes or read/write-locks in the data
structure itself and not globally, as in the file allocator module.
- The valgrind tool helgrind is your friend (valgrind
--tool=helgrind)! Without it I would still be wondering why the memory
allocator with bookkeeping isn't MT-safe. The valgrind tool DRD might
be useful, too.
@Sascha: Could you look into making the FeatureIndex MT-safe? It
shouldn't be much work at this point as the basic infrastructure is
already there. If anything of the above is unclear or questions arise,
please ask!
Gordon