[gt-users] FASTA stream input implemented?

David Ellinghaus d.ellinghaus at ikmb.uni-kiel.de
Tue Feb 17 08:47:16 CET 2009


Dear gt-developers,

It has been a long time since I last wrote any code using the genometools,
and I am really impressed by what is now available in the libraries for
implementing efficient and clean C programs! So I tried to implement
something with the genometools ... :-)

@Sascha and Gordon:
Are there any functions in the genometools library that I can use for
reading FASTA files as a stream?
I tried using the functions in "core/bioseq_iterator.c" to read FASTA files:
gt_bioseq_iterator_new()
gt_bioseq_iterator_next()
Unfortunately, with these I only managed to read one file after another,
with each file mapped into memory as a whole.
I would like to read two FASTA files (both sorted by description) in
parallel as streams, that is, without keeping the whole files in memory,
so that I can reject single FASTA entries on the fly whenever the two
files do not both contain an entry with the same description.
(I am working with mate-pair FASTA files from next-generation sequencing,
which are often larger than 15 GB per file.)
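
The parallel read described above can be sketched as a merge over two
record streams. What follows is only a minimal illustration in plain C,
not genometools code: FastaStream, fasta_open(), fasta_next(),
fasta_close() and filter_pairs() are hypothetical names, getline() and
strdup() are POSIX, and the sketch assumes that both files are sorted by
description in strcmp() order and that mates carry identical descriptions.
Only one record per file is held in memory at any time.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical one-record-at-a-time reader; not a genometools type. */
typedef struct {
  FILE *fp;
  char *line;       /* getline() buffer, may hold a look-ahead header */
  size_t line_cap;
  int have_header;  /* 1 if `line` holds an unconsumed '>' line */
  char *desc;       /* description of the current record, without '>' */
  char *seq;        /* concatenated sequence of the current record */
  size_t seq_len, seq_cap;
} FastaStream;

/* Remove trailing newline / carriage return, return the new length. */
size_t chomp(char *s, ssize_t n) {
  while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r')) s[--n] = '\0';
  return (size_t) n;
}

void fasta_open(FastaStream *fs, FILE *fp) {
  memset(fs, 0, sizeof *fs);
  fs->fp = fp;
  fs->seq_cap = 64;
  fs->seq = malloc(fs->seq_cap);
  assert(fs->seq != NULL);
  fs->seq[0] = '\0';
}

void fasta_close(FastaStream *fs) {
  free(fs->line); free(fs->desc); free(fs->seq);
}

/* Read the next record into fs->desc / fs->seq; return 0 at end of file. */
int fasta_next(FastaStream *fs) {
  ssize_t n;
  while (!fs->have_header) { /* skip ahead to the next header line */
    if ((n = getline(&fs->line, &fs->line_cap, fs->fp)) < 0) return 0;
    if (fs->line[0] == '>') { chomp(fs->line, n); fs->have_header = 1; }
  }
  free(fs->desc);
  fs->desc = strdup(fs->line + 1);
  assert(fs->desc != NULL);
  fs->have_header = 0;
  fs->seq_len = 0;
  fs->seq[0] = '\0';
  while ((n = getline(&fs->line, &fs->line_cap, fs->fp)) >= 0) {
    size_t len;
    if (fs->line[0] == '>') { /* header of the following record */
      chomp(fs->line, n);
      fs->have_header = 1;
      break;
    }
    len = chomp(fs->line, n);
    if (fs->seq_len + len + 1 > fs->seq_cap) { /* grow sequence buffer */
      fs->seq_cap = 2 * (fs->seq_len + len + 1);
      fs->seq = realloc(fs->seq, fs->seq_cap);
      assert(fs->seq != NULL);
    }
    memcpy(fs->seq + fs->seq_len, fs->line, len + 1); /* copies the NUL */
    fs->seq_len += len;
  }
  return 1;
}

/* Write only those records whose description occurs in both inputs.
   Returns the number of pairs written. */
unsigned long filter_pairs(FILE *in1, FILE *in2, FILE *out1, FILE *out2) {
  FastaStream a, b;
  unsigned long pairs = 0;
  int ok_a, ok_b;
  fasta_open(&a, in1);
  fasta_open(&b, in2);
  ok_a = fasta_next(&a);
  ok_b = fasta_next(&b);
  while (ok_a && ok_b) {
    int cmp = strcmp(a.desc, b.desc);
    if (cmp < 0) {
      ok_a = fasta_next(&a); /* no mate in file 2: reject entry */
    } else if (cmp > 0) {
      ok_b = fasta_next(&b); /* no mate in file 1: reject entry */
    } else {
      fprintf(out1, ">%s\n%s\n", a.desc, a.seq);
      fprintf(out2, ">%s\n%s\n", b.desc, b.seq);
      pairs++;
      ok_a = fasta_next(&a);
      ok_b = fasta_next(&b);
    }
  }
  fasta_close(&a);
  fasta_close(&b);
  return pairs;
}
```

A caller would open the two input and two output files with fopen() and
pass them to filter_pairs(). Entries left over in one file after the other
has reached its end have no mate and are dropped as well. If the files are
sorted in some other order than strcmp() order, the comparison has to be
replaced accordingly.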

Is something like a FASTA stream already implemented? If not, is there an
existing stream implementation that I can use as a guide for my own FASTA
stream implementation?

Many thanks in advance
Best regards

David

-- 
David Ellinghaus
Institute for Clinical Molecular Biology
Christian-Albrechts-University Campus Kiel
House 6, Arnold-Heller-Str.3
D-24105 Kiel, Germany

Email:  d.ellinghaus at ikmb.uni-kiel.de
Phone:  +49-(0)431-597-1963
FAX  :  +49-(0)431-597-1842


