[gt-users] FASTA stream input implemented?
David Ellinghaus
d.ellinghaus at ikmb.uni-kiel.de
Tue Feb 17 08:47:16 CET 2009
Dear gt-developers,
It has been a long time since I last wrote any code using the genometools,
and I am really impressed by what is now available in the libraries for
implementing efficient and clean C programs! So I tried to implement
something with the genometools ... :-)
@Sascha and Gordon:
Are there any functions in the genometools library that I can use for
reading FASTA files
as a stream?
I tried using the "core/bioseq_iterator.c" functions to read FASTA files:
gt_bioseq_iterator_new()
gt_bioseq_iterator_next()
Unfortunately, with these I only managed to read one file after another,
each time mapping the whole file into memory.
I would like to read two FASTA files (both sorted by description) in
parallel as streams, that is, without keeping the whole files in memory.
That way I can reject unpaired FASTA entries on the fly, i.e. entries
whose description does not occur in both files.
(I am working with mate-pair FASTA files from next-generation sequencing,
often bigger than 15 GB per file.)
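To make the idea concrete, here is a rough sketch of the pairing in plain C.
It uses only the C standard library, not the genometools API, and all names
in it (Str, FastaEntry, fasta_next, fasta_pair_join) are hypothetical. It
assumes that both files are sorted by description in strcmp() byte order and
that the two mates of a pair carry exactly the same description.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper types (not genometools): a growable string and
   one FASTA record whose buffers are reused from entry to entry. */
typedef struct { char *s; size_t len, cap; } Str;
typedef struct { Str desc, seq; } FastaEntry;

/* Append one character and keep the buffer NUL-terminated. */
static void str_push(Str *b, int c)
{
  if (b->len + 2 > b->cap) {
    b->cap = b->cap ? 2 * b->cap : 64;
    b->s = realloc(b->s, b->cap);
    if (b->s == NULL) abort();
  }
  b->s[b->len++] = (char) c;
  b->s[b->len] = '\0';
}

/* Reset to the empty string; also makes sure the buffer is allocated. */
static void str_clear(Str *b)
{
  b->len = 0;
  str_push(b, '\0');
  b->len = 0;
}

/* Read the next entry from fp. Returns 1 on success, 0 at end of file.
   Only the current entry is held in memory. */
static int fasta_next(FILE *fp, FastaEntry *e)
{
  int c, bol = 1;
  assert(fp != NULL && e != NULL);
  while ((c = fgetc(fp)) != EOF && c != '>')
    ; /* skip anything before the next header */
  if (c == EOF) return 0;
  str_clear(&e->desc);
  str_clear(&e->seq);
  /* description: the rest of the header line */
  while ((c = fgetc(fp)) != EOF && c != '\n')
    if (c != '\r') str_push(&e->desc, c);
  /* sequence: all lines up to the next header, line breaks dropped */
  while ((c = fgetc(fp)) != EOF) {
    if (bol && c == '>') { ungetc(c, fp); break; }
    bol = (c == '\n');
    if (c != '\n' && c != '\r') str_push(&e->seq, c);
  }
  return 1;
}

/* Walk both sorted streams in lockstep. Entries present in both files are
   passed to emit() (which may be NULL); unpaired entries are skipped.
   Returns the number of pairs found. */
static unsigned long fasta_pair_join(FILE *fa, FILE *fb,
    void (*emit)(const FastaEntry*, const FastaEntry*, void*), void *data)
{
  FastaEntry a = {{NULL, 0, 0}, {NULL, 0, 0}}, b = a;
  unsigned long pairs = 0;
  int has_a = fasta_next(fa, &a), has_b = fasta_next(fb, &b);
  while (has_a && has_b) {
    int cmp = strcmp(a.desc.s, b.desc.s);
    if (cmp == 0) {
      if (emit) emit(&a, &b, data);
      pairs++;
      has_a = fasta_next(fa, &a);
      has_b = fasta_next(fb, &b);
    } else if (cmp < 0) {
      has_a = fasta_next(fa, &a); /* entry in file A has no mate */
    } else {
      has_b = fasta_next(fb, &b); /* entry in file B has no mate */
    }
  }
  free(a.desc.s); free(a.seq.s);
  free(b.desc.s); free(b.seq.s);
  return pairs;
}
```

With two FILE pointers opened via fopen(), a call like
fasta_pair_join(fa, fb, my_callback, NULL) would hand every matching pair to
my_callback; memory use is bounded by the largest single entry rather than
by the file size.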
Is something like a FASTA stream already implemented? If not, is there
an existing stream implementation that I could use as a model for my own
FASTA stream implementation?
Many thanks in advance
Best regards
David
--
David Ellinghaus
Institute for Clinical Molecular Biology
Christian-Albrechts-University Campus Kiel
House 6, Arnold-Heller-Str.3
D-24105 Kiel, Germany
Email: d.ellinghaus at ikmb.uni-kiel.de
Phone: +49-(0)431-597-1963
FAX : +49-(0)431-597-1842