[gt-users] some questions
Giorgio Gonnella
ggonnell at yahoo.it
Sat Aug 8 15:58:37 CEST 2009
> I think accepting multiple sequences would be the better solution. [...]
> you should architect around a set-like object like the GtBioseq.
Thank you for the answer. I decided on the following, which should approximate
the reality of sequencing as far as I know: reads can come from any of the input sequences
(provided they are long enough), and the probability that a read comes from a given sequence
is proportional to the sequence length.
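To make the selection rule above concrete, here is a minimal sketch in Python rather than in the tool's own C code. The function name `pick_read` and its parameters are hypothetical and only illustrate the idea; none of this is GenomeTools API. It assumes the sequence lengths are already known and that sequences shorter than the read length are simply excluded.

```python
import random


def pick_read(seq_lengths, read_length, rng=random):
    """Return (sequence index, start offset) for one simulated read.

    Hypothetical helper: seq_lengths is a list of input sequence lengths.
    """
    # Only sequences that can hold a full read are eligible.
    eligible = [i for i, n in enumerate(seq_lengths) if n >= read_length]
    if not eligible:
        raise ValueError("no input sequence is long enough for a read")
    # Weight each eligible sequence by its length, as described above.
    weights = [seq_lengths[i] for i in eligible]
    idx = rng.choices(eligible, weights=weights, k=1)[0]
    # Choose a uniform start position such that the read fits entirely.
    start = rng.randrange(seq_lengths[idx] - read_length + 1)
    return idx, start
```

Weighting by the full sequence length follows the description literally. Weighting by the number of valid start positions (length - read_length + 1) would be a slightly different model, and which one is closer to real sequencing is left open here.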
> I would not manually concatenate sequences in memory, it severely
> limits the amount of sequence data you can process at once.
OK. That's fine. I have another question regarding this: currently, for each iteration
(e.g. 1000 iterations to simulate 1000 reads), a Bioseq object is instantiated after
one of the input files is selected randomly (as described above). This means,
however, that the file must be read again every time, even if it was already selected
in a previous iteration. Would it be better to keep in memory a collection
of Bioseq objects, one for each input file, and then choose one of them,
or would that force the sequences themselves to stay in memory, i.e. be bad?
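The alternative I have in mind could be sketched as follows, again in Python and not as GenomeTools code. `LazySequenceCache` and the `loader` callable are hypothetical names. Whether the cached objects would keep the sequence data itself in memory depends on how the Bioseq object is implemented, which is exactly the open question.

```python
class LazySequenceCache:
    """Load each input file at most once, on first use."""

    def __init__(self, loader):
        # Hypothetical callable: takes a filename, returns a sequence object.
        self._loader = loader
        self._cache = {}

    def get(self, filename):
        # Only files that are actually selected are ever loaded.
        if filename not in self._cache:
            self._cache[filename] = self._loader(filename)
        return self._cache[filename]
```

With this, the per-iteration cost of re-reading a file is paid only the first time that file is selected, at the price of keeping one object per selected file alive until the simulation ends.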
> Please use the FASTA-format.
Done. Thank you.
> it makes sense to mutate the sequences inside the tool.
OK, then I will implement it, using 'gt mutate' as a template. I have not implemented it yet; I will do it later.
> Of course, you should reuse the code used by `gt mutate` to mutate the
> sequences (and extend it if necessary).
>
> Gordon