[gt-users] some questions

Giorgio Gonnella ggonnell at yahoo.it
Wed Jul 29 14:31:25 CEST 2009


Thank you for the quick answer. I am now writing a tool, which I named "gt simreads", that simulates a set of sequencing reads for a given sequence. A combination of existing tools (shredder, filter, mutate) could probably already do almost the same thing, but chaining them is quite complicated, so I think a dedicated tool could be useful.

I plan to make the tool work as follows:
- input: a sequence
- output: newline separated list of fragments (the simulated reads)
- options: fragment length (-length) or length range (-minlength/-maxlength), number of fragments (-num)
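
To make the intended behaviour concrete, here is a minimal sketch in Python (not genometools code). The function name simulate_reads and its parameters are hypothetical, and it assumes that fragment lengths and start positions are drawn uniformly at random, which the description above does not specify.

```python
import random

def simulate_reads(sequence, num, min_length, max_length, seed=None):
    """Return `num` fragments of `sequence` (hypothetical helper).

    Assumption: length and start position are both sampled uniformly.
    A fixed fragment length is the special case min_length == max_length.
    """
    if not 1 <= min_length <= max_length <= len(sequence):
        raise ValueError("need 1 <= min_length <= max_length <= len(sequence)")
    rng = random.Random(seed)
    reads = []
    for _ in range(num):
        length = rng.randint(min_length, max_length)     # inclusive bounds
        start = rng.randint(0, len(sequence) - length)   # fragment must fit
        reads.append(sequence[start:start + length])
    return reads

if __name__ == "__main__":
    # Output format from the list above: one fragment per line.
    for read in simulate_reads("ACGTACGTTAGCCGATAGCTAGGCTA", num=5,
                               min_length=4, max_length=8, seed=1):
        print(read)
```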

And here are my questions, regarding conventions in gt:
- Do all tools that accept a sequence as input actually accept multiple sequences? At the moment I use GtBioSeqIterator, but I don't know what the expected behaviour of a genometools tool is when the input contains multiple sequences (concatenate the input sequences before starting? return "-num" fragments for each input sequence?). Or would it be better to accept only a single sequence as input?
- Is the output format described above acceptable for a genometools tool?
- I was thinking of implementing an error rate option. Is it OK to mutate the sequences inside the tool, or is it better to chain the tool with "gt mutate"?
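
To illustrate the alternatives I am asking about, here is a rough Python sketch (again not genometools code). All names are hypothetical, and the substitution-only error model is an assumption made for illustration; it is not meant to describe what "gt mutate" does.

```python
import random

ALPHABET = "ACGT"  # assumption: plain DNA alphabet

def sample_read(sequence, length, rng):
    """Cut one fragment of the given length at a uniformly random position."""
    if length > len(sequence):
        raise ValueError("fragment longer than sequence")
    start = rng.randint(0, len(sequence) - length)
    return sequence[start:start + length]

def reads_per_sequence(sequences, num, length, rng):
    """Alternative A: return `num` reads for each input sequence."""
    return [sample_read(s, length, rng) for s in sequences for _ in range(num)]

def reads_from_concatenation(sequences, num, length, rng):
    """Alternative B: concatenate first, then return `num` reads in total.

    Note that a read can then span the boundary between two inputs.
    """
    joined = "".join(sequences)
    return [sample_read(joined, length, rng) for _ in range(num)]

def mutate(read, error_rate, rng):
    """Replace each base by a different one with probability `error_rate`."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(7)
    seqs = ["ACGTACGTAC", "TTGGCCAATT"]
    for read in reads_per_sequence(seqs, num=2, length=5, rng=rng):
        # Mutation as a separate step, applied to each finished read.
        print(mutate(read, error_rate=0.1, rng=rng))
```

Because mutate is a separate step that takes a finished read, the same function could either be called inside the tool or live in a second tool that is chained after it.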

Giorgio



      

