[gt-users] gff3 parser

Gordon Gremme gremme at gmail.com
Tue Feb 17 17:07:10 CET 2009


>> bah. i was trying to avoid becoming a c programmer.

It doesn't hurt to know some C; all major scripting languages are
written in it ;-)


>>>> should i just change the tests or do you want to keep the original behavior when
>>>> the retainids is not used?
>>> Please keep the original behaviour.
>> ok. all tests pass with this patch:
>> http://gist.github.com/62770/

Cool!


> We should maybe generally discuss whether ID renaming may be
> counterproductive in the case of multi-line features, e.g.
>
> $ bin/gt gff3 -retainids testdata/multi_feature_simple.gff3
> ##gff-version   3
> ##sequence-region   ctg123 1 1497228
> warning: feature ID "CDS1" not unique: changing to CDS1.1
> ctg123  .       gene    1000    9000    .       +       .       ID=gene1
> ctg123  .       CDS     1201    1500    .       +       0       ID=CDS1;Parent=gene1
> ctg123  .       CDS     3000    3902    .       +       0       ID=CDS1.1;Parent=gene1
>
> breaks the "same-ID" rule for the multi-line features. I am not sure
> whether this may lead to problems or not...
> Any comments?

Good point, Sascha; that is a bug in the new -retainids functionality.
Multi-features have to be handled explicitly (similar to the
non-retainids case) to avoid this problem.
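Here is a rough Python sketch of what "handled explicitly" could mean. It is not the actual GenomeTools code (that is C). `Feature` and `retain_ids` are hypothetical names, and the sketch assumes that lines sharing an ID, seqid and type are parts of one multi-feature. Those parts keep the identical ID, and only a genuine clash gets a ".N" suffix. Parent attributes pointing at a renamed feature are not rewritten here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Feature:
    """Hypothetical, simplified stand-in for one GFF3 feature line."""
    seqid: str
    ftype: str
    id: Optional[str] = None


def retain_ids(features: List[Feature]) -> List[str]:
    """Keep original IDs where possible; return the warnings issued.

    Assumption: same ID + same seqid + same type == parts of one
    multi-feature, so all parts share one output ID. Any other reuse
    of an ID is a clash and is renamed to ID.1, ID.2, ...
    """
    taken: Set[str] = set()                       # output IDs already used
    assigned: Dict[Tuple[str, str, str], str] = {}  # (orig ID, seqid, type) -> output ID
    warnings: List[str] = []
    for f in features:
        if f.id is None:
            continue
        key = (f.id, f.seqid, f.ftype)
        if key in assigned:
            # Another part of a multi-feature seen earlier: reuse its ID.
            f.id = assigned[key]
            continue
        new_id, suffix = f.id, 1
        while new_id in taken:
            new_id = f"{f.id}.{suffix}"
            suffix += 1
        if new_id != f.id:
            warnings.append(f'feature ID "{f.id}" not unique: changing to {new_id}')
        taken.add(new_id)
        assigned[key] = new_id
        f.id = new_id
    return warnings
```

With the multi_feature_simple.gff3 case (gene1 plus two CDS1 lines on ctg123), both CDS lines keep ID=CDS1 and no warning is issued, while two gene1 features on different sequences would still be renamed to gene1 and gene1.1.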

I think after fixing this and adding two additional -retainids tests (one
for ``normal'' features and one for multi-features) we are ready for
prime time!

Gordon

