[gt-users] gff3 parser

Brent Pedersen bpederse at gmail.com
Mon Feb 2 22:50:55 CET 2009


On Fri, Jan 30, 2009 at 5:08 AM, Gordon Gremme <gremme at gmail.com> wrote:
>>> I agree that it would be more intuitive if the IDs would be retained
>>> as much as possible.
>>> Just created a ticket for it, but I probably won't have time to
>>> implement it myself.
>>>
>>> If someone wants to tackle it, just let me know and I'll give the
>>> corresponding pointers on where to look in the code. It's a good task
>>> to get started with the codebase.
>>
>> i'll give it a try then, if you show me where to look.
>
> Ok, cool!
>
> All the action happens in the GtGFF3Visitor (gff3_visitor.[ch]) which
> is employed by the GtGFF3OutStream to show all GenomeNodes flowing
> through it.
>
> The GtFeatureNodes are processed by the method
> gff3_visitor_feature_node(). It is called once for each top-level
> feature which has all children attached to it. The parent-child
> relationship is stored explicitly, but the original ID attribute is
> stored in the attributes.
> Therefore, you can still get it with gt_feature_node_get_attribute(fn, "ID").
>
> There is a special case you have to consider, the so-called multi-features.
> These are features which span multiple lines, but have the same ID.
> Each such multi-feature has a 'representative' which can be quite useful.
>
> To store IDs which have already been used, a GtCstrTable could be
> useful. To handle ID clashes, a naming scheme has to be introduced.
> Something like: If an ID was already used, append .2 (if that was also
> used, .3 instead and so forth). To check whether an ID ends with a
> number according to the chosen naming scheme, gt_grep() might be
> helpful.
>
> It would be great if the old behaviour were still available via an
> option to the GtGFF3OutStream (analogous to
> gt_gff3_out_stream_set_fasta_width()).
>
> If you encounter problems, please ask.
>
> Gordon

ok. i guessed and checked my way to a start. the diff is here:
http://gist.github.com/57121
there are a couple of Q:s in there that i'm not sure of.

i didn't keep the old behavior, but can probably figure that out. i just
want to get an idea of whether this is the direction you had in mind.
i'm not sure it is, because i didn't do any of the stuff you mention in
the 2nd to last paragraph above.
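
for reference, here is a minimal sketch of the ID naming scheme described in
the quoted text (keep the original ID if it is unused, otherwise append .2,
.3 and so forth). it is plain C and does not use the GenomeTools API: the
UsedIds struct is a hypothetical stand-in for a GtCstrTable, and
make_unique_id() and the used_ids_*() helpers are names made up for
illustration, not functions from the codebase.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a table of already used IDs
   (the quoted mail suggests a GtCstrTable for this). */
typedef struct {
  char **ids;
  size_t count, capacity;
} UsedIds;

/* Linear scan; returns 1 if id was already recorded, 0 otherwise. */
static int used_ids_contains(const UsedIds *used, const char *id)
{
  size_t i;
  for (i = 0; i < used->count; i++) {
    if (strcmp(used->ids[i], id) == 0)
      return 1;
  }
  return 0;
}

/* Stores a private copy of id, growing the array when it is full. */
static void used_ids_add(UsedIds *used, const char *id)
{
  char *copy;
  if (used->count == used->capacity) {
    size_t new_capacity = used->capacity ? 2 * used->capacity : 8;
    char **tmp = realloc(used->ids, new_capacity * sizeof *tmp);
    if (!tmp)
      abort();
    used->ids = tmp;
    used->capacity = new_capacity;
  }
  copy = malloc(strlen(id) + 1);
  if (!copy)
    abort();
  strcpy(copy, id);
  used->ids[used->count++] = copy;
}

static void used_ids_free(UsedIds *used)
{
  size_t i;
  for (i = 0; i < used->count; i++)
    free(used->ids[i]);
  free(used->ids);
  used->ids = NULL;
  used->count = used->capacity = 0;
}

/* Writes a unique ID into buf: id itself if it is unused, otherwise
   id.2, id.3, ... The chosen ID is recorded as used.
   Returns buf, or NULL if the result does not fit into bufsize bytes. */
static const char *make_unique_id(UsedIds *used, const char *id,
                                  char *buf, size_t bufsize)
{
  unsigned long suffix;
  int n;
  assert(used && id && buf);
  n = snprintf(buf, bufsize, "%s", id);
  if (n < 0 || (size_t) n >= bufsize)
    return NULL;
  for (suffix = 2; used_ids_contains(used, buf); suffix++) {
    n = snprintf(buf, bufsize, "%s.%lu", id, suffix);
    if (n < 0 || (size_t) n >= bufsize)
      return NULL;
  }
  used_ids_add(used, buf);
  return buf;
}
```

because every emitted ID is recorded, an input ID that already looks like a
generated one (say gene1.2 arriving after gene1 was seen twice) still gets a
fresh name, so this sketch does not need to inspect the suffix at all. the
linear scan is only there to keep the example short.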

any direction welcomed.
-brent

