[gt-users] gff3 parser

Brent Pedersen bpederse at gmail.com
Thu Feb 12 18:53:20 CET 2009


On Thu, Feb 12, 2009 at 3:27 AM, Gordon Gremme <gremme at gmail.com> wrote:
>> I'm doing something wrong with the make_unique_id function, because
>> when it's called it does turn 'name' into 'name.1',
>> but it also prints out stray ASCII characters.
>> Any feedback on that, and in general?
>
> The make_id_unique() procedure is flawed; see below:
>
> +static void make_unique_id_string(GtStr *current_id, unsigned long counter)
> +{
> +    /* name => name.1 */
> +    gt_str_append_char(current_id, '.');
> +    gt_str_append_ulong(current_id, counter);
> +}
> +
> +static void make_id_unique(GtGFF3Visitor *gff3_visitor, GtStr *id)
> +{
> +
> +  GtCstrTable *used_ids = gff3_visitor->gt_used_ids;
> +  unsigned long i = 0;
> +  const char *id_string = gt_str_get(id);
>
> id_string now contains a _pointer_ to the internal memory of the GtStr id.
>
>
> +  while ( gt_cstr_table_get(used_ids, gt_str_get(id) )) {
> +    /* TODO: add warning */
> +    make_unique_id_string(id, ++i);
>
> This call modifies id, which means the internal memory might be resized
> and therefore might be moved around in memory.
> Therefore, id_string has to be considered invalid at this point.
>
> +    gt_str_set(id, id_string);
>
> Now you reset id to the content of id_string (which might be invalid).
> What are you trying to do here?
> If you want to reset id for the next iteration, you have to do it
> differently, because as written you would get into an infinite loop:
> id is reset before gt_cstr_table_get() ever sees the modified value,
> so the loop condition keeps testing the original, already-used id.
>
> +  }
> +  /* update table with the new id */
> +  gt_cstr_table_add(used_ids, gt_str_get(id));
> +
> +}
> +
>
> Hope that helps,
>
> Gordon
> _______________________________________________
> gt-users mailing list
> gt-users at genometools.org
> http://genometools.org/mailman/listinfo/gt-users
>
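To make the aliasing problem concrete, here is a minimal standalone C sketch. Str, str_init, str_append and str_snapshot are hypothetical stand-ins written for illustration, not the GenomeTools GtStr API. Appending may realloc() the buffer, so any pointer taken from it beforehand must be treated as invalid; the safe pattern is to keep an owned copy of the original text instead of a pointer into the string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable string such as GtStr:
   the character buffer lives on the heap and may be moved by realloc(). */
typedef struct {
  char *buf;
  size_t len;
  size_t cap;
} Str;

static int str_init(Str *s, const char *text) {
  s->len = strlen(text);
  s->cap = s->len + 1;
  s->buf = malloc(s->cap);
  if (s->buf == NULL) return -1;
  memcpy(s->buf, text, s->len + 1);
  return 0;
}

/* Appends text. Any pointer previously obtained from s->buf must be
   treated as invalid afterwards, because realloc() may move the buffer. */
static int str_append(Str *s, const char *text) {
  size_t n = strlen(text);
  if (s->len + n + 1 > s->cap) {
    size_t new_cap = 2 * (s->len + n + 1);
    char *p = realloc(s->buf, new_cap);
    if (p == NULL) return -1;
    s->buf = p;
    s->cap = new_cap;
  }
  memcpy(s->buf + s->len, text, n + 1);
  s->len += n;
  return 0;
}

/* Safe pattern: take an owned copy of the current contents *before*
   mutating, instead of keeping a pointer into s->buf. Caller frees it. */
static char *str_snapshot(const Str *s) {
  char *copy = malloc(s->len + 1);
  if (copy != NULL) memcpy(copy, s->buf, s->len + 1);
  return copy;
}
```

Taking the snapshot of "name", then appending ".1", leaves the snapshot reading "name" while the string itself reads "name.1"; a raw pointer saved from s->buf before the append would carry no such guarantee.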

OK, I think I've resolved this:
http://gist.github.com/62770
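The gist's contents are not reproduced in this message. As a sketch of one way to address both review points (not necessarily what the gist does), the function below never modifies the base ID and builds each candidate ("base", "base.1", "base.2", ...) fresh from it, so nothing has to be reset mid-loop and no pointer into a resized buffer is reused. IdTable, table_contains, table_add and this make_id_unique signature are hypothetical plain-C stand-ins for the GtCstrTable/GtStr code in the patch.

```c
#include <stdio.h>
#include <string.h>

#define MAX_IDS 256
#define MAX_ID_LEN 128

/* Hypothetical stand-in for GtCstrTable: a fixed-size set of strings. */
typedef struct {
  char ids[MAX_IDS][MAX_ID_LEN];
  size_t count;
} IdTable;

static int table_contains(const IdTable *t, const char *id) {
  for (size_t i = 0; i < t->count; i++)
    if (strcmp(t->ids[i], id) == 0) return 1;
  return 0;
}

static int table_add(IdTable *t, const char *id) {
  if (t->count == MAX_IDS || strlen(id) >= MAX_ID_LEN) return -1;
  strcpy(t->ids[t->count++], id);
  return 0;
}

/* Writes a unique ID derived from `base` into `out`: `base` itself if
   unused, otherwise base.1, base.2, ... and records it in the table.
   `base` is only read, so every candidate is rebuilt from the original.
   `out` must not overlap `base`. Returns 0 on success, -1 on error. */
static int make_id_unique(IdTable *t, const char *base,
                          char *out, size_t out_size) {
  unsigned long i = 0;
  int n = snprintf(out, out_size, "%s", base);
  while (n >= 0 && (size_t)n < out_size && table_contains(t, out))
    n = snprintf(out, out_size, "%s.%lu", base, ++i);
  if (n < 0 || (size_t)n >= out_size) return -1; /* truncated */
  return table_add(t, out);
}
```

Sharing one table across all input files is what turns a second geneA into geneA.1 and a third into geneA.2, consistent with the behavior described in this thread.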

For a.gff containing:
##gff-version 3
1   ucb gene    2   2   .   -   .   ID=geneA
1   ucb gene    2   2   .   +   .   ID=geneB

and b.gff containing:
##gff-version 3
1   ucb gene    4   4   .   -   .   ID=geneA

it gives:

$ bin/gt gff3 -retainids a.gff b.gff 2>/dev/null
##gff-version   3
##sequence-region   1 2 2
1       ucb     gene    2       2       .       -       .       ID=geneA
###
1       ucb     gene    2       2       .       +       .       ID=geneB
###
##sequence-region   1 4 4
1       ucb     gene    4       4       .       -       .       ID=geneA.1
###



And if I add another c.gff that's the same as b.gff, it creates geneA.2.
-b

