[gt-users] gff3 parser
Brent Pedersen
bpederse at gmail.com
Thu Feb 12 18:53:20 CET 2009
On Thu, Feb 12, 2009 at 3:27 AM, Gordon Gremme <gremme at gmail.com> wrote:
>> i'm doing something wrong with the make_unique_id function because
>> when it's called it does turn 'name' into 'name.1'
>> but it prints out ascii characters as well.
>> any feedback on that and in general?
>
> The make_id_unique() procedure is flawed, see below:
>
> +static void make_unique_id_string(GtStr *current_id, unsigned long counter)
> +{
> + /* name => name.1 */
> + gt_str_append_char(current_id, '.');
> + gt_str_append_ulong(current_id, counter);
> +}
> +
> +static void make_id_unique(GtGFF3Visitor *gff3_visitor, GtStr *id)
> +{
> +
> + GtCstrTable *used_ids = gff3_visitor->gt_used_ids;
> + unsigned long i = 0;
> + const char *id_string = gt_str_get(id);
>
> id_string now contains a _pointer_ to the internal memory of the GtStr id.
>
>
> + while ( gt_cstr_table_get(used_ids, gt_str_get(id) )) {
> + /* TODO: add warning */
> + make_unique_id_string(id, ++i);
>
> This call modifies id which means the internal memory might be resized
> and therefore might be moved around in memory.
> Therefore, id_string has to be considered invalid at this point.
>
> + gt_str_set(id, id_string);
>
> Now you reset id to the content of id_string (which might be invalid).
> What are you trying to do here?
> If you want to reset id for the next iteration, you have to do it
> differently, because now you would get into an infinite loop (you
> haven't called gt_cstr_table_get() with the modified id yet).
>
> + }
> + /* update table with the new id */
> + gt_cstr_table_add(used_ids, gt_str_get(id));
> +
> +}
> +
>
> Hope that helps,
>
> Gordon
> _______________________________________________
> gt-users mailing list
> gt-users at genometools.org
> http://genometools.org/mailman/listinfo/gt-users
>
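The pointer problem described above can be illustrated with a minimal
sketch in plain C. This is not GenomeTools code: Buf, buf_init,
buf_append, buf_snapshot and buf_set are hypothetical stand-ins for GtStr
and its functions, written only to show why a pointer into a growable
buffer must not be kept across an append, and how taking a private copy
of the base name first avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, standing in for GtStr. */
typedef struct {
  char *data;
  size_t len, cap;
} Buf;

static void buf_init(Buf *b, const char *s)
{
  b->len = strlen(s);
  b->cap = b->len + 1;
  b->data = malloc(b->cap);
  assert(b->data != NULL);
  memcpy(b->data, s, b->cap);
}

/* Appending may call realloc(), which is allowed to move the memory.
   Any pointer previously obtained from b->data is invalid afterwards. */
static void buf_append(Buf *b, const char *s)
{
  size_t n = strlen(s);
  if (b->len + n + 1 > b->cap) {
    b->cap = b->len + n + 1;
    b->data = realloc(b->data, b->cap);
    assert(b->data != NULL);
  }
  memcpy(b->data + b->len, s, n + 1);
  b->len += n;
}

/* Take a private heap copy of the current content BEFORE modifying the
   buffer. The caller owns the copy and must free() it. */
static char *buf_snapshot(const Buf *b)
{
  char *copy = malloc(b->len + 1);
  assert(copy != NULL);
  memcpy(copy, b->data, b->len + 1);
  return copy;
}

/* Reset the buffer to s. s must NOT point into b->data itself. */
static void buf_set(Buf *b, const char *s)
{
  b->len = 0;
  b->data[0] = '\0';
  buf_append(b, s);
}
```

With this layout, the base name is saved with buf_snapshot() before the
first append, and every later reset uses that copy rather than a pointer
into the buffer.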
ok. i think i've resolved this:
http://gist.github.com/62770
for a.gff of:
##gff-version 3
1 ucb gene 2 2 . - . ID=geneA
1 ucb gene 2 2 . + . ID=geneB
and b.gff of:
gff-version 3
1 ucb gene 4 4 . - . ID=geneA
it gives:
$ bin/gt gff3 -retainids a.gff b.gff 2>/dev/null
##gff-version 3
##sequence-region 1 2 2
1 ucb gene 2 2 . - . ID=geneA
###
1 ucb gene 2 2 . + . ID=geneB
###
##sequence-region 1 4 4
1 ucb gene 4 4 . - . ID=geneA.1
###
and if i add another c.gff that's the same as b.gff, it creates geneA.2
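
The renaming behaviour shown in the output above (geneA, then geneA.1,
then geneA.2) can be sketched in self-contained C as follows. This is
not the code from the gist and it does not use the GenomeTools API:
UsedIds, used_ids_contains, used_ids_add and unique_id_sketch are
hypothetical names, and the fixed-size table is an assumption made to
keep the example short. Each candidate is built from the untouched base
name, so no pointer into a modified buffer is ever reused.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 64
#define MAX_ID_LEN 128

/* Hypothetical fixed-size set of used IDs, standing in for GtCstrTable. */
typedef struct {
  char ids[MAX_IDS][MAX_ID_LEN];
  size_t count;
} UsedIds;

static int used_ids_contains(const UsedIds *u, const char *id)
{
  size_t i;
  for (i = 0; i < u->count; i++) {
    if (strcmp(u->ids[i], id) == 0)
      return 1;
  }
  return 0;
}

static void used_ids_add(UsedIds *u, const char *id)
{
  assert(u->count < MAX_IDS && strlen(id) < MAX_ID_LEN);
  strcpy(u->ids[u->count++], id);
}

/* Write a unique variant of base into out (name => name.1 => name.2 ...)
   and record it as used. base and out must not overlap. */
static void unique_id_sketch(UsedIds *u, const char *base,
                             char *out, size_t out_size)
{
  unsigned long i = 0;
  int n = snprintf(out, out_size, "%s", base);
  assert(n >= 0 && (size_t) n < out_size);
  while (used_ids_contains(u, out)) {
    /* always rebuild from the original base, never from the last try */
    n = snprintf(out, out_size, "%s.%lu", base, ++i);
    assert(n >= 0 && (size_t) n < out_size);
  }
  used_ids_add(u, out);
}
```

Because the loop re-checks the table with the newly built candidate on
every iteration, it terminates as soon as an unused suffix is found.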
-b