[gt-users] gt splitfasta -numfiles patch (was: New feature announcement)

Gordon Gremme gremme at gmail.com
Thu Jan 29 13:36:41 CET 2009


Hi Brent,

> re contributions, after switching out a few simple scripts for some
> shell gt calls, i found gt splitfasta. it almost did what i needed.
> so i did just push a change to github that adds a -numfiles option to
> splitfasta. if it's useful to anyone else,
> please let me know if/how i can clean it up. but the new option does
> seem to work and i added a test.

welcome to the C-side of GenomeTools, that's where the pedants live ;-)

I think this option is a useful addition! I reviewed your patch,
please find my remarks below:

- We try to get a linear history in git (i.e., no merges) in our
history as much as possible. Looks nicer and more importantly
simplifies calls to `git bisect`. Therefore, it is best to develop new
features in separate branches which can then be rebased (`git rebase`)
before incorporated into the master.

- All C-code should adhere to our coding standards. You can check
source files with the script src_check (in the scripts/ directory).
Some cleanups can be done automatically with the script src_clean. To
call src_clean automatically before every commit, copy the pre-commit
script to your .git/hooks directory.

+         max_filesize_in_bytes = tot_size / (num_files + 1);

I believe the '+ 1' is superfluous and leads to the creation of one
excessive file.


+Name "gt splitfasta (-numfiles)"
+Keywords "gt_splitfasta"
+Test do
+  run "cp #{$testdata}U89959_ests.fas ."
+  run_test "#{$bin}gt splitfasta -numfiles 8 U89959_ests.fas"
+  if not File.exists?("U89959_ests.fas.1") then
+    raise TestFailed, "file 'U89959_ests.fas.1' does not exist"
+  end
+  if not File.exists?("U89959_ests.fas.1") then
                                                        ^^
that should be .8

- You have a memleak in the code (the GtStrArray 'gtfiles' is not
deleted), but I will commit a gt_file_estimate_total_size() function
to make the creation of a GtStrArray for a single file  unnecessary.

- I believe it would be better to have the calculation of
max_filesize_in_bytes (when num_files is set) before calling
split_fasta_file() instead of overriding it inside the function.

Please don't take this as a discouragement, I am looking forward to your patch!

Gordon


More information about the gt-users mailing list