[gt-users] gt splitfasta -numfiles patch (was: New feature announcement)
Gordon Gremme
gremme at gmail.com
Thu Jan 29 13:36:41 CET 2009
Hi Brent,
> re contributions, after switching out a few simple scripts for some
> shell gt calls, i found gt splitfasta. it almost did what i needed.
> so i did just push a change to github that adds a -numfiles option to
> splitfasta. if it's useful to anyone else,
> please let me know if/how i can clean it up. but the new option does
> seem to work and i added a test.
welcome to the C-side of GenomeTools, that's where the pedants live ;-)
I think this option is a useful addition! I reviewed your patch,
please find my remarks below:
- We try to get a linear history in git (i.e., no merges) in our
history as much as possible. Looks nicer and more importantly
simplifies calls to `git bisect`. Therefore, it is best to develop new
features in separate branches which can then be rebased (`git rebase`)
before incorporated into the master.
- All C-code should adhere to our coding standards. You can check
source files with the script src_check (in the scripts/ directory).
Some cleanups can be done automatically with the script src_clean. To
call src_clean automatically before every commit, copy the pre-commit
script to your .git/hooks directory.
+ max_filesize_in_bytes = tot_size / (num_files + 1);
I believe the '+ 1' is superfluous and leads to the creation of one
excessive file.
+Name "gt splitfasta (-numfiles)"
+Keywords "gt_splitfasta"
+Test do
+ run "cp #{$testdata}U89959_ests.fas ."
+ run_test "#{$bin}gt splitfasta -numfiles 8 U89959_ests.fas"
+ if not File.exists?("U89959_ests.fas.1") then
+ raise TestFailed, "file 'U89959_ests.fas.1' does not exist"
+ end
+ if not File.exists?("U89959_ests.fas.1") then
^^
that should be .8
- You have a memleak in the code (the GtStrArray 'gtfiles' is not
deleted), but I will commit a gt_file_estimate_total_size() function
to make the creation of a GtStrArray for a single file unnecessary.
- I believe it would be better to have the calculation of
max_filesize_in_bytes (when num_files is set) before calling
split_fasta_file() instead of overriding it inside the function.
Please don't take this as a discouragement, I am looking forward to your patch!
Gordon
More information about the gt-users
mailing list