NAME
    Annovar::Wrapper - A wrapper around the annovar annotation pipeline

VERSION
    Version 0.06

SYNOPSIS
        annovar-wrapper.pl --vcfs file1.vcf,file2.vcf --annovardb_path /path/to/annovar/dbs

    This module is a wrapper around the popular annotation tool annovar
    (http://www.openbioinformatics.org/annovar/). The commands generated
    are taken straight from the annovar documentation. In addition, there
    is an option to reannotate the VCF files using vcf-annotate from
    vcftools.

    It takes as its input a list or directory of VCF files, bgzipped and
    tabixed or not, and uses annovar to create annotation files. These
    multianno table files can optionally be reannotated into the VCF file.

    This script does not actually execute any commands; it only writes
    them to STDOUT for the user to run as they wish.

    It comes with an executable script, annovar-wrapper.pl. This should be
    sufficient for most of your needs, but if you wish to override methods
    you can always do so in the usual Moose fashion.

        #!/usr/bin/env perl

        package Main;
        use Moose;
        extends 'Annovar::Wrapper';

        sub method_to_override {
            my $self = shift;
            # do stuff
        }

        before 'method' => sub {
            my $self = shift;
            # do stuff
        };

        has '+variable' => (
            # things to add to the inherited variable declaration
        );

        # or

        has 'variable' => (
            # override the variable declaration entirely
        );

        # Instantiate the subclass (not Annovar::Wrapper itself) so the
        # overrides above take effect, and only after the runtime
        # 'before' and 'has' declarations have been executed.
        Main->new_with_options->run;

        1;

    Please see Moose::Manual::MethodModifiers for more information.

  Prerequisites
    This module requires the annovar download. The easiest thing to do is
    to put the annovar scripts in your PATH, but if you choose not to do
    this you can also pass in their locations:

        annovar-wrapper.pl --tableannovar_path /path/to/table_annovar.pl \
            --convert2annovar_path /path/to/convert2annovar.pl

    It requires Vcf.pm, which comes with vcftools and must be on your
    PERL5LIB. Vcftools is publicly available for download at
    http://vcftools.sourceforge.net/.
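
    Before running the generated commands, it may help to confirm that the
    prerequisites above can actually be found. The script below is a
    minimal sketch and is not part of Annovar::Wrapper: the helper
    find_in_path is hypothetical, and the script assumes the annovar
    scripts keep their standard names (table_annovar.pl,
    convert2annovar.pl) and that Vcf.pm is loaded as the package Vcf.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of Annovar::Wrapper): return the full path
# of the first executable file named $name found in PATH, or undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        # -x _ reuses the stat buffer filled by the preceding -f test.
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Map each annovar script to the option you would pass if it is not in PATH.
my %path_option = (
    'table_annovar.pl'   => '--tableannovar_path',
    'convert2annovar.pl' => '--convert2annovar_path',
);

for my $script ( sort keys %path_option ) {
    my $found = find_in_path($script);
    if ( defined $found ) {
        print "$script: $found\n";
    }
    else {
        print "$script: not in PATH, pass $path_option{$script}\n";
    }
}

# Vcf.pm ships with vcftools; it only loads if the vcftools perl directory
# is in @INC (for example via PERL5LIB).
my $have_vcf = eval { require Vcf; 1 } ? 1 : 0;
print 'Vcf.pm: ', ( $have_vcf ? 'loadable' : 'not found, check PERL5LIB' ), "\n";
```

    If Vcf.pm is reported as not found, add the vcftools perl directory to
    PERL5LIB: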
        export PERL5LIB=$PERL5LIB:path_to_vcftools/perl

    If you wish to reannotate the VCF file, you need to have bgzip and
    tabix installed, and have the vcftools executables in your PATH.

        export PATH=$PATH:path_to_vcftools

  Generate an Example
    To generate an example, you can run the following commands:

        tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:39967768-40000000 > test.vcf
        bgzip test.vcf
        tabix test.vcf.gz
        vcf-subset -c HG00098,HG00100,HG00106,HG00112,HG00114 test.vcf.gz | bgzip -c > out.vcf.gz
        tabix out.vcf.gz
        rm test.vcf.gz
        rm test.vcf.gz.tbi
        annovar-wrapper.pl --vcfs out.vcf.gz --annovar_dbs refGene --annovar_fun g \
            --outdir annovar_out --annovardb_path /path/to/annovar/dbs > my_cmds.sh

    There is more detail on the example in the pod files.

  Variables
   Annovar Options
    tableannovar_path
        You can put the location of the annovar scripts in your PATH, and
        the default is fine. If annovar is not in your PATH, please supply
        the location.

    convert2annovar_path
        You can put the location of the annovar scripts in your PATH, and
        the default is fine.
        If annovar is not in your PATH, please supply the location.

    annovardb_path
        Path to your annovar databases.

    buildver
        The genome build version. It's probably hg19 or hg18.

    convert2annovar_opts
        Assumes VCF version 4 and that you want to convert all samples. Not
        using --allsample on a multisample VCF is untested and will probably
        break the whole pipeline.

    annovar_dbs
        These are pretty much all the databases listed on
        http://www.openbioinformatics.org/annovar/annovar_download.html for
        hg19 that I tested as working.

            #Download databases with
            cd path_to_annovar_dir
            ./annotate_variation.pl --buildver hg19 -downdb -webfrom annovar esp6500si_aa hg19/

            #Option is an ArrayRef, and can be given as either
            --annovar_dbs cg46,cg69,nci60
            #or
            --annovar_dbs cg46 --annovar_dbs cg69 --annovar_dbs nci60

    annovar_fun
        Functions of the individual databases can be found at:

        The function for your DB may already be listed; otherwise it is
        probably listed in the URLs under Annotation: Gene-Based,
        Region-Based, or Filter-Based.

        Functions must be given in the same order as the corresponding
        annovar_dbs.

            #Option is an ArrayRef, and can be given as either
            --annovar_fun f,f,g
            #or
            --annovar_fun f --annovar_fun f --annovar_fun g

    annovar_cols
        Some database annotations generate multiple columns. For
        reannotating the VCF we need to know what these columns are. Below
        are the columns generated for the databases given in annovar_dbs.

        To add more, give a hashref of arrays.

   Wrapper Options
    indir
        A path to your VCF files can be given, and using File::Find::Rule
        it will recursively search for vcf or vcf.gz files.

    vcfs
        VCF files can be given individually as well.

            #Option is an ArrayRef and can be given as either
            --vcfs 1.vcf,2.vcf,3.vcf
            #or
            --vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

        Don't mix the two methods.

    outdir
        Path to write out annotation files.
        It creates the following structure:

            outdir
            --annovar_interim
            --annovar_final
            --vcf-annotate_interim #If you choose to reannotate VCF file
            --vcf-annotate_final   #If you choose to reannotate VCF file

        A lot of interim files are created by annovar; unless you are
        debugging a new database, the only one that really matters is the
        multianno file found in annovar_final.

        If not given, the output directory defaults to the current working
        directory.

    annotate_vcf
        Use vcf-annotate from vcftools to annotate the VCF file. This does
        not overwrite the original VCF file, but instead creates a new one.

        To turn this off:

            annovar-wrapper.pl --annotate_vcf 0

SUBROUTINES/METHODS
  run
    Subroutine that starts everything off.

  print_opts
    Print out the command line options.

  check_files
    Check to make sure either an indir or vcfs are supplied.

  find_vcfs
    Use File::Find::Rule to find the VCF files.

  parse_commands
    Allow ArrayRef options to be given either in the usual fashion
    (repeating the option) or as a comma-separated list.

  write_annovar
    Write the commands that:

    * convert the VCF file to annovar input;
    * do the annotations;
    * reannotate the VCF file, if requested.

  get_samples
    Using vcftools, get the samples listed in each VCF file. Supports files
    that are bgzipped or not. Sample names are stripped of all
    non-alphanumeric characters.

  convert_annovar
    Print out the convert2annovar commands.

  table_annovar
    Print out the commands to generate the annotation using
    table_annovar.pl.

  vcf_annotate
    Generate the commands to annotate the VCF file using vcf-annotate.

  gen_descr
    Bgzip, tabix, all of vcftools, and sort must be in your PATH for these
    to work.

    There are two parts to this command. The first prepares the annotation
    file:

    1. The annotation file is backed up, just in case.
    2. The annotation file is sorted, because I had some problems with
       sorting.
    3. The annotation file is bgzipped, as required by vcf-annotate.
    4. The annotation file is tabix indexed using the options
       -s 1 -b 2 -e 3.

    The second writes out the vcf-annotate commands.

    Example with refGene:

        zcat ../../variants.vcf.gz | vcf-annotate -a sorted.annotation.gz \
            -d key=INFO,ID=SAMPLEID_Func_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Func_refGene' \
            -d key=INFO,ID=SAMPLEID_Gene_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Gene_refGene' \
            -d key=INFO,ID=SAMPLEID_ExonicFun_refGene,Number=0,Type=String,Description='SAMPLEID Annovar ExonicFun_refGene' \
            -d key=INFO,ID=SAMPLEID_AAChange_refGene,Number=0,Type=String,Description='SAMPLEID Annovar AAChange_refGene' \
            -c CHROM,FROM,TO,-,-,INFO/SAMPLEID_Func_refGene,INFO/SAMPLEID_Gene_refGene,INFO/SAMPLEID_ExonicFun_refGene,INFO/SAMPLEID_AAChange_refGene \
            > SAMPLEID.annotated.vcf

  gen_cols
    Generate the -c portion of the vcf-annotate command.

  merge_vcfs
    There is one vcf-annotated file per sample, so merge those at the end
    with vcf-merge to get a multisample file.

  subset_vcfs
    vcf-merge used in this fashion will create a lot of redundant columns,
    because it assumes all sample names are unique. Straight from the
    vcftools documentation:

        vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz

AUTHOR
    Jillian Rowe, ""

BUGS
    Please report any bugs or feature requests to
    "bug-annovar-wrapper at rt.cpan.org", or through the web interface at .
    I will be notified, and then you'll automatically be notified of
    progress on your bug as I make changes.

SUPPORT
    You can find documentation for this module with the perldoc command.

        perldoc Annovar::Wrapper

    You can also look for information at:

    * RT: CPAN's request tracker (report bugs here)
    * AnnoCPAN: Annotated CPAN documentation
    * CPAN Ratings
    * Search CPAN

ACKNOWLEDGEMENTS
    This module is a wrapper around the well-developed annovar pipeline.
    The commands come straight from the documentation.
    This module was originally developed at and for Weill Cornell Medical
    College in Qatar within the ITS Advanced Computing Team, with input
    from Khalid Fahkro. With approval from WCMC-Q, this information was
    generalized and put on GitHub, for which the authors would like to
    express their gratitude.

LICENSE AND COPYRIGHT
    Copyright 2014 Jillian Rowe.

    This program is free software; you can redistribute it and/or modify it
    under the terms of the Artistic License (2.0). You may obtain a copy of
    the full license at:

    Any use, modification, and distribution of the Standard or Modified
    Versions is governed by this Artistic License. By using, modifying or
    distributing the Package, you accept this license. Do not use, modify,
    or distribute the Package, if you do not accept this license.

    If your Modified Version has been derived from a Modified Version made
    by someone other than you, you are nevertheless required to ensure that
    your Modified Version complies with the requirements of this license.

    This license does not grant you the right to use any trademark, service
    mark, tradename, or logo of the Copyright Holder.

    This license includes the non-exclusive, worldwide, free-of-charge
    patent license to make, have made, use, offer to sell, sell, import and
    otherwise transfer the Package with respect to any patent claims
    licensable by the Copyright Holder that are necessarily infringed by the
    Package. If you institute patent litigation (including a cross-claim or
    counterclaim) against any party alleging that the Package constitutes
    direct or contributory patent infringement, then this Artistic License
    to you shall terminate on the date that such litigation is filed.

    Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
    AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
    THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
    PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
    YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
    CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
    CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
    EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.