Perl Scripting


Introduction to PERL: Scripting for UNIX made simple and portable

Yuk Sham
MSI Consultant
Phone: (612) 626 0802 (help)
Email: [email protected]

Outline

What is PERL?
Why would I use PERL instead of something else?
PERL features
How to run PERL scripts
PERL syntax, variables, quotes
Flow control constructs
Subroutines
Typical UNIX scripting tasks
    Filter a file or a group of files
    Searching/Matching
    Naming file sequences
    Executing applications
    Parsing files
More information

What is PERL?

Practical Extraction and Report Language
Written by Larry Wall
Combines capabilities of Bourne shell, csh, awk, sed, grep, sort and C
Assists with common tasks that are too heavy or too portability-sensitive for the shell, and yet too weird or too complicated to code in C or another programming language
File or list processing: matching, extraction, formatting (text reports, HTML, mail, etc.)

Why would I use PERL instead of something else?

Interpreted language
Commonly used for cgi programs
Very flexible
Very automatic
Can be very simple for a variety of tasks
WIDELY available
HIGHLY portable

PERL features

C-style flow control (similar)
Dynamic allocation
Automatic allocation
Numbers
Lists
Strings
Arrays
Associative arrays (hashes)

PERL features

Very large set of publicly available libraries for a wide range of applications
Math functions (trig, complex)
Automatic conversions as needed
Pattern matching
Standard I/O
Process control
System calls
Can be object oriented

How to run PERL scripts

    % cat hello.pl
    print "Hello world from PERL.\n";
    %
    % perl hello.pl
    Hello world from PERL.

OR

    % which perl
    /usr/bin/perl
    % cat hello.pl
    #!/usr/bin/perl
    print "Hello world from PERL.\n";
    % chmod a+rx hello.pl
    % ./hello.pl
    Hello world from PERL.

(The .pl suffix is just a convention; it has no special meaning to perl.)
/usr/local/bin/perl is another place perl might be linked at the Institute.

PERL syntax

Free form: whitespace and newlines are ignored, except as delimiters
PERL statements may be continued across line boundaries
All PERL statements end with a ; (semicolon)
Comments begin with the # (pound sign) and end at a newline
    no continuation onto the next line
    may be anywhere, not just at the beginning of a line
Comments may be embedded in a statement
    see previous item

Hello world

Example 1:

    #!/usr/bin/perl
    # This is how perl says hello
    print "Hello world from PERL.\n";        # It says hello once
    print "Hello world again from PERL.\n";  # It says hello twice

Example 2:

    #!/usr/bin/perl
    print"Hello world from PERL.\n";print"Hello world again from PERL.\n";

Example 3:

    #!/usr/bin/perl
    print
      "Hello world from PERL.\n";
    print
      "Hello world again from PERL.\n";

Output (the same for all three examples):

    Hello world from PERL.
    Hello world again from PERL.

PERL variables

    $count       Scalar: number or string
    @an_array    Array: list of numbers and/or strings, indexed by number starting at zero
    %a_hash      Associative array or hash: list of numbers and/or strings, indexed by anything

Strings and arrays

    $x = 27;
    $y = 35;
    $name = "john";
    @a = ($x,$y,$name);
    print "x = $x and y = $y\n";
    print "The array is @a \n";

Output:

    x = 27 and y = 35
    The array is 27 35 john

    @a = ("fred","barney","betty","wilma");
    print "The names are @a \n";
    print "The first name is $a[0] \n";
    print "The last name is $a[3] \n";

Output:

    The names are fred barney betty wilma
    The first name is fred
    The last name is wilma

Associative arrays

    $a{dad} = "fred";
    $a{mom} = "wilma";
    $a{child} = "pebble";
    print "The mom is $a{mom} \n";

Output:

    The mom is wilma

    @keys = keys(%a);
    @values = values(%a);
    print "The keys are @keys \n";
    print "The values are @values \n";

Output:

    The keys are mom dad child
    The values are wilma fred pebble

(The order returned by keys and values is not guaranteed, but the two lists always correspond to each other.)
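Because the order of keys(%a) can differ from run to run, a report usually sorts the keys first. The following is a minimal sketch of that idea; the helper name sorted_pairs is hypothetical and not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "key=value" strings in sorted key order,
# so the output is the same on every run.
sub sorted_pairs {
    my (%h) = @_;
    return map { "$_=$h{$_}" } sort keys %h;
}

my %family = (dad => "fred", mom => "wilma", child => "pebble");
print join(" ", sorted_pairs(%family)), "\n";   # child=pebble dad=fred mom=wilma
```

Sorting the keys costs a little time on large hashes, but it makes the output reproducible.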

Operators and functions

Increase or decrease existing value by 1 (++, --)
Modify existing value by +, -, * or / by an assigned value (+=, -=, *=, /=)

Example 1

    $a = 1; $b = "a";
    ++$a; ++$b;
    print "$a $b \n";

Output:

    2 b

Example 2

    $a = $b = $c = 1;
    ++$b; $c *= 3;
    print "$a $b $c\n";

Output:

    1 2 3

Operators and functions

Numeric logical operators
    ==, !=, <, >, <=, >=

String logical operators
    eq, ne, lt, gt, le, ge
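The two operator families can give different answers for the same operands, because the string operators compare character by character. A small sketch, assuming a hypothetical helper named compare_both:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare the same two values numerically and as strings.
sub compare_both {
    my ($x, $y) = @_;
    my $numeric = ($x <  $y) ? "less" : "not less";   # numeric comparison
    my $string  = ($x lt $y) ? "less" : "not less";   # string comparison
    return ($numeric, $string);
}

my ($n, $s) = compare_both(10, 9);
print "numeric: 10 is $n than 9\n";     # not less
print "string:  10 is $s than 9\n";     # less, because "1" sorts before "9"
```

Using == on strings or eq on numbers is a common source of subtle bugs, so it helps to pick the operator that matches the kind of data being compared.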

Operators and functions

Add and remove elements from an existing array (push, pop, unshift, shift)
Rearranging arrays (reverse, sort)

    @a = qw(one two three four five six);
    print "@a\n";
    unshift(@a,"zero");     # add elements to the array from the left side
    print "@a\n";
    shift(@a);              # removes elements from the array from the left side
    print "@a\n";
    @a = reverse(@a);       # reverse the order of the array
    print "@a\n";
    @a = sort(@a);          # sort the array in alphabetical order
    print "@a\n";

Output:

    one two three four five six
    zero one two three four five six
    one two three four five six
    six five four three two one
    five four one six three two
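sort compares elements as strings by default, which is why the words above come out alphabetically. For numbers, a comparison block with <=> is needed. A minimal sketch (the helper name sort_numeric is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort a list numerically instead of alphabetically.
sub sort_numeric {
    return sort { $a <=> $b } @_;
}

my @n = (10, 9, 100, 1);
my @alpha   = sort @n;            # string order
my @numeric = sort_numeric(@n);   # numeric order
print "@alpha\n";                 # 1 10 100 9
print "@numeric\n";               # 1 9 10 100
```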

Operators and functions

Removes last character from a string (chop)
Removes newline character, \n, from the end of a string (chomp)
Breaks a string into fields at a delimiter pattern (split) and joins the pieces back together (join)

    $a = "this is my expression\n";
    print "$a";
    chomp($a);
    print "$a. ";
    @a = split(/ /,$a);     # splits $a string into an array called @a
    print "$a[3] $a[2] $a[1] $a[0]\n";
    $a = join(":",@a);      # create a string called $a by joining
                            # all the elements in the array @a and
                            # having : spaced between them
    print "$a \n";

Output:

    this is my expression
    this is my expression. expression my is this
    this:is:my:expression
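split(/ /, ...) splits on every single space, so doubled or leading spaces produce empty fields. Passing the string ' ' as the pattern is a special case that splits on any run of whitespace and skips leading whitespace. A small sketch, with a hypothetical helper named words:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: break a line into words, tolerating extra whitespace.
sub words {
    my ($s) = @_;
    return split(' ', $s);   # ' ' is split's special whitespace mode
}

my @w = words("  this  is my   expression\n");
print join(":", @w), "\n";   # this:is:my:expression
```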

Operators and functions

Substituting a pattern (=~ s/pattern/replacement/)
Transliteration (=~ tr/from/to/)

    $_ = "this is my expression\n";
    print "$_";
    $_ =~ s/my/your/;
    print "$_";
    $_ =~ tr/a-z/A-Z/;
    print "$_";

Output:

    this is my expression
    this is your expression
    THIS IS YOUR EXPRESSION
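Two common variations are worth knowing: s/// replaces only the first match unless the g modifier is added, and tr with an empty replacement list counts characters without changing the string. A minimal sketch; the helper names replace_all and count_char are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of a literal string.
sub replace_all {
    my ($text, $from, $to) = @_;
    $text =~ s/\Q$from\E/$to/g;    # \Q...\E treats $from literally; g = all matches
    return $text;
}

# Hypothetical helper: count how often the letter s occurs.
sub count_char {
    my ($text) = @_;
    return ($text =~ tr/s//);      # empty replacement: count only, no change
}

print replace_all("my cat and my dog", "my", "your"), "\n";  # your cat and your dog
print count_char("this is my expression"), "\n";             # 4
```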

Flow control constructs

    Control_operator ( expression(s) ) {
        statement_block;
    }

Example:

    if ( $i < $N ) {
        statement_block;
    } else {
        statement_block;
    }

    foreach $i ( @list_of_items ) {
        statement_block;
    }

Subroutines

    @a = qw(1 2 3 4);            # assigns an array @a
    print summation(@a),"\n";    # prints results of subroutine
                                 # summation using @a as input

    sub summation {
        my $k = 0;
        foreach $i (@_) {        # summing every element in
            $k += $i;            # the array @a and return
        }                        # the value as $k
        return($k);
    }

Output:

    10
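Arguments arrive in the special array @_, and variables declared with my stay private to the subroutine. The sketch below applies the same pattern to an average; the name mean is hypothetical, and use strict is added here only to catch undeclared variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: arithmetic mean of its arguments.
sub mean {
    my @values = @_;            # copy the arguments into a private array
    return unless @values;      # no arguments: return nothing
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;      # @values in numeric context is its length
}

print mean(1, 2, 3, 4), "\n";   # 2.5
```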

Command-line arguments

    #!/usr/bin/perl
    print "Command name: $0\n";
    print "Number of arguments: ", scalar(@ARGV), "\n";   # $#ARGV is the last index, one less than the count
    for ($i=0; $i <= $#ARGV; $i++) {
        print "Arg $i is $ARGV[$i]\n";
    }

    % ./arguments.pl zero one two three
    Command name: ./arguments.pl
    Number of arguments: 4
    Arg 0 is zero
    Arg 1 is one
    Arg 2 is two
    Arg 3 is three
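The difference between the number of elements and the last index is easy to mix up. This sketch shows both for an ordinary array; the helper name describe_args is hypothetical, and the same expressions apply to @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report the element count and the last index of a list.
sub describe_args {
    my @args = @_;
    return sprintf("%d argument(s), last index %d", scalar(@args), $#args);
}

print describe_args(qw(zero one two three)), "\n";   # 4 argument(s), last index 3
```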

Concatenating Strings with the . operator

    $firstname = "George";
    $midname = "walker";
    $lastname = "Bush";
    $fullname = $lastname . ", " . $firstname . " " . uc(substr $midname, 0, 1) . ".\n";
    print $fullname;

Output:

    Bush, George W.

UNIX Environment Variables

    print "your username is $ENV{USER} and \n";
    print "your machine name is $ENV{HOST} and \n";
    print "your display is set to $ENV{DISPLAY} and \n";
    print "your shell is $ENV{SHELL} and \n";
    print "your timezone is $ENV{TZ} etcetera.\n";

Output:

    your username is shamy and
    your machine name is cirrus.msi.umn.edu and
    your display is set to localhost:10.0 and
    your shell is /bin/tcsh and
    your timezone is CST6CDT etcetera.

Typical UNIX scripting tasks

Filter a file or a group of files
Searching/Matching
Naming file sequences
Executing applications
Parsing files

Filtering standard input

    #!/usr/bin/perl
    while( <> ) {                  # read from stdin one line at a time
        print "line $. : $_" ;     # print current line to stdout
    }

print.txt

    Silicon Graphics' Info Search lets you find all the information
    available on a topic using a keyword search. Info Search looks
    begin
    through all the release notes, man pages, and/online books you
    done
    have installed on your system or on a networked server. From
    the Toolchest on your desktop, choose Help-Info Search.
    begin

    Quick Answers tells you how to connect to an Internet Service Provider (ISP).
    done
    From the Toolchest on your desktop, choose
    Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
    through all the release notes, man pages, and/online books you
    Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Filtering standard input

    ./printlines.pl print.txt

    line 1 : Silicon Graphics' Info Search lets you find all the information
    line 2 : available on a topic using a keyword search. Info Search looks
    line 3 : begin
    line 4 : through all the release notes, man pages, and/online books you
    line 5 : done
    line 6 : have installed on your system or on a networked server. From
    line 7 : the Toolchest on your desktop, choose Help-Info Search.
    line 8 : begin
    line 9 :
    line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
    line 11 : done
    line 12 : From the Toolchest on your desktop, choose
    line 13 : Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
    line 14 : through all the release notes, man pages, and/online books you
    line 15 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Filtering standard input

    #!/usr/bin/perl
    while( <> ) {
        print "line $. : $_" unless $. %2;    # print only the even lines
    }

    ./printeven.pl print.txt

    line 2 : available on a topic using a keyword search. Info Search looks
    line 4 : through all the release notes, man pages, and/online books you
    line 6 : have installed on your system or on a networked server. From
    line 8 : begin
    line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
    line 12 : From the Toolchest on your desktop, choose
    line 14 : through all the release notes, man pages, and/online books you

Filtering standard input

    #!/usr/bin/perl
    while( <> ) {
        if( /begin/ .. /done/ ) {      # prints any text that
            print "line $. : $_";      # starts with begin
        }                              # and finishes with done
    }

    ./printpattern.pl print.txt

    line 3 : begin
    line 4 : through all the release notes, man pages, and/online books you
    line 5 : done
    line 8 : begin
    line 9 :
    line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
    line 11 : done

Filtering standard input

    #!/usr/bin/perl
    while( <> ) {
        if( /begin/ .. /done/ ) {
            unless( /begin/ || /done/ ) {
                print "line $. : $_";
            }
        }
    }

    ./printpattern2.pl print.txt

    line 4 : through all the release notes, man pages, and/online books you
    line 9 :
    line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
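The range operator keeps track of whether the current line is inside a begin/done block. The same idea can be written as an explicit flag, which makes the state visible and works on any list of lines. The following is a sketch under that assumption; the helper name between_markers is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between a start marker
# and an end marker, excluding the marker lines themselves.
sub between_markers {
    my ($start, $end, @lines) = @_;
    my @out;
    my $inside = 0;                       # 1 while we are between markers
    for my $line (@lines) {
        if (!$inside) {
            $inside = 1 if $line =~ /$start/;
            next;                         # skip the start marker and outside lines
        }
        if ($line =~ /$end/) {
            $inside = 0;                  # end marker closes the block
            next;
        }
        push @out, $line;
    }
    return @out;
}

my @kept = between_markers('begin', 'done', qw(a begin b c done d begin e done));
print "@kept\n";   # b c e
```

Unlike the unless( /begin/ || /done/ ) test above, this version keeps an inner line that merely mentions the word begin, since only the first begin after a done opens a block.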

Files

Naming files
Reformatting files

Files

    #!/usr/bin/perl
    # touch.pl
    foreach $i ( 0 .. 50 ) {
        print "touch gifdir/$i.gif\n";
        system("touch gifdir/$i.gif");
    }

    ./touch.pl

Perl executes the following in unix:

    touch gifdir/0.gif
    touch gifdir/1.gif
    touch gifdir/2.gif
    touch gifdir/3.gif
    touch gifdir/4.gif
    . . .
    touch gifdir/48.gif
    touch gifdir/49.gif
    touch gifdir/50.gif

Files

    % ls -lt gifdir/*.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 50.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 49.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 48.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 47.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 46.gif
    . . .
    -rw-------  1 shamy  support  995343 Oct 21 18:50 4.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 3.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 2.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 1.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:50 0.gif

Files

    #!/usr/bin/perl
    foreach $i ( 0 .. 50 ) {
        $new = sprintf("step%3.3d.gif", $i);     # name the gif file with
        print "mv gifdir2/$i.gif gifdir2/$new\n"; # a 3 digit numbering
        system "mv gifdir2/$i.gif gifdir2/$new";  # scheme
    }

    ./rename.pl

Perl executes the following in unix:

    mv gifdir2/0.gif gifdir2/step000.gif
    mv gifdir2/1.gif gifdir2/step001.gif
    mv gifdir2/2.gif gifdir2/step002.gif
    mv gifdir2/3.gif gifdir2/step003.gif
    mv gifdir2/4.gif gifdir2/step004.gif
    . .
    mv gifdir2/47.gif gifdir2/step047.gif
    mv gifdir2/48.gif gifdir2/step048.gif
    mv gifdir2/49.gif gifdir2/step049.gif
    mv gifdir2/50.gif gifdir2/step050.gif
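Calling mv through system starts a new process for every file. Perl's built-in rename does the same job directly and reports failures through $!. The sketch below is one possible variant, not the method from the slides; the helper names padded_name and rename_sequence are hypothetical, and the demonstration runs in a temporary directory so no real files are touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: build the zero-padded file name for frame $i.
sub padded_name {
    my ($i) = @_;
    return sprintf("step%03d.gif", $i);
}

# Hypothetical helper: rename $dir/N.gif to $dir/stepNNN.gif for N in a range.
sub rename_sequence {
    my ($dir, $first, $last) = @_;
    my $renamed = 0;
    for my $i ($first .. $last) {
        my $old = "$dir/$i.gif";
        next unless -e $old;                        # skip missing frames
        my $new = "$dir/" . padded_name($i);
        rename($old, $new) or die "Cannot rename $old: $!\n";
        $renamed++;
    }
    return $renamed;
}

# Demonstration in a scratch directory (removed automatically at exit).
my $dir = tempdir(CLEANUP => 1);
for my $i (0 .. 2) {
    open(my $fh, '>', "$dir/$i.gif") or die "Cannot create $dir/$i.gif: $!\n";
    close($fh);
}
print rename_sequence($dir, 0, 2), " files renamed\n";   # 3 files renamed
```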

Files

ls gifdir2 (before)

    gifdir2:
    0.gif   14.gif  2.gif   25.gif  30.gif  36.gif  41.gif  47.gif  7.gif
    1.gif   15.gif  20.gif  26.gif  31.gif  37.gif  42.gif  48.gif  8.gif
    10.gif  16.gif  21.gif  27.gif  32.gif  38.gif  43.gif  49.gif  9.gif
    11.gif  17.gif  22.gif  28.gif  33.gif  39.gif  44.gif  5.gif
    12.gif  18.gif  23.gif  29.gif  34.gif  4.gif   45.gif  50.gif
    13.gif  19.gif  24.gif  3.gif   35.gif  40.gif  46.gif  6.gif

ls gifdir2 (after)

    gifdir2:
    script       step008.gif  step017.gif  step026.gif  step035.gif  step044.gif
    step000.gif  step009.gif  step018.gif  step027.gif  step036.gif  step045.gif
    step001.gif  step010.gif  step019.gif  step028.gif  step037.gif  step046.gif
    step002.gif  step011.gif  step020.gif  step029.gif  step038.gif  step047.gif
    step003.gif  step012.gif  step021.gif  step030.gif  step039.gif  step048.gif
    step004.gif  step013.gif  step022.gif  step031.gif  step040.gif  step049.gif
    step005.gif  step014.gif  step023.gif  step032.gif  step041.gif  step050.gif
    step006.gif  step015.gif  step024.gif  step033.gif  step042.gif
    step007.gif  step016.gif  step025.gif  step034.gif  step043.gif

Parsing and reformatting files

    HEADER    CALCIUM-BINDING PROTEIN    29-SEP-92                    1CLL   2
    COMPND    CALMODULIN (VERTEBRATE)                                 1CLL   3
    REMARK   1 REFERENCE 1                                            1CLL  13
    REMARK   1  AUTH   W.E.MEADOR,A.R.MEANS,F.A.QUIOCHO               1CLL  14
    ORIGX2      0.000000  0.018659  0.001155        0.00000           1CLL 143
    . . .
    ATOM      1  N   LEU     4      -6.873  21.082  25.312  1.00 49.53  1CLL 148
    ATOM      2  CA  LEU     4      -6.696  22.003  26.447  1.00 48.82  1CLL 149
    ATOM      3  C   LEU     4      -6.318  23.391  25.929  1.00 46.50  1CLL 150
    ATOM      4  O   LEU     4      -5.313  23.981  26.352  1.00 45.72  1CLL 151
    ATOM      5  N   THR     5      -7.147  23.871  25.013  1.00 46.77  1CLL 152
    ATOM      6  CA  THR     5      -6.891  25.193  24.428  1.00 46.84  1CLL 153
    . . .
    CONECT  724  723 1137                                             1CLL1440
    CONECT  736  735 1137                                             1CLL1441

Parsing Files

    #!/usr/bin/perl
    $pdbfile = shift;
    ($pref = $pdbfile) =~ s/\.pdb//;
    print "Converting $pdbfile to $pref.xyz \n";
    open(FILIN, "<$pdbfile") || die "Cannot open pdb file $pdbfile \n ";
    open(FILOUT,">$pref.xyz") || die "Cannot open output file $pref.xyz \n ";
    while (<FILIN>) {
        if (/^ATOM/) {
            chomp;
            @f = split;     # split the line on whitespace into the array @f
            printf FILOUT "%5d %4s %8.3f%8.3f%8.3f\n",
                $f[1], substr($f[2], 0, 1), $f[5], $f[6], $f[7];
        }
    }
    close(FILIN);
    close(FILOUT);
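The per-line conversion can be isolated in a subroutine, which makes it easy to try on a single ATOM record. This is a sketch under the same assumption as the script above, namely that the fields are separated by whitespace; the helper name atom_to_xyz is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one whitespace-separated ATOM record into an
# xyz line (serial number, element letter, x, y, z). Returns nothing for
# records that are not ATOM lines.
sub atom_to_xyz {
    my ($line) = @_;
    return unless $line =~ /^ATOM/;
    my @f = split(' ', $line);
    return sprintf("%5d %4s %8.3f%8.3f%8.3f",
                   $f[1], substr($f[2], 0, 1), @f[5 .. 7]);
}

my $xyz = atom_to_xyz("ATOM 2 CA LEU 4 -6.696 22.003 26.447 1.00 48.82");
print "$xyz\n";
```

PDB is a fixed-column format, so in some files neighbouring fields run together without a space. For such files, extracting the columns with substr would be more robust than splitting on whitespace.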

Reformatting Files

    ./pdb2xyz.pl foo.pdb
    more foo.xyz

        1    N   -6.873  21.082  25.312
        2    C   -6.696  22.003  26.447
        3    C   -6.318  23.391  25.929
        4    O   -5.313  23.981  26.352
        5    N   -7.147  23.871  25.013
        6    C   -6.891  25.193  24.428
    . . .

Executing applications

    #!/usr/bin/perl
    $pdbfile = shift(@ARGV);
    ($pref = $pdbfile) =~ s/\.pdb//;          # create a variable $pref using the prefix
                                              # of the pdb file name
    system ("rm -r $pref");
    system ("mkdir $pref");                   # create a directory named after $pref
    chdir ("$pref");                          # change directory into $pref
    open(SCRIPT,">script");                   # create a file called script
    print SCRIPT "zap\n";
    print SCRIPT "load pdb ../$pdbfile\n";
    print SCRIPT "background black\n";
    print SCRIPT "wireframe off\n";
    print SCRIPT "ribbons on\n";
    print SCRIPT "color ribbons yellow\n";
    for ($i = 0; $i <= 50; ++$i) {            # assigns a value from 0 to 50
        $name = sprintf("%3.3d",$i);          # create a file name based on this value
        print SCRIPT "rotate x 10\n";         # for every value, rotate 10 degrees
        print SCRIPT "write $name.gif\n";     # generate a gif file for each value
    }
    print SCRIPT "quit\n";
    close SCRIPT;
    system("/usr/local/bin/rasmol < script");                      # execute the rasmol program
    system("dmconvert -f mpeg1v -p video ###.gif out.mpeg");       # execute dmconvert to make movie
    chdir ("..");

Executing applications

more foo/script

    background black
    wireframe off
    ribbons on
    color ribbons yellow
    rotate x 10
    write 000.gif
    rotate x 10
    write 001.gif
    rotate x 10
    write 002.gif
    . .

ls -lt foo

    total 99699
    -rw-------  1 shamy  support  256504 Oct 21 18:34 out.mpeg
    -rw-------  1 shamy  support  995343 Oct 21 18:33 050.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:33 049.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:33 048.gif
    . .
    -rw-------  1 shamy  support  995343 Oct 21 18:32 002.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:32 001.gif
    -rw-------  1 shamy  support  995343 Oct 21 18:32 000.gif
    -rw-------  1 shamy  support    1418 Oct 21 18:32 script

Parsing a DNA sequence

    >sequence1
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGC
    ACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAAATA
    GAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGG
    TGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCA
    TGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAG
    CAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCA
    ATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >sequence2
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGC
    GATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGG
    TGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCA
    TGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAG
    CAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCA
    ATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Parsing a DNA sequence

    #!/usr/bin/perl
    while (<>) {
        if ($_ =~ /^\>/ || eof ) {
            if ($count > 0) {
                $line = join("",@line);
                print $seq;
                fixhead($line);
                fixtail($line);
                write stdout;
            }
            $count = 0;
            $seq = $_;
            @line = "";
        } else {
            chomp;
            ++$count;
            push(@line,$_);
        }
    }

    format stdout =
    ~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $line
    .

Parsing a DNA sequence

    sub fixhead {
        $length = length($line);
        for ($i = 0;$i <= $length; ++$i){
            if (substr($line,0,1) eq "X"){
                $line = substr($line,1,$length-1);
            } else {
                return;
            }
        }
    }

    sub fixtail {
        $length = length($line);
        for ($i = 0;$i <= $length; ++$i){
            if (substr($line,$length-($i+1),1) eq "X"){
                $line = substr($line,0,$length-($i+1));
            } else {
                return;
            }
        }
    }
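The two loops above remove the leading and trailing runs of X one character at a time. The same trimming can be sketched with two anchored substitutions; this is an alternative formulation, not the code from the slides, and the helper name trim_x is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip masked bases (X) from both ends of a sequence,
# leaving any X runs in the middle untouched.
sub trim_x {
    my ($seq) = @_;
    $seq =~ s/^X+//;    # leading run of X
    $seq =~ s/X+$//;    # trailing run of X
    return $seq;
}

print trim_x("XXXGGCACXXXAATGXX"), "\n";   # GGCACXXXAATG
```

Because the subroutine works on its own copy of the argument and returns the result, it does not depend on a global $line the way fixhead and fixtail do.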

Parsing a DNA sequence

    >sequence1
    GGCACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAA
    ATAGAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTG
    TGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTA
    CCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAAT
    CAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAA
    GCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAC
    >sequence2
    GGCGATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTG
    TGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTA
    CCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAAT
    CAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAA
    GCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAC

Creating DNA sequence fragments

    >chr4
    GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTTCGAGG
    ATTCGTCGACCAGGTGTTGGAATCGTCGACCGAGTCTGAGAATTCGTAGACCAGGACGGC
    GGAATCCTCGACAATGACGAGGTATGGTCGAGGAAAATCTATCGGGTTCGAGGATTCGTC
    TACCAGGTGATGGAATCCTCGACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTT
    GTTATTCCGATCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACG
    GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGCCGATG
    AGGATAAGAGGTCGATCGGATGGACGGAAGAGGTAGAGGAAGAGCCATGAAGCGGCGAGG
    CATAGGAGGAGGATGAGCGAGAATGGGTGGGCGGGAAGAGAGAAACTGATGATCAGAGCG
    ATGATGCAGACGTAATTCACCCTGAAATAAGAGGAGTTCTTCCAGAATCGCGTCATGGCC
    TAAGGGTTAGGGGTTAAGGGTTAAGGGTTTAGGGTTAAGGGTTAAGGGTTTAGGGTTTAG
    GGTTTAGG

Parsing a DNA sequence

    #!/usr/bin/perl
    #
    $infile = $ARGV[0];
    $break_length = $ARGV[1];
    $overlap_length = $ARGV[2];
    $seq_count = 0;
    $count = 0;
    $fileflag = 0;

    open (IN, "< $infile" ) || die "can not open input file for reading: $!\n";
    while (<IN>) {
        if (!(/^\>/ )) {
            chomp;
            push(@line,$_);
        }
    }
    $seq = join("",@line);
    $length = length($seq);
    $nfrag = int($length/$break_length);
    $frag_length = $break_length + $overlap_length;
    print "The break length = $break_length\n";
    print "The overlap length = $overlap_length\n";
    print "The total length of the sequence = $length\n";
    print "The total length of each fragment = $frag_length\n";
    print "The total number of fragments = $nfrag\n\n";

Parsing a DNA sequence

    for ($i = 0;$i <= $nfrag; ++$i){
        $start = $i * $break_length;
        $stop = $i * $break_length+$frag_length;
        $frag = substr($seq,$start,$frag_length);
        # $outfile = $infile.sprintf("_%5.5d_%5.5d",$start,$stop);
        $outfile = $infile."_".$start."_".$stop;
        open( OUT, "> $outfile" ) || die "Can not open output file\n";
        print "Writing framgment from $start to $stop to fragment file $outfile\n";
        print OUT "$outfile $start $stop\n";
        write OUT;
    }

    format OUT =
    ~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $frag
    .
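The fragment arithmetic can be tried on a short string before running it on a real sequence file. The sketch below returns the fragments instead of writing files; the helper name fragments is hypothetical, and unlike the loop above it steps through start positions directly, so it produces no empty final fragment when the sequence length is an exact multiple of the break length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cut $seq into pieces that start every $break
# characters and are $break + $overlap characters long (the last pieces
# may be shorter). Returns a list of [start, fragment] pairs.
sub fragments {
    my ($seq, $break, $overlap) = @_;
    die "break length must be positive\n" unless $break > 0;
    my @frags;
    for (my $start = 0; $start < length($seq); $start += $break) {
        push @frags, [ $start, substr($seq, $start, $break + $overlap) ];
    }
    return @frags;
}

for my $f (fragments("ABCDEFGHIJ", 4, 2)) {
    print "$f->[0] $f->[1]\n";
}
# prints:
# 0 ABCDEF
# 4 EFGHIJ
# 8 IJ
```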

Parsing a DNA sequence

    yuk.pl short 50 5

    The break length = 50
    The overlap length = 5
    The total length of the sequence = 608
    The total length of each fragment = 55
    The total number of fragments = 12

    Writing framgment from 0 to 55 to fragment file short_0_55
    Writing framgment from 50 to 105 to fragment file short_50_105
    Writing framgment from 100 to 155 to fragment file short_100_155
    Writing framgment from 150 to 205 to fragment file short_150_205
    Writing framgment from 200 to 255 to fragment file short_200_255
    Writing framgment from 250 to 305 to fragment file short_250_305
    Writing framgment from 300 to 355 to fragment file short_300_355
    Writing framgment from 350 to 405 to fragment file short_350_405
    Writing framgment from 400 to 455 to fragment file short_400_455
    Writing framgment from 450 to 505 to fragment file short_450_505
    Writing framgment from 500 to 555 to fragment file short_500_555
    Writing framgment from 550 to 605 to fragment file short_550_605
    Writing framgment from 600 to 655 to fragment file short_600_655
    .

Parsing a DNA sequence

    more short_*

    short_0_55 0 55
    GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTT
    ...skipping...
    short_100_155 100 155
    AATTCGTAGACCAGGACGGCGGAATCCTCGACAATGACGAGGTATGGTCGAGGAA
    ...skipping...
    short_150_205 150 205
    AGGAAAATCTATCGGGTTCGAGGATTCGTCTACCAGGTGATGGAATCCTCGACCA
    ...skipping...
    short_200_255 200 255
    GACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTTGTTATTCCGATCATG
    ...skipping...
    short_250_305 250 305
    TCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACGGCGAT
    ...skipping...
    short_300_355 300 355
    GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGC
    ...skipping...

Convert seq to fasta format

    ls *.seq
    tc86660.seq  tc86662.seq  tc86664.seq  tc86666.seq  tc86668.seq
    tc86661.seq  tc86663.seq  tc86665.seq  tc86667.seq  tc86669.seq

Convert seq to fasta format

    source /usr/local/gcg/gcgstartup
    gcg
    submitfasta.pl

Convert seq to fasta format

    #!/usr/bin/perl
    @list =`ls -1 *.seq`;       # one file name per line
    foreach $i (@list) {
        chomp($i);              # remove the trailing newline from each name
        system("/usr/local/gcg_10.3/solarisbin/gcgbin/execute/tofasta $i -Default");
    }

Convert seq to fasta format

    ls
    README            tc86661.tfa  tc86664.tfa  tc86667.tfa
    submitfasta.pl*   tc86662.seq  tc86665.seq  tc86668.seq
    submitfasta.pl~*  tc86662.tfa  tc86665.tfa  tc86668.tfa
    tc86660.seq       tc86663.seq  tc86666.seq  tc86669.seq
    tc86660.tfa       tc86663.tfa  tc86666.tfa  tc86669.tfa
    tc86661.seq       tc86664.seq  tc86667.seq
Parsing Blast output

    <?xml version="1.0"?>
    <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd" >
    <BlastOutput>
      <BlastOutput_program>blastn</BlastOutput_program>
      <BlastOutput_version>blastn 2.2.5 [Nov-16-2002]</BlastOutput_version>
      <BlastOutput_reference>~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~&quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
      <BlastOutput_db>ecoli.nt</BlastOutput_db>
      <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
      <BlastOutput_query-def>sequence1</BlastOutput_query-def>
      <BlastOutput_query-len>319</BlastOutput_query-len>
      <BlastOutput_param>
        <Parameters>
          <Parameters_expect>10</Parameters_expect>
          <Parameters_sc-match>1</Parameters_sc-match>
          <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
          <Parameters_gap-open>5</Parameters_gap-open>
          <Parameters_gap-extend>2</Parameters_gap-extend>
          <Parameters_filter>D</Parameters_filter>
        </Parameters>
      </BlastOutput_param>

Parsing Blast output

      <BlastOutput_iterations>
        <Iteration>
          <Iteration_iter-num>1</Iteration_iter-num>
          <Iteration_hits>
            <Hit>
              <Hit_num>1</Hit_num>
              <Hit_id>gi|1789957|gb|AE000431.1|AE000431</Hit_id>
              <Hit_def>Escherichia coli K-12 MG1655 section 321 of 400 of the complete genome</Hit_def>
              <Hit_accession>AE000431</Hit_accession>
              <Hit_len>11575</Hit_len>
              <Hit_hsps>
                <Hsp>
                  <Hsp_num>1</Hsp_num>
                  <Hsp_bit-score>30.2282</Hsp_bit-score>
                  <Hsp_score>15</Hsp_score>
                  <Hsp_evalue>1.12539</Hsp_evalue>
                  <Hsp_query-from>267</Hsp_query-from>
                  <Hsp_query-to>253</Hsp_query-to>
                  <Hsp_hit-from>8485</Hsp_hit-from>
                  <Hsp_hit-to>8499</Hsp_hit-to>
                  <Hsp_query-frame>1</Hsp_query-frame>
                  <Hsp_hit-frame>-1</Hsp_hit-frame>
                  <Hsp_identity>15</Hsp_identity>
                  <Hsp_positive>15</Hsp_positive>
                  <Hsp_align-len>15</Hsp_align-len>
                  <Hsp_qseq>GCTAATCACTTTATT</Hsp_qseq>
                  <Hsp_hseq>GCTAATCACTTTATT</Hsp_hseq>
                  <Hsp_midline>|||||||||||||||</Hsp_midline>
                </Hsp>
              </Hit_hsps>
            </Hit>
            <Hit>
              <Hit_num>2</Hit_num>
              <Hit_id>gi|1789185|gb|AE000366.1|AE000366</Hit_id>
    .

Parsing Blast output

    more test.out.1.xls
    sequence1</BlastOutput_query-def>  319

    more test.out.1.xls
    sequence1</BlastOutput_query-def>  1  Esch  AE000431  30.2282  1.12539  11575  1
    sequence1</BlastOutput_query-def>  2  Esch  AE000366  30.2282  1.12539  10405  1
    sequence1</BlastOutput_query-def>  3  Esch  AE000467  28.2459  4.44683  15633  1
    sequence1</BlastOutput_query-def>  4  Esch  AE000410  28.2459  4.44683  10826  1
    sequence1</BlastOutput_query-def>  5  Esch  AE000300  28.2459  4.44683  16939  1
    sequence1</BlastOutput_query-def>  6  Esch  AE000220  28.2459  4.44683   9780  1
    sequence1</BlastOutput_query-def>  7  Esch  AE000170  28.2459  4.44683  10627  1
    sequence1</BlastOutput_query-def>  8  Esch  AE000123  28.2459  4.44683  11093  1

    more test.out.1.xls
    sequence1</BlastOutput_query-def>  1  1  267  253  8485  8499  15  15  100.00  0  30.2282  1.12539
    sequence1</BlastOutput_query-def>  2  1   59   73  7824  7838  15  15  100.00  0  30.2282  1.12539
    sequence1</BlastOutput_query-def>  3  1  101  114  9628  9641  14  14  100.00  0  28.2459  4.44683
    sequence1</BlastOutput_query-def>  4  1  160  147  2067  2080  14  14  100.00  0  28.2459  4.44683
    sequence1</BlastOutput_query-def>  5  1   22    9  2971  2984  14  14  100.00  0  28.2459  4.44683
    sequence1</BlastOutput_query-def>  6  1   95  108  5209  5222  14  14  100.00  0  28.2459  4.44683
    sequence1</BlastOutput_query-def>  7  1   33   16  2344  2361  18  17   94.44  0  28.2459  4.44683
    sequence1</BlastOutput_query-def>  8  1   40   53  2390  2403  14  14  100.00  0  28.2459  4.44683

Executing applications in a queue

    foreach $i (@files) {
        ++$count;
        print $i;
        chomp($i);
        ($prefix = $i) =~ s/\.pdb//;
        `cp $dir/$i .`;
        &wait();
        `./run.pl $i`;
    }

    sub wait {
        loop:
        $check = `llq -u duany | grep " [IR] "|wc`;
        @check = split(' ',$check);     # wc separates its counts with spaces;
                                        # $check[0] is the number of lines
        print "There are $check[0] in the queue\n";
        if ($check[0] > 5) {
            print "I am sleeping\n";
            sleep 60;
            goto loop;
        } else {
            print "I am awake\n";
            print "I am right now working on $i\n";
            return;
        }
    }

More info

CPAN - Comprehensive Perl Archive Network
    http://www.cpan.org
    Source, binaries, libs, scripts, FAQs, links

Perl Resource Topics
    http://www.perl.com/pub/q/resources

Bioperl
    http://bioperl.org/

Others
    http://www.netcat.co.uk/rob/perl/win32perltut.html
    http://www.1001tutorials.com/perltut/index.html
    http://www.perlmasters.com/tutorial
    http://www-2.cs.cmu.edu/cgi-bin/perl-man
    Countless more are available...

Contact the Institute for additional help

Yuk Sham
Computational Biology/Biochemistry Consultant
Phone: (612) 624 7427 (direct)
Email: [email protected]
