.. _week_five: *************************** Week Five 14-Feb-12 *************************** ======= Tuesday ======= .. note:: Learning Goals: * BLAST ASN formatting * open files in python * parse BLAST output with python * for loop * split strings Lecture --------- **BLAST ASN format** write a shell script called week5_Tues.sh:: run BLASTN vs 16SMicrobial database use file /jscbio2710/week5.fa make the output ASN format | reformat the output using blast_formatter | command to give hit table format | without headers (-outfmt 6) completed script:: #!/bin/bash BLASTN=/mnt/blast/ncbi-blast-2.2.25+/bin/blastn BLASTFORMATTER=/mnt/blast/ncbi-blast-2.2.25+/bin/blast_formatter DB=$HOME/blast/databases/16SMicrobial QUERY=week5.fa OUTFILE=$QUERY.blast.asn # /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa -outfmt 11 -out testThree.fa.blast.asn echo "Running BLASTN" echo "query: $QUERY" echo "db: $DB" $BLASTN -db $DB -query $QUERY -outfmt 11 -out $OUTFILE echo "Finished BLASTN" $BLASTFORMATTER -archive $OUTFILE -outfmt 6 -out OUTFILE.table **Parsing BLAST output with python** *for loops* *arrays* *reading files line by line* write a python program that reads blast output from table format:: #!/usr/bin/env python """ James Vincent Feb. 14, 2012 Read a BLAST output file in table format Print only certain columns from the hit table Table format from outfmt 6 should have these fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score """ import sys # get blast file name from command line blastFileName = sys.argv[1] # open the input file (blast file) blastFile = open(blastFileName) # loop over each line in the file for thisLine in blastFile.readlines(): lineParts = thisLine.strip().split('\t') print '%s\t%s\t%5.1f\t%d'% (lineParts[0],lineParts[1],float(lineParts[2]),int(lineParts[11])) Homework --------- **Reading** * http://learnpythonthehardway.org/book/ Exercises 18 and 19 **Exercises** - complete Exercises 18,19 in the python reading above **Turn In** - turn in python exercises 18,19 - put them in the proper homework directory in your home on the AWS server ( week 5, Tuesday) - write a shell script to run BLAST on the file week5.fa from today's class vs 16SMicrobial - store BLAST output in ASN format - reformat the ASN blast output to hit table with only these fields: Query Seq-id, Subject Seq-id, score, evalue - write a python program that parses the reformatted BLAST output table - make the program print the average score from all hits ======= Thursday ======= .. note:: Learning Goals: * for loops, lists again * boolean expressions * conditional statements * if statement * BLAST with python program analysis Lecture --------- **Example 1** *for loops*:: #!/usr/bin/env python """ James Vincent Feb 16, 2012 Week 5 - Thurs example1.py Loop over items in a list """ myList = ['a', 'b', 'c', 'd', 'e'] for myItem in myList: print myItem *output*:: a b c d e **Example 2** *lists and slice notation*:: #!/usr/bin/env python """ James Vincent Feb 16, 2012 Week 5 - Thurs example2.py Print items from a list with slice notation """ myList = ['a', 'b', 'c', 'd', 'e'] print "The first item in the list: ",myList[0] print "The second item in the list: ",myList[1] print # Slice notation [n:m] means from the n-th item up to the m-th item, # but excluding the last item at m print "Slice [0:2] returns items at " print "index 0 and index 1 but not at index 2 : ",myList[0:2] print # Another way of stating slice notation is [start:end] # The item and start will be returned up to the end-1 item print "Slice [2:5] returns items at " print "index 2, 3 and 4 but not 5: ",myList[2:5] *output*:: The first item in the list: a The second item in the list: b Slice [0:2] returns items at index 0 and index 1 but not at index 2 : ['a', 'b'] Slice [2:5] returns items at index 2, 3 and 4 but not 5: ['c', 'd', 'e'] **Example 3** *for loops and boolean expressions*:: #!/usr/bin/env python """ James Vincent Feb 16, 2012 Week 5 - Thurs example3.py Conditionally print items from a list """ myList = ['a', 'b', 'c', 'd', 'e'] # loop over items in the list for thisItem in myList: if thisItem == 'b': print "This item should be b: ", thisItem myNumberList = [1,2,3,4,5,6,7,8,9,10] for myNum in myNumberList: if myNum > 4: print "I am > 4 :", myNum **Example 4** *loop over lines in a file*:: Create a file called myNumbers.txt: a 1 b 2 c 3 d 4 e 5 """ James Vincent Feb 16, 2012 Week 5 - Thurs example4.py Open a text file of numbers Loop over each line in the file Conditionally print numbers from each line """ import sys # Get input file name from command line argument: # ./example4.py infile.txt myInfileName = sys.argv[1] infile = open(myInfileName) # loop over each line in the file for thisLine in infile.readlines(): print thisLine ip138067:Thurs jjv5$ ./example4.py myNumbers.txt a 1 b 2 c 3 d 4 e 5 **Example 5** *loop over lines and count scores, average of items in each line*:: Use same file myNumberstxt: a 1 b 2 c 3 d 4 e 5 #!/usr/bin/env python """ James Vincent Feb 16, 2012 Week 5 - Thurs example4.py Open a text file of numbers Loop over each line in the file Compute the average of values in the second field that are greater than 3 """ import sys # Get input file name from command line argument: # ./example4.py infile.txt myInfileName = sys.argv[1] infile = open(myInfileName) mySum = 0.0 myCount = 0 # loop over each line in the file for thisLine in infile.readlines(): # Split the ine into a list of fields myFields = thisLine.strip().split() # Stor ethe second field as an integer value thisScore = int(myFields[1]) # If the score is greater than 3, accumulate in our sum if thisScore > 3: mySum = mySum + thisScore myCount = myCount + 1 # After reading all lines print the sum and average print "Sum of all scores > 3: ", mySum print "Number of scores > 3: ",myCount print "Average of scores > 3: ",mySum/myCount *output*:: ip138067:Thurs jjv5$ ./example5.py myNumbers.txt Sum of all scores > 3: 9.0 Number of scores > 3: 2 Average of scores > 3: 4.5 Homework --------- **Reading** * We will begin using an additional python book:: http://www.openbookproject.net/thinkcs/python/english2e/index.html * Read the first three sections. These are mostly review of what we have already covered * Section 3 contains new material on functions that we have not covered in class http://www.openbookproject.net/thinkcs/python/english2e/ch01.html http://www.openbookproject.net/thinkcs/python/english2e/ch02.html http://www.openbookproject.net/thinkcs/python/english2e/ch03.html **Exercises** - complete the short exercises in each section of the reading above - if you did not complete the homework from Tues go back and finish it - contact the class TA for help if needed **Turn In** - turn in the exercises at the end of Section 3 only - put them in the proper homework directory in your home on the AWS server ( week5/Thurs) - the exercises include 3 numbered questions all using a file named tryme3.py - name the files tryme3-1.py, tryme3-2.py, tryme3-3.py for each of the three questions - the fourth exercise asks you to create a file named import_test.py. Use that file name.