.. _week_Nine: *************************** Week Nine 13-Mar-12 *************************** ======= Tuesday ======= .. note:: Learning Goals: * Parse BLAST output to give only GI number * Use python Counter class * Install BioPython | | Video ----- .. raw:: html | | Lecture --------- **User defined functions for parsing BLAST output**:: #!/usr/bin/env python """ James Vincent Mar 13 , 2012 parseBlast.py Open a text file loop over lines split lines into fields Sum numbers from certain field """ import sys def getScore(blastLine): """ parse blast output line and return score """ # BLAST input file has hit lines like this: # fmt "6 qseqid sseqid evalue bitscore" # 1 gi|219856848|ref|NR_024667.1| 0.0 2551 myFields = blastLine.strip().split() thisScore = float(myFields[3]) return thisScore def getGI(blastLine): """ parse blast output line and return GI of hit """ # blast line looks like this: # 1 gi|219856848|ref|NR_024667.1| 0.0 2551 # spit line into fields by white space myFields = blastLine.strip().split() # split subject (hit) ID field on | myIDs = myFields[1].split("|") thisGI= int(myIDs[1]) return thisGI # Get file name myInfileName = sys.argv[1] infile = open(myInfileName) mySum = 0.0 myCount = 0 # loop over each line in the file for thisLine in infile.readlines(): thisScore = getScore(thisLine) thisGI = getGI(thisLine) print thisGI,thisScore # Accumulate scores greater than 3 if thisScore > 2600: # accumulate scores mySum = mySum + thisScore # count number of scores matching myCount = myCount + 1 # Print sum, count and average print "Sum is: ",mySum print "Count is: ",myCount if myCount > 0: print "Average is: ",mySum/myCount else: print "No scores" **How to use python Counter**:: #!/usr/bin/env python """ James Vincent Mar 13 , 2012 myCounter.py Demonstrate use of python Counter http://docs.python.org/library/collections.html """ import sys from collections import Counter myCounter = Counter() for word in ['red', 'blue', 'red', 'green', 'blue', 'blue','red', 'blue', 'red', 'green', 'blue', 'blue']: myCounter[word] += 1 print myCounter print "Top two most common: ", myCounter.most_common(2) # Sum of all values: print sum(myCounter.values()) .. NOTE:: wget http://biopython.org/DIST/biopython-1.59.tar.gz **Install BioPython**:: login1$ cd $WORK login1$ mkdir Software login1$ cd Software login1$ wget http://biopython.org/DIST/biopython-1.59.tar.gz --2012-03-13 04:48:40-- http://biopython.org/DIST/biopython-1.59.tar.gz Resolving biopython.org... 208.94.50.58 Connecting to biopython.org|208.94.50.58|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 8577492 (8.2M) [application/x-gzip] Saving to: `biopython-1.59.tar.gz' 100%[================================================================>] 8,577,492 1.42M/s in 6.1s 2012-03-13 04:48:46 (1.34 MB/s) - `biopython-1.59.tar.gz' saved [8577492/8577492] login1$ tar zxf biopython-1.58.tar.gz login1$ cd biopython-1.58 login1$ module load python login1$ python ./setup.py install --user running install running build running build_py running build_ext running install_lib running install_egg_info Removing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info Writing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info login1$ login1$ python Python 2.7.1 (r271:86832, Sep 20 2011, 09:42:22) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import Entrez >>> Homework -------- **Reading** - http://docs.python.org/library/collections.html - http://docs.python.org/tutorial/controlflow.html#defining-functions **Exercises** - Follow the exercises in the reading above. Complete as many of them as you can **Turn In** Write a python program to parse BLAST table formatted output: * write a function to parse each line and return the GI * use a Counter object to count how many times each GI number was seen * print the top five GI numbers and their counts at the end of the program * use week9.fa as the query file for BLAST * You can download week9.fa from this link :: http://bit.ly/yoK27C You can copy it directly to your lonestar account with the wget program. Log in to lonestar and move to a directory where you would like to place the file, then call wget: wget -O Example: login1$ cd /work/00921/tg801771/JSCBIO2710/lectures/week9/Thurs/ login1$ wget http://bit.ly/yoK27C -O week9.fa