.. _week_Nine:
***************************
Week Nine 13-Mar-12
***************************
=======
Tuesday
=======
.. note:: Learning Goals:
* Parse BLAST output to give only GI number
* Use python Counter class
* Install BioPython
|
|
Video
-----
.. raw:: html
|
|
Lecture
---------
**User defined functions for parsing BLAST output**::
#!/usr/bin/env python
"""
James Vincent
Mar 13 , 2012
parseBlast.py
Open a text file
loop over lines
split lines into fields
Sum numbers from certain field
"""
import sys
def getScore(blastLine):
""" parse blast output line and return score """
# BLAST input file has hit lines like this:
# fmt "6 qseqid sseqid evalue bitscore"
# 1 gi|219856848|ref|NR_024667.1| 0.0 2551
myFields = blastLine.strip().split()
thisScore = float(myFields[3])
return thisScore
def getGI(blastLine):
""" parse blast output line and return GI of hit """
# blast line looks like this:
# 1 gi|219856848|ref|NR_024667.1| 0.0 2551
# spit line into fields by white space
myFields = blastLine.strip().split()
# split subject (hit) ID field on |
myIDs = myFields[1].split("|")
thisGI= int(myIDs[1])
return thisGI
# Get file name
myInfileName = sys.argv[1]
infile = open(myInfileName)
mySum = 0.0
myCount = 0
# loop over each line in the file
for thisLine in infile.readlines():
thisScore = getScore(thisLine)
thisGI = getGI(thisLine)
print thisGI,thisScore
# Accumulate scores greater than 3
if thisScore > 2600:
# accumulate scores
mySum = mySum + thisScore
# count number of scores matching
myCount = myCount + 1
# Print sum, count and average
print "Sum is: ",mySum
print "Count is: ",myCount
if myCount > 0:
print "Average is: ",mySum/myCount
else:
print "No scores"
**How to use python Counter**::
#!/usr/bin/env python
"""
James Vincent
Mar 13 , 2012
myCounter.py
Demonstrate use of python Counter
http://docs.python.org/library/collections.html
"""
import sys
from collections import Counter
myCounter = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue','red', 'blue', 'red', 'green', 'blue', 'blue']:
myCounter[word] += 1
print myCounter
print "Top two most common: ", myCounter.most_common(2)
# Sum of all values:
print sum(myCounter.values())
.. NOTE:: wget http://biopython.org/DIST/biopython-1.59.tar.gz
**Install BioPython**::
login1$ cd $WORK
login1$ mkdir Software
login1$ cd Software
login1$ wget http://biopython.org/DIST/biopython-1.59.tar.gz
--2012-03-13 04:48:40-- http://biopython.org/DIST/biopython-1.59.tar.gz
Resolving biopython.org... 208.94.50.58
Connecting to biopython.org|208.94.50.58|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8577492 (8.2M) [application/x-gzip]
Saving to: `biopython-1.59.tar.gz'
100%[================================================================>] 8,577,492 1.42M/s in 6.1s
2012-03-13 04:48:46 (1.34 MB/s) - `biopython-1.59.tar.gz' saved [8577492/8577492]
login1$ tar zxf biopython-1.58.tar.gz
login1$ cd biopython-1.58
login1$ module load python
login1$ python ./setup.py install --user
running install
running build
running build_py
running build_ext
running install_lib
running install_egg_info
Removing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info
Writing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info
login1$
login1$ python
Python 2.7.1 (r271:86832, Sep 20 2011, 09:42:22)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>>
Homework
--------
**Reading**
- http://docs.python.org/library/collections.html
- http://docs.python.org/tutorial/controlflow.html#defining-functions
**Exercises**
- Follow the exercises in the reading above. Complete as many of them as you can
**Turn In**
Write a python program to parse BLAST table formatted output:
* write a function to parse each line and return the GI
* use a Counter object to count how many times each GI number was seen
* print the top five GI numbers and their counts at the end of the program
* use week9.fa as the query file for BLAST
* You can download week9.fa from this link ::
http://bit.ly/yoK27C
You can copy it directly to your lonestar account with the wget
program. Log in to lonestar and move to a directory where you
would like to place the file, then call wget:
wget -O
Example:
login1$ cd /work/00921/tg801771/JSCBIO2710/lectures/week9/Thurs/
login1$ wget http://bit.ly/yoK27C -O week9.fa