.. _week_Eight:

***************************
Week Eight 6-Mar-12
***************************

=======
Tuesday
=======

.. note::

   Learning Goals:

   * Log in to lonestar.tacc.teragrid.org
   * Recreate working directories
   * Submit a job through qsub

|
|
|
|

Lecture
---------

**Texas Advanced Computing Center: TACC**

**lonestar.tacc.teragrid.org**

You should have received login details from XSEDE for your new account.

Log in to the TACC lonestar cluster lonestar.tacc.teragrid.org::

    jjv5$ ssh tg801771@lonestar.tacc.teragrid.org

Make sure we are using the bash shell::

    login1$ echo $SHELL
    /bin/bash

    # If needed we can change the default shell to bash.
    # chsh -l lists the available shells:
    login1$ chsh -l
    /bin/sh
    /bin/bash
    /sbin/nologin
    /bin/tcsh
    /bin/csh
    /bin/ksh
    /bin/zsh

|
|
|
|

**Recreate directory structure**

.. IMPORTANT::

   All files should be placed in the $WORK directory.

Create directories in $WORK::

    login2$ cd $WORK
    login2$ mkdir quiz homework projects
    login2$ ls
    homework  projects  quiz

Create any other directories as needed.

|
|
|
|

**Transfer files from AWS EC2 server to lonestar**

Open a second terminal window::

    # Log in to the EC2 server
    $ ssh ec2-23-20-18-242.compute-1.amazonaws.com
    jjv5@ec2-23-20-18-242.compute-1.amazonaws.com's password:
    $ cd lectures/
    $ ls
    week5
    $ cd week5/
    $ ls
    Thurs  Tues
    $ cd Thurs/
    $ ls

    # use sftp to connect to lonestar
    $ sftp tg801771@lonestar.tacc.teragrid.org
    Connecting to lonestar.tacc.teragrid.org...
    The authenticity of host 'lonestar.tacc.teragrid.org (129.114.53.21)' can't be established.
    RSA key fingerprint is 5c:36:42:99:aa:2d:52:58:70:3a:20:c2:3a:33:e4:2f.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'lonestar.tacc.teragrid.org,129.114.53.21' (RSA) to the list of known hosts.
    Password:

    # transfer files as needed
    sftp> cd lectures
    sftp> cd week5
    sftp> cd Thurs
    sftp> lls
    4            example2.py  example4.py  myNumbers.txt  runBlast.sh  week5.fa.blast.asn
    example1.py  example3.py  example5.py  parseBlast.py  week5.fa
    sftp> put runBlast.sh
    Uploading runBlast.sh to /home1/00921/tg801771/lectures/week5/Thurs/runBlast.sh
    runBlast.sh                                   100%  815     0.8KB/s   00:00
    sftp>

|
|
|
|

**Use scp to transfer whole directories**

.. Note::

   ftp (sftp) clients generally do not have a recursive option. It is
   difficult to transfer entire directories with an interactive ftp client.
   Other methods include making a single tar file containing all files or
   using a transfer method that does support recursion. wget and scp both
   support recursion. For moving large files, Globus Online is preferred:
   https://www.globusonline.org/

Secure copy (scp) can recursively copy whole directories::

    ip138067:~ jjv5$ ssh tg801771@lonestar.tacc.teragrid.org
    Password:
    Last login: Tue Mar 6 03:51:47 2012 from ip138067.uvm.edu
    ------------------------------------------------------------------------------
    Welcome to the Lonestar4 Westmere/QDR IB Linux Cluster
    Texas Advanced Computing Center, The University of Texas at Austin
    ------------------------ Disk quotas for user tg801771 ------------------------
    | Disk     Usage (GB)   Limit   %Used   File Usage     Limit   %Used |
    | /home1          1.1     1.1   98.11         1300   1001000    0.13 |
    | /work          40.4   250.0   16.15        58255    500000   11.65 |
    -------------------------------------------------------------------------------
    login1$
    login1$ cd $WORK
    login1$ scp -r jjv5@ec2-23-20-18-242.compute-1.amazonaws.com:homework .
    jjv5@ec2-23-20-18-242.compute-1.amazonaws.com's password:

.. WARNING::

   scp will overwrite files by default, without warning.

scp can be used to transfer files in either direction::

    scp [[user@]host1:]file1 [...] [[user@]host2:]file2

    # From this host, directory mydir, to other host:
    scp -r mydir user@otherhost:/tmp

    # From remote host, directory mydir, to here:
    scp -r user@otherhost:mydir .
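The two scp directions shown above can also be assembled from a script. The following is a minimal Python 3 sketch that only builds and prints the argument list for ``scp -r``; the helper name ``scp_recursive_command`` is hypothetical and ``user@otherhost`` is the same placeholder host used in the usage lines, so nothing is transferred when it runs.

```python
import shlex


def scp_recursive_command(source, destination):
    """Return the argument list for a recursive scp copy.

    Either argument may be a local path or a remote spec of the
    form user@host:path, exactly as on the scp command line.
    """
    return ["scp", "-r", source, destination]


# From this host, directory mydir, to the other host
push = scp_recursive_command("mydir", "user@otherhost:/tmp")

# From the remote host, directory mydir, to the current directory
pull = scp_recursive_command("user@otherhost:mydir", ".")

# Show the commands instead of running them; a list like this could be
# handed to subprocess.run() to start a real transfer.
print(" ".join(shlex.quote(arg) for arg in push))
print(" ".join(shlex.quote(arg) for arg in pull))
```

Keeping the command as a list rather than a single string avoids quoting problems if a directory name contains spaces.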
|
|
|
|

**Create a job script**

Create the script runHello.sh shown below::

    #!/bin/bash
    #$ -pe 1way 12              # 12 cores per node - must take them all
    #$ -q development           # Queue name
    #$ -N helloWorld
    #$ -A TG-MCB120034
    #$ -V                       # inherit submission env
    #$ -j y                     # combine stderr & stdout into stdout
    #$ -o $JOB_NAME.o$JOB_ID    # Name of the output file (eg. myMPI.oJobID)
    #$ -l h_rt=00:05:00         # Run time (hh:mm:ss)
    #$ -M jjv5.jjv5@gmail.com
    #$ -m bea

    echo "Hello, I am running"
    date
    hostname

**Submit the job to the development queue**

The queue is specified in the job script itself::

    qsub runHello.sh

Monitor the job with the qstat command::

    login2$ qstat
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
    -----------------------------------------------------------------------------------------------------------------
     479531 0.00000 helloWorld tg801771     qw    02/28/2012 05:10:32                                   12

Homework
---------

**Reading**

- http://www.tacc.utexas.edu/user-services/user-guides/lonestar-user-guide

  - read the first several sections through File Systems
  - pay attention to the File Systems section description of $WORK, $HOME and $SCRATCH

- read the man pages for qsub and qstat

**Exercises**

**Turn In**

1. Transfer files from the AWS EC2 server to lonestar

   | put everything in your $WORK directory
   | bash shell scripts
   | python programs
   | 16S blast database

2. Create a shell script to run on lonestar using qsub that does the following

   | cd to your $WORK directory
   | list all files
   | print the date

   - be sure to use the proper qsub options and resource specifications in your script
   - use the development queue to make sure the job runs properly
   - when you are sure it runs correctly, change the queue name to 'normal'
   - use qstat to monitor how long it takes before your job runs in the 'normal' queue
   - leave the script in your $WORK/week_8/Tues homework folder on lonestar

3. Create a job script on lonestar that runs a BLAST job

   - use the week5.fa file (from the AWS EC2 server) as the input query file
   - use the 16SMicrobial database as the database
   - leave the script and output in your $WORK/week_8/Tues homework folder on lonestar

|
|
|

========
Thursday
========

.. note::

   Learning Goals:

   * Create a complete qsub script for a BLAST job
   * Parse BLAST output with a python program using functions
   * Resubmit entire job with different parameters

|
|
|
|

Lecture
---------

|
|

**Create a job script**

Add basic qsub parameters to an otherwise empty script::

    #!/bin/bash
    #$ -V                        # inherit shell environment
    #$ -l h_rt=00:05:00          # wall time limit
    #$ -q development            # run in dev q
    #$ -pe 1way 12
    #$ -A TG-MCB120034
    #$ -N Hello
    #$ -cwd
    #$ -j y
    #$ -M jjv5.jjv5@gmail.com    # Mail address
    #$ -m bea                    # send mail when job starts, stops or aborts

    #module load blast

    echo "Hello"

|
|
|

.. WARNING::

   qsub recognizes #$ as meaningful. Make sure your commented lines do not
   begin with #$. For example::

       #$BLASTN -db .....

   will cause qsub to interpret the line as an option string and thus fail.
   Put a space after the # to correct::

       # $BLASTN -db ...
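One way to catch this mistake before submitting is to scan the job script for lines that start with ``#$`` but do not continue with an option. The Python 3 sketch below is a hypothetical helper (the name ``find_suspect_directives`` is an assumption, not a TACC or Grid Engine tool). It assumes that every real directive continues with an option flag such as ``-q``, which is a simplification of how qsub reads these lines.

```python
def find_suspect_directives(script_text):
    """Return (line_number, line) pairs for lines that begin with '#$'
    but do not look like a qsub option such as '#$ -q development'."""
    suspects = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if not line.startswith("#$"):
            continue
        rest = line[2:].lstrip()
        # Assumption: a real directive continues with an option flag.
        if not rest.startswith("-"):
            suspects.append((number, line))
    return suspects


example_script = """#!/bin/bash
#$ -V
#$ -q development
#$BLASTN -db $DB -query $QUERY
# $BLASTN -db $DB -query $QUERY
echo "Hello"
"""

for number, line in find_suspect_directives(example_script):
    print("line %d looks like a commented-out command: %s" % (number, line))
```

In this example only the fourth line is reported; the fifth line has a space after the ``#`` and is treated as an ordinary comment.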
Add comments describing tasks and variables needed::

    #!/bin/bash
    #$ -V                        # inherit shell environment
    #$ -l h_rt=00:05:00          # wall time limit
    #$ -q development            # run in dev q
    #$ -pe 1way 12
    #$ -A TG-MCB120034
    #$ -N Hello
    #$ -cwd
    #$ -j y

    #------------------------
    #
    # James Vincent
    # March 8, 2012
    #
    # Run blast on week5.fa vs 16SMicrobial database
    # Reformat output to include Query seq-id, subject seq-id, score and e-value
    #
    #------------------------

    # BLAST programs and variables

    # TACC lonestar uses module system to provide blast
    module load blast

    # Database
    DB=$WORK/JSCBIO2710/blast/databases/16SMicrobial

    # Query
    QUERY=week5.fa
    OUTFILE=$QUERY.blast.asn

    # BLAST output format: 11 is ASN, 6 is table no header
    OUTFMT=11

    # BLAST programs loaded by module command
    BLASTN=blastn
    BLAST_FORMATTER=blast_formatter
    BLASTDBCMD=blastdbcmd

    # Run blast
    # $BLASTN -db $DB -query $QUERY -outfmt $OUTFMT -out $OUTFILE

    # Reformat ASN to custom hit table
    # $BLAST_FORMATTER -archive $OUTFILE -outfmt "6 qseqid sseqid evalue bitscore" -out $OUTFILE.table

    # Parse BLAST output with python program to get best hits
    # myParser.py $OUTFILE.table

    echo "Hello"

|
|
|

**Create python script to parse BLAST table output**::

    #!/usr/bin/env python
    """
    James Vincent
    Mar 8, 2012

    parseBlast.py
    Open a text file
    loop over lines
    split lines into fields
    Sum numbers from certain field
    """

    import sys

    # Get file name
    myInfileName = sys.argv[1]
    infile = open(myInfileName)

    mySum = 0.0
    myCount = 0

    # loop over each line in the file
    for thisLine in infile.readlines():
        # BLAST input file has hit lines like this:
        # fmt "6 qseqid sseqid evalue bitscore"
        # 1 gi|219856848|ref|NR_024667.1| 0.0 2551
        myFields = thisLine.strip().split()
        # bit scores can be fractional, so convert with float()
        thisScore = float(myFields[3])
        # Accumulate scores greater than 2600
        if thisScore > 2600:
            # accumulate scores
            mySum = mySum + thisScore
            # count number of scores matching
            myCount = myCount + 1

    infile.close()

    # Print sum, count and average
    print("Sum is: %s" % mySum)
    print("Count is: %s" % myCount)
    # avoid dividing by zero when no score passes the threshold
    if myCount > 0:
        print("Average is: %s" % (mySum / myCount))

|
|
|

**Create function to return score**::

    #!/usr/bin/env python
    """
    James Vincent
    Mar 8, 2012

    parseBlast.py
    Open a text file
    loop over lines
    split lines into fields
    Sum numbers from certain field
    """

    import sys

    def getScore(blastLine):
        """ parse blast output line and return score """
        # BLAST input file has hit lines like this:
        # fmt "6 qseqid sseqid evalue bitscore"
        # 1 gi|219856848|ref|NR_024667.1| 0.0 2551
        myFields = blastLine.strip().split()
        # bit scores can be fractional, so convert with float()
        thisScore = float(myFields[3])
        return thisScore

    # Get file name
    myInfileName = sys.argv[1]
    infile = open(myInfileName)

    mySum = 0.0
    myCount = 0

    # loop over each line in the file
    for thisLine in infile.readlines():
        thisScore = getScore(thisLine)
        # Accumulate scores greater than 2600
        if thisScore > 2600:
            # accumulate scores
            mySum = mySum + thisScore
            # count number of scores matching
            myCount = myCount + 1

    infile.close()

    # Print sum, count and average
    print("Sum is: %s" % mySum)
    print("Count is: %s" % myCount)
    # avoid dividing by zero when no score passes the threshold
    if myCount > 0:
        print("Average is: %s" % (mySum / myCount))

Homework
--------

**Reading**

- Go back through the course web pages for this week and last
- Review the python documentation for the split() method of strings:

  - http://docs.python.org/library/stdtypes.html#string-methods

**Exercises**

- Make sure you can write the shell scripts and python programs that we did in class
- You should be able to write complete python programs from scratch
- You should be able to write complete qsub scripts that work, with some copying of qsub parameters

**Turn In**

- Modify the parseBlast.py python program (last program shown above)

  - Add a function to return just the GI number from each line of BLAST output
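The behaviour of split() covered in the reading can be tried on a single line before working on a whole file. This is a minimal Python 3 sketch that uses the sample hit line from the lecture; the variable names are illustrative, and the fields are assumed to be separated by tabs or spaces as in the outfmt 6 table.

```python
# Sample hit line in the format "6 qseqid sseqid evalue bitscore"
sample_line = "1\tgi|219856848|ref|NR_024667.1|\t0.0\t2551\n"

# With no argument, split() breaks the line on any run of whitespace
fields = sample_line.strip().split()
print(fields)

# With a separator argument, split() breaks on that exact string
id_parts = fields[1].split("|")
print(id_parts)
```

Because the subject id ends with a ``|``, the last element of ``id_parts`` is an empty string, so it is worth printing the list before deciding which index to use.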