remove outliers   Remove outliers from document file columns  
 

file format: SPIDER document file

PURPOSE:  Remove outliers from columns in a document file. The program was
          written to sort through the coordinates created by correspondence
          analysis or PCA. Since real outlier images have a major influence on
          the direction of the eigenvectors, it is important to remove them. Be
          careful if you are removing more than 10-20% of your images: you
          may be removing a complete cluster. It is best to confirm the removal
          by checking the factor map for the location of these particles.
          Since the output file records which coordinate caused each removal,
          it is easy to look at the corresponding map to confirm this
          visually.
          
USAGE:    remove outliers

          .Input coordinate docfile: imccoord001
          [Enter the name of the document file that you want to
           process.]

          .Output selection docfile: removelines001
          [Enter name of output document file. This will have the keys
          of the lines to be removed, followed by a 0, followed by the
          input column number that was the reason for the removal. Keys
          may occur multiple times if the exclusion is based on
          multiple columns. This file can be appended to a typical 
          0/1 selection document file, or, if such an existing selection
          file is entered as output name, the new lines will be appended
          to it.]
          
          .Output doc file format (0=new,default,1=old): 1
          [Enter the format of the output doc file. The default is 
           0 (= new format). Option 1 was added for compatibility with
           SPIDER version 5.0.]

          .Columns to include: 2-4,6
          [Enter which columns of the input document file should be checked
          for outliers.] 
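
          A column list such as 2-4,6 mixes ranges and single numbers. The
          sketch below shows one way such a list could be expanded. The
          function name parse_columns is hypothetical and is not taken from
          em_removeoutliers.py.

```python
def parse_columns(spec):
    """Expand a column list such as "2-4,6" into a sorted list of ints.

    Hypothetical helper for illustration only; the real program may
    parse its input differently.
    """
    columns = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range such as "2-4" includes both end points.
            first, last = (int(x) for x in part.split("-", 1))
            columns.update(range(first, last + 1))
        else:
            columns.add(int(part))
    return sorted(columns)
```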

          .Sigma multiplier for threshold: 3.3
          [Enter the factor by which the standard deviation of a column is
          multiplied to determine outliers. Outliers are those that have
          values smaller than average-factor*sigma or larger than
          average+factor*sigma.] 
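
          The threshold test described above can be sketched as follows.
          This is a minimal illustration, not the code of
          em_removeoutliers.py or doceliminate.f. It assumes that sigma is
          the population standard deviation and that column 1 is the first
          value after the key; neither is stated on this page. The name
          find_outliers and the dict layout are hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(rows, columns, factor):
    """Return (key, column) pairs whose value lies outside
    average - factor*sigma .. average + factor*sigma.

    rows:    dict mapping key -> list of column values
             (assumption: column 1 is list index 0)
    columns: column numbers to check, e.g. [2, 3, 4, 6]
    factor:  sigma multiplier, e.g. 3.3
    """
    flagged = []
    for col in columns:
        values = [vals[col - 1] for vals in rows.values()]
        avg = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        low = avg - factor * sigma
        high = avg + factor * sigma
        for key, vals in rows.items():
            if vals[col - 1] < low or vals[col - 1] > high:
                # A key can be flagged once per offending column.
                flagged.append((key, col))
    return flagged
```

          Because each column is tested separately, a key can appear in the
          result more than once, which matches the repeated keys described
          for the output selection docfile.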
          
          .Number of columns to write to the output docfile: 3
          [The minimum number is 1, which will only write the 0 to the
          output file. 2 will also write the column that caused the
          outlier. Any further columns are filled with 0s. The reason to
          write multiple columns is that the selection file used may have
          extra information in each line, and adding a shorter line could 
          create problems in reading the file later.]
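
          The padding rule above can be sketched as follows. The function
          name selection_values is hypothetical, and the sketch only builds
          the list of values for one output line; it does not reproduce the
          SPIDER document file layout.

```python
def selection_values(column, ncols):
    """Build the values written after the key for one removed line.

    column: input column number that caused the removal
    ncols:  number of columns requested for the output docfile
    """
    if ncols < 1:
        raise ValueError("at least one output column is required")
    # First value is the 0 selection flag, second is the offending
    # column, and any further columns are padded with 0.
    values = [0, column] + [0] * max(ncols - 2, 0)
    return values[:ncols]
```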
          
        
Programs: em_removeoutliers.py, doceliminate.f

Author(s): M. Radermacher