msanlmh      Higher dimensional nonlinear mapping
 

file format: special formats for image coordinates (SPIDER new MSA compliant)
             SPIDER document file format

SEE ALSO:  msa, msaimc2doc, msamap, msavismap, msanlm, ppcaem, doc2pdb

PURPOSE:  Apply nonlinear mapping to coordinates created by multivariate
          statistical analysis (msa, ppcaem, or a previous nlmh run). doc2pdb
          can be used afterwards to view a 3D output map in UCSF Chimera.

USAGE:    msa nlmh

        .Full name of image coordinate (IMC) file: nlmin_IMC
         [Enter the file containing the image coordinates. The file has
          the format that msa creates. The essential parts are:
          Line 1 contains integers:
          <# of coordinates>, <# of factors>, , ,
          <# of coordinates(again)>, <0>.
          The image dimensions are not needed for nlm.
          The following lines contain image coordinates, followed by two more
          numbers, followed by image id number (float), followed by "1.00".
          The file is formatted with up to 10 real numbers per line. More
          details may be available in the SPIDER description of PCA/CA.]
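
         [The layout described above can be read with a short script. This is
          a minimal sketch, not part of the program: parse_imc is a
          hypothetical helper, and it assumes that all numbers are separated
          by whitespace (or commas in the header) and that each record holds
          <# of factors> coordinates, two more numbers, the image id and the
          trailing 1.00. Fixed-width files without separators would need a
          different reader.]

```python
def parse_imc(text):
    """Parse IMC-style text into (n_coords, n_factors, records).

    Assumed layout: a header line of integers, then n_coords records of
    n_factors coordinates + two extra numbers + image id + 1.00.
    """
    lines = text.strip().splitlines()
    header = [int(tok) for tok in lines[0].replace(",", " ").split()]
    n_coords, n_factors = header[0], header[1]

    # Collect all remaining numbers, however they are wrapped across lines.
    values = [float(tok) for line in lines[1:] for tok in line.split()]

    width = n_factors + 4  # coordinates + two extra numbers + id + 1.00
    if len(values) != n_coords * width:
        raise ValueError("unexpected number of values in IMC body")

    records = []
    for i in range(n_coords):
        row = values[i * width:(i + 1) * width]
        records.append({
            "coords": row[:n_factors],
            "extra": row[n_factors:n_factors + 2],
            "image_id": int(round(row[n_factors + 2])),
        })
    return n_coords, n_factors, records
```

         [The length check fails loudly when the file does not match the
          assumed record width, which is the most likely sign that the
          assumptions above do not hold for a given file.]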
          

        .factor number to use: 1-6 
         [Enter which factors you would like to use for the nonlinear map.]
	 
	 .Dimension of output map: 3
         [Enter the dimension of the resulting non-linear map. It must be at
          least one less than the dimension of the input coordinates.]
	 
        .Doc-file for 2D output map: nlmdoc002
         [Document file containing the output coordinates. Key =
         image identifier number, columns = result coordinates]
         
        .Name for IMC style 2D output: nlmout002_IMC
        [Will contain the same info as the document file, but in a format
         that can be further evaluated with programs to analyze msa maps,
         for example visual maps. It has the same format as the output of 
         MSA (and the input to this program, above).]
	 
	 .Name for error list output file: errordoc002
	 [This is a document file that contains the residual error after each
	  iteration.]
         
        .Enter W to apply weights: n
         [You can apply a renormalization to the factors. At this time the
         weights are normalized with 1/(5sigma). This will need future
         revision and tuning.]
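
         [One possible reading of the 1/(5sigma) renormalization is sketched
          here. It assumes that sigma is the population standard deviation of
          each factor taken over all images; the program's actual definition
          is not stated in this document. apply_weights is a hypothetical
          helper name.]

```python
import statistics


def apply_weights(coords):
    """Scale each factor (column) by 1 / (5 * sigma) of that factor.

    Assumption: sigma is the per-factor population standard deviation.
    Factors with zero spread are left unscaled to avoid dividing by zero.
    """
    scales = []
    for column in zip(*coords):
        sigma = statistics.pstdev(column)
        scales.append(1.0 / (5.0 * sigma) if sigma > 0 else 1.0)
    return [[x * s for x, s in zip(row, scales)] for row in coords]
```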

        .Magic factor1, factor2 for steepest desc. algor.: 0.7, 0.2
        [Enter the "magic factor" used in the steepest descent iterations.
         The default, and the value recommended in the literature, is 0.4.
         However, it seems that when the dimensional reduction is very
         large, smaller magic factors may be advantageous (e.g. 0.2 or 0.1).
         Since the program allows for a transition between a large and a small
         magic factor, analogous to simulated annealing, two factors are given.
         If both are identical, the same factor is used for all iterations.]
	 
        .Number of cooldown iterations, falloff start, total iterations: 200,20,1000
         [Enter the iteration number at which the magic factor reaches the
          factor2 value, followed by the number of iterations for which the
          start magic factor is kept constant, followed by the total number of
          iterations. In the above example, the first 20 iterations use the
          magic factor 0.7; from iteration 21 to 200 the magic factor is
          reduced linearly with the iteration number to the value 0.2; after
          iteration 200 the magic factor is held at 0.2 until the final
          iteration, 1000.]
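
         [The schedule described above can be written as a small function.
          This is a sketch under stated assumptions, not the program's code:
          magic_factor is a hypothetical name, iterations are counted from 1,
          and the linear ramp is anchored so that it still equals factor1 at
          the falloff start and reaches factor2 exactly at the cooldown
          iteration.]

```python
def magic_factor(iteration, factor1, factor2, cooldown_end, falloff_start):
    """Return the magic factor to use at a given (1-based) iteration.

    Constant at factor1 up to falloff_start, linear in the iteration number
    between falloff_start and cooldown_end, constant at factor2 afterwards.
    """
    if iteration <= falloff_start:
        return factor1
    if iteration >= cooldown_end:
        return factor2
    fraction = (iteration - falloff_start) / (cooldown_end - falloff_start)
    return factor1 + (factor2 - factor1) * fraction


# Example values from the prompts above: 0.7, 0.2 and 200, 20, 1000.
schedule = [magic_factor(i, 0.7, 0.2, 200, 20) for i in range(1, 1001)]
```

         [With identical factors the function returns the same value for
          every iteration, matching the behaviour described for the magic
          factor prompt.]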

        .Epsilon for iteration cutoff: 0.00001 
         [Terminate the iterations if the error is below this value. Check this
          value: the appropriate limit depends on the data set. If it is never
          reached, the program stops at the maximum number of iterations.]
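
         [The stopping rule can be illustrated with a generic loop. This is a
          minimal sketch: iterate, step and error are hypothetical names
          standing in for one descent update and the error measure, and the
          returned list plays the role of the per-iteration error list.]

```python
def iterate(step, error, state, epsilon, max_iterations):
    """Run step() until error(state) < epsilon or max_iterations is reached.

    Returns the final state and the list of errors, one per iteration.
    """
    errors = []
    for iteration in range(1, max_iterations + 1):
        state = step(state, iteration)
        errors.append(error(state))
        if errors[-1] < epsilon:
            break  # converged before the maximum number of iterations
    return state, errors
```

         [If the epsilon is chosen too small for the data set, the loop simply
          runs for the full number of iterations.]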
         
        .Exponent in error measure: 0.0
         [Determines the type of optimization. If the exponent is 0, the
          long distances get the highest weights (Kruskal,
          Psychometrika 29 (1964), 1 and 115). If it is 1, the short distances
          are more important (Sammon, IEEE Trans. Computers C-18 (1969), p. 401).
          In-between values may be used. For details see M. Radermacher et al.,
          Ultramicroscopy 17 (1985) 117-126, and a correction of the equations in
          Ultramicroscopy 19 (1986), p. 75.]
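
         [The role of the exponent can be illustrated with a generalized
          stress function. This is a sketch only: the form below is an
          assumption chosen so that exponent 1 gives Sammon's error and
          exponent 0 gives a Kruskal-type error. The exact expression and
          normalization used by the program should be taken from the cited
          papers and the erratum. mapping_error is a hypothetical name.]

```python
import math
from itertools import combinations


def mapping_error(high, low, exponent):
    """Generalized stress between high- and low-dimensional point sets.

    Assumed form, with D = distance in the input space and d = distance
    in the map, summed over all point pairs:
        E = sum((D - d)**2 / D**exponent) / sum(D**(2 - exponent))
    """
    numerator = 0.0
    normalizer = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        if d_high == 0.0:
            continue  # coincident input points carry no distance information
        numerator += (d_high - d_low) ** 2 / d_high ** exponent
        normalizer += d_high ** (2.0 - exponent)
    return numerator / normalizer if normalizer > 0.0 else 0.0
```

         [Dividing each squared deviation by D**exponent is what shifts the
          emphasis: with exponent 0 every pair contributes its raw squared
          deviation, so large distances dominate, while with exponent 1 the
          deviations of short distances are amplified relative to long ones.]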
          
        .Lower distance threshold: 0.0
         [You can specify a threshold below which coordinates are considered
         identical and removed from the calculations, retaining only one of
         them.]
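
         [A possible way to apply such a threshold is sketched below. It is
          an assumption, not the program's algorithm: drop_near_duplicates is
          a hypothetical helper that walks through the points in order and
          keeps a point only if it is at least the threshold away from every
          point already kept, so the first member of each close group
          survives.]

```python
import math


def drop_near_duplicates(points, threshold):
    """Return indices of points to keep.

    A point is dropped when its distance to an already kept point is
    strictly below threshold; a threshold of 0.0 therefore keeps everything.
    """
    kept = []
    for index, point in enumerate(points):
        if all(math.dist(point, points[k]) >= threshold for k in kept):
            kept.append(index)
    return kept
```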
         
        .Start distribution type: 1
         [Enter the type of start distributions. Currently implemented:
         1 -> specify a coordinate pair
         2 -> Random distribution
         4 -> read start distribution from file
        
         Option 3 is commented out in the current implementation. It tested
         which factor pair in 2D led to the starting map with the lowest error.]
	 
        if 1 was answered: 
           .Factors for start distribution: 1,2
           [Enter the factor pair to be used for the starting map.]
        
        if 4 was answered:
           .Document file with 2D image map: nlmdoc001
           [Enter the document file, typically from a previous run of non-linear
            mapping, that contains the coordinates used for the starting
            distribution. Typically this starting map option is used when
            non-linear mapping is first applied with emphasis on the large
            distances and then followed by a refinement of the small distances.]

Notes:  The program was extracted from SPIDER, where it had become
        obsolete in the transition from VMS to UNIX. The program
        here is a rewrite with substantial changes for better efficiency
        and larger data sets. In addition, nlmh has new options such as
        higher-dimensional output maps and variable magic factors in the
        iterations.

Programs: em_msanlm.py, mrerr.f, bdisttst.f, bstrtdis.f, bdistlst.f,
          bnonlmap.f, bdist2d.f, bmrnlstrt.f (main program), bnlmrealstrt.f,
          bnltfile.f, mrdev.f

Remark: The advice is to first optimize the large distances in a coordinate
        set and then optimize the short distances, using the large-distance map
        as the start distribution. It can be shown that this is similar to
        a manifold mapping if used in this order. (Citation needs to be looked
        up.)

Author(s): M. Radermacher. See "Use of Nonlinear Mapping in Multivariate Image
Analysis of Molecule Projections", Ultramicroscopy 17 (1985) 117-126. (See also
the erratum, since a division in the equations in the paper is printed wrong:
Ultramicroscopy 19 (1986), p. 75.)