msanlmh Higher dimensional nonlinear mapping
file format: special formats for image coordinates (SPIDER new MSA compliant), SPIDER document file format

SEE ALSO: msa, msaimc2doc, msamap, msavismap, msanlm, ppcaem, doc2pdb

PURPOSE: Apply nonlinear mapping to coordinates created by multivariate statistical analysis (msa, ppcaem, or a previous nlmh). doc2pdb can be used afterwards to view a 3D output map in UCSF Chimera.

USAGE: msa nlmh

.Full name of image coordinate (IMC) file: nlmin_IMC
[Enter the file containing the image coordinates. The file has the format that msa creates. The essentials are as follows. Line 1: integers, <# of coordinates>, <# of factors>,, , <# of coordinates(again)>, <0>. The image dimensions are not needed for nlm. The following lines contain the image coordinates, followed by two more numbers, followed by the image ID number (float), followed by "1.00". The file is formatted with up to 10 real numbers per line. More details may be available in the SPIDER description of PCA/CA.]

.factor number to use: 1-6
[Enter which factors you would like to use for the nonlinear map.]

.Dimension of output map: 3
[Enter the dimension of the resulting nonlinear map. It must be at least one less than the dimension of the input coordinates.]

.Doc-file for 2D output map: nlmdoc002
[Document file containing the output coordinates. Key = image identifier number, columns = result coordinates.]

.Name for IMC style 2D output: nlmout002_IMC
[Contains the same information as the document file, but in a format that can be evaluated further with programs that analyze msa maps, for example visual maps. It has the same format as the output of MSA (and as the input to this program, above).]

.Name for error list output file: errordoc002
[Document file that contains the residual error after each iteration.]

.Enter W to apply weights: n
[You can apply a renormalization to the factors. At this time the weights are normalized with 1/(5sigma). This will need future revision and tuning.]

.Magic factor1, factor2 for steepest desc. algor.: 0.7, 0.2
[Enter the "magic factor" used in the steepest descent iterations. The default, and the value recommended in the literature, is 0.4. However, when the dimensional reduction is very large, smaller magic factors (e.g. 0.2 or 0.1) appear to be advantageous. Because the program allows a transition from a large to a small magic factor, analogous to simulated annealing, two factors are given. If both are identical, the same factor is used for all iterations.]

.Number of cooldown iterations, falloff start, total iterations: 200,20,1000
[Enter the iteration number at which the magic factor reaches the factor2 value, followed by the number of iterations for which the starting magic factor is kept constant, followed by the total number of iterations. In the above example the first 20 iterations use the magic factor 0.7; from iteration 21 to 200 the magic factor is reduced linearly with the iteration number to the value 0.2; after iteration 200 the magic factor is held at 0.2 until the final iteration, 1000.]

.Epsilon for iteration cutoff: 0.00001
[Terminate the iterations if the error falls below this value. Check this value: the appropriate limit depends on the data set. If it is never reached, the program stops at the maximum number of iterations.]

.Exponent in error measure: 0.0
[Determines the type of optimization. If the exponent is 0, the long distances get the highest weights (Kruskal, Psychometrika, 29, 1964, 1 and 115). If it is 1, the short distances are more important (Sammon, IEEE Trans. Computers, C-18 (1969), p. 401). In-between values may be used. For details see M. Radermacher et al., Ultramicroscopy 17 (1985) 117-126, and a correction of the equations in Ultramicroscopy 19 (1986), p. 75.]

.Lower distance threshold: 0.0
[You can specify a threshold below which coordinates are considered identical and are removed from the calculations, retaining only one of them.]

.Start distribution type: 1
[Enter the type of start distribution. Currently implemented:
  1 -> specify a coordinate pair
  2 -> random distribution
  4 -> read start distribution from file
Option 3 is commented out of the current implementation. It tested which factor pair in 2D led to the starting map with the lowest error.]

if 1 was answered:
.Factors for start distribution: 1,2
[Enter the factor pair to be used for the starting map.]

if 4 was answered:
.Document file with 2D image map: nlmdoc001
[Enter the document file, typically from a previous run of nonlinear mapping, that contains the coordinates used for the starting distribution. This starting map option is typically used when nonlinear mapping is first applied with emphasis on the large distances and then followed by a refinement of the small distances.]

Notes: The program was extracted from SPIDER, where it had become obsolete in the transition from VMS to UNIX. The program here is a rewrite with substantial changes for better efficiency and larger data sets. In addition, nlmh has new options such as higher dimensional output maps and variable magic factors in the iterations.

Programs: em_msanlm.py, mrerr.f, bdisttst.f, bstrtdis.f, bdistlst.f, bnonlmap.f, bdist2d.f, bmrnlstrt.f (main program), bnlmrealstrt.f, bnltfile.f, mrdev.f

Remark: The advice is to first optimize the large distances in a coordinate set and then optimize the short distances, using the large distance map as the start distribution. It can be shown that this is similar to a manifold mapping if used in this order. (Citation needs to be looked up.)

Author(s): M. Radermacher. See Use of Nonlinear Mapping in Multivariate Image Analysis of Molecule Projections, Ultramicroscopy 17 (1985) 117-126. (See also the erratum, since a division in the equations in the paper is printed wrong: Ultramicroscopy 19 (1986), p. 75.)
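
Example (magic factor schedule): The three-phase schedule described under USAGE (constant at factor1, linear falloff, constant at factor2) can be sketched in Python as follows. This is only an illustration of the description above, not the program's code: the function name magic_factor and its parameter names are hypothetical, and the exact interpolation used inside msanlmh may differ in detail.

```python
def magic_factor(iteration, factor1=0.7, factor2=0.2,
                 falloff_start=20, cooldown_end=200):
    """Return the magic factor for a 1-based iteration number.

    Hypothetical helper mirroring the documented example
    (0.7, 0.2 with 200, 20, 1000); not taken from the program source.
    """
    # Phase 1: the starting factor is held constant.
    if iteration <= falloff_start:
        return factor1
    # Phase 3: after the cooldown the final factor is held constant.
    if iteration >= cooldown_end:
        return factor2
    # Phase 2: linear reduction with the iteration number.
    fraction = (iteration - falloff_start) / (cooldown_end - falloff_start)
    return factor1 + fraction * (factor2 - factor1)


if __name__ == "__main__":
    for it in (1, 20, 21, 110, 200, 1000):
        print(it, round(magic_factor(it), 4))
```

With the example values, iterations 1 to 20 return 0.7 and iterations 200 to 1000 return 0.2. Setting factor1 equal to factor2 gives a constant factor for all iterations, as stated for the prompt.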
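
Example (exponent in error measure): To illustrate how the exponent shifts the emphasis between long and short distances, the sketch below uses one common generalized form of the mapping error, in which each squared distance difference is divided by the original distance raised to the exponent. This form is an assumption made for illustration; the formula actually implemented should be taken from the cited Ultramicroscopy paper and its erratum. The function names and the handling of coincident points are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distances for all pairs i < j, in a fixed order."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]


def mapping_error(high, low, exponent=0.0):
    """Assumed error form: sum((D - d)^2 / D^a) / sum(D^(2 - a)).

    D are distances in the input (factor) space, d are distances in the
    output map, a is the exponent. a = 0 weights all pairs equally, so long
    distances dominate; a = 1 gives Sammon's stress, which raises the
    relative weight of short distances.
    """
    numerator = 0.0
    normalizer = 0.0
    for d_high, d_low in zip(pairwise_distances(high),
                             pairwise_distances(low)):
        if d_high == 0.0:
            # Coincident input points are skipped here (an assumption).
            continue
        numerator += (d_high - d_low) ** 2 / d_high ** exponent
        normalizer += d_high ** (2.0 - exponent)
    return numerator / normalizer if normalizer > 0.0 else 0.0


if __name__ == "__main__":
    high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 1.0)]
    low = [(0.0, 0.0), (0.5, 0.0), (0.0, 5.0)]
    for a in (0.0, 0.5, 1.0):
        print(a, mapping_error(high, low, exponent=a))
```

A map that reproduces all input distances exactly gives an error of zero for any exponent, which is what the epsilon cutoff is compared against.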