msanlmh Higher dimensional nonlinear mapping
file format: special formats for image coordinates (SPIDER new MSA compliant)
SPIDER document file format
SEE ALSO: msa, msaimc2doc, msamap, msavismap, msanlm, ppcaem, doc2pdb
PURPOSE: Apply nonlinear mapping to coordinates created by multivariate
statistical analysis (msa, ppcaem, or a previous nlmh run). doc2pdb can
later be used to view a 3D output map in UCSF Chimera.
USAGE: msa nlmh
.Full name of image coordinate (IMC) file: nlmin_IMC
[Enter the file containing the image coordinates. The file has
the format that msa creates. The essential parts are:
Line 1 contains integers:
<# of coordinates>, <# of factors>, , ,
<# of coordinates (again)>, <0>.
The image dimensions (the two empty fields) are not needed for nlm.
The following lines contain the image coordinates, followed by two more
numbers, followed by the image id number (float), followed by "1.00".
The file is formatted with up to 10 real numbers per line. More
details may be available in the SPIDER description of PCA/CA.]
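To make the record layout above concrete, here is a minimal sketch of a reader for IMC text. The function name parse_imc is hypothetical (it is not part of nlmh), and it assumes the numbers are separated by whitespace or commas; Fortran fixed-width output in which adjacent fields run together would need a column-based reader instead.

```python
def parse_imc(text):
    """Return (n_factors, {image_id: [factor coordinates]}) from IMC text.

    Hypothetical helper. Assumption: numbers are separated by whitespace
    and/or commas; fused fixed-width Fortran fields are not handled.
    """
    lines = text.strip().splitlines()
    header = lines[0].replace(",", " ").split()
    n_images, n_factors = int(float(header[0])), int(float(header[1]))
    # Records wrap across lines (up to 10 values per line), so all
    # remaining numbers are treated as one continuous stream.
    values = [float(tok) for line in lines[1:]
              for tok in line.replace(",", " ").split()]
    rec_len = n_factors + 4  # coordinates, two extra numbers, image id, 1.00
    if len(values) != n_images * rec_len:
        raise ValueError(f"expected {n_images * rec_len} values, got {len(values)}")
    coords = {}
    for i in range(n_images):
        rec = values[i * rec_len:(i + 1) * rec_len]
        image_id = int(round(rec[n_factors + 2]))  # id is stored as a float
        coords[image_id] = rec[:n_factors]
    return n_factors, coords
```

Reading the records as one number stream, rather than line by line, is what lets the parser cope with the 10-values-per-line wrapping.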
.factor number to use: 1-6
[Enter which factors you would like to use for the nonlinear map.]
.Dimension of output map: 3
[Enter the dimension of the resulting nonlinear map. It must be at
least one less than the dimension of the input coordinates.]
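The factor selection and the dimension rule can be sketched as follows. Both function names are hypothetical, and only the range form "1-6" appears in the prompt above; accepting comma-separated lists such as "1,3,5-7" is an assumption added for illustration.

```python
def parse_factor_spec(spec):
    """Turn a 1-based factor spec such as "1-6" into a list of factor numbers.

    Hypothetical helper; the comma-list form ("1,3,5-7") is an assumption.
    """
    factors = []
    for part in spec.replace(" ", "").split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            factors.extend(range(lo, hi + 1))
        else:
            factors.append(int(part))
    return factors


def select_factors(coords, spec, out_dim):
    """Keep only the requested factors; check the output map dimension."""
    factors = parse_factor_spec(spec)
    # The map must have at least one dimension fewer than the input.
    if out_dim >= len(factors):
        raise ValueError("output dimension must be less than the number of factors")
    return {key: [vec[f - 1] for f in factors] for key, vec in coords.items()}
```

With factors 1-6 and a 3D output map, as in the example answers, the check passes and each image keeps six coordinates.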
.Doc-file for 2D output map: nlmdoc002
[Document file containing the output coordinates. Key =
image identifier number, columns = result coordinates]
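A document file of this kind holds one line per image: the key, the number of columns, then the values, with comment lines starting with ";". The sketch below writes that layout; the function name is hypothetical and the column widths are an assumption, not SPIDER's exact fixed format.

```python
def format_doc_lines(coords):
    """Format {image_id: [map coordinates]} as SPIDER-style document lines.

    Hypothetical helper. Layout assumption: key, number of columns, values;
    the exact SPIDER column widths are not reproduced.
    """
    lines = [" ; nlmh output map: key = image id, columns = map coordinates"]
    for key in sorted(coords):
        values = coords[key]
        cols = " ".join(f"{v:12.6f}" for v in values)
        lines.append(f"{key:6d} {len(values):2d} {cols}")
    return "\n".join(lines) + "\n"
```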
.Name for IMC style 2D output: nlmout002_IMC
[Will contain the same info as the document file, but in a format
that can be further evaluated with programs to analyze msa maps,
for example visual maps. It has the same format as the output of
MSA (and the input to this program, above).]
.Name for error list output file: errordoc002
[This is a document file that contains the residual error after each
iteration.]
.Enter W to apply weights: n
[You can apply a renormalization to the factors. At present the
weights are set to 1/(5 sigma). This will need future revision and
tuning.]
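The weighting can be sketched as a per-factor scaling. The text does not define sigma, so this sketch assumes it is the population standard deviation of each factor over all images; the function name is hypothetical, and constant factors (sigma = 0) are left unscaled to avoid division by zero.

```python
import statistics


def apply_weights(coords):
    """Scale each factor by 1/(5*sigma).

    Assumption: sigma is the population standard deviation of that factor
    over all images. Factors with sigma == 0 are left unchanged.
    """
    keys = sorted(coords)
    n_factors = len(coords[keys[0]])
    sigmas = [statistics.pstdev([coords[k][f] for k in keys])
              for f in range(n_factors)]
    return {k: [v / (5.0 * s) if s > 0 else v
                for v, s in zip(coords[k], sigmas)]
            for k in keys}
```

After this scaling every non-constant factor has a standard deviation of 0.2, so no single factor dominates the distances.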
.Magic factor1, factor2 for steepest desc. algor.: 0.7, 0.2
[Enter the "magic factor" used in the steepest descent iterations.
The default, and the value recommended in the literature, is 0.4.
However, when the dimensional reduction is very large, smaller magic
factors seem to be advantageous (e.g. 0.2 or 0.1).
Since the program allows a transition from a large to a small magic
factor, analogous to simulated annealing, two factors are given. If
both are identical, the same factor is used for all iterations.]
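The magic factor comes from Sammon's (1969) pseudo-Newton update, in which each map coordinate moves by the first derivative of the error divided by the absolute second derivative, scaled by the magic factor. The sketch below implements that update for Sammon's error measure (exponent 1) only. The function name is hypothetical, and it is an assumption that nlmh uses exactly this step; its general-exponent update is not reproduced here.

```python
import math


def sammon_step(Y, Dstar, mf, eps=1e-12):
    """Return the map points after one Sammon pseudo-Newton step.

    Y     : list of n map points, each a list of k coordinates
    Dstar : n x n list of input-space (factor-space) distances
    mf    : magic factor scaling the step
    All points are updated simultaneously from the current Y.
    """
    n, k = len(Y), len(Y[0])
    # Normalization of Sammon's error: sum of input distances over i < j
    c = sum(Dstar[i][j] for i in range(n) for j in range(i + 1, n))
    scale = -2.0 / c
    new_Y = []
    for p in range(n):
        grad = [0.0] * k  # partial sums for dE/dy_pq
        hess = [0.0] * k  # partial sums for d2E/dy_pq^2
        for j in range(n):
            if j == p:
                continue
            d = max(math.dist(Y[p], Y[j]), eps)  # current map distance
            ds = max(Dstar[p][j], eps)           # target distance
            delta = ds - d
            for q in range(k):
                diff = Y[p][q] - Y[j][q]
                grad[q] += delta / (ds * d) * diff
                hess[q] += (delta - diff * diff / d * (1.0 + delta / d)) / (ds * d)
        # Move against the gradient, divided by |second derivative|, times mf
        new_Y.append([
            Y[p][q] - mf * (scale * grad[q]) / max(abs(scale * hess[q]), eps)
            for q in range(k)
        ])
    return new_Y
```

For two points one unit apart in the map whose input distance is 2, `sammon_step([[0.0], [1.0]], [[0.0, 2.0], [2.0, 0.0]], 0.4)` moves them to approximately -0.4 and 1.4. A full Newton step (mf = 1) would overshoot to a distance of 3, which is why magic factors below 1 are used.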
.Number of cooldown iterations, falloff start, total iterations: 200,20,1000
[Enter the iteration number at which the magic factor reaches the
factor2 value, followed by the number of iterations for which the
starting magic factor is kept constant, followed by the total number of
iterations. In the above example the first 20 iterations use the
magic factor 0.7; from iteration 21 to 200 the magic factor is
reduced linearly with the iteration number to 0.2; after iteration
200 the magic factor is held at 0.2 until the final iteration, 1000.]
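The schedule described above can be written as a small function of the (1-based) iteration number. The function and parameter names are hypothetical; the defaults reproduce the example answer 0.7, 0.2 and 200, 20, 1000.

```python
def magic_factor(iteration, mf1=0.7, mf2=0.2, cooldown_end=200, falloff_start=20):
    """Magic factor for a 1-based iteration number.

    mf1 is held for iterations 1..falloff_start, then decreases linearly
    with the iteration number, reaching mf2 at cooldown_end; mf2 is held
    from then on. With mf1 == mf2 the factor is constant.
    """
    if iteration <= falloff_start:
        return mf1
    if iteration >= cooldown_end:
        return mf2
    frac = (iteration - falloff_start) / (cooldown_end - falloff_start)
    return mf1 + (mf2 - mf1) * frac
```

Halfway through the cooldown, at iteration 110, this gives 0.45.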
.Epsilon for iteration cutoff: 0.00001
[Terminate the iterations if the error falls below this value. Choose
this value with care: the appropriate limit depends on the data set. If
it is never reached, the program stops at the maximum number of
iterations.]
.Exponent in error measure: 0.0
[Determines the type of optimization. If the exponent is 0, the long
distances get the highest weights (Kruskal, Psychometrika 29 (1964),
pp. 1 and 115). If it is 1, the short distances are more important
(Sammon, IEEE Trans. Computers, C-18 (1969), p. 401). In-between values
may be used. For details see M. Radermacher et al., Ultramicroscopy 17
(1985) 117-126, and a correction of the equations in Ultramicroscopy 19
(1986), p. 75.]
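One family of error measures that interpolates between the two cases is sketched below, with d*_ij the input distances and d_ij the map distances:
E = sum((d*_ij - d_ij)^2 / d*_ij^a) / sum(d*_ij^(2 - a)).
At a = 1 this is exactly Sammon's error; at a = 0 every residual is weighted equally, so the large distances dominate, as in Kruskal's stress. It is an assumption that nlmh uses this exact normalization; check the 1985 paper and the 1986 correction cited above. The function name is hypothetical, and all input distances must be positive, which is one reason for the lower distance threshold.

```python
def nlm_error(target, mapped, a):
    """Generalized mapping error for exponent a (assumed form).

    target, mapped: equal-length sequences of pairwise distances d*_ij and
    d_ij (i < j) in input space and in the map; target values must be > 0.
    E = sum((d* - d)^2 / d*^a) / sum(d*^(2 - a))
    """
    num = sum((ds - d) ** 2 / ds ** a for ds, d in zip(target, mapped))
    den = sum(ds ** (2 - a) for ds in target)
    return num / den
```

For target distances (1, 2) and map distances (2, 2), the error is 1/5 = 0.2 with a = 0 and 1/3 with a = 1: the same residual on the short distance counts for more when a = 1.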
.Lower distance threshold: 0.0
[You can specify a distance threshold below which coordinates are
considered identical; all but one of them are then removed from the
calculations.]
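A minimal sketch of the threshold rule: a point is dropped when it lies closer than the threshold to a point already kept, so one representative of each close group survives. The greedy, input-order choice of which point to keep is an assumption (the selection rule of nlmh is not documented here), and the function name is hypothetical.

```python
import math


def remove_near_duplicates(points, threshold):
    """Return the indices of the points kept after dropping near-duplicates.

    A point is dropped if its distance to an already kept point is below
    `threshold`. Greedy in input order (assumption).
    """
    kept = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[j]) >= threshold for j in kept):
            kept.append(i)
    return kept
```

With the default threshold of 0.0 nothing is removed, because no distance is below zero.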
.Start distribution type: 1
[Enter the type of start distribution. Currently implemented:
1 -> specify a factor pair
2 -> Random distribution
4 -> read start distribution from file
Option 3 has been commented out of the current implementation. It
determined which 2D factor pair led to the starting map with the lowest
error.]
if 1 was answered:
.Factors for start distribution: 1,2
[Enter the factor pair to be used for the starting map.]
if 4 was answered:
.Document file with 2D image map: nlmdoc001
[Enter the document file, typically from a previous run of nonlinear
mapping, that contains the coordinates used for the starting
distribution. This starting-map option is typically used when nonlinear
mapping is first applied with emphasis on the large distances and is
then followed by a refinement of the small distances.]
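The start-distribution options can be sketched in one function. The function name and parameters are hypothetical. Option 1 takes the chosen factors (the prompt asks for a pair) directly as starting coordinates; the uniform range [-1, 1] for option 2 is an assumption; option 4 copies a map that has already been read, for example with a document-file reader.

```python
import random


def start_distribution(coords, option, factors=(1, 2), out_dim=2,
                       seed=None, start_map=None):
    """Build a starting map {image_id: [coordinates]}.

    option 1: use the given 1-based factors as coordinates
    option 2: random coordinates, uniform in [-1, 1] (assumed range)
    option 4: copy a previously computed map (start_map)
    """
    keys = sorted(coords)
    if option == 1:
        return {k: [coords[k][f - 1] for f in factors] for k in keys}
    if option == 2:
        rng = random.Random(seed)  # seeded for reproducible starts
        return {k: [rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for k in keys}
    if option == 4:
        return {k: list(start_map[k]) for k in keys}
    raise ValueError(f"unsupported start distribution type: {option}")
```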
Notes: The program was extracted from SPIDER, where it had become
obsolete in the transition from VMS to UNIX. The program here is a
rewrite with substantial changes for better efficiency and larger data
sets. In addition, nlmh has new options such as higher-dimensional
output maps and variable magic factors in the iterations.
Programs: em_msanlm.py, mrerr.f, bdisttst.f, bstrtdis.f, bdistlst.f,
bnonlmap.f, bdist2d.f, bmrnlstrt.f (main program), bnlmrealstrt.f,
bnltfile.f, mrdev.f
Remark: It is advisable to first optimize the large distances in a
coordinate set and then optimize the short distances, using the
large-distance map as the start distribution. It can be shown that,
used in this order, this is similar to a manifold mapping. (Citation
needs to be looked up.)
Author(s): M. Radermacher. See "Use of Nonlinear Mapping in Multivariate
Image Analysis of Molecule Projections", Ultramicroscopy 17 (1985)
117-126 (see also the erratum, since a division in the equations of the
paper is misprinted: Ultramicroscopy 19 (1986), p. 75).