WWW Entrez Help
Note! This brief guide is intended for those who are generally familiar
with database searching. If you are new to Entrez, be sure to read the
section on Special Features below.
Getting Started
WWW Entrez allows you to retrieve molecular biology data and bibliographic
citations from the NCBI's integrated
databases. These include:
-
DNA sequences from GenBank, EMBL, and DDBJ
-
Protein sequences from Swiss-Prot, PIR, PRF, PDB, and translated protein
sequences from the DNA sequence databases
-
Genome and chromosome mapping data
-
Three-dimensional protein structures derived from PDB, and incorporated
into NCBI's Molecular
Modeling Database (MMDB)
-
PubMed bibliographic
database containing citations for nearly 9 million biomedical articles
from the National Library of Medicine's
MEDLINE and pre-MEDLINE databases
Basic PubMed Search
To search PubMed without worrying about fancy features, select "Basic Search"
from the Entrez Home Page.
If you are on the PubMed home page, you already have a Basic Search form
in front of you.
You will then see a section that looks like this:
Enter the term or terms that you wish to search on, separating terms
by spaces, and press the return key or the "search" button. This will take
you immediately to the Document Summary Page,
below, where you can review the results of your search.
Finding all terms that begin with a given word
Placing an asterisk at the end of a term will cause Entrez to search for
all terms that begin with that word; for instance "bacter*" will find all
terms that begin with the letters "bacter", e.g. bacteria, bacterium, bacteriophage,
etc. Phrases in which a space follows the truncated word will NOT be
included; for instance, "infection*" will include "infections" but not
"infection control".
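The truncation rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Entrez's actual implementation; the function name expand_truncated and the sample index terms are hypothetical.

```python
def expand_truncated(term, index_terms):
    """Return the index terms matched by a search term with a trailing asterisk."""
    if not term.endswith("*"):
        # No asterisk: only an exact match counts.
        return [t for t in index_terms if t == term]
    prefix = term[:-1].lower()
    matches = []
    for candidate in index_terms:
        lowered = candidate.lower()
        # Keep single words only: nothing after the prefix may contain a space.
        if lowered.startswith(prefix) and " " not in lowered[len(prefix):]:
            matches.append(candidate)
    return matches


# Hypothetical index entries, for illustration only.
INDEX = ["bacteria", "bacteriophage", "bacterium",
         "infection", "infection control", "infections"]

print(expand_truncated("bacter*", INDEX))
print(expand_truncated("infection*", INDEX))
```

With this sample index, "infection*" picks up "infection" and "infections" but skips the phrase "infection control".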
Forcing Entrez to search for a phrase
Entrez will do its best to find logical groupings in your input. For instance,
if you enter "Lipman DJ Genomics", Entrez will recognize that "Lipman DJ"
is the name of an author and will convert your search into
"Lipman DJ" AND Genomics
It may happen that Entrez fails to find a phrase that you think is vital
to a search. For instance, if you enter
brca 1
Entrez will not recognize that this is all one item and will search
for "brca" and "1" separately. Since the latter is a numeral and is not
indexed in the title and abstract fields, it will likely not find what
you want. You can circumvent this by putting quotes (") around the words
that Entrez is failing to recognize, e.g.
"brca 1"
Important! It is usually best to let Entrez do your grouping for
most accurate retrieval, and to use quotes only when Entrez has failed
to find anything because of a failure to group words properly. Forcing
Entrez to group words will often result in "no documents found". This does
not mean that the phrase you are looking for does not exist; rather, it
was not indexed as a group.
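To make the effect of quoting concrete, here is a minimal Python sketch of how a search box entry might be split into terms, keeping quoted phrases together and joining the terms with AND. It does not reproduce Entrez's automatic grouping (such as recognizing author names); the function names split_terms and to_and_query are hypothetical.

```python
import re


def split_terms(query):
    """Split a query on spaces, keeping double-quoted phrases together."""
    terms = []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query):
        terms.append(quoted if quoted else bare)
    return terms


def to_and_query(query):
    """Join the terms with AND, re-quoting any term that contains a space."""
    parts = []
    for term in split_terms(query):
        parts.append(f'"{term}"' if " " in term else term)
    return " AND ".join(parts)


print(split_terms('brca 1'))             # two separate terms
print(split_terms('"brca 1" cancer'))    # the quoted phrase stays whole
print(to_and_query('"brca 1" cancer'))
```

Without quotes, "brca" and "1" are searched separately; with quotes, the phrase is passed through as a single term.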
Searching for all terms that begin with a given string
All of the terms that begin with a given string can be searched on by appending
'*' to the end of the term.
For example, "baker*[auth]" would find all of the author names that
began with 'baker'.
Note! If the use of a '*' character results in too long a list of terms
to process efficiently (more than a hundred or so), Entrez will not perform
the search and will so inform you.
Searching by identifier
If you want to look up a citation or citations by identifier (MEDLINE UID,
PubMed ID, sequence GI, or the like), just enter "UID" followed by the
identifier(s) that you want. For example:
UID 88055872
will find MEDLINE UID 88055872.
For Experts
Expert users of Entrez can, if they wish, enter a full boolean expression
in the search box. See Entering a Complex
Boolean Expression below.
All of the Advanced Search capabilities are still available in Basic
mode; they are just hidden. You can use
Advanced Search
Entering a Search Term
To search a database, select the appropriate one from the Entrez
Home Page. You will then see a screen that looks like this (the PubMed
screen is shown; the other database screens will have different fields):
Select the field and mode under which you want to search, enter the term you
want to search for in the box given, and then press the Search button.
Many browsers will allow you to submit the term you want to search for
simply by pressing "return" after typing in your term. Try it.
Search Fields
There are a number of search fields available in the WWW Entrez databases.
Some of the fields are found in all five databases; others are not. Each
field contains the following information:
-
Accession contains the accession number of the sequence, assigned
to the nucleotide, protein, structure, or genome record by a sequence database
builder.
-
Affiliation contains the institutional affiliation and address of
the primary author, and sometimes of other authors.
-
Author Name contains the list of authors for a paper in the literature.
In the Protein and Nucleotide databases, the authors listed are those of
the MEDLINE articles to which a sequence is linked. The format for author
names is the last name, followed by a space and the first initial(s), without
periods. For example, Jacob F. Marley would be Marley JF; Ebenezer Scrooge
would be Scrooge E. Initials may be omitted when searching.
-
E. C. Number is a number assigned by the Enzyme Commission to designate
a particular enzyme.
-
Feature Key is a keyword denoting a particular DNA feature.
-
Gene Symbol is the standard name for a given gene. If you cannot
find a gene using Gene Symbol, try using All Fields or Text Words instead.
-
Journal Title is the name of the journal where the record was published.
Journal names are stored in the database in abbreviated form; for instance,
the Journal of Biological Chemistry is stored as J Biol Chem. If you are
not sure how a journal name is abbreviated, use List Terms mode to browse
the journal titles.
-
Keywords allows you to search using special index terms from a controlled
vocabulary associated with the GenBank, EMBL, DDBJ, Swiss-Prot, PIR, PRF,
or PDB databases. If you are not familiar with the keywords used in these
databases, this field may not be useful to you, although using List Terms
mode will let you see what the terms look like.
-
MEDLINE UID is the MEDLINE Unique Identifier of a given citation.
-
MeSH Terms includes all of the terms in the Medical Subject Headings,
a controlled vocabulary of keywords used to index MEDLINE. Each MEDLINE
citation is given a group of MeSH terms that relate to the subject of the
paper from which it is drawn. Frequently, MeSH terms will have an additional
term, called a "subheading", which further defines how the MeSH term relates
to the article it is associated with. This subheading is appended to the
MeSH term, e.g. "pneumonia diagnosis". Searching on the MeSH term (here,
pneumonia) will retrieve all of the articles that use that MeSH term, whether
they have subheadings or not. Use the subheading terms if you require more
specificity than the MeSH term allows.
Note: MeSH terms searched for using the MeSH Terms or MeSH Major Topic fields
are automatically "exploded" by WWW Entrez; that is, all terms which are
logical subsets of the term entered are included. For instance, "pneumococcal
infections" includes "streptococcus pneumoniae". MeSH terms found using
the "All Fields" search are NOT exploded.
-
MeSH Major Topic includes all MeSH Terms (see above) that are marked
as being of major importance to this record by the MeSH indexers.
-
Modification Date contains the date that the record was placed into
Entrez, in the format year/month/day (see Publication Date, below).
-
Page Number is the number of the first journal page that the article
appears on.
-
Property is one or more keywords that denote what type of sequence
this record contains.
-
Publication Date contains the date that the article was published
(for PubMed citations) or the date that the record was added to GenBank
(for sequence records), in the format year/month/day, e.g. 1984/10/06.
A year alone (e.g. "1984") will retrieve all articles for that year; a
year and month (e.g. "1984/03") will retrieve all for that month.
-
PubMed ID is the PubMed Identifier of a given citation.
-
Organism contains the scientific and common names for the organisms
associated with protein and nucleotide sequences. Organism names are "exploded"
much like MeSH terms; for instance, searching on "mammalia" will find all
entries indexed under any mammal.
-
Protein Name contains the name of the protein that this sequence
is associated with. The common name of a protein may not be indexed under
this field; if you cannot find a particular protein using this field, try
All Fields or Text Words.
-
SeqId is the special string identifier, similar to a FASTA identifier,
for a given sequence.
-
Substance contains the names of any chemicals associated with this
record from the Chemical Abstracts Service (CAS) registry and the MEDLINE
Name of Substance field.
-
Text Words includes all of the "free text" associated with a record,
specifically :
-
MEDLINE records: the title and abstract.
-
Protein records: the definition, comment, protein name, and protein description.
-
Nucleotide records: the definition, comment, gene name, and gene description.
-
Title Words includes only those words found in the title or definition
line of a record.
-
Volume is the number of the journal volume this article appears
in.
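The Author Name format in the field list above (last name, a space, then initials without periods) can be illustrated with a short Python sketch. It assumes the name is written given-names-first with a single-word last name; the function name entrez_author is hypothetical, and multi-word surnames or suffixes are not handled.

```python
def entrez_author(full_name):
    """Convert 'First M. Last' into 'Last FM' (no periods in the initials)."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        raise ValueError("empty name")
    *given, last = parts
    # One uppercase initial per given name or stand-alone initial.
    initials = "".join(name[0].upper() for name in given)
    return f"{last} {initials}".strip()


print(entrez_author("Jacob F. Marley"))
print(entrez_author("Ebenezer Scrooge"))
```

The two calls reproduce the examples given for the Author Name field: Marley JF and Scrooge E.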
The MEDLINE UID, PubMed ID and Sequence ID fields
retrieve records differently than other fields do. To use them, enter
one or more Unique Identifier numbers in the Term box. If you enter more
than one, separate them by spaces or commas. Select the appropriate field
(MEDLINE UID, PubMed ID, or Sequence ID), and press Search. The
entries specified will be treated as if they were a search term, and will
be referred to as {List of Articles} by Entrez.
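As a rough illustration of the "spaces or commas" rule, the following Python sketch splits a Term box entry into individual identifiers. The function name parse_identifiers is hypothetical, the check that every identifier is numeric is an assumption based on the phrase "Unique Identifier numbers", and only the first identifier in the sample comes from this guide.

```python
import re


def parse_identifiers(text):
    """Split a Term box entry into identifiers separated by spaces or commas."""
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    bad = [t for t in tokens if not t.isdigit()]
    if bad:
        # Assumption: identifiers entered this way are plain numbers.
        raise ValueError(f"not numeric identifiers: {bad}")
    return tokens


# 88055872 appears in this guide; the other two are made up for the example.
print(parse_identifiers("88055872, 88055873 88055874"))
```

Identifiers are kept as strings so that any leading zeros survive unchanged.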
Finding all terms that begin with a given word
Placing an asterisk at the end of a term will cause Entrez to search for
all terms that begin with that word; for instance "bacter*" will find all
terms that begin with the letters "bacter", e.g. bacteria, bacterium, bacteriophage,
etc.
Forcing Entrez to search for a phrase
Entrez will do its best to find logical groupings in your input. For instance,
if you enter "Lipman DJ Genomics", Entrez will recognize that "Lipman DJ"
is the name of an author and will convert your search into
"Lipman DJ" AND Genomics
It may happen that Entrez fails to find a phrase that you think is vital
to a search. For instance, if you enter
brca 1
Entrez will not recognize that this is all one item and will search
for "brca" and "1" separately. Since the latter is a numeral and is not
indexed in the title and abstract fields, it will likely not find what
you want. You can circumvent this by putting quotes (") around the words
that Entrez is failing to recognize, e.g.
"brca 1"
It is usually best to let Entrez do your grouping for most accurate
retrieval, and to use quotes only when Entrez has failed to find anything
because of a grouping error.
Note! If a quoted phrase is not found, that does NOT mean that
the phrase is not in the database; it usually just means that Entrez did
not recognize this as a phrase and thus did not index it. You should remove
the quotes and try again.
Expert users of Entrez can, if they wish, enter a full boolean expression
in the term box. See Entering a Complex Boolean
Expression below.
Search Modes
WWW Entrez allows you to enter terms for searching in several different
ways.
-
In List Terms mode, when you enter a term, Entrez displays the list
of available terms for that field, starting at the first term which begins
with the characters that you entered. You can then select one or more terms
to add to your search. For example, to see the text words beginning with
"pneum", you would enter "pneum" in the term box, select "Text Words" and
"List Terms", then press Search. List Terms Mode thus allows you
to browse through the terms in any given field. This can be very useful
if you are not sure how something is spelled.
-
In Automatic mode, the term or terms that you enter are immediately
added to your search. If you enter more than one word, Entrez will try
to group them appropriately into terms and add each to your query, finding
those articles that have every one of the terms. For instance, if you entered
"central nervous system", the terms "central", "nervous", and "system"
would be added to your query. If Entrez fails to group the words
you entered properly, you can place one or more words in quotes (") to
force Entrez to group them as you wish.
Choosing a Term in List Terms Mode
If a term is entered in the term box using List Terms Mode and the
Search button pressed, a list of the terms that begin with the characters
entered in the term box will be presented. For instance, if "pneum" were
entered (with the field selector on "All Fields"), the resultant list might
look like this :
Available terms in the field(s): All Fields (Total Records)
After each term is the number of articles that the term appears in.
To pick one or more of the terms in the "Available Term" list, highlight
them and press Select; the terms will then be added to your search
and to the Chosen List.
If you want to look at another list of terms altogether, simply reenter
the new term in the term box as before and press Search.
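List Terms browsing, as described above, amounts to jumping into a sorted term list at the first entry that matches what was typed. The Python sketch below shows one way to model that with a binary search; the function name list_terms, the window size, and the terms and counts are all hypothetical.

```python
from bisect import bisect_left


def list_terms(entered, index, window=5):
    """Return up to `window` (term, count) pairs, starting at the first term
    that sorts at or after the characters entered."""
    terms = sorted(index)
    start = bisect_left(terms, entered.lower())
    return [(term, index[term]) for term in terms[start:start + window]]


# Hypothetical terms and record counts, for illustration only.
INDEX = {
    "pleura": 95,
    "pneumatic": 12,
    "pneumococcal": 340,
    "pneumonia": 5210,
    "pneumothorax": 870,
    "polio": 410,
}

for term, count in list_terms("pneum", INDEX, window=3):
    print(f"{term} ({count})")
```

Because the list is sorted, a misspelled or partial entry still lands near the intended term, which is what makes this mode useful for checking spellings.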
Your Chosen List of Terms
As you enter or select terms, the terms will be added to your search and
also placed into a list at the bottom of your screen; this list is called
the Chosen List. For example, if you had entered the term "pneumonia",
and then entered "cytomegalo*", the Chosen List would look like this (the
middle part of the form is omitted for brevity) :
Modify Current Query :
Term (Total Records)
Entrez automatically calculates the intersection of the terms you enter
and displays the resultant search statement at the top of the screen, calculating
the number of records to retrieve. The terms included in the search are
highlighted in the Chosen List. In the above example, there are 42 articles
that contain both the word "pneumonia" and also a word that begins with
the characters "cytomegalo". Once you have entered terms of interest,
you can do any of the following:
-
If the number of documents is reasonably small, press the Retrieve
button to see a listing of the records your search has chosen; see Retrieving
Documents below.
-
Select and/or deselect terms in the chosen list until the terms you wish
to include in the search are highlighted, then press the Search
button. The system will then create a new search statement based upon only
the highlighted terms, according to the type of evaluation you have selected.
Here is what each of the evaluation types does:
-
Intersection (AND): only those records that contain all of
the terms specified are returned by the search. This is abbreviated to
AND in the search statement.
-
Union (OR): those records that contain any of the terms specified
are returned. This is abbreviated to OR.
-
Difference (BUTNOT): those records that contain the uppermost term
but not any of the lower terms are returned. This is abbreviated to BUTNOT.
Terms or expressions which are combined using Modify Query are grouped
into a single entity and placed on a separate line in the Chosen List. This
permits you to combine terms flexibly in many ways.
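The three evaluation types correspond to ordinary set operations on the records matched by each term. Here is a minimal Python sketch under that reading; the function name combine and the record numbers are hypothetical.

```python
def combine(operator, record_sets):
    """Combine sets of record IDs, with the uppermost term's set first."""
    first, *rest = record_sets
    result = set(first)
    for other in rest:
        if operator == "AND":
            result &= other       # intersection: records containing all terms
        elif operator == "OR":
            result |= other       # union: records containing any term
        elif operator == "BUTNOT":
            result -= other       # difference: drop records with lower terms
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# Hypothetical record IDs matched by two terms.
pneumonia = {1, 2, 3, 4}
cytomegalo = {3, 4, 5}

print(combine("AND", [pneumonia, cytomegalo]))
print(combine("OR", [pneumonia, cytomegalo]))
print(combine("BUTNOT", [pneumonia, cytomegalo]))
```

Note that BUTNOT depends on order: the first set is the uppermost term, and every later set is subtracted from it.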
Note that the Retrieve button will continue to retrieve your
old search until you tell the system to update your search using the Search
button in Modify Query.
Retrieving Documents
When the number of documents that satisfy your query is reasonably small,
press the Retrieve button to view them. This produces a listing
containing each document's title, author, and publication year. This listing
is called the Document Summary Page, and is detailed below.
If the number of documents that your query retrieves is large, a box
will appear indicating the maximum number of articles that will be displayed.
You can change this number to whatever is suitable. If you cannot or do
not choose to display all of the articles that your search has found, the
articles you do see will be the most recent ones in the database.
The Document Summary Page
Once you have pressed the Retrieve button, WWW Entrez will display
a listing of information on the documents that your search has found. This
permits you to browse through the retrieved list of documents easily. Once
you have determined which documents in the list are of interest, you can
view them, individually or as a group.
Viewing Documents
Each document can be viewed in any of several "formats", each of which
is good for some purpose. The best way to decide what format best suits
you for any given purpose is to experiment with them and see what they
look like. In general, "Citation" format is best for viewing MEDLINE records,
"GenPept" for viewing Protein records, and "GenBank" for viewing Nucleotide
records.
To view a single document in PubMed, select the link at the top of the
document. This will show you the document in Citation format, and allow
you to select other formats therefrom. To view a single document in the
other databases, select the format you wish to view from the choices below
the summary information.
To view several documents at once, select the documents you wish to
view by selecting their checkboxes. If you want to view all of the documents
on the page, don't select any of them. Then pick the type of report you
want from the popup box at the top of the screen and press "Display".
Viewing Formats
Viewing formats available include:
For PubMed articles:
-
Citation - The Title, Abstract, MeSH terms, and Substance information in
an article.
-
Abstract - The Title and Abstract only of an article.
-
ASN.1 - The article in ASN.1 format.
-
MEDLINE - The article in MEDLARS format.
For Protein and Nucleotide records:
-
GenBank/GenPept - The standard GenBank or GenPept flatfile.
-
Report - GenBank report format.
-
ASN.1 - ASN.1 format.
-
FASTA - FASTA format.
-
Graphic view - The graphical view of the entry, with alignment information.
For Structure records:
-
Structure Summary - Basic information about the structure. Choose this
format to view the structure in 3-D.
-
ASN.1 - ASN.1 format.
For Genome records:
-
Graphic view - The graphical view of the entry, with alignment information.
-
ASN.1 - ASN.1 format.
Saving Documents
When you view Document Reports, you will be given the option to save your
documents in a number of formats. The Macintosh/PC/UNIX popup permits you
to select the basic file format you desire, while the Text/HTML/MIME popup
modifies the output for different uses, as follows:
-
Text format removes all HTML tags and breaks lines at 80 columns.
-
HTML format leaves the HTML tags in, for use in a browser.
-
MIME format sends a file of GenBank MIME type. This is useful only
if you have a GenBank MIME viewer installed and configured properly.
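The Text option described above (strip the HTML tags, break lines at 80 columns) can be approximated with Python's standard library. This is a sketch of the idea only, not the code Entrez uses; the name html_to_text is hypothetical, and unlike a real report it collapses runs of whitespace rather than preserving layout.

```python
from html.parser import HTMLParser
import textwrap


class _TextExtractor(HTMLParser):
    """Collect the text between tags and discard the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html, width=80):
    """Remove HTML tags and wrap each paragraph at `width` columns."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs if p)


# Hypothetical report fragment, for illustration only.
sample = "<b>Marley JF.</b> <i>J Biol Chem</i> " + "lorem " * 30
print(html_to_text(sample))
```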
Getting Document Neighbors and Links
One of the most helpful features of Entrez is the ability to find documents
which are similar to a document you are interested in. These related documents
are called neighbors. For more details on what neighbors are, how
they are calculated, and how to use them, see Special Features below.
To retrieve the neighbors or links for a given record or set of records,
the procedure is the same as for viewing records, above. To view a single
document's neighbors or links, view the document and select the button
at the top that indicates the type of neighbor/link that you want to see.
To view several documents' neighbors or links at once, select the documents
by pressing the checkboxes next to the documents you want (as above, select
nothing to see them all). Then select the type of neighbor or link you
want from the popup box at the top of the screen and press "Display".
Outside Links
Some documents have links to outside resources. These will appear as buttons
at the top of the document report. They include:
-
{Journal Name} - the WWW page for this journal article.
-
OMIM - Online Mendelian Inheritance in Man, the NCBI/JHU genetics text.
-
UMBBD - The University of Minnesota Biodegradation/Biocatalysis Database.
-
AGIS - The Department of Agriculture DNA database.
-
PGR - The Plant Gene Registry.
-
MGD - The Jackson Laboratory Mouse Genome Database.
and many others.
For Experts Only
This section explains features of WWW Entrez that may be of interest to
users with very specific needs. Most users do not need to be familiar with
the items in this section.
Entering Complex Boolean Expressions
A search can be performed by specifying the terms to search, their fields,
and the boolean operations to perform on them, all at once. Use the following
syntax:
term [field] operator term [field] ... (etc.)
term is the term string that you wish to search on.
Field is an Entrez Field designation, which can be:
-
for PubMed : one of AFFL, ALL, AUTH, ECNO, JOUR, MESH, MAJR, PAGE, PDAT,
PTYP, SUBS, TITL, WORD, or VOL.
-
for Protein : one of ACCN, AUTH, ECNO, GENE, JOUR, KYWD, MDAT, ORGN, PDAT,
PROP, PROT, SQID, SLEN, SUBS, or WORD.
-
for Nucleotide : one of ACCN, AUTH, ECNO, FKEY, GENE, JOUR, KYWD, MDAT,
ORGN, PDAT, PROP, PROT, SQID, SLEN, SUBS, or WORD.
-
for Structure : one of ACCN, AUTH, JOUR, SUBS, or WORD.
-
for Genomes : one of ACCN, AUTH, ECNO, GENE, JOUR, ORGN, PROP, PROT, or
WORD.
where WORD = text word, TITL = title word, MESH = MeSH term, MAJR = MeSH
major topic, AUTH = author name, AFFL = affiliation, ALL = all fields,
JOUR = journal name, ECNO = E.C. Number, GENE = gene name, DATE = publication
year, PDAT = publication/creation date, MDAT = modification date, PAGE = first
page, VOL = volume, KYWD = keyword, ORGN = organism, ACCN = accession number,
PROT = protein name, SQID = SeqId, SLEN = sequence length, SUBS = substance,
PROP = property, FKEY = feature key, and PTYP = publication type.
operator is any of :
-
AND (intersection)
-
OR (union)
-
BUTNOT (difference).
Note: Boolean expressions are normally processed left to right. If you
wish part of your boolean expression to be processed out of order, enclose
it in parentheses.
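The difference that parentheses make can be seen in a small Python sketch that evaluates an already-tokenized expression strictly left to right, treating each parenthesized group as a unit. It models terms as sets of record IDs; the function name evaluate, the token format, and the sample index are hypothetical and are not Entrez's own parser.

```python
OPERATORS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "BUTNOT": lambda a, b: a - b,
}


def evaluate(tokens, index):
    """Evaluate tokens left to right; a parenthesized group is one operand."""
    pos = 0

    def operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        return set(index.get(token, ()))

    def expression():
        nonlocal pos
        result = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            pos += 1
            if op not in OPERATORS:
                raise ValueError(f"unknown operator: {op}")
            result = OPERATORS[op](result, operand())
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError("unbalanced parenthesis")
    return result


# Hypothetical record IDs for three terms.
INDEX = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

print(evaluate("a OR b AND c".split(), INDEX))        # (a OR b) AND c
print(evaluate("a OR ( b AND c )".split(), INDEX))    # a OR (b AND c)
```

With these sets, the unparenthesized form yields only record 3, while the parenthesized form yields records 1, 2, and 3.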
An example of a boolean expression: find the articles in the Journal
of Biological Chemistry that contain the term "p21" in their text:
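One way such an expression could be assembled from the syntax and field designations given above is sketched below in Python. The helper name clause is hypothetical, and the resulting string is an illustration built from the stated syntax rather than a form confirmed by this guide.

```python
# PubMed field designations as listed in this section.
PUBMED_FIELDS = {
    "AFFL", "ALL", "AUTH", "ECNO", "JOUR", "MESH", "MAJR",
    "PAGE", "PDAT", "PTYP", "SUBS", "TITL", "WORD", "VOL",
}


def clause(term, field):
    """Format one 'term [FIELD]' clause, rejecting unknown PubMed fields."""
    field = field.upper()
    if field not in PUBMED_FIELDS:
        raise ValueError(f"not a PubMed field: {field}")
    return f"{term} [{field}]"


# Text word "p21", restricted to the journal stored as "J Biol Chem".
query = " AND ".join([clause("p21", "WORD"), clause("J Biol Chem", "JOUR")])
print(query)  # p21 [WORD] AND J Biol Chem [JOUR]
```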
Specifying A Range of Terms
Another special expression is the range. You may use the syntax:
term1:term2
to specify all of the terms in the term list for a given
field from term1 to term2, inclusive. For instance, to find
all Protein entries that have a sequence length between 19,000 and 20,000
residues, you would go to the Protein database, select the "sequence length"
field, and enter:
019000:020000
The leading zero is necessary because the sequence length terms
are all six-digit integers. When in doubt, use "List terms" to see the
terms in a list; the range operator will use the terms in the order that
they appear.
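The need for the leading zero follows from the terms being compared as strings in list order. The Python sketch below pads a length to six digits and selects an inclusive range from a sorted term list; the function names length_term and range_terms and the sample terms are hypothetical.

```python
from bisect import bisect_left, bisect_right


def length_term(n):
    """Format a sequence length as a six-digit, zero-padded term."""
    return f"{n:06d}"


def range_terms(first, last, sorted_terms):
    """Return every term from `first` to `last`, inclusive, in list order."""
    lo = bisect_left(sorted_terms, first)
    hi = bisect_right(sorted_terms, last)
    return sorted_terms[lo:hi]


# Hypothetical sequence-length terms, already in sorted order.
TERMS = ["000120", "018999", "019000", "019500", "020000", "020001", "100000"]

print(range_terms(length_term(19000), length_term(20000), TERMS))

# Without padding, string order disagrees with numeric order.
print("19000" > "100000")                          # True as strings
print(length_term(19000) < length_term(100000))    # True once padded
```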
Special Features
What makes Entrez more powerful than many services is that most of its
records are linked to other records, both within a given database (such
as PubMed) and between databases. Links within a database are called "neighbors".
PubMed neighbors are determined by comparing the Text and MeSH terms
of each article, using a powerful algorithm that determines just how well
the article matches every other article. The best matches for any article
are saved, and you can retrieve them using the "Related Articles" button
at the top of the article report.
Protein and Nucleotide neighbors are determined by performing similarity
searches with the BLAST algorithm on the amino acid or DNA sequence in
the entry; the results are saved as above.
What this means is that if you find one or a few documents that match
what you are looking for, pressing the "Related Articles/Sequences" button
will find a great many more documents that are likely to be relevant, in
order from most useful to least. This allows you to find what you want
with much greater speed and accuracy: instead of having to flip through
thousands of documents to assure yourself that nothing germane to your
query was missed, you can find just a few, then look at their neighbors.
Try this feature out and see how it works for you; you may well wonder
how you got along without it!
In addition, some documents are linked to others for reasons other than
computed similarity. For instance, if a protein sequence was published
in a PubMed article, the two will be linked to one another.
How to use the WWW Entrez Genome Viewer
The WWW Entrez Genome database takes you to a graphic view that can be
used to find the specific area of a genome that you are interested in and
view its component sequences. Here are detailed instructions
on how to use these features.
How to use the WWW Entrez Structure Viewer
The WWW Entrez Structure database takes you to a summary page that can
be used to load the 3-D structure that you want into a viewer in order
to manipulate it. Here is a description
of the MMDB structure database and instructions on how to do this.
For More Assistance
If you have found a bug or are still confused, please e-mail the NCBI
Help Desk and we will be happy to assist you.
Thanks!
Credits: Brandon Brylawski