From Human Salivary Proteome Wiki
Concept Extraction
Many of the annotations and table fields in the wiki contain a large amount of free text. Automatically highlighting the important biomedical terms in that text, and letting you quickly see their definitions, can be very useful. The wiki provides this feature through a process called Concept Extraction, which is described in detail below.
Introduction
Over the past decade or so, biomedical subject matter experts have created a large number of ontologies to describe the concepts of specific domains. One central resource for viewing and using these ontologies is the National Center for Biomedical Ontology (NCBO). This NIH-supported center collects biomedical ontologies and makes them available to the research community. The specific NCBO tool we use to "mark up" free text is the NCBO Annotator. The wiki's Ontology Lookup tool offers a complementary way to explore these concepts in more depth.
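For readers who want to try the Annotator outside the wiki, building a request can be sketched as follows. This is a minimal sketch assuming the current BioPortal REST endpoint `https://data.bioontology.org/annotator`, which accepts the text to annotate, a comma-separated list of ontology acronyms, and a personal API key; the wiki's own integration may use a different Annotator version or parameters. The function `build_annotator_url` and the placeholder key are hypothetical. BioPortal acronyms do not always match the abbreviations shown on this wiki (for example, MeSH is `MESH` in BioPortal).

```python
from urllib.parse import urlencode

# Assumed endpoint: the BioPortal REST Annotator. The wiki's own
# integration may call a different version of the service.
ANNOTATOR_ENDPOINT = "https://data.bioontology.org/annotator"


def build_annotator_url(text: str, ontologies: list[str], api_key: str) -> str:
    """Hypothetical helper: build a GET URL asking the Annotator to tag `text`.

    `ontologies` holds BioPortal acronyms, sent as one comma-separated
    value. `api_key` is a personal BioPortal key (placeholder here).
    """
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    }
    # urlencode percent-encodes each value (spaces become '+', commas '%2C').
    return f"{ANNOTATOR_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_annotator_url("cysteine proteinase", ["MESH", "GO"], "YOUR_API_KEY")
    print(url)
    # Fetching `url` (for example with urllib.request.urlopen) would return
    # JSON annotations; that network call is deliberately left out here.
```

Keeping URL construction separate from the network call makes the request easy to inspect before sending it with a real key.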
See also: Help:Ontology Lookup
Available terms
In the Human Salivary Proteome Wiki, we use the subset of NCBO ontologies that is relevant to saliva and proteomics research. They are listed below along with their authors. You can also explore these ontologies in the Ontology Lookup tool:
- Medical Subject Headings (MSH); National Library of Medicine
- Foundational Model of Anatomy (FMA); University of Washington
- NCI Thesaurus (NCI); National Cancer Institute
- Human Disease (DOID); University of Maryland School of Medicine
- Pathway Ontology (PW); Medical College of Wisconsin
- Human Phenotype Ontology (HP); OBO Foundry
- Gene Ontology (GO); Gene Ontology Consortium
- Cell Type (CL); OBO Foundry
- Mammalian Phenotype (MP); The Jackson Laboratory
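The tagging results identify each ontology only by the abbreviation in the list above. As an illustration only, a lookup table like the one below could expand an abbreviation into the ontology's full name and author; `ONTOLOGIES` and `describe_ontology` are hypothetical names, not part of the wiki.

```python
from typing import NamedTuple


class Ontology(NamedTuple):
    name: str
    author: str


# Hypothetical table: abbreviation -> ontology, transcribed from the list above.
ONTOLOGIES: dict[str, Ontology] = {
    "MSH": Ontology("Medical Subject Headings", "National Library of Medicine"),
    "FMA": Ontology("Foundational Model of Anatomy", "University of Washington"),
    "NCI": Ontology("NCI Thesaurus", "National Cancer Institute"),
    "DOID": Ontology("Human Disease", "University of Maryland School of Medicine"),
    "PW": Ontology("Pathway Ontology", "Medical College of Wisconsin"),
    "HP": Ontology("Human Phenotype Ontology", "OBO Foundry"),
    "GO": Ontology("Gene Ontology", "Gene Ontology Consortium"),
    "CL": Ontology("Cell Type", "OBO Foundry"),
    "MP": Ontology("Mammalian Phenotype", "The Jackson Laboratory"),
}


def describe_ontology(abbrev: str) -> str:
    """Hypothetical helper: return 'Full Name (Author)', case-insensitively."""
    entry = ONTOLOGIES.get(abbrev.upper())
    if entry is None:
        raise KeyError(f"unknown ontology abbreviation: {abbrev!r}")
    return f"{entry.name} ({entry.author})"


print(describe_ontology("MSH"))  # Medical Subject Headings (National Library of Medicine)
```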
Seeing it in action
The concept extraction feature is available on pages that contain a lot of free text, including citation and protein signature pages. On these pages, a "Tag" tab appears at the top of the page (see Figure 1).
Let's use the citation page PubMed:9563472 as an example of the kind of information this feature provides. Click the "Tag" tab to start the extraction process. Once NCBO returns the results, the concepts extracted from the text are underlined and highlighted in yellow (see Figure 2).
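Conceptually, the highlighting step takes the character positions of each matched term and wraps them in markup. The sketch below assumes each match arrives as a 0-based, end-exclusive `(start, end)` pair and uses a `<mark>` wrapper; the function `highlight` is hypothetical, the wiki's actual rendering may differ, and the offset convention of a real Annotator response should be checked and converted if necessary. Overlapping matches are skipped so the output stays well-formed.

```python
from html import escape


def highlight(text: str, spans: list[tuple[int, int]]) -> str:
    """Hypothetical helper: wrap each (start, end) span of `text` in <mark> tags.

    Spans are 0-based and end-exclusive. A span that overlaps one already
    kept (or is empty) is skipped so tags never interleave. All text is
    HTML-escaped before being emitted.
    """
    out: list[str] = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or start >= end:
            continue  # overlapping or empty span: skip it
        out.append(escape(text[cursor:start]))
        out.append(f"<mark>{escape(text[start:end])}</mark>")
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)


sentence = "Salivary cysteine proteinase inhibitors"
print(highlight(sentence, [(9, 28)]))
# Salivary <mark>cysteine proteinase</mark> inhibitors
```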
See also: Help:PubMed Citations, Help:Protein Signatures
Understanding the extracted concepts
To see the list of concepts mapped to a particular term, right-click the highlighted text (or Control-click on a Mac). Figure 3 shows an example in which a single ontology has a concept for "cysteine proteinase". As highlighted in the figure, the term was identified by "MSH", which, according to the list above, stands for Medical Subject Headings.
In addition to the source ontology, each mapping lists the concept's unique identifier and the preferred name of the term. The score in square brackets (also highlighted in Figure 3) indicates the confidence of the mapping: in principle, the higher the score, the more accurately the text matches the ontological concept. In general, use the highest-scoring concept to get the most accurate definition.
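The advice to prefer the highest-scoring concept can be expressed as a small ranking step. The sketch below assumes each mapping carries the fields described above (source ontology, concept identifier, preferred name, score); the class `Mapping`, the functions `ranked` and `best_mapping`, and the sample identifiers, names, and scores are hypothetical placeholders, not real NCBO output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    ontology: str        # source ontology abbreviation, e.g. "MSH"
    concept_id: str      # unique identifier of the concept
    preferred_name: str  # preferred name of the term
    score: float         # confidence shown in square brackets


def ranked(mappings: list[Mapping]) -> list[Mapping]:
    """Hypothetical helper: sort mappings from highest to lowest score (stable)."""
    return sorted(mappings, key=lambda m: m.score, reverse=True)


def best_mapping(mappings: list[Mapping]) -> Mapping:
    """Hypothetical helper: pick the highest-scoring mapping; ties keep the first listed."""
    if not mappings:
        raise ValueError("no mappings for this term")
    return max(mappings, key=lambda m: m.score)


# Placeholder candidates for one highlighted phrase (IDs and scores are made up).
candidates = [
    Mapping("NCI", "placeholder-1", "Example concept A", 8.0),
    Mapping("MSH", "placeholder-2", "Example concept B", 10.0),
]
for m in ranked(candidates):
    print(f"{m.ontology} {m.concept_id} {m.preferred_name} [{m.score}]")
print("Use:", best_mapping(candidates).preferred_name)
```

Because both `sorted` and `max` keep the original order among equal scores, ties resolve to whichever mapping was listed first.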
Detailed information about the concept
To drill into the details of a concept, right-click the highlighted text, then hover over the term of interest in the list and left-click it. This takes you directly to the term in the Ontology Lookup tool, which displays the concept's properties and its position in the ontology (see Figure 4).