BioPortal
Where resources come together
The GRaPH-Int bioportal search engine is an online tool that helps policymakers, researchers and public health professionals search the websites and databases of HumGen International, the PHG Foundation and the CDC's Office of Public Health Genomics. Its aim is to optimize searches on the ethical, legal and social issues surrounding public health genomics while providing the latest public health genomics literature, epidemiological data, and news and events.
The following tips can help you refine your search technique to make the most of your searches.
RANKING
BioPortal scoring is based on Apache Lucene scoring. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.
BioPortal scoring combines the Vector Space Model (VSM) of Information Retrieval with the Boolean model to determine how relevant a given Document is to a user's query. The general idea behind the VSM is that the more times a query term appears in a document, relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. The Boolean model is applied first, to narrow down the documents that need to be scored based on the boolean logic in the query specification. BioPortal also adds some capabilities and refinements to this model to support boolean and fuzzy searching, but at heart it remains a VSM-based system.
In BioPortal, the objects being scored are Documents. A Document is a collection of Fields, and each Field has its own semantics for how it is created and stored. BioPortal scoring works on Fields and then combines the results to return Documents. This matters because two Documents with exactly the same content, one holding it in two Fields and the other in a single Field, will receive different scores for the same query due to length normalization.
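The VSM idea described above can be illustrated with a small term-weighting sketch. This is a simplified textbook weighting, not the exact formula BioPortal or Lucene applies; the function name tf_idf and the three-document sample collection are assumptions made for illustration only.

```python
import math

def tf_idf(term, doc, corpus):
    """Weight one term for one document (simplified, illustrative only).

    The weight grows with how often the term occurs in the document and
    shrinks as the term becomes common across the whole collection.
    """
    tf = doc.count(term)                                # occurrences in this document
    docs_with_term = sum(1 for d in corpus if term in d)
    if tf == 0 or docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term) + 1.0  # rarer terms weigh more
    return math.sqrt(tf) * idf

# Hypothetical collection of already-tokenized documents.
corpus = [
    ["tuberculosis", "epidemic", "tuberculosis"],
    ["genomics", "policy", "epidemic"],
    ["genomics", "screening", "ethics"],
]

scores = [tf_idf("tuberculosis", doc, corpus) for doc in corpus]
```

Only the first document receives a non-zero score for "tuberculosis". Within that document, "tuberculosis" outweighs "epidemic" because it occurs more often there and is rarer in the collection.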
By default, each field has a weight of one (1) in the scoring algorithm. BioPortal boosts some fields to return more relevant results. The boosted fields are:
- Title: value of 5
- Reference: value of 3
- Author: value of 3
- Event Organization: value of 3
- Abstract: value of 1.5
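As a rough sketch of how field boosts and length normalization could interact, the example below scores each field separately and sums the results. The boost values come from the list above; the lowercase field keys, the extra "body" and "notes" fields, and the scoring formula itself are assumptions for illustration, not BioPortal's actual implementation.

```python
import math

# Boost values from the list above; any other field defaults to 1.
FIELD_BOOSTS = {
    "title": 5.0,
    "reference": 3.0,
    "author": 3.0,
    "event_organization": 3.0,
    "abstract": 1.5,
}

def field_score(term, text, boost=1.0):
    """Score one field: boosted term frequency, normalized by field length."""
    words = text.lower().split()
    if not words:
        return 0.0
    tf = words.count(term.lower())
    length_norm = 1.0 / math.sqrt(len(words))  # shorter fields count for more
    return boost * math.sqrt(tf) * length_norm

def document_score(term, doc):
    """Sum the per-field scores of a document given as {field name: text}."""
    return sum(
        field_score(term, text, FIELD_BOOSTS.get(name, 1.0))
        for name, text in doc.items()
    )

# The same words score higher in the title than in the abstract.
in_title = document_score("tuberculosis", {"title": "tuberculosis epidemic"})
in_abstract = document_score("tuberculosis", {"abstract": "tuberculosis epidemic"})

# The same content split across fields scores differently (length normalization).
one_field = document_score("tuberculosis", {"body": "tuberculosis epidemic report"})
two_fields = document_score(
    "tuberculosis", {"body": "tuberculosis", "notes": "epidemic report"}
)
```

In this sketch, in_title exceeds in_abstract purely because of the boost, while one_field and two_fields differ purely because the matching field is shorter in the second document.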
ADVANCED SEARCH
Wildcard Searches
BioPortal supports single and multiple character wildcard searches within single terms (not within phrase queries).
- To perform a single character wildcard search use the "?" symbol.
- To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search looks for terms that match the query with the "?" replaced by any single character. For example, to search for "text" or "test" you can use the search: te?t
The multiple character wildcard search looks for 0 or more characters. For example, to search for test, tests or tester, you can use the search: test*
You can also use the wildcard searches in the middle of a term.
te*t
Note: You cannot use a * or ? symbol as the first character of a search.
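One way to picture the wildcard rules is to translate a wildcard term into a regular expression. The sketch below is only an illustration of the matching behaviour described above; the helper names wildcard_to_regex and matches are hypothetical, and case-insensitive matching is an assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, term):
    """Return True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

single = [t for t in ["text", "test", "tet"] if matches("te?t", t)]
multiple = [t for t in ["test", "tests", "tester", "toast"] if matches("test*", t)]
middle = matches("te*t", "tenant")
```

Here single keeps "text" and "test" but not "tet", multiple keeps "test", "tests" and "tester", and a pattern such as *est is rejected because it starts with a wildcard.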
Fuzzy Searches
BioPortal supports fuzzy searches based on the Levenshtein distance (edit distance). To do a fuzzy search, use the tilde symbol, "~", at the end of a single-word term. For example, to search for a term similar in spelling to "Humgen", use the fuzzy search: Humgen~
This search will find terms like Human.
An additional (optional) parameter specifies the required similarity. The value is between 0 and 1; the closer it is to 1, the more similar a term must be to match. For example: Humgen~0.2
If the parameter is not given, the default of 0.5 is used.
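The sketch below shows how a Levenshtein distance can be turned into a similarity between 0 and 1 and compared against the threshold. The distance computation is the standard algorithm; the similarity formula (one minus the distance divided by the length of the shorter term) is an assumption modelled on older Lucene releases and may not match what BioPortal does exactly. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Assumed similarity: 1 - distance / length of the shorter term."""
    a, b = a.lower(), b.lower()
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shortest

def fuzzy_match(query, term, min_similarity=0.5):
    """True when the term is more similar to the query than the threshold."""
    return similarity(query, term) > min_similarity

distance = levenshtein("humgen", "human")
default_hit = fuzzy_match("Humgen", "Human")        # default threshold of 0.5
strict_hit = fuzzy_match("Humgen", "Human", 0.7)    # stricter threshold
```

Under these assumptions "Humgen" and "Human" are two edits apart, which gives a similarity of 0.6: enough to match at the default threshold, but not at 0.7.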
Boosting a Term
BioPortal ranks matching documents by relevance, based on the terms found. To boost a term, use the caret symbol, "^", with a boost factor (a number) at the end of the term you are searching for. The higher the boost factor, the more weight the term carries in the ranking.
Boosting lets you control the relevance of a document by boosting the terms it contains. For example, if you are searching for tuberculosis epidemic and you want the term "tuberculosis" to be more relevant, boost it by placing the ^ symbol and the boost factor directly after the term. You would type: tuberculosis^4 epidemic
This will make documents with the term tuberculosis appear more relevant.
By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
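To make the effect of a boost factor concrete, the sketch below splits a query into (term, boost) pairs and multiplies each term's count by its boost. The parsing and the scoring are deliberately simplified assumptions (single-word terms only, a plain weighted count) and the names parse_boosts and score are hypothetical.

```python
def parse_boosts(query):
    """Split a query into (term, boost) pairs; a missing boost defaults to 1."""
    pairs = []
    for token in query.split():
        term, caret, factor = token.partition("^")
        boost = float(factor) if caret and factor else 1.0
        if boost <= 0:
            raise ValueError("the boost factor must be positive")
        pairs.append((term.lower(), boost))
    return pairs

def score(query, words):
    """Weighted count of query terms in a tokenized document."""
    return sum(boost * words.count(term) for term, boost in parse_boosts(query))

doc_tb = ["tuberculosis", "report"]
doc_ep = ["epidemic", "report"]

plain = (score("tuberculosis epidemic", doc_tb),
         score("tuberculosis epidemic", doc_ep))
boosted = (score("tuberculosis^4 epidemic", doc_tb),
           score("tuberculosis^4 epidemic", doc_ep))
```

Without a boost the two sample documents tie. With tuberculosis^4, the document containing "tuberculosis" scores four times higher than the one containing only "epidemic".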
Boolean Operators
Boolean operators allow terms to be combined through logic operators. BioPortal supports AND, "+", OR, NOT and "-" as Boolean operators (Note: Boolean operators must be ALL CAPS).
The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
To search for documents that contain either "tuberculosis epidemic" or just "tuberculosis", use either of the following queries:
"tuberculosis epidemic" tuberculosis
"tuberculosis epidemic" OR tuberculosis
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.
To search for documents that contain "National Bioethics" and "Advisory Commission" use the query: "National Bioethics" AND "Advisory Commission"
The "+" or required operator requires that the term after the "+" symbol exist somewhere in a field of a single document. To search for documents that must contain "tuberculosis" and may contain "epidemic" use the query: +tuberculosis epidemic
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.
To search for documents that contain "National Bioethics" but not "Advisory Commission" use the query: "National Bioethics" NOT "Advisory Commission"
Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "Advisory Commission"
The "-" or prohibit operator excludes documents that contain the term after the "-" symbol. To search for documents that contain "National Bioethics" but not "Advisory Commission" use the query: "National Bioethics" -"Advisory Commission"
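Because the Boolean operators are described above in terms of set operations, they can be sketched with Python sets of document ids. The tiny index below is invented for illustration; it is not how BioPortal stores its data.

```python
# Hypothetical index: each term maps to the ids of the documents containing it.
index = {
    "tuberculosis": {1, 2, 3},
    "epidemic": {2, 3, 4},
}

def docs(term):
    """Ids of the documents that contain the term (empty set if none do)."""
    return index.get(term, set())

# tuberculosis OR epidemic (also the default when no operator is given): union
either = docs("tuberculosis") | docs("epidemic")

# tuberculosis AND epidemic: intersection
both = docs("tuberculosis") & docs("epidemic")

# tuberculosis NOT epidemic, or tuberculosis -epidemic: difference
without = docs("tuberculosis") - docs("epidemic")

# +tuberculosis epidemic: only "tuberculosis" decides which documents match;
# the optional term "epidemic" would only influence the ranking.
required = docs("tuberculosis")
```

The difference needs a left-hand set to subtract from, which is one way to see why a query made of a single NOT clause returns no results.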
Grouping
BioPortal supports using parentheses to group clauses into sub-queries. This is useful when you want to control the boolean logic of a query. To search for documents that contain "virus" and either "tuberculosis" or "epidemic", use the query: (tuberculosis OR epidemic) AND virus
This removes any ambiguity: virus must be present, together with at least one of tuberculosis or epidemic.
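The effect of the parentheses can be seen by evaluating the same three terms with two different groupings, again using sets of document ids. The sample index is hypothetical and only serves to show that the grouping changes the result.

```python
# Hypothetical index: each term maps to the ids of the documents containing it.
tuberculosis = {1, 2, 3}
epidemic = {2, 3, 4}
virus = {3, 5}

# (tuberculosis OR epidemic) AND virus
grouped = (tuberculosis | epidemic) & virus

# tuberculosis OR (epidemic AND virus): same terms, different grouping
regrouped = tuberculosis | (epidemic & virus)
```

With this sample data the first grouping keeps only document 3, while the second keeps documents 1, 2 and 3.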


