About the Search Engine

The KTMS Search Engine is an adaptation of the htgrep search engine written by Oscar Nierstrasz of the University of Berne (Switzerland). It is written in Perl (Practical Extraction and Report Language), Larry Wall's ubiquitous scripting language. The following information is excerpted from the htgrep online documentation.

How to enter search queries

A search query can be either one or more words separated by spaces (a simple search) or a boolean expression. By default, all searches are case-insensitive: entering "HOUSE" is identical to entering "house", "House" or even "HoUSe". (If necessary, htgrep can be configured to perform case-sensitive searches.)

Simple Searches

Enter a single word to find every record that contains that exact whole word. For example, the search entry "world" would find records containing the word "world", but not "worldwide". If you enter more than one word, the search finds records containing all of the words you entered. For example, "world economy" will find records containing both the word "world" and the word "economy" (but not necessarily next to each other or in that order).
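The rules above (whole-word matching, case-insensitivity, and "every word must appear") can be sketched in a few lines of Python. This is a minimal illustration, not htgrep's Perl code: it assumes a record is a plain string, that "whole word" means what Python's `\b` word boundary means, and the function name `matches_simple` is hypothetical.

```python
import re


def matches_simple(record: str, query: str) -> bool:
    """Return True if every space-separated word in `query` occurs in
    `record` as a whole word, ignoring case (hypothetical helper)."""
    for word in query.split():
        # \b anchors the word at word boundaries, so "world" does not
        # match inside "worldwide"; re.escape keeps punctuation literal.
        pattern = r"\b" + re.escape(word) + r"\b"
        if not re.search(pattern, record, re.IGNORECASE):
            return False  # one missing word is enough to reject the record
    return True


print(matches_simple("Notes on the World Economy", "world economy"))  # True
print(matches_simple("Worldwide trade figures", "world"))             # False
```

Word order does not matter, because each word is searched for independently.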

To find parts of words, use an asterisk (*) to represent missing parts of the word. For example, if you enter "world*" it will match "worldwide", "worlds", etc. Similarly, "*world" would find "underworld", etc.
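One way to picture the asterisk is as a translation into a regular expression. The sketch below assumes that "*" stands for any run of letters, digits or underscores (so a wildcard never spans a space); the name `wildcard_to_regex` is hypothetical, and htgrep's own translation may differ.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Compile a search term in which '*' stands for any run of word
    characters into a case-insensitive, whole-word pattern."""
    # Escape the term so it matches literally, then turn each escaped
    # asterisk back into "zero or more word characters".
    body = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


print(bool(wildcard_to_regex("world*").search("worldwide")))      # True
print(bool(wildcard_to_regex("*world").search("the underworld")))  # True
print(bool(wildcard_to_regex("world*").search("underworld")))     # False
```

Keeping the `\b` anchors preserves the whole-word rule: "world*" still cannot match in the middle of "underworld".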

Boolean Searches

For more control over the search query, you can use a boolean expression. If you enter the word "or" between two search words (with a space on each side of the "or"), the search finds any record that contains the first word, the second word, or both. For example, "apple or orange" would find records containing the word "apple" or the word "orange", or both.

If you enter "and" instead of "or", the search matches only records that contain both the word "apple" and the word "orange". This is the same as a simple search for "apple orange", because when no boolean operator is given, an "and" is assumed between each pair of search words.
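The "implied and" rule can be made concrete with a tokenizer that inserts an explicit "and" wherever two operands sit side by side. This is a hypothetical sketch, not htgrep's code; it assumes operators are recognised in any letter case (the query is lowercased first, matching the default case-insensitive behaviour) and that brackets are "(" and ")".

```python
import re

# Tokens after which another operand may NOT follow directly.
NO_OPERAND_BEFORE = {"and", "or", "not", "("}
# Tokens that do NOT begin an operand.
NO_OPERAND_START = {"and", "or", ")"}


def tokenize(query: str) -> list[str]:
    """Split a query into words, operators and brackets, inserting an
    explicit "and" between adjacent operands (hypothetical helper)."""
    raw = re.findall(r"\(|\)|[^\s()]+", query.lower())
    tokens: list[str] = []
    for tok in raw:
        if tokens:
            prev_ends_operand = tokens[-1] not in NO_OPERAND_BEFORE
            tok_starts_operand = tok not in NO_OPERAND_START
            if prev_ends_operand and tok_starts_operand:
                tokens.append("and")  # the user left the "and" out
        tokens.append(tok)
    return tokens


print(tokenize("apple orange"))      # ['apple', 'and', 'orange']
print(tokenize("apple and orange"))  # ['apple', 'and', 'orange']
```

Both spellings of the query produce the same token list, which is why they return the same records.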

To find records that do not contain a particular word, place the word "not" before it. For example, "not blue" would find all records that do not contain the word "blue". You can combine "and", "or" and "not"; for example, "apple and not red" would find records containing the word "apple" but not the word "red".
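It can help to think of each operator as a set operation on the records that match each word: "or" is a union, "and" is an intersection, and "not" is the complement relative to all records. The sample records below are made up for illustration, whole words are found by simply splitting on spaces, and this is not necessarily how htgrep evaluates a query internally.

```python
# Hypothetical sample records, keyed by record ID.
records = {
    1: "red apple",
    2: "green apple",
    3: "blue orange",
    4: "red orange",
}
ALL = set(records)


def ids_with(word: str) -> set[int]:
    """IDs of records containing `word` as a whole word, ignoring case."""
    return {rid for rid, text in records.items() if word.lower() in text.lower().split()}


# "apple or orange": union, records with either word (or both).
apple_or_orange = ids_with("apple") | ids_with("orange")
# "apple and orange": intersection, records with both words.
apple_and_orange = ids_with("apple") & ids_with("orange")
# "not blue": every record except those containing "blue".
not_blue = ALL - ids_with("blue")
# "apple and not red": apple records with the red ones removed.
apple_and_not_red = ids_with("apple") & (ALL - ids_with("red"))

print(apple_or_orange, apple_and_orange, not_blue, apple_and_not_red)
```

With these records, "apple and not red" leaves only record 2 ("green apple").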

For advanced use, you can group parts of an expression with brackets. For example, "apple and (red or green)" would find all records containing the word "apple" and either "red" or "green" (or both). Without brackets, "and" takes precedence over "or", so "apple and red or green" would find all records containing both "apple" and "red", as well as all records containing "green".
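The precedence rule and the effect of brackets can be illustrated with a small recursive-descent parser that turns a query into a yes/no test on a record's text. This is a hypothetical sketch, not htgrep's implementation. The text above only states that "and" binds tighter than "or"; that "not" binds tighter still is an assumption here, and wildcards are left out for brevity.

```python
import re


def parse(query: str):
    """Parse a boolean query into a predicate on a record's text.

    Assumed grammar, from lowest to highest precedence:
        or_expr  := and_expr ("or" and_expr)*
        and_expr := not_expr (["and"] not_expr)*   # a missing "and" is implied
        not_expr := "not" not_expr | "(" or_expr ")" | WORD
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def or_expr():
        left = and_expr()
        while peek() == "or":
            take()
            right = and_expr()
            left = lambda t, l=left, r=right: l(t) or r(t)
        return left

    def and_expr():
        left = not_expr()
        # Keep combining until "or", ")" or the end of the query.
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                take()
            right = not_expr()
            left = lambda t, l=left, r=right: l(t) and r(t)
        return left

    def not_expr():
        if peek() is None:
            raise ValueError("unexpected end of query")
        tok = take()
        if tok == "not":
            inner = not_expr()
            return lambda t: not inner(t)
        if tok == "(":
            inner = or_expr()
            if peek() != ")":
                raise ValueError("missing closing bracket")
            take()
            return inner
        if tok in ("and", "or", ")"):
            raise ValueError(f"unexpected {tok!r}")
        word = re.compile(r"\b" + re.escape(tok) + r"\b", re.IGNORECASE)
        return lambda t: word.search(t) is not None

    predicate = or_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return predicate
```

For example, `parse("apple and red or green")("green pepper")` is true, because the "green" branch stands on its own, while `parse("apple and (red or green)")("green pepper")` is false, because the brackets force "apple" to be present as well.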


KTMS Datasets