MedSearchEngine

A search engine application for medical documents, created using Apache Lucene.

Concept

Given the medlars collection, create a search engine by using Apache Lucene that will give us the most relevant documents for every query. The index and serch process must be implemented in two ways:

A simple text index/search.
An index/search using bigrams. The bigrams used were found in wikipedia's medical collection and matching bigrams found in our medlars collection.

The results of our retrieval process are evaluated using the trec_eval program.

Implementation

The implementation consists of two parts :

Part1 is used to retrieve all the bigrams found in a subset of wikipedia's medica articles. Note: some articles are not related to medical science at all, so we will get a lot of irrelevant bigrams.
Part2 uses Apache Lucene in order to create two application for indexing and searching. The first does simple search, while the second uses the bigrams found earlier that also match bigrams found in the collection(so that irrelevant bigrams do not affect our results).

Contents of the project

src : The source code folder.
bindex : A folder that contains the bigrams found in part1.
docs : A folder that an xml file with wikipedia's medical articles.
index : A folder that keeps the index created by Lucene.
lib : A folder that contains the necessary .jar files for the project.
medlars : A folder that contains the medlars collection's documents
stopwords : A folder that contains some stopwords that we want to ignore.
trec_eval : A folder that contains the results from the evaluation of the documents that we have retrieved, comparing to the correct answers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MedSearchEngine

Concept

Implementation

Contents of the project

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
bindex		bindex
docs		docs
index		index
lib		lib
medlars		medlars
src		src
stopwords		stopwords
trec_eval		trec_eval
LICENSE.txt		LICENSE.txt
README.md		README.md

License

pitsios-s/MedSearchEngine

Folders and files

Latest commit

History

Repository files navigation

MedSearchEngine

Concept

Implementation

Contents of the project

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages