
Eroica-cpp/DistSearch


Campus Search Engine

An open-source distributed search engine instance based on Apache Nutch and Apache Solr.

Introduction

This project originated as a course project for Parallel Algorithms at SZU. We retrieve and integrate campus information for the benefit of SZU students and faculty. We also apply machine learning and data mining techniques, especially recommender systems, to enhance the user experience and, more importantly, to Hack Data For Fun! Feel free to contact us and contribute.
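The README does not say which recommender techniques the project uses. As an illustration only, here is a minimal Python sketch of one common approach, content-based filtering with cosine similarity over bag-of-words vectors. This is a swapped-in technique, not the project's code, and `tokenize`, `cosine` and `recommend` are hypothetical names. The tokenizer only handles Latin letters and digits, so Chinese news text would need a word segmenter first.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-cased alphanumeric tokens only (assumption: English text).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(history: list[str], candidates: dict[str, str], k: int = 3) -> list[str]:
    """Rank candidate news items by similarity to the user's reading history."""
    profile = Counter()
    for text in history:
        profile.update(tokenize(text))  # user profile = summed term counts
    scored = [(cosine(profile, Counter(tokenize(body))), news_id)
              for news_id, body in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    # Drop items that share no terms with the profile.
    return [news_id for score, news_id in scored[:k] if score > 0]


history = ["library opening hours extended for exams", "exam schedule released"]
candidates = {
    "n1": "final exam schedule updated",
    "n2": "football match tonight",
    "n3": "library adds new study rooms",
}
print(recommend(history, candidates, k=2))  # → ['n1', 'n3']
```

A content-based profile needs no other users' data, which makes it a simple starting point. Collaborative filtering would need click or rating logs.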

Contributors

License

Apache License, Version 2.0

Features

  1. Full-text retrieval on all public websites on campus.
  2. E-mail reminder.
  3. News filter and recommender.
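To illustrate the full-text retrieval feature, here is a minimal Python sketch of querying a Solr index populated by Nutch. It assumes Solr's standard `/select` request handler and a hypothetical core at `http://localhost:8983/solr/collection1`. The `url` and `title` fields follow Nutch's stock Solr schema, but you should check your own `schema.xml`. `build_solr_query_url` and `search` are hypothetical helpers, not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: change host, port and core name to match your Solr deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_solr_query_url(query: str, rows: int = 10, start: int = 0,
                         base_url: str = SOLR_SELECT_URL) -> str:
    """Build a Solr /select URL for a full-text query."""
    if rows < 1 or start < 0:
        raise ValueError("rows must be >= 1 and start must be >= 0")
    params = {
        "q": query,         # Solr query syntax, e.g. "content:library"
        "fl": "url,title",  # stored fields to return (assumed Nutch schema)
        "rows": rows,       # page size
        "start": start,     # offset, for pagination
        "wt": "json",       # request a JSON response
    }
    return base_url + "?" + urlencode(params)


def search(query: str, rows: int = 10) -> list[tuple]:
    """Run the query against a live Solr server and return (url, title) pairs."""
    with urlopen(build_solr_query_url(query, rows)) as resp:
        data = json.load(resp)
    # Solr's JSON response lists matching documents under response.docs.
    return [(doc.get("url"), doc.get("title")) for doc in data["response"]["docs"]]
```

With Solr running, `search("content:library", rows=5)` returns up to five `(url, title)` pairs. Keeping URL construction separate from the network call makes the query logic testable without a live server.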

Reference

  1. NutchTutorial
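    The official Nutch tutorial, written in considerable detail.
    Note: this tutorial covers Nutch 1.x. Nutch 2.x is available, but its documentation is sparse, so we still recommend Nutch 1.7 or 1.8.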

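  2. How to Get Started Learning Web Crawling in Python? (Python 爬虫如何入门学习?)
    A good explanation of web crawlers on Zhihu. It covers cluster (distributed) crawling, although the examples use Python.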

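  3. Git Tutorial (1): Basic Usage of Git (Git 教學(1):Git 的基本使用)
    A detailed, well-written introductory Git tutorial. The second part, Git Tutorial (2): Git Branch Operations and Basic Workflow (Git 教學(2):Git Branch 的操作與基本工作流程), explains how multiple people collaborate with Git.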

  4. Nutch – How It Works

