A document search engine architectural approach

An architectural approach to implementing a large-scale document search engine based on Apache NiFi.

The ETL process design is based on Apache NiFi's flow-based programming model.

Main Goals

It should offer fast, efficient search, with a search experience comparable to Google Search.
All text in documents (including their content) must be extracted and indexed.
The architecture should be scalable and should rely on established reference technologies for moving data.
It should be able to handle a large number of files in various formats, some of them quite large.
It should be optimized to store large amounts of data and maintain multiple copies to ensure high availability and fault tolerance.
It should have the ability to integrate with external systems to collaborate on more complex tasks or simply define platform usage schemes.

Architecture Overview

Container Ports

Container	Port
Apache NiFi Dashboard UI	localhost:8080
Hadoop Resource Manager	localhost:8081
Kafka Topics UI	localhost:8082
MongoDB Express	localhost:8083
Kibana	localhost:8084
Keycloak PGAdmin	localhost:8085
Keycloak Admin UI	localhost:8086
Consul Dashboard	localhost:8087
Rabbit MQ - Stomp Dashboard	localhost:8088
Hadoop NameNode Dashboard	localhost:8089
API Gateway SSH	localhost:2223
SFTP Server	localhost:2222
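
Once the stack is up, it can be handy to confirm which of the ports listed above are actually accepting connections. The following is a minimal sketch, not part of the repository: the `SERVICE_PORTS` dictionary and the `is_port_open` and `check_services` helpers are hypothetical names introduced here, and the script only assumes that the containers publish the ports from the table on `localhost`. It tests plain TCP reachability and says nothing about whether the service behind the port is healthy.

```python
import socket

# Host ports copied from the table above. The dictionary and the helper
# functions below are illustrative; they do not exist in the repository.
SERVICE_PORTS = {
    "Apache NiFi Dashboard UI": 8080,
    "Hadoop Resource Manager": 8081,
    "Kafka Topics UI": 8082,
    "MongoDB Express": 8083,
    "Kibana": 8084,
    "Keycloak PGAdmin": 8085,
    "Keycloak Admin UI": 8086,
    "Consul Dashboard": 8087,
    "Rabbit MQ - Stomp Dashboard": 8088,
    "Hadoop NameNode Dashboard": 8089,
    "API Gateway SSH": 2223,
    "SFTP Server": 2222,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost", ports=SERVICE_PORTS) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{'UP  ' if up else 'DOWN'}  {name}")
```

Running the script directly prints one line per service. A port reported as DOWN usually means the container has not finished starting or the port mapping differs from the table.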

sergio11/document_search_engine_architecture
