- Golang Backend: Refactored the backend from Python to Golang, greatly improving stability and performance.
- Node Network Graph: Visualization of node topology.
- Node System Info: View node system info, including OS, CPUs, and available executables.
- Node Monitoring Enhancement: Nodes are monitored and registered through Redis.
- File Management: Edit spider files online, with code highlighting.
- Login/Register/User Management: Users must log in to use Crawlab; supports user registration, user management, and basic role-based authorization.
- Automatic Spider Deployment: Spiders are deployed/synchronized to all online nodes automatically.
- Smaller Docker Image: Reduced the Docker image size from 1.3 GB to ~700 MB by applying a multi-stage build.
- Node Status. Node status was not updated when a node actually went offline. #87
- Spider Deployment Error. Fixed by the new Automatic Spider Deployment. #83
- Node Not Showing. Nodes failed to appear as online. #81
- Cron Job Not Working. Fixed by the new Golang backend. #64
- Flower Error. Fixed by the new Golang backend. #57
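The multi-stage build behind the smaller Docker image compiles in a full-toolchain image and copies only the finished binary into a slim runtime image. A generic sketch of the pattern (image tags and paths are illustrative, not Crawlab's actual Dockerfile):

```dockerfile
# Build stage: full Go toolchain, discarded after the build
FROM golang:1.12 AS builder
WORKDIR /app
COPY . .
RUN go build -o crawlab .

# Runtime stage: only the compiled binary is carried over
FROM debian:stretch-slim
COPY --from=builder /app/crawlab /usr/local/bin/crawlab
CMD ["crawlab"]
```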
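The Redis-based node monitoring above boils down to a heartbeat with an expiry: each node periodically refreshes a key, and a node whose key lapses is treated as offline. A minimal sketch of that idea, using an in-memory dict in place of Redis (the `NodeRegistry` class and its method names are hypothetical, not Crawlab's actual API):

```python
import time

class NodeRegistry:
    """Toy stand-in for Redis: each heartbeat sets a key with a TTL,
    so a node that stops reporting expires and shows as offline."""

    def __init__(self, ttl_seconds: float = 15.0):
        self.ttl = ttl_seconds
        self._expiry = {}  # node_id -> absolute expiry timestamp

    def heartbeat(self, node_id: str, now: float = None) -> None:
        # Analogous to Redis SETEX: refresh the key's time-to-live.
        now = time.time() if now is None else now
        self._expiry[node_id] = now + self.ttl

    def is_online(self, node_id: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        return self._expiry.get(node_id, 0.0) > now

registry = NodeRegistry(ttl_seconds=15.0)
registry.heartbeat("worker-1", now=100.0)
print(registry.is_online("worker-1", now=110.0))  # True: within TTL
print(registry.is_online("worker-1", now=120.0))  # False: heartbeat expired
```

This also illustrates the Node Status fix: with a TTL, "online" is derived from recent heartbeats rather than a stored flag, so a crashed node cannot stay marked online forever.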
- Documentation: Better and much more detailed documentation.
- Better Crontab: Compose crontab expressions through a crontab UI.
- Better Performance: Switched from Flask's built-in server to `gunicorn`. #78
- Deleting Spider. Deleting a spider now not only removes its database record but also removes the related folder, tasks, and schedules. #69
- MongoDB Auth. Allow users to specify `authenticationDatabase` when connecting to `mongodb`. #68
- Windows Compatibility. Added `eventlet` to `requirements.txt`. #59
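A crontab UI like the one mentioned above typically validates the five-field cron expression before saving a schedule. A loose sketch of such a check (the helper below is hypothetical, not Crawlab's implementation, and only accepts `*`, `*/n`, and plain numbers):

```python
# Allowed value range for each of the five cron fields.
CRON_FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]

def validate_cron(expr: str) -> bool:
    """Loose check: five fields, each '*', '*/n', or a number in range."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        return False
    for value, (_, lo, hi) in zip(parts, CRON_FIELDS):
        if value == "*":
            continue
        if value.startswith("*/") and value[2:].isdigit() and int(value[2:]) > 0:
            continue
        if value.isdigit() and lo <= int(value) <= hi:
            continue
        return False
    return True

print(validate_cron("*/5 * * * *"))  # True: every five minutes
print(validate_cron("0 3 * * 1"))    # True: Mondays at 03:00
print(validate_cron("61 * * * *"))   # False: minute out of range
```

A real parser would also handle ranges (`1-5`) and lists (`1,3,5`); this sketch only shows where field-by-field validation hooks in.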
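On the MongoDB auth item above: `authenticationDatabase` corresponds to the `authSource` option of a MongoDB connection URI, naming the database that holds the user's credentials (often `admin`) rather than the database being used. A small sketch of building such a URI (the helper function is illustrative, not Crawlab's code):

```python
from urllib.parse import quote_plus

def mongo_uri(host, port, db, user=None, password=None, auth_source=None):
    """Build a mongodb:// URI; authSource names the database that stores
    the user's credentials, which may differ from the target db."""
    cred = ""
    if user is not None:
        cred = f"{quote_plus(user)}:{quote_plus(password or '')}@"
    uri = f"mongodb://{cred}{host}:{port}/{db}"
    if auth_source is not None:
        uri += f"?authSource={auth_source}"
    return uri

print(mongo_uri("localhost", 27017, "crawlab",
                user="root", password="s3cret", auth_source="admin"))
# mongodb://root:s3cret@localhost:27017/crawlab?authSource=admin
```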
- Docker: Users can run the Docker image to speed up deployment.
- CLI: Allow users to use a command-line interface to execute Crawlab programs.
- Upload Spider: Allow users to upload a customized spider to Crawlab.
- Edit Fields on Preview: Allow users to edit fields when previewing data in a Configurable Spider.
- Spiders Pagination. Fixed a pagination problem on the spiders page.
- Automatic Field Extraction: Automatically extract data fields from list pages for Configurable Spiders.
- Download Results: Allow downloading results as a CSV file.
- Baidu Tongji: Allow users to opt in to reporting usage info to Baidu Tongji.
- Results Page Pagination: Fixed pagination on the results page. #45
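A CSV export like the Download Results feature above generally flattens a list of result records into rows, using the union of all keys as the header. A minimal sketch using only the standard library (the function name is hypothetical, not Crawlab's API):

```python
import csv
import io

def results_to_csv(results):
    """Serialize a list of dict records to CSV text, using the union
    of all keys (in first-seen order) as the header row."""
    fieldnames = []
    for row in results:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    # restval="" fills in records that lack some of the columns.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()

data = [
    {"title": "Hello", "url": "http://example.com/1"},
    {"title": "World", "url": "http://example.com/2", "author": "anon"},
]
print(results_to_csv(data))
```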
- Scheduled Tasks Duplicate Triggers: Set Flask DEBUG to False so that scheduled tasks won't trigger twice. #32
- Frontend Environment: Added `VUE_APP_BASE_URL` as a production-mode environment variable so that API calls won't always go to `localhost` in deployed environments. #30
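On the duplicate-trigger fix above: with Flask DEBUG on, Werkzeug's reloader runs the app in a parent (file-watcher) process and a child (server) process, so anything started at module level, such as a scheduler, starts twice. Disabling DEBUG is the fix used here; a common alternative, if debug mode must stay on, is to start the scheduler only in the reloader's child process. A sketch of that guard (not Crawlab's code):

```python
import os

def should_start_scheduler() -> bool:
    # Werkzeug's debug reloader sets WERKZEUG_RUN_MAIN to "true" in the
    # child process that actually serves requests; the parent process only
    # watches files. Starting the scheduler only in the child avoids
    # double triggers.
    return os.environ.get("WERKZEUG_RUN_MAIN") == "true"

# e.g. at app startup:
# if should_start_scheduler() or not app.debug:
#     scheduler.start()
```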
- Configurable Spider: Allow users to create a spider to crawl data without coding.
- Advanced Stats: Advanced analytics in spider detail view.
- Sites Data: Added a sites list (China) for users to check info such as robots.txt and home page response time/status code.
- Basic Stats: User can view basic stats such as number of failed tasks and number of results in spiders and tasks pages.
- Near Realtime Task Info: Poll data from the server every 5 seconds so task info can be viewed in a near-realtime fashion.
- Scheduled Tasks: Allow users to set up cron-like scheduled/periodical tasks using `apscheduler`.
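The near-realtime task view above reduces to polling on a fixed interval and rendering each snapshot. A minimal sketch of that loop (the `poll` helper and the fake fetch callback are illustrative, not the actual frontend code):

```python
import time

def poll(fetch, interval=5.0, max_polls=None):
    """Call fetch() every `interval` seconds, collecting each snapshot.
    max_polls bounds the loop so this sketch terminates."""
    snapshots = []
    while max_polls is None or len(snapshots) < max_polls:
        snapshots.append(fetch())
        if max_polls is not None and len(snapshots) >= max_polls:
            break
        time.sleep(interval)
    return snapshots

# Fake server: the task status advances on each request.
statuses = iter(["pending", "running", "finished"])
print(poll(lambda: next(statuses), interval=0.01, max_polls=3))
# ['pending', 'running', 'finished']
```

A 5-second interval trades freshness against server load; the UI sees status changes at most one interval late.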
- Initial Release