Highly configurable JSON format logging per Location - nginx logging module - aka. kasha 🍲

C 42 21 Updated Aug 9, 2021

parquet for DataX - hdfsreader

Java 13 9 Updated Apr 24, 2018

The Metadata Platform for your Data and AI Stack

Java 10,467 3,080 Updated Mar 29, 2025

Source-agnostic distributed change data capture system

Java 3,659 736 Updated Sep 28, 2023

A connector for Spark that allows reading and writing to/from Redis cluster

Scala 947 370 Updated Oct 22, 2024

Like those other ds4tools, but sexier

C# 7,135 813 Updated Dec 31, 2023

HanLP test

Scala 16 11 Updated Aug 31, 2017

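Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing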

Python 34,724 10,476 Updated Jan 15, 2025

Example programs and scripts for accessing parquet files

Java 30 25 Updated Jan 29, 2018

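An automated SQL operations platform based on Inception, supporting SQL execution, LDAP authentication, email sending, OSC, SQL queries, SQL optimization suggestions, permission management, and more; Docker images are supported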

JavaScript 1,569 639 Updated Jul 25, 2023

Spark structured streaming with Kafka data source and writing to Cassandra

Scala 62 33 Updated Dec 5, 2019

Spark Structured Streaming / Kafka / Cassandra / Elastic

Scala 183 78 Updated Feb 7, 2023

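DataLink is a distributed, scalable data exchange platform that provides real-time incremental synchronization and offline full synchronization between heterogeneous data sources.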

Java 1,100 411 Updated Dec 6, 2022

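🚁🚀 A real-time product recommendation system built on Flink. Flink computes product popularity and stores it in a Redis cache, analyzes log data, and writes profile tags and real-time records to HBase. When a user requests recommendations, the popularity ranking is re-sorted according to the user profile, and the collaborative filtering and tag-based recommendation modules add related products to each product in the newly generated ranking; finally the new user list is returned.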

Java 4,360 1,471 Updated Feb 4, 2024

The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.

Scala 553 279 Updated May 10, 2021

Collect logs and send lines to Apache Kafka

C++ 499 115 Updated Jan 27, 2021

The Internals of Spark Structured Streaming

417 172 Updated Dec 28, 2022

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

Java 13,359 4,737 Updated Mar 30, 2025

CMAK is a tool for managing Apache Kafka clusters

Scala 11,892 2,506 Updated Aug 2, 2023

Use SQL to query Elasticsearch

Java 7,008 1,541 Updated Feb 4, 2025

A data integration framework

Java 4,037 1,698 Updated Mar 5, 2025

An extension of open-source Flink's real-time SQL; it mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax

Java 2,045 929 Updated Feb 21, 2024

A web front end for an elastic search cluster

JavaScript 9,455 2,027 Updated Jul 17, 2021

Repo for counting stars and contributing. Press F to pay respect to glorious developers.

270,475 21,105 Updated Oct 3, 2024

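DRPC-Proxy is a simple framework, built on RPC services that use Storm DRPC, that decouples business code from Storm framework code. Some scenarios need DRPC without caring much about Storm's stream processing; typically DRPCServer acts as the service provider receiving requests, bolts handle the business logic, and ReturnResults returns the result. Business code and Storm code then become interleaved and coupled inside the bolts, which makes later upgrades and extensions difficult. DRPC-Proxy…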

Java 5 5 Updated Jun 10, 2017

A web-based management tool developed specifically for Kettle, an excellent ETL tool.

Java 649 370 Updated Dec 10, 2024

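An easy-to-use scheduling and monitoring platform for Kettle, built specifically to schedule and monitor jobs and transformations created with the Kettle client. The overall framework integrates Spring, Spring MVC, and BeetlSQL; it runs transformations and jobs by calling the Kettle API and uses the Quartz framework for scheduling.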

JavaScript 588 245 Updated Apr 27, 2023

A fast, concise Java tool for processing Excel that avoids out-of-memory errors on large files

Java 33,252 7,618 Updated Oct 29, 2024

Not Just A Notepad! (golang + mongodb) http://leanote.org

JavaScript 11,709 2,474 Updated Nov 27, 2023

Pentaho Data Integration ( ETL ) a.k.a Kettle

Java 7,930 3,506 Updated Mar 30, 2025