I want to be able to host my own open data platform so others can use the code and datasets my team creates #212

Open
petebachant opened this issue Nov 13, 2024 · 1 comment

petebachant commented Nov 13, 2024

It seems like some teams are reinventing the wheel here, building bespoke websites to share their work. If they don't want to post it on a common server, we can at least make it easy for them to host their own, since the goal of this software is to make the data easily usable.

In other words, open source this package so others can self-host it, but enable some sort of federated search so projects and artifacts can be found from any instance. We also want commands like calkit import dataset ... to work with datasets hosted on other domains.
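A minimal sketch of the merge step in such a federated search, in Python. Everything here is hypothetical: this issue does not define a Calkit search endpoint or response format, so the transport is injected as a `fetch(base_url, query)` callable that is assumed to return a list of result dicts, each with a `"url"` key.

```python
from typing import Callable, Iterable

# Hypothetical contract: fetch(base_url, query) returns a list of result
# dicts with at least a "url" key. HTTP client, endpoint path, and auth
# are deliberately left out of this sketch.
FetchFn = Callable[[str, str], list[dict]]


def federated_search(instances: Iterable[str], query: str, fetch: FetchFn) -> list[dict]:
    """Query every instance and merge the results, de-duplicating by URL."""
    merged = []
    seen_urls = set()
    for base_url in instances:
        try:
            results = fetch(base_url, query)
        except OSError:
            # An unreachable instance is skipped instead of failing the search.
            continue
        for item in results:
            if item["url"] in seen_urls:
                continue
            seen_urls.add(item["url"])
            # Record which instance answered so clients can link back to it.
            merged.append({**item, "instance": base_url})
    return merged
```

Keeping `fetch` separate means the same merge logic could sit in front of other Calkit instances or, with an adapter, a CKAN search API. Skipping instances that raise `OSError` keeps one unreachable server from breaking the whole search.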

@petebachant petebachant converted this from a draft issue Nov 13, 2024

petebachant commented Nov 13, 2024

Notes from WindLab webinar:

  • They are trying to make it easy for people to validate their models against predefined cases. Could we create Calkit projects that people use as templates for this sort of benchmarking, where the template automatically imports the dataset and includes a pipeline stage that compares the results, and users only need to add an intermediate stage that saves their data in the correct format?
    • Maybe a good use case for a CAD feature? At minimum, the CAD models should be part of the project, either in the repo or hosted on Figshare or similar.
  • Challenge: separating commercial companies' controller software, which needs to remain closed source. The setup uses an HTTP server. They use the open source model as a digital twin so commercial companies can test their designs. This is a use case for making some files private, or perhaps for referencing a model in a private project.
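The comparison stage in such a benchmarking template could be as small as the Python sketch below. It assumes (none of this comes from the webinar) that the reference case and the user's results are CSV files with matching numeric column names and row order, and it picks root-mean-square error as the metric purely for illustration. The user's "intermediate stage" would be whatever converts their output into this CSV layout.

```python
import csv
import io
import math


def load_columns(text: str) -> dict[str, list[float]]:
    """Parse CSV text (header row plus numeric rows) into {column: values}."""
    columns: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            columns.setdefault(name, []).append(float(value))
    return columns


def rmse_by_column(
    reference: dict[str, list[float]], candidate: dict[str, list[float]]
) -> dict[str, float]:
    """Root-mean-square error for every column present in both datasets."""
    metrics = {}
    for name in sorted(reference.keys() & candidate.keys()):
        ref, cand = reference[name], candidate[name]
        # Assumes rows are already aligned; mismatched lengths are an error.
        if not ref or len(ref) != len(cand):
            raise ValueError(f"column {name!r} is empty or lengths differ")
        squared = sum((r - c) ** 2 for r, c in zip(ref, cand))
        metrics[name] = math.sqrt(squared / len(ref))
    return metrics
```

In a DVC pipeline, the stage could write the returned dict to a JSON file declared as a metrics output, so results from different users' models are directly comparable.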

WindLab

  • What is WindLab?
  • https://windlab.hlrs.de/
  • "Harvesting" plugin to collect metadata from related CKAN instances? What is CKAN? https://github.com/ckan/ckan
  • Very interesting. An open source data management system.
  • WindLab is a CKAN instance. So is AIRE.
  • We should probably be interoperable with CKAN instances, letting them "harvest" datasets from Calkit instances.
  • We could also list CKAN instances in Calkit.
  • We should allow users to import datasets from CKAN instances and cache them with DVC. --> ckanfs?
  • Goal is not to duplicate data
  • WindIO schema? What is this?
  • Python package for WindLab will go into GitLab, but where?
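For importing from CKAN instances, CKAN exposes an Action API under `/api/3/action/`, and its `package_show` action returns a dataset's metadata, including a `resources` list whose entries carry a download `url`. The Python sketch below only builds the request URL and extracts resource URLs from a response body; it assumes the instance's API lives at the site root (e.g. for https://windlab.hlrs.de/) and that anonymous read access is allowed, and the dataset id used with it is hypothetical. It is not the "ckanfs" idea from the notes, which would be a full filesystem layer.

```python
import json
from urllib.parse import urlencode


def package_show_url(base_url: str, dataset_id: str) -> str:
    """Build a CKAN Action API v3 URL for the package_show action."""
    query = urlencode({"id": dataset_id})
    return f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"


def resource_urls(response_text: str) -> list[tuple[str, str]]:
    """Return (name, url) for each resource in a package_show response body."""
    payload = json.loads(response_text)
    # CKAN wraps every action result as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API call failed: {payload.get('error')}")
    resources = payload["result"].get("resources", [])
    # Fall back to the resource id when a resource has no name.
    return [(r.get("name") or r.get("id", ""), r["url"]) for r in resources]
```

Each extracted URL could then be handed to DVC's `dvc import-url`, which downloads the file and records its source so it can be refreshed later with `dvc update`. Note that this still caches a local copy, so it only partly serves the "not duplicating data" goal.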

FLOW

  • "Model chain" -- is this a pipeline?
  • FLOW seems to be like Calkit
  • Download Jupyter notebooks
  • Has linking between processes, publications, datasets

AIRE

  • 8 experimental wind farms, 4 of which are commercial.
  • Blade erosion studies. Measuring precipitation.
  • Data and knowledge, models, tools, case studies, component designs --> Calkit models for all these?

@petebachant petebachant added this to the Open source milestone Nov 19, 2024
@petebachant petebachant self-assigned this Nov 20, 2024
@petebachant petebachant moved this from In progress to Ready in Calkit Dec 16, 2024