@southern-cross-ai

Southern Cross AI

Australia's First Large Language Model Research Initiative
Southern Cross AI   Website   Discord Server   Meetup Group

✨ Welcome to Southern Cross AI ✨
We aim to develop open-source Large Language Models to serve Australia
through collaboration across universities, governments and business sectors

Wanna make friends and munch snacks? Let's meet up!

Announcement (7 Feb): A big welcome to all newcomers and our OG crew for joining us (again) on this journey. Below are the old 2024 schedules, to give you a little taste of who we are and what is going on. If you have any ideas or questions, don't hesitate to connect with us via Discord or shoot an email to our team members. We hope you all have a pleasant experience over the coming months!

2024 Schedule

Join our exciting 12-week (5 Aug - 7 Oct) Meetup Events held every Monday:

Latest Update (8 Oct): Sadly, our 12-week journey has come to an end. A heartfelt ❤️ thank you to all our community members who joined us over the past few months. It’s been an amazing journey with lovely people like you! Stay tuned to our Meetup Events and we’ll see you next semester.

New kid in town? No worries, we got you!

Onboard LLMs

LLM Battleground

LLM Playground

Misc

Call for Contributors - We need your magic to make things happen

  • Data Source Contributor 🕵️‍♀️
    • Identify and provide access to Australia-related data sources.
    • Collaborate with other contributors to ensure data quality and relevance.
  • Data Collecting, Crawling and Scraping 👩‍🌾
    • Develop scripts and tools to collect data from various sources.
    • (Optional) Have experience with web scraping tools (e.g., BeautifulSoup, Scrapy).
  • Data Cleaning 👩‍⚕️
    • Clean and preprocess datasets to ensure they are ready for analysis and modeling.
    • (Optional) Have experience with data manipulation libraries (e.g., Pandas, NumPy).
  • Model Building, Training and Tuning 👩‍💻
    • Develop and train LLMs on our datasets.
    • Have experience with machine learning frameworks (e.g., TensorFlow, PyTorch).
  • GitHub Organising 👩‍🔧
    • Manage the GitHub repository by organizing files, documentation, and issues.
    • (Optional) Have proficiency in using Git and GitHub.
  • Hugging Face Organising 👩‍🏭
    • Manage and organize model versions and datasets.
    • Ensure proper documentation and metadata for each model and dataset.
  • Social Media Organising 👩‍💼
    • Promote the project and its updates on social media platforms (e.g., Discord, Meetup).
    • Engage with the community to increase project visibility and collaboration.
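
For anyone curious what the Data Cleaning role might involve in practice, here is a minimal sketch using only the Python standard library. It is not the project's actual pipeline: the function name `clean_records`, the `min_chars` threshold and the sample sentences are all hypothetical, chosen purely for illustration.

```python
import re
import unicodedata


def clean_records(records, min_chars=20):
    """Normalize, filter and de-duplicate raw text records.

    `min_chars` is an illustrative threshold, not a project setting.
    """
    seen = set()
    cleaned = []
    for raw in records:
        # Normalize Unicode so visually identical strings compare equal.
        text = unicodedata.normalize("NFKC", raw)
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", text).strip()
        # Drop fragments that are too short to be useful.
        if len(text) < min_chars:
            continue
        # Case-insensitive exact de-duplication; the first occurrence wins.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    # Made-up sample records, not taken from any real dataset.
    sample = [
        "  The  Senate met at\n9.30 am on Monday.  ",
        "the senate met at 9.30 am on monday.",
        "ok",
        "Debate resumed on the motion.",
    ]
    for line in clean_records(sample):
        print(line)
```

Real datasets usually need more than this (near-duplicate detection, language filtering, removal of personal information), so treat it as a starting point rather than a recipe.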

Can't wait to join us? Send a message to our lovely team members:

Pinned

  1. BabyJoey Public

    Small 115-million-parameter model (0.5 GB)

    Python 4 9

  2. Gutenberg-Data Public

    HTML 3 2

  3. Dataset-Repo-Template Public template

    A Template for Creating Your Dataset Repos

    1

Repositories

Showing 10 of 28 repositories
  • southern-cross-ai.github.io Public

    Southern Cross AI's official website, hosted on GitHub Pages.

    HTML 0 0 0 0 Updated Feb 10, 2025
  • .github Public

    These are the default community health files for Southern Cross AI's GitHub profile.

    0 Apache-2.0 0 0 0 Updated Feb 7, 2025
  • BabyJoey Public

    Small 115-million-parameter model (0.5 GB)

    Python 4 Apache-2.0 9 7 0 Updated Jan 8, 2025
  • consolidated-data-repo Public

    This is a repo of old data collected in 2024; it will be consolidated into a new, larger one in 2025.

    0 0 0 0 Updated Dec 1, 2024
  • Braided-Channels Public

    Interview Dataset from the Braided Channels Research Collection

    Jupyter Notebook 1 MIT 0 1 0 Updated Sep 7, 2024
  • OpenAustralia Public

    Dataset of House and Senate Debates from Australian Parliament

    HTML 1 MIT 0 1 0 Updated Sep 5, 2024
  • Inside-Airbnb-Australia Public

    Airbnb's Residential Dataset (Australia)

    Jupyter Notebook 1 MIT 0 1 0 Updated Sep 5, 2024
  • ICE-AUS Public

    Corpus dataset from the Australian component of the International Corpus of English (ICE-AUS)

    Python 2 MIT 0 0 0 Updated Sep 5, 2024
  • CoANZSE Public

    Dataset from the Corpus of Australian and New Zealand Spoken English (CoANZSE)

    Python 1 0 0 0 Updated Sep 5, 2024
  • Dewr-data Public
    0 0 1 0 Updated Aug 27, 2024
