
Tests #18

Open · wants to merge 4 commits into main

Conversation

@KristijanArmeni (Collaborator)

Opening this WIP pull request just to get the ball rolling on setting up tests (#10). Here I added a small unit test that checks the behavior of the gini() method and of the full hashtag analyzer on a small input dataframe with 5 rows. See the current idea of how this works and a demo video below.

Current test architecture

Standard pytest stuff

  • Add a /tests folder with a test_hashtags.py testing script (eventually each analyzer would have a test_<analyzer-name>.py script that tests its methods).
  • Add pyproject.toml, which holds the configuration options for pytest (see the snippet below; pyproject.toml can also be used in the future to hold packaging configuration).
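
The pytest section of the added pyproject.toml is minimal (this is the same two-line snippet that shows up in the diff further down):

[tool.pytest.ini_options]
testpaths = ["tests"]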

Mangotango specific stuff

  • Create an AnalyzerContextDummy() class that mirrors PrimaryContextAnalyzer() and its methods/attributes and is only used to fetch the test data within the analyzer.
  • Create a check_analyzer_context() function within analyzer.main() that checks whether the context passed to the analyzer is the real one or the dummy class. This is not elegant and rather ad hoc, but I couldn't figure out how to get around input_reader.preprocess(pl.read_parquet(input_reader.parquet_path)), which loads data from disk, whereas the test data are created on the fly within the test script. A rough sketch of the idea follows this list.
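
A rough sketch of the dummy-context idea, just for discussion. Everything beyond the names mentioned above (constructor signature, attribute names, the isinstance check, the input() accessor) is my assumption, not the actual implementation in this PR:

import polars as pl


class AnalyzerContextDummy:
    """Stands in for PrimaryContextAnalyzer() in tests: instead of reading a
    parquet file from disk, it serves a dataframe built in the test script."""

    def __init__(self, test_df: pl.DataFrame):
        self.test_df = test_df


def check_analyzer_context(context) -> pl.DataFrame:
    # If we got the dummy, return its in-memory test data; otherwise fall back
    # to the real path that preprocesses the parquet file loaded from disk.
    if isinstance(context, AnalyzerContextDummy):
        return context.test_df
    input_reader = context.input()  # assumed accessor on the real context
    return input_reader.preprocess(pl.read_parquet(input_reader.parquet_path))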

Demo

The main idea is to use pytest, which can be invoked from the root folder. It collects all the tests present in the newly added /tests folder. For example, right now I added test_hashtags.py, which contains tests of the gini() and hashtags.main() functions. Invoking pytest (on this branch) looks like this:

pytest_test.mov
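
For reference, one of the gini() checks might look roughly like the sketch below; the import path and the expected value for an even distribution are my assumptions, not copied from the actual test_hashtags.py:

import pytest

from analyzers.hashtags.main import gini  # assumed import path


def test_gini_even_distribution():
    # An evenly spread set of hashtag counts should yield a Gini
    # coefficient of (approximately) zero.
    assert gini([5, 5, 5, 5, 5]) == pytest.approx(0.0)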

@KristijanArmeni added the "enhancement" (New feature or request) label on Nov 24, 2024
@KristijanArmeni self-assigned this on Nov 24, 2024
@arkavo-com

I agree on the pytest approach

@soul-codes (Collaborator)

@KristijanArmeni thanks for this! Can we somehow add this to our CI workflow?

@KristijanArmeni (Collaborator, Author)

Sure @soul-codes. Is there a CI set up somewhere?

Otherwise, we'd want to choose a CI framework/provider first. I use CircleCI in a few of my projects and it works fine (and I see it in some mature OSS projects too). It has a free tier that I think should work for us. I guess GitHub Actions might work too, but I don't have experience with those.

We'd need to set up an account and link the repo with CircleCI. Then it's a matter of adding a small CircleCI config file, and it should run our tests remotely. A rough sketch of what that config could look like is below.
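
Purely illustrative (we may well go with GitHub Actions instead); the Docker image and commands are my guesses at a minimal .circleci/config.yml:

version: 2.1
jobs:
  test:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run: pip install -r requirements.txt pytest
      - run: pytest tests -v
workflows:
  run-tests:
    jobs:
      - test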

@soul-codes (Collaborator)

@KristijanArmeni this is where we set up GitHub CI workflows. Feel free to add a new workflow that performs tests. Here are the docs on writing them. Perhaps @DeanEby can help, as he's been doing some CI things on the repo recently? 🙏

@@ -0,0 +1,3 @@
[tool.pytest.ini_options]
testpaths = ["tests"]

Collaborator

@KristijanArmeni a thought -- what do you think about allowing each analyzer to have its own test folder, so that each module is self-contained? How tricky would it be to configure something like that?

@KristijanArmeni (Collaborator, Author) commented Dec 7, 2024

I don't think it should be too tricky, as it is one of the two layouts supported by pytest. I can't say I see an obvious advantage for one layout or the other at the moment. I guess once you have many tests, it becomes more manageable to put them alongside the application code?

Separate test folder:

Putting tests into an extra directory outside your actual application code might be useful if you have many functional tests or for other reasons want to keep tests separate from actual application code (often a good idea)

Tests as part of modules:

Inlining test directories into your application package is useful if you have direct relation between tests and application modules and want to distribute them along with your application

See more under 'Tests layout' here: https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html
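
For illustration, the inlined layout might look roughly like this (directory and file names below are hypothetical, not taken from the repository); testpaths in pyproject.toml would then point at the package directory, or could be dropped so that pytest discovers tests from the rootdir:

analyzers/
    hashtags/
        main.py
        tests/
            test_hashtags.py
    example_analyzer/
        main.py
        tests/
            test_example_analyzer.py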

@DeanEby (Collaborator)

DeanEby commented Dec 6, 2024

Hey! I'd be happy to help out with this. Should I implement pytest tests for the other analyzers, or perhaps try to hook the current test into the CI workflow?

@soul-codes (Collaborator)

A CI workflow of some kind would be great

@KristijanArmeni (Collaborator, Author)

KristijanArmeni commented Dec 7, 2024

> @KristijanArmeni thanks for this! Can we somehow add this to our CI workflow?

@soul-codes @DeanEby If I understand correctly, we could just add one CI step to the current build_exe.yml workflow, after the "Install dependencies" step on l. 66 and before the pyinstaller step, like so (assuming tests live in /tests, as currently assumed in this PR):

    # Run tests using pytest
    - name: Run pytest
      run: |
        # Run pytest with verbose output
        pytest tests -v

Meaning the build does not proceed if the tests are not passing. Alternatively, we could have a separate run_tests.yml CI workflow.

EDIT: I see that the current trigger for build_exe.yml is whenever a push with a version tag is performed. But for tests, we want to trigger them upon any PR, i.e. any ready-to-review PR should have unit tests passing. I guess that should be preferred?
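
For a PR-triggered workflow, the on: block would look something like this (the same trigger that ends up in the run_tests.yml shared below):

on:
  pull_request:
    branches:
      - main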

@DeanEby (Collaborator)

DeanEby commented Dec 7, 2024

I think triggering on PR makes a lot of sense. I was able to get this working by recreating your commits on a local repo and then using the following run_tests.yml:

name: Run Tests

on:
  pull_request:
    branches:
      - main

jobs:
  build:
    strategy:
      matrix:
        include:
          - platform_name: Windows
            artifact_name: windows
            os: windows-2022
            version_command: icacls "VERSION" /grant Everyone:F /T /C /Q
            move_command: move dist\mangotango.exe dist\mangotango_windows.exe
            sha_command: pwsh -c "Get-FileHash -Algorithm SHA1 dist\mangotango_windows.exe | Format-Table Hash -HideTableHeaders > dist\mangotango_windows.exe.sha1"
            list_command: dir dist
          - platform_name: MacOS 12
            artifact_name: macos-12
            os: macos-12
            move_command: mv dist/mangotango dist/mangotango_macos_12
            sha_command: shasum -a 1 dist/mangotango_macos_12 > dist/mangotango_macos_12.sha1
            list_command: ls -ll dist
          - platform_name: MacOS 13
            artifact_name: macos-13
            os: macos-13
            move_command: mv dist/mangotango dist/mangotango_macos_13
            sha_command: shasum -a 1 dist/mangotango_macos_13 > dist/mangotango_macos_13.sha1
            list_command: ls -ll dist
          - platform_name: MacOS 14
            artifact_name: macos-14
            os: macos-14
            move_command: mv dist/mangotango dist/mangotango_macos_14
            sha_command: shasum -a 1 dist/mangotango_macos_14 > dist/mangotango_macos_14.sha1
            list_command: ls -ll dist
          - platform_name: MacOS 15
            artifact_name: macos-15
            os: macos-15
            move_command: mv dist/mangotango dist/mangotango_macos_15
            sha_command: shasum -a 1 dist/mangotango_macos_15 > dist/mangotango_macos_15.sha1
            list_command: ls -ll dist

    name: Build ${{ matrix.platform_name }}
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.12

      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: |
            ~/.cache/pip
          key: ${{ matrix.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ matrix.os }}-pip-

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      # Run tests using pytest
      - name: Run pytest
        env:
          PYTHONPATH: ${{ github.workspace }}
        run: |
          pip install pytest
          # Run pytest with verbose output
          pytest tests -v

@KristijanArmeni (Collaborator, Author)

> I think triggering on PR makes a lot of sense. I was able to get this working by recreating your commits on a local repo and then using the following run_tests.yml

Awesome @DeanEby! Thanks for looking into this. Shall I just add a commit to this PR, or do you want to add it? Either way is fine by me.

@DeanEby (Collaborator)

DeanEby commented Dec 11, 2024

You can add the commit. It would be nice to get this merged so I can start working on additional tests :)

@KristijanArmeni (Collaborator, Author)

@DeanEby Thanks, I added it. It seems like all platforms are passing, except the MacOS 12 build:

The macOS-12 environment is deprecated, consider switching to macOS-13, macOS-14 (macos-latest) or macOS-15. For more details, see actions/runner-images#10721

Sounds like it can safely be removed?

Outside of that, this seems ready to merge. @soul-codes feel free to throw an extra pair of eyes if need be.

@soul-codes marked this pull request as ready for review on December 13, 2024

@soul-codes (Collaborator) left a review

I think what we'll want to do is create a reusable fake context class for testing that obeys the context interface's intended semantics. The point of the context interface is that the analyzer doesn't need to care how it's implemented.

I can help with that.


    strategy:
      matrix:

Collaborator

I don't think it's necessary to run datasci tests in all environments. Tools like polars, pandas and such should be cross-platform enough.
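
If we go that route, a single-runner variant of the test job might look roughly like this (the choice of ubuntu-latest is my assumption; the action versions are copied from the workflow above):

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.12
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt pytest
      - name: Run pytest
        run: pytest tests -v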
