5. Legal Implications

Diego edited this page Nov 18, 2023 · 11 revisions

Library licenses

Here is the list of licenses of all the libraries used by this project.

  • Apache-2.0 license
    • Streamlit
    • PyDrive
    • OpenAI python wrapper
    • bcrypt
    • oauth2client
    • cryptography (dual-licensed under Apache-2.0 or BSD-3-Clause)
  • BSD-3-Clause license
    • Pandas
  • GPLv2
    • mysql connector
  • MIT
    • tqdm
    • langchain

All of them are free software licenses, so we are legally allowed to use the libraries. The GPLv2 license attaches conditions to distributing the mysql connector library. We don't distribute any library, though, so those conditions don't affect us.
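As a cross-check on the list above, the license that each installed package declares can be read with Python's standard `importlib.metadata` module. The following is only a sketch: the distribution names in `PACKAGES` are our assumption of how these libraries are named on PyPI, and declared metadata can be incomplete or outdated, so it does not replace reading the license texts themselves.

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries listed above.
PACKAGES = [
    "streamlit", "PyDrive", "openai", "bcrypt", "oauth2client",
    "cryptography", "pandas", "mysql-connector-python", "tqdm", "langchain",
]


def declared_license(package: str) -> str:
    """Return the license an installed package declares, or a placeholder."""
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        return "not installed"
    # Newer packages declare License-Expression; older ones use License
    # or only the "License ::" trove classifiers.
    for field in ("License-Expression", "License"):
        value = (meta.get(field) or "").strip()
        if value and value.upper() != "UNKNOWN":
            return value.splitlines()[0]
    classifiers = [
        c.split("::")[-1].strip()
        for c in meta.get_all("Classifier") or []
        if c.startswith("License ::")
    ]
    return ", ".join(classifiers) or "undeclared"


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name}: {declared_license(name)}")
```

Running this inside the project's virtual environment prints one line per library, which can then be compared by hand against the list above.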

Content uploading

Our project allows users to upload their PDFs to Google Drive. The owner of the Google account that hosts those PDFs is liable for the uploaded content. Since we are not able to moderate that content, we don't use our own Google accounts: whoever decides to host this project provides their own Google account (specified in the secrets.toml file).

OpenAI Usage

The application uses the OpenAI API to generate embeddings and to generate the model's open responses. The results and costs therefore depend on the output of the OpenAI API. If the company changes its usage policies or the model, the results of the application will change as well. Currently, OpenAI commits to not using data provided over the API for training purposes ("We do not train on your data from ChatGPT Enterprise or our API Platform" [1]). Nevertheless, this could change in the future. Because we use langchain, switching to an open-source Large Language Model is feasible.

We used an enterprise API key for the development of the application.

GDPR

Since we use the Streamlit platform, we have to make sure that it is GDPR compliant. It was not in the past, but it is now. As an extra precaution, we disabled Streamlit telemetry as suggested.

Data our application uses

We must follow the seven principles of Article 5(1)-(2) of the GDPR.

The only personal data we process is the user's email address, which we store and use.

  1. Lawful, fair and transparent:
  2. Purpose limitation: We only use the email for sending password recovery messages. We would have to inform the user about that when they enter it.
  3. Data minimization: We should specify what we use the emails for to be compliant.
  4. Accuracy: The email is provided by the user, therefore it should be accurate.
  5. Integrity and confidentiality: We don't encrypt this data.
  6. Accountability: We don't comply with GDPR, therefore we can't show accountability.

We also send the user's PDFs to our own Google Drive account. The same principles are violated, for similar reasons. The exception is encryption: Google Drive is GDPR compliant and does encrypt the data.

Consent

We do ask for consent to collect the emails: it is a text field that the user fills in. However, we don't provide any way to withdraw that consent (by deleting the account or the email).

The same applies to the PDFs.
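The missing withdrawal of consent could be covered by an account deletion function. The sketch below shows the idea with Python's built-in `sqlite3` as an in-memory stand-in for the real database. The `users` table, its columns, and the `delete_account` function are hypothetical and are not taken from the project's code.

```python
import sqlite3


def delete_account(conn: sqlite3.Connection, email: str) -> bool:
    """Delete the user row for `email`; return True if a row was removed."""
    with conn:  # commits on success, rolls back if the statement fails
        cur = conn.execute("DELETE FROM users WHERE email = ?", (email,))
    return cur.rowcount > 0


# In-memory stand-in for the real database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("user@example.com", "hash"))
conn.commit()

removed = delete_account(conn, "user@example.com")
print("account removed:", removed)
```

For the deletion to be complete, the user's PDFs stored in Google Drive would have to be removed in the same step.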

LSSI

Our website doesn't generate revenue (neither direct nor indirect) for us, therefore LSSI is not applicable.

If we were to generate revenue, we would need to display Terms of Service (ToS) describing the conditions of the service provided, a legal notice, a privacy policy, and a cookie policy.

We would also have to give users the following information about us:

  • Name
  • NIF
  • Address
  • Email
  • Phone number