Changes to pipeline.ini files (global) #332

Open
jscaber opened this issue May 17, 2017 · 1 comment

Comments

@jscaber
Member

jscaber commented May 17, 2017

This issue is meant to start a discussion on pipeline.ini files, with the aim of agreeing on a convention that works for input validation.
As currently proposed, input validation would run directly on the ini file before anything else happens.

1. How to handle file paths

There have been two suggestions:

  • Require a full path for every file that is used
    • Easy to write a validation script
    • Portable, no cgat file structure required
    • Would break with current practice: e.g. /path/to/fasta would replace the genome_dir and genome variables
  • Require a "file(name)" prefix/suffix in the parameter name
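Both options make file parameters easy to find mechanically. As a minimal sketch of the kind of validation script either option enables, assuming a hypothetical convention where any key containing "file" must hold a full path to an existing file (`check_file_params` is an illustrative name, not existing CGAT code):

```python
import os


def check_file_params(params):
    """Return a list of problems for parameters that look like file inputs.

    Hypothetical convention: any key containing "file" must hold an
    absolute path to an existing file. Other keys are ignored.
    """
    problems = []
    for key, value in sorted(params.items()):
        if "file" not in key:
            continue  # not declared as a file parameter; skip it
        if not os.path.isabs(value):
            problems.append(f"{key}: not a full path: {value!r}")
        elif not os.path.isfile(value):
            problems.append(f"{key}: file does not exist: {value}")
    return problems
```

For example, `check_file_params({"genome_file": "hg19.fa", "job_queue": "all.q"})` flags only `genome_file`, because `job_queue` carries no file keyword.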

2. How to handle common directories

As per Ian's comment in pull request #331, it may be desirable to provide a directory only once when it contains several required files. Any ideas on how input validation could handle something like this would be helpful:

feature_dir=/path/to/dir
feature1=name1.file
feature2=name2.file
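One way a validator could cope with a shared directory is to join each file name onto the directory before checking it. A minimal sketch, assuming the validator is told which keys belong together (`resolve_dir_files` and its arguments are hypothetical):

```python
import os


def resolve_dir_files(params, dir_key, file_keys):
    """Join each file name in file_keys onto the directory in params[dir_key].

    Returns the resolved paths (keyed by parameter name) and the list of
    resolved paths that do not exist on disk.
    """
    base = params[dir_key]
    resolved = {key: os.path.join(base, params[key]) for key in file_keys}
    missing = [path for path in resolved.values() if not os.path.exists(path)]
    return resolved, missing
```

The catch is that the grouping (`feature_dir` goes with `feature1` and `feature2`) has to be declared somewhere, either in the validator or via a naming convention.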

And what about the following, where the pipeline later assembles basename into multiple files (basename.file1 and basename.file2) in Python?

feature_dir=/path/to/dir
feature=basename
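In this case the ini file alone does not name the files that must exist; the suffixes live in the pipeline code. A minimal sketch of what a validator would need to reproduce, assuming the suffixes are passed in explicitly (`expand_basename` is a hypothetical helper):

```python
import os


def expand_basename(params, dir_key, base_key, suffixes):
    """Build the full paths a pipeline would derive from a basename.

    The suffixes must match whatever the pipeline appends later, so this
    duplicates knowledge that otherwise exists only in the Python code.
    """
    stem = os.path.join(params[dir_key], params[base_key])
    return [stem + suffix for suffix in suffixes]
```

Each returned path can then be checked like any other input file.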

3. How to deal with defaults

Options

  • Keep defaults
  • Empty file with suggestions in comments (+/- a filled-in example)

4. How to deal with mandatory input

Ideas that an input validation script could parse:

  • add "?" for mandatory input and provide the user with example input
  • add "req" or similar suffix/prefix to the parameter name
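Both markers are straightforward to detect. A minimal sketch using Python's standard `configparser`, assuming a `[general]` section and a hypothetical `req_` prefix; it also shows that an empty template with suggestions in `#` comments parses cleanly, since full-line comments are skipped:

```python
import configparser


def load_section(text, section="general"):
    """Parse ini text and return one section as a plain dict of strings."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return dict(cfg[section])


def find_missing_mandatory(params, marker="?", prefix="req_"):
    """Report mandatory parameters the user has not filled in.

    Two hypothetical conventions from the discussion:
      * a value left as the marker (e.g. "genome_file=?") is mandatory and unset;
      * a key starting with prefix (e.g. "req_annotation") must be non-empty.
    """
    missing = []
    for key, value in sorted(params.items()):
        value = value.strip()
        if value == marker:
            missing.append(key)
        elif key.startswith(prefix) and not value:
            missing.append(key)
    return missing
```

For a template such as `[general]`, `# example: /path/to/genome.fa`, `genome_file=?`, `req_annotation=`, `threads=4`, the result would be `["genome_file", "req_annotation"]`.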

Ultimately, the question is: do we want to do this at all?
Apart from changing all pipeline.ini files and, depending on the choices made, the pipelines themselves, it would also require everyone to reconfigure their existing ini files.

@sebastian-luna-valero
Member

Thanks @jscaber

Please note that the existing input validation happens after the PARAMS dictionary has been fully loaded, which involves more than parsing the pipeline.ini file: more than one .ini file may be parsed, and PARAMS also includes 1) input given on the command line and 2) hard-coded values (e.g. in Parameters.py). However, validation does happen just before any ruffus task runs, and I think that's what we need.
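Because validation runs on the fully merged PARAMS rather than on a single file, the flow looks roughly like the sketch below. It assumes a simple override order (hard-coded values, then each ini file in turn, then command-line options) and flattens all ini sections into one namespace for simplicity; neither detail is taken from Parameters.py, and `load_params`/`validate` are hypothetical names.

```python
import configparser


def load_params(ini_texts, hardcoded, cmdline):
    """Merge parameter layers; later layers override earlier ones.

    Assumed order (a sketch, not necessarily what Parameters.py does):
    hard-coded values < each ini file in turn < command-line options.
    Section names are ignored, so all keys share one namespace.
    """
    params = dict(hardcoded)
    for text in ini_texts:
        cfg = configparser.ConfigParser()
        cfg.read_string(text)
        for section in cfg.sections():
            params.update(cfg[section])
    params.update(cmdline)
    return params


def validate(params, checks):
    """Run every check on the fully merged params and collect the messages."""
    problems = []
    for check in checks:
        problems.extend(check(params))
    return problems
```

Validating the merged result means a value supplied only on the command line still counts, which a check on pipeline.ini alone would miss.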

I think some form of input validation is desirable, but I understand we need to be a bit flexible as well. Here are my thoughts:

  • we should use full paths whenever possible

  • provide an empty file with suggestions in comments

  • use "?" for mandatory input and provide examples

It would also be necessary to have a keyword in the parameter name, for example file and/or dir, to be explicit about whether the expected input is a file or some other configuration parameter (e.g. a job queue name). Moreover, we would need to distinguish between input and output files, as the former can be tested but the latter cannot.
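A keyword convention like this could drive the checks directly. In the minimal sketch below, the suffixes `_infile`, `_outfile` and `_dir` are hypothetical placeholders for whatever keywords are agreed on; output files are deliberately skipped, since they cannot be tested before the pipeline produces them.

```python
import os


def classify(key):
    """Map a parameter name to a kind using a hypothetical suffix scheme."""
    if key.endswith("_infile"):
        return "input"
    if key.endswith("_outfile"):
        return "output"
    if key.endswith("_dir"):
        return "dir"
    return "config"  # e.g. a job queue name: nothing to check on disk


def check_by_keyword(params):
    """Check input files and directories; skip outputs and plain config."""
    problems = []
    for key, value in sorted(params.items()):
        kind = classify(key)
        if kind == "input" and not os.path.isfile(value):
            problems.append(f"{key}: input file not found: {value}")
        elif kind == "dir" and not os.path.isdir(value):
            problems.append(f"{key}: directory not found: {value}")
    return problems
```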
