Crawl portion of autopipeline misses CT and RTSTRUCT files #87

Open
strixy16 opened this issue Jun 8, 2023 · 2 comments

Comments

@strixy16
Collaborator

strixy16 commented Jun 8, 2023

Running `autopipeline /Users/katyscott/Documents/SARC021/images/ /Users/katyscott/Documents/SARC021/med-imageout/ --n_jobs 1 --update --overwrite` doesn't find all of the CT and RTSTRUCT files in the images directory.

My images directory contains four directories in total: two samples with two directories each. Each sample directory contains subdirectories holding CT and RTSTRUCT files as DICOMs. There are three different CT scans for each sample, and most of them have associated RTSTRUCTs.

The crawl output contains only one of the three CT and RTSTRUCT combinations for the first sample, and only two of the three CTs and one RTSTRUCT set for the second sample.

When I call the crawl_one function on its own, it appears to find all of the files. So the files are getting lost somewhere between crawl_one and the final crawl output.

@Zhack47

Zhack47 commented Aug 16, 2024

Hello, I have had a similar problem on a dataset, where no file was found. Upon further inspection, it seems that the condition for the recursive search with glob (in src/imgtools/utils/crawl.py, l.17) is too strict. Indeed, it only looks for files ending in ".dcm", which is not always the case for DICOM files :)

I simply changed the pattern to "*" to include all files. This allowed the tool to find my patients, and it is actually what is already present in the article's branch, F1000Research.
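A minimal sketch of the difference between the two patterns, using only the standard library. It does not reproduce the code in crawl.py; the helper name `find_candidates` and the temporary directory layout are hypothetical and only serve to illustrate how a strict suffix pattern skips suffix-less files while "*" picks them up.

```python
import glob
import os
import tempfile


def find_candidates(top, pattern):
    """Recursively list files under `top` whose name matches `pattern`.

    Hypothetical helper for illustration; not the function used in crawl.py.
    """
    hits = glob.glob(os.path.join(top, "**", pattern), recursive=True)
    # "*" also matches directories, so keep regular files only.
    return sorted(p for p in hits if os.path.isfile(p))


with tempfile.TemporaryDirectory() as top:
    # Made-up layout: one series folder with three differently named files.
    series = os.path.join(top, "patient1", "CT")
    os.makedirs(series)
    for name in ("slice1.dcm", "slice2.DCM", "slice3"):
        open(os.path.join(series, name), "wb").close()

    strict = [os.path.basename(p) for p in find_candidates(top, "*.dcm")]
    loose = [os.path.basename(p) for p in find_candidates(top, "*")]

print("strict pattern:", strict)
print("loose pattern: ", loose)
```

Note that "*" returns every file in the tree, including non-DICOM ones, so some later step still has to decide which files are actually DICOM.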

Hope this helps!

@Zhack47

Zhack47 commented Aug 19, 2024

Overall, this strict matching of only "*.dcm" is a problem in multiple places in the code. For example, further down the pipeline I had the same issue with the conversion of RT Structure Set files.

Sometimes these files will end in .dcm, other times in .DCM, and other times they have no suffix at all!

I think it would be necessary to check that the files are DICOM in another way, to make this tool agnostic to the filename suffix :)
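One possible suffix-agnostic check, sketched here as an assumption rather than as what the tool does: DICOM Part 10 files start with a 128-byte preamble followed by the four bytes "DICM", so the content can be inspected instead of the name. The function names `looks_like_dicom` and `find_dicoms` are hypothetical. Some older DICOM files lack the preamble, so a check like this can miss them.

```python
from pathlib import Path


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker at byte offset 128.

    Hypothetical helper; files written without the Part 10 preamble
    will not be recognised by this check.
    """
    try:
        with open(path, "rb") as f:
            f.seek(128)
            return f.read(4) == b"DICM"
    except OSError:
        # Unreadable files are treated as non-DICOM.
        return False


def find_dicoms(top):
    """Recursively collect files under `top` that look like DICOM, whatever their suffix."""
    return sorted(
        p for p in Path(top).rglob("*") if p.is_file() and looks_like_dicom(p)
    )
```

With a check like this, `find_dicoms(images_dir)` would accept .dcm, .DCM, and suffix-less files alike, at the cost of opening every file in the tree once.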
