Showing 10 changed files with 115 additions and 13 deletions.
@@ -0,0 +1,59 @@
# Feature group coverage

The **coverage** command calculates the coverage of a profile: for each sample, the percentage of features in a pre-defined feature group that are present in that sample.

```bash
woltka tools coverage -i input.biom -m mapping.txt -o output.biom
```
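
The definition above can be illustrated with a minimal Python sketch. This is not Woltka's implementation: the function name `coverage`, the nested-dictionary layout, and the counts are assumptions made for illustration, and a feature is considered present when its count is greater than zero.

```python
def coverage(profile, groups):
    """Percentage of each group's member features present in each sample.

    profile: {sample: {feature: count}}  (hypothetical in-memory layout)
    groups:  {group: [member features]}
    """
    result = {}
    for group, members in groups.items():
        members = set(members)
        if not members:
            continue  # skip empty groups to avoid division by zero
        result[group] = {
            sample: 100.0 * sum(1 for m in members if counts.get(m, 0) > 0) / len(members)
            for sample, counts in profile.items()
        }
    return result


# Toy data: the counts are made up, not taken from the sample files.
profile = {
    "Sample 1": {"asnB": 7, "aspC": 4, "bioA": 12, "bioB": 3},
    "Sample 2": {"aspC": 2, "bioA": 5, "bioB": 0, "bioD": 1, "bioF": 9},
}
groups = {
    "Asparagine biosynthesis": ["asnB", "aspC"],
    "Biotin synthesis": ["bioA", "bioB", "bioD", "bioF"],
}

cov = coverage(profile, groups)
print(cov["Asparagine biosynthesis"])  # {'Sample 1': 100.0, 'Sample 2': 50.0}
print(cov["Biotin synthesis"])         # {'Sample 1': 50.0, 'Sample 2': 75.0}
```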

A typical use case is to assess how likely **metabolic pathways** are to be present in each organism or community. Because a pathway consists of _multiple_ chemical **reactions** or functional **genes** connected to each other, the presence of only some of them in a sample (even at high abundance) does not necessarily mean that the entire pathway is viable. Only when all or a large proportion of them are found can we be more confident in this hypothesis.

In this example, the input profile ([sample](../woltka/tests/data/output/truth.metacyc.tsv)) is a table of **genes**:

Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4
--- | --- | --- | --- | ---
_plsC_ | 51 | 49 | 113 | 34
_fruK_ | 83 | 128 | 160 | 41
_panE_ | 0 | 53 | 0 | 39
_leuA_ | 111 | 262 | 232 | 77
... |

The mapping file ([sample](../woltka/tests/data/function/metacyc/pathway_mbrs.txt)) defines the member features (**genes**) of each feature group (**pathway**). Each line starts with the group, followed by an arbitrary number of member fields; the field delimiter is \<tab\>:

| | | | | | | |
|-|-|-|-|-|-|-|
| Asparagine biosynthesis | _asnB_ | _aspC_ |
| Biotin synthesis | _bioA_ | _bioB_ | _bioD_ | _bioF_ |
| NAD biosynthesis II | _hel_ | _nudC_ | _nadN_ | _pnuE_ | _nadR_ | _nadM_ |
| pyruvate decarboxylation | _aceE_ | _aceF_ | _lpd_ |
| ... |

The output file ([sample](../woltka/tests/data/output/truth.metacyc.coverage.tsv)) is a table of coverage values (percentages) per sample per feature group (**pathway**):

Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4
--- | --- | --- | --- | ---
Biotin synthesis | 50.0 | 50.0 | 25.0 | 37.5
GDP-D-rhamnose biosynthesis | 20.0 | 80.0 | 20.0 | 80.0
L-glutamine degradation I | 100.0 | 100.0 | 50.0 | 0.0
Sucrose biosynthesis I | 20.0 | 20.0 | 20.0 | 20.0
... |

## Parameters

### Presence / absence

With the parameter `--threshold` or `-t` followed by a percentage (e.g., `80`), the output coverage table displays binary results: "**1**" if the coverage is greater than or equal to the threshold, and "**0**" if it is below the threshold.

### Feature count

With the flag `--count` or `-c`, the program reports the number of member features of a group that are present in a sample, instead of the percentage. Note: This overrides `--threshold`.
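
How these two options change a single cell of the output table can be sketched as follows. This is only an illustration of the behavior described above, not Woltka's code; the function `report_cell` and its arguments are hypothetical names.

```python
def report_cell(n_present, n_members, threshold=None, count=False):
    """Value reported for one feature group in one sample (hypothetical helper).

    n_present: number of member features found in the sample
    n_members: total number of member features in the group
    """
    if count:
        # Count mode takes precedence over the threshold, as noted above.
        return n_present
    percent = 100.0 * n_present / n_members
    if threshold is not None:
        # Binary call: 1 if coverage >= threshold, otherwise 0.
        return 1 if percent >= threshold else 0
    return percent


print(report_cell(3, 4))                            # 75.0
print(report_cell(3, 4, threshold=80))              # 0
print(report_cell(4, 5, threshold=80))              # 1
print(report_cell(3, 4, threshold=80, count=True))  # 3
```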

### Feature group names

One can supply a mapping of feature groups to their names with `--names` or `-n`. These names will be appended to the coverage table as a metadata column ("Name").

## Considerations

The coverage command treats any non-zero feature count, even as low as **1**, as evidence of the feature's presence. False positives may therefore be introduced if the profile is noisy. One may consider **filtering** the profile prior to running this command. Woltka provides a per-sample feature abundance [filtering](filter.md) function, in addition to the multiple filtering functions implemented in the QIIME 2 plugin [feature-table](https://docs.qiime2.org/2020.11/plugins/available/feature-table/).

@@ -0,0 +1,9 @@
# Per-sample filtering

The **filter** command filters each feature in each sample based on the absolute or relative abundance of that feature in that particular sample. For example, the following command drops, from each sample, the features whose relative abundance in that sample is below 0.01%:

```bash
woltka tools filter -i input.biom -o output.biom --min-percent 0.01
```
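
The per-sample logic can be sketched in Python as below. This is a simplified illustration under stated assumptions, not Woltka's implementation: the function `filter_min_percent` and the dictionary layout are hypothetical, the percentage is taken relative to each sample's own total, and a feature exactly at the cutoff is kept. The toy call uses a 1% cutoff instead of 0.01% to keep the numbers readable.

```python
def filter_min_percent(profile, min_percent):
    """Drop, within each sample, features below min_percent of that sample's total.

    profile: {sample: {feature: count}}  (hypothetical in-memory layout)
    """
    filtered = {}
    for sample, counts in profile.items():
        total = sum(counts.values())
        # Keep a feature if count / total * 100 >= min_percent.
        # The comparison is written without division so an empty sample is safe.
        filtered[sample] = {
            feature: n
            for feature, n in counts.items()
            if n > 0 and n * 100.0 >= min_percent * total
        }
    return filtered


# Toy data: counts are made up for illustration.
profile = {
    "S1": {"G1": 985, "G2": 10, "G3": 5},  # total 1000: G3 is 0.5%, dropped
    "S2": {"G1": 50, "G2": 0, "G3": 150},  # total 200: G3 is 75%, kept
}
kept = filter_min_percent(profile, 1.0)
print(kept["S1"])  # {'G1': 985, 'G2': 10}
print(kept["S2"])  # {'G1': 50, 'G3': 150}
```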

This function is especially useful in shotgun metagenomics, where very-low-abundance false positive assignments are prevalent and can bias downstream analyses ([Ye et al, 2019](https://www.cell.com/cell/fulltext/S0092-8674(19)30775-5)).

@@ -0,0 +1,9 @@
# Merging profiles

The **merge** command merges two or more profiles into one. Samples and features that occur in more than one profile are combined additively, i.e., their counts are summed. This is useful when the analysis includes multiple sets of input files (e.g., multiple sequencing runs).

```bash
woltka tools merge -i input1.biom -i input2.biom -i input3.biom -o output.biom
```
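
The additive behavior can be sketched in Python as follows. This is an illustration only, not Woltka's implementation; the function `merge_profiles` and the nested-dictionary layout are hypothetical.

```python
from collections import Counter


def merge_profiles(*profiles):
    """Merge profiles, summing counts of overlapping samples and features.

    Each profile: {sample: {feature: count}}  (hypothetical in-memory layout)
    """
    merged = {}
    for profile in profiles:
        for sample, counts in profile.items():
            # Counter.update adds counts rather than replacing them.
            merged.setdefault(sample, Counter()).update(counts)
    return {sample: dict(counts) for sample, counts in merged.items()}


# Toy data: two runs sharing sample S1 and feature G1.
run1 = {"S1": {"G1": 5, "G2": 3}}
run2 = {"S1": {"G1": 2, "G3": 4}, "S2": {"G1": 1}}

merged = merge_profiles(run1, run2)
print(merged["S1"])  # {'G1': 7, 'G2': 3, 'G3': 4}
print(merged["S2"])  # {'G1': 1}
```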

The output file from the merge command is **identical** or nearly identical to the output file generated by merging the sequence alignment files prior to running Woltka. Small discrepancies (counts differing by **1**) can be introduced during the normalization of _multiple assignments_ due to floating-point arithmetic, which is usually not a concern. To avoid them, one can stick to one-to-one alignments, or use the classification parameters `--rank free`, `--uniq`, `--major`, or `--above` ([see details](classify.md#ambiguous-assignment)).