Feature/20210716 msconvert sonar mzml output (ProteoWizard#1687)
* updated psi-ms.obo to version 4.1.56 in support of Waters SONAR output in mzML format
* handle a renamed CV term: MS_mean_drift_time_array becomes MS_mean_ion_mobility_drift_time_array
* CVID canonicalization was not distinguishing certain CVIDs representing regular expressions.
* added LysArginase case in pepXMLSpecificity()
* fixed combined SONAR spectra to use 4 arrays instead of 3 (separate arrays for lower and upper scanning quadrupole bounds), and non-combined SONAR spectra to use userParams for scanning quadrupole bounds (isolationWindow being unsuitable because it can't be used for MS1s)

Co-authored-by: bspratt <[email protected]>
Co-authored-by: Matt Chambers <[email protected]>
3 people authored Sep 28, 2021
1 parent cb1c5cd commit aa94919
Showing 16 changed files with 1,694 additions and 734 deletions.
14 changes: 13 additions & 1 deletion pwiz/data/common/CVTranslator.cpp
@@ -99,11 +99,23 @@ inline char alnum_lower(char c)
return isalnum(c) ? static_cast<char>(tolower(c)) : c == '+' ? c : ' ';
}

+inline char alnum_lower_regex(char c)
+{
+    // c -> lower-case, whitespace, +, or _ for things that appear to be part of a regex
+    return isalnum(c) ? static_cast<char>(tolower(c)) : c == '+' ? c : '_';
+}
+
string preprocess(const string& s)
{
    string result = s;
-    transform(result.begin(), result.end(), result.begin(), alnum_lower);
+    if (bal::starts_with(s, "(?<=")) // Looks like a regex
+    {
+        transform(result.begin(), result.end(), result.begin(), alnum_lower_regex);
+    }
+    else
+    {
+        transform(result.begin(), result.end(), result.begin(), alnum_lower);
+    }
    return result;
}
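
The collision this hunk addresses can be reproduced in isolation. The sketch below is a standalone approximation, not the pwiz source: it copies the two character mappings from the diff, swaps `bal::starts_with` for `std::string::compare`, and adds a hypothetical `collapseWhitespace` step on the assumption that canonicalization ignores runs of spaces. `canonicalKey` is likewise a made-up name for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Mapping used for ordinary term names: punctuation (except '+') becomes a space.
inline char alnum_lower(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) ? static_cast<char>(std::tolower(u)) : c == '+' ? c : ' ';
}

// Mapping used for regex-like names: punctuation (except '+') becomes '_',
// so the regex metacharacters still contribute to the result.
inline char alnum_lower_regex(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) ? static_cast<char>(std::tolower(u)) : c == '+' ? c : '_';
}

// Same branching as the patched preprocess(), using only the standard library.
std::string preprocess(const std::string& s)
{
    std::string result = s;
    const bool looksLikeRegex = s.compare(0, 4, "(?<=") == 0;
    char (*mapChar)(char) = looksLikeRegex ? alnum_lower_regex : alnum_lower;
    std::transform(result.begin(), result.end(), result.begin(), mapChar);
    return result;
}

// ASSUMPTION: a later step discards leading, trailing, and repeated spaces.
std::string collapseWhitespace(const std::string& s)
{
    std::istringstream words(s);
    std::string word, joined;
    while (words >> word)
    {
        if (!joined.empty()) joined += ' ';
        joined += word;
    }
    return joined;
}

// Hypothetical helper combining both steps.
std::string canonicalKey(const std::string& s)
{
    return collapseWhitespace(preprocess(s));
}
```

Under these assumptions, `canonicalKey("(?<=[KR])")` gives `_____kr__` and `canonicalKey("(?=[KR])")` gives `kr`. With `alnum_lower` alone, both strings would reduce to `kr`, which is the kind of ambiguity the new LysargiNase regex term `(?=[KR])` would otherwise introduce.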

64 changes: 51 additions & 13 deletions pwiz/data/common/cv.cpp

Large diffs are not rendered by default.

70 changes: 58 additions & 12 deletions pwiz/data/common/cv.hpp
@@ -41,9 +41,9 @@
// [psi-ms.obo]
#define _PSI_MS_OBO_
// format-version: 1.2
-// data-version: 4.1.51
-// date: 24:03:2021 00:00
-// saved-by: Eric Deutsch
+// data-version: 4.1.56
+// date: 25:06:2021 00:00
+// saved-by: Chris Bielow
// auto-generated-by: OBO-Edit 2.3.1
// import: http://ontologies.berkeleybop.org/pato.obo
// import: http://ontologies.berkeleybop.org/uo.obo
@@ -61,6 +61,7 @@
// remark: creator: Fredrik Levander <fredrik.levander <-at-> immun.lth.se>
// remark: creator: Pierre-Alain Binz <pierre-alain.binz <-at-> chuv.ch>
// remark: creator: Gerhard Mayer <mayerg97 <-at-> rub.de>
+// remark: creator: Joshua Klein <jaklein <-at-> bu.edu>
// remark: publisher: HUPO Proteomics Standards Initiative Mass Spectrometry Standards Working Group and HUPO Proteomics Standards Initiative Proteomics Informatics Working Group
// remark: When appropriate the definition and synonyms of a term are reported exactly as in the chapter 12 of IUPAC orange book. See http://www.iupac.org/projects/2003/2003-056-2-500.html and http://mass-spec.lsu.edu/msterms/index.php/Main_Page
// remark: For any queries contact [email protected]
@@ -7902,8 +7903,8 @@ enum PWIZ_API_DECL CVID
/// ion mobility drift time: Drift time of an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This time might refer to the central value of a bin into which all ions within a narrow range of drift time have been aggregated.
MS_ion_mobility_drift_time = 1002476,

-/// mean drift time array: Array of drift times, averaged from a matrix of binned m/z and drift time values, corresponding to spectrum of individual peaks encoded with an m/z array.
-MS_mean_drift_time_array = 1002477,
+/// mean ion mobility drift time array: Array of population mean ion mobility values from a drift time device, reported in seconds (or milliseconds), corresponding to a spectrum of individual peaks encoded with an m/z array.
+MS_mean_ion_mobility_drift_time_array = 1002477,

/// mean charge array: Array of mean charge values where the mean charge is calculated as a weighted mean of the charges of individual peaks that are aggregated into a processed spectrum.
MS_mean_charge_array = 1002478,
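
The rename in this hunk changes only the C++ identifier; the enum value stays 1002477, so the accession written to files is unaffected, while code that still spells `MS_mean_drift_time_array` no longer compiles. A minimal sketch of that relationship follows, using a three-entry excerpt of the enum and a hypothetical `msAccession` helper (not a pwiz function) that formats a value in the `MS:` plus seven digits form used by PSI-MS accessions.

```cpp
#include <cstdio>
#include <string>

// Illustrative excerpt only; the values are the ones shown in the diff.
enum CVID
{
    MS_ion_mobility_drift_time = 1002476,
    MS_mean_ion_mobility_drift_time_array = 1002477, // formerly MS_mean_drift_time_array
    MS_mean_charge_array = 1002478
};

// Hypothetical helper: render an MS-namespace CVID as a PSI-MS accession string.
std::string msAccession(CVID cvid)
{
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "MS:%07d", static_cast<int>(cvid));
    return buffer;
}
```

Here `msAccession(MS_mean_ion_mobility_drift_time_array)` returns `MS:1002477`, the same accession the old identifier mapped to.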
@@ -8598,6 +8599,15 @@ enum PWIZ_API_DECL CVID
/// protein group-level result list statistic: Attribute of an entire list of protein groups.
MS_protein_group_level_result_list_statistic = 1002706,

+/// (?=[KR]): Regular expression for LysargiNase.
+MS_____KR__ = 1002707,
+
+/// LysargiNase: Metalloproteinase found in Methanosarcina acetivorans that cleaves on the N-terminal side of lysine and arginine residues.
+MS_LysargiNase = 1002708,
+
+/// Tryp-N (LysargiNase): Metalloproteinase found in Methanosarcina acetivorans that cleaves on the N-terminal side of lysine and arginine residues.
+MS_Tryp_N = MS_LysargiNase,
+
/// Pegasus BT: LECO bench-top GC time-of-flight mass spectrometer.
MS_Pegasus_BT = 1002719,
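
The entries added in this hunk show how the generated enum represents an OBO synonym: `MS_Tryp_N` is an alias with the same numeric value as `MS_LysargiNase`, so either spelling selects the same branch. The sketch below illustrates that, together with a guess at what a LysargiNase case in a pepXML specificity lookup could return. The `Specificity` struct and `lysargiNaseSpecificity` function are hypothetical stand-ins, not the pwiz `pepXMLSpecificity()` implementation, and the `cut`/`sense` values are an assumption derived from the term definition (cleavage on the N-terminal side of K and R).

```cpp
#include <string>

// Illustrative excerpt only; the values are the ones shown in the diff.
enum CVID
{
    MS_____KR__ = 1002707,      // "(?=[KR])", the cleavage regex term
    MS_LysargiNase = 1002708,
    MS_Tryp_N = MS_LysargiNase  // synonym: same numeric value, different spelling
};

// Hypothetical stand-in for the attributes of a pepXML <specificity> element.
struct Specificity
{
    std::string cut;    // residues at which cleavage occurs
    std::string noCut;  // residues that block cleavage when adjacent
    std::string sense;  // "N" or "C": side of the residue that is cleaved
};

// Hypothetical lookup; returns false for enzymes this sketch does not know.
bool lysargiNaseSpecificity(CVID enzyme, Specificity& result)
{
    if (enzyme != MS_LysargiNase)
        return false;
    result.cut = "KR";   // ASSUMPTION: taken from the regex (?=[KR])
    result.noCut = "";   // ASSUMPTION: no blocking residue in the regex
    result.sense = "N";  // ASSUMPTION: N-terminal side, per the term definition
    return true;
}
```

Because the alias shares a value with the primary term, a `switch` over `CVID` can list only one of the two names as a case label.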

@@ -8892,7 +8902,7 @@ enum PWIZ_API_DECL CVID
/// inverse reduced ion mobility: Ion mobility measurement for an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This might refer to the central value of a bin into which all ions within a narrow range of mobilities have been aggregated.
MS_inverse_reduced_ion_mobility = 1002815,

-/// mean ion mobility array: Array of drift times, averaged from a matrix of binned m/z and ion mobility values, corresponding to a spectrum of individual peaks encoded with an m/z array.
+/// mean ion mobility array: Array of population mean ion mobility values (K or K0) based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, corresponding to a spectrum of individual peaks encoded with an m/z array.
MS_mean_ion_mobility_array = 1002816,

/// Bruker TDF format: Bruker TDF raw file format.
@@ -9123,7 +9133,7 @@ enum PWIZ_API_DECL CVID
/// ion mobility attribute: An attribute describing ion mobility searches.
MS_ion_mobility_attribute = 1002892,

-/// ion mobility array: An array of ion mobility data.
+/// ion mobility array: Abstract array of ion mobility data values. A more specific child term concept should be specified in data files to make precise the nature of the data being provided.
MS_ion_mobility_array = 1002893,

/// InChIKey: Unique chemical structure identifier for chemical compounds.
@@ -9453,13 +9463,13 @@ enum PWIZ_API_DECL CVID
/// timsTOF Pro: Bruker Daltonics' timsTOF Pro.
MS_timsTOF_Pro = 1003005,

-/// mean inverse reduced ion mobility array: Array of inverse reduced ion mobilities, averaged from a matrix of binned m/z and ion mobility values, corresponding to a spectrum of individual peaks encoded with an m/z array.
+/// mean inverse reduced ion mobility array: Array of population mean ion mobility values based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, normalized for the local conditions and reported in volt-second per square centimeter, corresponding to a spectrum of individual peaks encoded with an m/z array.
MS_mean_inverse_reduced_ion_mobility_array = 1003006,

-/// raw ion mobility array: Array of raw drift times.
+/// raw ion mobility array: Array of raw ion mobility values (K or K0) based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, corresponding to a spectrum of individual peaks encoded with an m/z array.
MS_raw_ion_mobility_array = 1003007,

-/// raw inverse reduced ion mobility array: Array of raw inverse reduced ion mobilities.
+/// raw inverse reduced ion mobility array: Array of raw ion mobility values based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, normalized for the local conditions and reported in volt-second per square centimeter, corresponding to a spectrum of individual peaks encoded with an m/z array.
MS_raw_inverse_reduced_ion_mobility_array = 1003008,

/// Shimadzu Biotech LCD format: Shimadzu Biotech LCD file format.
@@ -9882,15 +9892,51 @@ enum PWIZ_API_DECL CVID
/// PTMProphet mean best probability: PSM-specific average of the m best site probabilities over all potential sites where m is the number of modifications of a specific type, as computed by PTMProphet.
MS_PTMProphet_mean_best_probability = 1003148,

-/// PTMProphet normalized information content: PTMProphet-computed PSM-specific normalized (0.0 1.0) measure of information content across all modifications of a specific type.
+/// PTMProphet normalized information content: PTMProphet-computed PSM-specific normalized (0.0 - 1.0) measure of information content across all modifications of a specific type.
MS_PTMProphet_normalized_information_content = 1003149,

-/// PTMProphet information content: PTMProphet-computed PSM-specific measure of information content per modification type ranging from 0 to m, where m is the number of modifications of a specific type.
+/// PTMProphet information content: PTMProphet-computed PSM-specific measure of information content per modification type ranging from 0 to m, where m is the number of modifications of a specific type.
MS_PTMProphet_information_content = 1003150,

/// SHA-256: SHA-256 (member of Secure Hash Algorithm-2 family) is a cryptographic hash function designed by the National Security Agency (NSA) and published by the NIST as a U. S. government standard. It is also used to verify file integrity.
MS_SHA_256 = 1003151,

+/// GCMS-QP2010SE: Shimadzu Scientific Instruments GCMS-QP2010SE.
+MS_GCMS_QP2010SE = 1003152,
+
+/// raw ion mobility drift time array: Array of raw ion mobility values from a drift time device, reported in seconds (or milliseconds), corresponding to a spectrum of individual peaks encoded with an m/z array.
+MS_raw_ion_mobility_drift_time_array = 1003153,
+
+/// deconvoluted ion mobility array: Array of ion mobility values (K or K0) based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, as an average property of an analyte post peak-detection, weighted charge state reduction, and/or adduct aggregation, corresponding to a spectrum of individual peaks encoded with an m/z array.
+MS_deconvoluted_ion_mobility_array = 1003154,
+
+/// deconvoluted inverse reduced ion mobility array: Array of ion mobility values based on ion separation in gaseous phase due to different ion mobilities under an electric field based on ion size, m/z and shape, normalized for the local conditions and reported in volt-second per square centimeter, as an average property of an analyte post peak-detection, weighted charge state reduction, and/or adduct aggregation, corresponding to a spectrum of individual peaks encoded with an m/z array.
+MS_deconvoluted_inverse_reduced_ion_mobility_array = 1003155,
+
+/// deconvoluted ion mobility drift time array: Array of mean ion mobility values from a drift time device, reported in seconds (or milliseconds), as an average property of an analyte post peak-detection, weighted charge state reduction, and/or adduct aggregation, corresponding to a spectrum of individual peaks encoded with an m/z array.
+MS_deconvoluted_ion_mobility_drift_time_array = 1003156,
+
+/// scanning quadrupole position lower bound m/z array: Array of m/z values representing the lower bound m/z of the quadrupole position at each point in the spectrum.
+MS_scanning_quadrupole_position_lower_bound_m_z_array = 1003157,
+
+/// scanning quadrupole position upper bound m/z array: Array of m/z values representing the upper bound m/z of the quadrupole position at each point in the spectrum.
+MS_scanning_quadrupole_position_upper_bound_m_z_array = 1003158,
+
+/// isolation window full range: Indicates an acquisition mode in which the isolation window is a full range, rather than a subset of the full range.
+MS_isolation_window_full_range = 1003159,
+
+/// mzQC format: Proteomics Standards Initiative mzQC format for quality control data.
+MS_mzQC_format = 1003160,
+
+/// quality control data format: Grouping term for quality control data formats.
+MS_quality_control_data_format = 1003161,
+
+/// PTX-QC: Proteomics (PTX) - QualityControl (QC) software for QC report generation and visualization.
+MS_PTX_QC = 1003162,
+
+/// PTXQC (PTX-QC): Proteomics (PTX) - QualityControl (QC) software for QC report generation and visualization.
+MS_PTXQC = MS_PTX_QC,
+
/// unimod root node: The root node of the unimod modifications ontology.
UNIMOD_unimod_root_node = 200000000,
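
The two scanning quadrupole terms added in this hunk (1003157 and 1003158) back the commit message's statement that combined SONAR spectra now carry four arrays, with separate lower and upper quadrupole bounds per point. The sketch below shows one way such parallel arrays could be consumed. `CombinedSonarSpectrum` and `pointsForPrecursor` are hypothetical names, not pwiz types, and it is an assumption that the other two arrays are m/z and intensity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container for a combined SONAR spectrum: four parallel arrays,
// one entry per data point.
struct CombinedSonarSpectrum
{
    std::vector<double> mz;                // ASSUMPTION: standard m/z array
    std::vector<double> intensity;         // ASSUMPTION: standard intensity array
    std::vector<double> quadLowerBoundMz;  // scanning quadrupole position lower bound m/z array
    std::vector<double> quadUpperBoundMz;  // scanning quadrupole position upper bound m/z array
};

// Returns the indices of points whose quadrupole window [lower, upper]
// contains precursorMz. Throws if the four arrays differ in length.
std::vector<std::size_t> pointsForPrecursor(const CombinedSonarSpectrum& s, double precursorMz)
{
    const std::size_t n = s.mz.size();
    if (s.intensity.size() != n || s.quadLowerBoundMz.size() != n || s.quadUpperBoundMz.size() != n)
        throw std::invalid_argument("SONAR arrays must all have the same length");

    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (s.quadLowerBoundMz[i] <= precursorMz && precursorMz <= s.quadUpperBoundMz[i])
            hits.push_back(i);
    }
    return hits;
}
```

Keeping the bounds as two arrays lets each point carry its own window, which a single per-spectrum `isolationWindow` could not express for combined spectra.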
