feature: change `average_spectra` to take an iterator over `SpectrumLike` #6
Thank you for working on this. The idea of making this work with all I'll write more about
I figured out that the error occurred because I had not run the code locally, so I never tested whether it actually built. I think I have now fixed my original attempt. I have not checked out the new iterators yet; that sounds like a good thing to look into tomorrow.
Please see the comment about `dx` inference.
Additionally, even if you derive an algorithm for inferring `dx`, it raises the possibility that the calculation becomes inconsistent as you add more spectra.
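To make the consistency concern concrete, here is a minimal sketch. It assumes one hypothetical inference rule (take the smallest positive gap between adjacent m/z values seen so far); `infer_dx` is an illustrative name and is not part of mzdata or mzsignal.

```rust
/// Hypothetical rule: infer `dx` as the smallest positive gap between
/// adjacent m/z values across all spectra seen so far.
fn infer_dx(spectra: &[Vec<f64>]) -> Option<f64> {
    spectra
        .iter()
        // Gaps between neighbouring m/z values within each spectrum.
        .flat_map(|mzs| mzs.windows(2).map(|w| w[1] - w[0]))
        .filter(|gap| *gap > 0.0)
        // Keep the smallest gap; `None` if no spectrum has two points.
        .fold(None, |best: Option<f64>, gap| match best {
            Some(b) if b <= gap => Some(b),
            _ => Some(gap),
        })
}
```

Under this rule, a spectrum with 0.5-spaced points yields `dx = 0.5`, but adding a second spectrum with a 0.125 gap changes the inferred value to 0.125, so an average accumulated incrementally would end up on a different grid than one computed from all spectra at once.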
src/spectrum/group.rs (outdated)
    if let Some(peaks) = scan.peaks.as_ref() {
    let fpeaks: Vec<_> = peaks
    let peaks = spectrum.peaks();
    let fitted_peaks: Vec<_> = peaks
This will use the new iterators. It will work as expected for centroid and raw spectra which contain centroid data. For deconvolved spectra, this will probably fall apart because they aren't sorted by m/z, but neutral mass, and even if you sort the points before reprofiling, such a signal is of little use since the charge dimension is lost and cannot be recovered.
If this is a `MultiLayerSpectrum`, which behavior you get depends upon `SpectrumLike::signal_continuity` and whether or not `MultiLayerSpectrum::deconvolved_peaks` is populated.
This is probably worth documenting, but I don't think it invalidates the proposed flexibility itself.
Based on your comment, I restructured the code a bit to always use the `.peaks()` method, as this is always available and has the same behaviour as the previous implementation. The documentation was updated to reflect this and now properly documents the deconvoluted case.
Sidenote: if by "the new iterators" you mean `PeakDataIterDispatch`, then this is not yet exported to the public API.
Thank you for noting that. `PeakDataIterDispatch` should be exported. I'll add this to the main branch.
I have yet to find a balance on how much to put into the public API of a Rust library: complexity of the interface vs. ease of extension. The `mzdata::spectrum::peaks` module is a good example, with seven different types exported in the public API but only two concepts, "abstraction over peak list types" and "abstraction over iteration over abstracted peak list types", with variations in ownership.
The changes as they are look good to me. Please let me know if you have any other changes you want to make before merging.
Do you think it is needed to change the Additionally if there are other parts of the library you would like another pair of eyes on feel free to ask me to take a look.
I was mistaken on Saturday when I approved the changes. I did not properly recognize that the branching between profile and centroid spectra was gone and that only the `RefPeakDataLevel` variant was being used.
Using `RefPeakDataLevel::iter` makes sense when you have centroid data. It might be simplest to do something like this (not tested):
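    let peak_data = scan.peaks();
    let mode = scan.signal_continuity();
    // Borrow `peak_data` in the match so it can still be iterated in the arms.
    match (mode, &peak_data) {
        // Deconvoluted or missing peaks cannot be meaningfully reprofiled.
        (_, RefPeakDataLevel::Deconvoluted(_)) | (_, RefPeakDataLevel::Missing) => None,
        // Centroid data, whether picked or stored in raw arrays, is reprofiled onto the grid.
        (_, RefPeakDataLevel::Centroid(_)) | (SignalContinuity::Centroid, RefPeakDataLevel::RawData(_)) => {
            Some(reprofile(peak_data.iter().map(|p| FittedPeak::from(p.as_centroid())), dx))
        },
        // Any remaining raw arrays are treated as profile signal and used directly.
        (_, RefPeakDataLevel::RawData(arrays)) => {
            Some(ArrayPair::from((arrays.mzs().unwrap(), arrays.intensities().unwrap())))
        }
    }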
There are other things we can do to reduce overhead here, but you're under no obligation to pursue them unless they are of interest to you. The clearest one is to use a single `mzsignal::reprofile::PeakSetReprofiler` instance for all re-profiling jobs here, because reprofiling also creates a costly m/z grid spaced by `dx`. The `reprofile` top-level function just creates a `PeakSetReprofiler`, reprofiles a single spectrum, then throws it away, while a `PeakSetReprofiler` can be re-used without needing to copy the m/z grid over and over again. Again, this is already complicated and we can leave it for future improvements if they are needed.
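As a rough illustration of the grid-reuse idea, the sketch below builds the m/z grid once and reprofiles several centroid lists against it. `GridReprofiler` and `average_centroids` are hypothetical stand-ins written for this example; they are not the actual `PeakSetReprofiler` API, and the Gaussian peak shape with a fixed `sigma` is an assumption.

```rust
/// Hypothetical stand-in for a reusable reprofiler; not mzsignal's actual API.
struct GridReprofiler {
    mz_grid: Vec<f64>,
}

impl GridReprofiler {
    /// Build the m/z grid once, spaced by `dx` over `[mz_start, mz_end]`.
    fn new(mz_start: f64, mz_end: f64, dx: f64) -> Self {
        let n = ((mz_end - mz_start) / dx).floor() as usize + 1;
        let mz_grid = (0..n).map(|i| mz_start + i as f64 * dx).collect();
        Self { mz_grid }
    }

    /// Spread each centroid onto the shared grid as a Gaussian of width `sigma`.
    fn reprofile(&self, peaks: &[(f64, f32)], sigma: f64) -> Vec<f32> {
        let mut out = vec![0.0f32; self.mz_grid.len()];
        for &(mz, intensity) in peaks {
            for (slot, grid_mz) in out.iter_mut().zip(&self.mz_grid) {
                let z = (grid_mz - mz) / sigma;
                *slot += intensity * (-0.5 * z * z).exp() as f32;
            }
        }
        out
    }
}

/// Average several centroid spectra on one shared grid, reusing the reprofiler.
fn average_centroids(
    spectra: &[Vec<(f64, f32)>],
    reprofiler: &GridReprofiler,
    sigma: f64,
) -> Vec<f32> {
    let mut sum = vec![0.0f32; reprofiler.mz_grid.len()];
    for peaks in spectra {
        for (acc, y) in sum.iter_mut().zip(reprofiler.reprofile(peaks, sigma)) {
            *acc += y;
        }
    }
    // Guard against division by zero when no spectra are given.
    let n = spectra.len().max(1) as f32;
    sum.iter_mut().for_each(|v| *v /= n);
    sum
}
```

The point of the design is only that the grid is allocated in `new` and borrowed by every `reprofile` call, instead of being rebuilt per spectrum.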
Thank you for offering to continue working with the library. Right now, spectrum averaging is on the Something that could use another pair of eyes is the peak loading scheme. Alternatively, please try to use the library in an application and tell me where it fails to do what you expect it to.
I implemented your feedback. I also tried my hand at the reuse of `PeakSetReprofiler`, I assumed (but that assumption might be wrong) that the I will take a look at the
`average_spectra` to take an iterator over `SpectrumLike`
I was working on averaging spectra and saw room for some improvements. This change allows any iterator over any `SpectrumLike` instead of a slice of `MultiLayerSpectrum` (but is just as convenient for that use case). Additionally, I fleshed out the documentation a bit based on a discussion I had with a coworker.
I was trying to make this into a trait implementation for `FromIterator`, which would make it trivial to build any spectrum iterator, call `.collect()`, and get the average. But the `dx` default needed for reprofiling centroid data made this impossible, unless a good universal default could be defined.
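For context, `FromIterator::from_iter` receives only the iterator, so there is no place to pass `dx` to `.collect()`. One possible alternative, sketched below, is an extension trait that takes `dx` explicitly. `AverageSpectra` is a hypothetical name, spectra are simplified to `Vec<(f64, f32)>` centroid lists, and reprofiling is replaced by snapping peaks to the nearest grid point; none of this is the crate's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical extension trait: any iterator of centroid peak lists can be
/// averaged, with `dx` passed explicitly instead of hidden in a default.
trait AverageSpectra: Iterator<Item = Vec<(f64, f32)>> + Sized {
    fn average(self, dx: f64) -> Vec<(f64, f32)> {
        let mut bins: BTreeMap<i64, f32> = BTreeMap::new();
        let mut n = 0usize;
        for spectrum in self {
            n += 1;
            for (mz, intensity) in spectrum {
                // Snap each peak to the nearest grid point spaced by `dx`
                // (a simplification standing in for real reprofiling).
                *bins.entry((mz / dx).round() as i64).or_insert(0.0) += intensity;
            }
        }
        // `bins` is only non-empty when `n >= 1`, so the division is safe.
        bins.into_iter()
            .map(|(i, total)| (i as f64 * dx, total / n as f32))
            .collect()
    }
}

// Blanket implementation: every suitable iterator gets `.average(dx)`.
impl<I: Iterator<Item = Vec<(f64, f32)>>> AverageSpectra for I {}
```

With this in scope, a call such as `spectra.into_iter().average(0.5)` reads almost like `.collect()` while keeping `dx` visible at the call site.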