feature: change `average_spectra` to take an iterator over `SpectrumLike` #6

douweschulte · 2024-08-26T14:57:38Z

I was working on averaging spectra and saw room for some improvements. This will allow any iterator of any SpectrumLike instead of a slice of MultiLayerSpectrum (but is just as convenient for that use case). Additionally I fleshed out the documentation a bit based on some discussion I has with a coworker.

I was trying to make this into a trait implementation for FromIterator which would make it trivial to make any spectrum iterator and then to .collect() and get the average. But the necessary dx default for reprofiling centroid data made this impossible, unless a good universal default could be defined.

mobiusklein · 2024-08-26T15:28:19Z

Thank you for working on this.

The idea of making this work with all SpectrumLike types is reasonable, although it means working through the RefPeakDataLevel type in certain places. The last release introduced more efficient iterators for that type so this would be a good exercise for them. These would be used where you're getting a compiler error about as_ref not existing on that type.

I'll write more about dx shortly.

douweschulte · 2024-08-26T15:33:37Z

I figured out that the error was because I did not actually run the code locally and as such did not test if it actually built. Now I think I fixed my original attempt. I did not check out the new iterators yet, sounds like a good thing to look into tomorrow.

mobiusklein

Please see the comment about dx inference.

Additionally, even if you derive an algorithm for inferring dx, it raises the possibility the calculation becomes inconsistent as you add more spectra.

src/spectrum/group.rs

mobiusklein · 2024-08-26T15:57:38Z

src/spectrum/group.rs

-                    if let Some(peaks) = scan.peaks.as_ref() {
-                        let fpeaks: Vec<_> = peaks
+                        let peaks = spectrum.peaks();
+                        let fitted_peaks: Vec<_> = peaks


This will use the new iterators. It will work as expected for centroid and raw spectra which contain centroid data. For deconvolved spectra, this will probably fall apart because they aren't sorted by m/z, but neutral mass, and even if you sort the points before reprofiling, such a signal is of little use since the charge dimension is lost and cannot be recovered.

If this is a MultiLayerSpectrum, which behavior you get depends upon SpectrumLike::signal_continuity and whether or not MultiLayerSpectrum::deconvolved_peaks is populated or not.

This is probably worth documenting, but I don't think it invalidates the proposed flexibility itself.

Based on your comment I restructured the code a bit to always use the .peaks() method as this is always available and will have the same behaviour as the previous implementation. The documentation was updated to reflect this fact and properly documents the deconvoluted case.

Sidenote: if with the new iterators you mean PeakDataIterDispatch than this is not yet exported to the public API.

Thank you for noting that. PeakDataIterDispatch should be exported. I'll add this to the main branch.

I have yet to find a balance on how much to put into the public API of a Rust library, complexity of the interface vs. ease of extension. The mzdata::spectrum::peaks module is a good example, with seven different types exported in the public API, but only two concepts, "abstraction over peak list types" and "abstraction over iteration over abstracted peak list types", with variations in ownership.

src/spectrum/group.rs

mobiusklein · 2024-08-31T22:33:20Z

The changes as they are look good to me. Please let me know if you have any other changes you want to make before merging.

douweschulte · 2024-09-02T07:54:06Z

Do you think it is needed to change the .peaks() to the new iterators? Otherwise this looks like a good point for merging to me. I was thinking if adding this to some existing (or new) trait is warranted as that would allow for more efficient implementations if that would be necessary. But I do not have a direct use for that and that might be enough reason to not build it for now.

Additionally if there are other parts of the library you would like another pair of eyes on feel free to ask me to take a look.

mobiusklein

I was mistaken on Saturday when I approved the changes. I did not properly recognize that the branching between profile and centroid spectra was gone and that only the RefPeakDataLevel variant was being used.

Using RefPeakDataLevel::iter makes when you have centroid data. It might be simplest to do something like this (not tested):

let peak_data = scan.peaks();
let mode = scan.signal_continuity();

match (mode, peak_data) {
    (_, RefPeakDataLevel::Deconvoluted(_)) | (_, RefPeakDataLevel::Missing) => None,
    (_,  RefPeakDataLevel::Centroid(_)) | (SignalContinuity::Profile, RefPeakDataLevel::RawData(_)) => {
       Some(reprofile(peak_data.iter().map(|p| FittedPeak::from(p.as_centroid())), dx)),
    },
    (_, RefPeakDataLevel::RawData(arrays)) => {
         Some(ArrayPair::from(arrays.mzs().unwarp(), arrays.intensities().unwrap()))
    }
}

There are other things we can do to reduce overhead here, but you're under no obligation to pursue them unless they are of interest to you. The clearest one is to use a single mzsignal::reprofile::PeakSetReprofiler instance for all re-profiling jobs here because reprofiling also creates a costly m/z grid spaced by dx, and the reprofile top-level function just creates a PeakSetReprofiler, reprofiles a single spectrum, then throws it away, while the PeakSetReprofiler can be re-used without needing to copy the m/z grid over-and-over again. Again, this is already complicated and we can leave it for future improvements if they are needed.

src/spectrum/group.rs

mobiusklein · 2024-09-03T02:52:39Z

Thank you for offering to continue working with the library.

Right now, spectrum averaging is on the SpectrumGroupingIterator trait. You could create a SpectrumLike member function that does the same thing as this function as well, but that would purely be for convenience as there isn't an obvious way that doing that would introduce extra information and save work during averaging.

Something that could use another pair of eyes is the peak loading scheme. peaks() loads peaks in a SpectrumLike-implementing type-specific order. MultiLayerSpectrum is currently the only type to have more than one kind of peak list and so I have added try_build_peaks, try_build_centroids, and try_build_deconvoluted_centroids to do the unpacking if the caller expects those to be available. There may be a more streamlined way to do that without just moving those method calls into the spectrum file reader.

Alternatively, please try to use the library in an application and tell me where it fails to do what you expect it to.

douweschulte · 2024-09-03T12:13:38Z

I implemented your feedback. I also tried my hand at the reuse of PeakSetReprofiler, I assumed (but that assumption might be wrong) that the Vec<FittedPeak> will always be sorted. If this proves to be not the case the min_by can be extended to run over the entire vector. The trait I was talking about was mainly me trying to make it as easy as possible to call average now that you need to have a ref iterator it is somewhat cumbersome to call it with an owned iterator. But it would indeed not change all that much so might be not worth the effort.

I will take a look at the peaks() function. Otherwise I am using it in rustyms and the annotator right now so I will keep my eyes out for things that I can help with. One of the things I would like to add (after this PR) is a general API for lazy loading, I saw all your file types support this in one way or another but the current mzdata::io::MZReaderType::open_path loads everything at once.

Allowed average from an iterator and added more documentation

32d73a0

Fixed errors

36c32b2

mobiusklein reviewed Aug 26, 2024

View reviewed changes

Moved to reference iterator and moved to spectrumlike::peaks

a535b7c

mobiusklein approved these changes Aug 31, 2024

View reviewed changes

mobiusklein requested changes Sep 3, 2024

View reviewed changes

src/spectrum/group.rs Outdated Show resolved Hide resolved

src/spectrum/group.rs Outdated Show resolved Hide resolved

douweschulte added 2 commits September 3, 2024 13:51

Implemented first tips

abb3017

Create one instance of PeakSetReprofiler

c303438

mobiusklein approved these changes Sep 5, 2024

View reviewed changes

mobiusklein added 2 commits September 5, 2024 18:00

Merge branch 'main' into generic_average

0c58088

revert default features

e937b8a

mobiusklein changed the title ~~Allowed average from an iterator and added more documentation~~ feature: change average_spectra to take an iterator over SpectrumLike Sep 6, 2024

mobiusklein merged commit 0e0582e into mobiusklein:main Sep 6, 2024
2 checks passed

douweschulte deleted the generic_average branch September 9, 2024 07:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature: change `average_spectra` to take an iterator over `SpectrumLike` #6

feature: change `average_spectra` to take an iterator over `SpectrumLike` #6

douweschulte commented Aug 26, 2024

mobiusklein commented Aug 26, 2024

douweschulte commented Aug 26, 2024

mobiusklein left a comment

mobiusklein Aug 26, 2024

douweschulte Aug 27, 2024

douweschulte Aug 27, 2024

mobiusklein Aug 27, 2024

mobiusklein commented Aug 31, 2024

douweschulte commented Sep 2, 2024

mobiusklein left a comment

mobiusklein commented Sep 3, 2024 •

edited

Loading

douweschulte commented Sep 3, 2024

feature: change average_spectra to take an iterator over SpectrumLike #6

feature: change average_spectra to take an iterator over SpectrumLike #6

Conversation

douweschulte commented Aug 26, 2024

mobiusklein commented Aug 26, 2024

douweschulte commented Aug 26, 2024

mobiusklein left a comment

Choose a reason for hiding this comment

mobiusklein Aug 26, 2024

Choose a reason for hiding this comment

douweschulte Aug 27, 2024

Choose a reason for hiding this comment

douweschulte Aug 27, 2024

Choose a reason for hiding this comment

mobiusklein Aug 27, 2024

Choose a reason for hiding this comment

mobiusklein commented Aug 31, 2024

douweschulte commented Sep 2, 2024

mobiusklein left a comment

Choose a reason for hiding this comment

mobiusklein commented Sep 3, 2024 • edited Loading

douweschulte commented Sep 3, 2024

feature: change `average_spectra` to take an iterator over `SpectrumLike` #6

feature: change `average_spectra` to take an iterator over `SpectrumLike` #6

mobiusklein commented Sep 3, 2024 •

edited

Loading