tag: proteomics
11 posts
Writing about proteomics from July 2019 to May 2026.
-
A note on making HarmonizePy public, and why we wanted a Python version of HarmonizR's missing-value-aware batch correction.
-
The story of a small ProteomeXchange metadata tool that started as a fragile Selenium parser and became a PyPI package the lab can actually depend on.
-
Release note for QuEStVar v0.1.0, a Python package for paired equivalence and difference testing, now stable on PyPI.
-
Early prototype of a domain-specific compression codec for mass spectrometry data. The lossless path beats gzip and comes close to mzMLb. The lossy path at q=4096 hits 13.17 MiB from 75.55 MiB of mzML. Decode is 27x faster than mzMLb. All numbers come from a single DDA dataset. Early days.
-
Proteomics data is growing faster than storage budgets. AI demand is driving up hardware prices. The format debate is a distraction. Compression tools exist but nobody uses them. The real problem is that storage is an unaccounted line item, and the field optimizes for generation, not retention.
-
Why vendor mass spectrometry formats bother me, why a small native bridge might matter, and why Zig feels like the right place to test the idea.
-
The search engine space is crowded, fast-moving, and genuinely competitive. DIA-NN, Spectronaut, FragPipe, MaxQuant, and Sage each carved a niche. A Zig-based search engine is not the right next step. The gap between engines might be.
-
Thermo locks its format behind Windows DLLs. Bruker does better. Spectronaut adds its own proprietary layer on top. Open formats are inevitable, but the real bottleneck is proprietary converters in the middle.
-
The story behind ProteoForge, a framework for finding differential proteoforms in bottom-up proteomics data. Why protein-level averages hide biology, how missing data broke every tool we tried, and what we found when we applied it to a hypoxia dataset.
-
reading
proDA
Probabilistic dropout analysis for label-free proteomics. Handles missing values by modeling dropout curves instead of imputing.
-
reading
Prosit
Deep learning for MS2 spectrum and retention time prediction. Accurate enough for rescoring and library generation.