Data Preprocessing Facilitates Metabolic Pathway Identification from Time Profiles

Eberhard O. Voit1, Jonas S. Almeida2
1VoitEO@MUSC.edu, Medical University of S. Carolina; 2almeidaj@musc.edu, Medical University of S. Carolina

Modern molecular biology is generating data of unprecedented quantity and quality. Particularly exciting for biochemical pathway modeling and proteomics are comprehensive, time-dense profiles of metabolites and proteins that are measurable with mass spectrometry and nuclear magnetic resonance. These profiles contain a wealth of information about the structure and dynamics of the pathway or network from which the data were obtained. Retrieving this information requires a combination of computational methods and mathematical models, which are typically represented as systems of ordinary differential equations. We show that substituting estimated slopes for the differentials in nonlinear network models reduces the coupled set of differential equations to several sets of decoupled algebraic equations, which can be processed in parallel or sequentially. The slopes for each time series of the metabolic or proteomic profile are estimated with a "universal function" that is computed directly from the data by cross-validated training of an artificial neural network (ANN). Without preprocessing, the inverse problem of determining structure from metabolic or proteomic profile data is extremely challenging and computationally expensive. The combination of system decoupling and data fitting with universal functions substantially simplifies this inverse problem. Examples illustrate both successful estimations and limitations of the method.

Availability: A preliminary web-based application for ANN fitting is accessible at http://bioinformatics.musc.edu/webmetabol/. S-systems can be interactively analyzed with the user-friendly freeware PLAS© (http://correio.cc.fc.ul.pt/~aenf/plas.html) or with a MATLAB module that is currently being beta-tested in our laboratory.
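The decoupling idea can be illustrated with a minimal sketch, which is not the implementation described here. It assumes a hypothetical two-variable S-system whose kinetic orders are all fixed at 0.5, so that only the rate constants are unknown and each equation becomes linear in its parameters. Central differences stand in for the ANN-based slope estimation, and the data are noise-free and simulated. All names (`TRUE`, `rhs`, `simulate`, `central_slopes`, `fit_two_terms`) are illustrative.

```python
import math

# Hypothetical toy S-system (assumption): all kinetic orders fixed at 0.5.
#   dX1/dt = a1            - b1 * X1**0.5
#   dX2/dt = a2 * X1**0.5  - b2 * X2**0.5
TRUE = {"a1": 2.0, "b1": 1.0, "a2": 1.0, "b2": 1.5}


def rhs(x):
    """Right-hand side of the toy system at state x = (X1, X2)."""
    x1, x2 = x
    return (TRUE["a1"] - TRUE["b1"] * math.sqrt(x1),
            TRUE["a2"] * math.sqrt(x1) - TRUE["b2"] * math.sqrt(x2))


def shifted(x, k, h):
    """Return x + h * k, component by component."""
    return tuple(xi + h * ki for xi, ki in zip(x, k))


def simulate(x0, dt, n):
    """Generate a noise-free time profile with classical RK4 steps."""
    xs = [tuple(x0)]
    for _ in range(n):
        x = xs[-1]
        k1 = rhs(x)
        k2 = rhs(shifted(x, k1, 0.5 * dt))
        k3 = rhs(shifted(x, k2, 0.5 * dt))
        k4 = rhs(shifted(x, k3, dt))
        xs.append(tuple(xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
                        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)))
    return xs


def central_slopes(series, dt):
    """Slopes at interior points; a simple stand-in for the ANN fit."""
    return [(series[k + 1] - series[k - 1]) / (2.0 * dt)
            for k in range(1, len(series) - 1)]


def fit_two_terms(slopes, u, v):
    """Least squares for slope = p*u - q*v via the 2x2 normal equations."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    sus = sum(a * s for a, s in zip(u, slopes))
    svs = sum(b * s for b, s in zip(v, slopes))
    det = suu * svv - suv * suv
    p = (sus * svv - svs * suv) / det
    q = -(svs * suu - sus * suv) / det
    return p, q


dt, n = 0.02, 500
profile = simulate((0.5, 1.0), dt, n)
x1 = [state[0] for state in profile]
x2 = [state[1] for state in profile]

# Regressors evaluated at the interior time points where slopes exist.
r1 = [math.sqrt(v) for v in x1[1:-1]]
r2 = [math.sqrt(v) for v in x2[1:-1]]
ones = [1.0] * len(r1)

# Each equation is now an independent algebraic regression problem.
a1, b1 = fit_two_terms(central_slopes(x1, dt), ones, r1)
a2, b2 = fit_two_terms(central_slopes(x2, dt), r1, r2)
estimates = {"a1": a1, "b1": b1, "a2": a2, "b2": b2}
print(estimates)
```

Because the two fits share no parameters, they could equally be run in parallel. With unknown kinetic orders, each decoupled equation would instead require a nonlinear regression, and noisy data would require smoothing before slopes are taken.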