Below are some of the frequently asked questions we collected after completing the first WeChat Open Course, Clinical Metabolomics Research Strategy, shared here for everyone to learn from and refer to.
Q: How do I set up the validation set and the test set?
A: As I understand it, you are asking how to set up the training set and the test set. Beginners coming from analytical chemistry, biochemistry, or molecular biology often confuse these dataset concepts, and the confusion often stems from ambiguity in the Chinese translations of the terms.
In machine learning, data are usually divided into three sets: Training Set, Validation Set, and Test Set. B. D. Ripley defines them as follows in his book "Pattern Recognition and Neural Networks" (Cambridge University Press, 1996, ISBN 0-521-46086-7):
Training Set: A set of examples used for learning, that is, to fit the parameters [i.e., weights] of the classifier. In practice: training the model and fitting its parameters.
Validation Set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network. In practice: optimizing and fixing the choice of model or parameters.
Test Set: A set of examples used only to assess the performance [generalization] of a fully specified classifier. In practice: purely testing the predictive ability of the established model.
For cohort studies of large populations (a relatively large sample size, for example more than 100), I suggest the following ideal split, in percent of samples:
Data set | Recommendation 1 (%) | Recommendation 2 (%) |
Training Set | ≥50 | 60 |
Validation Set | 25 | 20 |
Test Set | 25 | 20 |
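The split in the table can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in the course: the function name `split_dataset`, the sample IDs, and the default 60/20/20 fractions are assumptions chosen to match Recommendation 2, and the split is purely random (it does not stratify by case/control group).

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into training, validation and test sets."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_frac))
    n_val = int(round(len(shuffled) * val_frac))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Everything that is left over becomes the held-out test set.
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical cohort of 120 sample IDs (illustrative only).
sample_ids = [f"S{i:03d}" for i in range(1, 121)]
train, val, test = split_dataset(sample_ids)
print(len(train), len(val), len(test))  # 72 24 24
```

Passing `val_frac=0.0` gives a two-way split in which the first returned list serves as the combined training and validation data.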
In reality, studies are usually limited by sample size, so the split becomes:
Data set | Recommendation 1 (%) | Recommendation 2 (%) | Recommendation 3 (%) |
Training Set + Validation Set | 60 | 70 | 80 |
Test Set | 40 | 30 | 20 |
In metabolomics studies with very small sample sizes, such as cell experiments and animal experiments, less rigorous practices are also seen (and are often accepted):
Data set | Samples used | Validation method |
Training Set + Validation Set | All | LOOCV (leave-one-out CV); K-fold CV (SIMCA: 1/7-fold CV); Bootstrap |
Test Set | N/A | |
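To make the resampling schemes named in the table concrete, here is a small sketch of how K-fold CV, LOOCV, and bootstrap splits can be generated. It only produces index lists; fitting the model on each split is left out. The function names are hypothetical, and the choice of 7 folds in the example simply mirrors the SIMCA setting mentioned above.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal samples out round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield sorted(train_idx), sorted(val_idx)

def loocv_indices(n_samples):
    """Leave-one-out CV is k-fold CV with k equal to the number of samples."""
    return k_fold_indices(n_samples, k=n_samples)

def bootstrap_indices(n_samples, seed=0):
    """Draw n_samples indices with replacement; unused ones form the out-of-bag set."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

# Example: 12 samples (e.g. a small cell experiment) with 7-fold CV.
splits = list(k_fold_indices(12, k=7))
for train_idx, val_idx in splits:
    print(len(train_idx), "train /", len(val_idx), "validation")
```

In every scheme each sample is used for both fitting and validation at some point, which is why no independent test set remains and why the resulting performance estimates tend to be optimistic.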
Q: How should cell samples be collected? Should we use trypsin without EDTA, or a scraper? Which is better?
A: For adherent cells, we found that collecting with a scraper yields a larger number of detected metabolites, but the reproducibility depends on the cell type and on the skill of the experimenter. Therefore, our laboratory usually uses enzymatic digestion when collecting cell samples on a large scale.
Q: Is the sample tested plasma? Plasma contains many small molecules and macromolecules. Will macromolecules interfere with the detection of small molecules?
A: Hello. Metabolomics can use plasma, serum, dried blood spots (DBS), and other samples. Before analysis, a high proportion of organic solvent must be used for protein precipitation and metabolite extraction. Macromolecules such as small peptides and proteins are chemically denatured and then removed by centrifugation or filtration, which avoids interference from macromolecules with endogenous small-molecule metabolites. The protein precipitation and metabolite extraction methods need to be optimized and validated; otherwise the extraction efficiency of small molecules will be greatly affected.
Q: Has Meite mapping done any research on exhaled-breath metabolomics?
A: Yes. This type of study is relatively mature for chronic obstructive pulmonary disease (COPD). We have also done some research on diseases such as lung cancer and gastric cancer.
Q: Can you tell us about combining differential metabolites?
A: Choosing a combination of differential metabolites requires skill. It is not a purely mathematical combination of the top-ranked metabolites with p<0.05; that is what modeling specialists who do not understand biology and medicine do. An optimized combination must come from both the statistical model and the metabolic pathways, followed by re-optimization. It is a study of molecular biological mechanisms driven by metabolomics.
Q: Is the standard serum also applicable in LC-MS-based metabolomics studies?
A: NIST SRM 1950 is supplied by the U.S. National Institute of Standards and Technology (NIST). We use this specimen to correct data collected in different countries, different laboratories, on different instruments, and at different times, so that the data generated at our sites in two countries, China and the United States (Shanghai, Hangzhou, North Carolina, Hawaii), can be integrated. Our quantitative metabolomics platform uses this sample as an independent external quality control. The specimen is therefore applicable on both GC-MS and LC-MS platforms, and it is the only opportunity to achieve global unification of metabolomics data in the future.
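As a rough illustration of how a shared reference specimen can bring data from different laboratories onto a common scale, the sketch below rescales each metabolite by the ratio of an agreed target value to the value that the laboratory measured in the reference specimen. This single-point scaling is an assumption made for illustration and is not necessarily the correction the answer refers to; the function name, metabolite names, and all numbers are hypothetical.

```python
def correct_batch(batch, reference_measured, reference_target):
    """Rescale each metabolite so the batch's reference sample matches the target."""
    corrected = []
    for sample in batch:
        corrected.append({
            # Factor > 1 if this lab reads the reference low, < 1 if it reads high.
            name: value * reference_target[name] / reference_measured[name]
            for name, value in sample.items()
        })
    return corrected

# Hypothetical target concentrations for the reference specimen (illustrative).
target = {"glucose": 5.0, "lactate": 2.0}
# Hypothetical values that one laboratory measured in the same specimen.
lab_a_reference = {"glucose": 4.0, "lactate": 2.5}
# Hypothetical study samples measured in that laboratory's batch.
lab_a_samples = [{"glucose": 4.4, "lactate": 3.0}]

corrected_a = correct_batch(lab_a_samples, lab_a_reference, target)
print(corrected_a)
```

Applying the same function to every laboratory's batch, each with its own reference measurement, puts all batches on the scale defined by the target values.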