Welcome! Mass Spectrometry meets ChemInformatics
WCMC Metabolomics Course 2013
Tobias Kind
Course 1: General Introduction
http://fiehnlab.ucdavis.edu/staff/kind
CC-BY License

What is ChemInformatics?
Chemometrics: est. 1975
Cheminformatics: est. 1998

Who uses Cheminformatics?
All parts of chemistry depend heavily on cheminformatics.
Life sciences, biochemistry and drug industries use cheminformatics.
20 years ago: 80% in lab – 20% in front of computer
Now: 20% in lab – 70% in front of computer (*)
PhD (*) 10% fixing and installing new programs
Examples:
Organic chemistry – automated reaction planning, Beilstein search
Physical chemistry – modeling of structure properties (boiling points)
Inorganic chemistry – ligand bond interactions
Analytical chemistry – structure elucidation of small compounds
Biochemistry – protein/small molecule interaction networks

Motivation for Mass Spectrometry meets ChemInformatics
To be a master of spectra you need to be a master of structures in the first place.
Complex MS data interpretations are only possible with software.
MS data are obtained by hyphenated techniques (GC-MS, LC-MS).
Mass spectral database search and structure search are used routinely.
Mass spectrometers deliver multidimensional data.

Computer Illiteracy – a threat to your research
Your computer is your friend.
You don't have a computer? You don't have a friend (just kidding).
Assume you have a computer: please step forward and name: CPU, speed, memory, hard disk, OS
You are a chemist, biochemist, biologist: please step forward and name: computer language or DB you know
OS = operating system; DB = database; CPU = central processing unit
Picture: PDP-11 (www.bell-labs.com)

Fighting Computer Illiteracy – name your PC
CPU: INTEL, AMD, IBM, HP; Pentium, Opteron, Xeon; 12-20 cores
Memory: DDR, DDR2; GEIL, KINGSTON; 16-128 GByte
Hard disk: SEAGATE, WD; Raptor, Barracuda, Cheetah; 100-1000 GByte
OS: MICROSOFT, LINUX; Windows, Linux, OSX, Virtual OS
Language: C, Basic, Perl, JAVA
Bit < Byte < kByte < MByte < GByte < TByte
Single Core < Dual Core < QuadCore < MultiCore
MFLOP/s < GFLOP/s < TFLOP/s < PFLOP/s
1 Thread < Dual Thread < MultiThreaded
Picture: Cray 2 in red, Nixdorf museum, 2004

The free lunch is over – multithreading needed
Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm
Can your metabolomics software use multiple CPUs? (NO / YES)
Course example: MZMINE alignment (7 files, 18 min LC-MS)
Single core vs. multi-core; timings shown on the slide: 50 seconds and 3:29 minutes
Mors certa, hora incerta! (Death is certain, its hour is uncertain!)

Best recommendation ever for slow computers: install an SSD!
Compared: single hard disk (Seagate 750 GB); SSD RAID10 (Samsung 830, 2 TB); RAMDISK (OSFMount)
SSDs and RAM disks have 200 to 1000-fold (!) higher 4k speed. 4k speed matters.
Computer Illiteracy – learn a programming language
Why should you?
20% lab time – 80% computer time
Mass spectrometers deliver data – not results
Why shouldn't you? (fake reasons)
You are too old to learn
You are not good with computers
You have more important research to do
You are so rich you have programmers who work for you
Picture source: WIKI, James Manners from Genova, Italia

Computer Illiteracy – learn a programming language
Learn any language which has a large code and user base (JAVA, Perl, Visual Basic).
Use IDEs with automatic code completion like MS Visual Express or Eclipse.
Don't re-invent code: use (and document) code search engines like http://code.ohloh.net/ (formerly koders, now ohloh); google.com/codesearch; http://krugle.org

Language "cow":
moOMoOMoOMoOMoOmoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoO
MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMommMoOMoOMoOMoOMoO
MoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO
MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO
Language "brainfuck":
>>++++++++[<++++>-]
>++++++++++++++[<+++++++>-]
+>+++++++++++[<++++++++++>-]
++>+++++++++++++++++++[<++++++>-]
++>+++++++++++++++++++[<++++++>-]
>++++++++++++[<+++++++++>-]
Do not learn these working but esoteric languages.
There are 1123 programming languages: http://99-bottles-of-beer.net/

Program development – Eclipse for JAVA example
Screenshot panels: Projects; JAVA or C code; Text output

Your computer Illiteracy – your emergency helpers
Regular expressions, SQL database requests, EXCEL VBA scripts or Perl scripts are special tools for data handling (Swiss army knives).

Regular expressions (RegEx) are used for finding and replacing text
[0-9] – represents any digit
[a-z] – represents any lowercase letter
\n – represents new line (CR/LF)
\t – represents TAB
Examples:
\n\n – find double empty lines
find \t, replace with spaces " "
find two numbers in brackets: ([0-9][0-9])
Learn about RegEx

SQL is used for programming databases
Large database table:
yr    subject    winner
1901  Chemistry  Jacobus H. van 't Hoff
1902  Chemistry  Emil Fischer
1903  Chemistry  Svante Arrhenius
1904  Chemistry  Sir William Ramsay
1905  Chemistry  Adolf von Baeyer
1906  Chemistry  Henri Moissan
1907  Chemistry  Eduard Buchner
1908  Chemistry  Ernest Rutherford
1909  Chemistry  Wilhelm Ostwald
1910  Chemistry  Otto Wallach
SQL query:
SELECT yr, subject, winner FROM nobel WHERE yr = 1909 AND subject = 'Chemistry'
Result:
yr    subject    winner
1909  Chemistry  Wilhelm Ostwald
Visit the SQL Zoo

Regular Expressions – example MS data
Task: create a list of 4 columns with names, formulas, CAS numbers and peaks
Problem: 24,000 lines of mass spectral data (.msp)
Program: Textpad (WIN), Smultron (Mac)
Screenshot notes: number of lines in text (m/z – intensity pairs); Enter (CR/LF) shown in gray

Regular Expressions – example MS data
Solution: replace Enter (\n) with TAB (\t) and use Replace ALL
Result: metadata in one line
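The SQL query on the Nobel table earlier in this section can be tried without installing a database server. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout (yr, subject, winner) and the rows are copied from the slide, while the function name query_nobel and the choice of SQLite are assumptions made for illustration and are not part of the course material.

```python
import sqlite3

# Rows copied from the Nobel table on the slide (chemistry, 1901-1910).
LAUREATES = [
    (1901, "Chemistry", "Jacobus H. van 't Hoff"),
    (1902, "Chemistry", "Emil Fischer"),
    (1903, "Chemistry", "Svante Arrhenius"),
    (1904, "Chemistry", "Sir William Ramsay"),
    (1905, "Chemistry", "Adolf von Baeyer"),
    (1906, "Chemistry", "Henri Moissan"),
    (1907, "Chemistry", "Eduard Buchner"),
    (1908, "Chemistry", "Ernest Rutherford"),
    (1909, "Chemistry", "Wilhelm Ostwald"),
    (1910, "Chemistry", "Otto Wallach"),
]


def query_nobel(year, subject):
    """Return the (yr, subject, winner) rows for one year and subject.

    Hypothetical helper: builds a throwaway in-memory table on each call.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE nobel (yr INTEGER, subject TEXT, winner TEXT)"
        )
        conn.executemany("INSERT INTO nobel VALUES (?, ?, ?)", LAUREATES)
        # Same query as on the slide, with placeholders instead of literals.
        cursor = conn.execute(
            "SELECT yr, subject, winner FROM nobel "
            "WHERE yr = ? AND subject = ?",
            (year, subject),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in query_nobel(1909, "Chemistry"):
        print(*row, sep="\t")
```

SQLite compares text case-sensitively by default, so the subject has to be spelled exactly as it is stored in the table; other database systems may behave differently.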

Regular Expressions – example MS data
Solution: copy only lines of interest (Mark ALL – Copy Bookmarked Lines)

Regular Expressions – result for MS data
Solution: replace redundant code with nothing, copy tab-separated file to EXCEL
Result: 1:30 min for the RegEx job (1 hour manually)
Average spectrum size: 70 peaks
Minimum size: 5 peaks
Maximum size: 439 peaks
Most spectra have between 35 and 45 peaks

Be prepared – visualize your structures
Try Marvin Space via Webstart
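The regular-expression job on the .msp data described above (newline to TAB, keep the lines of interest, strip the redundant labels) can also be scripted instead of being done in a text editor. The sketch below uses Python's re module. The two-record sample, its field labels (Name, Formula, CAS#, Num Peaks) and its peak intensities are made up for illustration; real .msp exports may use different labels. The function name msp_to_rows is hypothetical.

```python
import re
import statistics

# Made-up sample in an .msp-like layout; labels and intensities are assumptions.
SAMPLE_MSP = """\
Name: Benzene
Formula: C6H6
CAS#: 71-43-2
Num Peaks: 3
39 130; 51 180; 78 999;

Name: Toluene
Formula: C7H8
CAS#: 108-88-3
Num Peaks: 2
91 999; 92 600;
"""

# The four columns asked for in the task: name, formula, CAS number, peaks.
FIELDS = ("Name", "Formula", "CAS#", "Num Peaks")


def msp_to_rows(text):
    """Turn .msp-like text into rows of [name, formula, CAS, number of peaks]."""
    rows = []
    # Records are separated by an empty line (the "\n\n" pattern).
    for record in re.split(r"\n\s*\n", text.strip()):
        # Step 1: replace newline with TAB so the record sits on one line.
        flat = re.sub(r"\n", "\t", record)
        row = []
        for field in FIELDS:
            # Steps 2 and 3: keep only the wanted field and drop its label.
            match = re.search(re.escape(field) + r":[ ]*([^\t]*)", flat)
            row.append(match.group(1).strip() if match else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    table = msp_to_rows(SAMPLE_MSP)
    for row in table:
        print("\t".join(row))  # tab-separated, ready to paste into EXCEL
    counts = [int(row[3]) for row in table if row[3].isdigit()]
    print("min:", min(counts), "max:", max(counts),
          "mean:", statistics.mean(counts))
```

The tab-separated output can be pasted into a spreadsheet, and the last lines compute the same kind of summary (minimum, maximum, average spectrum size) that the slide reports for the full data set.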

Calculation of tetrahedral and double bond stereoisomers
How many stereoisomers can you expect from glucose (KEGG)?
Example: separation of species with ion mobility MS (FAIMS)
Glucose: example calculated with MarvinView (via JAVA Webstart)

Computation of resonance forms (electron shifts)
What are possible resonant structures?
Important for mass spectral interpretation (electron impact, electrospray)
Phenol: example calculated with MarvinView (start via WebStart)

Generation of tautomers using MSketch
How many tautomers can you expect?
Important for mass spectral interpretations and LC-MS.
Methyl acetate: example calculated with MarvinView (start via WebStart)
Derivatization in GC-MS and LC-MS solves the tautomer problem
Common tautomerisms: Enol/Keto, Lactams, Amines/Imines, Amides/Imides
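For the stereoisomer question earlier in this section, a quick estimate is possible before opening any structure tool: a molecule with n tetrahedral stereocentres and m stereogenic double bonds has at most 2^(n+m) stereoisomers. The sketch below only evaluates this upper bound; it is not what MarvinView computes internally, and the stereocentre counts used for glucose are assumptions stated in the comments, not values taken from the slides.

```python
def max_stereoisomers(tetrahedral_centers, stereo_double_bonds=0):
    """Upper bound on the number of stereoisomers: 2 ** (n + m).

    Symmetry (for example meso forms) can make the real number smaller.
    """
    if tetrahedral_centers < 0 or stereo_double_bonds < 0:
        raise ValueError("counts must be non-negative")
    return 2 ** (tetrahedral_centers + stereo_double_bonds)


if __name__ == "__main__":
    # Assumed stereocentre counts for glucose (not from the slides):
    # 4 in the open-chain form, 5 in the ring form (extra anomeric carbon).
    print("open-chain form:", max_stereoisomers(4))
    print("ring form:", max_stereoisomers(5))
```

An enumeration tool remains the better choice when symmetry or ring constraints matter, because the formula cannot detect identical or impossible configurations.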

Property calculations on chemicalize.org

Mass spectral database search – know what exists
How many mass spectra with formula C11H8O3 are in the NIST DB?
Result: 19 for C11H8O3 in NIST05 DB
Download NIST-MS-Search

Mass spectral interpretation
Assign structural elements to mass spectral peaks
Download Mass Spectrum Interpreter Version 2

Mass Spec Scissors (ACDLabs Free)
Q: What is peak m/z 281 in negative mode?
http://www.hmdb.ca/metabolites/HMDB09837

Molecular Weight Calculator
Calculate isotopic masses
Find formulas from masses
Calculate isotopic patterns
Download MWTWIN

Structure search – know what could be possible
How many compounds (isomer structures) are found in public databases?
Result: 272 for C11H8O3
http://www.chemspider.com/
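The first task listed for the Molecular Weight Calculator above, calculating isotopic masses, is simple enough to script for a quick check. The sketch below computes the monoisotopic mass of a plain formula such as C11H8O3, the formula used in the database searches. The element table holds rounded masses of the most abundant isotope for a handful of elements only, and the function name monoisotopic_mass is hypothetical; this is an illustration, not a replacement for MWTWIN.

```python
import re

# Rounded masses of the most abundant isotope (in u); small illustrative table.
MONOISOTOPIC = {
    "C": 12.00000000,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula like 'C11H8O3'.

    Brackets, charges and isotope labels are not supported.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?[0-9]*)+", formula):
        raise ValueError("cannot parse formula: " + repr(formula))
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)([0-9]*)", formula):
        if symbol not in MONOISOTOPIC:
            raise ValueError("element not in table: " + symbol)
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    print("C11H8O3: %.4f" % monoisotopic_mass("C11H8O3"))
```

The value is the neutral monoisotopic mass; an observed m/z additionally depends on the adduct and charge state, which this sketch does not model.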

Stay tuned – new mass spectrometry publications via Yahoo Pipes

Be open minded – NMR can do some things better
Chenomx Profiler – with 312 pH and frequency tuned reference spectra
2D-NMR is needed for de-novo structure elucidation
NMR metabolic profiling is highly reproducible with low variance
NMR prediction with ChemAxon MSketch

The Last Page – What is important to remember:
Learn about CPU type, memory, hard disks, bits and bytes
  shock your colleagues with random questions about their computer
Think about automation, things you would like to do (even if you can't)
  shock your colleagues with a small computer script
Use regular expressions for stupid or boring jobs
  you delete/replace data more than 3x – remember RegEx, RegEx, RegEx
Use scripting languages for small problems (EXCEL VBA, PERL)
  steal some small examples and color your EXCEL data in rainbow color
Generate yourself a collection of programs and databases for MS
  try such programs in a Virtual Machine without messing up your system

