Install and startup
Windows computers
Mac computers (OS X)
Linux computers
Starting Chrombox D from the Matlab desktop (on all systems)
Changing settings
Updating
Tutorial 1. Least squares spectral resolution (LSSR)
1.1. Starting the program
1.2. Loading the data
1.3. Basic viewing options
1.4. Generating a library of compounds
1.5. Generating a mass spectral library
1.6. Quantification by LSSR
1.7. Chromatographic resolution
1.8. Other samples
Tutorial 2. LSSR on high resolution direct infusion data
2.1. Importing the data
2.2. Setting appropriate conditions for filtering and binning
2.3. Creating the library
2.4. Quantifying the data
2.5. Using the library as a filter
Tutorial 3. The Lipid Generator
3.1. Basics
3.2. Codes
3.3. Main functions of the generator
3.4. Generation of compounds and filtering
3.5. Editing lists for each class
3.6. User defined fatty acid lists
3.7. Saving and loading settings and compound lists
Tutorial 4. The profile analyzer
4.1. Initialization
4.2. Description of functions
4.3. Using the functions
4.4. Generating a centroided spectrum and resolving it by LSSR
4.5. Further description of the functions in the Analyze profile window
Tutorial 5. Least squares resolution of profiles (LSSR-P)
5.1. Principle
5.2. Low resolution data
5.3. High resolution data
Tutorial 6. Accurate mass determination
6.1. Low resolution data
6.2. High resolution data
Appendix 1. List of lipid classes with examples of naming
The following text styling is applied in this document. Commands, paths or filenames are denoted by:
path\filename.ext. Buttons in the graphical user interface are shown as
[Button]. Keys on the keyboard are denoted by
[Key]. A parameter to be set is denoted by
parameter, and a value of a parameter or an option in a menu is denoted by
- On Windows computers with Matlab you can usually just move the folder
DD to your preferred destination and start the program by double-clicking on the
- Follow the instructions below if you install on a Mac or Linux computer, or if you install the program on a network drive.
DD to the preferred destination, e.g.
C:\CHROMBOX\. This will be the D-root folder.
"Chrombox D.exe" file in the D-root folder.
If installed on a network disk you may have to use one of the methods described below:
dstart.m in the folder
…\dd\various and move it to somewhere in your Matlab path. This is the only file that needs to be in the Matlab path. Possible destinations may be found by starting Matlab and typing
dstart.m and edit the last line after the
run command so that it points to the file
dd_startscript (see example below).
dstart in the Matlab command window.
An example of
dstart.m is shown below:
You can also create a desktop shortcut by copying the shortcut to Matlab and adding the following to the destination:
/automation /r dstart
An example of how it can look is shown below:
C:\MATLAB6p5\bin\win32\matlab.exe /automation /r dstart
Download the installation and unzip the archive
Move the folder
DD to the preferred destination, for example
/Users/yourname/Documents/CHROMBOX/DD. This will be the D-root folder.
The shell script
macstart_d.command stored in the D-root folder can be used to start the program if the file is executable and Matlab can be started with the terminal command
./matlab. Note that the extension
.command may be hidden in Finder.
To check if Matlab can be executed by
./matlab, open the terminal and type
./matlab. If Matlab does not start you can do the following:
sudo ln -s /Applications/MATLAB_RXXXXx.app/bin/matlab /usr/local/bin where
RXXXXx should be replaced by the Matlab version number, for example "R2017a". Alternatively, open
Applications in Finder. Locate Matlab, right-click and select
Show Package Contents. Open the folder
bin and locate the application file
matlab. In the terminal, type
sudo ln -s without pressing enter. Thereafter drag the
matlab application file to the terminal. Ensure there is a space between "-s" and
"/Applications" and press enter.
To make macstart_d.command executable, do the following:
Open the terminal. Use
cd to change directory to the D root where the
macstart_d.command is located or open the terminal at the D root folder if that is an option. Type
chmod +x macstart_d.command. Alternatively, type
chmod +x without pressing enter and drag the
macstart_d.command file from Finder to the terminal. Ensure there is a space between "+x" and
"macstart_d.command" and press enter.
Thereafter double-click on
macstart_d.command in Finder to start the program. Depending on your security settings you may get the following message: "macstart_d.command can’t be opened because it is from an unidentified developer". To solve this, open System Preferences – Security and Privacy – General and press
[Open anyway] next to the message regarding the file. An alternative way of allowing the file to be executed is to open the file in TextEdit and save it again. It will then no longer be marked as downloaded from the Internet.
As an alternative to the above procedure, Chrombox D can be started by the following method:
dstart.m in the folder
…/dd/various and move it to somewhere in your Matlab path. Possible destinations may be found by starting Matlab and typing
dstart.m and edit the last line after the
run command so that it points to the file
dd_startscript (see example below).
dstart in the Matlab command window.
An example of
dstart.m is shown below:
Download the installation and unzip the archive
DD to the preferred destination, for example
/home/yourname/CHROMBOX/DD. This will be the D-root folder.
The shell script
linstart_d.sh stored in the D-root folder can be used to start the program, if the file is executable and Matlab can be started with the terminal command
On Ubuntu you can use the following procedure to make
Allow executing file as program.
It should now be possible to start Chrombox D by double-clicking on
linstart_d.sh and selecting the option
run in terminal. If you don’t get the
run in terminal option when double-clicking the file, you will have to edit the preferences in the file manager. Choose
Edit in the menu for Files, thereafter
Preferences and select the
Behaviour tab. Select
Ask each time as the option for executable text files.
There is also a file
linstart_d_term.sh in the D-root folder. The difference between linstart_d and
linstart_d_term is that linstart_d runs the application disconnected from the terminal while
linstart_d_term runs in the terminal. Chrombox D will continue to run if you close the terminal if it was initiated by
linstart_d, while it will close together with the terminal if it was initiated by linstart_d_term.
As an alternative to the above procedure you can also start Chrombox D by
dstart.m as described for Mac computers above.
On all operating systems you can use the following procedure to start Chrombox D.
Start Matlab in the regular way, so that the Matlab desktop is opened.
Change the current working directory of Matlab to the D-root folder, either in the line showing the working directory or by browsing in the panel on the left side of the Matlab desktop.
You can now start Chrombox D by one of the following methods:
dd_startscript.m in the panel showing the contents of the working directory, right-click and select
run dd_startscript in the Matlab command window.
In a minimized Matlab session (running in terminal without Matlab desktop) you can use the
cd command to set the working directory and
run dd_startscript to start the program.
dd_localsettings file in the D-root folder.
dd_localsettings (.sdv or .csv) in an editor such as Notepad and edit the paths for raw data, etc., if necessary.
windowpos is the position of the window in fractions of the screen size. The first two numbers in the vector are the position of the lower left corner. As specified above, the lower left corner is 10% from the bottom of the screen and 10% from the left. The height and width are 75% of the screen size. Ensure that the sums of numbers 1 and 3 and numbers 2 and 4 are less than 1.
If defaultfolders is set to 1, the program will use the standard setup for subfolders and it is not necessary to edit the paths even if they are not correct. If the parameter is set to 0, you will have to specify the location of each path for data and methods. Data can be read from folders other than the ones specified. Folders can also be changed by using the
[Settings] option within the program.
version refers to the current version of the code. The parameter can also be updated from within the program.
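The rule for windowpos can be checked with a few lines of code. The sketch below is only an illustration and not part of Chrombox D: the function name is hypothetical, and the element order [left, bottom, width, height] is an assumption based on the description of numbers 1 to 4.

```python
def check_windowpos(windowpos):
    """Return True if the window described by windowpos fits on the screen.

    windowpos is assumed to be [left, bottom, width, height], with each
    value given as a fraction of the screen size.
    """
    left, bottom, width, height = windowpos
    if min(windowpos) < 0:
        return False
    # Numbers 1 and 3 (left + width) and numbers 2 and 4 (bottom + height)
    # must each sum to less than 1.
    return left + width < 1 and bottom + height < 1


# The values described in the text: 10% from each edge, 75% of the screen size.
print(check_windowpos([0.1, 0.1, 0.75, 0.75]))
# A window that would extend past the right edge of the screen.
print(check_windowpos([0.3, 0.1, 0.75, 0.75]))
```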
D-12-06 should be placed in the folder
code in the D root folder.
dd_localsettings.sdv (may also have .csv extension) that is found in the D root folder and update the version to the folder name of the new code. The part to be edited is shown in blue in the example below.
An example of
dd_localsettings.sdv is shown below. The part to edit is between the two semicolons in the first line.
Alternatively, you may select the new code by the following procedure:
[Settings] button in the lower right corner
[Save local settings]
The purpose of this tutorial is to learn basic features of the program and how to quantify samples using least squares spectral resolution (LSSR). The samples are unit-resolution LC-MS data of phosphatidylcholines (PC) and sphingomyelins (SM) analyzed by precursor ion scan of m/z 184. Details of the methodology are given in Zeng et al., J. Chromatogr. A 1280 (2013) 23.
Chrombox D.exe file in the D root folder (i.e.
C:\CHROMBOX\DD), or as described in the installation instructions for Linux and Mac. The main window should look like Figure 1.1.
You will need functions for loading data, for creating libraries and for handling spectra.
[Import Chrom] button that takes you to the window for importing chromatographic data shown in Figure 1.2.
\Tutorial-1 and select all four files in the subfolder.
[Load Sel.] to load the four files.
[Accept] when finished. The chromatograms should now appear in the main window.
In the main window you can choose between displaying the total ion currents (TIC) and individual ions by using the radio buttons to the right in the line below the chromatograms. Right-clicking in an ion chromatogram will give you additional options. You can navigate in the chromatogram by the
[-] buttons and by the slider next to them.
[>] buttons on the next line.
47 min in the working range in the "Spectrum functions" area (see Fig. 1.1). Choose
Sum and press the
[View Spectrum] button. This will display the sum of signals of the selected region. The spectra can be exported graphically or numerically by right-clicking in the figure.
The region to display can also be changed by right-clicking on the vertical blue bars in the chromatogram.
The purpose of this exercise is to identify PC species in the four samples. For that you will need a library of compounds and spectra that can be found in the samples. The compounds are generated by the Lipid Generator function that is opened by the
[Gen. Lipids] button in the main window. The lipid generator generates possible lipid compounds based on lists of fatty acids, sphingoid bases and the lipid class core formula.
The window of the lipid generator is shown in Figure 1.3. When using the function it is important to consider which fatty acids, sphingoid bases and lipid classes can be present in the analyzed samples, and which of them can be detected under the experimental conditions that were applied. Although many of the fatty acids and sphingoid bases in the list are not expected to be abundant in the samples, there is no need to make changes to the default lists in this case. The applied MS conditions (precursor ion scan of m/z 184) mean that only choline-containing compounds are detected. These must therefore be selected in the lipid class list to the right in the figure.
SPCF (choline-containing sphingomyelin),
GPC[2o] (PC plasmalogen with ether bond),
GPC[2p] (PC plasmalogen with vinyl-ether bond).
[Gen. Comp.] button to
Weights. This will ensure that isomeric compounds are not generated. If isomers occur, the one regarded as most likely (based on weights of fatty acids and the order of generation) will be preferred.
[Gen. Comp.] to generate the compounds. This will create a compound list with approximately 730 unique molecular formulas. Possible isomers of each compound are listed in the rightmost column. You may get additional information about a molecule by selecting it in the list and searching, for instance, LipidMaps or EMBL by using the popup menu that appears under the list. Navigation in the table may be more convenient if you right-click in the table and change the view to
Tutorial-1 next to the
[Exp. Comp. List] button and thereafter pressing the button. The compound list is saved in semicolon-separated CSV format and can also be edited in a spreadsheet or a text editor.
The next step is to create the mass spectral library from the compound list. Press the
[Library] button in the main window. The library window is shown in Figure 1.4. The library window will import and display the library that is currently in the method when opened. When the list is empty, as in Figure 1.4, it means that no library is stored in the method.
Tutorial-1 in the popup menu next to the
[Imp. Comp. List] button and press the button to read the list.
Before generating the spectra, you must pay attention to the information given by the resolution and offset info. This shows that the data currently in memory have a resolution of 1 and a mass offset of 0.2. The resolution of the data and the generated spectra should always be identical. In most cases the mass offset should also be identical. The exception is if mass offset is used to compensate for a systematic deviation in the mass accuracy of the instrument.
mass offset in the settings to
0.2 and also change
Weights. The other settings can be kept at default values.
The next step is to select the spectrum types to be generated. The function can generate several spectra, but in this case all relevant ions have positive H+ adducts.
[M+H]+ option in the list of spectrum types.
[Gen. Spec.] to generate spectra with the isotope distribution of the compounds.
The spectra are first generated with a resolution of 0.001 and thereafter downsampled to the required resolution and mass offset. In the list of generated spectra, some are set as not being "active". These belong to compounds whose spectra are very similar to those of other compounds in the list; the correlation between the spectra is higher than the similarity threshold of 0.9. These will not be applied by the LSSR algorithm. Which of the similar spectra is set as active is decided by the weights that are inherited from the compound list. You may change these selections, but only one of the interfering spectra should be set as active at any time. With a higher mass resolution you would have experienced fewer interferences.
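The selection of active spectra can be illustrated with a small sketch. It is not the code used by Chrombox D. It assumes that similarity is the Pearson correlation calculated over the combined mass axis of two spectra and that the compound with the higher weight stays active. The compound names, weights and intensities are invented for the illustration.

```python
from math import sqrt


def correlation(spec_a, spec_b):
    """Pearson correlation of two spectra given as {mass: intensity} dicts."""
    masses = sorted(set(spec_a) | set(spec_b))
    a = [spec_a.get(m, 0.0) for m in masses]
    b = [spec_b.get(m, 0.0) for m in masses]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mark_active(library, threshold=0.9):
    """Return {name: active}; among similar spectra the highest weight wins."""
    # sorted() is stable, so compounds with equal weights keep their list order.
    ordered = sorted(library, key=lambda comp: -comp["weight"])
    active = {}
    kept = []
    for comp in ordered:
        clash = any(correlation(comp["spectrum"], other["spectrum"]) > threshold
                    for other in kept)
        active[comp["name"]] = not clash
        if not clash:
            kept.append(comp)
    return active


# Hypothetical library: A and B have nearly identical unit-resolution spectra.
library = [
    {"name": "A", "weight": 1.0, "spectrum": {746: 100, 747: 45, 748: 12}},
    {"name": "B", "weight": 0.5, "spectrum": {746: 100, 747: 44, 748: 13}},
    {"name": "C", "weight": 1.0, "spectrum": {760: 100, 761: 46, 762: 12}},
]
print(mark_active(library))
```

In this example B is set as inactive because its spectrum is almost identical to that of A and its weight is lower, while C does not overlap with the others and stays active.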
Tutorial-1 in the field next to the
[Save Libr.] button and thereafter pressing the button.
[Accept], which will transfer the library to the method and close the library window.
The next step is to quantify the compounds using LSSR. Select the first chromatogram (CODBRAIN_PC) and ensure that the selected region marked by the vertical blue bars spans the region of the chromatogram where there are signals (approx. 15-47 min).
[LSSR win] button that will take you to the least squares spectral resolution window. The spectrum is resolved as long as there is a library in the method and the resolution of the library fits the resolution of the data. The window should look approximately like in Figure 1.5.
There are three plots in the window. The main plot is the sum spectrum of the selected region. The horizontal red line is a threshold (in this case set to 2% relative to the most abundant mass). Compounds that do not have a base peak above the threshold are excluded from the calculations. Green masses are masses in the compounds that are included in the calculations. Any red masses (none in this case) are masses that are above the threshold, but that do not match any active compounds in the library. Other masses are blue. The horizontal green line is a baseline estimate. This level is subtracted from all masses in the regression.
The predicted versus measured plot shows how well the calculated solution explains the spectrum. Any severe deviations (none in this case) show that the masses are not properly explained. Right-clicking on a deviating mass will show which compounds the mass belongs to. The compounds may not be accurately estimated if there are severe deviations between predicted and measured values of the masses.
The third plot shows the total signal from each compound detected. There will usually be a large number of bars with low levels and also some negative values because of noise and baseline subtraction. It will therefore usually be necessary to do a recalculation after selecting a proper threshold level.
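The principle behind the three plots can be sketched in code. The example below is a simplified illustration and not the algorithm implemented in Chrombox D. It assumes ordinary least squares (solved by the normal equations), a constant baseline, and a threshold that is applied to the baseline-corrected spectrum. The library spectra and amounts are invented.

```python
def solve_least_squares(references, spectrum):
    """Ordinary least squares via the normal equations (pure Python)."""
    n = len(references)
    a = [[sum(p * q for p, q in zip(references[i], references[j]))
          for j in range(n)] for i in range(n)]
    b = [sum(p * q for p, q in zip(references[i], spectrum)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - tail) / a[i][i]
    return x


def resolve(library, spectrum, baseline, threshold_percent=2.0):
    """Estimate the amount of each library compound in a sum spectrum."""
    corrected = [y - baseline for y in spectrum]      # subtract the baseline
    limit = max(corrected) * threshold_percent / 100.0
    included = {}
    for name, ref in library.items():
        base_peak = ref.index(max(ref))               # position of the base peak
        if corrected[base_peak] > limit:              # exclude weak compounds
            included[name] = ref
    amounts = solve_least_squares(list(included.values()), corrected)
    return dict(zip(included, amounts))


# Hypothetical library on a common mass axis of six masses.
library = {
    "compound A": [1.00, 0.45, 0.12, 0.00, 0.00, 0.00],
    "compound B": [0.00, 0.00, 1.00, 0.46, 0.13, 0.00],
    "compound C": [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
}
baseline = 2.0
# Simulated measurement: 300 units of A and 50 units of B on top of the baseline.
spectrum = [baseline + 300 * a + 50 * b
            for a, b in zip(library["compound A"], library["compound B"])]
result = resolve(library, spectrum, baseline)
print({name: round(value, 3) for name, value in result.items()})
```

Compound C is excluded because its base peak is not above the threshold, and the estimated amounts of A and B should be close to the simulated values of 300 and 50. With real data the fit is not exact, which is why small and negative values appear in the bar plot.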
2 in the edit field next to the
[Select] button and thereafter press the button. This will mark compounds that are above 2% relative to the most abundant. Thereafter press
[Recalc (sel)]. This will recalculate the abundances using only compounds that were above the selection threshold, and the plot will be simplified. To better view the plot you can right-click on the background and select
Copy figure. The result should look approximately like in Figure 1.6.
The majority of the compounds belong to the GPC class, which are ordinary phosphatidylcholines. “[2]” indicates that two fatty acids are bound to the molecule and the numbers that follow “t” indicate the total number of carbons and the total number of double bonds in the two fatty acids. There are also some compounds belonging to the GPC[2o] class, indicating that one of the fatty acids is ether-linked (plasmalogens), and a few minor compounds belonging to the SPCF class (sphingomyelins). It should be emphasized that the identities are the ones the program regards as the most likely, based on the weights of fatty acids and compound classes, and that there may be several alternative explanations.
[Library] button, which will open the library, and scroll down to the compounds with base peak of 747 in the spectrum list. You will see that there are three compounds with this base peak and only one is active. Assume that you have reason to believe that this peak is the GPC t33:1. Set this peak as active and GPC[2o] t34:1 as inactive and leave the library window by pressing
[Accept]. Press thereafter
[Calculate] in the LSSR window, press the
[Select] button again with a threshold of 2, and press
[Recalc. (sel)]. The identity of the compound should now be GPC t33:1.
Results can be reported by pressing the
[Report] button. The report format can be selected by pressing
[Settings] in the lower right corner and thereafter
Since the data are LC-MS data you can perform a chromatographic resolution based on the theoretical spectra of the compounds that are shown in the bar plot.
[Resolve] in the LSSR window and
[Subtract BL] in the resolution window.
This should give you a resolved chromatogram similar to the one in Figure 1.7. The numbers in brackets behind the identities are equivalent carbon numbers (ECN). Peaks belonging to the same class and with the same ECN should be grouped together. Severe deviations from this pattern may indicate incorrect identification. Clicking on a peak or on a label will highlight the peaks. You may see that some profiles have double-peaks, which indicates that there are several isomers.
[>] button in the main window. Adjust the vertical bars so they fit the region of the chromatogram with signals and repeat the LSSR procedure. Do the same for chromatograms 3 and 4.
In chromatograms 2 and 3 you can see that several of the major components consist of more than one isomer (e.g. GPC t36:4 at approx. 28 min). Some of the peaks late in the chromatogram are sphingomyelins (SPCF).
Chromatogram 4 is a reference mixture of sphingomyelins. All major peaks belong to the SPCF group. Note that there are several isomeric compounds in the SPCF group, so the displayed identities may not be correct. In addition, low resolution mass spectrometry cannot distinguish between compounds such as SPCF[d18:1] 18:0 (C41H83N2O6P) and SPCF[t17:1] 18:1 (C40H79N2O7P). If you have knowledge about which sphingoid bases you can expect in your samples you can avoid many of these conflicts by deactivating or downweighting compounds in the list of sphingoid bases in the Lipid Generator. Four common sphingoid bases are set as active by default. These are d18:1 (Sphingosine), d18:0 (Sphinganine), t18:1 (Dehydrophytosphingosine) and d17:1 (C17 Sphingosine).
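The conflict between the two formulas mentioned above can be verified by calculating their monoisotopic masses. The sketch below uses standard monoisotopic masses of the elements; the function name is chosen for this illustration only.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C41H83N2O6P'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


mass_d18 = monoisotopic_mass("C41H83N2O6P")   # SPCF[d18:1] 18:0
mass_t17 = monoisotopic_mass("C40H79N2O7P")   # SPCF[t17:1] 18:1
print(round(mass_d18, 4), round(mass_t17, 4), round(mass_d18 - mass_t17, 4))
```

The two compounds differ by the exchange of CH4 for O, which is a difference of approximately 0.036 u. They therefore end up at the same mass with unit resolution, but can be separated with the resolutions applied in Tutorial 2.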
The purpose of this tutorial is to learn how to apply LSSR with high resolution direct infusion data, and how you should read in the data with the best possible quality.
[Import Chrom] and select
mzXML as the file type.
/Tutorial-2 and thereafter select
SM_INFUS in the middle table.
[Load Sel.] button to read the file.
Ions Chrom among the display options in the upper left corner of the window. The ion traces should look like in Figure 2.1.
The plot in Fig. 2.1 is of unit resolution data. The unit resolution spectrum can be seen by selecting
Avg Spectr as the display option. The data were acquired by high resolution MS, and the next step is to find out how fine a resolution you can use when the data are analysed, by varying the resolution and mass offset. The resolution in AMU can be any value from 0.001 to 5 that will give an integer value when used as a divisor for 1 (i.e. 0.5, 0.2, 0.1, 0.05, ..., 0.001). The offset can be set to any number but it should usually have an absolute value smaller than the resolution.
Ions Chrom is selected as display option, type in gradually decreasing values for resolution in the table to the right and press
[Recalc Sel.] after each step. From 0.05 you will see that spikes start to appear in the traces because there are masses that are close to the borders of the mass bins, and that are sometimes rounded up and sometimes rounded down. You may get rid of the spikes by experimenting with different mass offsets. The mass offset is added to the raw m/z values before they are rounded.
The options for the binning and filtering algorithm are explained in Figure 2.2 and in the text below.
In most MS data files, spectra are stored as pairs of vectors of masses and abundances for each retention time, tr (Fig. 2.2a). These vectors typically vary in number of recorded ions and are without a strictly defined resolution.
The first filter that is applied is the abundance threshold filter that removes low signals that are expected to have insignificant influence on the data. The threshold is given in percent of the largest individual signal in the original spectra. Any signal below this threshold is deleted. The main purpose of this filter is to speed up the binning function and the threshold should normally not be so high that it has significant influence on the final results. The default value for
Abund. Thresh is 0.01% of the most intense signal in the data. After application of the filter, the spectra have fewer ions, but the original data structure is kept (Fig. 2.2b).
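The abundance threshold filter can be sketched as follows. This is an illustration of the principle and not the code in Chrombox D; the data layout (one pair of mass and abundance lists per scan) and the values are assumptions made for the example.

```python
def abundance_filter(scans, threshold_percent=0.01):
    """Remove signals below a percentage of the largest signal in all scans.

    scans is a list of (masses, abundances) pairs, one pair per retention time.
    """
    largest = max(a for _, abundances in scans for a in abundances)
    limit = largest * threshold_percent / 100.0
    filtered = []
    for masses, abundances in scans:
        kept = [(m, a) for m, a in zip(masses, abundances) if a >= limit]
        filtered.append(([m for m, _ in kept], [a for _, a in kept]))
    return filtered


# Two hypothetical scans with different numbers of recorded ions.
scans = [
    ([700.51, 701.52, 702.40], [1000.0, 450.0, 0.05]),
    ([700.50, 703.10], [20000.0, 1.0]),
]
# The largest signal is 20000, so the default limit of 0.01% equals 2.0.
print(abundance_filter(scans))
```

The scans still have different lengths after filtering, which corresponds to the data structure in Fig. 2.2b.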
The next step is the binning function. The purpose of this is to organize the spectra into a matrix of intensities where each entry in the matrix corresponds to the signal from a defined mass and a defined retention time (Fig. 2.2c). This is controlled by the parameters Resolution,
Mass offset and Mass win.
Resolution is the selected mass resolution in AMU in the final matrix.
Mass offset is a value that is added to the original masses (from Fig. 2.2b) before they are rounded to the required resolution.
Mass win is the window size of the binning algorithm. This is 100% by default, which means that all signals that pass the
Abund. Thresh. filter will contribute to the binned signal matrix. This can be constrained to leave out masses that are between the expected signals. If the window is set to 100% and the resolution is 1, all ions between m-0.5 and m+0.5 are assigned to m, and all ions between m+0.5 and m+1.5 are assigned to m+1. If it is set to 50%, all ions between m-0.25 and m+0.25 are assigned to m and all ions between m+0.75 and m+1.25 are assigned to m+1. This means that signals from ions between m+0.25 and m+0.75 will not be recorded.
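The effect of the three binning parameters on a single m/z value can be sketched as below. The function is a simplified illustration and not the code in Chrombox D. It assumes that the mass window is evaluated after the mass offset has been added, and that the binned mass is reported without removing the offset again.

```python
import math


def bin_mass(mz, resolution=1.0, mass_offset=0.0, mass_win=100.0):
    """Return the binned mass for a raw m/z value, or None if it is rejected.

    resolution  : bin width in AMU
    mass_offset : value added to the raw mass before rounding
    mass_win    : accepted part of each bin in percent, centered on the bin
    """
    shifted = mz + mass_offset
    index = math.floor(shifted / resolution + 0.5)   # round to the nearest bin
    center = index * resolution
    half_window = 0.5 * resolution * mass_win / 100.0
    if abs(shifted - center) > half_window:
        return None                                  # between expected signals
    return round(center, 6)


print(bin_mass(700.3))                    # inside the bin of m = 700
print(bin_mass(700.3, mass_win=50))       # between m+0.25 and m+0.75: rejected
print(bin_mass(700.8, mass_win=50))       # between m+0.75 and m+1.25: m+1
print(bin_mass(873.694, resolution=0.01, mass_offset=-0.001))
```

A mass that lies close to a bin border will alternate between two neighbouring bins from scan to scan because of small variations in the measured mass. This is the source of the spikes described above, and a suitable mass offset moves such masses away from the border.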
Each column in the signal matrix corresponds to an ion trace. After the signal matrix is constructed, vectors containing the maxima, means and the ranges of each column in the matrix are calculated. These values are compared to the maxima of each of these vectors by the max, mean and
range filters. The threshold values for these are percentages relative to the max value in each of the vectors of corresponding values. Ions that do not pass any of the active filters are deleted. When applying these filters it is important to consider which type of data one is working with. Important ions in chromatographic data can be expected to have a maximum well above baseline and a certain difference between the max and min values, so it makes sense to apply the
max and the
range filters. The
mean filter may be an efficient way to remove spikes because a single spike in the signal will have little influence on the overall mean of the signal. However, this is also the case for small and narrow peaks, so the filter should therefore be used with care with chromatographic data. On direct infusion data, as applied in this tutorial, one should expect intensities to be fairly stable. A
range filter therefore makes little sense and the
mean filter may work better than the
max filter, which is why the
mean filter should be active in this case and the other two should be unchecked.
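The difference between the max, mean and range filters can be illustrated with a small signal matrix. The sketch below is not the code in Chrombox D. It assumes that an ion must pass all active filters to be kept, and the function name, data and threshold values are chosen for the illustration.

```python
def trace_filter(matrix, max_pct=None, mean_pct=None, range_pct=None):
    """Return the indices of the ion traces (columns) that pass the filters.

    matrix is a list of scans (rows) with one intensity per ion (columns).
    A threshold of None means that the filter is not active.
    """
    columns = list(zip(*matrix))
    stats = {
        "max": [max(c) for c in columns],
        "mean": [sum(c) / len(c) for c in columns],
        "range": [max(c) - min(c) for c in columns],
    }
    limits = {"max": max_pct, "mean": mean_pct, "range": range_pct}
    keep = []
    for i in range(len(columns)):
        passed = True
        for name, pct in limits.items():
            if pct is None:
                continue
            # Threshold in percent of the largest value among all ions.
            if stats[name][i] < max(stats[name]) * pct / 100.0:
                passed = False
        if passed:
            keep.append(i)
    return keep


# Four scans of three ions: a stable ion, a single spike and a weak ion.
matrix = [
    [100.0, 0.0, 0.05],
    [102.0, 0.0, 0.04],
    [98.0, 90.0, 0.06],
    [100.0, 0.0, 0.05],
]
print(trace_filter(matrix, mean_pct=30))   # the spike is removed
print(trace_filter(matrix, max_pct=30))    # the spike passes
```

With direct infusion data the stable ion is the one of interest. The mean filter removes the spike, while the max filter lets it through.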
Three more optional filters that work on the ions can be applied. CODA is a method for detecting relevant signals in chromatographic-mass spectrometric data [Windig et al., Analytical Chemistry 68 (1996) 3602-3606] and the parameters that control it are the
CODA Thresh. and
CODA winsize. You can also constrain the ions to only those that are in the library compounds if there is a library in memory. When applying the
library filter it is important that the library has the same resolution as the data, and the mass offset should usually also be the same. You can also set a maximum number of ions to return. If this filter is used it returns the ions with the largest maxima in the ion traces. By default the filter is active and the maximum number of ions is 5000.
Finally, the data matrix can be resampled in the chromatographic direction by summarizing two or more scans. The final result can be a data matrix (Fig. 2.2d) that is reduced both in the chromatographic and the spectral direction.
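Resampling in the chromatographic direction can be sketched as below. The example assumes that the intensities of consecutive scans are added and that an incomplete last group is kept as it is; both are assumptions made for the illustration.

```python
def resample_scans(matrix, factor=2):
    """Combine every `factor` consecutive scans (rows) by adding them."""
    resampled = []
    for start in range(0, len(matrix), factor):
        block = matrix[start:start + factor]
        resampled.append([sum(column) for column in zip(*block)])
    return resampled


# Five scans of two ions are reduced to three scans.
matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(resample_scans(matrix, factor=2))
```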
You can test how different filters and conditions affect the data. Inspect both the ion chromatograms and the average spectra.
In the following sections it is assumed that the data were sampled with a
0.01, a small negative
mass offset of
Mass win. of
Abund. thresh. of
Mean threshold of
0.1%, and all other filters turned off. The ion trace and the average spectrum with these conditions are shown in Figure 2.3. There are still a few negative spikes, but these will not have significant impact on the average spectrum. Apply these settings and thereafter return to the main window by pressing the
[Gen. Lipids] button that will take you to the Lipid Generator window.
SPCF class (Short SMC) in the list of lipid classes. Select
Weights as filter next to the
[Gen. Comp.] button and thereafter press the button. There should now be approximately 130 compounds in the list after filtering if the default fatty acid and sphingoid base lists are applied.
Tutorial-2 next to the
[Exp. Comp. List] button and thereafter press the button.
[Close] that will take you back to the main window.
[Library] button that will take you to the library window. Press
[Clear List] if there is already a library present.
Tutorial-2 compound list in the field next to the
[Imp. Comp. List] button and thereafter press the button to import the data.
[M+C2H3O2]- in the list of spectrum types. Set
mass offset equal to that of the data,
-0.001, respectively. Set
[Gen. Spec.] button. If you set incorrect values for
mass offset you can set the correct values and then resample without generating new spectra by pressing the
[Resample] button any time after the spectra are generated.
[LSSR] button in the main window. The resolved solution should look approximately like in Figure 2.4.
The sphingomyelins have masses from approximately 700 to 900 AMU. In this region there are a few minor ions marked red. These are ions that are above the threshold, but that are not accounted for by the library. Their presence indicates that the sample is not pure sphingomyelin, or that the sphingomyelin contains other sphingoid bases or fatty acids than those in the default lists.
By right-clicking on the bars or the labels in the bar plot you get information about each compound, and by right-clicking in the information field you can search Lipid Maps or other databases to verify that these are common sphingomyelins, or get alternative identifications.
The predicted versus measured plot shows that most masses are close to the 1:1 line, which indicates good accuracy. But there are a few ions that have a zero measured value and a calculated value above zero. The largest of these is 873.69. If you right-click on the label in the plot you will be told that this ion appears in SPCF[d18:1] 24:1. By right-clicking on this compound in the bar plot you can see that the major ions of this compound are close to the 1:1 line, which indicates that the quantification of this peak is accurate.
Once a library that fits the data is generated you can also use the library as a filter when the data are imported.
[Import Chrom] and select
mzXML as the file type. Select
SM_INFUS again, as explained in 2.1
-0.001, turn the mean filter on and the max filter off, and import the data by pressing
Avg Spectr as view to display the loaded spectrum.
Library filter in the right table and press
[Recalc Sel.]. This will filter away ions that are not in the library and the result is a spectrum of the sphingomyelins.
[Accept] to go back to the main window and press the
[LSSR Win] button again.
This should give similar results as the previous solution, but with a cleaner spectrum.
The purpose of this tutorial is to give an overview of the lipid generator.
The lipid generator generates lipid compounds by combining lists of common lipid classes, common fatty acids, and common sphingoid bases. However, there are lipid compounds that are not covered, either because the entire lipid class is not implemented or because they require fatty acids or sphingoid bases that are not in the default lists.
Because the lipid generator creates every combination of the active classes and compounds, it will also generate compounds that do not occur naturally. Which compounds are generated can to a large extent be controlled by setting weights for the different compounds or by activating or deactivating fatty acids or sphingoid bases for each lipid class.
The molecular formulas for the compounds in the lists are given in the fully hydrolyzed form, and the lipid compounds are built by condensation reactions, i.e. for each linkage formed, the molecular formulas of the fragments are added and a water molecule is subtracted. The exception is when ether and vinylether bonds are formed, where O2 and H2O2 are subtracted, respectively.
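The formula arithmetic can be illustrated with a short sketch. It is not the code of the lipid generator, and the function names are hypothetical. Glycerol (C3H8O3) and palmitic acid (C16H32O2) are used as example fragments.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Convert a simple formula such as 'C16H32O2' to element counts."""
    counts = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(count) if count else 1
    return counts


# What is subtracted for each type of linkage, as described in the text.
LOSS = {"ester": "H2O", "ether": "O2", "vinylether": "H2O2"}


def link(core, fragment, bond="ester"):
    """Add two formulas and subtract the loss for the selected bond type."""
    total = parse_formula(core) + parse_formula(fragment)
    total.subtract(parse_formula(LOSS[bond]))
    return "".join(f"{element}{n if n > 1 else ''}"
                   for element, n in sorted(total.items()) if n > 0)


print(link("C3H8O3", "C16H32O2"))                 # ester-linked
print(link("C3H8O3", "C16H32O2", "ether"))        # ether-linked
print(link("C3H8O3", "C16H32O2", "vinylether"))   # vinylether-linked
```

Relative to the ester, the ether-linked product has two more hydrogens and one less oxygen, and the vinylether-linked product has one less oxygen. This is why O2 and H2O2 are subtracted instead of H2O.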
Glycerolipids have the following convention for naming: The letter G denotes that the molecule contains a glycerol, P denotes that it contains a phosphate group. The letters C, E, I and S denote choline, ethanolamine, inositol, and serine, respectively. A number following any of the letters denotes the total number of the group if it is more than one. Numbers in brackets denote how many fatty acids are found in the molecule. If the number in brackets is followed by o or p it denotes that one of the fatty acids is bound by an ether or vinylether bond, respectively. The bracket is thereafter followed by a specification of the fatty acids, either as total number of carbons and total number of double bonds, denoted by t, or specified for each position, where the letters a-c denote the sn-1 to sn-3 positions in the glycerol, and x denotes an unknown position. Fatty acids with an additional oxygen (e.g. hydroxy or methoxy fatty acids) are denoted by +O following the number of double bonds.
In this system the compound in Figure 3.1 can be described in several ways with different levels of detail:
Alternative 1 is applied by the lipid generator, but if more details about the structure are known, the compounds can be further specified by alternatives 2-4 within the same system.
Most sphingolipids contain a single fatty acid in addition to the sphingoid base. The naming convention for the sphingolipids is therefore the following:
The letter S denotes that the molecule contains a sphingoid base. The letters C, E, I and S have the same meaning as for the glycerolipids. G and M denote galactose and mannose units, respectively. Since G is also used for glycerol, it is important that it is specified as the first letter in the code if it refers to glycerol. To account for hydrolyzed forms that contain no fatty acid, an F is added as the last letter if a fatty acid is present (in glycerolipids this can be handled by a zero in the bracket).
The bracket in sphingolipids refers to the type of sphingoid base. The numbers in brackets are the total number of carbons and double bonds in the base, and the letter d or t preceding the number denotes two or three hydroxy groups in the base, respectively. The bracket is followed by the number of carbons and double bonds in the fatty acid.
There are some additional classes. Although platelet-activating factor can be described as a phosphocholine with an ether bond (GPC[2o]) and an esterified C2 fatty acid, it is defined as a separate group named PAF. Free fatty acids are named F. Cholesterol is not in the current compound list but is denoted by C, or by C followed by the fatty acid if esterified.
The complete list of compound classes with structures and examples is given in Appendix 1 at the end of the document.
The lipid generator window is shown in Figure 3.2. The list of lipid classes that can be generated is shown in the table to the right in the window. The table to the left shows the fatty acid list, fatty acid combinations, sphingoid bases or generated compounds. The fatty acid list is displayed when the window is opened. The list to display is selected in the list selection area. Below the list there are various controls that will change depending on the list shown.
Select Fatty acid combinations as the list to display and select G (diacylglycerols) as the list to display. This shows the total number of carbons and double bonds for possible combinations of the fatty acids in the default fatty acid list. The first entry is 16:0 because 8:0 is the shortest fatty acid in the list, and a diacylglycerol contains two fatty acids. If you select G (triacylglycerols) the first entry will be 24:0 (3×8:0).
If you select Sphingoid bases the default list of sphingoid bases will be shown, and if you select Compounds the list should be empty because no compounds are generated yet.
Assume that you have been analyzing a mixture of phosphatidyl choline (GPC) and phosphatidyl ethanolamine (GPE) and would like to generate a library for these compounds.
Set these two classes as active in the list to the right. Selection of classes can be done by clicking in the active field in the list, but if many classes are selected it is more convenient to right-click in the table, select List view and then select the classes with the [Ctrl] key and the left mouse button. The same applies to selection in other lists.
Press the [Gen. Comp.] button, which will generate 316 compounds of each class and display the compound list. The list is sorted according to molecular mass, and you can see that there are isomeric compounds in the list because a PE molecule has the same molecular formula as a PC with three fewer fatty acid carbons. The first of these pairs appears at 565.73 amu (GPC t20:0 and GPE t23:0).
You can also see that the molecules have different weights. The weights are inherited from the weights given to the fatty acids in the fatty acid list. All fatty acids with an odd number of carbons have a weight of 0.5 and all fatty acids with an additional oxygen have a weight of 0.4. Other fatty acids have a weight of 1. The molecule GPE t23:0 must contain one odd-numbered fatty acid and one even-numbered fatty acid and therefore gets the weight 1×0.5=0.5. The compound GPE t24:0+O2 at 611.75 amu has a weight of 0.16 because it contains two oxygenated fatty acids (0.4×0.4). Compounds with an additional fatty acid oxygen and an odd number of fatty acid carbons have weights of 0.2 (0.4×0.5). If you scroll down the list to 1000.57 you will see that GPE t54:0 has a weight of 0.25. Even though this compound has an even number of fatty acid carbons, the only combination of the fatty acids in the list that can explain 54:0 is two 27:0 fatty acids (0.5×0.5).
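The weight arithmetic can be sketched in a few lines of Python. This is only an illustration of the multiplication rule described above, not code from the program. The function names are hypothetical, and the precedence used for a fatty acid that is both odd-numbered and oxygenated (0.4) is an assumption. The `class_weight` argument corresponds to the weight of the lipid class, which is 1 unless it is edited in the lipid class list.

```python
from math import prod


def fatty_acid_weight(carbons, extra_oxygens=0):
    """Default fatty acid weight: 0.4 if oxygenated, 0.5 if odd-numbered, else 1."""
    if extra_oxygens > 0:
        return 0.4  # assumed to take precedence over the odd-carbon rule
    if carbons % 2 == 1:
        return 0.5
    return 1.0


def compound_weight(fatty_acids, class_weight=1.0):
    """Class weight multiplied by the weight of every fatty acid.

    fatty_acids is a list of (carbons, extra_oxygens) tuples.
    """
    return class_weight * prod(fatty_acid_weight(c, o) for c, o in fatty_acids)


w_t23 = compound_weight([(8, 0), (15, 0)])      # one odd-numbered fatty acid
w_t54 = compound_weight([(27, 0), (27, 0)])     # two odd-numbered fatty acids
w_t24_o2 = compound_weight([(12, 1), (12, 1)])  # two oxygenated fatty acids
```

The three examples give 0.5, 0.25 and approximately 0.16, matching the weights quoted for GPE t23:0, GPE t54:0 and GPE t24:0+O2.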
Select Weights next to the [Filter] button and press the button. This will remove the isomers with the lowest weights. The compounds that were removed will be listed as isomers to the right in the compound list. Note that the majority of the compounds that were removed contain odd-numbered or oxygenated fatty acids, which are rare in nature compared to normal fatty acids with an even number of carbons. These are therefore considered less likely to appear in a sample.
The weights in the fatty acid list can be edited by the user. The weights can also be edited in the list of fatty acid combinations. This may for instance be necessary if you use internal standards with odd-numbered fatty acids, to ensure that the standard is not deleted. The list of isomers may tell you about possible interferents with the standard. To ensure that a certain compound is always preferred you can set the weight higher than 1.
Order that can be edited by the user. If two isomeric compounds have the same weight, the one that was generated first (lowest order) will be kept by the filter. There is also an option to filter only by the order.
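The filter rule (highest weight wins, and the lowest order wins on a tie) can be sketched as follows. This is a minimal illustration under the assumption that isomers are compounds sharing a molecular formula. The function name and the tuple layout are hypothetical and do not reflect how the program stores its lists.

```python
def filter_isomers(compounds):
    """Keep one compound per molecular formula.

    compounds: list of (name, formula, weight) in generation order.
    Returns {formula: (kept_name, [names listed as isomers])}.
    """
    by_formula = {}
    for order, (name, formula, weight) in enumerate(compounds):
        # Sort key: higher weight first, then lower order (generated first).
        by_formula.setdefault(formula, []).append(((-weight, order), name))
    result = {}
    for formula, entries in by_formula.items():
        entries.sort()
        result[formula] = (entries[0][1], [name for _, name in entries[1:]])
    return result


kept = filter_isomers([
    ("GPC t20:0", "C28H56NO8P", 1.0),
    ("GPE t23:0", "C28H56NO8P", 0.5),
])
```

Here GPC t20:0 is kept and GPE t23:0 is reported as its isomer, because the GPE compound must contain an odd-numbered fatty acid and therefore has the lower weight.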
Set the weight of PE to 0.3 in the lipid class list, press [Clear list] under the compound table and set the filter option back to No filter. Press [Gen. Comp.] again. You will now see that PE compounds have a maximum weight of 0.3. GPE t20:0+O has a weight of 0.12 (0.3×0.4×1), which is inherited from the PE class and the two fatty acids. If you filter by the weights again you should see that the compounds that are removed all belong to the PE class.
If you want further information about possible isomers you can search for isomers of the compound selected in the compound list in LipidMaps, EMBL or ChemSpider by using the popup-menu below the list.
Assume that you have analyzed phosphatidyl cholines and know that you have some plasmalogens with ether or vinyl-ether bound fatty acids. You may also know that usually you will only find 14:0, 16:0, 18:1 and 20:4 fatty acids in plasmalogens in the sample type you are working with. In this case it does not make sense to create all possible plasmalogens from the default fatty acid list. You can solve this in the following way:
List view. Select 14:0, hold down the [Ctrl] key and select 20:4. Select the GPC[2p] (phosphocholine with one vinylether linked fatty acid) and repeat the process for this class. Select the default fatty acid list again and thereafter the two class specific lists to verify that the selections are correct. Select GPC[2p] in the lipid class list and press the [Gen. Comp.] button. This should generate approximately 300 ordinary PC compounds from the default fatty acid list and 10 compounds in each of the two other classes from the specific lists.
Select FA combinations in the list selector and display the list for the GPC[2p] class. Deselect one of the entries, select the compounds list and press [Clear list]. Generate the compounds again and verify that there are now only nine compounds generated for the GPC[2p] class and that the number of compounds for the two other classes is the same.
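The count of 10 compounds per class follows from combining four fatty acids two at a time. The sketch below reproduces that count in Python; the function name and the `(carbons, double bonds)` tuples are illustrative assumptions, and the example assumes that combinations are reported as totals (the t notation) without regard to position.

```python
from itertools import combinations_with_replacement


def fa_combinations(fatty_acids, n):
    """Sorted unique (total carbons, total double bonds) for n fatty acids."""
    totals = set()
    for combo in combinations_with_replacement(fatty_acids, n):
        carbons = sum(c for c, _ in combo)
        double_bonds = sum(d for _, d in combo)
        totals.add((carbons, double_bonds))
    return sorted(totals)


plasmalogen_fas = [(14, 0), (16, 0), (18, 1), (20, 4)]
pairs = fa_combinations(plasmalogen_fas, 2)
```

`pairs` contains ten entries, from 28:0 (14:0+14:0) to 40:8 (20:4+20:4). Removing one entry from the combinations list leaves nine, which is what the last step verifies.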
The hierarchy of the different lists and functions is illustrated in Figure 3.3. Fatty acid lists for the specific classes are generated from the default fatty acid list, and the fatty acid combinations (1 to 4 fatty acids) are thereafter generated from the fatty acid lists. The fatty acids are combined with the core molecule, and with a sphingoid base in the case of sphingolipids. The generated lists of molecules can be filtered (optionally) and thereafter saved to be used in the library function. Because there is a separate filtering step in the library function, unique spectra can also be generated from unfiltered compound lists.
The fatty acid list for each class is generated from the active compounds in the default fatty acid list the first time the list for a class is displayed or used to generate lipids. The same applies to the combination lists. So if lipid classes are generated one by one, any edits to the default list will affect which compounds are generated for each class. The controls for the different lists are described below:
If the default fatty acid list is shown:
[Deal FA] will update all the class specific fatty acid lists according to the selections and weights of the default list.
[Deal FA + Comb.] will update all the class specific lists according to the selections and weights of the default list and update the combination lists.
If a specific fatty acid list is shown:
[Fetch FA] will update the displayed fatty acid list according to the current default list.
[Upd Comb] will update the combinations list according to the displayed fatty acid list.
If a fatty acid combinations list is shown:
[Calc from FA] will update the combinations list according to the current fatty acid list for the class.
If the compound list is shown:
[Del Sel.] deletes the selected compound.
[Clear list] deletes all compounds.
The default list of fatty acids may not be suitable for all sample types. User defined lists of fatty acids can be created and are stored in the
libraries folder as semicolon delimited csv files with the file names
falist_......csv. The number of carbons and double bonds must be specified. Optionally you can also specify additional oxygens, the weight, and whether the fatty acid is set as active or not by
0. If nothing is specified the default settings are no additional oxygens, weight 1, and active. An example of a fatty acid list opened in a text editor is shown below.
Assume you are analyzing triacylglycerols in vegetable oil. The G class will generate more than 1000 compounds when used with the default fatty acid list. Many of these will not be naturally present in vegetable oil and can lead to incorrect identifications and poor quantification.
Select Vegoil next to the [Imp. FA list] button and press the button. This list will only generate 263 compounds, which are more likely to be found in vegetable oils.
Press [Deal FA + Comb.] after the fatty acid list is imported and before you generate a new list of compounds. This will update the data lists for the classes. Alternatively you can close and open the window again to clear old data.
The generated compounds can be saved in two formats. Both these formats can be read by the library function.
You can save the list and all settings by selecting a file name next to the
[Save Settings] button. This will save all lists with selections and weights at the time you press
[Save Settings]. It is a good option if you have created a compound list based on settings other than the defaults and want to save all your modifications.
The other option is to save the data as a compound list in csv format that can be edited. These are saved in the
libraries folder as
cmplist_... .csv. An example of a compound list is shown below:
weight must be specified. Other fields are optional. A user defined list does not have to follow the conventions for names and classes used by the lipid generator. Note that atoms in the molecular formula should be given in the order H, D (deuterium), C, Cl, N, Na, O, P and S. Other atoms are currently not handled.
Select Mylist next to the [Imp. Comp. List] button and press the button. This will import an experimental compound list with 96 PC and PE compounds. If you filter it you will see that two of the molecular formulas have isomers. Since all weights in this case are one, the compound that was specified first in the imported list is preferred, and the other compound is listed as an isomer.
[Deal FA + Comb.].
GPE in the lipid class list. Ensure that filter is set to No filter and press [Gen. Comp.]. This will add approximately 560 new compounds to the list. Set the filter to Order and press the
This will give you a filtered list consisting of both imported and generated compounds. If the imported compounds are covered by the generated compounds the name will be from the imported list and the generated compounds are shown as isomers. If an imported compound has no isomers it means that it is not covered by the generated compounds (such as PC28:6 at 665.85 amu). If the fatty acid composition of your sample is known you can use this procedure to create a fatty acid list and verify the experimental identifications. A compound that cannot be explained by the fatty acid composition of the sample is not correctly identified.
### 3.8. Sphingoid bases
Because sphingolipids can vary in number of carbons and double bonds both in the fatty acid and in the sphingoid base, activating the full list of sphingoid bases will create a lot of isomers. The common sphingomyelin SPCF[d18:1] 16:0 (C39H79N2O6P at 703.04 amu) can for instance have the following isomers: SPCF[d17:1] 17:0, SPCF[d18:0] 16:1, SPCF[d17:0] 17:1, SPCF[d20:0] 14:1, SPCF[d20:1] 14:0, SPCF[d14:1] 20:0 and SPCF[d16:1] 18:0. To reduce the number of isomers, only the following sphingoid bases are set as active by default:
d17:1 is set with a weight of 0.85, which is higher than the default weight of fatty acids with odd number of carbon atoms (0.5). This means for instance that SPCF[d17:1] 16:0 is preferred over SPCF[d18:1] 15:0. For reasonable identifications based on the molecular mass alone it is therefore important to consider whether one can expect that odd-numbered sphingoid bases are more likely than odd-numbered fatty acids, and adjust the weights if necessary.
d18:1 (Sphingosine) and d18:0 (Sphinganine) are the most common sphingoid bases, and it can be a good approach to test an unknown sample first with only these two activated to check if they adequately explain the data.
The purpose of this tutorial is to give an introduction to the Profile Analyzer that will be applied also in Tutorials 5 and 6.
Masshunter XML as file type
\Tutorial 4 and read the file
This is a low resolution profile spectrum of porcine brain phosphatidyl choline.
[Analyze Prof.] button at the bottom of the Import Spectra window. This will take you to the Profile Analyzer.
The Profile Analyzer resolves the spectrum into individual peaks and is used for the following purposes:
This tutorial will give an introduction to the Profile Analyzer, and you will use it to create a centroided spectrum that can later be resolved by LSSR. The Profile Analyzer window is shown in Figure 4.1.
At the bottom left there is a table of check boxes that controls which methods and functions to apply to the spectrum, and the associated parameters for the different functions are given in the field to the right. A summary of the functions is given below:
Subtract baseline estimates the spectrum baseline and subtracts it. The baseline is found by comparing a smoothed spectrum with the raw spectrum and the parameter
Baseline win. size sets the size of the smoothing window in data points. A value of 5 should be appropriate in most cases, but it can be beneficial to increase it if there are many data points per peak. If the value is too high compared to the number of data points per peak it may detect baseline points at peak maxima, where the peaks have slope near zero. Such points will usually be filtered, but the detection algorithm may fail if there are many such points.
Baseline threshold decides how the baseline is estimated from the data points assumed to be at baseline. A lower value will elevate the baseline. The baseline points are shown as small black dots in the spectrum, and the estimated baseline is shown as a black line. The best way to find the parameters to use is by visual inspection of these.
Detect peaks decides if the analyzer should also perform peak detection. There are two parameters that affect the detection,
Peak sensitivity and
Peak maxnumber. A higher value for the sensitivity may detect more peaks, until the maximum number set by
Peak maxnumber is reached. Detected peaks are shown by green markers. Clicking on these will display the m/z value and the height.
Fit peaks decides how the peak models should be estimated. This is done by iterative algorithms, and which algorithm to use is determined by the
Method parameter, which can be set to
alternating global or
If Method is set to independent, each peak will have its own peak model. This will usually give the best fit to the raw spectrum, and should be used for instance for centroiding or direct quantification based on the peak models. Because peaks may be overlapping, which causes interference, the peak fitting is run several times, using the estimates from the previous round as new start estimates.
regression also applies individual peak fitting and uses the same algorithm as independent, but after the models have been calculated in each round they are updated with a common model for all peaks that is calculated by regression on the individual peak descriptors.
alternating global tries to describe all peaks with a single model. This will usually give a poorer fit to the raw data than independent, but it may be a better choice if a general model for the peak shapes in the spectrum is more important than the quantitative accuracy.
Model that can be cauchy (Lorentzian peaks) or voigt, which is a mixture of Lorentzian and Gaussian models. There are also combination models like cauchy/normal, where different models are used to describe the left and right side of the peak (left specified first).
Postfit m/z does a separate optimization of the peak position after the models for peak shapes have been found. Thus, it provides better accuracy for mass determination.
Autodetect peak widths will apply initial peak widths from the peak detection algorithm. These estimates may be poor if there are few data points per peak. If the option is unchecked, the algorithms will test peak widths within the range specified by min/max RWHM, which means that these parameters should be set properly.
Allow asymmetry allows the peak models to be asymmetric. Unchecking the option will speed up the algorithms, but it will usually give poorer fit to the raw data. If checked, allowed peak width range is specified by max/min
RWHM (left and right width at half maximum in mass units). If unchecked, allowed peak width range is specified by
max/min FWHM (full width at half maximum).
Vary kurtosis (vf) allows peaks to have kurtosis in the case of Gaussian profiles, and it allows the degree of "Cauchyness" to vary in the case of a Voigt model. It has no effect on Cauchy models. The parameters min/max kurtosis (vf) set the limits for the models. In the case of Voigt models, a vf of 0.5 implies a pure Gaussian model, and a vf of 1.5 implies a pure Cauchy model.
Peak width m/z dependence and Kurtosis m/z dependence allow the global peak models to be dependent on mass. If unchecked, the models will be the (robust) mean of each individual peak; if checked, the peak widths and kurtosis/vf can be linearly increasing or decreasing with mass. These parameters have no effect on the individual peak profiles fitted by independent, but they will affect the global model calculated from the individual peaks.
Resolve shoulders splits peaks detected with shoulders if this gives a significant reduction of the residuals.
The other parameters are the following:
Max iterations controls the maximum number of iterations in iterative procedures. Decreasing the value may give faster optimization, but at the cost of accuracy.
Stop criterion is a general setting for how similar results must be before the iterative methods are stopped. Increasing the value may give faster optimization, but at the cost of accuracy.
Rounds is the maximum number of rounds applied in the independent and regression procedures. Note that the maximum number of iterations in these procedures is the Max iterations parameter divided by Rounds, so setting Rounds too high may have a negative impact on accuracy.
Shoulder sens. is the sensitivity for the shoulder detection function. Higher values increase the sensitivity (more peaks may be regarded as having shoulders).
This section will teach you how to work with different functions and plot options.
[Analyze] button to remove all data about the spectrum.
Subtract baseline and press [Analyze] again. You will now have an estimate for the baseline. If you select spectrum data in the plot options you will see the equation describing the baseline.
Profile plot, check the Detect peaks option and press [Analyze]. The green dots mark where the peaks were detected. You can increase the number of peaks by increasing peak sensitivity, for instance to 85. If you choose the plot option Peak data you will get the data for each peak, with preliminary estimates of mass, width and height.
Profile plot, check the Fit peaks option and press [Analyze]. The green profile is the sum of the peak models, and the red curve is the residuals (baseline adjusted raw data minus sum of peak models). As you can see there is a significant residual under the largest peaks, which means some degree of mismatch between the raw data and the models.
Autodetect peak widths, Vary kurtosis (vf) and thereafter pressing [Analyze]. The residuals should now be much smaller. If you choose the plot option Peak statistics you will see that there is a general difference between LWHM and RWHM, which means that the peaks are asymmetric. There is nothing in the plots that indicates that peak widths or kurtosis depend on mass, so the parameters Peak width m/z dependence and Kurtosis m/z dependence should remain unchecked. If you choose the plot option Peak data you will now see more detailed (and accurate) data for each peak. There are several estimates for areas and height. See section 4.5 for a further explanation of these. The spectrum data gives you the estimates for the global peak model described by LWHM, RWHM and kurtosis.
The standard LSSR algorithm works only on centroided and binned data. In this section you will use the Profile Analyzer to generate a centroided spectrum to be resolved by LSSR.
[Defaults] above the methods and functions table. Thereafter set the peak sensitivity to 85 and press
Centroided plot option you will see the centroided mass spectrum. Press the [Conv. Centr.] button to convert the profile spectrum to the centroided spectrum. This will take you back to the "Import spectra" window, and a centroided spectrum is displayed on top of the profile.
Final and you will see that the masses have decimals. These have to be converted to integers before you continue with LSSR.
Bin centroided in the menu to the left of the plot and press
The library you are going to use in LSSR is already made, but if you want to create it yourself the parameters are: GPC, GPC[2p] and SPCF lipid classes with the respective weights 1, 0.5 and 0.75, [+H]+ spectra, resolution 1 and mass offset 0.2.
[LSSR Win] button, and in the LSSR window you click the [Load Libr.] button you select Tutorial-4-5 and press the button followed by [Calculate] and refine the results by pressing [Recalc (sel)]. Verify that the two major compounds are GPC t34:1 (16:0+18:1 PC) and GPC t36:1 (18:0+18:1 PC).
The procedure for finding baseline points is based on correlation between a profile estimated by quadratic functions and the raw data within a window around each m/z value in the profile spectrum. If the point is at baseline there will be poor correlation because the data contain only noise. The parameter
Baseline win size sets the window size around each point.
Baseline threshold sets the sensitivity of the method. If the value is set to 10, the points with the 10% lowest correlations will be selected as initial baseline points. An iterative procedure is thereafter applied to remove points with large positive residuals (over 3 standard deviations) when a straight line is fitted through the baseline points. Finally, the baseline equation is calculated by regression on the intensities and m/z values of the remaining baseline points.
The parameters that affect the baseline are:
Baseline estimation (on/off).
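The last two steps of the baseline procedure (iterative removal of points far above a fitted straight line, followed by regression on the remaining points) can be sketched in plain Python. This is a simplified illustration and not the program's implementation: the selection of initial baseline points by correlation is not included, the function names are hypothetical, and the points are assumed to be given as `(m/z, intensity)` pairs.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_baseline(points, n_sigma=3.0):
    """Fit a straight baseline, dropping points with large positive residuals."""
    points = list(points)
    while True:
        xs, ys = zip(*points)
        slope, intercept = fit_line(xs, ys)
        residuals = [y - (slope * x + intercept) for x, y in points]
        limit = n_sigma * pstdev(residuals)
        kept = [p for p, r in zip(points, residuals) if r <= limit]
        # Stop when nothing was removed or too few points would remain.
        if len(kept) == len(points) or len(kept) < 3:
            return slope, intercept
        points = kept


# Twenty points on the line 0.01 * m/z + 5, one of them sitting on a peak.
points = [(700 + i, 0.01 * (700 + i) + 5.0) for i in range(20)]
points[10] = (710, 0.01 * 710 + 5.0 + 50.0)
slope, intercept = fit_baseline(points)
```

In this example the point at m/z 710 lies more than three standard deviations above the first fitted line, so it is removed and the second fit recovers the underlying slope and intercept.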
Figure 4.2 shows a normally distributed peak with some noise, the profile of its first derivative, and the profile of the first derivative multiplied with the original profile (all three profiles are normalized to same max). Peak detection is based on the derivative multiplied with the raw profile. This is less affected by noise outside peak regions than the derivative alone.
Peak starts (point a in Figure 4.2) are detected where this profile exceeds the upper threshold, and peak ends (point b in Figure 4.2) are detected where the profile exceeds the lower threshold. After the initial peak detection there are filters for peak starts with no matching ends and ends with no matching starts. In addition the results are filtered for deviating peak widths.
Peak maxima are determined by fitting a cubic polynomial between points c and d (max/min) in the profile and solving for intensity equal to zero. The final peak widths (LWHM, RWHM) are thereafter determined on the original profile.
The parameters that affect peak detection are:
Detect peaks (on/off). If this is not selected it is only baseline subtraction that will be performed.
Peak sensitivity. This is used to set the upper and lower thresholds. The upper threshold is based on the median of the positive values in the derivative multiplied with the raw profile. This number is thereafter multiplied with 10^(5-(sensitivity/20)). Since sensitivity is subtracted in the exponent, increased sensitivity means a lower threshold. The lower threshold is calculated the same way, except that it is based on the median of the negative values in the derivative multiplied with the raw profile.
Peak maxnumber. Filter for the maximum number of peaks to detect. Set to inf to disable this filter.
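The threshold calculation can be written out as a short Python sketch. The scaling factor 10^(5-(sensitivity/20)) is taken from the description above. Everything else is an assumption for illustration, in particular the use of a simple forward difference as the derivative and the function names.

```python
from statistics import median


def detection_signal(profile):
    """Forward-difference derivative multiplied by the raw profile."""
    return [(profile[i + 1] - profile[i]) * profile[i]
            for i in range(len(profile) - 1)]


def thresholds(signal, sensitivity):
    """Upper and lower thresholds from the medians of positive and negative values."""
    factor = 10 ** (5 - sensitivity / 20)
    upper = median(v for v in signal if v > 0) * factor
    lower = median(v for v in signal if v < 0) * factor
    return upper, lower


signal = detection_signal([0, 1, 4, 9, 4, 1, 0])
upper_100, lower_100 = thresholds(signal, 100)  # factor 10^0 = 1
upper_80, lower_80 = thresholds(signal, 80)     # factor 10^1 = 10
```

Raising the sensitivity from 80 to 100 reduces both thresholds by a factor of ten, so smaller features in the signal can pass as peak starts and ends.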
Fitting of profiles
Peak profiles will be fitted to the detected peaks if the parameter
Fit peaks is selected. There are several types of peak fitting. Each peak will be modelled separately if the method
independent is selected. If not, the program applies a common model for the shape and widths of the profiles. There are different types of peak models that can be applied,
voigt (Gaussian/Lorentzian), and hybrid models where there are different models for the left and right side of the peak.
With normally distributed profiles the program can model three parameters for each peak in addition to the position: left width at half maximum (LWHM), right width at half maximum (RWHM) and kurtosis. A kurtosis value above one means a platykurtic peak and a kurtosis below one means a leptokurtic peak. With Voigt profiles the kurtosis parameter is replaced by the "voigt factor" (vf). A Voigt factor of 1.5 means that the peak is a Cauchy peak, a value of 0.5 means that the peak is Gaussian, and values between these are mixtures of the two profile types. Normal and Voigt profiles with varying kurtosis and Voigt factors are shown in Figure 4.3. The two blue profiles are identical because both represent the normally distributed peak with no kurtosis.
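To make the roles of LWHM, RWHM and the Voigt factor concrete, the following Python sketch evaluates an asymmetric peak of unit height. It is an assumption-laden illustration rather than the model used by the program: the Gaussian and Cauchy parts are mixed linearly (a pseudo-Voigt form) with a Cauchy fraction of vf - 0.5, and the kurtosis parameter for normal profiles is not modelled. The function name is hypothetical.

```python
import math


def peak_shape(mz, center, lwhm, rwhm, vf=0.5):
    """Asymmetric pseudo-Voigt peak with unit height.

    lwhm and rwhm are the left and right widths at half maximum.
    vf = 0.5 gives a pure Gaussian and vf = 1.5 a pure Cauchy peak;
    the linear mixing in between is an assumption of this sketch.
    """
    width = lwhm if mz < center else rwhm
    u = (mz - center) / width
    gauss = math.exp(-math.log(2) * u * u)
    cauchy = 1.0 / (1.0 + u * u)
    cauchy_fraction = vf - 0.5
    return cauchy_fraction * cauchy + (1.0 - cauchy_fraction) * gauss


left_half = peak_shape(760.5 - 0.20, 760.5, lwhm=0.20, rwhm=0.35, vf=1.0)
right_half = peak_shape(760.5 + 0.35, 760.5, lwhm=0.20, rwhm=0.35, vf=1.0)
```

Both values are 0.5 regardless of vf, because each component falls to half height exactly one width away from the centre. Further out the Cauchy component dominates, which gives the heavier tails of peaks with a high Voigt factor.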
The peak profiles are fitted to the raw profile by an iterative method based on the Nelder-Mead algorithm. If the option
Autodetect peak widths is checked, the ranges of initial peak widths are based on the estimates from the peak detection function. Otherwise the limits for peak widths are decided by the parameters
Min RWHM and
Max RWHM if the peaks are allowed to be asymmetric (
Allow asymmetry checked). The corresponding parameters for symmetric peaks are
Min FWHM and
Max FWHM. There are also similar settings for Kurtosis/Voigt factor.
The maximum number of iterations is set by the parameter
Max iterations. There is also a stop criterion, which is a percent value (1% by default). For each iteration the lowest residual is stored. If the development in the lowest residual for the last 100 iterations is lower than 1% of the total range of residuals, the iteration procedure is stopped.
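The stop criterion can be expressed as a small Python function. This is a sketch of one reading of the rule, not the program's code: the "total range of residuals" is interpreted as the difference between the first and the latest stored lowest residual, and the function and argument names are hypothetical.

```python
def should_stop(best_residuals, window=100, criterion_percent=1.0):
    """Decide whether the iteration can be stopped.

    best_residuals[i] is the lowest residual found up to iteration i,
    so the list never increases.
    """
    if len(best_residuals) <= window:
        return False
    total_range = best_residuals[0] - best_residuals[-1]
    recent_change = best_residuals[-1 - window] - best_residuals[-1]
    # "<=" also stops a run in which no improvement was ever made.
    return recent_change <= total_range * criterion_percent / 100.0


converged = [100 - i for i in range(50)] + [50] * 150
still_improving = [1000 - i for i in range(300)]
```

`should_stop(converged)` is true because the lowest residual has not changed over the last 100 iterations, while `should_stop(still_improving)` is false because about a third of the total improvement happened within the window.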
There are four procedures for fitting the peaks. If Independent models are selected, each peak is fitted individually. The position of the peak maximum is then one of the factors that are varied in addition to the peak shape parameters. Since the fitting of a peak may be dependent on the position and shapes of overlapping neighbours, it is beneficial to repeat the entire peak fitting procedure for several rounds. The maximum number of rounds is decided by the Rounds parameter.
If regression or global models are selected, the optimization uses the same peak models on all peaks, but the parameters are allowed to be dependent on mass (Peak width m/z dependence and Kurtosis (vf) m/z dependence). In this case the position of one of the peaks is optimized between each iteration. The number of iterations should therefore be substantially higher than the number of peaks, so that the maximum for each peak is fitted several times.
After the iteration, the statistics for the peaks are calculated. For independent models this is done by regression on all the peaks. For global models the statistics are obtained directly from the model with the lowest residual.
There are different ways of estimating the sizes when profiles are converted to centroids,
Norm area raw,
Norm area prof,
Height raw and
Height prof. The difference between the raw and prof estimates is illustrated in Figure 4.4. The figure shows an estimated peak profile (red) and this profile fitted to raw data acquired with relatively low frequency (blue). Size estimates ending with "raw" are acquired from the fit to the raw profile, and estimates ending with "prof" are estimated from the theoretical distribution. Particularly for the height estimates, there may be substantial differences between the raw and prof estimates, and the prof estimates can be assumed to be the most accurate.
For estimates of areas, there are also variants starting with "Norm". This is the area multiplied with the distance between each point on the m/z scale (0.2 in figure 4.4). These areas will be independent of scan frequency.
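The effect of the normalization can be checked numerically. The Python sketch below samples the same Gaussian peak at two point spacings and compares the plain sum of intensities with the sum multiplied by the spacing. The peak width, the m/z span and the function names are arbitrary choices made for this illustration.

```python
import math


def sample_peak(step, half_width=0.3, span=3.0):
    """Intensities of a unit-height Gaussian sampled every `step` m/z units."""
    n = round(span / step)
    return [math.exp(-math.log(2) * (i * step / half_width) ** 2)
            for i in range(-n, n + 1)]


def areas(intensities, mz_step):
    """Return (area, normalized area) for equally spaced data points."""
    area = sum(intensities)
    return area, area * mz_step


area_coarse, norm_coarse = areas(sample_peak(0.2), 0.2)
area_fine, norm_fine = areas(sample_peak(0.1), 0.1)
```

Halving the spacing doubles the plain area because twice as many points are summed, while the normalized area stays the same, which is why it can be compared between data acquired with different scan frequencies.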
In this tutorial you will use least squares spectral resolution (LSSR) to quantify phospholipids acquired by direct infusion mass spectrometry. The LSSR-P algorithm is an extension of LSSR for profile spectra.
LSSR was initially developed for centroided spectra, where the intensity of each peak is described by a single number. In profile spectra the peaks must be described by vectors and the signals from the different peaks may be overlapping.
The differences between LSSR and LSSR-P are illustrated in Figure 5.1, where Figure 5.1a represents information about a single compound (PC 32:0 in this case) that is available from the library. This is the exact masses and the expected intensity of each mass.
In LSSR-P it is necessary to estimate the profiles of the library spectra and match these to the raw spectrum profile (Figure 5.1b). In LSSR for centroided spectra, the binning of both libraries and raw spectra to the same bins ensures that the mass values match, which is illustrated by the grey boxes in Figure 5.1 a and b.
To calculate the library profiles, it is necessary to know the exact masses and the expected intensity of each mass for the different compounds, which is available from the library. In addition, a model for the peak shape is needed, and this is provided by the Profile Analyzer.
The two algorithms are compared in Figure 5.1d. LSSR-P has one additional step. As illustrated in Figure 5.1b, there can be small mass shifts in the raw data compared to the exact masses provided by the library. Finding these mass shifts is handled by iterative optimization in LSSR-P, using the Nelder-Mead algorithm. The calculated shifts can be equal for all compounds or they can be linearly dependent on the mass. Because LSSR-P is an iterative procedure, it is slower than LSSR. But in most cases it can be expected to be more accurate, because the centroiding and binning that are necessary for ordinary LSSR lead to loss of information.
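The least squares step at the core of this approach can be illustrated with a deliberately small Python sketch. It is not the LSSR-P implementation: the mass shift optimization by Nelder-Mead is left out, the peak model is a symmetric Gaussian, only two library compounds are resolved, and the compounds "A" and "B" with their masses and relative intensities are invented for the example.

```python
import math


def library_profile(sticks, grid, half_width):
    """Sum of Gaussian peaks, one for each (mass, intensity) in the library spectrum."""
    return [sum(inten * math.exp(-math.log(2) * ((mz - mass) / half_width) ** 2)
                for mass, inten in sticks)
            for mz in grid]


def resolve_two(raw, prof_a, prof_b):
    """Least squares amounts of two library profiles (2x2 normal equations)."""
    aa = sum(a * a for a in prof_a)
    bb = sum(b * b for b in prof_b)
    ab = sum(a * b for a, b in zip(prof_a, prof_b))
    ar = sum(a * r for a, r in zip(prof_a, raw))
    br = sum(b * r for b, r in zip(prof_b, raw))
    det = aa * bb - ab * ab
    return (ar * bb - br * ab) / det, (br * aa - ar * ab) / det


grid = [700.0 + 0.1 * i for i in range(61)]
# Hypothetical library spectra whose peaks overlap at m/z 702.
profile_a = library_profile([(701.0, 1.0), (702.0, 0.45)], grid, 0.3)
profile_b = library_profile([(702.0, 1.0), (703.0, 0.45)], grid, 0.3)
# Synthetic raw spectrum: 2 parts of A plus 5 parts of B.
raw = [2.0 * a + 5.0 * b for a, b in zip(profile_a, profile_b)]
amount_a, amount_b = resolve_two(raw, profile_a, profile_b)
```

The calculation returns amounts of 2 and 5 even though the two compounds share the peak at m/z 702, because the fit uses the whole profile of each compound. In LSSR-P this step would be repeated inside the optimization of the mass shift.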
The data to be analyzed are five samples where the first four are egg PC, egg sphingomyelin (SM), bovine brain PC and bovine brain sphingomyelin, and the last sample is a mixture of the first four. The data are acquired in direct infusion mode by a low resolution triple quadrupole instrument.
The data set is stored as mz5 raw data, which must be read by the function for importing chromatograms.
[Import Chrom] in the main window
\Tutorial-5. Thereafter select all 5 files and press the
[Load Sel.] button. Press
[Accept] when the data have been loaded.
TIC option, so that only one file is shown. Ensure that this is the first file, named
LR-Prof_A_PC_Brain, which contains ordinary brain PC.
Range by 20% thresh. Thereafter press
Analyze profile. This will take you to the Profile Analyzer.
[Analyze] without changing any parameters. Select
Peak statistics as plot option and check that there is no pattern indicating that the residuals are mass dependent. If they show mass dependence, you should activate
Peak width m/z dependence or
Kurtosis (vf) m/z dependence. In this case it is not necessary. Choose
Profile as plot option again.
What is important in this case is that you get a good general estimate of the peak shapes. Whether there are some residuals on some of the peaks is less important. There is also no need to detect all peaks since you only need enough peaks to get good general models.
alternating global and thereafter press
[Analyze] again. You can see the equations (baseline, LWHM, RWHM, kurtosis) for the global model by selecting
Spectrum data as plot option. Leave the Profile Analyzer by pressing
[Accept], which will transfer the equations to the data file.
The next thing you have to do is to generate or import a library. As illustrated in Figure 5.1d, LSSR-P does not apply the binned libraries. You can therefore use the same library as in Tutorial-4, even though this is binned to unit resolution with an offset.
[Library] button in the main window. In the library window, select
Tutorial-4-5 next to the
[Load Libr.] button and press the button. Leave the library window by pressing
[Accept], which will transfer the library to the memory.
[LSSR-P Win] button, which will take you to the LSSR-P window.
[Calc All] button, which will show you the results.
For more advanced usage you can step through the different processes illustrated in Figure 5.1 by the
[Get Cand] (get candidates),
[Filter Cand] (filtering),
[Fit] (determination of mass shifts) and
[Quantify] (quantification) buttons. In each step you can make changes in the "Candidates", "Filtered" and "Results" tables. You may for instance replace compounds in the filtered table with compounds with similar mass to see if it reduces the residuals.
The results plots for the different samples should look approximately as in Figure 5.2. Verify that the two samples of ordinary PC contain basically PC, and that the two samples of SM contain basically SM (marked SMCi or SPCF). In the mixture (Figure 5.2e) you should find all the major compounds.
The same samples have also been analyzed by direct infusion on a time-of-flight (TOF) instrument. In this case you read the spectra by the
[Import Spec] function.
JEOL jpf as the file type, thereafter select
\Tutorial-5, select all 5 files in the folder and press
[Accept] when the spectra are imported.
HR-Prof_A_PC_Brain, is selected, right click on the spectrum and select
In this case the profiles are not well described by variants of the normal distribution. If you zoom on the largest peaks you will see that the peaks are asymmetric and tailing. The correct peak model to use in this case is the
normal/voigt, where the left side of the peak is normally distributed and the right side of the peak has a Voigt distribution.
normal/voigt as the model to use. You can run a first test with
independent as method and check if there is any mass dependence. There should be no clear mass dependence, so you can also directly select
alternating global before you press
[Analyze] and leave the window with
You can use the same library as for the low resolution data. Load the library as explained in Section 5.2 if it is not already in the memory.
In this case there is a quite large mass shift compared to the width of the peaks.
[Library] button, check the
Show Libr. Bp option to show where the base peaks of the library spectra are located. You should then get a plot similar to Figure 5.3 a.
The difference between the exact mass and the peak maximum is therefore −0.127 (760.585 − 760.712), which is similar to FWHM (approx 0.11). The parameter
m/z shift window sets a max limit for m/z shift in percent of the peak width, and the default is 50%. You can increase this value, but it is usually better to manually adjust the m/z scale by the
m/z offset parameter.
m/z offset to the difference between the exact and measured mass, −0.127. This should shift the mass spectrum to fit better with the library data. It should then look approximately like in Figure 5.3 b, where the blue curve is the shifted profile and the grey curve is the original profile.
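The arithmetic behind this adjustment can be checked with a few lines. The sketch assumes that "peak width" in the shift window setting means FWHM; the variable names are made up for the example.

```python
exact_mz = 760.585           # base peak of the library spectrum
measured_mz = 760.712        # peak maximum in the raw profile
fwhm = 0.11                  # approximate peak width
shift_window_percent = 50.0  # default m/z shift window

offset = exact_mz - measured_mz
max_auto_shift = fwhm * shift_window_percent / 100.0
needs_manual_offset = abs(offset) > max_auto_shift

print(round(offset, 3))          # -0.127
print(round(max_auto_shift, 3))  # 0.055
print(needs_manual_offset)       # True
```

Under this assumption the observed shift is more than twice the largest shift the default window allows, which is why the offset is entered manually instead of widening the window.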
[Calc All] button to get the results.
You can analyze the remaining samples with the same procedure. The mass offset of −0.127 can be used for all samples. The results should be similar to those shown in Figure 5.4. There are small differences between the two instruments even though there is a large degree of overlap in the low resolution spectra. However, in general the high resolution instrument detects a few more compounds, and it will usually be better in cases where samples are very complex.
In this tutorial you will apply the peak models from the Profile Analyzer to determine the mass of unknowns. The tutorial has two sections, one for low resolution and one for high resolution data.
The data are six replicates of brain phosphocholine analysed by direct infusion on a triple quadrupole instrument. It is the same sample as applied in Tutorial 5. We therefore know that the three most abundant compounds are PC 34:1, PC 36:1 and PC 36:2. However, the sample is now spiked with the saturated PCs 24:0, 28:0, 32:0, 36:0, 40:0, 44:0, and 48:0. These will be used as reference series for the mass calibration.
\Tutorial-6. Select all the six samples and press
[Load Sel]. Return to the main window by pressing
TIC as display option.
You will now have to apply the Profile Analyzer on all six data files in the same way as you did in Tutorial 5. The procedure for the first file is:
Range by 20% thresh.
You should now be in the Profile Analyzer.
[Analyze] once to ensure that you are using the right peak model. There should be almost no residuals.
alternating global and press
[Analyze] again, since it is critical in this case to apply the same models on all peaks.
Repeat the steps above for the remaining 5 data files.
[Mass Calibr.] button in the main window. This will take you to the mass calibration window.
Spectra: [M+H]+ is selected.
You should now have a table showing the accurate masses for the library compounds, and there will be vertical grey lines in the spectrum indicating the masses of the two largest peaks from each library compound. The next step is to match these masses from the library to the peaks in the spectrum.
[Match All], and all the grey lines should turn pink, indicating that they have been matched to a peak. The plot of the spectrum should look like in Figure 6.1, except for the three red arrows that indicate which peaks to work with later. You can use the
[>] buttons to go through the spectra and check that all peaks are correctly matched.
Match plot at the bottom of the table you will get a plot that explains the difference between the library masses and the matched peaks. In this case it will vary from approximately 0.16 to 0.23 mass units. Closed circles are the values for the current spectrum. Open circles are the values for the other replicates. This plot should look like Figure 6.2.
The next step is to find a regression model that explains the differences. This function will later be used to calculate the accurate masses of the other peaks in the spectrum. The plot shows that the differences are dependent on the mass and that the dependence is close to linear. We should therefore apply a linear regression model.
Polynom ord. 1 in the field under the
[Regression] button and also that
Robust regression is checked. Thereafter press [Regr. All].
Two regression lines will now be shown in the plot. The black is the regression model for the current spectrum. The grey is the regression line based on all the other replicates (the open circles).
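The calibration idea can be sketched with a simple robust line fit. The manual does not state which robust method Chrombox D uses, so the Theil-Sen estimator below is a stand-in. The reference masses are calculated monoisotopic [M+H]+ values for the saturated PCs, while the differences are invented numbers in the 0.16 to 0.23 range mentioned above. The sign convention (matched peak minus library mass) is also an assumption.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Calculated [M+H]+ of PC 24:0, 28:0, 32:0, 36:0, 40:0, 44:0 and 48:0.
library_mz = [622.4442, 678.5068, 734.5694, 790.6320, 846.6946, 902.7572, 958.8198]
# Hypothetical differences (matched peak minus library mass), for illustration only.
differences = [0.162, 0.171, 0.185, 0.192, 0.206, 0.214, 0.227]

matched_mz = [m + d for m, d in zip(library_mz, differences)]
slope, intercept = theil_sen(matched_mz, differences)

def predict_mz(fitted_mz):
    """Subtract the modelled difference from a fitted peak position."""
    return fitted_mz - (slope * fitted_mz + intercept)

print(round(predict_mz(760.7626), 4))
```

With real data the fitted peak positions of the reference compounds replace the invented differences, and the same line is then applied to the peaks of the unknowns.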
[Prediction], and a table of predicted masses will be shown.
We know from Tutorial 5 that the three most abundant compounds in the sample are PC 34:1, PC 36:1 and PC 36:2. The base peaks of these are indicated by the red arrows in Figure 6.1. We should now assume that these are unknown compounds and check the predicted masses versus the Lipid Maps database.
|Raw m/z, [M+H]+||760.8|
|fitted m/z, [M+H]+||760.7626|
|Predicted m/z, [M+H]+ (internal calibr.)||760.5751|
|Predicted m/z, [M+H]+ (external calibr.)||760.5729|
|Predicted neutral mass (internal calibr.)||759.5678|
|Predicted neutral mass (external calibr.)||759.5656|
|Internal calibr. RMSE||0.0095|
|Internal calibr. median residual||0.0066|
|External calibr. RMSE||0.011|
|External calibr. median residual||0.0075|
|Mean neutral mass (internal calibr.)||759.572|
|Median neutral mass (internal calibr.)||759.5739|
|Number of samples, consensus estimates (internal calibr.)||5|
|Mean neutral mass (external calibr.)||759.5752|
|Median neutral mass (external calibr.)||759.5755|
|Number of samples, consensus estimates (external calibr.)||6|
There are several estimates for the mass. You can apply the
Median neutral mass based on internal calibration that is shown with bold text. This is the median of all six replicates, so it is enough to acquire the statistics from a single sample. The two error estimates show how the predictions vary between the replicates. They indicate that the error may be slightly lower than 0.01 mass units.
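The relation between the predicted [M+H]+ values and the neutral masses in the table can be checked as follows. The sketch assumes that the neutral mass is obtained by subtracting the mass of a proton from the m/z of the singly charged ion; `neutral_mass` is a hypothetical helper.

```python
PROTON_MASS = 1.007276  # mass of a proton in unified atomic mass units

def neutral_mass(mz_m_plus_h):
    """Neutral monoisotopic mass from the m/z of a singly charged [M+H]+ ion."""
    return mz_m_plus_h - PROTON_MASS

external = neutral_mass(760.5729)  # predicted m/z, external calibration
print(round(external, 4))          # 759.5656
```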
In the following web page you can search the Lipid Maps database by mass:
Mass (m/z). Set
M(neutral) and mass tolerance to
+/- 0.01 and press the
[submit] button on the web page. You will see that all the alternatives are variants of PC 34:1 with molecular formula C42H82NO8P. So the accuracy is good enough to identify this as PC 34:1 as long as you can assume that it is a lipid.
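The tolerance check performed by such a search can be reproduced offline. The sketch below calculates the monoisotopic mass from an elemental formula and compares it with the median neutral mass from the table above. It assumes the composition C42H82NO8P for PC 34:1, and the helper names are hypothetical.

```python
import re

# Monoisotopic masses of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}

def monoisotopic_mass(formula):
    """Sum the isotope masses for a simple formula such as 'C42H82NO8P'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def within_tolerance(predicted, candidate, tolerance=0.01):
    """True if the candidate mass lies inside predicted +/- tolerance."""
    return abs(predicted - candidate) <= tolerance

predicted = 759.5739  # median neutral mass, internal calibration
candidate = monoisotopic_mass("C42H82NO8P")
print(round(candidate, 4), within_tolerance(predicted, candidate))
```

Lowering `tolerance` in this sketch shows how much margin the prediction has before the correct formula is excluded.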
The accurate masses from Lipid Maps and the predictions for the three most abundant compounds are compared in Table 6.2. None of the errors were above 0.01 mass units and none of the compounds have any interference within this mass tolerance. It is worth noting that the errors are much smaller than the digital data resolution of the instrument, which is 0.1 mass units.
You can play around with the mass tolerances to see when interferents (compounds with different gross formula) appear among the alternatives.
|Accurate mass||Predicted mass||Error|
The high resolution data are from a time-of-flight instrument. The samples are exactly the same as used with low resolution.
[Import Spec] button, set
JEOL jpf, select
\Tutorial-6, select all six files, press
[Load Sel] and exit with
alternating global. These are similar settings to those used in Tutorial 5.
[Accept] to transfer the models to the data file
[Mass Calibr.] button that opens the mass calibration window.
The procedure to analyze the high resolution data is the same as used before.
If you inspect the difference plot (Fig. 6.3) you will see that there is a tendency to curvature in the data, so it may be beneficial to change the model to
Polynom ord. 2. If you step through the replicates using the
[>] buttons you will also see that there is a tendency to drift over time.
If you click on the peak numbers and compare predictions to the correct values for the same three compounds analyzed with the low resolution data you should get values similar to those in Table 6.3. The average error is approximately 1/3 of what it was with the low resolution instrument. The FWHM for the high resolution instrument was 0.10, while it was 0.65 for the low resolution instrument, which means that the peaks were more than 6 times wider on the low resolution instrument. There is therefore no direct link between resolution and the expected accuracy of the masses after calibration.
|Accurate mass||Predicted mass||Error|
|CodeCore||Explanation / short Example|
|F||Free fatty acid / FFA|
|G||Mono-acylglycerol / MAG|
|G||Di-acylglycerol / DAG|
|G||Tri-acylglycerol / TAG|
|GP||Lyso-phosphatidic acid / LPA|
|GP||Phosphatidic acid / PA|
|GPC||Lyso-phosphatidylcholine / LPC|
|GPC||Phosphatidylcholine / PC|
|GPC[1o]||Plasmanyl-lyso-phosphatidylcholine / LPCo|
|GPC[2o]||Plasmanyl-phosphatidylcholine / PCo|
|GPC[1p]||Plasmenyl-lyso-phosphatidylcholine (LPC plasmalogen) / LPCp|
|GPC[2p]||Plasmenyl-phosphatidylcholine (PC plasmalogen) / PCp|
|GPE||Lyso-phosphatidylethanolamine / LPE|
|GPE||Phosphatidylethanolamine / PE|
|GPE[1o]||Plasmanyl-lyso-phosphatidylethanolamine / LPEo|
|GPE[2o]||Plasmanyl-phosphatidylethanolamine / PEo|
|GPE[1p]||Plasmenyl-lyso-phosphatidylethanolamine / LPEp|
|GPE[2p]||Plasmenyl-phosphatidylethanolamine (PE plasmalogen) / PEp|
|GPI||Lyso-phosphatidylinositol / LPI|
|GPI||Phosphatidylinositol / PI|
|GPI[1o]||Plasmanyl-lyso-phosphatidylinositol / LPIo|
|GPI[2o]||Plasmanyl-phosphatidylinositol / PIo|
|GPS||Lyso-phosphatidylserine / LPS|
|GPS||Phosphatidylserine / PS|
|GPS[1o]||Plasmanyl-lyso-phosphatidylserine / LPSo|
|GPS[2o]||Plasmanyl-phosphatidylserine / PSo|
|G3P2||Cardiolipin / CL|
|SF||Ceramide / Cer (Sphingosine base)|
|SF||Ceramide / Cer (Phytosphingosine base)|
|SPCF||PC Ceramide / SMC (Sphingomyelin, sphingosine base)|
|SPCF||PC Ceramide / SMC (Sphingomyelin, phytosphingosine base)|
|SPEF||PE Ceramide / SME|
|SPIF||PI Ceramide (IPC) / SMI (Sphinganine base)|
|SPIF||PI Ceramide (IPC) / SMI (Phytosphingosine base)|
|SMIPF||Mannosylinositol phosphoceramide / MIPC (Sphinganine base)|
|SMIPF||Mannosylinositol phosphoceramide / MIPC (N,N-dimethyl-Safingol base)|
|SMI2P2F||Inositol phosphomannosylinositol phosphoceramide / MIP2C|
|SGF||Galactosyl ceramide / GCer (Sphinganine base)|
|SGF||Galactosyl ceramide / GCer (Sphingosine base)|
|PAF[1o]||Platelet activation factor / PAF|