Papers by Karoly Heberger
The authors have completed a meta-analysis of gene lists using simulated and real data. Many studies address the same or similar questions and can therefore be combined in a meta-analysis to find a consensus or a more reliable answer. The selected performance parameter (merit) was the coverage rate.
Medical Applications of Mass Spectrometry, 2008
Publisher Summary: This chapter provides an introduction to chemoinformatics covering multivariate mathematical–statistical methods for data evaluation. It is expedient to distinguish variables on the basis of three scales: nominal, ordinal, and numeric. Nominal scales are qualitative only: items can be measured solely in terms of whether they belong to distinctly different categories. Ordinal scales are also qualitative, but they can rank (order) the items in terms of which has less and which has more of the quality represented by the variable. The numeric scale is quantitative in nature. There are numerous applications in which chemoinformatics is used to evaluate the results of instrumental–analytical methods such as mass spectrometry, NMR spectroscopy, and chromatography. Combinations involving NMR spectroscopy and chromatography have many applications in the biomedical field. On the other hand, although chemometric techniques are frequently applied to analyze mass spectral data, applications in the biomedical field are rare. Multivariate data analysis is applied to mass spectrometry especially to reveal relationships between mass spectral data and chemical structure. Pyrolysis mass spectrometry and chemometrics have been coupled to quantify the adulteration of orange juice, to test the authenticity of honey, and to discriminate between unfractionated plant extracts.
Food Quality and Preference, 2015
In product development using JAR (just-about-right) scales, it is important to identify precisely which direction of a given attribute affects hedonic scores the most. The Generalized Pairwise Correlation Method (GPCM) is non-parametric and is useful for ranking JAR variables according to their impact on liking. This is done using appropriate statistical tests: McNemar's test, the chi-square test, the conditional Fisher's test, and Williams' t-test. As GPCM requires one-directional variables, JAR data need to be transformed using the dummy variable approach. GPCM lists, in order of impact, the attributes that should be increased or decreased to gain higher consumer liking scores; this order also sets priorities for the development of product attributes. The non-parametric tests incorporated in the method are able to identify smaller differences than other statistical methods. As a result, GPCM identifies more significant product attributes and can therefore help product development even where other methods cannot.
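As an illustration of one of the tests named above, the following is a minimal sketch of the exact (binomial) form of McNemar's test. It is not the GPCM procedure itself: how GPCM forms its paired counts is not described in the abstract, so the two discordant-pair counts `b` and `c` and the function name `mcnemar_exact` are hypothetical inputs chosen for illustration.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b and c are assumed to be the numbers of pairs that disagree in one
    direction and in the other; concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under H0 the discordant pairs split 50:50, so b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail for a two-sided p-value, capped at 1.
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(mcnemar_exact(1, 9))  # strongly unbalanced discordant pairs
    print(mcnemar_exact(5, 5))  # perfectly balanced discordant pairs
```

A small p-value suggests that the two directions are not equally frequent among the discordant pairs; the exact form is used here because it stays valid for small counts.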
Journal of High Resolution Chromatography, 1998
Table of contents*: Reasons for changes in the participants relative to the plan; The work carried out meets the original objectives; Final report broken down by topic; Classification of HPLC columns and chromatographic systems; Search for gas chromatographic quantitative structure-retention relationships (QSRR); Development of new variable selection methods; Other structure-activity studies of an environmental and practical nature; Remarks on the programme in the original contract; Economic applicability of the results; Brief summary of the research activity (in Hungarian); Brief summary of the research activity (in English); Appendix: list of reprints for the period (2002-2004). * Since, even after repeated attempts, I could not enter or paste text into the individual fields, I was forced to integrate the answers to the individual questions into a single text document. Reasons for changes in the participants relative to the plan: The original contract named two doctoral students (Farkas Orsolya and Stadler Krisztián). I already reported in the first-year interim report that Vanyúr Rozália had been brought into the work in place of Stadler Krisztián. Unfortunately, she did not stay permanently either. Today neither of them works at the institute. Stadler Krisztián left for America; Vanyúr Rozália has not worked in our group since August 2003 and later left the institute as well. I therefore asked Forlay-Frick Péter, who was then on the staff of the University of Brussels, to carry out the calculations, which he did with great pleasure. I note that during the 4-year period 7 PhD students passed through our group (and 3 prepared diploma theses). Of the four who started work in the first year, only one still works with us today; one is on maternity leave.
The work carried out meets the original objectives: It should be noted that we made significant progress over the four years and therefore substantially broadened the goals to be achieved, partly on the assumption that this would make it easier to obtain funding for a continuation of the grant. It is perhaps not superfluous to mention that we were bitterly disappointed in our expectations. I would like to emphasize, however, that we did not start working on every possible and impossible problem, nor did I put my name on every paper I could. This can easily be checked by comparing my list of reprints with the list of papers submitted with the OTKA report (see appendix). We always kept in view the original OTKA objective: the search for quantitative structure-activity relationships using new chemometric methods. It should also be noted that the original contract ran for 3 years and had to be extended by one year because of a stay abroad. We received permission for the extension. We carried out all the analyses in the work plan and also wrote papers on the planned topics (more than one on some of them). Admittedly, not all of them have been published. Final report broken down by topic: The material is grouped by topic, not by the scientific significance of the results achieved or in chronological order.
A data set containing acute toxicity values (96-h LC50) of 69 substituted benzenes for fathead minnow (Pimephales promelas) was investigated with two Quantitative Structure-Activity Relationship (QSAR) models, one that does not use molecular descriptors and one that does. Recursive Neural Networks (RNN) derive a QSAR by direct treatment of the molecular structure, described through an appropriate graphical tool (variable-size labeled rooted ordered trees) by defining suitable representation rules. The input trees are encoded by an adaptive process able to learn, by tuning its free parameters, from a given set of structure-activity training examples. Owing to the use of a flexible encoding approach, the model is target invariant and does not need a priori definition of molecular descriptors. The results obtained in this study were analyzed together with those of a model based on molecular descriptors, i.e. a Multiple Linear Regression (MLR) model using CROatian MultiRegression selection of descriptors (CROMRsel). The comparison revealed interesting similarities that could lead to the development of a combined approach, exploiting the complementary characteristics of the two approaches.
Journal of Pharmaceutical Sciences, 2013
The surface properties of hybrid materials (potential carriers for sustained release of active agents) have been examined by inverse gas chromatography (IGC). A nonsteroidal anti-inflammatory agent, ibuprofen, was used as a model active compound. The following parameters were used to characterize the interactions between the constituents of the hybrid material and the active agent: the dispersive component of the surface free energy, γ_S^D; the K_A and K_D parameters describing acidity and basicity, respectively; and the Flory-Huggins parameter χ'_23 (the magnitude of interactions). Principal component analysis (PCA) and the procedure based on the sum of ranking differences (SRD) were applied to select hybrid materials and the parameters for characterizing them. The one loose cluster found by PCA grouping of the hybrid materials is refined by SRD analysis: SRD grouping indicates three groups with somewhat dissimilar properties.
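The core SRD computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each column (for example, one parameter or one material) is ranked over the same objects, a reference column is ranked the same way, and the absolute rank differences are summed. The reference is assumed to be supplied by the caller (a row-wise average is one common choice), tied values are assumed to share their mean rank, and the names `average_ranks` and `sum_of_ranking_differences` are hypothetical. Validation of SRD values against random rankings is not shown.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def sum_of_ranking_differences(columns, reference):
    """Return {column name: SRD} against the ranking of the reference column."""
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for name, col in columns.items()
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    columns = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
    print(sum_of_ranking_differences(columns, reference=[1, 2, 3, 4]))
```

A smaller SRD means the column orders the objects more like the reference does; columns with similar SRD values can then be grouped.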
Journal of Chromatography A, 2007
The solubility parameter (δ_2), the corrected solubility parameter (δ_T) and its components (δ_d, δ_p, δ_h) were determined for a series of pharmaceutical excipients by inverse gas chromatography (IGC). Principal component analysis (PCA) was applied to select the solubility parameters that ensure complete characterization of the examined materials. The PCA suggests that a complete description of the examined materials is achieved with four solubility parameters, i.e. δ_2 and the Hansen solubility parameters (δ_d, δ_p, δ_h). Selecting excipients through PCA of their solubility parameter data can be used to predict their behavior in a multi-component system, e.g. to select the best materials for forming stable pharmaceutical liquid mixtures or a stable coating formulation.
Journal of Chromatography A, 1999
Principal component analysis was performed on a data matrix consisting of Kováts indices of 35 aliphatic ketones and aldehydes. The calculations were carried out on the correlation matrices of Kováts indices. The Kováts indices were determined on capillary columns with four different stationary phases, namely bonded methyl- (HP-1), methylphenyl- (HP-50), and trifluoropropylmethylsiloxane (DB-210), as well as polyethylene glycol (HP-Innowax) at
Journal of Chromatography A, 2007
Data analysis has become a fundamental task in analytical chemistry due to the great quantity of analytical information provided by modern analytical instruments. Supervised pattern recognition aims to establish a classification model based on experimental data in order to assign unknown samples to a previously defined sample class based on their pattern of measured features. The basics of the supervised pattern recognition techniques most used in food analysis are reviewed, with special emphasis on the practical requirements of the measured data and a discussion of common misconceptions and errors that might arise. Applications of supervised pattern recognition in food chemistry published in the literature in the last two years are also reviewed.
Journal of Chromatography A, 2002
Journal of Chemometrics, 2014
Classification is an important part of chemometrics and is mostly based on optimization by vector rotations. The present study is a continuation of the classification of medieval Hungarian silver coins covering the 16 kings of the Hungarian Árpád Dynasty (997 AD-1301 AD) (Rácz et al.: Heritage Science 2013 1:2). The Rácz et al. paper identified three historical periods of the Árpád Dynasty from chemical data. The aim of the present study is to test whether the classification can be further refined by marker object projection aided classification. It offers an example of the efficiency of this method in unscrambling a class-inside-class situation. The frequency distributions of the concentrations in the coins are skewed and to a certain extent bimodal, so the arithmetic mean and the standard deviation around the mean, frequently used in parametric methods, may be poor descriptors of the information carried by the data. We test a combination of principal components decomposition and the nonparametric, non-iterative object target rotation method to overcome some of the theoretical limitations of parametric methods. This test includes identification of archetypical class "Ambassadors" of each of the three historical periods of the Árpád Dynasty and shows a class-inside-class situation.
Journal of Chemical Information and Computer Sciences, 2003
Journal of Chemical Information and Modeling, 2005
A quantitative structure-retention relationship (QSRR) study based on multiple linear regression (MLR) was performed for the description and prediction of Kováts retention indices (RI) of alcohols. The alcohols were saturated, linear or branched, and contained a hydroxyl group on a primary, secondary or tertiary carbon atom. Constitutive and weighted holistic invariant molecular (WHIM) descriptors were used to represent the structure of the alcohols in the MLR models. Before model building, five variable selection methods were applied to select the most relevant variables from a large set of descriptors. The selected molecular properties were included in the MLR models. The efficiency of the variable selection methods was also compared. The selection methods were as follows: ridge regression (RR), partial least-squares method (PLS), pair-correlation method (PCM), forward selection (FS) and best subset selection (BSS). The stability and validity of the MLR models were tested by leave-n-out cross-validation. Variables selected by RR or PLS were not able to describe the Kováts retention index properly, and PCM gave reliable results for description but not for prediction. We built models with good predictive ability using FS and BSS as selection methods. The most relevant variables in the description and prediction of RIs were the mean electrotopological state index, the molecular mass, and WHIM indices characterizing size and shape.
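The leave-n-out validation mentioned above can be sketched as follows. This is a minimal illustration, assuming contiguous, non-overlapping blocks of n samples are left out in turn; the abstract does not say how the left-out groups were formed. The names `leave_n_out_splits` and `cross_validated_press`, and the caller-supplied `fit` function (which returns a predictor), are hypothetical. A trivial mean predictor stands in for the MLR model to keep the sketch dependency-free.

```python
def leave_n_out_splits(n_samples, n_out):
    """Yield (train, test) index lists, leaving out n_out samples at a time."""
    indices = list(range(n_samples))
    for start in range(0, n_samples, n_out):
        test = indices[start:start + n_out]
        train = indices[:start] + indices[start + n_out:]
        yield train, test


def cross_validated_press(x, y, n_out, fit):
    """Predictive residual sum of squares (PRESS) over all left-out blocks.

    fit(x_train, y_train) must return a callable that predicts y from one x.
    """
    press = 0.0
    for train, test in leave_n_out_splits(len(y), n_out):
        model = fit([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - model(x[i])) ** 2 for i in test)
    return press


def fit_mean(x_train, y_train):
    """Stand-in model: always predicts the mean of the training responses."""
    mean = sum(y_train) / len(y_train)
    return lambda _x: mean


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 2.0, 3.0, 4.0]
    print(cross_validated_press(x, y, n_out=2, fit=fit_mean))
```

A model whose PRESS is close to its ordinary residual sum of squares predicts left-out samples about as well as it describes the training data, which is the distinction between description and prediction drawn in the abstract.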
Journal of Agricultural and Food Chemistry, 2010
¹H NMR fingerprints of virgin olive oils (VOOs) from the Mediterranean basin (three harvests) were analyzed by principal component analysis, linear discriminant analysis (LDA), and partial least-squares discriminant analysis (PLS-DA) to determine their geographical origin at the national, regional, or PDO level. Further δ¹³C and δ²H measurements were performed by isotope ratio mass spectrometry (IRMS). LDA and PLS-DA achieved consistent results for the characterization of PDO Riviera Ligure VOOs. PLS-DA afforded the best model: for the Liguria class, 92% of the oils were correctly classified in the modeling step, and 88% of the oils were properly predicted in the external validation; for the non-Liguria class, the corresponding rates were 90% and 86%. A stable and robust PLS-DA model was obtained to authenticate VOOs from Sicily: the recognition abilities were 98% for Sicilian oils and 89% for non-Sicilian ones, and the prediction abilities were 93% and 86%, respectively. More than 85% of the oils of both categories were properly predicted in the external validation. Greek and non-Greek VOOs were properly classified by PLS-DA: >90% of the samples were correctly predicted in the cross-validation and external validation. Stable isotopes provided complementary geographical information to the ¹H NMR fingerprints of the VOOs.
Journal of Agricultural and Food Chemistry, 2003
Principal component analysis (PCA) and linear discriminant analysis (LDA) were used to classify 187 Hungarian white and red wines according to wine-making technology, geographic origin (wine-making region), grape variety, and year of vintage based on free amino acid and biogenic amine contents. Free amino acids and biogenic amines were determined by ion-exchange chromatography. Six principal components accounted for >77% of the total variance in the data. The plots of component loadings showed significant groupings of free amino acids and biogenic amines. The component scores grouped the wines according to wine-making technology. Using LDA, the variables with a major discriminant capacity were determined. Almost complete classification (94.7%) was achieved for white versus red wines and for wines made by different wine-making technologies. The results of differentiation between white wines according to geographic origin, grape variety, and year of vintage were 70.8, 62.4, and 73.5%, respectively. The corresponding figures for red wines were 64.9, 71.6, and 82.4%, respectively.
Chromatographia, 1988
Summary: Relative retention data and Kováts retention indices were measured for several hydrocarbons (mainly alkylbenzenes) on dinonyl phthalate and polyethylene glycol 4000 stationary phases. Correlations were sought between these retention data and physical (boiling point, molar refraction, molar volume) and topological (connectivity index and general index of molecular complexity) properties of the solutes. The best-fitting equation was chosen from among more than 150
Chromatographia, 1998
Chemosphere, 2006
The salting-out effects of 27 lithium, sodium, potassium, ammonium and magnesium salts and HCl on chloroform, benzene, chlorobenzene and anisole were characterized in aqueous solutions at 303 K by measuring the Henry's law constants. The concentration of the salt solutions was 0.5 mol dm⁻³, i.e., similar to the salinity of sea water. The solubility change was described in terms of the Setschenow constant, K_S(salt, solute). The highest salting-out effects were observed for solutions of salts involving doubly charged anions, and the smallest for NO₃⁻. The individual ionic Setschenow constants, K_S(cation, solute) and K_S(anion, solute), were determined by multilinear regression, assuming additivity for the ions. Cl⁻ was selected as the reference ion for calculation of the K_S(ion, solute) values of the other ions. The estimations systematically gave significant positive K_S(cation, solute) values, ranging from 0.13 ± 0.026 (NH₄⁺) to 0.28 ± 0.032 (Mg²⁺), which were hardly affected by the accompanying anion in solution and only slightly affected by the non-electrolytes present. NO₃⁻ showed a slight salting-in effect: K_S(NO₃⁻, solute) = -0.083 ± 0.019; the other anions displayed a salting-out effect for all of the non-electrolytes studied, with K_S(anion, solute) ranging between 0.090 ± 0.008 (HCO₃⁻) and 0.21 ± 0.035 (CO₃²⁻).
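How a Setschenow constant follows from two Henry's law constants can be sketched as below. This is a minimal illustration, not the authors' procedure. It assumes the Setschenow relation log10(S0/S) = K_S · c_salt and a Henry's law constant in the volatility form (gas-phase over aqueous-phase concentration), so that solubility is inversely proportional to H and S0/S = H_salt/H_water. The function name `setschenow_constant` and the numbers in the usage example are hypothetical.

```python
from math import log10


def setschenow_constant(h_salt: float, h_water: float, salt_conc: float) -> float:
    """K_S from Henry's law constants in salt solution and in pure water.

    Assumes H is in the volatility form, so a larger H in the salt
    solution (lower solubility) gives a positive K_S, i.e. salting-out.
    salt_conc is the salt concentration in mol dm^-3.
    """
    return log10(h_salt / h_water) / salt_conc


if __name__ == "__main__":
    # Made-up values: H rises by 20% in a 0.5 mol dm^-3 salt solution.
    print(setschenow_constant(h_salt=1.2, h_water=1.0, salt_conc=0.5))
```

With this sign convention a positive result indicates salting-out and a negative result salting-in, matching the way the signs are read in the abstract.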
Chemometrics and Intelligent Laboratory Systems, 2001
The pair-correlation method (PCM) has been developed recently for discrimination between two variables. PCM can be used to identify the decisive (fundamental, basic) factor from among correlated variables even in cases when all other statistical criteria fail to indicate a significant difference. These decisions are needed frequently in QSAR studies and/or chemical model building. The conditional Fisher's exact test, based on testing significance in 2 × 2 contingency tables, is a suitable selection criterion for PCM. The test statistic provides a probabilistic aid for accepting the hypothesis of significant differences between two factors that are almost equally correlated with the response (dependent variable). Differentiating between factors can lead to alternative models at any arbitrary significance level. The power function of the test statistic has also been deduced theoretically. A similar derivation was undertaken to describe the influence of Type I (false-positive conclusion, error of the first kind) and Type II (false-negative conclusion, error of the second kind) errors. The appropriate decision is indicated by low probability levels of both false conclusions.
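Fisher's exact test on a 2 × 2 table can be sketched as follows. This is a minimal illustration of the general test only: how PCM fills the four cells is not described in the abstract, so the cell counts `a`, `b`, `c`, `d` and the function name `fisher_exact_two_sided` are hypothetical. The two-sided p-value is assumed to be the sum of the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so equally probable tables are always included.
    total = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(fisher_exact_two_sided(3, 1, 1, 3))
    print(fisher_exact_two_sided(4, 0, 0, 4))
```

Because the test conditions on the table margins, it remains valid for the small counts that arise when only a few data points distinguish two nearly equally correlated factors.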