
Using data mining for wine quality assessment

2009


Using Data Mining for Wine Quality Assessment

Paulo Cortez 1, Juliana Teixeira 1, António Cerdeira 2, Fernando Almeida 2, Telmo Matos 2, and José Reis 1,2

1 Dep. of Information Systems/Algoritmi Centre, University of Minho, 4800-058 Guimarães, Portugal, [email protected], WWW home page: http://www3.dsi.uminho.pt/pcortez
2 Viticulture Commission of the Vinho Verde region (CVRVV), 4050-501 Porto, Portugal

Abstract. Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g. alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered, with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.

Keywords: Ordinal Regression, Sensitivity Analysis, Sensory Preferences, Support Vector Machines, Variable and Model Selection, Wine Science.

1 Introduction

Nowadays wine is increasingly enjoyed by a wider range of consumers. In particular, Portugal is a top-ten wine exporting country, and exports of its vinho verde wine (from the northwest region) increased by 36% from 1997 to 2007 [7]. To support this growth, the industry is investing in new technologies for both wine making and selling processes. Wine certification and quality assessment are key elements within this context. Certification prevents the illegal adulteration of wines (to safeguard human health) and assures quality for the wine market. Quality evaluation is often part of the certification process and can be used to improve wine making (by identifying the most influential factors) and to stratify wines such as premium brands (useful for setting prices).

Wine certification is often assessed by physicochemical and sensory tests [9]. Physicochemical laboratory tests routinely used to characterize wine include determination of density, alcohol or pH values, while sensory tests rely mainly on human experts. It should be stressed that taste is the least understood of the human senses [20], thus wine classification is a difficult task. Moreover, the relationships between the physicochemical and sensory analyses are complex and still not fully understood [16].

On the other hand, advances in information technologies have made it possible to collect, store and process massive, often highly complex datasets. All this data holds valuable information such as trends and patterns, which can be used to improve decision making and optimize the chances of success [23]. Data mining (DM) techniques [26] aim at extracting high-level knowledge from raw data. There are several DM algorithms, each one with its own advantages.
When modeling continuous data, linear/multiple regression (MR) is the classic approach. Neural networks (NNs) have become increasingly used since the introduction of the backpropagation algorithm [19]. More recently, support vector machines (SVMs) have also been proposed [3]. Due to their higher flexibility and nonlinear learning capabilities, both NNs and SVMs are gaining attention within the DM field, often attaining high predictive performances [13]. SVMs present theoretical advantages over NNs, such as the absence of local minima in the learning phase. When applying these methods, performance depends strongly on a correct variable and model selection, since overly simple models may fail to map the underlying concept and overly complex ones tend to overfit the data [13][12].

The use of decision support systems by the wine industry is mainly focused on the wine production phase [10]. Despite the potential of DM techniques to predict wine quality based on physicochemical data, their use is rather scarce and mostly considers small datasets. For example, in 1991 the famous "Wine" dataset was donated to the UCI repository [2]. The data contain 178 examples with measurements of 13 chemical constituents (e.g. alcohol, Mg) and the goal is to classify three cultivars from Italy. This dataset is very easy to discriminate and has been used mainly as a benchmark for new DM classifiers. In 1997 [22], a NN fed with 15 input variables (e.g. Zn and Mg levels) was used to predict six geographic wine origins. The data included 170 samples from Germany and a 100% predictive rate was reported. In 2001 [24], NNs were used to classify three sensory attributes (e.g. sweetness) of Californian wine, based on grape maturity levels and chemical analysis (e.g. titratable acidity). Only 36 examples were used and a 6% error was achieved. More recently, mineral characterization (e.g. Zn and Mg) was used to discriminate 54 samples into two red wine classes [17]. A probabilistic NN was adopted, attaining 95% accuracy. As a powerful learning tool, the SVM has outperformed NNs in several applications, such as predicting meat preferences [6]. Yet, in the field of wine quality only one application has been reported, where spectral measurements from 147 bottles were successfully used to predict 3 categories of rice wine age [27].

In this paper, we present a real-world application, where wine taste preferences are modeled by DM algorithms that use analytical data that are easily available at the certification step. In contrast with previous studies, a large dataset is considered, with a total of 4898 samples. Wine quality is modeled under a regression approach that preserves the order of the grades. Explanatory knowledge is given by a sensitivity analysis, which measures how the responses are affected when a given input is varied through its domain [14][6]. Variable and model selection are performed simultaneously, in a process that is guided by the sensitivity analysis. Also, we propose a parsimony search method to select the best NN and SVM parameters with a low computational effort. Finally, we show the impact of the obtained models in the wine domain.

2 Materials and methods

2.1 Wine data

This study considers vinho verde, a unique product from the Minho (northwest) region of Portugal. Medium in alcohol, it is particularly appreciated due to its freshness (especially in the summer). This wine accounts for 15% of the total Portuguese production [7], and around 10% is exported, mostly white wine.
In this work, we will analyze this common variant from the demarcated region of vinho verde. The data were collected from May/2004 to February/2007 using only protected designation of origin samples that were tested at the official certification entity (CVRVV). The CVRVV is an inter-professional organization with the goal of improving the quality and marketing of vinho verde. The data were recorded by a computerized system (iLab), which automatically manages the process of wine sample testing, from producer requests to laboratory and sensory analysis. Each entry denotes a given test (analytical or sensory) and the final database was exported into a single sheet (.csv). During the preprocessing stage, the database was transformed so that each row contains a distinct wine sample (with all its tests). To avoid discarding examples, only the most common physicochemical tests were selected. Table 1 presents the physicochemical statistics of the dataset. Regarding the preferences, each sample was evaluated by a minimum of three sensory assessors (using blind tastes), who graded the wine on a scale that ranges from 0 (very bad) to 10 (excellent). The final sensory score is given by the median of these evaluations. Fig. 1 plots the histogram of the target variable, denoting a typical normal-shaped distribution (i.e. with more normal grades than extreme ones).

Table 1. The physicochemical data statistics

Attribute (units)                           Min     Max     Mean
fixed acidity (g(tartaric acid)/dm3)        3.8     14.2    6.9
volatile acidity (g(acetic acid)/dm3)       0.1     1.1     0.3
citric acid (g/dm3)                         0.0     1.0     0.3
residual sugar (g/dm3)                      0.6     65.8    6.4
chlorides (g(sodium chloride)/dm3)          0.01    0.35    0.05
free sulfur dioxide (mg/dm3)                2       260     35
total sulfur dioxide (mg/dm3)               9       260     138
density (g/cm3)                             0.987   1.039   0.994
pH                                          2.7     3.8     3.1
sulphates (g(potassium sulphate)/dm3)       0.2     1.1     0.5
alcohol (% vol.)                            8.0     14.2    10.4

Fig. 1. The histogram for the white wine preferences (sensory preference on the x-axis, frequency in wine samples on the y-axis)

2.2 Data mining approach and evaluation

We will adopt a regression approach, which preserves the order of the preferences. For instance, if the true grade is 3, then a model that predicts 4 is better than one that predicts 7. A regression dataset D is made up of k ∈ {1, ..., N} examples, each mapping an input vector with I input variables (x_{k,1}, ..., x_{k,I}) to a given target y_k. The regression performance is commonly measured by an error metric, such as the mean absolute deviation (MAD) [26]:

    MAD = \sum_{i=1}^{N} |y_i - \hat{y}_i| / N    (1)

where \hat{y}_i is the predicted value for the i-th input pattern. The regression error characteristic (REC) curve [1] is also used to compare regression models, with the ideal model presenting an area of 1.0. The curve plots the absolute error tolerance T (x-axis) versus the percentage of points correctly predicted within the tolerance, i.e. the accuracy (y-axis). The confusion matrix is often used for classification analysis, where a C × C matrix (C is the number of classes) is created by matching the predicted values (in columns) with the desired classes (in rows). For an ordered output, the predicted class is given by p_i = y_i if |y_i - \hat{y}_i| ≤ T, else p_i = y_i', where y_i' denotes the closest class to \hat{y}_i such that y_i' ≠ y_i. From the matrix, several metrics can be used to assess the overall classification performance, such as the accuracy and precision (i.e. the per-column accuracies) [26].
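To make the evaluation concrete, the following R sketch implements the metrics just described: the MAD of Eq. (1), the accuracy within a tolerance T (one point of a REC curve) and the ordinal confusion matrix. This is not the authors' code; the function names (`mad_error`, `acc_tol`, `conf_tol`) and the example vectors are illustrative assumptions.

```r
# Sketch (not the authors' code): ordinal regression metrics from Section 2.2.
# Assumes numeric vectors `y` (true grades) and `yhat` (model predictions).

mad_error <- function(y, yhat) mean(abs(y - yhat))        # Eq. (1)

# Accuracy within an absolute error tolerance T (one point of the REC curve).
acc_tol <- function(y, yhat, T) mean(abs(y - yhat) <= T)

# Ordinal confusion matrix: a prediction counts as the true class when it is
# within T of it; otherwise it is assigned to the closest other class.
conf_tol <- function(y, yhat, T, classes = sort(unique(y))) {
  pred <- sapply(seq_along(y), function(i) {
    if (abs(y[i] - yhat[i]) <= T) y[i]
    else {
      others <- setdiff(classes, y[i])
      others[which.min(abs(others - yhat[i]))]
    }
  })
  table(actual = factor(y, classes), predicted = factor(pred, classes))
}

# Example: y <- c(5, 6, 7); yhat <- c(5.3, 6.8, 5.9); acc_tol(y, yhat, 0.5)
```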
The holdout validation is often used to estimate the generalization capability of a model. This method randomly partitions the data into training and test subsets. The former subset is used to fit the model (typically with 2/3 of the data), while the latter (with the remaining 1/3) is used to compute the estimate. A more robust estimation procedure is the k-fold cross-validation [8], where the data is divided into k partitions of equal size. One subset is tested each time and the remaining data are used for fitting the model. The process is repeated sequentially until all subsets have been tested. Therefore, under this scheme, all data are used for training and testing. However, this method requires around k times more computation, since k models are fitted. The validation method will be applied over several runs and statistical confidence will be given by the Student's t-test at the 95% confidence level [11].

2.3 Data mining methods

We will adopt the most common NN type, the multilayer perceptron, where neurons are grouped into layers and connected by feedforward links (Fig. 2). Supervised learning is achieved by an iterative adjustment of the network connection weights, called the training procedure, in order to minimize an error function. For regression tasks, this NN architecture is often based on one hidden layer of H hidden nodes with a logistic activation function and one output node with a linear function [13]:

    \hat{y} = w_{o,0} + \sum_{j=I+1}^{o-1} \frac{1}{1 + \exp(-\sum_{i=1}^{I} x_i w_{j,i} - w_{j,0})} \cdot w_{o,j}    (2)

where w_{i,j} denotes the weight of the connection from node j to node i and o denotes the output node. The performance is sensitive to the topology choice (H). A NN with H = 0 is equivalent to the MR model. By increasing H, more complex mappings can be performed, yet an excessive value of H will overfit the data, leading to a loss of generalization. A computationally efficient method to set H is to search through the range {0, 1, 2, 3, ..., Hmax} (i.e. from the simplest NN to more complex ones). For each H value, a NN is trained and its generalization estimate is measured (e.g. over a validation sample). The process is stopped when the generalization estimate decreases or when H reaches the maximum value (Hmax).

In SVM regression [21], the input x ∈ R^I is transformed into a high m-dimensional feature space by using a nonlinear mapping (φ) that does not need to be explicitly known but that depends on a kernel function (K). The aim of an SVM is to find the best linear separating hyperplane in the feature space:

    \hat{y} = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)    (3)

To select the best hyperplane, the ε-insensitive loss function is often used [21]. This function sets an insensitive tube around the residuals, and the tiny errors within the tube are discarded (Fig. 2).

Fig. 2. Example of a multilayer perceptron with 3 inputs, 2 hidden nodes and one output (left) and a linear SVM regression with the ε-insensitive tube (right, adapted from [21])

We will adopt the popular gaussian kernel, which presents fewer parameters than other kernels (e.g. polynomial) [25]: K(x, x') = exp(-γ ||x - x'||^2), γ > 0. Under this setup, the SVM performance is affected by three parameters: γ, ε and C (a trade-off between fitting the errors and the flatness of the mapping). To reduce the search space, the first two values will be set using the heuristics [4]: C = 3 (for a standardized output) and ε = \hat{σ}/\sqrt{N}, where \hat{σ} = 1.5/N × \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 and \hat{y} is the value predicted by a 3-nearest neighbor algorithm. The kernel parameter (γ) has the highest impact on the SVM performance, with values that are too large or too small leading to poor predictions. A practical method to set γ is to start the search from one of the extremes of its range and then move towards the middle of the range while the predictive estimate increases [25].
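The following R sketch illustrates how such an ε-SVR could be set up: C and ε fixed by the heuristics above, and γ searched from one extreme of the grid while the internal validation estimate improves. It is a sketch under assumptions, not the authors' RMiner code: the wrapper name `fit_svm`, the matrices `x`/`x_val` and the use of FNN's `knn.reg` for the 3-nearest-neighbor residual estimate are illustrative choices (note that in kernlab's `rbfdot` kernel the `sigma` argument plays the role of γ).

```r
# Sketch only: eps-SVR with a gaussian kernel; C and epsilon set by the
# heuristics of Section 2.3, gamma searched from the largest extreme while the
# validation MAD improves.
library(kernlab)   # ksvm(); rbfdot 'sigma' corresponds to gamma here
library(FNN)       # knn.reg() for the 3-nearest-neighbor residual estimate

# `x`, `x_val`: standardized numeric matrices; `y`, `y_val`: standardized targets
fit_svm <- function(x, y, x_val, y_val) {
  N <- length(y)
  yhat_knn <- knn.reg(train = x, test = x, y = y, k = 3)$pred  # rough 3-NN fit
  sigma_hat <- 1.5 / N * sum((y - yhat_knn)^2)
  eps <- sigma_hat / sqrt(N)

  best <- NULL
  for (g in 2^seq(3, -15, by = -2)) {            # grid {2^3, 2^1, ..., 2^-15}
    m <- ksvm(x, y, type = "eps-svr", kernel = "rbfdot",
              kpar = list(sigma = g), C = 3, epsilon = eps)
    mad <- mean(abs(y_val - predict(m, x_val)))  # internal validation estimate
    if (is.null(best) || mad < best$mad) best <- list(model = m, gamma = g, mad = mad)
    else break                                   # stop once the estimate degrades
  }
  best
}
```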
2.4 Input Relevance and Variable/Model Selection

Sensitivity analysis [14] is a simple procedure that is applied after the training phase and analyzes the model responses when the inputs are changed. Originally proposed for NNs, this sensitivity method can also be applied to other algorithms, such as the SVM [6]. Let \hat{y}_{a,j} denote the output obtained by holding all input variables at their average values except x_a, which varies through its entire range over j ∈ {1, ..., L} levels. If a given input variable (x_a ∈ {x_1, ..., x_I}) is relevant, then it should produce a high variance (V_a). Thus, its relative importance (R_a) can be given by:

    V_a = \sum_{j=1}^{L} (\hat{y}_{a,j} - \overline{\hat{y}}_a)^2 / (L - 1)
    R_a = V_a / \sum_{i=1}^{I} V_i \times 100\%    (4)

where \overline{\hat{y}}_a denotes the average of the L responses \hat{y}_{a,j}. The R_a values will be used to measure the relevance of the inputs. For a more detailed analysis of the input influence, in this work we propose the Variable Effect Characteristic (VEC) curve. For a given attribute a, the VEC curve plots the x_{a,j} values (x-axis) versus the \hat{y}_{a,j} predictions (y-axis) (see Section 3.3).

The sensitivity analysis will also be used to discard irrelevant inputs, guiding the variable selection algorithm. We will adopt a backward selection scheme, which starts with all variables and iteratively deletes one input until a stopping criterion is met [12]. The difference, when compared with the standard backward selection, is that we guide the variable deletion (at each step) by the sensitivity analysis, in a variant that reduces the computational effort by a factor of I and that in [14] has outperformed other methods (e.g. backward and genetic algorithms). Similarly to [28], the variable and model selection will be performed simultaneously, i.e. in each backward iteration several models are searched, and the one that presents the best generalization estimate is selected. For a given DM method, the overall procedure is (a sketch of this loop is given after the list):

1. Start with all F = {x_1, ..., x_I} input variables.
2. If there is a hyperparameter P ∈ {P_1, ..., P_k} to tune (e.g. for the NN or SVM), start with P_1 and go through the remaining range until the generalization estimate decreases. Compute the generalization estimate of the model by using an internal validation method. For instance, if the holdout method is used, the available data are further split into training (to fit the model) and validation sets (to obtain the predictive estimate).
3. After fitting the model, compute the relative importances (R_i) of all x_i ∈ F variables and delete from F the least relevant input. Go to step 4 if the stopping criterion is met, otherwise return to step 2.
4. Select the best F (and P, in the case of NN or SVM) values, i.e. the input variables and model that provide the best predictive estimates. Finally, retrain this configuration with all available data.
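The sketch below shows one possible reading of this procedure in R: the importances of Eq. (4) obtained by probing the fitted model over L levels of each input (the probed responses are exactly the VEC y-values), and a backward loop that repeatedly drops the least relevant input. The names `fit_fn` and `patience`, and the assumption that `x` is a matrix with column names, are illustrative and not from the paper.

```r
# Sketch only: sensitivity-based importances (Eq. 4) and the sensitivity-guided
# backward variable selection of Section 2.4. `fit_fn(x, y)` is a hypothetical
# placeholder for any of the MR/NN/SVM fitting routines.

importance <- function(model, x, L = 6) {
  xbar <- colMeans(x)
  V <- sapply(seq_len(ncol(x)), function(a) {
    levels_a <- seq(min(x[, a]), max(x[, a]), length.out = L)
    probe <- matrix(xbar, nrow = L, ncol = ncol(x), byrow = TRUE)
    colnames(probe) <- colnames(x)
    probe[, a] <- levels_a                       # vary only attribute a
    yhat <- predict(model, probe)                # these are the VEC y-values
    sum((yhat - mean(yhat))^2) / (L - 1)         # V_a
  })
  100 * V / sum(V)                               # R_a in %
}

backward_select <- function(x, y, x_val, y_val, fit_fn, patience = 2) {
  keep <- colnames(x); best <- NULL; stalled <- 0
  while (length(keep) >= 1 && stalled < patience) {
    model <- fit_fn(x[, keep, drop = FALSE], y)
    mad <- mean(abs(y_val - predict(model, x_val[, keep, drop = FALSE])))
    if (is.null(best) || mad < best$mad) { best <- list(vars = keep, mad = mad); stalled <- 0 }
    else stalled <- stalled + 1
    R <- importance(model, x[, keep, drop = FALSE])
    keep <- keep[-which.min(R)]                  # drop the least relevant input
  }
  best
}
```

The `patience = 2` default mirrors the stopping criterion used later in the experiments (two iterations without improvement).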
3 Empirical results

3.1 Experimental setup

All experiments reported in this work were written in R [18] and conducted on a Linux server with an Intel dual core processor. R is an open source, multi-platform (e.g. Windows, Linux), high-level matrix programming language for statistical and data analysis. In particular, we adopted RMiner [5], a library for the R tool that facilitates the use of DM techniques in classification and regression tasks.

Before fitting the models, the data were first standardized to zero mean and unit standard deviation [13]. RMiner uses the efficient BFGS algorithm to train the NNs (nnet R package), while the SVM fit is based on the Sequential Minimal Optimization implementation provided by LIBSVM (kernlab package). The hyperparameters (H and γ) will be set using the procedure described in the previous section, with the search ranges H ∈ {0, 1, ..., 11} [28] and γ ∈ {2^3, 2^1, ..., 2^{-15}} [25]. While the maximum number of searches is 12 (NN) or 10 (SVM), in practice the parsimony approach (step 2 of Section 2.4) reduces this number substantially. Regarding the variable selection, we set the estimation metric to the MAD value (Eq. 1), as advised in [25]. To reduce the computational effort, we adopted the simpler 2/3 and 1/3 holdout split as the internal validation method. The sensitivity analysis parameter was set to L = 6, i.e. x_a ∈ {-1.0, -0.6, ..., 1.0} for a standardized input. As a reasonable balance between the pressure towards simpler models and the increase of the computational search, the stopping criterion was set to 2 iterations without any improvement or when only one input is available.

3.2 Predictive Knowledge

To evaluate the selected models, we adopted 20 runs of the more robust 5-fold cross-validation, in a total of 20×5 = 100 experiments for each tested configuration (a sketch of this protocol is given at the end of this subsection). The results are summarized in Table 2. The test set errors are shown in terms of the mean and 95% confidence intervals. Three metrics are presented: MAD, the classification accuracy for different tolerances (T = 0.25, 0.5 and 1.0) and Kappa (T = 0.5). The selected models are described in terms of the average number of inputs (I) and hyperparameter value (H or γ). The last row shows the total computational time required in seconds.

Table 2. The wine modeling results (test set errors and selected models); the SVM results are statistically significant when compared with MR and NN

Metric                  MR           NN           SVM
MAD                     0.59±0.00    0.58±0.00    0.45±0.00
Accuracy T=0.25 (%)     25.6±0.1     26.5±0.3     50.2±1.1
Accuracy T=0.50 (%)     51.7±0.1     52.6±0.3     64.3±0.4
Accuracy T=1.00 (%)     84.3±0.1     84.7±0.1     86.8±0.2
Kappa T=0.5 (%)         20.9±0.1     23.5±0.6     43.4±0.4
Inputs (I)              9.6          9.3          10.0
Model                   –            H = 2.1      γ = 2^0.7
Time (s)                551          1339         34644

Fig. 3. The average test set REC curves (SVM – solid line, NN – gray line, MR – dashed line)

For all error metrics, the SVM is the best choice. The differences are higher for small tolerances (e.g. for T = 0.25, the SVM accuracy is almost two times better than that of the other methods). This effect is clearly visible when plotting the full REC curves (Fig. 3). The Kappa statistic [26] measures the accuracy when compared with a random classifier (which presents a Kappa value of 0%); the higher the statistic, the more accurate the result. The most practical tolerance values are T = 0.5 and T = 1.0. The former tolerance rounds the regression response to the nearest class, while the latter accepts a response that is correct within one of the two closest classes (e.g. a value of 3.1 can be interpreted as grade 3 or 4, but not 2 or 5). For T = 0.5, the SVM accuracy improvement is 11.7 pp (19.9 pp for Kappa). The NN model slightly outperforms the MR results. Regarding the variable selection, the average number of deleted inputs ranges from 1.0 to 1.7, showing that most of the physicochemical tests used are relevant. In terms of computational effort, the SVM is the most expensive method.
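The evaluation protocol referred to above could be sketched in R as follows. This is an illustrative harness, not the RMiner call used by the authors: `fit_fn` is a placeholder for any of the MR/NN/SVM fits, and the target column name `quality` is an assumption.

```r
# Sketch only: 20 runs of 5-fold cross-validation (Section 3.2), reporting the
# mean and a t-based 95% half-interval for the MAD and the accuracy within T.
# `data` is a data frame with the inputs plus an assumed `quality` column.
cv_estimate <- function(data, fit_fn, runs = 20, k = 5, T = 0.5) {
  res <- replicate(runs, {
    folds <- sample(rep(1:k, length.out = nrow(data)))   # random fold labels
    fold_err <- sapply(1:k, function(f) {
      train <- data[folds != f, ]; test <- data[folds == f, ]
      model <- fit_fn(train[, setdiff(names(train), "quality")], train$quality)
      yhat <- predict(model, test[, setdiff(names(test), "quality")])
      c(mad = mean(abs(test$quality - yhat)),
        acc = mean(abs(test$quality - yhat) <= T))
    })
    rowMeans(fold_err)                                    # average over the k folds
  })
  # one column per metric: mean over runs and 95% half-width (Student's t)
  apply(res, 1, function(v) c(mean = mean(v),
                              ci95 = qt(0.975, length(v) - 1) * sd(v) / sqrt(length(v))))
}
```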
A detailed analysis of the SVM classification results is given by the average confusion matrix for T = 0.5 (Table 3). To simplify the visualization, the grade 3 and 9 prediction columns were omitted, since these were always empty. Most of the values are close to the diagonal, denoting a good fit by the model. The true predictive accuracy for each class is given by the precision metric (e.g. for grade 4, precision T=0.5 = 18/(18+6+4) = 64.3%). This statistic is important in practice, since in a real deployment setting the actual values are unknown and all predictions within a given column would be treated the same (a small computational sketch is given at the end of this section). For a tolerance of 0.5, the accuracies are 60.1%/64.3% for classes 6 and 4, 67.1%/72.3% for grades 7 and 5, and a surprising 86.6% for class 8 (the exceptions are the 3 and 9 extremes, with 0%, not shown in the table). When the tolerance is increased (T = 1.0), high accuracies are obtained, ranging from 82.0% to 96.2%.

Table 3. The average confusion matrix (T = 0.5) and precision values (T = 0.5 and 1.0) for the SVM model; values on the diagonal denote accurate predictions

Actual              White wine predictions
class               4        5        6        7        8
3                   0        3        17       1        0
4                   18       53       91       1        0
5                   6        832      598      21       0
6                   4        241      1806     144      3
7                   0        20       418      436      6
8                   0        2        71       45       58
9                   0        0        2        2        0
Precision T=0.5     64.3%    72.3%    60.1%    67.1%    86.6%
Precision T=1.0     89.7%    93.4%    82.0%    90.1%    96.2%

3.3 Explanatory Knowledge

The relative importances of the SVM input variables, given in terms of the mean and 95% confidence intervals of the R_a values, are shown in Fig. 4. It should be noted that all 11 inputs are shown, since in each simulation different sets of variables can be selected. A more detailed analysis will be given to the six most relevant analytical tests (Fig. 5). For a given input, each plot shows the histogram (frequency values shown at the right of the y-axis) and the VEC curve (\hat{y}_{a,j} values, shown at the left of the y-axis) when the analytical test values (x-axis) are changed through their domain. For a given test, we built a VEC curve with L = 6 points (the sensitivity levels). Since 100 experiments were performed, we applied a vertical averaging (with the respective 95% confidence intervals) of the 100 curves.

Fig. 4. The relative input importances for the SVM model (in %; bars denote the average value, whiskers the 95% confidence intervals). From most to least relevant: alcohol, sulphates, pH, free sulfur dioxide, volatile acidity, residual sugar, fixed acidity, total sulfur dioxide, citric acid, density and chlorides.

In several cases, the obtained results confirm oenological theory. For instance, an increase in the alcohol (the most relevant factor) tends to result in a higher quality wine. Fig. 5 shows that this is true within the range from 9 to 13% (which covers most of the samples). In addition, the volatile acidity has a negative impact within the range that corresponds to the majority of the examples. This outcome was expected, since acetic acid is the key ingredient in vinegar. Moreover, residual sugar levels are important in white wine, where the equilibrium between freshness and sweet taste is most appreciated. The most intriguing result is the high importance of sulphates, ranked second. Oenologically this result could be very interesting. An increase in sulphates might be related to the fermentation nutrition, which is very important to improve the wine aroma, an effect that occurs within the range 0.4 to 0.7 that contains most of the samples.
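As a small illustration of the column-wise precision reported in Table 3, the following R sketch computes the per-class precision under a tolerance T, reusing the hypothetical `conf_tol` helper sketched in Section 2.2; it is not the authors' code.

```r
# Sketch only: per-class precision under tolerance T, i.e. the column accuracy
# of the ordinal confusion matrix (reuses the conf_tol() helper sketched above).
precision_tol <- function(y, yhat, T) {
  cm <- conf_tol(y, yhat, T)
  # for each predicted column, the fraction of cases whose actual class matches
  diag_hits <- sapply(colnames(cm), function(cl) cm[cl, cl])
  100 * diag_hits / colSums(cm)   # empty columns yield NaN and can be dropped
}

# Illustration with the grade-4 column of Table 3: 18 / (18 + 6 + 4) = 64.3%
```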
4 Conclusions

Due to the increasing interest in wine, companies are investing in new technologies to improve their production and selling processes. Quality certification is a crucial step for both processes and is currently dependent on wine tasting by human experts. This work aims at the prediction of wine preferences from objective analytical tests that are available at the certification step. A large dataset (with 4898 entries) was considered, including white vinho verde samples from the northwest region of Portugal. This case study was addressed as a regression task, where wine preference is modeled on a continuous scale, from 0 (very bad) to 10 (excellent). This approach preserves the order of the classes, allowing the evaluation of distinct accuracies according to the degree of error tolerance (T) that is accepted.

Due to advances in the data mining (DM) field, it is possible to extract knowledge from raw data. Indeed, powerful techniques such as neural networks (NNs) and, more recently, support vector machines (SVMs) are emerging. While being more flexible models (i.e. no a priori restriction is imposed), their performance depends on a correct setting of the hyperparameters (e.g. the SVM kernel parameter) and of the input variables used by the model. In this study, we present an integrated and computationally efficient approach that addresses both issues simultaneously. Sensitivity analysis is used to extract knowledge from the NN/SVM models, given in terms of the effect on the responses when one input is varied, leading to the proposed Variable Effect Characteristic (VEC) curves and to the relative importance of the inputs (measured by the variance of the response changes). The variable selection is guided by the sensitivity analysis, and the model selection is based on a parsimony search that starts from a reasonable value and is stopped when the generalization estimate decreases.

Encouraging results were achieved, with the SVM model providing the best performances, outperforming the NN and MR techniques. The overall accuracies are 64.3% (T = 0.5) and 86.8% (T = 1.0). It should be noted that the dataset contains six/seven classes (from 3 to 8/9) and these accuracies are much better than the ones expected from a random classifier. While requiring more computation, the SVM fitting can still be achieved within a reasonable time with current processors: for example, one run of the 5-fold cross-validation testing takes around 26 minutes.

The result of this research is relevant to the wine science domain, helping in the understanding of how the physicochemical characterization affects the final quality. In addition, this work can have an impact in the wine industry. At the certification phase, and by Portuguese law, the sensory analysis has to be performed by human tasters. Yet, the evaluations are based on the experience and knowledge of the experts, which are prone to subjective factors. The proposed data-driven approach is based on objective tests and thus it can be integrated into a decision support system, aiding the speed and quality of the oenologist's performance. For instance, the expert could repeat the tasting only if her/his grade is far from the one predicted by the DM model (a simple rule of this kind is sketched below). In effect, within this domain the T = 1.0 distance is accepted as a good quality control procedure and, as shown in this study, high accuracies were achieved for this tolerance. The model could also be used to improve the training of oenology students.
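A minimal sketch of the re-tasting rule suggested above follows; the function name and the default threshold are illustrative and not prescribed by the paper.

```r
# Sketch only: flag a sample for re-tasting when the expert grade and the model
# prediction differ by more than the tolerance T (T = 1.0 is the distance
# accepted as good quality control in this domain).
flag_for_retasting <- function(expert_grade, model_prediction, T = 1.0) {
  abs(expert_grade - model_prediction) > T
}

# Example: flag_for_retasting(5, 7.2)  # TRUE -> ask the panel to taste again
```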
Furthermore, the relative importance of the inputs brought interesting insights regarding the impact of the analytical tests. Since some variables can be controlled in the production process, this information can be used to improve wine quality. For instance, the alcohol concentration can be increased or decreased by monitoring the grape sugar concentration prior to harvest. Also, the residual sugar in wine could be raised by suspending the sugar fermentation carried out by the yeasts. In future work, we intend to model preferences from niche and/or profitable markets (e.g. for a particular country, by providing free wine tastings at supermarkets), aiming at the design of brands that match these market needs. We will also test other DM algorithms that specifically build rankers, such as regression trees [15].

References

1. J. Bi and K. Bennett. Regression Error Characteristic curves. In Proceedings of the 20th Int. Conf. on Machine Learning (ICML), Washington DC, USA, 2003.
2. C. Blake and C. Merz. UCI Repository of Machine Learning Databases, 1998.
3. B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, NY, USA, 1992. ACM.
4. V. Cherkassky and Y. Ma. Practical Selection of SVM Parameters and Noise Estimation for SVM Regression. Neural Networks, 17(1):113–126, 2004.
5. P. Cortez. RMiner: Data Mining with Neural Networks and Support Vector Machines using R. In R. Rajesh (Ed.), Introduction to Advanced Scientific Softwares and Toolboxes, in press.
6. P. Cortez, M. Portelinha, S. Rodrigues, V. Cadavez, and A. Teixeira. Lamb Meat Quality Assessment by Support Vector Machines. Neural Processing Letters, 24(1):41–51, 2006.
7. CVRVV. Portuguese Wine - Vinho Verde. Comissão de Viticultura da Região dos Vinhos Verdes (CVRVV), http://www.vinhoverde.pt, July 2008.
8. T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7):1895–1923, 1998.
9. S. Ebeler. Flavor Chemistry - Thirty Years of Progress, chapter: Linking flavour chemistry to sensory analysis of wine, pages 409–422. Kluwer Academic Publishers, 1999.
10. J. Ferrer, A. MacCawley, S. Maturana, S. Toloza, and J. Vera. An optimization approach for scheduling wine grape harvest operations. Production Economics, pages 985–999, 2008.
11. A. Flexer. Statistical evaluation of neural networks experiments: Minimum requirements and current practice. In Proceedings of the 13th European Meeting on Cybernetics and Systems Research, volume 2, pages 1005–1008, Vienna, Austria, 1996.
12. I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
13. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, NY, USA, 2001.
14. R. Kewley, M. Embrechts, and C. Breneman. Data Strip Mining for the Virtual Design of Pharmaceuticals with Neural Networks. IEEE Transactions on Neural Networks, 11(3):668–679, May 2000.
15. S. Kramer, G. Widmer, B. Pfahringer, and M. De Groeve. Prediction of Ordinal Classes Using Regression Trees. Fundamenta Informaticae, 47(1):1–13, 2001.
16. A. Legin, A. Rudnitskaya, L. Luvova, Y. Vlasov, C. Natale, and A. D'Amico. Evaluation of Italian wine by the electronic tongue: recognition, quantitative analysis and correlation with human sensory perception. Analytica Chimica Acta, pages 33–34, 2003.
17. I. Moreno, D. González-Weller, V. Gutierrez, M. Marino, A. Cameán, A. González, and A. Hardisson. Differentiation of two Canary DO red wines according to their metal content from inductively coupled plasma optical emission spectrometry and graphite furnace atomic absorption spectrometry by using Probabilistic Neural Networks. Talanta, 72:263–268, 2007.
18. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-00-3, http://www.R-project.org, 2008.
19. D. Rumelhart, G. Hinton, and R. Williams. Learning Internal Representations by Error Propagation. In D. Rumelhart and J. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, volume 1, pages 318–362. MIT Press, Cambridge, MA, 1986.
20. D. Smith and R. Margolskee. Making sense of taste. Scientific American, 284:26–33, 2001.
21. A. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14:199–222, 2004.
22. L. Sun, K. Danzer, and G. Thiel. Classification of wine samples by means of artificial neural networks and discrimination analytical methods. Fresenius' Journal of Analytical Chemistry, 359:143–149, 1997.
23. E. Turban, R. Sharda, J. Aronson, and D. King. Business Intelligence, A Managerial Approach. Prentice-Hall, 2007.
24. S. Vlassides, J. Ferrier, and D. Block. Using Historical Data for Bioprocess Optimization: Modeling Wine Characteristics Using Artificial Neural Networks and Archived Process Information. Biotechnology and Bioengineering, 73(1), 2001.
25. W. Wang, Z. Xu, W. Lu, and X. Zhang. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing, 55:643–663, 2003.
26. I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA, 2005.
27. H. Yu, H. Lin, H. Xu, Y. Ying, B. Li, and X. Pan. Prediction of Enological Parameters and Discrimination of Rice Wine Age Using Least-Squares Support Vector Machines and Near Infrared Spectroscopy. Journal of Agricultural and Food Chemistry, 56:307–313, 2008.
28. M. Yu, M. Shanker, G. Zhang, and M. Hung. Modeling consumer situational choice of long distance communication with neural networks. Decision Support Systems, 44:899–908, 2008.

Fig. 5. The vertical averaging of the VEC curves (points and whiskers) and histograms (bars) for the SVM model and the six most relevant physicochemical tests: alcohol, sulphates, pH, free sulfur dioxide, volatile acidity and residual sugar.