Academia.eduAcademia.edu
Environmental Pollution 137 (2005) 273e280 www.elsevier.com/locate/envpol The analysis of soil cores polluted with certain metals using the BoxeCox transformation Milan Meloun a,*, Milan Sáňka b, Pavel Němec b, Soňa Křı́tková a, Karel Kupka c b a Department of Analytical Chemistry, University of Pardubice, CZ532 10 Pardubice, Czech Republic Central Institute for Supervisiting and Testing in Agriculture Division of Agrochemistry, Soil and Plant Nutrition, Hroznová 2, CZ656 06 Brno - Pisárky, Czech Republic c Trilobyte Statistical Software Ltd., CZ530 02 Pardubice, Czech Republic Received 21 July 2004; accepted 28 January 2005 A new procedure of statistical analysis, with exploratory data diagnostics and BoxeCox transformation was used. Abstract To define the soil properties for a given area or country including the level of pollution, soil survey and inventory programs are essential tools. Soil data transformations enable the expression of the original data on a new scale, more suitable for data analysis. In the computer-aided interactive analysis of large data files of soil characteristics containing outliers, the diagnostic plots of the exploratory data analysis (EDA) often find that the sample distribution is systematically skewed or reject sample homogeneity. Under such circumstances the original data should be transformed. The BoxeCox transformation improves sample symmetry and stabilizes spread. The logarithmic plot of a profile likelihood function enables the optimum transformation parameter to be found. Here, a proposed procedure for data transformation in univariate data analysis is illustrated on a determination of cadmium content in the plough zone of agricultural soils. A typical soil pollution survey concerns the determination of the elements Be (16 544 values available), Cd (40 317 values), Co (22 176 values), Cr (40 318 values), Hg (32 344 values), Ni (34 989 values), Pb (40 344 values), V (20 373 values) and Zn (36 123 values) in large samples. Ó 2005 Elsevier Ltd. All rights reserved. Keywords: Data transformation; Exploratory analysis; Soil pollution; Risk element contents 1. Introduction High concentrations of some (specifically heavy) metals in soils can cause long-term harm to ecosystems and humans. As society becomes increasingly concerned * Corresponding author. Tel.: C42 40 603 7026; fax: C42 40 603 7068. E-mail addresses: milan.meloun@upce.cz (M. Meloun), pavel.nemec@ ukzuz.cz (P. Němec), kupka@trilobyte.cz (K. Kupka). URL: http://meloun.upce.cz 0269-7491/$ - see front matter Ó 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.envpol.2005.01.027 about the hazard posed by polluted soil it wants to know how much of any pollutant there is in the ground. Soil survey, monitoring and inventarization programs are the inevitable tools used to define soil properties for a given area or country, including its pollution status. The Register of Contaminated Sites is one such program, established as part of the Fertilisers Act (Act No. 156/98 S.B., as amended) and connected decrees in the Czech Republic. Within the framework of the Register, a survey of the risk element (Cd, Pb, Cr, Hg) content of agricultural soils on a 1-km2 grid was 274 M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 implemented from 1990 to 1993. The four elements were successively complemented by analyses of Be, Co, Ni, V and Zn. This survey established a database which has since been continuously filled out by the results of supplementary sampling. Each sample in the batch is identified by geographical co-ordinates and the number of the plot in the relevant agricultural enterprise. The results of risk element content quantification in 2 M nitric acid extract or aqua regia extract are related. A batch of over 40 000 soil samples has been analysed for the Register database, and the exact sample sizes for each element (2 M HNO3 extraction) are Be: 16 544 values available, Cd: 40 317 values, Co: 22 176 values, Cr: 40 318 values, Hg: 32 344 values, Ni: 34 989 values, Pb: 40 344 values, V: 20 373 values and Zn: 36 123 values. When an exploratory data analysis (Tukey, 1977; Chambers et al., 1983) indicates that the sample distribution strongly differs from a normal one, the problem arises as to how to analyze the soil sample data. Raw data may require re-expression to produce an informative display, effective summary, or straightforward analysis (Meloun et al., 1992). Difficulties may arise because the raw data have (i) a strong asymmetry, or (ii) batches at different levels with a widely differing spread. By altering the shape of the batch or batches these problems may be alleviated. The data are transformed by applying a single mathematical function to all of the raw data values. It is not only the units in which the data are stated that may need to be changed but also the basic scale of the measurement. Changes of origin and scale mean linear transformations, and these leave shape alone; nonlinear transformations such as the logarithm and square root are necessary to change shape. Changing the scale of measurement is natural because it provides an alternative means of reporting the information. Batch symmetry is often a desirable property, as many estimates of location work best, and are best understood, when the data come from a symmetric distribution. The BoxeCox transformation eliminates heteroscedasticity, and the reconstructed mean and standard deviation are the estimates for the corrected distribution of the data. But heteroscedasticity has nothing in common with the outliers, objects which do not follow the distribution of data majority. The geochemical and pedological data sometimes arrive in several batches at different levels and a systematic relationship between spread and level is often found: increasing level usually brings increasing spread. When this relationship is strong there are several reasons for transforming the data in a way that reduces or eliminates the dependence spread on level: the transformed data will be better suited for comparison and visual exploration, and may be better suited for common confirmatory techniques, while individual batches become more nearly symmetric and have fewer outliers. This paper provides a description of the BoxeCox transformation and a re-expression of statistics for transformed data. The procedure of BoxeCox transformation is illustrated on a typical soil pollution survey case study concerning a determination of cadmium content (mg kg1) and other elements (Be, Cd, Co, Cr, Hg, Ni, Pb, V and Zn). 2. Materials and methods 2.1. Sampling The batches of soil samples for risk element analyses were taken from agricultural areas across the whole Czech Republic using the following sampling technique: one sample from arable land and grassland represents an area of 7e10 ha, from hop gardens and orchards 3 ha and from vineyards 2 ha. One composite sample consists of 30 individual probes to depths of 30 cm or 15 cm on arable land and grassland, respectively. The spatial distribution of the composite samples is arranged so as to achieve approximately 1 soil composite sample per 100 ha of agricultural soil. The following parameters were measured in the soil samples: soil texture (determined according to the maps of the Complex Soil Survey in the categories light, medium and heavy), pH exchangeable in KCl extraction, As, Be, Cd, Co, Cr, Cu, Mo, Ni, Pb, V, Zn determined by the AAS or ICP method in 2 M HNO3 extraction, and total Hg content. Samples from different areas were analysed for the selected range of elements. The detection limit x^D [mg kg1] and quantification limit x^Q [mg kg1] for the quantitative determination of elements are, respectively, for As 1.307 and 4.619, for Be 0.06 and 0.197, for Cd 0.061 and 0.196, for Co 0.658 and 2.203, for Cr 0.598 and 2.179, for Cu 0.515 and 1.821, for Hg 0.02 and 0.06, for Mo 0.1 and 0.297, for Ni 0.574 and 1.992, for Pb 0.957 and 2.916, for V 1.611 and 5.764, and for Zn 1.37 and 3.979. 2.2. Proposed procedure of statistical data treatment Step 1 Survey of descriptive statistics: the statistical software for an actual sample batch usually calculates a survey of parameters of location and spread. Step 2 Basic diagnostic plots in the exploratory data analysis for a graphical visualization of data, diagrams and simple plots are used i.e. (a) the dot diagram and the jitter dot diagram, (b) the box-and-whisker plot and the notched box-and-whisker plot, (c) the quantile plot, and M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 (d) the symmetry plot (cf. Meloun et al., 1992, pp. 45e67). Step 3 Determination of sample distribution: the sample distribution represented by symmetry, skewness and kurtosis is examined by (e) the kernel density estimator of the probability density function n hx  x i 1X i ^ fðxÞZ ; K nh iZ1 h ð1aÞ where h is bandwidth, which controls the ^ smoothness of fðxÞ, and K(x) is the kernel function, which is symmetric around zero, and also has the properties of a frequency function. The actual choice of shape for the kernel function is not important, so here a bi-quadratic kernel estimate is used 2.3. Measurement of location using data transformation Examining the data, the proper transformation is often found to be that which leads to a symmetric data distribution, stabilizes the variance or makes the distribution closer to normal. Such transformation of the original data x to the new variable value y Z g(x) is based on an assumption that the original experimental data represent a nonlinear transformation of a normally distributed variable x Z g1( y). Transformation for variance stabilization implies ascertaining the transformation y Z g(x) in which the variance s2( y) is constant. If the variance of the original variable x is a function of the type s2(x) Z f1(mx) where mx is the population mean of the original data, the variance s2( y) may be expressed by ð1bÞ ^ The quality of the kernel estimate fðxÞ is controlled mainly by the selection of parameter h. If h is too small, the estimate is too ^ is rough; if it is too large, the shape of fðxÞ flattened too much (cf. Meloun et al., 1992, pp. 58). (f) the quantileequantile plot, which is used for comparison of the actual with the theoretical sample distribution. Step 4 Tests of basic assumptions about the sample (cf. Meloun et al., 1992, pp. 78e82): applying an analysis of basic assumptions about the data, the following examinations were applied: (a) Examination for the independence of sample elements, (b) Examination for the normality of the sample distribution, (c) Examination of sample homogeneity. All of the algorithms used are available on the internet (Kupka, 2004). Step 5 Data transformation: when most of the EDA diagnostic plots exhibit an asymmetric distribution in the original sample data, data transformation seems to be the most convenient of several possible techniques to apply. For the transformation the estimate l maximizing ln L(l) is calculated. The selected l is used in the calculation of estimates y, s2( y), g^1 ðyÞ, and g^2 ðyÞ. From these estimates, the re-expressed estimates of the original variables xR , s2 ð xR Þ, and the 95% confidence interval of the reexpressed variable m are then calculated (cf. Meloun et al., 1992, pp. 70e77).  dgðxÞ s ðyÞZ dx 2  2 2 KðxÞZ 0:9375ð1  x Þ for  1%x%1 0 for x outside ½1; 1 275 2 f1 ðxÞZC; where C is a constant. The chosen transformation g(x) is then the solution of the differential equation Z dx gðxÞzC pffiffiffiffiffiffiffiffiffiffi: f1 ðxÞ When the dependence s2(x) Z f1(x) is of a power (exponent) nature, the optimal transformation will also be a power transformation. Since for a normal distribution the mean is not dependent on a variance, a transformation that stabilizes the variance makes the distribution closer to normal. Transformation leading to approximate normality may be carried out by the use of the BoxeCox transformation family (Box and Cox, 1964) defined as   l  jxj  1 =l for parameter ls0 yZgðxÞZ ; ð2Þ ln jxj for parameter lZ0 where x is a positive variable and l is a real number. The BoxeCox transformation can be applied only to positive data. To extend this transformation means to make a substitution of x values by (x  x0) values which are always positive. Here x0 is the threshold value x0 ! x(1). To estimate the parameter l in a BoxeCox transformation the method of profile likelihood may be used, because for lZ^l the distribution of the transformed variable y is considered to be normal, N(my, s2( y)). The logarithm of the profile likelihood function may be written as n X n ln LðlÞZ  ln s2 ðyÞCðl  1Þ ln xi ; 2 iZ1 ð3Þ where s2( y) is the sample variance of the transformed data y (Box and Cox, 1964). The function ln L Z f(l) is 276 M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 expressed graphically for a suitable interval, for example, 3 % l % 3. The maximum on this curve represents the maximum likelihood estimate ^l. The asymptotic 100(1  a) % confidence interval of parameter l is expressed by 2 ln L ^l  ln LðlÞ %c21a ð1Þ, where c21a ð1Þ is the quantile of the c2 distribution with 1 degree of freedom. This interval contains all l values for which it is true that ln LðlÞRln L ^l  0:5c21a ð1Þ. This BoxeCox transformation is less suitable if the confidence interval for l is too wide e and if the sample size is small then the confidence interval for the parameter will be wide. When the value l Z 1 is also covered by this confidence interval, the transformation is not efficient. 2.4. Re-expression of the statistical measurements after data transformation After an appropriate transformation of the original data {x} has been found, such that the transformed data give an approximately normal symmetrical distribution with constant variance, the statistical measurements of location and spread for the transformed data {y} are calculated. These include the sample mean y, the sample variance s2 ðyÞ, and the pffiffifficonfidence interval of the mean yGt1a=2 ðn  1ÞsðyÞ= n. These estimates must then be recalculated for the original data {x}. Two different approaches to the re-expression of the statistics for transformed data can be used without difficulty: (a) Rough re-expressions represented by a single reverse transformation xR Zg1 ðyÞ. This re-expression for a simple power transformation leads to the general re-expressed mean Pn l 1=l iZ1 xi xl Z xR Z ; ð4Þ n where for l Z 0, ln x is used instead of xl and ex instead of x1/l. The re-expressed mean xR Z x1 stands for the harmonic mean, xR Z x0 for the geometric mean, xR Z x1 for the arithmetic mean and xR Z x2 for the quadratic mean. (b) More correct re-expressions are based on the Taylor series expansion of the function y Z g(x) in a neighbourhood of the value y. The re-expressed mean xR is then given by ( )  2 1 d2 gðxÞ dgðxÞ 1 2 xR zg y  s ðyÞ : ð5Þ 2 dx2 dx The variance is then expressed as  2 dgðxÞ 2 s2 ðyÞ; xR Þz s ð dx where individual derivatives are calculated at the point xZ xR . The 100(1  a) %confidence interval of the reexpressed mean for the original data may be defined as xR  IL %m% xR CIU ; ð6Þ where sðyÞ IL Zg1 yCG  t1a=2 ðn  1Þ pffiffiffi ; n ð7Þ sðyÞ IU Zg1 yCGCt1a=2 ðn  1Þ pffiffiffi ; n ð8Þ and GZ   2 1 d2 gðxÞ dgðxÞ s2 ðyÞ: 2 dx2 dx ð9Þ On the basis of the (known) actual transformation y Z g(x) and the estimates y, s2( y) it is easy to calculate the re-expressed estimates xR and s2 ð xR Þ: 1. For a logarithmic transformation (when l Z 0) and g(x) Z ln x the re-expressed mean and variance are calculated by  xR zexp yC0:5 s2 ðyÞ ; ð10Þ and s2 ð xR Þz x2R s2 ðyÞ: ð11Þ 2. For l s 0 and the BoxeCox transformation, the reexpressed mean xR will be represented by one of the two roots of the quadratic equation xR;1;2 Z 0:5ð1Cl yÞG0:5 ! qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1C2lð yCs2 ðyÞÞCl2 y2  2s2 ðyÞ 1=l ; ð12Þ which is closest to the median x~0:5 Zg1 ð~ y0:5 Þ. If xR is known, the corresponding variance may be calculated from ð2lC2Þ 2 s2 ðxÞZ xR s ðyÞ: ð13Þ 3. Results Many statistical programs offer a list of the estimates of various point parameters of location and spread, but they rarely help the user to choose the M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 277 statistically adequate parameter for an actual sample batch. Exploratory data analysis and an examination of sample assumptions will find an answer to this question. The first case study with this methodology runs on typical geochemical sample data and will illustrate a rigorous procedure of the statistical treatment of univariate data with exploratory data analysis. Properly processed analytical data can be used in research, government and legislation. For example, (a) results may serve as a national database characterising the degree of pollution of agricultural soils; (b) the appropriate parts of such a database may be distributed to local offices, (environmental sections) to be available to the regional and local government (e.g. in urban planning, privatisation projects, changes in land use, the application of sewage sludge or sediment to agricultural soil); (c) results may be used in the process of constructing legislative measures concerning the limit values of harmful substances in soil; and (d) a database can serve as one source for calculating critical loads and balances of risk elements in agroecosystems. (1) Survey of descriptive statistics: ADSTAT statistical software calculates an actual sample batch, a survey of parameters of location and spread for n Z 40 317 (for an elucidation of the statistics cf. Meloun et al., 1992). On the basis of EDA, the user should select the most convenient parameter of location from the following available estimates: the arithmetic mean xZ0:238G0:003 mg kg1 , the median x^0:5 Z 0:19 mg kg1 and the following trimmed means xð10%ÞZ 0:210G0:001 mg kg1 , xð20%ÞZ0:203G 0:001, xð40%ÞZ 0:194G0:001 mg kg1 (calculated with ADSTAT, TriloByte Statistical Software, Pardubice, Czech Republic), the standard deviation s Z 0.300 mg kg1, and the parameters of shape are the skewness g^1 Z30:7 and the kurtosis g^2 Z2123, proving strongly skewed asymmetric distribution with a very sharp peak. (2) Basic diagnostic plots in the EDA are used for a graphical visualization of the data: in Fig. 1 all of the exploratory diagnostic graphs prove a strong deviation from a normal distribution. The box-andwhisker plot (Fig. 1a) indicates many outliers at high values, and the quantile plot (Fig. 1b) an asymmetric, skewed distribution. (3) Determination of sample distribution in the EDA: the sample distribution represented by symmetry, skewness and kurtosis is examined by the kernel density estimator of the probability density function (Fig. 1c). This plot shows that most sample points are located in one class and the plot indicates a strongly skewed sample distribution. The normal probability plot (Fig. 1d) checking a normal distribution does not Fig. 1. (a) The box-and-whisker plot of the Cd sample data. (b) The quantile plot of the Cd sample data. (c) The kernel density estimator of the probability density function of the Cd sample data. (d) The quantileequantile plot (for normal distribution called the Rankit plot) of the Cd sample data. exhibit close agreement of the sample points with a straight line. (4) Basic assumptions about the sample (cf. Meloun et al., 1992, pp. 78e82): applying an analysis of basic 278 M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 Fig. 2. The plot of the logarithm of the maximum likelihood for the Cd sample data with BoxeCox transformation. assumptions about the data the following conclusions were derived: sample elements are independent and homogeneous. A combined sample skewness and kurtosis test leads to the test statistic 2 C1 Z ½^ g ðxÞ  3 g^21 ðxÞ ; C 22 2 s ð^ s ð^ g2 ðxÞÞ g1 ðxÞÞ is 291.06 O c2(0.95, 2) Z 5.992 and therefore normality of data distribution was rejected. (5) Data transformation: as most diagnostic plots of the EDA exhibit an asymmetric distribution of original sample data, data transformation is a convenient technique to employ. In the case of the BoxeCox transformation the true mean value of a sample distribution with both confidence limits IL and IU is calculated. From the plot of the logarithm of the likelihood function for the BoxeCox transformation (Fig. 2) the maximum of the curve is at l Z 0.0556. Fig. 3. The quantileequantile plot for the Cd sample data with BoxeCox transformation. The corresponding 95% confidence interval does not contain the exponent value l Z 1, so all transformations are statistically significant. The normal probability plot (also called the Rankit plot) on Fig. 3 shows that the BoxeCox transformation brings more accurate results. While classical measures of location, spread and shape for the original data are xZ0:238 mg kg1 , s(x) Z 0.300 mg kg1, the skewness g^1 ðxÞZ30:74 and curtosis g^2 ðxÞZ2123:04 are out of statistical significance and may be taken as false estimates of location. The BoxeCox transformation (^lZ  0:0556) calculates the corrected mean xR Z0:187G0:001 mol dm3 . (6) Conclusion: all EDA display techniques prove that the sample distribution is skewed with many outliers, and does not come from a population with a normal distribution. For the best estimate of a location parameter the arithmetic mean does not represent an objective measure of location, 0.238 G 0.003 mg kg1, and cannot be used. On the basis of the quantileequantile plot the Boxe Cox transformation is considered the most rigorous technique to estimate a measure of location, with the corrected mean value xR Z0:187G0:001 mol dm3 . 4. Conclusions Often, chemical data are less than ideal and do not fulfill all basic assumptions. Original data can be transformed to improve the symmetry of data distribution and variance stabilization. Statistical measures of the transformed data are re-transformed to get these rigorous measures for the original data. Table 1 shows a survey of summary statistics for the elements beryllium, cadmium, cobalt, chromium, mercury, nickel, lead, vanadium and zinc. This survey includes classical and robust measures of central tendency, measures of variability, and measures of shape. Of particular interest here are sample size, the minimum and maximum values in a large sample, and both quartiles. The classical measures x and s are strongly corrupted with outliers and cannot be used here while the robust measures seem to be more accurate. Since the exploratory data analysis, the skewness and kurtosis and the quantile measures of location prove that the sample distribution strongly differs from a normal one, the data should be examined to find the proper transformation leading to symmetric distribution, stabilizing variance and making the distribution closer to normal. The most rigorous estimate of location is represented by the re-transformed mean xR after BoxeCox transformation of original data. This Table 1 Survey of summary statistics for the elements Be, Cd, Co, Cr, Hg, Ni, Pb, V and Zn including classical and robust measures of central tendency, measures of variability, and measures of shape Beryllium Cadmium Cobalt Chromium Mercury Nickel Lead Vanadium Zinc Sample size n Minimum x1 Maximum xn Lower quartile FL Upper quartile FU Interquartile range FUeFL 16 544 0 9.33 0.32 0.57 0.25 40 317 0 28.1 0.14 0.27 0.13 22 176 0.2 110.5 3.9 6.7 2.8 40 318 0.1 1577.4 3.2 6.9 3.7 32 344 0 69.086 0.06 0.11 0.05 34 989 0.1 662.0 3.0 7.3 4.3 40 344 0.17 1121.0 11.7 19.4 7.7 20 373 0.37 86.0 7.0 13.0 6.0 36 123 0.7 2070.0 12.0 22.0 10.0 0.238 G 0.003 0.300 30.74 2123.1 5.593 G 0.039 2.930 4.19 89.85 7.104 G 0.170 17.35 40.09 2608.52 0.105 G 0.006 0.534 107.88 12 963.7 6.033 G 0.081 7.728 34.49 2298.8 18.637 G 0.299 30.594 19.77 528.2 10.878 G 0.083 6.015 2.16 12.41 19.354 G 0.234 22.73 34.20 2265.0 0.19 G 0.00 0.210 G 0.001 0.203 G 0.001 0.194 G 0.001 5.0 G 0.0 5.356 G 0.033 5.264 G 0.032 5.150 G 0.030 4.60 G 0.05 5.361 G 0.040 5.072 G 0.033 4.795 G 0.030 0.08 G 0.00 0.086 G 0.001 0.083 G 0.001 0.081 G 0.001 4.70 G 0.05 5.320 G 0.039 5.109 G 0.037 4.883 G 0.036 14.90 G 0.05 15.860 G 0.067 15.548 G 0.063 15.214 G 0.061 9.60 G 0.10 10.320 G 0.074 10.050 G 0.072 9.752 G 0.067 16.0 G 0.05 17.446 G 0.089 17.020 G 0.085 16.531 G 0.084 JarqueeBerra normality test, critical value for a Z 0.05 is c20.95(2) Z 5.99 157.1 291.1 145.9 Testing criterion C1 Normality is rejected rejected rejected 311.1 rejected 382.0 rejected 294.3 rejected 259.3 rejected 111.4 rejected 294.9 rejected Homogeneity test Number of outliers 265 2095 496 2285 1180 1128 1359 486 961 BoxeCox transformation Re-transformed mean xR 0.427 G 0.003 0.187 G 0.001 5.078 G 0.030 4.922 G 0.023 0.082 G 0.001 4.797 G 0.020 15.172 G 0.050 9.611 G 0.050 16.360 G 0.050 Classical estimates of location, scale and shape Sample mean x 0.470 G 0.004 Standard deviation s 0.264 5.99 Skewness g^1 Kurtosis g^2 119 Robust estimates of location Median x~0:5 Trimmed mean xð10%Þ Trimmed mean xð20%Þ Trimmed mean xð40%Þ 0.43 G 0.01 0.449 G 0.003 0.443 G 0.003 0.438 G 0.003 M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 Estimate of Of particular interest here are sample size, minimum and maximum values within the sample, and both quartiles. The most rigorous estimates of location are re-transformed means after BoxeCox transformation. 279 280 M. Meloun et al. / Environmental Pollution 137 (2005) 273e280 estimate can be taken as the best for each element studied. Acknowledgements The financial support of the Grant Agency of the Ministry of Education of the Czech Republic (Grant No MSM0021627502) is gratefully acknowledged. References Box, G.E.P., Cox, D.R., 1964. Journal of Royal Statistical Association B26, 211e252. Chambers, J., Cleveland, W., Kleiner, W., Tukey, P., 1983. Graphical Methods for Data Analysis. Duxbury Press, Boston. Kupka, K., 2004. !http://www.trilobyte.cz/edaO. Meloun, M., Militký, J., Forina, M., 1992. Chemometrics for analytical chemistry. In: PC-Aided Statistical Data Analysis, vol. 1. Ellis Horwood, Chichester. Tukey, J.W., 1977. Exploratory Data Analysis. Addison Wesley, Reading, MA.