Environmental Pollution 137 (2005) 273e280
www.elsevier.com/locate/envpol
The analysis of soil cores polluted with certain
metals using the BoxeCox transformation
Milan Meloun a,*, Milan Sáňka b, Pavel Němec b,
Soňa Křı́tková a, Karel Kupka c
b
a
Department of Analytical Chemistry, University of Pardubice, CZ532 10 Pardubice, Czech Republic
Central Institute for Supervisiting and Testing in Agriculture Division of Agrochemistry, Soil and Plant Nutrition,
Hroznová 2, CZ656 06 Brno - Pisárky, Czech Republic
c
Trilobyte Statistical Software Ltd., CZ530 02 Pardubice, Czech Republic
Received 21 July 2004; accepted 28 January 2005
A new procedure of statistical analysis, with exploratory data diagnostics and
BoxeCox transformation was used.
Abstract
To define the soil properties for a given area or country including the level of pollution, soil survey and inventory programs are
essential tools. Soil data transformations enable the expression of the original data on a new scale, more suitable for data analysis. In
the computer-aided interactive analysis of large data files of soil characteristics containing outliers, the diagnostic plots of the
exploratory data analysis (EDA) often find that the sample distribution is systematically skewed or reject sample homogeneity.
Under such circumstances the original data should be transformed. The BoxeCox transformation improves sample symmetry and
stabilizes spread. The logarithmic plot of a profile likelihood function enables the optimum transformation parameter to be found.
Here, a proposed procedure for data transformation in univariate data analysis is illustrated on a determination of cadmium content
in the plough zone of agricultural soils. A typical soil pollution survey concerns the determination of the elements Be (16 544 values
available), Cd (40 317 values), Co (22 176 values), Cr (40 318 values), Hg (32 344 values), Ni (34 989 values), Pb (40 344 values), V
(20 373 values) and Zn (36 123 values) in large samples.
Ó 2005 Elsevier Ltd. All rights reserved.
Keywords: Data transformation; Exploratory analysis; Soil pollution; Risk element contents
1. Introduction
High concentrations of some (specifically heavy)
metals in soils can cause long-term harm to ecosystems
and humans. As society becomes increasingly concerned
* Corresponding author. Tel.: C42 40 603 7026; fax: C42 40 603
7068.
E-mail addresses: milan.meloun@upce.cz (M. Meloun), pavel.nemec@
ukzuz.cz (P. Němec), kupka@trilobyte.cz (K. Kupka).
URL: http://meloun.upce.cz
0269-7491/$ - see front matter Ó 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envpol.2005.01.027
about the hazard posed by polluted soil it wants to know
how much of any pollutant there is in the ground. Soil
survey, monitoring and inventarization programs are
the inevitable tools used to define soil properties for
a given area or country, including its pollution status.
The Register of Contaminated Sites is one such
program, established as part of the Fertilisers Act (Act
No. 156/98 S.B., as amended) and connected decrees in
the Czech Republic. Within the framework of the
Register, a survey of the risk element (Cd, Pb, Cr, Hg)
content of agricultural soils on a 1-km2 grid was
274
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
implemented from 1990 to 1993. The four elements were
successively complemented by analyses of Be, Co, Ni, V
and Zn. This survey established a database which has
since been continuously filled out by the results of
supplementary sampling. Each sample in the batch is
identified by geographical co-ordinates and the number
of the plot in the relevant agricultural enterprise. The
results of risk element content quantification in 2 M
nitric acid extract or aqua regia extract are related. A
batch of over 40 000 soil samples has been analysed for
the Register database, and the exact sample sizes for each
element (2 M HNO3 extraction) are Be: 16 544 values
available, Cd: 40 317 values, Co: 22 176 values, Cr:
40 318 values, Hg: 32 344 values, Ni: 34 989 values, Pb:
40 344 values, V: 20 373 values and Zn: 36 123 values.
When an exploratory data analysis (Tukey, 1977;
Chambers et al., 1983) indicates that the sample
distribution strongly differs from a normal one, the
problem arises as to how to analyze the soil sample data.
Raw data may require re-expression to produce an
informative display, effective summary, or straightforward analysis (Meloun et al., 1992). Difficulties may
arise because the raw data have (i) a strong asymmetry,
or (ii) batches at different levels with a widely differing
spread. By altering the shape of the batch or batches
these problems may be alleviated. The data are transformed by applying a single mathematical function to all
of the raw data values. It is not only the units in which
the data are stated that may need to be changed but also
the basic scale of the measurement. Changes of origin
and scale mean linear transformations, and these leave
shape alone; nonlinear transformations such as the
logarithm and square root are necessary to change
shape. Changing the scale of measurement is natural
because it provides an alternative means of reporting the
information. Batch symmetry is often a desirable
property, as many estimates of location work best, and
are best understood, when the data come from
a symmetric distribution. The BoxeCox transformation
eliminates heteroscedasticity, and the reconstructed
mean and standard deviation are the estimates for the
corrected distribution of the data. But heteroscedasticity
has nothing in common with the outliers, objects which
do not follow the distribution of data majority. The
geochemical and pedological data sometimes arrive in
several batches at different levels and a systematic
relationship between spread and level is often found:
increasing level usually brings increasing spread. When
this relationship is strong there are several reasons for
transforming the data in a way that reduces or
eliminates the dependence spread on level: the transformed data will be better suited for comparison and
visual exploration, and may be better suited for
common confirmatory techniques, while individual
batches become more nearly symmetric and have fewer
outliers.
This paper provides a description of the BoxeCox
transformation and a re-expression of statistics for
transformed data. The procedure of BoxeCox transformation is illustrated on a typical soil pollution survey
case study concerning a determination of cadmium
content (mg kg1) and other elements (Be, Cd, Co, Cr,
Hg, Ni, Pb, V and Zn).
2. Materials and methods
2.1. Sampling
The batches of soil samples for risk element analyses
were taken from agricultural areas across the whole
Czech Republic using the following sampling technique:
one sample from arable land and grassland represents
an area of 7e10 ha, from hop gardens and orchards 3 ha
and from vineyards 2 ha. One composite sample consists
of 30 individual probes to depths of 30 cm or 15 cm on
arable land and grassland, respectively. The spatial
distribution of the composite samples is arranged so as
to achieve approximately 1 soil composite sample per
100 ha of agricultural soil. The following parameters
were measured in the soil samples: soil texture (determined according to the maps of the Complex Soil
Survey in the categories light, medium and heavy), pH
exchangeable in KCl extraction, As, Be, Cd, Co, Cr, Cu,
Mo, Ni, Pb, V, Zn determined by the AAS or ICP
method in 2 M HNO3 extraction, and total Hg content.
Samples from different areas were analysed for the
selected range of elements. The detection limit x^D
[mg kg1] and quantification limit x^Q [mg kg1] for the
quantitative determination of elements are, respectively,
for As 1.307 and 4.619, for Be 0.06 and 0.197, for Cd
0.061 and 0.196, for Co 0.658 and 2.203, for Cr 0.598
and 2.179, for Cu 0.515 and 1.821, for Hg 0.02 and 0.06,
for Mo 0.1 and 0.297, for Ni 0.574 and 1.992, for Pb
0.957 and 2.916, for V 1.611 and 5.764, and for Zn 1.37
and 3.979.
2.2. Proposed procedure of statistical data treatment
Step 1 Survey of descriptive statistics: the statistical
software for an actual sample batch usually
calculates a survey of parameters of location
and spread.
Step 2 Basic diagnostic plots in the exploratory data
analysis for a graphical visualization of data,
diagrams and simple plots are used i.e.
(a) the dot diagram and the jitter dot diagram,
(b) the box-and-whisker plot and the notched
box-and-whisker plot,
(c) the quantile plot, and
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
(d) the symmetry plot (cf. Meloun et al., 1992,
pp. 45e67).
Step 3 Determination of sample distribution: the sample
distribution represented by symmetry, skewness
and kurtosis is examined by
(e) the kernel density estimator of the probability density function
n
hx x i
1X
i
^
fðxÞZ
;
K
nh iZ1
h
ð1aÞ
where h is bandwidth, which controls the
^
smoothness of fðxÞ,
and K(x) is the kernel
function, which is symmetric around zero,
and also has the properties of a frequency
function. The actual choice of shape for the
kernel function is not important, so here
a bi-quadratic kernel estimate is used
2.3. Measurement of location using
data transformation
Examining the data, the proper transformation is
often found to be that which leads to a symmetric data
distribution, stabilizes the variance or makes the distribution closer to normal. Such transformation of the
original data x to the new variable value y Z g(x) is
based on an assumption that the original experimental
data represent a nonlinear transformation of a normally
distributed variable x Z g1( y).
Transformation for variance stabilization implies
ascertaining the transformation y Z g(x) in which the
variance s2( y) is constant. If the variance of the original
variable x is a function of the type s2(x) Z f1(mx) where
mx is the population mean of the original data, the
variance s2( y) may be expressed by
ð1bÞ
^
The quality of the kernel estimate fðxÞ
is
controlled mainly by the selection of parameter h. If h is too small, the estimate is too
^ is
rough; if it is too large, the shape of fðxÞ
flattened too much (cf. Meloun et al., 1992,
pp. 58).
(f) the quantileequantile plot, which is used for
comparison of the actual with the theoretical sample distribution.
Step 4 Tests of basic assumptions about the sample (cf.
Meloun et al., 1992, pp. 78e82): applying an
analysis of basic assumptions about the data,
the following examinations were applied:
(a) Examination for the independence of sample elements,
(b) Examination for the normality of the
sample distribution,
(c) Examination of sample homogeneity. All of
the algorithms used are available on the
internet (Kupka, 2004).
Step 5 Data transformation: when most of the EDA
diagnostic plots exhibit an asymmetric distribution in the original sample data, data transformation seems to be the most convenient of
several possible techniques to apply. For the
transformation the estimate l maximizing
ln L(l) is calculated. The selected l is used in
the calculation of estimates y, s2( y), g^1 ðyÞ, and
g^2 ðyÞ. From these estimates, the re-expressed
estimates of the original variables xR , s2 ð
xR Þ,
and the 95% confidence interval of the reexpressed variable m are then calculated (cf.
Meloun et al., 1992, pp. 70e77).
dgðxÞ
s ðyÞZ
dx
2
2 2
KðxÞZ 0:9375ð1 x Þ for 1%x%1
0
for x outside ½1; 1
275
2
f1 ðxÞZC;
where C is a constant. The chosen transformation g(x) is
then the solution of the differential equation
Z
dx
gðxÞzC pffiffiffiffiffiffiffiffiffiffi:
f1 ðxÞ
When the dependence s2(x) Z f1(x) is of a power
(exponent) nature, the optimal transformation will also
be a power transformation. Since for a normal distribution the mean is not dependent on a variance,
a transformation that stabilizes the variance makes the
distribution closer to normal. Transformation leading to
approximate normality may be carried out by the use of
the BoxeCox transformation family (Box and Cox,
1964) defined as
l
jxj 1 =l for parameter ls0
yZgðxÞZ
;
ð2Þ
ln jxj
for parameter lZ0
where x is a positive variable and l is a real number.
The BoxeCox transformation can be applied only to
positive data. To extend this transformation means to
make a substitution of x values by (x x0) values which
are always positive. Here x0 is the threshold value
x0 ! x(1). To estimate the parameter l in a BoxeCox
transformation the method of profile likelihood may be
used, because for lZ^l the distribution of the transformed variable y is considered to be normal, N(my,
s2( y)). The logarithm of the profile likelihood function
may be written as
n
X
n
ln LðlÞZ ln s2 ðyÞCðl 1Þ
ln xi ;
2
iZ1
ð3Þ
where s2( y) is the sample variance of the transformed
data y (Box and Cox, 1964). The function ln L Z f(l) is
276
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
expressed graphically for a suitable interval, for
example, 3 % l % 3. The maximum on this curve
represents the maximum likelihood estimate ^l. The
asymptotic 100(1 a) % confidence interval of parameter l is expressed by 2 ln L ^l ln LðlÞ %c21a ð1Þ,
where c21a ð1Þ is the quantile of the c2 distribution with
1 degree of freedom. This interval contains all l values
for which it is true that ln LðlÞRln L ^l 0:5c21a ð1Þ.
This BoxeCox transformation is less suitable if the
confidence interval for l is too wide e and if the sample
size is small then the confidence interval for the
parameter will be wide. When the value l Z 1 is also
covered by this confidence interval, the transformation
is not efficient.
2.4. Re-expression of the statistical measurements
after data transformation
After an appropriate transformation of the original
data {x} has been found, such that the transformed data
give an approximately normal symmetrical distribution
with constant variance, the statistical measurements of
location and spread for the transformed data {y} are
calculated. These include the sample mean y, the sample
variance s2 ðyÞ, and the
pffiffifficonfidence interval of the mean
yGt1a=2 ðn 1ÞsðyÞ= n. These estimates must then be
recalculated for the original data {x}. Two different
approaches to the re-expression of the statistics for
transformed data can be used without difficulty:
(a) Rough re-expressions represented by a single reverse
transformation xR Zg1 ðyÞ. This re-expression for
a simple power transformation leads to the general
re-expressed mean
Pn
l 1=l
iZ1 xi
xl Z
xR Z
;
ð4Þ
n
where for l Z 0, ln x is used instead of xl and ex
instead of x1/l. The re-expressed mean xR Z
x1
stands for the harmonic mean, xR Z
x0 for the
geometric mean, xR Z
x1 for the arithmetic mean
and xR Z
x2 for the quadratic mean.
(b) More correct re-expressions are based on the Taylor
series expansion of the function y Z g(x) in a neighbourhood of the value y. The re-expressed mean xR
is then given by
(
)
2
1 d2 gðxÞ dgðxÞ
1
2
xR zg
y
s ðyÞ :
ð5Þ
2 dx2
dx
The variance is then expressed as
2
dgðxÞ
2
s2 ðyÞ;
xR Þz
s ð
dx
where individual derivatives are calculated at the point
xZ
xR . The 100(1 a) %confidence interval of the reexpressed mean for the original data may be defined as
xR IL %m%
xR CIU ;
ð6Þ
where
sðyÞ
IL Zg1 yCG t1a=2 ðn 1Þ pffiffiffi ;
n
ð7Þ
sðyÞ
IU Zg1 yCGCt1a=2 ðn 1Þ pffiffiffi ;
n
ð8Þ
and
GZ
2
1 d2 gðxÞ dgðxÞ
s2 ðyÞ:
2 dx2
dx
ð9Þ
On the basis of the (known) actual transformation
y Z g(x) and the estimates y, s2( y) it is easy to calculate
the re-expressed estimates xR and s2 ð
xR Þ:
1. For a logarithmic transformation (when l Z 0) and
g(x) Z ln x the re-expressed mean and variance are
calculated by
xR zexp yC0:5 s2 ðyÞ ;
ð10Þ
and
s2 ð
xR Þz
x2R s2 ðyÞ:
ð11Þ
2. For l s 0 and the BoxeCox transformation, the reexpressed mean xR will be represented by one of the
two roots of the quadratic equation
xR;1;2 Z 0:5ð1Cl
yÞG0:5
!
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1C2lð
yCs2 ðyÞÞCl2 y2 2s2 ðyÞ
1=l
;
ð12Þ
which is closest to the median x~0:5 Zg1 ð~
y0:5 Þ. If xR
is known, the corresponding variance may be
calculated from
ð2lC2Þ 2
s2 ðxÞZ
xR
s ðyÞ:
ð13Þ
3. Results
Many statistical programs offer a list of the
estimates of various point parameters of location and
spread, but they rarely help the user to choose the
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
277
statistically adequate parameter for an actual sample
batch. Exploratory data analysis and an examination
of sample assumptions will find an answer to this
question. The first case study with this methodology
runs on typical geochemical sample data and will
illustrate a rigorous procedure of the statistical
treatment of univariate data with exploratory data
analysis.
Properly processed analytical data can be used in
research, government and legislation. For example, (a)
results may serve as a national database characterising
the degree of pollution of agricultural soils; (b) the
appropriate parts of such a database may be distributed to local offices, (environmental sections) to be
available to the regional and local government (e.g. in
urban planning, privatisation projects, changes in land
use, the application of sewage sludge or sediment to
agricultural soil); (c) results may be used in the process
of constructing legislative measures concerning the
limit values of harmful substances in soil; and (d)
a database can serve as one source for calculating
critical loads and balances of risk elements in agroecosystems.
(1) Survey of descriptive statistics: ADSTAT statistical
software calculates an actual sample batch, a survey
of parameters of location and spread for n Z 40 317
(for an elucidation of the statistics cf. Meloun et al.,
1992). On the basis of EDA, the user should select
the most convenient parameter of location from
the following available estimates: the arithmetic
mean xZ0:238G0:003 mg kg1 , the median x^0:5 Z
0:19 mg kg1 and the following trimmed means
xð10%ÞZ 0:210G0:001 mg kg1 , xð20%ÞZ0:203G
0:001, xð40%ÞZ 0:194G0:001 mg kg1 (calculated
with ADSTAT, TriloByte Statistical Software,
Pardubice, Czech Republic), the standard deviation
s Z 0.300 mg kg1, and the parameters of shape are
the skewness g^1 Z30:7 and the kurtosis g^2 Z2123,
proving strongly skewed asymmetric distribution
with a very sharp peak.
(2) Basic diagnostic plots in the EDA are used for
a graphical visualization of the data: in Fig. 1 all of
the exploratory diagnostic graphs prove a strong
deviation from a normal distribution. The box-andwhisker plot (Fig. 1a) indicates many outliers at high
values, and the quantile plot (Fig. 1b) an asymmetric, skewed distribution.
(3) Determination of sample distribution in the EDA: the
sample distribution represented by symmetry, skewness and kurtosis is examined by the kernel density
estimator of the probability density function (Fig. 1c).
This plot shows that most sample points are located in
one class and the plot indicates a strongly skewed
sample distribution. The normal probability plot
(Fig. 1d) checking a normal distribution does not
Fig. 1. (a) The box-and-whisker plot of the Cd sample data. (b) The
quantile plot of the Cd sample data. (c) The kernel density estimator of
the probability density function of the Cd sample data. (d) The
quantileequantile plot (for normal distribution called the Rankit plot)
of the Cd sample data.
exhibit close agreement of the sample points with
a straight line.
(4) Basic assumptions about the sample (cf. Meloun
et al., 1992, pp. 78e82): applying an analysis of basic
278
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
Fig. 2. The plot of the logarithm of the maximum likelihood for the Cd
sample data with BoxeCox transformation.
assumptions about the data the following conclusions were derived: sample elements are independent
and homogeneous. A combined sample skewness
and kurtosis test leads to the test statistic
2
C1 Z
½^
g ðxÞ 3
g^21 ðxÞ
;
C 22
2
s ð^
s ð^
g2 ðxÞÞ
g1 ðxÞÞ
is 291.06 O c2(0.95, 2) Z 5.992 and therefore normality of data distribution was rejected.
(5) Data transformation: as most diagnostic plots of the
EDA exhibit an asymmetric distribution of original
sample data, data transformation is a convenient
technique to employ. In the case of the BoxeCox
transformation the true mean value of a sample
distribution with both confidence limits IL and IU is
calculated. From the plot of the logarithm of the
likelihood function for the BoxeCox transformation
(Fig. 2) the maximum of the curve is at l Z 0.0556.
Fig. 3. The quantileequantile plot for the Cd sample data with
BoxeCox transformation.
The corresponding 95% confidence interval does not
contain the exponent value l Z 1, so all transformations are statistically significant. The normal
probability plot (also called the Rankit plot) on
Fig. 3 shows that the BoxeCox transformation
brings more accurate results.
While classical measures of location, spread and
shape for the original data are xZ0:238 mg kg1 ,
s(x) Z 0.300 mg kg1, the skewness g^1 ðxÞZ30:74
and curtosis g^2 ðxÞZ2123:04 are out of statistical
significance and may be taken as false estimates
of location. The BoxeCox transformation
(^lZ 0:0556) calculates the corrected mean
xR Z0:187G0:001 mol dm3 .
(6) Conclusion: all EDA display techniques prove that
the sample distribution is skewed with many
outliers, and does not come from a population
with a normal distribution. For the best estimate of
a location parameter the arithmetic mean does not
represent an objective measure of location,
0.238 G 0.003 mg kg1, and cannot be used. On
the basis of the quantileequantile plot the Boxe
Cox transformation is considered the most rigorous
technique to estimate a measure of location, with
the corrected mean value xR Z0:187G0:001
mol dm3 .
4. Conclusions
Often, chemical data are less than ideal and do not
fulfill all basic assumptions. Original data can be
transformed to improve the symmetry of data distribution and variance stabilization. Statistical measures of
the transformed data are re-transformed to get these
rigorous measures for the original data. Table 1 shows
a survey of summary statistics for the elements
beryllium, cadmium, cobalt, chromium, mercury, nickel,
lead, vanadium and zinc. This survey includes classical
and robust measures of central tendency, measures of
variability, and measures of shape. Of particular interest
here are sample size, the minimum and maximum values
in a large sample, and both quartiles. The classical
measures x and s are strongly corrupted with outliers
and cannot be used here while the robust measures seem
to be more accurate. Since the exploratory data analysis,
the skewness and kurtosis and the quantile measures of
location prove that the sample distribution strongly
differs from a normal one, the data should be examined
to find the proper transformation leading to symmetric
distribution, stabilizing variance and making the distribution closer to normal. The most rigorous estimate of
location is represented by the re-transformed mean xR
after BoxeCox transformation of original data. This
Table 1
Survey of summary statistics for the elements Be, Cd, Co, Cr, Hg, Ni, Pb, V and Zn including classical and robust measures of central tendency, measures of variability, and measures of shape
Beryllium
Cadmium
Cobalt
Chromium
Mercury
Nickel
Lead
Vanadium
Zinc
Sample size n
Minimum x1
Maximum xn
Lower quartile FL
Upper quartile FU
Interquartile range FUeFL
16 544
0
9.33
0.32
0.57
0.25
40 317
0
28.1
0.14
0.27
0.13
22 176
0.2
110.5
3.9
6.7
2.8
40 318
0.1
1577.4
3.2
6.9
3.7
32 344
0
69.086
0.06
0.11
0.05
34 989
0.1
662.0
3.0
7.3
4.3
40 344
0.17
1121.0
11.7
19.4
7.7
20 373
0.37
86.0
7.0
13.0
6.0
36 123
0.7
2070.0
12.0
22.0
10.0
0.238 G 0.003
0.300
30.74
2123.1
5.593 G 0.039
2.930
4.19
89.85
7.104 G 0.170
17.35
40.09
2608.52
0.105 G 0.006
0.534
107.88
12 963.7
6.033 G 0.081
7.728
34.49
2298.8
18.637 G 0.299
30.594
19.77
528.2
10.878 G 0.083
6.015
2.16
12.41
19.354 G 0.234
22.73
34.20
2265.0
0.19 G 0.00
0.210 G 0.001
0.203 G 0.001
0.194 G 0.001
5.0 G 0.0
5.356 G 0.033
5.264 G 0.032
5.150 G 0.030
4.60 G 0.05
5.361 G 0.040
5.072 G 0.033
4.795 G 0.030
0.08 G 0.00
0.086 G 0.001
0.083 G 0.001
0.081 G 0.001
4.70 G 0.05
5.320 G 0.039
5.109 G 0.037
4.883 G 0.036
14.90 G 0.05
15.860 G 0.067
15.548 G 0.063
15.214 G 0.061
9.60 G 0.10
10.320 G 0.074
10.050 G 0.072
9.752 G 0.067
16.0 G 0.05
17.446 G 0.089
17.020 G 0.085
16.531 G 0.084
JarqueeBerra normality test, critical value for a Z 0.05 is c20.95(2) Z 5.99
157.1
291.1
145.9
Testing criterion C1
Normality is
rejected
rejected
rejected
311.1
rejected
382.0
rejected
294.3
rejected
259.3
rejected
111.4
rejected
294.9
rejected
Homogeneity test
Number of outliers
265
2095
496
2285
1180
1128
1359
486
961
BoxeCox transformation
Re-transformed mean xR
0.427 G 0.003
0.187 G 0.001
5.078 G 0.030
4.922 G 0.023
0.082 G 0.001
4.797 G 0.020
15.172 G 0.050
9.611 G 0.050
16.360 G 0.050
Classical estimates of location, scale and shape
Sample mean x
0.470 G 0.004
Standard deviation s
0.264
5.99
Skewness g^1
Kurtosis g^2
119
Robust estimates of location
Median x~0:5
Trimmed mean xð10%Þ
Trimmed mean xð20%Þ
Trimmed mean xð40%Þ
0.43 G 0.01
0.449 G 0.003
0.443 G 0.003
0.438 G 0.003
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
Estimate of
Of particular interest here are sample size, minimum and maximum values within the sample, and both quartiles. The most rigorous estimates of location are re-transformed means after BoxeCox
transformation.
279
280
M. Meloun et al. / Environmental Pollution 137 (2005) 273e280
estimate can be taken as the best for each element
studied.
Acknowledgements
The financial support of the Grant Agency of the
Ministry of Education of the Czech Republic (Grant No
MSM0021627502) is gratefully acknowledged.
References
Box, G.E.P., Cox, D.R., 1964. Journal of Royal Statistical Association
B26, 211e252.
Chambers, J., Cleveland, W., Kleiner, W., Tukey, P., 1983.
Graphical Methods for Data Analysis. Duxbury Press, Boston.
Kupka, K., 2004. !http://www.trilobyte.cz/edaO.
Meloun, M., Militký, J., Forina, M., 1992. Chemometrics for
analytical chemistry. In: PC-Aided Statistical Data Analysis,
vol. 1. Ellis Horwood, Chichester.
Tukey, J.W., 1977. Exploratory Data Analysis. Addison Wesley,
Reading, MA.