Next: Functions and Variables for statistical graphs, Previous: Functions and Variables for data manipulation, Up: descriptive [Contents][Index]
This is the sample mean, defined as
n ==== _ 1 \ x = - > x n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean (s1); 471 (%o3) --- 100
(%i4) %, numer; (%o4) 4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mean (s2); (%o6) [9.9485, 10.1607, 10.8685, 15.7166, 14.8441]
This is the sample variance, defined as
n ==== 2 1 \ _ 2 s = - > (x - x) n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var (s1), numer; (%o3) 8.425899999999999
See also function var1
.
This is the sample variance, defined as
n ==== 1 \ _ 2 --- > (x - x) n-1 / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) var1 (s1), numer; (%o3) 8.5110101010101
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) var1 (s2); (%o5) [17.39586540404041, 15.13912778787879, 15.63204924242424, 32.50152569696971, 24.66977392929294]
See also function var
.
This is the square root of the function var
, the variance with denominator \(n\).
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std (s1), numer; (%o3) 2.902740084816414
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std (s2); (%o5) [4.149928523480858, 3.871399812729241, 3.933920277534866, 5.672434260526957, 4.941970881136392]
See also functions var
and std1
.
This is the square root of the function var1
, the variance with denominator \(n-1\).
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) std1 (s1), numer; (%o3) 2.917363553109228
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) std1 (s2); (%o5) [4.170835096721089, 3.89090320978032, 3.953738641137555, 5.701010936401517, 4.966867617451963]
See also functions var1
and std
.
The non central moment of order \(k\), defined as
n ==== 1 \ k - > x n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) noncentral_moment (s1, 1), numer; /* the mean */ (%o3) 4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) noncentral_moment (s2, 5); (%o6) [319793.8724761505, 320532.1923892463, 391249.5621381556, 2502278.205988911, 1691881.797742255]
See also function central_moment
.
The central moment of order \(k\), defined as
n ==== 1 \ _ k - > (x - x) n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) central_moment (s1, 2), numer; /* the variance */ (%o3) 8.425899999999999
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) central_moment (s2, 3); (%o6) [11.29584771375004, 16.97988248298583, 5.626661952750102, 37.5986572057918, 25.85981904394192]
See also functions central_moment
and mean
.
The variation coefficient is the quotient between the sample standard deviation (std
) and the mean
,
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) cv (s1), numer; (%o3) .6193977819764815
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cv (s2); (%o5) [.4192426091090204, .3829365309260502, 0.363779605385983, .3627381836021478, .3346021393989506]
See also functions std
and mean
.
This is the minimum value of the sample list.
When the argument is a matrix, smin
returns
a list containing the minimum values of the columns,
which are associated to statistical variables.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smin (s1); (%o3) 0
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smin (s2); (%o5) [0.58, 0.5, 2.67, 5.25, 5.17]
See also function smax
.
This is the maximum value of the sample list.
When the argument is a matrix, smax
returns
a list containing the maximum values of the columns,
which are associated to statistical variables.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) smax (s1); (%o3) 9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) smax (s2); (%o5) [20.25, 21.46, 20.04, 29.63, 27.63]
See also function smin
.
The range is the difference between the extreme values.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) range (s1); (%o3) 9
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) range (s2); (%o5) [19.67, 20.96, 17.37, 24.38, 22.46]
This is the p-quantile, with p a number in \([0, 1]\), of the sample list. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package descriptive
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) /* 1st and 3rd quartiles */ [quantile (s1, 1/4), quantile (s1, 3/4)], numer; (%o3) [2.0, 7.25]
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quantile (s2, 1/4); (%o5) [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median (s1); 9 (%o3) - 2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median (s2); (%o5) [10.06, 9.855, 10.73, 15.48, 14.105]
The median is the 1/2-quantile.
See also function quantile
.
The interquartilic range is the difference between the third and first quartiles, quantile(list,3/4) - quantile(list,1/4)
,
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) qrange (s1); 21 (%o3) -- 4
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) qrange (s2); (%o5) [5.385, 5.572499999999998, 6.022500000000001, 8.729999999999999, 6.649999999999999]
See also function quantile
.
The mean deviation, defined as
n ==== 1 \ _ - > |x - x| n / i ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) mean_deviation (s1); 51 (%o3) -- 20
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) mean_deviation (s2); (%o5) [3.287959999999999, 3.075342, 3.23907, 4.715664000000001, 4.028546000000002]
See also function mean
.
The median deviation, defined as
n ==== 1 \ - > |x - med| n / i ==== i = 1
where med
is the median of list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) median_deviation (s1); 5 (%o3) - 2
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) median_deviation (s2); (%o5) [2.75, 2.755, 3.08, 4.315, 3.31]
See also function mean
.
The harmonic mean, defined as
n -------- n ==== \ 1 > -- / x ==== i i = 1
Example:
(%i1) load ("descriptive")$ (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) harmonic_mean (y), numer; (%o3) 3.901858027632205
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) harmonic_mean (s2); (%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745, 13.44199028193692, 13.01439145898509]
See also functions mean
and geometric_mean
.
The geometric mean, defined as
/ n \ 1/n | /===\ | | ! ! | | ! ! x | | ! ! i| | i = 1 | \ /
Example:
(%i1) load ("descriptive")$ (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i3) geometric_mean (y), numer; (%o3) 4.454845412337012
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) geometric_mean (s2); (%o5) [8.82476274347979, 9.22652604739361, 10.0442675714889, 14.61274126349021, 13.96184163444275]
See also functions mean
and harmonic_mean
.
The kurtosis coefficient, defined as
n ==== 1 \ _ 4 ---- > (x - x) - 3 4 / i n s ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) kurtosis (s1), numer; (%o3) - 1.273247946514421
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) kurtosis (s2); (%o5) [- .2715445622195385, 0.119998784429451, - .4275233490482861, - .6405361979019522, - .4952382132352935]
See also functions mean
, var
and skewness
.
The skewness coefficient, defined as
n ==== 1 \ _ 3 ---- > (x - x) 3 / i n s ==== i = 1
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) skewness (s1), numer; (%o3) .009196180476450424
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) skewness (s2); (%o5) [.1580509020000978, .2926379232061854, .09242174416107717, .2059984348148687, .2142520248890831]
See also functions mean
,, var
and kurtosis
.
Pearson’s skewness coefficient, defined as
_ 3 (x - med) ----------- s
where med is the median of list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) pearson_skewness (s1), numer; (%o3) .2159484029093895
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) pearson_skewness (s2); (%o5) [- .08019976629211892, .2357036272952649, .1050904062491204, .1245042340592368, .4464181795804519]
See also functions mean
, var
and median
.
The quartile skewness coefficient, defined as
c - 2 c + c 3/4 1/2 1/4 -------------------- c - c 3/4 1/4
where \(c_p\) is the p-quantile of sample list.
Example:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) quartile_skewness (s1), numer; (%o3) .04761904761904762
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) quartile_skewness (s2); (%o5) [- 0.0408542246982353, .1467025572005382, 0.0336239103362392, .03780068728522298, .2105263157894735]
See also function quantile
.
Kaplan Meier estimator of the survival, or reliability, function \(S(x)=1-F(x)\).
Data can be introduced as a list of pairs, or as a two column matrix. The first component is the observed time, and the second component a censoring index (1 = non censored, 0 = right censored).
The optional argument is the name of the variable in the returned expression, which is x by default.
Examples:
Sample as a list of pairs.
(%i1) load ("descriptive")$
(%i2) S: km([[2,1], [3,1], [5,0], [8,1]]); charfun((3 <= x) and (x < 8)) (%o2) charfun(x < 0) + ----------------------------- 2 3 charfun((2 <= x) and (x < 3)) + ------------------------------- 4 + charfun((0 <= x) and (x < 2))
(%i3) load ("draw")$
(%i4) draw2d( line_width = 3, grid = true, explicit(S, x, -0.1, 10))$
Estimate survival probabilities.
(%i1) load ("descriptive")$ (%i2) S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
(%i3) S(6); 1 (%o3) - 2
Empirical distribution function \(F(x)\).
Data can be introduced as a list of numbers, or as an one column matrix.
The optional argument is the name of the variable in the returned expression, which is x by default.
Example:
Empirical distribution function.
(%i1) load ("descriptive")$
(%i2) F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9])); (%o2) F(x) := (charfun(x >= 9) + charfun(x >= 8) + 3 charfun(x >= 7) + charfun(x >= 5) + 2 charfun(x >= 3) + charfun(x >= 1))/9
(%i3) F(6); 4 (%o3) - 9
(%i4) load("draw")$ (%i5) draw2d( line_width = 3, grid = true, explicit(F(z), z, -2, 12)) $
The covariance matrix of the multivariate sample, defined as
n ==== 1 \ _ _ S = - > (X - X) (X - X)' n / j j ==== j = 1
where \(X_j\) is the \(j\)-th row of the sample matrix.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec : 7$ /* change precision for pretty output */
(%i4) cov (s2); [ 17.22191 13.61811 14.37217 19.39624 15.42162 ] [ ] [ 13.61811 14.98774 13.30448 15.15834 14.9711 ] [ ] (%o4) [ 14.37217 13.30448 15.47573 17.32544 16.18171 ] [ ] [ 19.39624 15.15834 17.32544 32.17651 20.44685 ] [ ] [ 15.42162 14.9711 16.18171 20.44685 24.42308 ]
See also function cov1
.
The covariance matrix of the multivariate sample, defined as
n ==== 1 \ _ _ S = --- > (X - X) (X - X)' 1 n-1 / j j ==== j = 1
where \(X_j\) is the \(j\)-th row of the sample matrix.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec : 7$ /* change precision for pretty output */
(%i4) cov1 (s2); [ 17.39587 13.75567 14.51734 19.59216 15.5774 ] [ ] [ 13.75567 15.13913 13.43887 15.31145 15.12232 ] [ ] (%o4) [ 14.51734 13.43887 15.63205 17.50044 16.34516 ] [ ] [ 19.59216 15.31145 17.50044 32.50153 20.65338 ] [ ] [ 15.5774 15.12232 16.34516 20.65338 24.66977 ]
See also function cov
.
Function global_variances
returns a list of global variance measures:
trace(S_1)
,
trace(S_1)/p
,
determinant(S_1)
,
sqrt(determinant(S_1))
,
determinant(S_1)^(1/p)
, (defined in: Peña, D. (2002) Análisis de datos multivariantes; McGraw-Hill, Madrid.)
determinant(S_1)^(1/(2*p))
.
where p is the dimension of the multivariate random variable and \(S_1\) the covariance matrix returned by cov1
.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) global_variances (s2); (%o3) [105.338342060606, 21.06766841212119, 12874.34690469686, 113.4651792608501, 6.636590811800795, 2.576158149609762]
Calculate the global_variances
from the covariance matrix.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) s : cov1 (s2)$
(%i4) global_variances (s, data=false); (%o4) [105.338342060606, 21.06766841212119, 12874.34690469686, 113.4651792608501, 6.636590811800795, 2.576158149609762]
The correlation matrix of the multivariate sample.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Example:
(%i1) load ("descriptive")$ (%i2) fpprintprec : 7 $ (%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) cor (s2); [ 1.0 .8476339 .8803515 .8239624 .7519506 ] [ ] [ .8476339 1.0 .8735834 .6902622 0.782502 ] [ ] (%o4) [ .8803515 .8735834 1.0 .7764065 .8323358 ] [ ] [ .8239624 .6902622 .7764065 1.0 .7293848 ] [ ] [ .7519506 0.782502 .8323358 .7293848 1.0 ]
Calculate de correlation matrix from the covariance matrix.
(%i1) load ("descriptive")$ (%i2) fpprintprec : 7 $ (%i3) s2 : read_matrix (file_search ("wind.data"))$ (%i4) s : cov1 (s2)$
(%i5) cor (s, data=false); /* this is faster */ [ 1.0 .8476339 .8803515 .8239624 .7519506 ] [ ] [ .8476339 1.0 .8735834 .6902622 0.782502 ] [ ] (%o5) [ .8803515 .8735834 1.0 .7764065 .8323358 ] [ ] [ .8239624 .6902622 .7764065 1.0 .7293848 ] [ ] [ .7519506 0.782502 .8323358 .7293848 1.0 ]
Function list_correlations
returns a list of correlation measures:
-1 ij S = (s ) 1 i,j = 1,2,...,p
2 1 R = 1 - ------- i ii s s ii
being an indicator of the goodness of fit of the linear multivariate regression model on \(X_i\) when the rest of variables are used as regressors.
ij s r = - ------------ ij.rest / ii jj\ 1/2 |s s | \ /
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
Example:
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) z : list_correlations (s2)$ (%i4) fpprintprec : 5$ /* for pretty output */
(%i5) z[1]; /* precision matrix */ [ .38486 - .13856 - .15626 - .10239 .031179 ] [ ] [ - .13856 .34107 - .15233 .038447 - .052842 ] [ ] (%o5) [ - .15626 - .15233 .47296 - .024816 - .10054 ] [ ] [ - .10239 .038447 - .024816 .10937 - .034033 ] [ ] [ .031179 - .052842 - .10054 - .034033 .14834 ]
(%i6) z[2]; /* multiple correlation vector */ (%o6) [.85063, .80634, .86474, .71867, .72675]
(%i7) z[3]; /* partial correlation matrix */ [ - 1.0 .38244 .36627 .49908 - .13049 ] [ ] [ .38244 - 1.0 .37927 - .19907 .23492 ] [ ] (%o7) [ .36627 .37927 - 1.0 .10911 .37956 ] [ ] [ .49908 - .19907 .10911 - 1.0 .26719 ] [ ] [ - .13049 .23492 .37956 .26719 - 1.0 ]
Calculates the principal components of a multivariate sample. Principal components are used in multivariate statistical analysis to reduce the dimensionality of the sample.
Option:
'data
, default 'true
, indicates whether the input matrix contains the sample data,
in which case the covariance matrix cov1
must be calculated, or not, and then the covariance
matrix (symmetric) must be given, instead of the data.
The output of function principal_components
is a list with the following results:
Examples:
In this sample, the first component explains 83.13 per cent of total variance.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) fpprintprec:4 $ (%i4) res: principal_components(s2); 0 errors, 0 warnings (%o4) [[87.57, 8.753, 5.515, 1.889, 1.613], [83.13, 8.31, 5.235, 1.793, 1.531],
[ .4149 .03379 - .4757 - 0.581 - .5126 ] [ ] [ 0.369 - .3657 - .4298 .7237 - .1469 ] [ ] [ .3959 - .2178 - .2181 - .2749 .8201 ]] [ ] [ .5548 .7744 .1857 .2319 .06498 ] [ ] [ .4765 - .4669 0.712 - .09605 - .1969 ]
(%i5) /* accumulated percentages */ block([ap: copy(res[2])], for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1], ap); (%o5) [83.13, 91.44, 96.68, 98.47, 100.0] (%i6) /* sample dimension */ p: length(first(res)); (%o6) 5 (%i7) /* plot percentages to select number of principal components for further work */ draw2d( fill_density = 0.2, apply(bars, makelist([k, res[2][k], 1/2], k, p)), points_joined = true, point_type = filled_circle, point_size = 3, points(makelist([k, res[2][k]], k, p)), xlabel = "Variances", ylabel = "Percentages", xtics = setify(makelist([concat("PC",k),k], k, p))) $
In case de covariance matrix is known, it can be passed to the function,
but option data=false
must be used.
(%i1) load ("descriptive")$ (%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]); [ 1 - 2 0 ] [ ] (%o2) [ - 2 5 0 ] [ ] [ 0 0 2 ] (%i3) fpprintprec:4 $ (%i4) /* the argument is a covariance matrix */ res: principal_components(S, data=false); 0 errors, 0 warnings [ - .3827 0.0 .9239 ] [ ] (%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [ .9239 0.0 .3827 ]] [ ] [ 0.0 1.0 0.0 ] (%i5) /* transformation to get the principal components from original records */ matrix([a1,b2,c3],[a2,b2,c2]).last(res); [ .9239 b2 - .3827 a1 1.0 c3 .3827 b2 + .9239 a1 ] (%o5) [ ] [ .9239 b2 - .3827 a2 1.0 c2 .3827 b2 + .9239 a2 ]
Next: Functions and Variables for statistical graphs, Previous: Functions and Variables for data manipulation, Up: descriptive [Contents][Index]