Title: | Model and Analyse Interval Data |
---|---|
Description: | Implements methodologies for modelling interval data by Normal and Skew-Normal distributions, considering appropriate parameterizations of the variance-covariance matrix that take into account the intrinsic nature of interval data and lead to four different possible configuration structures. The Skew-Normal parameters can be estimated by maximum likelihood, while Normal parameters may be estimated by maximum likelihood or robust trimmed maximum likelihood methods. |
Authors: | Pedro Duarte Silva <[email protected]>, Paula Brito <mpbrito.fep.up.pt> |
Maintainer: | Pedro Duarte Silva <[email protected]> |
License: | GPL-2 |
Version: | 2.7.1 |
Built: | 2025-01-10 05:19:55 UTC |
Source: | https://github.com/cran/MAINT.Data |
MAINT.Data implements methodologies for modelling Interval Data by Normal and Skew-Normal distributions, considering four different possible configuration structures for the variance-covariance matrix. It introduces a data class for representing interval data and includes functions and methods for parametric modelling and analysis of interval data. It performs maximum likelihood and trimmed maximum likelihood estimation, statistical tests, as well as (M)ANOVA, Discriminant Analysis and Gaussian Model Based Clustering.
In the classical model of multivariate data analysis, data is represented in a data-array where n “individuals” (usually in rows) take exactly one value for each variable (usually in columns).
Symbolic Data Analysis (see, e.g., Noirhomme-Fraiture and Brito (2011)) provides a framework in which new variable types make it possible to take directly into account the variability and/or uncertainty associated with each single “individual”,
by allowing multiple, possibly weighted, values for each variable.
New variable types - interval, categorical multi-valued and modal variables - have been introduced.
We focus on the analysis of interval data, i.e., where elements are described by variables whose values are intervals.
Parametric inference methodologies based on probabilistic models for interval variables are developed in Brito and Duarte Silva (2012), where each interval is represented by its midpoint and log-range, for which Normal and Skew-Normal (Azzalini and Dalla Valle (1996)) distributions are assumed.
The intrinsic nature of the interval variables leads to special structures of the variance-covariance matrix, which are represented by four different possible configurations.
MAINT.Data implements the proposed methodologies in R, introducing a data class for representing interval data; it
includes functions for modelling and analysing interval data, in particular maximum likelihood and trimmed maximum likelihood (Duarte Silva, Filzmoser and Brito (2017)) estimation, and statistical tests for the different considered configurations.
Methods for (M)ANOVA, Discriminant Analysis (Duarte Silva and Brito (2015)) and model based clustering (Brito, Duarte Silva and Dias (2015)) of this data class are also provided.
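The midpoint and log-range representation described above can be illustrated with a short base R sketch. It does not use MAINT.Data itself: the bounds are made-up values, the object names (`lower`, `upper`, `midp`, `logr`, `S1`, `S3`) are hypothetical, and `cov()` is the usual sample covariance rather than the package's estimator. The second matrix only illustrates the structure of a configuration in which MidPoints are uncorrelated with Log-Ranges.

```r
# Hypothetical lower and upper bounds of two interval variables (T1, T4)
# observed on five units; the values are made up for illustration.
lower <- cbind(T1 = c(-5.0, -2.1, 0.3, -8.4, 1.2),
               T4 = c(-3.2, -1.0, 2.5, -6.9, 3.1))
upper <- cbind(T1 = c(6.3, 9.8, 12.1, 4.0, 14.7),
               T4 = c(8.9, 11.2, 13.4, 5.5, 15.0))

# Each interval [l, u] is represented by its midpoint and its log-range
midp <- (lower + upper) / 2
logr <- log(upper - lower)
colnames(midp) <- paste0(colnames(lower), ".MidP")
colnames(logr) <- paste0(colnames(lower), ".LogR")

# Unrestricted sample covariance of the 2p transformed variables
X  <- cbind(midp, logr)
S1 <- cov(X)

# Restricted structure: MidPoints uncorrelated with Log-Ranges, obtained here
# simply by zeroing the MidP x LogR blocks of S1 (an illustration only)
p  <- ncol(lower)
S3 <- S1
S3[1:p, (p + 1):(2 * p)] <- 0
S3[(p + 1):(2 * p), 1:p] <- 0
```

With p interval variables the transformed data has 2p columns, which is why the covariance matrices above are 4 by 4.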
Package: | MAINT.Data |
Type: | Package |
Version: | 2.7.0 |
Date: | 2020-06-06 |
License: | GPL-2 |
LazyLoad: | yes |
LazyData: | yes |
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Maintainer: Pedro Duarte Silva <[email protected]>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P. and Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Brito, P., Duarte Silva, A. P. and Dias, J. G. (2015), Probabilistic Clustering of Interval Data. Intelligent Data Analysis 19(2), 293–313.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Noirhomme-Fraiture, M. and Brito, P. (2011), Far Beyond the Classical Data Models: Symbolic Data Analysis. Statistical Analysis and Data Mining 4(2), 157–170.
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

#Display the first and last observations
head(ChinaT)
tail(ChinaT)

#Print summary statistics
summary(ChinaT)

#Create a new data set considering only the Winter (1st and 4th) quarter intervals
ChinaWT <- ChinaT[,c(1,4)]

# Estimate normal distribution parameters by maximum likelihood, assuming
# the classical (unrestricted) covariance configuration Case 1
ChinaWTE.C1 <- mle(ChinaWT,CovCase=1)
cat("Winter temperatures of China -- normal maximum likelihood estimation results:\n")
print(ChinaWTE.C1)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.C1))

# Estimate normal distribution parameters by maximum likelihood,
# assuming that one of the C2, C3 or C4 restricted covariance configuration cases holds
ChinaWTE.C234 <- mle(ChinaWT,CovCase=2:4)
cat("Winter temperatures of China -- normal maximum likelihood estimation results:\n")
print(ChinaWTE.C234)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.C234))

# Estimate normal distribution parameters robustly by fast maximum trimmed likelihood,
# assuming that one of the C2, C3 or C4 restricted covariance configuration cases holds

## Not run:

ChinaWTE.C234 <- fasttle(ChinaWT,CovCase=2:4)
cat("Winter temperatures of China -- normal maximum trimmed likelihood estimation results:\n")
print(ChinaWTE.C234)

# Estimate skew-normal distribution parameters
ChinaWTE.SkN <- mle(ChinaWT,Model="SKNormal")
cat("Winter temperatures of China -- Skew-Normal maximum likelihood estimation results:\n")
print(ChinaWTE.SkN)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.SkN))

## End(Not run)

#MANOVA tests assuming that configuration case 1 (unrestricted covariance)
# or 3 (MidPoints independent of Log-Ranges) holds.
ManvChinaWT.C13 <- MANOVA(ChinaWT,ChinaTemp$GeoReg,CovCase=c(1,3))
cat("Winter temperatures of China -- MANOVA by geographical regions results:\n")
print(ManvChinaWT.C13)

#Linear Discriminant Analysis
ChinaWT.lda <- lda(ManvChinaWT.C13)
cat("Winter temperatures of China -- linear discriminant analysis results:\n")
print(ChinaWT.lda)
cat("lda Prediction results:\n")
print(predict(ChinaWT.lda,ChinaWT)$class)

## Not run:

#Estimate error rates by ten-fold cross-validation
CVlda <- DACrossVal(ChinaWT,ChinaTemp$GeoReg,TrainAlg=lda,
  CovCase=BestModel(H1res(ManvChinaWT.C13)),CVrep=1)

#Robust Quadratic Discriminant Analysis
ChinaWT.rqda <- Robqda(ChinaWT,ChinaTemp$GeoReg)
cat("Winter temperatures of China -- robust quadratic discriminant analysis results:\n")
print(ChinaWT.rqda)
cat("robust qda prediction results:\n")
print(predict(ChinaWT.rqda,ChinaWT)$class)

## End(Not run)

# Create an Interval-Data object containing the intervals of loan data
# (from the Kaggle Data Science platform) aggregated by loan purpose
LbyPIdt <- IData(LoansbyPurpose_minmaxDt,
  VarNames=c("ln-inc","ln-revolbal","open-acc","total-acc"))
print(LbyPIdt)

## Not run:

#Fit homoscedastic Gaussian mixtures with up to six components
mclustres <- Idtmclust(LbyPIdt,G=1:6)
plotInfCrt(mclustres,legpos="bottomright")
print(mclustres)

#Display the results of the best mixture according to the BIC
summary(mclustres,parameters=TRUE,classification=TRUE)
pcoordplot(mclustres)

## End(Not run)
An interval-valued data set containing 24 units, created from the Abalone dataset (UCI Machine Learning Repository), after aggregating by sex and age.
data(Abalone)
AbdaDF: A data frame containing the original 4177 Abalone individuals described by 7 variables.
AbUnits: A factor with 4177 observations and 24 levels indicating the sex by age combination to which each original individual belongs.
AbaloneIdt: An IData object with 24 observations and 7 interval-valued variables, describing the intervals formed by aggregating the AbdaDF microdata by the AbUnits factor.
AgrMcDt creates IData objects by aggregating a Data Frame of Micro Data.
AgrMcDt(MicDtDF, agrby, agrcrt="minmax")
MicDtDF |
A data frame with the original values of the micro data. |
agrby |
A factor with categories on which the micro data should be aggregated. |
agrcrt |
The aggregation criterion. Either the ‘minmax’ string, or a vector of length two with the probability value for the left (lower) percentile, followed by the probability value for the right (upper) percentile, used in the aggregation. |
An object of class IData
with the data set of Interval-valued variables resulting from the aggregation performed.
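The two aggregation criteria can be sketched in base R as follows. This is not the AgrMcDt implementation: the helper `agg_bounds`, the data frame `MicDt` and the factor `units` are hypothetical names introduced for illustration, and the result is a plain matrix of bounds rather than an IData object.

```r
# Hypothetical micro data: two numeric variables on nine individuals
MicDt <- data.frame(delay = c(5, 12, -3, 40, 8, 15, 0, 22, 7),
                    dist  = c(200, 350, 180, 900, 760, 820, 410, 390, 450))
# Hypothetical factor defining the three aggregated units
units <- factor(rep(c("A", "B", "C"), each = 3))

# Aggregate every variable within each unit into a (lower, upper) pair.
# probs = NULL mimics a min-max criterion; otherwise probs holds the
# probabilities of the lower and upper percentiles.
agg_bounds <- function(df, by, probs = NULL) {
  bounds <- lapply(df, function(x) {
    if (is.null(probs)) {
      lo <- tapply(x, by, min)
      up <- tapply(x, by, max)
    } else {
      lo <- tapply(x, by, quantile, probs = probs[1])
      up <- tapply(x, by, quantile, probs = probs[2])
    }
    cbind(as.vector(lo), as.vector(up))
  })
  res <- do.call(cbind, bounds)
  colnames(res) <- paste(rep(names(df), each = 2), c("lb", "ub"), sep = ".")
  rownames(res) <- levels(by)
  res
}

mm <- agg_bounds(MicDt, units)                       # min-max intervals
pr <- agg_bounds(MicDt, units, probs = c(0.1, 0.9))  # 10th-90th percentile intervals
```

Percentile-based intervals are never wider than the min-max ones, which makes them less sensitive to a few extreme micro observations.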
# Create an Interval-Data object by aggregating the microdata consisting
# of 336776 NYC flights included in the FlightsDF data frame,
# by the statistical units specified in the FlightsUnits factor.
Flightsminmax <- AgrMcDt(FlightsDF,FlightsUnits)

#Display the first and last observations
head(Flightsminmax)
tail(Flightsminmax)

#Print summary statistics
summary(Flightsminmax)

## Not run:

# Repeat this procedure using now the 10th and 90th percentiles.
Flights1090prcnt <- AgrMcDt(FlightsDF,FlightsUnits,agrcrt=c(0.1,0.9))

#Display the first and last observations
head(Flights1090prcnt)
tail(Flights1090prcnt)
summary(Flights1090prcnt)

## End(Not run)
Selects the best model according to the chosen selection criterion (currently, BIC or AIC)
BestModel(ModE,SelCrit=c("IdtCrt","BIC","AIC"))
ModE |
An object of class |
SelCrit |
The model selection criterion. “IdtCrt” stands for the criterion originally used in the ModE estimation, while “BIC” and “AIC” represent respectively the Bayesian and Akaike information criteria. |
An integer with the index of the model chosen by the selection criterion
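How an information criterion picks one model among several candidates can be sketched in base R. The log-likelihoods and parameter counts below are made-up values, and the vectors `logLiks` and `npar` are hypothetical stand-ins for what a fitted model object would hold; this is not the BestModel code.

```r
# Hypothetical maximized log-likelihoods and numbers of free parameters of
# four candidate models fitted to the same n observations
logLiks <- c(C1 = -1250.4, C2 = -1262.9, C3 = -1258.1, C4 = -1301.7)
npar    <- c(C1 = 14, C2 = 10, C3 = 8, C4 = 4)
n       <- 899

# Information criteria in their usual form: smaller values are preferred
bic <- -2 * logLiks + npar * log(n)
aic <- -2 * logLiks + 2 * npar

# Index of the preferred model under each criterion
best_bic <- which.min(bic)
best_aic <- which.min(aic)
```

Because the BIC penalty grows with log(n), the two criteria can disagree, with BIC tending to favour the more parsimonious candidate.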
This data set consists of the intervals for four characteristics (Price, EngineCapacity, TopSpeed and Acceleration) of 27 car models partitioned into four different classes (Utilitarian, Berlina, Sportive and Luxury).
data(Cars)
A data frame containing 27 observations on 9 variables, the first eight with the lower and upper bounds of the interval characteristics for 27 car models, the last one a factor indicating the model class.
This data set consists of the intervals of observed temperatures (Celsius scale) in each of the four quarters, Q_1 to Q_4, of the years 1974 to 1988 in 60 Chinese meteorological stations; one outlier observation (YinChuan_1982) has been discarded. The 60 stations belong to different regions in China, which therefore define a partition of the 899 station-year combinations.
data(ChinaTemp)
A data frame containing 899 observations on 9 variables, the first eight with the lower and upper bounds of the temperatures by quarter in the 899 station-year combinations, the last one a factor indicating the geographic region of each station.
S4 methods for function coef. As in the generic coef S3 ‘stats’ method, these methods extract parameter estimates for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
coef(object, selmodel=BestModel(object), ...)

## S4 method for signature 'IdtSNDE'
coef(object, selmodel=BestModel(object), ParType=c("Centr", "Direct", "All"), ...)

## S4 method for signature 'IdtNandSNDE'
coef(object, selmodel=BestModel(object), ParType=c("Centr", "Direct", "All"), ...)
object |
An object representing a model fitted to interval data. |
selmodel |
Selected model from a list of candidate models saved in object. |
ParType |
Parameterization of the Skew-Normal distribution. Only used when object has class |
... |
Additional arguments for method functions. |
A list of parameter estimates. The list components depend on the model and parametrization assumed. For Gaussian models these are, respectively, mu (vector of mean estimates) and Sigma (matrix of covariance estimates). For Skew-Normal models the components are mu, Sigma and gamma1 (a vector of skewness coefficient estimates) for the centred parametrization, and the vectors ksi and alpha and the matrix Omega for the direct parametrization.
Arellano-Valle, R. B. and Azzalini, A. (2008): "The centred parametrization for the multivariate skew-normal distribution". Journal of Multivariate Analysis, Volume 99, Issue 7, 1362-1382.
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))
ChinaT_NE <- mle(ChinaT)

# Display model estimates
print(coef(ChinaT_NE))

## Not run:

# Estimate Skew-Normal distribution parameters by maximum likelihood
ChinaT_SNE <- mle(ChinaT,Model="SKNormal")

# Display model estimates
print(coef(ChinaT_SNE,ParType="Centr"))
print(coef(ChinaT_SNE,ParType="Direct"))

## End(Not run)
‘ConfMat’ creates confusion matrices from two factors describing, respectively, the original classes and the predicted classification results.
ConfMat(origcl, predcl, otp=c("absandrel","abs","rel"), dec=3)
origcl |
A factor describing the original classes. |
predcl |
A factor describing the predicted classes. |
otp |
A string describing the output to be displayed and returned. Alternatives are “absandrel” for two confusion matrices, respectively with absolute and relative frequencies, “abs” for a confusion matrix with absolute frequencies, and “rel” for a confusion matrix with relative frequencies. |
dec |
The number of decimal digits to display in matrices of relative frequencies. |
When argument ‘otp’ is set to “absandrel” (default), a list with two confusion matrices, respectively with absolute and relative frequencies. When argument ‘otp’ is set to “abs” a confusion matrix with absolute frequencies, and when argument ‘otp’ is set to “rel” a confusion matrix with relative frequencies.
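The two kinds of confusion matrix can be sketched with base R tables. The class labels are made up, the names `abs_cm` and `rel_cm` are hypothetical, and computing the relative frequencies by row (within each original class) is an assumption of this sketch, not something the description above states.

```r
# Hypothetical original and predicted classes for eight observations
origcl <- factor(c("N", "N", "S", "S", "S", "W", "W", "N"))
predcl <- factor(c("N", "S", "S", "S", "W", "W", "W", "N"),
                 levels = levels(origcl))
dec <- 3

# Absolute frequencies: original classes in rows, predicted classes in columns
abs_cm <- table(Original = origcl, Predicted = predcl)

# Relative frequencies within each original class (assumed normalisation),
# rounded to 'dec' decimal digits
rel_cm <- round(prop.table(abs_cm, margin = 1), dec)

# Overall proportion of correctly classified observations
accuracy <- sum(diag(abs_cm)) / sum(abs_cm)
```

Giving `predcl` the same levels as `origcl` keeps the table square even when some class is never predicted.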
A. Pedro Duarte Silva
lda, qda, snda, Roblda, Robqda, DACrossVal
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

#Linear Discriminant Analysis
ChinaT.lda <- lda(ChinaT,ChinaTemp$GeoReg)
ldapred <- predict(ChinaT.lda,ChinaT)$class

# lda resubstitution confusion matrix
ConfMat(ChinaTemp$GeoReg,ldapred)

#Quadratic Discriminant Analysis
ChinaT.qda <- qda(ChinaT,ChinaTemp$GeoReg)
qdapred <- predict(ChinaT.qda,ChinaT)$class

# qda resubstitution confusion matrix
ConfMat(ChinaTemp$GeoReg,qdapred)
ConfTests contains a list of the results of statistical likelihood-ratio tests that evaluate the goodness-of-fit of restricted models against more general ones. Currently, the models implemented are those based on the Normal and Skew-Normal distributions, with the four alternative variance-covariance matrix configurations.
TestRes: List of test results; each element is an object of class LRTest, with the following components:
ChiSq: Value of the Chi-Square statistic corresponding to the performed test.
df: Degrees of freedom of the Chi-Square statistic.
pvalue: p-value of the Chi-Square statistic, obtained from the Chi-Square distribution with df degrees of freedom.
H0logLik: Logarithm of the Likelihood function under the null hypothesis.
H1logLik: Logarithm of the Likelihood function under the alternative hypothesis.
RestModels: The restricted model (corresponding to the null hypothesis)
FullModels: The full model (corresponding to the alternative hypothesis)
signature(object = "ConfTests"): show S4 method for the ConfTests-class
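The relation between the LRTest components can be sketched with the standard likelihood-ratio computation in base R. The numeric values are made up, and the code illustrates the textbook test rather than the package's own routine.

```r
# Hypothetical log-likelihoods of a restricted (H0) and a full (H1) model
H0logLik <- -1262.9
H1logLik <- -1250.4
# Hypothetical difference in the number of free parameters
df <- 4

# Likelihood-ratio statistic and its p-value from the Chi-Square
# distribution with df degrees of freedom
ChiSq  <- 2 * (H1logLik - H0logLik)
pvalue <- pchisq(ChiSq, df = df, lower.tail = FALSE)
```

A small p-value indicates that the restriction imposed under the null hypothesis is not supported by the data.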
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
S4 methods for function cor. These methods extract estimates of correlation matrices for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
cor(x)

## S4 method for signature 'IdtSNDE'
cor(x)

## S4 method for signature 'IdtNandSNDE'
cor(x)

## S4 method for signature 'IdtMxNDE'
cor(x)

## S4 method for signature 'IdtMxSNDE'
cor(x)
x |
An object representing a model fitted to interval data. |
For the IdtNDE, IdtSNDE and IdtNandSNDE methods, or the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to TRUE: a matrix with the estimated correlations.
For the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to FALSE: a three-dimensional array with a matrix with the estimated correlations for each group at each level of the third dimension.
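The shape of the three-dimensional result can be sketched in base R by converting one covariance matrix per group into a correlation matrix. The covariance values, group labels and the object names `Sigma` and `Rho` are hypothetical, and `cov2cor()` from the stats package stands in for whatever the methods do internally.

```r
# Hypothetical covariance estimates of two variables in two groups,
# stored with the groups along the third dimension
Sigma <- array(c(4, 1.2, 1.2, 1,
                 9, -1.5, -1.5, 1),
               dim = c(2, 2, 2),
               dimnames = list(c("MidP", "LogR"), c("MidP", "LogR"), c("G1", "G2")))

# One correlation matrix per group, stacked along the third dimension
Rho <- array(apply(Sigma, 3, cov2cor),
             dim = dim(Sigma), dimnames = dimnames(Sigma))

Rho[, , "G1"]   # correlation matrix of the first group
```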
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))
ChinaT_NE <- mle(ChinaT)

# Display correlation estimates
print(cor(ChinaT_NE))
‘DACrossVal’ evaluates the performance of a Discriminant Analysis training sample algorithm by k-fold Cross-Validation.
DACrossVal(data, grouping, TrainAlg, EvalAlg=EvalClrule, Strfolds=TRUE, kfold=10, CVrep=20, prior="proportions", loo=FALSE, dec=3, ...)
data |
Matrix, data frame or Interval Data object of observations. |
grouping |
Factor specifying the class for each observation. |
TrainAlg |
A function with the training algorithm. It should return an object that can be used as input to ‘EvalAlg’. |
EvalAlg |
A function with the evaluation algorithm. By default set to ‘EvalClrule’ which returns a list with components “err” (estimates of error rates by class) and “Nk” (number of out-sample observations by class). This default can be used for all ‘TrainAlg’ arguments that return an object with a predict method returning a list with a ‘class’ component (a factor) containing the classification results. |
Strfolds |
Boolean flag indicating if the folds should be stratified according to the original class proportions (default), or randomly generated from the whole training sample, ignoring class membership. |
kfold |
Number of training sample folds to be created in each replication. |
CVrep |
Number of replications to be performed. |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels. |
loo |
A boolean flag indicating if a leave-one-out strategy should be employed. When set to “TRUE” overrides the kfold and CVrep arguments. |
dec |
The number of decimal digits to display in confusion matrices of relative frequencies. |
... |
Further arguments to be passed to ‘TrainAlg’ and ‘EvalAlg’. |
A three-dimensional array with the number of tested observations, and estimated classification errors for each combination of fold and replication tried. The array dimensions are defined as follows:
The first dimension runs through the different fold-replication combinations.
The second dimension represents the classes.
The third dimension has two named levels representing respectively the number of observations tested (“Nk”), and the estimated classification errors (“Clerr”).
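The difference between stratified and non-stratified folds can be sketched in base R. The helper `make_folds` and the toy `grouping` factor are hypothetical names introduced for illustration; this shows one common way of building folds and is not necessarily how DACrossVal does it.

```r
# Assign each observation to one of 'kfold' folds. With stratified = TRUE the
# fold labels are dealt out separately within each class, so every fold keeps
# roughly the original class proportions.
make_folds <- function(grouping, kfold = 10, stratified = TRUE) {
  n <- length(grouping)
  folds <- integer(n)
  if (stratified) {
    for (lev in levels(grouping)) {
      idx <- which(grouping == lev)
      # Random permutation of the class, then fold labels 1..kfold cyclically
      folds[idx[sample.int(length(idx))]] <- rep_len(seq_len(kfold), length(idx))
    }
  } else {
    folds[sample.int(n)] <- rep_len(seq_len(kfold), n)
  }
  folds
}

set.seed(1)
grouping <- factor(rep(c("a", "b", "c"), times = c(20, 10, 5)))
folds <- make_folds(grouping, kfold = 5)
fold_table <- table(grouping, folds)   # class counts in each fold
```

In each replication, every fold is held out once while the training algorithm is fitted to the remaining folds.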
A. Pedro Duarte Silva
## Not run:

# Compare performance of linear and quadratic discriminant analysis with
# Covariance cases C1 and C3 on the ChinaT data set by 5-fold cross-validation
# replicated twice

# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8])

# Classical (configuration 1) Linear Discriminant Analysis
CVldaC1 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=lda,CovCase=1,kfold=5,CVrep=2)
summary(CVldaC1[,,"Clerr"])

# Linear Discriminant Analysis with covariance case 3
CVldaC3 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=lda,CovCase=3,kfold=5,CVrep=2)
summary(CVldaC3[,,"Clerr"])

# Classical (configuration 1) Quadratic Discriminant Analysis
CVqdaC1 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=qda,CovCase=1,kfold=5,CVrep=2)
summary(CVqdaC1[,,"Clerr"])

# Quadratic Discriminant Analysis with covariance case 3
CVqdaC3 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=qda,CovCase=3,kfold=5,CVrep=2)
summary(CVqdaC3[,,"Clerr"])

## End(Not run)
This function creates a control object of class EMControl containing the control parameters for the EM algorithm used in the estimation of Gaussian mixtures by function Idtmclust.
EMControl(nrep=0, maxiter=1000, convtol=0.01, protol=1e-3, seed=NULL, pertubfct=1, k2max=1e6, MaxVarGRt=1e6)
nrep |
Number of replications (different randomly generated starting points) of the EM algorithm. |
maxiter |
Maximum number of iterations in each replication of the EM algorithm. |
convtol |
Numeric tolerance for testing the convergence of the EM algorithm. Convergence is assumed when the log-likelihood changes less than convtol. |
protol |
Numeric tolerance for the mixture proportions. Proportions below protol, considered to be zero, are not allowed. |
seed |
Starting value for random generator. |
pertubfct |
Perturbation factor used to control the degree of similarity between the alternative randomly generated starting points of the EM algorithm. Increasing (decreasing) the value of pertubfct increases (decreases) the expected difference between the starting points generated. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Solutions in which any component has correlation matrix with condition number above k2max, are considered to be spurious solutions and are eliminated from the EM search. |
MaxVarGRt |
Maximal allowed ratio of variances across components. Solutions in which any variable has a ratio between its maximal and minimal (across components) variances above MaxVarGRt, are considered to be spurious solutions and are eliminated from the EM search. |
An EMControl object.
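The two safeguards against spurious solutions described for k2max and MaxVarGRt can be sketched in base R. The component covariance matrices are made up, the names `comps`, `k2`, `ratio` and `spurious` are hypothetical, and the checks follow the argument descriptions rather than the package source.

```r
# Hypothetical covariance matrices of two mixture components
Sigma1 <- matrix(c(1.0, 0.3, 0.3, 2.0), 2, 2)
Sigma2 <- matrix(c(4.0, 1.9, 1.9, 1.0), 2, 2)
comps  <- list(Sigma1, Sigma2)

k2max     <- 1e6
MaxVarGRt <- 1e6

# l2-norm condition number of each component's correlation matrix
k2 <- vapply(comps, function(S) kappa(cov2cor(S), exact = TRUE), numeric(1))

# Ratio between the largest and smallest variance of each variable
# across components (variables in rows, components in columns)
vars  <- vapply(comps, diag, numeric(2))
ratio <- apply(vars, 1, max) / apply(vars, 1, min)

# A solution is flagged when either safeguard is violated
spurious <- any(k2 > k2max) || any(ratio > MaxVarGRt)
```

A very large condition number signals a nearly singular correlation matrix, which is a typical symptom of a component collapsing onto a few observations.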
This class contains the control parameters for the EM algorithm used in the estimation of Gaussian mixtures by function Idtmclust.
Objects can be created by calls of the form new("EMControl", ...) or by calling the constructor function EMControl.
nrep
Number of replications (different randomly generated starting points) of the EM algorithm.
maxiter
Maximum number of iterations in each replication of the EM algorithm.
convtol
Numeric tolerance for testing the convergence of the EM algorithm. Convergence is assumed when the log-likelihood changes less than convtol.
protol
Numeric tolerance for the mixture proportions. Proportions below protol, considered to be zero, are not allowed.
seed
Starting value for random generator.
“extmatrix” is a simple extension of the base matrix class that accepts NULL objects as members.
Class matrix, directly.
Performs maximum trimmed likelihood estimation by the fasttle algorithm
fasttle(Sdt, CovCase=1:4, SelCrit=c("BIC","AIC"), alpha=control@alpha, nsamp = control@nsamp, seed=control@seed, trace=control@trace, use.correction=control@use.correction, ncsteps=control@ncsteps, getalpha=control@getalpha, rawMD2Dist=control@rawMD2Dist, MD2Dist=control@MD2Dist, eta=control@eta, multiCmpCor=control@multiCmpCor, getkdblstar=control@getkdblstar, outlin=control@outlin, trialmethod=control@trialmethod, m=control@m, reweighted = control@reweighted, k2max = control@k2max, otpType=control@otpType, control=RobEstControl(), ...)
Sdt |
An IData object representing interval-valued units. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
alpha |
Numeric parameter controlling the size of the subsets over which the trimmed likelihood is maximized; roughly alpha*nrow(Sdt) observations are used for computing the trimmed likelihood. Note that when argument ‘getalpha’ is set to “TwoStep” the final value of ‘alpha’ is estimated by a two-step procedure and the value of argument ‘alpha’ is only used to specify the size of the samples used in the first step. Allowed values are between 0.5 and 1. |
nsamp |
Number of subsets used for initial estimates. |
seed |
Initial seed for random generator, like |
trace |
Logical (or integer) indicating if intermediate results should be printed; defaults to |
use.correction |
whether to use finite sample correction factors; defaults to |
ncsteps |
The maximum number of concentration steps used in each iteration of the fasttle algorithm. |
getalpha |
Argument specifying if the ‘alpha’ parameter (roughly the percentage of the sample used for computing the trimmed likelihood) should be estimated from the data, or if the value of the argument ‘alpha’ should be used instead. When set to “TwoStep”, ‘alpha’ is estimated by a two-step procedure with the value of argument ‘alpha’ specifying the size of the samples used in the first step. Otherwise, the value of argument ‘alpha’ is used directly. |
rawMD2Dist |
The assumed reference distribution of the raw MCD squared distances, which is used to find the cutoffs defining the observations kept in one-step reweighted MCD estimates. Alternatives are ‘ChiSq’, ‘HardRockeAsF’ and ‘HardRockeAdjF’, respectively for the usual Chi-square, and the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005). |
MD2Dist |
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-square, or the Beta and F distributions proposed by Cerioli (2010). |
eta |
Nominal size of the null hypothesis that a given observation is not an outlier. Defines the raw MCD Mahalanobis distances cutoff used to choose the observations kept in the reweighting step. |
multiCmpCor |
Whether a multicomparison correction of the nominal size (eta) for the outlier tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level; ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ – using the iterated rule proposed by Cerioli (2010), i.e., making an initial set of tests using the nominal size 1 - (1 - ‘eta’)^(1/n), and stopping if no outliers are detected; otherwise, making a second step testing for outliers at the ‘eta’ nominal level. |
getkdblstar |
Argument specifying the size of the initial small (in order to minimize the probability of outliers) subsets. If set to the string “Twopplusone” (default) the initial sets have twice the number of interval-valued variables plus one (i.e., they are the smallest samples that lead to a non-singular covariance estimate). Otherwise, an integer with the size of the initial sets. |
outlin |
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges. |
trialmethod |
The method to find a trial subset used to initialize each replication of the fasttle algorithm. The current options are “simple” (default) that simply selects ‘kdblstar’ observations at random, and “Poolm” that divides the original sample into ‘m’ non-overlapping subsets, applies the ‘simple trial’ and the refinement methods to each one of them, and merges the results into a trial subset. |
m |
Number of non-overlapping subsets used by the trial method when the argument ‘trialmethod’ is set to “Poolm”. |
reweighted |
Should a (Re)weighted estimate of the covariance matrix be used in the computation of the trimmed likelihood or just a “raw” covariance estimate; default is (Re)weighting. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
otpType |
The amount of output returned by fasttle. Current options are “SetMD2andEst” (default) which returns an ‘IdtSngNDRE’ object with the fasttle estimates, |
control |
a list with estimation options - this includes those above provided in the function specification. See
|
... |
Further arguments to be passed to internal functions of |
An object of class IdtE
with the fasttle estimates, the value of the comparison criterion used to select the covariance configurations, the robust squared Mahalanobis distances, and optionally (if argument ‘otpType’ is set to true) performance statistics concerning the algorithm execution.
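The multicomparison correction described for ‘multiCmpCor’ can be illustrated with a few lines of base R. This is only a sketch of the arithmetic behind the ‘always’ rule, assuming independent tests; `corrected_size` is a hypothetical helper and not a function of the package.

```r
# Per-test nominal size under the 'always' rule: 1 - (1 - eta)^(1/n).
# 'corrected_size' is a hypothetical helper, not part of MAINT.Data.
corrected_size <- function(eta, n) 1 - (1 - eta)^(1 / n)

eta <- 0.025   # overall nominal size
n   <- 899     # number of entities, as in the ChinaTemp example

eta_n <- corrected_size(eta, n)

# If n independent tests are each run at level eta_n, the probability of
# at least one false outlier declaration is eta again.
overall <- 1 - (1 - eta_n)^n

print(c(per_test = eta_n, overall = overall))
```

The per-test size shrinks quickly as n grows, which is why the ‘iterstep’ rule only falls back to the uncorrected ‘eta’ level after the corrected tests have found at least one outlier.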
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators.
Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Hadi, A. S. and Luceno, A. (1997), Maximum trimmed likelihood estimators: a unified approach, examples, and algorithms.
Computational Statistics and Data Analysis 25(3), 251–272.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances.
Journal of Computational and Graphical Statistics 14, 910–927.
Todorov V. and Filzmoser P. (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software 32(3), 1–47.
fulltle, RobEstControl, getIdtOutl, IdtSngNDRE
    ## Not run: 
    # Create an Interval-Data object containing the intervals of temperatures by quarter
    # for 899 Chinese meteorological stations.
    ChinaT <- IData(ChinaTemp[1:8])
    # Estimate parameters by the fast trimmed maximum likelihood estimator,
    # using a two-step procedure to select the trimming parameter, a reweighted
    # MCD estimate, and the classical 97.5% chi-square quantile cut-offs.
    Chinafasttle1 <- fasttle(ChinaT)
    cat("China maximum trimmed likelihood estimation results =\n")
    print(Chinafasttle1)
    # Estimate parameters by the fast trimmed maximum likelihood estimator, using
    # the trimming parameter that maximizes breakdown, and a reweighted MCD estimate
    # based on the 97.5% quantiles of Hardin and Rocke adjusted F distributions.
    Chinafasttle2 <- fasttle(ChinaT,alpha=0.5,getalpha=FALSE,rawMD2Dist="HardRockeAdjF")
    cat("China maximum trimmed likelihood estimation results =\n")
    print(Chinafasttle2)
    # Estimate parameters by the fast trimmed maximum likelihood estimator, using a two-step procedure
    # to select the trimming parameter, a reweighted MCD estimate based on Hardin and Rocke adjusted
    # F distributions, and 95% quantiles, and the Cerioli Beta and F distributions together
    # with Cerioli iterated procedure to identify outliers in the first step.
    Chinafasttle3 <- fasttle(ChinaT,rawMD2Dist="HardRockeAdjF",eta=0.05,MD2Dist="CerioliBetaF", multiCmpCor="iterstep")
    cat("China maximum trimmed likelihood estimation results =\n")
    print(Chinafasttle3)
    ## End(Not run)
Performs maximum trimmed likelihood estimation by an exact algorithm (full enumeration of all k-trimmed subsets)
fulltle(Sdt, CovCase = 1:4, SelCrit = c("BIC", "AIC"), alpha = 0.75, use.correction = TRUE, getalpha = "TwoStep", rawMD2Dist = c("ChiSq", "HardRockeAsF", "HardRockeAdjF"), MD2Dist = c("ChiSq", "CerioliBetaF"), eta = 0.025, multiCmpCor = c("never", "always", "iterstep"), outlin = c("MidPandLogR", "MidP", "LogR"), reweighted = TRUE, k2max=1e6, force = FALSE, ...)
Sdt |
An IData object representing interval-valued units. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
alpha |
Numeric parameter controlling the size of the subsets over which the trimmed likelihood is maximized; roughly alpha*nrow(Sdt) observations are used for computing the trimmed likelihood. Note that when argument ‘getalpha’ is set to “TwoStep” the final value of ‘alpha’ is estimated by a two-step procedure and the value of argument ‘alpha’ is only used to specify the size of the samples used in the first step. Allowed values are between 0.5 and 1. |
use.correction |
whether to use finite sample correction factors; defaults to |
getalpha |
Argument specifying if the ‘alpha’ parameter (roughly the percentage of the sample used for computing the trimmed likelihood) should be estimated from the data, or if the value of the argument ‘alpha’ should be used instead. When set to “TwoStep”, ‘alpha’ is estimated by a two-step procedure with the value of argument ‘alpha’ specifying the size of the samples used in the first step. Otherwise, the value of argument ‘alpha’ is used directly. |
rawMD2Dist |
The assumed reference distribution of the raw MCD squared distances, which is used to find the cutoffs defining the observations kept in one-step reweighted MCD estimates. Alternatives are ‘ChiSq’, ‘HardRockeAsF’ and ‘HardRockeAdjF’, respectively for the usual Chi-square, and the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005). |
MD2Dist |
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-square, and the Beta and F distributions proposed by Cerioli (2010). |
eta |
Nominal size of the null hypothesis that a given observation is not an outlier. Defines the raw MCD Mahalanobis distances cutoff used to choose the observations kept in the reweighting step. |
multiCmpCor |
Whether a multicomparison correction of the nominal size (eta) for the outlier tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level; ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ – using the iterated rule proposed by Cerioli (2010), i.e., making an initial set of tests using the nominal size 1 - (1 - ‘eta’)^(1/n), and stopping if no outliers are detected; otherwise, making a second step testing for outliers at the ‘eta’ nominal level. |
outlin |
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges. |
reweighted |
should a (Re)weighted estimate of the covariance matrix be used in the computation of the trimmed likelihood or just a “raw” covariance estimate; default is (Re)weighting. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
force |
A boolean flag indicating whether, for moderate or large data sets, the algorithm should proceed anyway, regardless of an expected long execution time due to the exponential explosion in the number of different subsets that need to be evaluated by fulltle. |
... |
Further arguments to be passed to internal functions of ‘fulltle’. |
An object of class IdtE
with the fulltle estimates, the value of the comparison criterion used to select the covariance configurations and the robust squared Mahalanobis distances.
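To give a feeling for why full enumeration is only practical for small data sets (and why the ‘force’ flag exists), the sketch below counts the subsets an exhaustive search would have to visit. It assumes, purely for illustration, that the subset size is k = ceiling(alpha * n); the exact rule used by fulltle may differ, and `n_subsets` is a hypothetical helper.

```r
# Number of size-k subsets of n entities that an exhaustive search visits.
# k = ceiling(alpha * n) is an illustrative assumption about the subset size.
n_subsets <- function(n, alpha) choose(n, ceiling(alpha * n))

small_075 <- n_subsets(27, 0.75)   # 27 entities, as in the Cars example
small_050 <- n_subsets(27, 0.50)   # maximum-breakdown trimming
large_075 <- n_subsets(100, 0.75)  # a moderately sized data set

print(c(n27_alpha075 = small_075,
        n27_alpha050 = small_050,
        n100_alpha075 = large_075))
```

Under this assumption the count grows combinatorially with n, so fasttle is the natural choice beyond a few dozen entities.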
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators.
Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Hadi, A. S. and Luceno, A. (1997), Maximum trimmed likelihood estimators: a unified approach, examples, and algorithms.
Computational Statistics and Data Analysis 25(3), 251–272.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances.
Journal of Computational and Graphical Statistics 14, 910–927.
    ## Not run: 
    # Create an Interval-Data object containing the intervals for characteristics
    # of 27 car models.
    CarsIdt <- IData(Cars[1:8],VarNames=c("Price","EngineCapacity","TopSpeed","Acceleration"))
    # Estimate parameters by the full trimmed maximum likelihood estimator,
    # using a two-step procedure to select the trimming parameter, a reweighted
    # MCD estimate, and the classical 97.5% chi-square quantile cut-offs.
    CarsTE1 <- fulltle(CarsIdt)
    cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
    print(CarsTE1)
    # Estimate parameters by the full trimmed maximum likelihood estimator, using
    # the trimming parameter that maximizes breakdown, and a reweighted MCD estimate
    # based on the 97.5% quantiles of Hardin and Rocke adjusted F distributions.
    CarsTE2 <- fulltle(CarsIdt,alpha=0.5,getalpha=FALSE,rawMD2Dist="HardRockeAdjF")
    cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
    print(CarsTE2)
    # Estimate parameters by the full trimmed maximum likelihood estimator, using
    # a two-step procedure to select the trimming parameter, and a reweighted MCD estimate
    # based on Hardin and Rocke adjusted F distributions, 95% quantiles, and
    # the Cerioli Beta and F distributions together with his iterated procedure
    # to identify outliers in the first step.
    CarsTE3 <- fulltle(CarsIdt,rawMD2Dist="HardRockeAdjF",eta=0.05,MD2Dist="CerioliBetaF", multiCmpCor="iterstep")
    cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
    print(CarsTE3)
    ## End(Not run)
Identifies outliers in a data set of Interval-valued variables
getIdtOutl(Sdt, IdtE=NULL, muE=NULL, SigE=NULL, eta=0.025, Rewind=NULL, m=length(Rewind), RefDist=c("ChiSq","HardRockeAdjF","HardRockeAsF","CerioliBetaF"), multiCmpCor=c("never","always","iterstep"), outlin=c("MidPandLogR","MidP","LogR"))
Sdt |
An IData object representing interval-valued entities. |
IdtE |
An object of class |
muE |
Vector with the mean estimates used to find Mahalanobis distances. When specified, it overrides the mean estimate supplied in “IdtE”. |
SigE |
Matrix with the covariance estimates used to find Mahalanobis distances. When specified, it overrides the covariance estimate supplied in “IdtE”. |
eta |
Nominal size of the null hypothesis that a given observation is not an outlier. |
Rewind |
A vector with the subset of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Only used when the ‘RefDist’ argument is set to “CerioliBetaF.” |
m |
Number of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Not used when the ‘RefDist’ argument is set to “ChiSq.” |
multiCmpCor |
Whether a multicomparison correction of the nominal size (eta) for the outlier tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level; ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ – using the iterated rule proposed by Cerioli (2010), i.e., making an initial set of tests using the nominal size 1 - (1 - ‘eta’)^(1/n), and stopping if no outliers are detected; otherwise, making a second step testing for outliers at the ‘eta’ nominal level. |
RefDist |
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq”, “HardRockeAsF”, “HardRockeAdjF” and “CerioliBetaF”, respectively for the usual Chi-squared, the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005), and the Beta and F distributions proposed by Cerioli (2010). |
outlin |
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges. |
A vector with the indices of the entities identified as outliers.
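The idea behind the “ChiSq” reference distribution can be sketched in base R on simulated data: squared Mahalanobis distances are compared with a chi-squared quantile of level 1 - eta. This is not the package's implementation. The degrees of freedom (one per MidPoint and LogRange column) are an assumption, and the mean and covariance of the uncontaminated rows stand in for the robust estimates that would normally be supplied through ‘IdtE’, ‘muE’ or ‘SigE’.

```r
set.seed(1)

p <- 4                                  # 2 interval variables: 2 MidPoints + 2 LogRanges
X <- matrix(rnorm(50 * p), ncol = p)    # 50 simulated entities
X[1:3, ] <- X[1:3, ] + 10               # plant three gross outliers

# Stand-ins for robust estimates: mean and covariance of the clean rows.
muE  <- colMeans(X[-(1:3), ])
SigE <- cov(X[-(1:3), ])

eta    <- 0.025
md2    <- mahalanobis(X, center = muE, cov = SigE)  # squared distances
cutoff <- qchisq(1 - eta, df = p)                   # assumed reference quantile

outl <- which(md2 > cutoff)             # indices flagged as outliers
print(outl)
```

With eta = 0.025 a few clean entities may also exceed the cutoff by chance, which is the situation the ‘multiCmpCor’ options are meant to address.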
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators.
Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910–927.
    ## Not run: 
    # Create an Interval-Data object containing the intervals for characteristics
    # of 27 car models.
    CarsIdt <- IData(Cars[1:8],VarNames=c("Price","EngineCapacity","TopSpeed","Acceleration"))
    # Estimate parameters by the full trimmed maximum likelihood estimator,
    # using a two-step procedure to select the trimming parameter, a reweighted
    # MCD estimate, and the classical 97.5% chi-squared quantile cut-offs.
    Carstle1 <- fulltle(CarsIdt)
    # Get and display the outliers using the classical 97.5% chi-squared quantile cut-offs.
    CarsOtl1 <- getIdtOutl(CarsIdt,Carstle1)
    print(CarsOtl1)
    plot(CarsOtl1)
    # Estimate parameters by the full trimmed maximum likelihood estimator,
    # using a two-step procedure to select the trimming parameter, and a reweighted
    # MCD estimate based on the 97.5% quantiles of Hardin and Rocke adjusted F distributions.
    Carstle2 <- fulltle(CarsIdt,rawMD2Dist="HardRockeAdjF")
    # Get and display the outliers using the 97.5% quantiles of the Cerioli Beta and F distributions.
    CarsTtl2 <- getIdtOutl(CarsIdt,Carstle2,RefDist="CerioliBetaF")
    print(CarsTtl2)
    plot(CarsTtl2)
    ## End(Not run)
IData creates IData objects from data frames of interval bounds or MidPoint/LogRange values of the interval-valued observations.
IData(Data, Seq = c("LbUb_VarbyVar", "MidPLogR_VarbyVar", "AllLb_AllUb", "AllMidP_AllLogR"), VarNames=NULL, ObsNames=row.names(Data), NbMicroUnits=integer(0))
Data |
a data frame or matrix of interval bounds or MidPoint/LogRange values. |
Seq |
the format of ‘Data’ data frame. Available options are: |
VarNames |
An optional vector of names to be assigned to the Interval-Valued Variables. |
ObsNames |
An optional vector of names assigned to the individual observations. |
NbMicroUnits |
An integer vector with the number of micro data units by interval-valued observation (or an empty vector, if not applicable) |
Objects of class IData describe a data set of ‘NObs’ observations on ‘NIVar’ Interval-valued variables. This function creates an interval-data object from a data-frame with either the lower and upper bounds of the observed intervals or their midpoints and log-ranges.
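The relation between the two representations can be sketched in base R, without the package. The bounds below are made-up values laid out variable by variable (lower bound, then upper bound), mirroring the “LbUb_VarbyVar” option; the column names are hypothetical.

```r
# Hypothetical bounds for three entities and two interval-valued variables.
bounds <- data.frame(V1.lb = c(1, 2, 0),   V1.ub = c(3, 6, 1),
                     V2.lb = c(10, 12, 9), V2.ub = c(20, 13, 11))

lb <- as.matrix(bounds[, c("V1.lb", "V2.lb")])
ub <- as.matrix(bounds[, c("V1.ub", "V2.ub")])

MidP <- (lb + ub) / 2     # interval midpoints
LogR <- log(ub - lb)      # logarithms of the interval ranges
colnames(MidP) <- c("V1.MidP", "V2.MidP")
colnames(LogR) <- c("V1.LogR", "V2.LogR")

# The bounds can be recovered from the MidPoint/LogRange representation.
lb_back <- MidP - exp(LogR) / 2
ub_back <- MidP + exp(LogR) / 2

print(cbind(MidP, LogR))
```

Because the range enters through its logarithm, every interval must have strictly positive width for this representation to be defined.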
    ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))
    cat("Summary of the ChinaT IData object:\n") ; print(summary(ChinaT))
    cat("ChinaT first and last three observations:\n")
    print(head(ChinaT,n=3))
    cat("\n...\n")
    print(tail(ChinaT,n=3))
A data-array of interval-valued data is an array where each of the NObs rows, corresponding to each entity under analysis, contains the observed intervals of the NIVar descriptive variables.
MidP
:A data-frame of the midpoints of the observed intervals
LogR
:A data-frame of the logarithms of the ranges of the observed intervals
ObsNames
:An optional vector of names assigned to the individual observations.
VarNames
:An optional vector of names to be assigned to the Interval-valued Variables.
NObs
:Number of entities under analysis (cases)
NIVar
:Number of interval variables
NbMicroUnits
:An integer vector with the number of micro data units by interval-valued observation (or an empty vector, if not applicable)
signature(object = "IData")
: show S4 method for the IData-class.
signature(x = "IData")
: returns the number of statistical units (observations).
signature(x = "IData")
: returns the number of Interval-valued variables.
signature(x = "IData")
: returns a vector with the number of statistical units as first element, and the number of Interval-valued variables as second element.
signature(x = "IData")
: returns the row (entity) names for an object of class IData.
signature(x = "IData")
: returns column (variable) names for an object of class IData.
signature(x = "IData")
: returns column (variable) names for an object of class IData.
signature(Sdt = "IData")
: returns a data frame with MidPoints for an object of class IData.
signature(Sdt = "IData")
: returns a data frame with LogRanges for an object of class IData.
signature(Sdt = "IData")
: returns a data frame with Ranges for an object of class IData.
signature(Sdt = "IData")
: returns an integer vector with the number of micro data units by interval-valued observation for an object of class IData.
signature(x = "IData")
: head S4 method for the IData-class.
signature(x = "IData")
: tail S4 method for the IData-class.
signature(x = "IData")
: plot S4 methods for the IData-class.
signature(x = "IData")
: Maximum likelihood estimation.
signature(x = "IData")
: Fast trimmed maximum likelihood estimation.
signature(x = "IData")
: Exact trimmed maximum likelihood estimation.
signature(x = "IData")
: Robust estimation of distribution mixtures for interval-valued data.
signature(x = "IData")
: MANOVA tests on the interval-valued data.
signature(x = "IData")
: Linear Discriminant Analysis using maximum likelihood parameter estimates of Gaussian mixtures.
signature(x = "IData")
: Quadratic Discriminant Analysis using maximum likelihood parameter estimates of Gaussian mixtures.
signature(x = "IData")
: Linear Discriminant Analysis using robust estimates of location and scatter.
signature(x = "IData")
: Quadratic Discriminant Analysis using robust estimates of location and scatter.
signature(x = "IData")
: Discriminant Analysis using maximum likelihood parameter estimates of SkewNormal mixtures.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Noirhomme-Fraiture, M., Brito, P. (2011), Far Beyond the Classical Data Models: Symbolic Data Analysis. Statistical Analysis and Data Mining 4(2), 157–170.
IData, AgrMcDt, mle, fasttle, fulltle, RobMxtDEst, MANOVA, lda, qda, Roblda, Robqda
IdtE contains estimation results for the models assumed for single distributions, or mixtures of distributions, underlying data sets of interval-valued entities.
ModelNames
:The model acronym, indicating the model type (currently, N for Normal and SN for Skew-Normal), and the configuration (Case 1 through Case 4)
ModelType
:Indicates the model; currently, Gaussian or Skew-Normal distributions are implemented
ModelConfig
:Configuration of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Number of interval variables
SelCrit
:The model selection criterion; currently, AIC and BIC are implemented
logLiks
:The logarithms of the likelihood function for the different cases
AICs
:Value of the AIC criterion
BICs
:Value of the BIC criterion
BestModel
:Indicates the best model according to the chosen selection criterion
SngD
:Boolean flag indicating whether a single distribution or a mixture of distributions was estimated
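How the ‘logLiks’, ‘AICs’, ‘BICs’ and ‘BestModel’ slots relate to each other can be sketched with the textbook definitions of both criteria. The log-likelihoods, parameter counts and model labels below are invented for illustration and are not package output.

```r
# Hypothetical log-likelihoods and free-parameter counts for the four
# covariance configurations (illustrative numbers only).
logLiks <- c(C1 = -250.3, C2 = -262.8, C3 = -259.1, C4 = -271.5)
npar    <- c(C1 = 14,     C2 = 10,     C3 = 10,     C4 = 8)
n       <- 27             # number of entities

AICs <- -2 * logLiks + 2 * npar        # Akaike Information Criterion
BICs <- -2 * logLiks + log(n) * npar   # Bayesian Information Criterion

# Smaller values are better for both criteria.
BestModel <- names(which.min(BICs))

print(rbind(logLiks, AICs, BICs))
print(BestModel)
```

BIC penalises each parameter by log(n) rather than 2, so for all but very small samples it tends to favour the more restricted configurations than AIC does.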
signature(Sdt = "IdtE")
: Selects the best model according to the chosen selection criterion (currently, AIC or BIC)
signature(object = "IdtE")
: show S4 method for the IdtE-class
signature(object = "IdtE")
: summary S4 method for the IdtE-class
signature(Sdt = "IdtE")
: Performs statistical likelihood-ratio tests that evaluate the goodness-of-fit of a nested model against a more general one.
signature(Sdt = "IdtE")
: extracts the standard deviation estimates from objects of class IdtE.
signature(Sdt = "IdtE")
: extracts the value of the Akaike Information Criterion from objects of class IdtE.
signature(Sdt = "IdtE")
: extracts the value of the Bayesian Information Criterion from objects of class IdtE.
signature(Sdt = "IdtE")
: extracts the value of the maximised log-likelihood from objects of class IdtE.
signature(x = "IdtE")
: extracts the mean vector estimate from objects of class IdtE
signature(x = "IdtE")
: extracts the variance-covariance matrix estimate from objects of class IdtE
signature(x = "IdtE")
: extracts the correlation matrix estimate from objects of class IdtE
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
mle, fasttle, fulltle, MANOVA, RobMxtDEst, IData
Idtlda contains the results of Linear Discriminant Analysis for the interval data
prior
:Prior probabilities of class membership; if unspecified, the class proportions for the training set are used; if present, the probabilities should be specified in the order of the factor levels.
means
:Matrix with the mean vectors for each group
scaling
:Matrix which transforms observations to discriminant functions, normalized so that within groups covariance matrix is spherical.
N
:Number of observations
CovCase
:Configuration case of the variance-covariance matrix: Case 1 through Case 4
signature(object = "Idtlda")
: Classifies interval-valued observations in conjunction with lda.
signature(object = "Idtlda")
: show S4 method for the Idtlda-class
signature(object = "Idtlda")
: Returns the configuration case of the variance-covariance matrix
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
qda, MANOVA, Roblda, Robqda, snda, IData
IdtMANOVA extends LRTest directly, containing the results of MANOVA tests on the interval-valued data. This class is not used directly, but is the basis for different specializations according to the model assumed for the distribution in each group. In particular, the following specializations of IdtMANOVA are currently implemented:
IdtClMANOVA
extends IdtMANOVA, assuming a classical (i.e., homoscedastic gaussian) setup.
IdtHetNMANOVA
extends IdtMANOVA, assuming a heteroscedastic gaussian set-up.
IdtLocSNMANOVA
extends IdtMANOVA, assuming a Skew-Normal location model set-up.
IdtLocNSNMANOVA
extends IdtMANOVA, assuming either a homoscedastic gaussian or Skew-Normal location model set-up.
IdtGenSNMANOVA
extends IdtMANOVA, assuming a Skew-Normal general model set-up.
IdtGenNSNMANOVA
extends IdtMANOVA, assuming either a heteroscedastic gaussian or Skew-Normal general model set-up.
NIVar
:Number of interval variables.
grouping
:Factor indicating the group to which each observation belongs.
H0res
:Model estimates under the null hypothesis.
H1res
:Model estimates under the alternative hypothesis.
ChiSq
:Inherited from class LRTest
. Value of the Chi-Square statistics corresponding to the performed test.
df
:Inherited from class LRTest
. Degrees of freedom of the Chi-Square statistics.
pvalue
:Inherited from class LRTest
. p-value of the Chi-Square statistics value, obtained from the Chi-Square distribution with df degrees of freedom.
H0logLik
:Inherited from class LRTest
. Logarithm of the Likelihood function under the null hypothesis.
H1logLik
:Inherited from class LRTest
. Logarithm of the Likelihood function under the alternative hypothesis.
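The slots inherited from LRTest are related in the usual way for a likelihood-ratio test. The base-R sketch below assumes the standard form of the statistic (twice the difference between the two log-likelihoods); the numeric values are invented for illustration and are not MAINT.Data output.

```r
# Hypothetical log-likelihoods under H0 and H1 (illustrative values only)
H0logLik <- -250.4
H1logLik <- -241.9
df <- 6  # assumed difference in the number of free parameters

# Standard likelihood-ratio statistic and its asymptotic Chi-Square p-value
ChiSq <- 2 * (H1logLik - H0logLik)
pvalue <- pchisq(ChiSq, df = df, lower.tail = FALSE)

print(c(ChiSq = ChiSq, df = df, pvalue = pvalue))
```

A small p-value indicates that the restrictions imposed under the null hypothesis are not supported by the data.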
signature(object = "IdtMANOVA")
: show S4 method for the IdtMANOVA-classes.
signature(object = "IdtMANOVA")
: summary S4 method for the IdtMANOVA-classes.
signature(object = "IdtMANOVA")
: retrieves the model estimates under the null hypothesis.
signature(object = "IdtMANOVA")
: retrieves the model estimates under the alternative hypothesis.
signature(x = "IdtClMANOVA")
: Linear Discriminant Analysis using the estimated model parameters.
signature(x = "IdtLocNSNMANOVA")
: Linear Discriminant Analysis using the estimated model parameters.
signature(x = "IdtHetNMANOVA")
: Quadratic Discriminant Analysis using the estimated model parameters.
signature(x = "IdtGenNSNMANOVA")
: Quadratic Discriminant Analysis using the estimated model parameters.
signature(x = "IdtLocNSNMANOVA")
: Discriminant Analysis using maximum likelihood parameter estimates of SkewNormal mixtures assuming a "location" model (i.e., groups differ only in location parameters).
signature(x = "IdtGenSNMANOVA")
: Discriminant Analysis using maximum likelihood parameter estimates of SkewNormal mixtures assuming a general model (i.e., groups differ in all parameters).
signature(x = "IdtGenNSNMANOVA")
: Discriminant Analysis using maximum likelihood parameter estimates of SkewNormal mixtures assuming a general model (i.e., groups differ in all parameters).
Class LRTest
, directly.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtMclust contains the results of fitting mixtures of Gaussian distributions to interval data represented by objects of class IData
.
call
:The matched call that created the IdtMclust object
data
:The IData data object
NObs
:Number of entities under analysis (cases)
NIVar
:Number of interval variables
SelCrit
:The model selection criterion; currently, AIC and BIC are implemented
Hmcdt
:Indicates whether the optimal model corresponds to a homoscedastic (TRUE) or a heteroscedastic (FALSE) setup
BestG:
The optimal number of mixture components.
BestC:
The configuration case of the variance-covariance matrix in the optimal model
logLiks
:The logarithms of the likelihood function for the different models tried
logLik
:The logarithm of the likelihood function for the optimal model
AICs
:The values of the AIC criterion for the different models tried
aic
:The value of the AIC criterion for the optimal model
BICs
:The values of the BIC criterion for the different models tried
bic
:The value of the BIC criterion for the optimal model
parameters
A list with the following components:
A vector whose kth component is the mixing proportion for the kth component of the mixture model.
The mean for each component. If there is more than one component, this is a matrix whose kth column is the mean of the kth component of the mixture model.
A three-dimensional array with the covariance estimates. If Hmcdt is FALSE (heteroscedastic setups) the third dimension levels run through the BestG mixture components, with one different covariance matrix for each level. Otherwise (homoscedastic setups), there is only one covariance matrix and the size of the third dimension equals one.
z:
A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class.
classification:
The classification corresponding to z
, i.e. map(z)
.
allres:
A list with the detailed results for all models fitted.
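The relation between the z and classification slots can be sketched in base R. This assumes that map(z) denotes the maximum a posteriori assignment (each observation goes to the component with the largest posterior probability); the matrix below is a toy example, not package output.

```r
# Toy posterior-probability matrix: 4 observations, 3 components (rows sum to 1)
z <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.8, 0.1,
              0.3, 0.3, 0.4,
              0.5, 0.1, 0.4),
            nrow = 4, byrow = TRUE)

# Hard classification: index of the most probable component for each row
classification <- apply(z, 1, which.max)
print(classification)
```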
signature(object = "IdtMclust")
: show S4 method for the IdtMclust-class
signature(object = "IdtMclust")
: summary S4 method for the IdtMclust-class
signature(x = "IdtMclust")
: retrieves the value of the parameter estimates for the obtained partition
signature(x = "IdtMclust")
: retrieves the value of the estimated mixing proportions for the obtained partition
signature(x = "IdtMclust")
: retrieves the value of the component means for the obtained partition
signature(x = "IdtMclust")
: retrieves the value of the estimated covariance matrices for the obtained partition
signature(x = "IdtMclust")
: retrieves the value of the estimated correlation matrices
signature(x = "IdtMclust")
: retrieves the individual class assignments for the obtained partition
signature(x = "IdtMclust")
: retrieves a string specifying the criterion used to find the best model and partition
signature(x = "IdtMclust")
: returns TRUE if a homoscedastic model has been assumed, and FALSE otherwise
signature(x = "IdtMclust")
: returns the number of components selected
signature(x = "IdtMclust")
: returns the covariance configuration selected
signature(x = "IdtMclust")
: retrieves the estimates of the individual posterior probabilities for the obtained partition
signature(x = "IdtMclust")
: returns the value of the BIC criterion
signature(x = "IdtMclust")
: returns the value of the AIC criterion
signature(x = "IdtMclust")
: returns the value of the log-likelihood
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Brito, P., Duarte Silva, A. P. and Dias, J. G. (2015), Probabilistic Clustering of Interval Data. Intelligent Data Analysis 19(2), 293–313.
Idtmclust
, plotInfCrt
, pcoordplot
Performs Gaussian model based clustering for interval data
Idtmclust(Sdt, G = 1:9, CovCase=1:4, SelCrit=c("BIC","AIC"), Mxt=c("Hom","Het","HomandHet"), control=EMControl())
Sdt |
An IData object representing interval-valued entities. |
G |
An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
control |
A list of control parameters for EM. The defaults are set by the call EMControl(). |
Mxt |
The type of Gaussian mixture assumed by Idtmclust. Alternatives are “Hom” (default) for homoscedastic mixtures, “Het” for heteroscedastic mixtures, and “HomandHet” for both homoscedastic and heteroscedastic mixtures. |
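How a selection criterion such as SelCrit picks a model among the candidates indexed by G can be sketched in base R. The sketch assumes the common "smaller is better" definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n); the sign convention used internally by the package may differ, and the log-likelihoods, parameter counts and sample size are invented.

```r
# Hypothetical fits for G = 1, 2, 3 components (illustrative values only)
logLik <- c(G1 = -520.3, G2 = -480.1, G3 = -474.8)
npar <- c(8, 17, 26)   # assumed number of free parameters of each model
n <- 100               # assumed number of observations

# Information criteria under the "smaller is better" convention
AIC <- -2 * logLik + 2 * npar
BIC <- -2 * logLik + npar * log(n)

# Model selected by each criterion
print(names(which.min(AIC)))
print(names(which.min(BIC)))
```

BIC penalizes extra parameters more heavily than AIC whenever log(n) exceeds 2, so it tends to select mixtures with fewer components.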
An object of class IdtMclust
providing the optimal (according to BIC) mixture model estimation.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Brito, P., Duarte Silva, A. P. and Dias, J. G. (2015), Probabilistic Clustering of Interval Data. Intelligent Data Analysis 19(2), 293–313.
Fraley, C., Raftery, A. E., Murphy, T. B. and Scrucca, L. (2012), mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. Technical Report No. 597, Department of Statistics, University of Washington.
IdtMclust
, EMControl
, plotInfCrt
, pcoordplot
## Not run:
# Create an Interval-Data object containing the intervals of loan data
# (from the Kaggle Data Science platform) aggregated by loan purpose
LbyPIdt <- IData(LoansbyPurpose_minmaxDt, VarNames=c("ln-inc","ln-revolbal","open-acc","total-acc"))
print(LbyPIdt)

# Fit homoscedastic Gaussian mixtures with up to nine components
mclustres <- Idtmclust(LbyPIdt)
plotInfCrt(mclustres,legpos="bottomright")
print(mclustres)

# Display the results of the best mixture according to the BIC
summary(mclustres,parameters=TRUE,classification=TRUE)
pcoordplot(mclustres)

# Repeat the analysis with both homoscedastic and heteroscedastic mixtures up to six components
mclustres1 <- Idtmclust(LbyPIdt,G=1:6,Mxt="HomandHet")
plotInfCrt(mclustres1,legpos="bottomright")
print(mclustres1)

# Display the results of the best heteroscedastic mixture according to the BIC
summary(mclustres1,parameters=TRUE,classification=TRUE,model="HetG2C2")
## End(Not run)
IdtMxE extends the IdtE
class, assuming that the data can be characterized by a mixture of distributions, for instance considering partitions of entities into different groups.
grouping
:Factor indicating the group to which each observation belongs
ModelNames
:Inherited from class IdtE
. The model acronym, indicating the model type (currently, N for Normal and SN for Skew-Normal), and the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; currently, Gaussian or Skew-Normal distributions are implemented.
ModelConfig
:Inherited from class IdtE
. Configuration of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to FALSE in objects of class "IdtMxE"
Ngrps
:Number of mixture components
Class IdtE
, directly.
No methods defined with class "IdtMxE" in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtE
, IdtSngDE
, IData
, MANOVA
, RobMxtDEst
IdtMxNandSNDE contains the results of a mixture model estimation; Normal and Skew-Normal models are considered, with the four different possible variance-covariance configurations.
NMod
:Estimates of the mixture model for the Gaussian case
SNMod
:Estimates of the mixture model for the Skew-Normal case
grouping
:Inherited from class IdtMxE
. Factor indicating the group to which each observation belongs
ModelNames
:Inherited from class IdtE
. The model acronym, indicating the model type (currently, N for Normal and SN for Skew-Normal), and the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; currently, Gaussian or Skew-Normal distributions are implemented
ModelConfig
:Inherited from class IdtE
. Configuration case of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to FALSE in objects of class IdtMxNandSNDE
Ngrps
:Inherited from class IdtMxE
. Number of mixture components
Class IdtMxE
, directly.
Class IdtE
, by class IdtMxE
, distance 2.
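The "directly" and "distance 2" wording is standard S4 inheritance terminology. The sketch below uses hypothetical toy classes (ToyE, ToyMxE and ToyMxNandSNDE are illustrative names, not part of MAINT.Data) to show how a two-level chain produces these distances.

```r
library(methods)

# Hypothetical toy classes mirroring a three-level inheritance chain
setClass("ToyE", representation(NIVar = "numeric"))
setClass("ToyMxE", contains = "ToyE", representation(Ngrps = "numeric"))
setClass("ToyMxNandSNDE", contains = "ToyMxE")

# Slots defined higher in the chain are inherited by the most specific class
obj <- new("ToyMxNandSNDE", NIVar = 4, Ngrps = 3)
print(is(obj, "ToyE"))

# Inheritance distances recorded in the class definition
ext <- getClass("ToyMxNandSNDE")@contains
distances <- sapply(ext, function(e) e@distance)
print(distances)
```

A distance of 1 corresponds to a class extended directly, and a distance of 2 to a class reached through one intermediate class.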
No methods defined with class IdtMxNandSNDE in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtE
, IdtMxE
, IdtSngNandSNDE
, MANOVA
, RobMxtDEst
, IData
IdtMxNDE contains the results of a mixture Normal model maximum likelihood parameter estimation, with the four different possible variance-covariance configurations.
Hmcdt
:Indicates whether we consider a homoscedastic (TRUE) or a heteroscedastic (FALSE) model
mleNmuE
:Matrix with the maximum likelihood mean vectors estimates by group (each row refers to a group)
mleNmuEse
:Matrix with the maximum likelihood means' standard errors by group (each row refers to a group)
CovConfCases
:List of the considered configurations
grouping
:Inherited from class IdtMxE
. Factor indicating the group to which each observation belongs
ModelNames
:Inherited from class IdtE
. The model acronym formed by a "N", indicating a Normal model, followed by the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; always set to "Normal" in objects of the IdtMxNDE class
ModelConfig
:Inherited from class IdtE
. Configuration case of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to FALSE in objects of class IdtMxNDE
Ngrps
:Inherited from class IdtMxE
. Number of mixture components
Class IdtMxE
, directly.
Class IdtE
, by class IdtMxE
, distance 2.
signature(x = "IdtMxtNDE")
: Linear Discriminant Analysis using the estimated model parameters.
signature(x = "IdtMxtNDE")
: Quadratic Discriminant Analysis using the estimated model parameters.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtE
, IdtMxE
, IdtMxNDRE
, IdtSngNDE
, IData
, MANOVA
IdtMxNDRE contains the results of a mixture Normal model robust parameter estimation, with the four different possible variance-covariance configurations.
Hmcdt
:Indicates whether we consider a homoscedastic (TRUE) or a heteroscedastic (FALSE) model
RobNmuE
:Matrix with the robust mean vectors estimates by group (each row refers to a group)
CovConfCases
:List of the considered configurations
grouping
:Inherited from class IdtMxE
. Factor indicating the group to which each observation belongs
ModelNames
:Inherited from class IdtE
. The model acronym formed by a "N", indicating a Normal model, followed by the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; always set to "Normal" in objects of the IdtMxNDRE class
ModelConfig
:Inherited from class IdtE
. Configuration case of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to FALSE in objects of class IdtMxNDRE
Ngrps
:Inherited from class IdtMxE
. Number of mixture components
rawSet
A vector with the trimmed subset elements used to compute the raw (not reweighted) MCD covariance estimate for the chosen configuration.
RewghtdSet
A vector with the final trimmed subset elements used to compute the fasttle estimates.
RobMD2
A vector with the robust squared Mahalanobis distances used to select the trimmed subset.
cnp2
A vector of length two containing the consistency correction factor and the finite sample correction factor of the final estimate of the covariance matrix.
raw.cov
A matrix with the raw MCD estimator used to compute the robust squared Mahalanobis distances of RobMD2.
raw.cnp2
A vector of length two containing the consistency correction factor and the finite sample correction factor of the raw estimate of the covariance matrix.
PerfSt
A list with the following components:
RepSteps: A list with one component by Covariance Configuration, containing a vector with the number of refinement steps performed by the fasttle algorithm by replication.
RepLogLik: A list with one component by Covariance Configuration, containing a vector with the best log-likelihood found by the fasttle algorithm by replication.
StpLogLik: A list with one component by Covariance Configuration, containing a matrix with the evolution of the log-likelihoods found by the fasttle algorithm by replication and refinement step.
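The roles of RobMD2 and of the trimmed subsets can be illustrated with base R. This is only a sketch of what squared Mahalanobis distances and a trimmed subset are; it does not reproduce the fasttle algorithm, and the data, the location and scatter estimates and the subset size h are all assumptions made for the example.

```r
set.seed(1)
# Toy numeric data standing in for MidPoints/LogRanges (20 units, 2 variables)
X <- matrix(rnorm(40), ncol = 2)
X[1, ] <- c(8, 8)  # plant one gross outlier

# Assumed location and scatter estimates, computed without the planted outlier
center <- colMeans(X[-1, ])
scatter <- cov(X[-1, ])

# Squared Mahalanobis distances for every unit
MD2 <- mahalanobis(X, center, scatter)

# Keep the h units with the smallest distances as the trimmed subset
h <- 15
trimmed <- sort(order(MD2)[seq_len(h)])
print(trimmed)
```

Units left out of the trimmed subset have the largest distances and are the natural outlier candidates.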
Class IdtMxE
, directly.
Class IdtE
, by class IdtMxE
, distance 2.
No methods defined with class IdtMxNDRE in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
IdtE
, IdtMxE
, IdtMxNDE
, IdtSngNDRE
, RobMxtDEst
, IData
IdtMxSNDE contains the results of a mixture model estimation for the Skew-Normal model, with the four different possible variance-covariance configurations.
Hmcdt
:Indicates whether we consider a homoscedastic location model (TRUE) or a general model (FALSE)
CovConfCases
:List of the considered configurations
grouping
:Inherited from class IdtMxE
. Factor indicating the group to which each observation belongs
ModelNames
:Inherited from class IdtE
. The model acronym, indicating the model type (currently, N for Normal and SN for Skew-Normal), and the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; currently, Gaussian or Skew-Normal distributions are implemented
ModelConfig
:Inherited from class IdtE
. Configuration case of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to FALSE in objects of class IdtMxSNDE
Ngrps
:Inherited from class IdtMxE
. Number of mixture components
Class IdtMxE
, directly.
Class IdtE
, by class IdtMxE
, distance 2.
No methods defined with class IdtMxSNDE in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtE
, IdtMxE
, IdtSngSNDE
, MANOVA
, IData
IdtNandSNDE is a union of classes IdtSngNandSNDE
and IdtMxNandSNDE
, used for storing the estimation results of Normal and Skew-Normal modelisations for Interval Data.
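A class union lets a single method definition serve several otherwise unrelated classes. The sketch below uses hypothetical toy classes and a hypothetical generic (ToySng, ToyMx, ToyUnion and estimates are illustrative names, not part of MAINT.Data) to show the mechanism.

```r
library(methods)

# Hypothetical toy classes standing in for single and mixture result classes
setClass("ToySng", representation(est = "numeric"))
setClass("ToyMx", representation(est = "numeric", Ngrps = "numeric"))

# The union groups both classes under one virtual class
setClassUnion("ToyUnion", c("ToySng", "ToyMx"))

# One method defined on the union is dispatched for members of either class
setGeneric("estimates", function(x) standardGeneric("estimates"))
setMethod("estimates", "ToyUnion", function(x) x@est)

a <- new("ToySng", est = c(1.5, 2.5))
b <- new("ToyMx", est = c(0.5, 0.7), Ngrps = 2)
print(estimates(a))
print(estimates(b))
```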
signature(coef = "IdtNandSNDE")
: extracts parameter estimates from objects of class IdtNandSNDE
signature(x = "IdtNandSNDE")
: extracts standard errors from objects of class IdtNandSNDE
signature(x = "IdtNandSNDE")
: extracts an estimate of the variance-covariance matrix of the parameter estimators for objects of class IdtNandSNDE
signature(x = "IdtNandSNDE")
: extracts the mean vector estimate from objects of class IdtNandSNDE
signature(x = "IdtNandSNDE")
: extracts the variance-covariance matrix estimate from objects of class IdtNandSNDE
signature(x = "IdtNandSNDE")
: extracts the correlation matrix estimate from objects of class IdtNandSNDE
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IData
, mle
, fasttle
, fulltle
, MANOVA
, RobMxtDEst
, IdtSngNandSNDE
, IdtMxNandSNDE
IdtNDE is a union of classes IdtSngNDE
, IdtSngNDRE
, IdtMxNDE
and IdtMxNDRE
, used for storing the estimation results of Normal modelizations for Interval Data.
signature(coef = "IdtNDE")
: extracts parameter estimates from objects of class IdtNDE
signature(x = "IdtNDE")
: extracts standard errors from objects of class IdtNDE
signature(x = "IdtNDE")
: extracts an estimate of the variance-covariance matrix of the parameter estimators for objects of class IdtNDE
signature(x = "IdtNDE")
: extracts the mean vector estimate from objects of class IdtNDE
signature(x = "IdtNDE")
: extracts the variance-covariance matrix estimate from objects of class IdtNDE
signature(x = "IdtNDE")
: extracts the correlation matrix estimate from objects of class IdtNDE
signature(Idt = "IdtNDE")
: extracts the standard deviation estimates from objects of class IdtNDE.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IdtSngNDE
, IdtSngNDRE
, IdtMxNDE
, IdtMxNDRE
, IdtSNDE
, IData
, mle
, fasttle
, fulltle
, MANOVA
, RobMxtDEst
A description of interval-valued variable outliers found by the MAINT.Data function getIdtOutl
.
outliers
:A vector of indices of the interval data units flagged as outliers.
MD2
:A vector of squared robust Mahalanobis distances for all interval data units.
Nominal size of the null hypothesis that a given observation is not an outlier.
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-squared, and the Beta and F distributions proposed by Cerioli (2010).
Whether a multicomparison correction of the nominal size (eta) for the outliers tests was performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level. ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n).
Number of original observations in the original data set.
Number of total numerical variables (MidPoints and/or LogRanges) that may be responsible for the outliers.
Size of the subsets over which the trimmed likelihood was maximized when computing the robust Mahalanobis distances.
A logical vector indicating which of the data units belong to the final trimmed subset used to compute the tle estimates.
signature(object = "IdtOutl")
: show S4 method for the IdtOutl-class.
signature(x = "IdtOutl")
: plot S4 methods for the IdtOutl-class.
signature(x = "IdtOutl")
: retrieves the vector of squared robust Mahalanobis distances for all data units.
signature(x = "IdtOutl")
: retrieves the nominal size of the null hypothesis used to flag observations as outliers.
signature(x = "IdtOutl")
: retrieves the assumed reference distributions used to find cutoffs defining the observations assumed as outliers.
signature(x = "IdtOutl")
: retrieves the multicomparison correction used when flagging observations as outliers.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators.
Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Plots robust Mahalanobis distances and outlier cut-offs for an object describing potential outliers in an interval-valued data set
## S4 method for signature 'IdtOutl,missing' plot(x, scale=c("linear","log"), RefDist=getRefDist(x), eta=geteta(x), multiCmpCor=getmultiCmpCor(x), ...)
x |
An object of class IdtOutl describing potential interval-valued outliers. |
scale |
The scale of the axis for the robust Mahalanobis distances. |
RefDist |
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-squared, and the Beta and F distributions proposed by Cerioli (2010). By default uses the one selected in the creation of the object ‘x’. |
eta |
Nominal size of the null hypothesis that a given observation is not an outlier. By default uses the one selected in the creation of the object ‘x’. |
multiCmpCor |
Whether a multicomparison correction of the nominal size (eta) for the outliers tests was performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level. ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n). By default uses the one selected in the creation of the object ‘x’. |
... |
Further arguments to be passed to methods. |
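The effect of the multiCmpCor options on the outlier cut-offs can be sketched in base R. The sketch assumes the Chi-squared reference distribution with degrees of freedom equal to the number of numerical variables; the values of eta, n and p are illustrative, and the cut-offs actually used by the package (in particular under the Cerioli reference distributions) may differ.

```r
eta <- 0.025   # nominal size of each outlier test (illustrative value)
n <- 50        # number of entities tested (illustrative value)
p <- 4         # assumed number of numerical variables

# 'never': every entity is tested at the nominal level eta
# 'always': every entity is tested at the corrected level 1 - (1 - eta)^(1/n)
level_never <- eta
level_always <- 1 - (1 - eta)^(1 / n)

# Chi-squared cut-offs for the squared robust Mahalanobis distances
cutoff_never <- qchisq(1 - level_never, df = p)
cutoff_always <- qchisq(1 - level_always, df = p)

print(c(level_always = level_always,
        cutoff_never = cutoff_never,
        cutoff_always = cutoff_always))
```

With the correction, the probability of flagging at least one of n independent regular observations stays at eta, at the price of a larger cut-off for each individual test.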
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators.
Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Journal of Computational and Graphical Statistics 14, 910–927.
Idtqda contains the results of Quadratic Discriminant Analysis for the interval data
prior
:Prior probabilities of class membership; if unspecified, the class proportions for the training set are used; if present, the probabilities should be specified in the order of the factor levels.
means
:Matrix with the mean vectors for each group
scaling
:A three-dimensional array. For each group, g, scaling[,,g] is a matrix which transforms interval-valued observations so that within-groups covariance matrix is spherical.
ldet
:Vector of half log determinants of the dispersion matrix.
lev
:Levels of the grouping factor
CovCase
:Configuration case of the variance-covariance matrix: Case 1 through Case 4
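The scaling and ldet slots are what a quadratic discriminant score needs. The base-R sketch below builds one possible sphering matrix from a Cholesky factor and evaluates the score of a single observation for one group; the covariance matrix, mean, prior and observation are invented, and the package may obtain its scaling matrices by a different decomposition.

```r
# Toy group covariance matrix and mean (illustrative values)
S <- matrix(c(4, 1.2,
              1.2, 1), nrow = 2, byrow = TRUE)
mu <- c(1, -1)

# With S = R'R (Cholesky), post-multiplying by solve(R) spheres the data
R <- chol(S)
scaling <- solve(R)
ldet <- sum(log(diag(R)))   # half log determinant of S

# Quadratic discriminant score of one observation for this group
x <- c(2, 0)
prior <- 0.5
dev <- (x - mu) %*% scaling
score <- log(prior) - ldet - 0.5 * sum(dev^2)
print(score)
```

An observation is assigned to the group with the largest score; sum(dev^2) is the squared Mahalanobis distance of x to the group mean.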
signature(object = "Idtqda")
: Classifies interval-valued observations in conjunction with qda.
signature(object = "Idtqda")
: show S4 method for the Idtqda-class
signature(object = "Idtqda")
: Returns the configuration case of the variance-covariance matrix
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
IdtSNDE is a class union of classes IdtSngSNDE
and IdtMxSNDE
, used for storing the estimation results of Skew-Normal modelizations for Interval Data.
signature(coef = "IdtSNDE")
: extracts parameter estimates from objects of class IdtSNDE
signature(x = "IdtSNDE")
: extracts standard errors from objects of class IdtSNDE
signature(x = "IdtSNDE")
: extracts an asymptotic estimate of the variance-covariance matrix of the parameter estimators for objects of class IdtSNDE
signature(x = "IdtSNDE")
: extracts the mean vector estimate from objects of class IdtSNDE
signature(x = "IdtSNDE")
: extracts the variance-covariance matrix estimate from objects of class IdtSNDE
signature(x = "IdtSNDE")
: extracts the correlation matrix estimate from objects of class IdtSNDE
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IData
, mle
, MANOVA
, IdtSngSNDE
, IdtMxSNDE
, IdtNDE
IdtSNgenda contains the results of discriminant analysis for the interval data, based on a general Skew-Normal model.
prior
:Prior probabilities of class membership; if unspecified, the class proportions for the training set are used; if present, the probabilities should be specified in the order of the factor levels.
ksi
:Matrix with the direct location parameter ("ksi") estimates for each group.
eta
:Matrix with the direct scaled skewness parameter ("eta") estimates for each group.
scaling
:For each group g, scaling[,,g] is a matrix which transforms interval-valued observations so that in each group the scale-association matrix ("Omega") is spherical.
mu
:Matrix with the centred location parameter ("mu") estimates for each group.
gamma1
:Matrix with the centred skewness parameter ("gamma1") estimates for each group.
ldet
:Vector of half log determinants of the dispersion matrix.
lev
:Levels of the grouping factor.
CovCase
:Configuration case of the variance-covariance matrix: Case 1 through Case 4
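The difference between direct and centred Skew-Normal parameters is easiest to see in the univariate case. The sketch below uses the standard univariate formulas with a location ksi, a scale omega and a shape alpha; these three values are invented, the shape parameter is used here in place of the scaled skewness "eta" stored by the class, and the multivariate parameterization used by the package is not reproduced.

```r
# Direct parameters of a univariate Skew-Normal (illustrative values)
ksi <- 2        # location
omega <- 1.5    # scale
alpha <- 3      # shape (assumed; not the scaled "eta" of the class)

# Auxiliary quantities
delta <- alpha / sqrt(1 + alpha^2)
b <- sqrt(2 / pi) * delta

# Centred parameters: mean, standard deviation and skewness coefficient
mu <- ksi + omega * b
sigma <- omega * sqrt(1 - b^2)
gamma1 <- (4 - pi) / 2 * b^3 / (1 - b^2)^(3 / 2)

print(c(mu = mu, sigma = sigma, gamma1 = gamma1))
```

With a positive shape the mean lies above the direct location, and the skewness coefficient stays below its upper limit of roughly 0.995.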
signature(object = "IdtSNgenda")
: Classifies interval-valued observations in conjunction with snda.
signature(object = "IdtSNgenda")
: show S4 method for the IdtSNgenda-class
signature(object = "IdtSNgenda")
: Returns the configuration case of the variance-covariance matrix
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
IdtSngNandSNDE contains the results of a single class model estimation for the Normal and the Skew-Normal distributions, with the four different possible variance-covariance configurations.
NMod
:Estimates of the single class model for the Gaussian case
SNMod
:Estimates of the single class model for the Skew-Normal case
ModelNames
:Inherited from class IdtE
. The model acronym, indicating the model type (currently, N for Normal and SN for Skew-Normal), and the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; currently, Gaussian or Skew-Normal distributions are implemented
ModelConfig
:Inherited from class IdtE
. Configuration of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to TRUE in objects of class IdtSngNandSNDE
Class IdtSngDE
, directly.
Class IdtE
, by class IdtSngDE
, distance 2.
No methods defined with class IdtSngNandSNDE in the signature.
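As an illustration of how the logLiks, AICs, BICs and BestModel slots relate to each other, the sketch below applies the standard AIC and BIC definitions to made-up values. The log-likelihoods, the numbers of free parameters, the sample size and the labels C1 to C4 are all hypothetical; the code does not read anything from a fitted object.

```r
# Hypothetical log-likelihoods for the four covariance configurations
logLiks <- c(C1 = -512.3, C2 = -530.8, C3 = -541.0, C4 = -560.4)

# Hypothetical numbers of free parameters and sample size (assumptions)
npar <- c(C1 = 14, C2 = 10, C3 = 8, C4 = 6)
n <- 100

# Standard definitions of the two criteria
AICs <- -2 * logLiks + 2 * npar
BICs <- -2 * logLiks + log(n) * npar

# Smaller values are better for both criteria
best_by_AIC <- names(which.min(AICs))
best_by_BIC <- names(which.min(BICs))
print(rbind(AICs, BICs))
```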
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IData
, IdtMxNandSNDE
, mle
, fasttle
, fulltle
Contains the results of a single class maximum likelihood estimation for the Normal distribution, with the four different possible variance-covariance configurations.
mleNmuE
:Vector with the maximum likelihood estimates of the mean vector
mleNmuEse
:Vector with the maximum likelihood means' standard errors
CovConfCases
:List of the considered configurations
ModelNames
:Inherited from class IdtE
. The model acronym formed by a "N", indicating a Normal model, followed by the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; always set to "Normal" in objects of the IdtSngNDE class
ModelConfig
:Inherited from class IdtE
. Configuration of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to TRUE in objects of class IdtSngNDE
Class IdtSngDE
, directly.
Class IdtE
, by class IdtSngDE
, distance 2.
No methods defined with class IdtSngNDE in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
IData
, mle
, IdtSngNDRE
, IdtSngSNDE
, IdtMxNDE
Contains the results of a single class robust estimation for the Normal distribution, with the four different possible variance-covariance configurations.
RobNmuE
:Matrix with the robust (trimmed maximum likelihood) mean vector estimates
CovConfCases
:List of the considered configurations
ModelNames
:Inherited from class IdtE
. The model acronym formed by a "N", indicating a Normal model, followed by the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; always set to "Normal" in objects of the IdtSngNDRE class
ModelConfig
:Inherited from class IdtE
. Configuration of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to TRUE in objects of class IdtSngNDRE
rawSet
A vector with the trimmed subset elements used to compute the raw (not reweighted) MCD covariance estimate for the chosen configuration.
RewghtdSet
A vector with the final trimmed subset elements used to compute the tle estimates.
RobMD2
A vector with the robust squared Mahalanobis distances used to select the trimmed subset.
cnp2
A vector of length two containing the consistency correction factor and the finite sample correction factor of the final estimate of the covariance matrix.
raw.cov
A matrix with the raw MCD estimator used to compute the robust squared Mahalanobis distances of RobMD2.
raw.cnp2
A vector of length two containing the consistency correction factor and the finite sample correction factor of the raw estimate of the covariance matrix.
PerfSt
A list with the following components:
RepSteps: A list with one component per covariance configuration, containing a vector with the number of refinement steps performed by the fasttle algorithm in each replication.
RepLogLik: A list with one component per covariance configuration, containing a vector with the best log-likelihood found by the fasttle algorithm in each replication.
StpLogLik: A list with one component per covariance configuration, containing a matrix with the evolution of the log-likelihoods found by the fasttle algorithm, by replication and refinement step.
Class IdtSngDE
, directly.
Class IdtE
, by class IdtSngDE
, distance 2.
No methods defined with class IdtSngNDRE in the signature.
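The idea behind slots such as RobMD2 and RewghtdSet, squared Mahalanobis distances compared against a cutoff to decide which observations are kept, can be sketched in base R. This is a generic illustration and not the fasttle algorithm: the data are simulated, the location and scatter estimates are simple stand-ins computed from the known clean rows, and the 97.5% Chi-square cutoff is an assumption.

```r
set.seed(1)
# Toy data: 50 regular points plus 3 gross outliers (made up for illustration)
X <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(c(8, 8, 9, -9, -10, 10), ncol = 2, byrow = TRUE))

# Stand-in location/scatter estimates computed from the first 50 rows only;
# a real robust fit would estimate these from a trimmed subset of the data
center <- colMeans(X[1:50, ])
scatter <- cov(X[1:50, ])

# Squared Mahalanobis distance of every observation
md2 <- mahalanobis(X, center, scatter)

# Keep observations below an (assumed) 97.5% Chi-square cutoff
cutoff <- qchisq(0.975, df = ncol(X))
kept <- which(md2 <= cutoff)
flagged <- setdiff(seq_len(nrow(X)), kept)
print(flagged)
```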
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
IData
, fasttle
, fulltle
, IdtSngNDE
, IdtMxNDRE
Contains the results of a single class maximum likelihood estimation for the Skew-Normal distribution, with the four different possible variance-covariance configurations.
CovConfCases
:List of the considered configurations
ModelNames
:Inherited from class IdtE
. The model acronym formed by a "SN", indicating a skew-Normal model, followed by the configuration (Case 1 through Case 4)
ModelType
:Inherited from class IdtE
. Indicates the model; always set to "SkewNormal" in objects of the IdtSngSNDE class
ModelConfig
:Inherited from class IdtE
. Configuration case of the variance-covariance matrix: Case 1 through Case 4
NIVar
:Inherited from class IdtE
. Number of interval variables
SelCrit
:Inherited from class IdtE
. The model selection criterion; currently, AIC and BIC are implemented
logLiks
:Inherited from class IdtE
. The logarithms of the likelihood function for the different cases
AICs
:Inherited from class IdtE
. Value of the AIC criterion
BICs
:Inherited from class IdtE
. Value of the BIC criterion
BestModel
:Inherited from class IdtE
. Indicates the best model according to the chosen selection criterion
SngD
:Inherited from class IdtE
. Boolean flag indicating whether a single distribution or a mixture of distributions was estimated. Always set to TRUE in objects of class IdtSngSNDE
Class IdtSngDE
, directly.
Class IdtE
, by class IdtSngDE
, distance 2.
No methods defined with class IdtSngSNDE in the signature.
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
mle
, IData
, IdtSngNDE
, IdtMxSNDE
IdtSNlocda contains the results of Discriminant Analysis for the interval data, based on a location Skew-Normal model.
prior
:Prior probabilities of class membership; if unspecified, the class proportions for the training set are used; if present, the probabilities should be specified in the order of the factor levels.
ksi
:Matrix with the direct location parameter ("ksi") estimates for each group.
eta
:Vector with the direct scaled skewness parameter ("eta") estimates.
scaling
:Matrix which transforms observations to discriminant functions, normalized so that the within groups scale-association matrix ("Omega") is spherical.
mu
:Matrix with the centred location parameter ("mu") estimates for each group.
gamma1
:Vector with the centred skewness parameter ("gamma1") estimates.
N
:Number of observations.
CovCase
:Configuration case of the variance-covariance matrix: Case 1 through Case 4
signature(object = "IdtSNlocda")
: Classifies interval-valued observations in conjunction with snda.
signature(object = "IdtSNlocda")
: show S4 method for the IdtSNlocda-class
signature(object = "IdtSNlocda")
: Returns the configuration case of the variance-covariance matrix
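The description of the prior slot (class proportions by default, user-supplied probabilities given in the order of the factor levels) can be made concrete with a few lines of base R. The grouping factor and the custom probabilities below are hypothetical.

```r
# Hypothetical grouping factor for a training set of 10 units
grouping <- factor(c("North", "North", "South", "East", "South",
                     "North", "East", "South", "North", "South"))

# Default-style prior: class proportions, in the order of the factor levels
prior <- as.vector(prop.table(table(grouping)))
names(prior) <- levels(grouping)
print(prior)

# A user-supplied prior must follow the same level order: East, North, South
custom_prior <- c(East = 0.2, North = 0.5, South = 0.3)
stopifnot(identical(names(custom_prior), levels(grouping)))
```

Because factor levels are sorted alphabetically by default, it is safer to check `levels(grouping)` before writing the probabilities down.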
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
lda performs linear discriminant analysis of Interval Data based on classic estimates of a mixture of Gaussian models.
## S4 method for signature 'IData'
lda(x, grouping, prior="proportions", CVtol=1.0e-5, egvtol=1.0e-10,
  subset=1:nrow(x), CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE,
  k2max=1e6, ... )

## S4 method for signature 'IdtMxtNDE'
lda(x, prior="proportions", selmodel=BestModel(x), egvtol=1.0e-10,
  silent=FALSE, k2max=1e6, ... )

## S4 method for signature 'IdtClMANOVA'
lda(x, prior="proportions", selmodel=BestModel(H1res(x)),
  egvtol=1.0e-10, silent=FALSE, k2max=1e6, ... )

## S4 method for signature 'IdtLocNSNMANOVA'
lda(x, prior="proportions", selmodel=BestModel(H1res(x)@NMod),
  egvtol=1.0e-10, silent=FALSE, k2max=1e6, ... )
x |
An object of class IData, IdtMxtNDE, IdtClMANOVA or IdtLocNSNMANOVA. |
grouping |
Factor specifying the class for each observation. |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
egvtol |
Tolerance level for the eigenvalues of the product of the inverse within by the between covariance matrices. When an eigenvalue has an absolute value below egvtol, it is considered to be zero. |
subset |
An index vector specifying the cases to be used in the analysis. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
silent |
A boolean flag indicating whether a warning message should be printed if the method fails. |
selmodel |
Selected model from a list of candidate models saved in object x. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
... |
Other named arguments. |
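The two numerical safeguards described above, k2max and CVtol, can be illustrated with base R alone. The sketch below is a hedged, single-group illustration of the underlying quantities (a 2-norm condition number and an absolute coefficient of variation) on made-up inputs; it does not reproduce the checks as coded inside lda.

```r
# Hypothetical correlation matrix with two almost collinear variables
R <- matrix(c(1, 0.9999999,
              0.9999999, 1), nrow = 2)

# 2-norm condition number: ratio of largest to smallest singular value
k2 <- kappa(R, exact = TRUE)
k2max <- 1e6
numerically_singular <- k2 > k2max

# Absolute coefficient of variation of a nearly constant variable
x <- c(5, 5, 5.00000001, 5)
CVtol <- 1.0e-5
cv <- abs(sd(x) / mean(x))
treated_as_constant <- cv < CVtol

print(c(k2 = k2, cv = cv))
```

Here both flags come out TRUE: the correlation matrix exceeds the default k2max and the variable falls below the default CVtol.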
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
qda
, snda
, Roblda
, Robqda
, IData
, IdtMxtNDE
, IdtClMANOVA
, IdtLocNSNMANOVA
, ConfMat
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

# Linear Discriminant Analysis
ChinaT.lda <- lda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- linear discriminant analysis results:\n")
print(ChinaT.lda)
ldapred <- predict(ChinaT.lda,ChinaT)$class
cat("lda Prediction results:\n")
print(ldapred)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,ldapred)

## Not run:
# Estimate error rates by ten-fold cross-validation replicated 20 times
CVlda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=lda,CovCase=CovCase(ChinaT.lda))
summary(CVlda[,,"Clerr"])
## End(Not run)
This data set consists of the lower and upper bounds of the intervals for four interval characteristics of the loans aggregated by their purpose. The original microdata is available at the Kaggle Data Science platform and consists of 887 383 loan records characterized by 75 descriptors. Among the large set of variables available, we focus on borrowers' income and account and loan information aggregated by the 14 loan purposes, which are considered as the units of interest.
data(LoansbyPurpose_minmaxDt)
A data frame containing 14 observations on the following 8 variables.
The minimum, for the current loan purpose, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The maximum, for the current loan purpose, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The minimum, for the current loan purpose, of the natural logarithm of the total credit revolving balance.
The maximum, for the current loan purpose, of the natural logarithm of the total credit revolving balance.
The minimum, for the current loan purpose, of the number of open credit lines in the borrower's credit file.
The maximum, for the current loan purpose, of the number of open credit lines in the borrower's credit file.
The minimum, for the current loan purpose, of the total number of credit lines currently in the borrower's credit file.
The maximum, for the current loan purpose, of the total number of credit lines currently in the borrower's credit file.
https://www.kaggle.com/wendykan/lending-club-loan-data
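The aggregation step that turns loan-level microdata into one min/max interval per unit can be sketched in base R. The data frame and its column names (`purpose`, `ln_inc`, `open_acc`) are hypothetical and the values are made up; the package's own AgrMcDt function, used in the examples of this manual, is the intended tool for building interval data from microdata.

```r
# Hypothetical microdata: one row per loan (values are made up)
loans <- data.frame(
  purpose  = c("car", "car", "house", "house", "house", "wedding", "wedding"),
  ln_inc   = c(10.2, 11.0, 11.5, 10.9, 12.1, 10.4, 10.8),
  open_acc = c(5, 9, 12, 7, 10, 4, 6)
)

# Lower and upper bounds per loan purpose
lower <- aggregate(cbind(ln_inc, open_acc) ~ purpose, data = loans, FUN = min)
upper <- aggregate(cbind(ln_inc, open_acc) ~ purpose, data = loans, FUN = max)

# One row per purpose, with a min/max column pair for each characteristic
bounds <- merge(lower, upper, by = "purpose", suffixes = c("_min", "_max"))
print(bounds)
```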
This data set consists of the lower and upper bounds of the intervals for four interval characteristics for 35 risk levels (from A1 to G5) of loans. The original microdata is available at the Kaggle Data Science platform and consists of 887 383 loan records characterized by 75 descriptors. Among the large set of variables available, we focus on borrowers' income and account and loan information aggregated by the 35 risk levels, which are considered as the units of interest.
data(LoansbyRiskLvs_minmaxDt)
A data frame containing 35 observations on the following 8 variables.
The minimum, for the current risk category, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The maximum, for the current risk category, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The minimum, for the current risk category, of the interest rate on the loan.
The maximum, for the current risk category, of the interest rate on the loan.
The minimum, for the current risk category, of the number of open credit lines in the borrower's credit file.
The maximum, for the current risk category, of the number of open credit lines in the borrower's credit file.
The minimum, for the current risk category, of the total number of credit lines currently in the borrower's credit file.
The maximum, for the current risk category, of the total number of credit lines currently in the borrower's credit file.
https://www.kaggle.com/wendykan/lending-club-loan-data
This data set consists of the ten and ninety percent quantiles defining the intervals for four interval characteristics for 35 risk levels (from A1 to G5) of loans. The original microdata is available at the Kaggle Data Science platform and consists of 887 383 loan records characterized by 75 descriptors. Among the large set of variables available, we focus on borrowers' income and account and loan information aggregated by the 35 risk levels, which are considered as the units of interest.
data(LoansbyRiskLvs_qntlDt)
A data frame containing 35 observations on the following 8 variables.
The ten percent quantile, for the current risk category, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The ninety percent quantile, for the current risk category, of the natural logarithm of the self-reported annual income provided by the borrower during registration.
The ten percent quantile, for the current risk category, of the interest rate on the loan.
The ninety percent quantile, for the current risk category, of the interest rate on the loan.
The ten percent quantile, for the current risk category, of the number of open credit lines in the borrower's credit file.
The ninety percent quantile, for the current risk category, of the number of open credit lines in the borrower's credit file.
The ten percent quantile, for the current risk category, of the total number of credit lines currently in the borrower's credit file.
The ninety percent quantile, for the current risk category, of the total number of credit lines currently in the borrower's credit file.
https://www.kaggle.com/wendykan/lending-club-loan-data
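Quantile-based interval bounds, as used in this data set, can be obtained from microdata along the following lines. The data frame, its column names and its values are hypothetical, and R's default quantile definition (type 7) is an assumption: the source does not state which definition was used to build the data set.

```r
# Hypothetical interest rates for two risk levels (values are made up)
rates <- data.frame(
  risk = rep(c("A1", "G5"), each = 11),
  int_rate = c(seq(5, 6, by = 0.1), seq(20, 25, by = 0.5))
)

# 10% and 90% quantiles per risk level (R's default quantile type 7)
q <- t(sapply(split(rates$int_rate, rates$risk),
              quantile, probs = c(0.10, 0.90), names = FALSE))
colnames(q) <- c("q10", "q90")
print(q)
```

Compared with min/max bounds, quantile bounds are less sensitive to a few extreme records within each unit.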
LRTest contains the results of likelihood ratio tests.
ChiSq
:Value of the Chi-Square statistic corresponding to the performed test
df
:Degrees of freedom of the Chi-Square statistic
pvalue
:p-value of the Chi-Square statistic, obtained from the Chi-Square distribution with df degrees of freedom
H0logLik
:Logarithm of the Likelihood function under the null hypothesis
H1logLik
:Logarithm of the Likelihood function under the alternative hypothesis
signature(object = "LRTest")
: show S4 method for the LRTest-class
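The relation between the slots of an LRTest object follows the usual likelihood ratio construction, which can be sketched as below. The log-likelihood values and the degrees of freedom are hypothetical, and the variables merely reuse the slot names; no LRTest object is created.

```r
# Hypothetical log-likelihoods under the null and alternative hypotheses
H0logLik <- -1250.4
H1logLik <- -1238.9

# Hypothetical difference in the number of free parameters
df <- 8

# Likelihood ratio statistic and its asymptotic Chi-Square p-value
ChiSq <- 2 * (H1logLik - H0logLik)
pvalue <- pchisq(ChiSq, df = df, lower.tail = FALSE)
print(c(ChiSq = ChiSq, df = df, pvalue = pvalue))
```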
Pedro Duarte Silva <[email protected]>
Paula Brito <mpbrito.fep.up.pt>
Function MANOVA performs MANOVA tests based on likelihood ratios, allowing for both Gaussian and Skew-Normal distributions and homoscedastic or heteroscedastic setups. Methods H0res and H1res retrieve the model estimates under the null and alternative hypotheses, and method show displays the MANOVA results.
MANOVA(Sdt, grouping, Model=c("Normal","SKNormal","NrmandSKN"), CovCase=1:4,
  SelCrit=c("BIC","AIC"), Mxt=c("Hom","Het","Loc","Gen"), CVtol=1.0e-5, k2max=1e6,
  OptCntrl=list(), onerror=c("stop","warning","silentNull"), ...)

## S4 method for signature 'IdtMANOVA'
H0res(object)

## S4 method for signature 'IdtMANOVA'
H1res(object)

## S4 method for signature 'IdtMANOVA'
show(object)
object |
An object representing a MANOVA analysis on interval-valued units. |
Sdt |
An IData object representing interval-valued units. |
grouping |
Factor indicating the group to which each observation belongs. |
Model |
The joint distribution assumed for the MidPoint and LogRanges. Current alternatives are “Normal” for Gaussian distributions, “SKNormal” for Skew-Normal and “NrmandSKN” for both Gaussian and Skew-Normal distributions. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
Mxt |
Indicates the type of mixing distributions to be considered. Current alternatives are “Hom” (homoscedastic) and “Het” (heteroscedastic) for Gaussian models, and “Loc” (location model – groups differ only in their location parameters) and “Gen” (general model – groups differ in all parameters) for Skew-Normal models. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
OptCntrl |
List of optional control parameters to be passed to the optimization routine. See the documentation of RepLOptim for a description of the available options. |
onerror |
Indicates whether an error in the optimization algorithm should stop the current call, generate a warning, or return silently a NULL object. |
... |
Other named arguments. |
An object of class IdtMANOVA, containing the estimation and test results.
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8])

# Classical (homoscedastic) MANOVA tests
ManvChina <- MANOVA(ChinaT,ChinaTemp$GeoReg)
cat("China, MANOVA by geographical regions results =\n")
print(ManvChina)

# Heteroscedastic MANOVA tests
HetManvChina <- MANOVA(ChinaT,ChinaTemp$GeoReg,Mxt="Het")
cat("China, heteroscedastic MANOVA by geographical regions results =\n")
print(HetManvChina)

# Skew-Normal based MANOVA assuming that the groups differ only according to location parameters
## Not run:
SKNLocManvChina <- MANOVA(ChinaT,ChinaTemp$GeoReg,Model="SKNormal",Mxt="Loc")
cat("China, Skew-Normal MANOVA (location model) by geographical regions results =\n")
print(SKNLocManvChina)

# Skew-Normal based MANOVA assuming that the groups may differ in all parameters
SKNGenManvChina <- MANOVA(ChinaT,ChinaTemp$GeoReg,Model="SKNormal",Mxt="Gen")
cat("China, Skew-Normal MANOVA (general model) by geographical regions results =\n")
print(SKNGenManvChina)
## End(Not run)
Function MANOVAPermTest performs a MANOVA permutation test allowing for both Gaussian and Skew-Normal distributions and homoscedastic or heteroscedastic setups.
MANOVAPermTest(MANOVAres, Sdt, grouping, nrep=200,
  Model=c("Normal","SKNormal","NrmandSKN"), CovCase=1:4,
  SelCrit=c("BIC","AIC"), Mxt=c("Hom","Het","Loc","Gen"), CVtol=1.0e-5, k2max=1e6,
  OptCntrl=list(), onerror=c("stop","warning","silentNull"), ...)
MANOVAres |
An object representing a MANOVA analysis on interval-valued entities. |
Sdt |
An IData object representing interval-valued entities. |
grouping |
Factor indicating the group to which each observation belongs. |
nrep |
Number of random generated permutations used to approximate the null distribution of the likelihood ratio statistic. |
Model |
The joint distribution assumed for the MidPoint and LogRanges. Current alternatives are “Normal” for Gaussian distributions, “SKNormal” for Skew-Normal and “NrmandSKN” for both Gaussian and Skew-Normal distributions. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
Mxt |
Indicates the type of mixing distributions to be considered. Current alternatives are “Hom” (homoscedastic) and “Het” (heteroscedastic) for Gaussian models, and “Loc” (location model – groups differ only in their location parameters) and “Gen” (general model – groups differ in all parameters) for Skew-Normal models. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
OptCntrl |
List of optional control parameters to be passed to the optimization routine. See the documentation of RepLOptim for a description of the available options. |
onerror |
Indicates whether an error in the optimization algorithm should stop the current call, generate a warning, or return silently a NULL object. |
... |
Other named arguments. |
Function MANOVAPermTest performs a MANOVA permutation test allowing for both Gaussian and Skew-Normal distributions and homoscedastic or heteroscedastic setups. The test is implemented by simulating the null distribution of the MANOVA likelihood ratio statistic, using many random permutations of the observation group labels. It is intended as an alternative to the classical Chi-squared based MANOVA likelihood ratio tests when small sample sizes cast doubt on the applicability of the Chi-squared distribution. Note that this test may be computationally intensive, in particular when used with the Skew-Normal model.
The p-value of the MANOVA permutation test.
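The permutation principle described above, shuffling the group labels to approximate the null distribution of a test statistic, can be sketched generically in base R. The sketch is univariate and uses a stand-in statistic (a between-group sum of squares), not the MANOVA likelihood ratio statistic. The data are simulated, and the `+ 1` convention in the p-value is an assumption rather than a documented detail of MANOVAPermTest.

```r
set.seed(123)
# Toy data: one numeric variable and two groups with different means
y <- c(rnorm(15, mean = 0), rnorm(15, mean = 2))
grouping <- factor(rep(c("A", "B"), each = 15))

# Stand-in test statistic (NOT the MANOVA likelihood ratio statistic):
# between-group sum of squares of the group means
stat_fun <- function(y, g) {
  m <- tapply(y, g, mean)
  n <- tapply(y, g, length)
  sum(n * (m - mean(y))^2)
}

observed <- stat_fun(y, grouping)
nrep <- 200

# Null distribution: recompute the statistic after shuffling the labels
perm_stats <- replicate(nrep, stat_fun(y, sample(grouping)))

# Proportion of permuted statistics at least as extreme as the observed one;
# the + 1 terms are a common convention that avoids a p-value of exactly zero
pvalue <- (sum(perm_stats >= observed) + 1) / (nrep + 1)
print(pvalue)
```

The resolution of such a p-value is limited by nrep, so smaller significance levels call for more permutations and correspondingly longer run times.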
## Not run:
# Perform a MANOVA of the AbaloneIdt data set, comparing the Abalone variable means
# according to their age

# Create an Interval-Data object containing the Length, Diameter, Height, Whole weight,
# Shucked weight, Viscera weight (VW), and Shell weight (SeW) of 4177 Abalones,
# aggregated by sex and age.
# Note: The original micro-data (imported from the UCI Machine Learning Repository Abalone
# dataset) is given in the AbaDF data frame, and the corresponding values of the sex by age
# combinations are represented by the AbUnits factor.
AbaloneIdt <- AgrMcDt(AbaDF,AbUnits)

# Create a factor with three levels (Young, Adult and Old) for Abalones with respectively
# less than 10 rings, between 10 and 18 rings, and more than 18 rings.
Agestrg <- substring(rownames(AbaloneIdt),first=3)
AbalClass <- factor(ifelse(Agestrg=="1-3"|Agestrg=="4-6"|Agestrg=="7-9","Young",
  ifelse(Agestrg=="10-12"|Agestrg=="13-15"|Agestrg=="16-18","Adult","Old")))

# Perform a classical MANOVA, computing the p-value from the asymptotic Chi-squared
# distribution of the Wilks' lambda statistic
MANOVAres <- MANOVA(AbaloneIdt,AbalClass)
summary(MANOVAres)

# Find a finite sample p-value of the test statistic, using a permutation test.
MANOVAPermTest(MANOVAres,AbaloneIdt,AbalClass)
## End(Not run)
S4 methods for function mean. These methods extract estimates of mean vectors for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
mean(x)

## S4 method for signature 'IdtSNDE'
mean(x)

## S4 method for signature 'IdtNandSNDE'
mean(x)

## S4 method for signature 'IdtMxNDE'
mean(x)

## S4 method for signature 'IdtMxSNDE'
mean(x)
x |
An object representing a model fitted to interval data. |
For the IdtNDE
, IdtSNDE
and IdtNandSNDE
methods or IdtMxNDE
, IdtMxSNDE
methods with slot “Hmcdt” equal to TRUE: a matrix with the estimated correlations.
For the IdtMxNDE
, and IdtMxSNDE
methods with slot “Hmcdt” equal to FALSE: a three-dimensional array with a matrix with the estimated correlations for each group at each level of the third dimension.
Performs maximum likelihood estimation for parametric models of interval data
## S4 method for signature 'IData'
mle(Sdt, Model="Normal", CovCase="AllC", SelCrit=c("BIC","AIC"),
  k2max=1e6, OptCntrl=list(), ...)
Sdt |
An IData object representing interval-valued units. |
Model |
The joint distribution assumed for the MidPoints and LogRanges. Current alternatives are “Normal” for Gaussian distributions, “SKNormal” for Skew-Normal and “NrmandSKN” for both Gaussian and Skew-Normal distributions. |
CovCase |
Configuration of the variance-covariance matrix: The string “AllC” for all possible configurations (default), or a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
OptCntrl |
List of optional control parameters to be passed to the optimization routine. See the documentation of RepLOptim for a description of the available options. |
... |
Other named arguments. |
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P. and Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8])

# Estimate parameters by maximum likelihood assuming a Gaussian distribution
ChinaE <- mle(ChinaT)
cat("China maximum likelihood estimation results =\n")
print(ChinaE)
cat("Standard Errors of Estimators:\n")
print(stdEr(ChinaE))
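The Model argument above refers to the joint distribution of MidPoints and LogRanges. As a minimal base-R sketch of that representation, assuming each interval is stored as a lower and an upper bound, the two quantities can be computed as follows. The helper name to_midp_logr and the data are hypothetical and are not part of MAINT.Data.

```r
# Hypothetical helper (not part of MAINT.Data): converts interval bounds into
# the MidPoint / LogRange representation.
to_midp_logr <- function(lower, upper) {
  stopifnot(length(lower) == length(upper), all(upper > lower))
  data.frame(
    MidP = (lower + upper) / 2,   # centre of each interval
    LogR = log(upper - lower)     # logarithm of each interval's range
  )
}

# Three made-up temperature intervals
lb <- c(-5.0, 2.5, 10.0)
ub <- c( 3.0, 9.5, 18.0)
repr <- to_midp_logr(lb, ub)
print(repr)
```

Working with log-ranges keeps the modelled quantity unrestricted in sign, which is what makes a Gaussian or Skew-Normal assumption workable for the range part of an interval.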
An interval-valued data set containing 142 units and four interval-valued variables (dep_delay, arr_delay, air_time and distance), created from the flights data set in the R package nycflights13 (on-time data for all flights that departed the JFK, LGA or EWR airports in 2013), after removing all rows with missing observations, and aggregating by month and carrier.
data(nycflights)
FlightsDF: A data frame containing the original 327346 valid (i.e. with non missing values) flights from the nycflights13 package, described by the 4 variables: dep_delay, arr_delay, air_time and distance.
FlightsUnits: A factor with 327346 observations and 142 levels, indicating the month by carrier combination to which each original flight belongs.
FlightsIdt: An IData object with 142 observations and 4 interval-valued variables, describing the intervals formed by aggregating the FlightsDF microdata by the 0.05 and 0.95 quantiles of the subsamples formed by the FlightsUnits factor.
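The aggregation described for FlightsIdt can be illustrated with a small base-R sketch: within each level of a grouping factor, the 0.05 and 0.95 quantiles of the micro-data become the lower and upper interval bounds. The data, the group labels and the column names below are made up, and the sketch does not reproduce how the package itself builds its IData objects.

```r
set.seed(1)
# Simulated micro-data standing in for FlightsDF (values are made up)
micro <- data.frame(air_time = rexp(600, rate = 1 / 150))
# Grouping factor standing in for FlightsUnits (month by carrier labels)
units <- factor(rep(c("1_AA", "1_UA", "2_AA"), each = 200))

# Interval bounds: 0.05 and 0.95 quantiles within each group
lb <- tapply(micro$air_time, units, quantile, probs = 0.05)
ub <- tapply(micro$air_time, units, quantile, probs = 0.95)

intervals <- data.frame(air_time.lb = as.numeric(lb),
                        air_time.ub = as.numeric(ub),
                        row.names   = levels(units))
print(intervals)
```

Using inner quantiles rather than the minimum and maximum makes the resulting intervals less sensitive to a few extreme micro-observations.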
Method pcoordplot displays a parallel coordinates plot, representing the results stored in an IdtMclust-method object.
## S4 method for signature 'IdtMclust'
pcoordplot(x, title="Parallel Coordinate Plot",
  Seq=c("AllMidP_AllLogR","MidPLogR_VarbyVar"), model="BestModel",
  legendpar=list(), ...)
x |
An object of type “IdtMclust” representing the clustering results of an interval-valued data set obtained by the function “Idtmclust”. |
title |
The title of the plot. |
Seq |
The ordering of the coordinates in the plot. Available options are “AllMidP_AllLogR” (default) and “MidPLogR_VarbyVar”. |
model |
A character vector specifying the model whose solution is to be displayed. |
legendpar |
A named list with graphical parameters for the plot legend. Currently only the base R ‘cex.main’ and ‘cex.lab’ parameters are implemented. |
... |
Graphical arguments to be passed to methods. |
IdtMclust, Idtmclust, plotInfCrt
## Not run:
# Create an Interval-Data object containing the intervals of loan data
# (from the Kaggle Data Science platform) aggregated by loan purpose
LbyPIdt <- IData(LoansbyPurpose_minmaxDt,
  VarNames=c("ln-inc","ln-revolbal","open-acc","total-acc"))

# Fit homoscedastic Gaussian mixtures with up to ten components
mclustres <- Idtmclust(LbyPIdt,G=1:10)
plotInfCrt(mclustres,legpos="bottomright")

# Display the results of the best mixture according to the BIC
pcoordplot(mclustres)
pcoordplot(mclustres,model="HomG6C1")
pcoordplot(mclustres,model="HomG4C1")
## End(Not run)
S4 methods for function plot. As in the generic plot S3 ‘graphics’ method, these methods plot Interval-valued data contained in IData objects.
## S4 method for signature 'IData,IData'
plot(x, y, type=c("crosses","rectangles"), append=FALSE, ...)
## S4 method for signature 'IData,missing'
plot(x, casen=NULL, layout=c("vertical","horizontal"), append=FALSE, ...)
x |
An object of type IData representing the values of an interval-valued variable. |
y |
An object of type IData representing the values of a second interval-valued variable, to be displayed along the y (vertical) coordinates. |
type |
What type of plot should be drawn. Alternatives are "crosses" (default) and "rectangles". |
append |
A boolean flag indicating if the interval-valued variables should be displayed in a new plot, or added to an existing plot. |
casen |
An optional character string with the case names. |
layout |
The axes along which the interval-valued variables should be displayed. Alternatives are "vertical" (default) and "horizontal". |
... |
Graphical arguments to be passed to methods. |
## Not run:
# Create an Interval-Data object containing the Length, Diameter, Height, Whole weight,
# Shucked weight, Viscera weight (VW), and Shell weight (SeW) of 4177 Abalones,
# aggregated by sex and age.
# Note: The original micro-data (imported UCI Machine Learning Repository Abalone dataset)
# is given in the AbaDF data frame, and the corresponding values of the sex by age combinations
# are represented by the AbUnits factor.
AbaloneIdt <- AgrMcDt(AbaDF,AbUnits)

# Display a plot of the Length versus the Whole_weight interval variables
plot(AbaloneIdt[,"Length"],AbaloneIdt[,"Whole_weight"])
plot(AbaloneIdt[,"Length"],AbaloneIdt[,"Whole_weight"],type="rectangles")

# Display the Abalone lengths using different colors to distinguish the Abalones age
# (measured by the number of rings)
# Create a factor with three levels (Young, Adult and Old) for Abalones with
# respectively fewer than 10 rings, between 10 and 18 rings, and more than 18 rings.
Agestrg <- substring(rownames(AbaloneIdt),first=3)
AbalClass <- factor(ifelse(Agestrg=="1-3"|Agestrg=="4-6"|Agestrg=="7-9","Young",
  ifelse(Agestrg=="10-12"|Agestrg=="13-15"|Agestrg=="16-18","Adult","Old")))
plot(AbaloneIdt[AbalClass=="Young","Length"],col="blue",layout="horizontal")
plot(AbaloneIdt[AbalClass=="Adult","Length"],col="green",layout="horizontal",append=TRUE)
plot(AbaloneIdt[AbalClass=="Old","Length"],col="red",layout="horizontal",append=TRUE)
legend("bottomleft",legend=c("Young","Adult","Old"),col=c("blue","green","red"),lty=1)
## End(Not run)
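To make the two display types of the bivariate method concrete, the following base-R sketch draws made-up intervals once as crosses and once as rectangles. It is only one plausible reading of the "crosses" and "rectangles" options (arms or boxes spanning each unit's two ranges); the helper draw_intervals and the data are hypothetical, and the sketch does not call the package's plot methods.

```r
# Made-up bounds of two interval-valued variables observed on four units
x_lb <- c(1.0, 2.0, 3.5, 5.0); x_ub <- c(2.0, 3.2, 4.5, 6.5)
y_lb <- c(0.5, 1.5, 2.0, 4.0); y_ub <- c(1.5, 2.5, 3.5, 5.0)

# Hypothetical helper (not part of MAINT.Data)
draw_intervals <- function(type = c("crosses", "rectangles")) {
  type <- match.arg(type)
  plot(NA, xlim = range(x_lb, x_ub), ylim = range(y_lb, y_ub),
       xlab = "x", ylab = "y", main = type)
  if (type == "crosses") {
    x_mid <- (x_lb + x_ub) / 2
    y_mid <- (y_lb + y_ub) / 2
    segments(x_lb, y_mid, x_ub, y_mid)   # horizontal arm: range of x
    segments(x_mid, y_lb, x_mid, y_ub)   # vertical arm: range of y
  } else {
    rect(x_lb, y_lb, x_ub, y_ub)         # box spanning both ranges
  }
  invisible(type)
}

pdf(NULL)                     # null device, so the sketch also runs in batch mode
op <- par(mfrow = c(1, 2))    # the two display types side by side
drawn <- c(draw_intervals("crosses"), draw_intervals("rectangles"))
par(op)
invisible(dev.off())
```

Dropping the pdf(NULL) and dev.off() calls shows the two panels on the active graphics device instead.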
Method plotInfCrt displays a plot representing the values of an appropriate information criterion (currently either BIC or AIC) for the models whose results are stored in an IdtMclust-method object. A supplementary short output message prints the values of the chosen criterion for the ‘nprnt’ best models.
## S4 method for signature 'IdtMclust'
plotInfCrt(object, crt=object@SelCrit, legpos="right", nprnt=5,
  legendout=TRUE, outlegsize="adjstoscreen", outlegdisp="adjstoscreen",
  legendpar=list(), ...)
object |
An object of type “IdtMclust” representing the clustering results of an interval-valued data set obtained by the function “Idtmclust”. |
crt |
The information criterion whose values are to be displayed. |
legpos |
Legend position. Alternatives are “right” (default), “left”, “bottomright”, “bottomleft”, “topright” and “topleft”. |
nprnt |
Number of solutions for which the value of the information criterion should be printed in a supplementary short output message. |
legendout |
A boolean flag indicating if the legend should be placed outside (default) or inside the main plot. |
outlegsize |
The size (in inches) to be reserved for a legend placed outside the main plot, or the string “adjstoscreen” (default) for an automatic adjustment of the plot and legend sizes. |
outlegdisp |
The displacement (as a percentage of the main plot size) of the outer margin for a legend placed outside the main plot, or the string “adjstoscreen” (default) for an automatic adjustment of the legend position. |
legendpar |
A named list with graphical parameters for the plot legend. |
... |
Graphical arguments to be passed to methods. |
IdtMclust, Idtmclust, pcoordplot
## Not run:
# Create an Interval-Data object containing the intervals of loan data
# (from the Kaggle Data Science platform) aggregated by loan purpose
LbyPIdt <- IData(LoansbyPurpose_minmaxDt,
  VarNames=c("ln-inc","ln-revolbal","open-acc","total-acc"))

# Fit homoscedastic and heteroscedastic Gaussian mixtures with up to seven components
mclustres <- Idtmclust(LbyPIdt,G=1:7,Mxt="HomandHet")

# Compare the model fit according to the BIC
plotInfCrt(mclustres,legpos="bottomleft")

# Display the results of the best three mixtures according to the BIC
summary(mclustres,parameters=TRUE,classification=TRUE)
pcoordplot(mclustres)
summary(mclustres,parameters=TRUE,classification=TRUE,model="HetG2C2")
summary(mclustres,parameters=TRUE,classification=TRUE,model="HomG6C1")
pcoordplot(mclustres,model="HomG6C1")
## End(Not run)
qda performs quadratic discriminant analysis of Interval Data based on classic estimates of a mixture of Gaussian models.
## S4 method for signature 'IData'
qda(x, grouping, prior="proportions", CVtol=1.0e-5, subset=1:nrow(x),
  CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE, k2max=1e6, ...)
## S4 method for signature 'IdtMxtNDE'
qda(x, prior="proportions", selmodel=BestModel(x), silent=FALSE, k2max=1e6, ...)
## S4 method for signature 'IdtHetNMANOVA'
qda(x, prior="proportions", selmodel=BestModel(H1res(x)), silent=FALSE,
  k2max=1e6, ...)
## S4 method for signature 'IdtGenNSNMANOVA'
qda(x, prior="proportions", selmodel=BestModel(H1res(x)@NMod), silent=FALSE,
  k2max=1e6, ...)
x |
An object of class IData, IdtMxtNDE, IdtHetNMANOVA or IdtGenNSNMANOVA. |
grouping |
Factor specifying the class for each observation. |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
subset |
An index vector specifying the cases to be used in the analysis. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
silent |
A boolean flag indicating whether a warning message should be printed if the method fails. |
selmodel |
Selected model from a list of candidate models saved in object x. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
... |
Other named arguments. |
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
lda, snda, Roblda, Robqda, IData, IdtMxtNDE, IdtHetNMANOVA, IdtGenNSNMANOVA, ConfMat
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

# Quadratic Discriminant Analysis
ChinaT.qda <- qda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- qda discriminant analysis results:\n")
print(ChinaT.qda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.qda,ChinaT)$class)

## Not run:
# Estimate error rates by ten-fold cross-validation replicated 20 times
CVqda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=qda,CovCase=CovCase(ChinaT.qda))
summary(CVqda[,,"Clerr"])
## End(Not run)
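The k2max argument bounds the l2-norm condition number of correlation matrices. As a minimal sketch of what such a check could look like in base R (the helper is_num_singular is hypothetical and the package's internal check may differ), the 2-norm condition number is the ratio of the largest to the smallest singular value and can be obtained with kappa(..., exact = TRUE).

```r
# Hypothetical helper (not part of MAINT.Data): flags a correlation matrix as
# numerically singular when its 2-norm condition number exceeds k2max.
is_num_singular <- function(R, k2max = 1e6) {
  kappa(R, exact = TRUE) > k2max   # exact = TRUE uses the singular values of R
}

well <- matrix(c(1, 0.3, 0.3, 1), 2, 2)                # mildly correlated pair
ill  <- matrix(c(1, 1 - 1e-9, 1 - 1e-9, 1), 2, 2)      # almost perfectly correlated pair

is_num_singular(well)
is_num_singular(ill)
```

The first matrix has a condition number below 2, while the second has one of the order of 1e9 and is therefore flagged under the default bound of 1e6.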
p-quantiles of the Hardin and Rocke (2005) scaled F distribution for squared Mahalanobis distances based on raw MCD covariance estimators
qHardRoqF(p, nobs, nvar, h=floor((nobs+nvar+1)/2), adj=TRUE, lower.tail=TRUE, log.p=FALSE)
p |
Vector of probabilities. |
nobs |
Number of observations used in the computation of the raw MCD Mahalanobis squared distances. |
nvar |
Number of variables used in the computation of the raw MCD Mahalanobis squared distances. |
h |
Number of observations kept in the computation of the raw MCD estimate. |
adj |
logical; if TRUE (default) returns the quantile of the adjusted distribution. Otherwise returns the quantile of the asymptotic distribution. |
lower.tail |
logical; if TRUE (default), probabilities are P(X <= x); otherwise, P(X > x). |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The quantile of the appropriate scaled F distribution.
Hardin, J. and Rocke, D. M. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910–927.
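The default value of h and the roles of lower.tail and log.p can be illustrated without calling qHardRoqF itself. The sketch below computes the default h from the usage line and then uses the base-R quantile function qf with arbitrary degrees of freedom, assuming that lower.tail and log.p follow the usual base-R conventions. The degrees of freedom are placeholders and are not the Hardin and Rocke values.

```r
nobs <- 100; nvar <- 4

# Default number of observations kept in the raw MCD estimate (from the usage line)
h <- floor((nobs + nvar + 1) / 2)
h

# Base-R conventions for lower.tail and log.p, shown on an ordinary F distribution.
# df2 = 30 is an arbitrary placeholder.
p  <- 0.975
q1 <- qf(p, df1 = nvar, df2 = 30)
q2 <- qf(1 - p, df1 = nvar, df2 = 30, lower.tail = FALSE)  # same quantile from the upper tail
q3 <- qf(log(p), df1 = nvar, df2 = 30, log.p = TRUE)       # same quantile from log(p)
c(q1, q2, q3)
```

All three calls describe the same point of the distribution, so the three values agree up to numerical precision.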
‘RepLOptim’ tries to minimize a function by calling local optimizers several times from different random starting points.
RepLOptim(start, parsd, fr, gr=NULL, inphess=NULL, ..., method="nlminb", lower=NULL, upper=NULL, rethess=FALSE, parmstder=FALSE, control=list())
start |
Vector of starting points used in the first call of the local optimizer. |
parsd |
Vector of standard deviations for the parameter distribution generating starting points for the local optimizer. |
fr |
The function to be minimized. If method is neither “nlminb” nor “L-BFGS-B”, fr should accept lbound and ubound arguments for the parameter bounds, and should enforce these bounds before calling the local optimization routine. |
gr |
A function to return the gradient for the “nlminb”, “BFGS”, “CG” and “L-BFGS-B” methods. If it is ‘NULL’, a finite-difference approximation will be used. For the “SANN” method it specifies a function to generate a new candidate point. If it is ‘NULL’, a default Gaussian Markov kernel is used. |
inphess |
A function to return the hessian for the “nlminb” method. Must return a square matrix of order ‘length(parmean)’ with the different hessian elements in its lower triangle. It is ignored if the method argument is not set to its “nlminb” default. |
... |
Further arguments to be passed to ‘fr’, ‘gr’ and ‘inphess’. |
method |
The method to be used. See ‘Details’. |
lower |
Vector of parameter lower bounds. Set to ‘-Inf’ (no bounds) by default. |
upper |
Vector of parameter upper bounds. Set to ‘Inf’ (no bounds) by default. |
rethess |
Boolean flag indicating whether a numerically evaluated hessian matrix at the optimum should be computed and returned. Not available for the “nlminb” method. |
parmstder |
Boolean flag indicating whether parameter asymptotic standard errors based on the inverse hessian approximation to the Fisher information matrix should be computed and returned. Only available if rethess is set to TRUE and if a local minimum with a positive-definite hessian was indeed found. This requirement may fail if ‘nrep’ and ‘niter’ (and maybe ‘neval’) are not large enough, and for non-trivial problems of moderate or high dimensionality may never be satisfied because of numerical difficulties. |
control |
A list of control parameters. See below for details. |
‘RepLOptim’ tries to minimize a function by calling local optimizers several times from different starting points. The starting point used in the first call to the local optimizer is the value of the argument ‘start’. Subsequent calls use starting points generated from uniform distributions of independent variates with means equal to the current best parameter values and standard deviations equal to the values of the argument ‘parsd’. If parameter bounds are specified and the uniform limits implied by ‘parsd’ violate those bounds, these limits are replaced by the corresponding bounds.
The choice of the local optimizer is made by the value of the ‘method’ argument. This argument can be a function object implementing the optimizer or a string describing an available R method. In the latter case current alternatives are: “nlminb” (default) for the ‘nlminb’ PORT routine, “nlm” for the ‘nlm’ function and “Nelder-Mead”, “BFGS”, “CG”, “L-BFGS-B” and “SANN” for the corresponding methods of the ‘optim’ function.
Arguments for controlling the behaviour of the local optimizer can be specified as components of the control list. This list can include any of the following components:
Maximum number of repetitions of the same minimum objective value before RepLOptim is stopped and the current best solution is returned. By default set to 2.
Maximum number of times the local optimizer is called without improvements in the minimum objective value, before RepLOptim is stopped and the current best solution is returned. By default set to 50.
Maximum number of times the local optimizer is called and returns a valid solution before RepLOptim is stopped and the current best solution is returned. By default set to 250.
Total maximum number of replications (including those leading to non-valid solutions) performed. By default equals ten times the value of maxreplic. Ignored when objbnd is set to ‘Inf’.
Maximum number of iterations performed in each call to the local optimizer. By default set to 500, except with the “SANN” method, for which the default is 1500.
Maximum number of function evaluations (nlminb method only) performed in each call to the nlminb optimizer. By default set to 1000.
The relative convergence tolerance of the local optimizer. The local optimizer stops if it is unable to reduce the value by a factor of ‘RLOtol *(abs(val) + reltol)’ at a step. Ignored when method is set to “nlm”. By default set to the square root of the computer precision, i.e. to ‘sqrt(.Machine$double.eps)’.
Numerical tolerance used to ensure that the hessian is non-singular. If the last eigenvalue of the hessian is positive but the ratio between it and the first eigenvalue is below HesEgtol, the hessian is considered to be semi-definite and the parameter asymptotic standard errors are not computed. By default set to the square root of the computer precision, i.e. to ‘sqrt(.Machine$double.eps)’.
Upper bound for the objective. Only solutions leading to objective values below objbnd are considered as valid.
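The restart scheme described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the RepLOptim source: the helper multi_start, its nrep argument and the test function are hypothetical, the local optimizer is fixed to nlminb, and none of the stopping rules in the control list are implemented. A uniform variate with standard deviation s has half-width sqrt(3) * s, which is how ‘parsd’ is turned into sampling limits here.

```r
# Hypothetical helper (not the RepLOptim source): repeated local optimization
# with nlminb from uniformly perturbed starting points.
multi_start <- function(start, parsd, fr, lower = -Inf, upper = Inf, nrep = 20) {
  best <- nlminb(start, fr, lower = lower, upper = upper)   # first call uses 'start'
  half <- sqrt(3) * parsd               # half-width of a uniform with sd 'parsd'
  for (i in seq_len(nrep)) {
    lo <- pmax(best$par - half, lower)  # sampling limits centred on the current best,
    hi <- pmin(best$par + half, upper)  # replaced by the bounds when they are violated
    cand <- nlminb(runif(length(start), lo, hi), fr, lower = lower, upper = upper)
    if (cand$convergence == 0 && cand$objective < best$objective) best <- cand
  }
  list(par = best$par, val = best$objective)
}

set.seed(42)
# Objective with two local minima; the global one lies near x = -1.04
f <- function(x) x^4 - 2 * x^2 + 0.3 * x
res <- multi_start(start = 1.5, parsd = 2, fr = f, lower = -3, upper = 3)
print(res)
```

Started at 1.5, a single local search settles in the shallower minimum near 0.96; the perturbed restarts are what allow the search to reach the deeper one.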
A list with the following components:
par |
The best result found for the parameter vector. |
val |
The best value (minimum) found for the function fr. |
vallist |
A vector with the best values found for each starting point. |
iterations |
Number of iterations performed by the local optimizer in the call that generated the best result. |
counts |
Number of times the function fr was evaluated in the call that generated the result returned. |
convergence |
Code with the convergence status returned by the local optimizer. |
message |
Message generated by the local optimizer. |
hessian |
Numerically evaluated hessian of fr at the result returned. Only returned when the argument rethess is set to TRUE. |
hessegval |
Eigenvalues of the hessian matrix. Used to confirm if a local minimum was indeed found. Only returned when the argument rethess is set to TRUE. |
stderrors |
Asymptotic standard deviations of the parameters based on the observed information matrix. Only returned when the argument parmstder is set to TRUE and the hessian is indeed positive definite. |
A. Pedro Duarte Silva
Roblda and Robqda perform linear and quadratic discriminant analysis of Interval Data based on robust estimates of location and scatter.
## S4 method for signature 'IData'
Roblda(x, grouping, prior="proportions", CVtol=1.0e-5, egvtol=1.0e-10,
  subset=1:nrow(x), CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE,
  CovEstMet=c("Pooled","Globdev"), SngDMet=c("fasttle","fulltle"),
  k2max=1e6, Robcontrol=RobEstControl(), ...)
## S4 method for signature 'IData'
Robqda(x, grouping, prior="proportions", CVtol=1.0e-5, subset=1:nrow(x),
  CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE,
  SngDMet=c("fasttle","fulltle"), k2max=1e6, Robcontrol=RobEstControl(), ...)
x |
An object of class IData. |
grouping |
Factor specifying the class for each observation. |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
egvtol |
Tolerance level for the eigenvalues of the product of the inverse within by the between covariance matrices. When an eigenvalue has an absolute value below egvtol, it is considered to be zero. |
subset |
An index vector specifying the cases to be used in the analysis. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
silent |
A boolean flag indicating whether a warning message should be printed if the method fails. |
CovEstMet |
Method used to estimate the common covariance matrix in Roblda. Alternatives are “Pooled” (default) and “Globdev”. |
SngDMet |
Algorithm used to find the robust estimates of location and scatter. Alternatives are “fasttle” (default) and “fulltle”. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
Robcontrol |
A control object (S4) of class RobEstControl. |
... |
Other named arguments. |
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
lda, qda, snda, IData, RobEstControl, ConfMat
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

## Not run:
# Robust Linear Discriminant Analysis
ChinaT.rlda <- Roblda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- robust lda discriminant analysis results:\n")
print(ChinaT.rlda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.rlda,ChinaT)$class)

# Estimate error rates by ten-fold cross-validation with 5 replications
CVrlda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=Roblda,
  CovCase=CovCase(ChinaT.rlda),CVrep=5)
summary(CVrlda[,,"Clerr"])

# Robust Quadratic Discriminant Analysis
ChinaT.rqda <- Robqda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- robust qda discriminant analysis results:\n")
print(ChinaT.rqda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.rqda,ChinaT)$class)

# Estimate error rates by ten-fold cross-validation with 5 replications
CVrqda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=Robqda,
  CovCase=CovCase(ChinaT.rqda),CVrep=5)
summary(CVrqda[,,"Clerr"])
## End(Not run)
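Roblda and Robqda rely on robust estimates of location and scatter. As a rough illustration of the general idea behind trimmed estimation, the base-R sketch below applies concentration steps to an unrestricted Gaussian model: starting from a small random subset, it repeatedly refits the mean and covariance and keeps the observations closest in Mahalanobis distance. The helper trimmed_fit, its defaults and the simulated data are hypothetical; the sketch does not reproduce the fasttle or fulltle algorithms and ignores the covariance configurations used by the package.

```r
# Hypothetical helper (not the package's fasttle): trimmed Gaussian estimates
# of location and scatter obtained by concentration steps.
trimmed_fit <- function(X, alpha = 0.75, nsamp = 20, ncsteps = 200) {
  n <- nrow(X)
  h <- floor(alpha * n)                        # number of observations kept
  best <- NULL
  for (s in seq_len(nsamp)) {
    keep <- sample(n, ncol(X) + 1)             # small random initial subset
    for (i in seq_len(ncsteps)) {
      Xk <- X[keep, , drop = FALSE]
      d2 <- mahalanobis(X, colMeans(Xk), cov(Xk))
      new_keep <- order(d2)[seq_len(h)]        # the h closest observations
      if (setequal(new_keep, keep)) break      # subset is stable: converged
      keep <- new_keep
    }
    Xk <- X[keep, , drop = FALSE]
    crit <- det(cov(Xk))                       # smaller value: tighter subset
    if (is.null(best) || crit < best$crit)
      best <- list(center = colMeans(Xk), cov = cov(Xk),
                   kept = sort(keep), crit = crit)
  }
  best
}

set.seed(123)
clean    <- matrix(rnorm(180), ncol = 2)             # 90 regular observations
outliers <- matrix(rnorm(20, mean = 8), ncol = 2)    # 10 gross outliers
fit <- trimmed_fit(rbind(clean, outliers))
fit$center
```

Because the retained subset excludes the shifted observations, the estimated centre stays close to the origin, whereas the ordinary sample mean of all 100 rows would be pulled towards the outliers.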
This function will create a control object of class RobEstControl containing the control parameters for the robust estimation functions fasttle, RobMxtDEst, Roblda and Robqda.
RobEstControl(alpha=0.75, nsamp=500, seed=NULL, trace=FALSE, use.correction=TRUE, ncsteps=200, getalpha="TwoStep", rawMD2Dist="ChiSq", MD2Dist="ChiSq", eta=0.025, multiCmpCor="never", getkdblstar="Twopplusone", outlin="MidPandLogR", trialmethod="simple", m=1, reweighted=TRUE, k2max=1e6, otpType="SetMD2andEst")
alpha |
Numeric parameter controlling the size of the subsets over which the trimmed likelihood is maximized; roughly alpha*nrow(Sdt) observations are used for computing the trimmed likelihood. Allowed values are between 0.5 and 1. Note that when argument ‘getalpha’ is set to “TwoStep” the final value of ‘alpha’ is estimated by a two-step procedure and the value of argument ‘alpha’ is only used to specify the size of the samples used in the first step. |
nsamp |
Number of subsets used for initial estimates. |
seed |
Starting value for random generator. |
trace |
Whether to print intermediate results. |
use.correction |
Whether to use finite sample correction factors. |
ncsteps |
The maximum number of concentration steps used each iteration of the fasttle algorithm. |
getalpha |
Argument specifying if the ‘alpha’ parameter (roughly the percentage of the sample used for computing the trimmed likelihood) should be estimated from the data, or if the value of the argument ‘alpha’ should be used instead. When set to “TwoStep”, ‘alpha’ is estimated by a two-step procedure with the value of argument ‘alpha’ specifying the size of the samples used in the first step. Otherwise the value of argument ‘alpha’ is used directly. |
rawMD2Dist |
The assumed reference distribution of the raw MCD squared distances, which is used to find the cutoffs defining the observations kept in one-step reweighted MCD estimates. Alternatives are ‘ChiSq’, ‘HardRockeAsF’ and ‘HardRockeAdjF’, respectively for the usual Chi-squared, and the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005). |
MD2Dist |
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-squared, and the Beta and F distributions proposed by Cerioli (2010). |
eta |
Nominal size of the null hypothesis that a given observation is not an outlier. Defines the raw MCD Mahalanobis distances cutoff used to choose the observations kept in the reweighting step. |
multiCmpCor |
Whether a multicomparison correction of the nominal size (eta) for the outliers tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at ‘eta’; ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ – as suggested by Cerioli (2010), make an initial set of tests using the nominal size 1 - (1 - ‘eta’)^(1/n), and if no outliers are detected stop. Otherwise, make a second step testing for outliers at ‘eta’. |
getkdblstar |
Argument specifying the size of the initial small (in order to minimize the probability of outliers) subsets. If set to the string “Twopplusone” (default) the initial sets have twice the number of interval-valued variables plus one (i.e., they are the smallest samples that lead to a non-singular covariance estimate). Otherwise, an integer with the size of the initial sets. |
outlin |
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges. |
trialmethod |
The method to find a trial subset used to initialize each replication of the fasttle algorithm. The current options are “simple” (default), which simply selects ‘kdblstar’ observations at random, and “Poolm”, which divides the original sample into ‘m’ non-overlapping subsets, applies the ‘simple trial’ and the refinement methods to each one of them, and merges the results into a trial subset. |
m |
Number of non-overlapping subsets used by the trial method when the argument ‘trialmethod’ is set to “Poolm”. |
reweighted |
Should a (Re)weighted estimate of the covariance matrix be used in the computation of the trimmed likelihood or just a “raw” covariance estimate; default is (Re)weighting. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
otpType |
The amount of output returned by fasttle. |
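The roles of ‘eta’, ‘multiCmpCor’ and the “ChiSq” reference distribution can be illustrated with a small base R sketch. It does not call MAINT.Data; it assumes that the ‘always’ size is the usual Sidak-type correction 1 - (1 - eta)^(1/n), and that squared distances are referred to a Chi-squared distribution with 2p degrees of freedom (p MidPoints plus p LogRanges). Both are assumptions made for illustration, and the values of n and p are simply those of the ChinaTemp examples.

```r
# Assumed illustration values (not read from any fitted object)
eta <- 0.025   # nominal size for a single outlier test
n   <- 899     # number of entities being tested
p   <- 4       # number of interval-valued variables
q   <- 2 * p   # assumed dimension: MidPoints and LogRanges together

# Per-test size under a Sidak-type correction for n simultaneous tests
eta_always <- 1 - (1 - eta)^(1 / n)

# Chi-squared cutoffs for squared Mahalanobis distances
cutoff_never  <- qchisq(1 - eta, df = q)         # no correction
cutoff_always <- qchisq(1 - eta_always, df = q)  # corrected size

print(c(eta_never = eta, eta_always = eta_always))
print(c(cutoff_never = cutoff_never, cutoff_always = cutoff_always))
```

The corrected size is much smaller than ‘eta’, so the corresponding cutoff is larger and fewer observations are flagged as outliers.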
A RobEstControl object.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910–927.
Todorov V. and Filzmoser P. (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software 32(3), 1–47.
RobEstControl, fasttle, RobMxtDEst, Roblda, Robqda
This class extends the CovControlMcd class and contains control parameters for the robust estimation of parametric interval data models.
Objects can be created by calls of the form new("RobEstControl", ...) or by calling the constructor-function RobEstControl.
alpha
:Inherited from class "CovControlMcd". Numeric parameter controlling the size of the subsets over which the trimmed likelihood is maximized; roughly alpha*nrow(Sdt) observations are used for computing the trimmed likelihood. Allowed values are between 0.5 and 1. Note that when argument ‘getalpha’ is set to “TwoStep” the final value of ‘alpha’ is estimated by a two-step procedure and the value of argument ‘alpha’ is only used to specify the size of the samples used in the first step.
nsamp
:Inherited from class "CovControlMcd". Number of subsets used for initial estimates.
scalefn
:Inherited from class "CovControlMcd" and not used in the package ‘MAINT.Data’.
maxcsteps
:Inherited from class "CovControlMcd" and not used in the package ‘MAINT.Data’.
seed
:Inherited from class "CovControlMcd". Starting value for random generator. Default is seed = NULL.
use.correction
:Inherited from class "CovControlMcd". Whether to use finite sample correction factors. Default is use.correction=TRUE.
trace, tolSolve
:Inherited from class "CovControl".
ncsteps
:The maximum number of concentration steps used in each iteration of the fasttle algorithm.
getalpha
:Argument specifying if the ‘alpha’ parameter (roughly the percentage of the sample used for computing the trimmed likelihood) should be estimated from the data, or if the value of the argument ‘alpha’ should be used instead. When set to “TwoStep”, ‘alpha’ is estimated by a two-step procedure with the value of argument ‘alpha’ specifying the size of the samples used in the first step. Otherwise, the value of argument ‘alpha’ is used directly.
rawMD2Dist
:The assumed reference distribution of the raw MCD squared distances, which is used to find the cutoffs defining the observations kept in one-step reweighted MCD estimates. Alternatives are ‘ChiSq’, ‘HardRockeAsF’ and ‘HardRockeAdjF’, respectively for the usual Chi-squared, and the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005).
MD2Dist
:The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF”, respectively for the usual Chi-squared, and the Beta and F distributions proposed by Cerioli (2010).
eta
:Nominal size of the null hypothesis that a given observation is not an outlier. Defines the raw MCD Mahalanobis distances cutoff used to choose the observations kept in the reweighting step.
multiCmpCor
:Whether a multicomparison correction of the nominal size (eta) for the outliers tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at ‘eta’; ‘always’ – testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ – as suggested by Cerioli (2010), make an initial set of tests using the nominal size 1 - (1 - ‘eta’)^(1/n), and if no outliers are detected stop. Otherwise, make a second step testing for outliers at ‘eta’.
getkdblstar
:Argument specifying the size of the initial small (in order to minimize the probability of outliers) subsets. If set to the string “Twopplusone” (default) the initial sets have twice the number of interval-valued variables plus one (i.e., they are the smallest samples that lead to a non-singular covariance estimate). Otherwise, an integer with the size of the initial sets.
k2max
:Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results.
outlin
:The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges.
trialmethod
:The method to find a trial subset used to initialize each replication of the fasttle algorithm. The current options are “simple” (default), which simply selects ‘kdblstar’ observations at random, and “Poolm”, which divides the original sample into ‘m’ non-overlapping subsets, applies the ‘simple trial’ and the refinement methods to each one of them, and merges the results into a trial subset.
m
:Number of non-overlapping subsets used by the trial method when the argument ‘trialmethod’ is set to “Poolm”.
reweighted
:Should a (Re)weighted estimate of the covariance matrix be used in the computation of the trimmed likelihood or just a “raw” covariance estimate; default is (Re)weighting.
otpType
:The amount of output returned by fasttle. Current options are “OnlyEst” (default) where only an ‘IdtE’ object with the fasttle estimates is returned, “SetMD2andEst” which returns a list with an ‘IdtE’ object of fasttle estimates, a vector with the final trimmed subset elements used to compute these estimates and the corresponding robust squared Mahalanobis distances, and “SetMD2EstandPrfSt” which returns a list with the previous three components plus a list of some performance statistics concerning the algorithm execution.
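The subset sizes implied by ‘alpha’, ‘getkdblstar’ and ‘m’ can be sketched in base R. This is only an illustration of the sizes involved, not the fasttle algorithm itself: the rounding of alpha*n with floor(), the value m = 3 and the use of sample() to draw the subsets are assumptions introduced here.

```r
# Assumed illustration values
n     <- 899    # number of observations
p     <- 4      # number of interval-valued variables
alpha <- 0.75   # trimming parameter
m     <- 3      # hypothetical number of "Poolm" subsets

# Roughly alpha * n observations enter the trimmed likelihood
# (floor() is an assumed rounding rule)
h <- floor(alpha * n)

# "Twopplusone": initial sets of size 2p + 1
kdblstar <- 2 * p + 1

set.seed(42)
# "simple" trial: kdblstar observations chosen at random
trial_simple <- sample(n, kdblstar)

# "Poolm": partition the sample into m non-overlapping subsets
parts <- split(sample(n), rep_len(seq_len(m), n))

print(c(h = h, kdblstar = kdblstar))
print(lengths(parts))
```

Every observation index appears in exactly one element of `parts`, which is what non-overlapping means in this context.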
Class CovControlMcd, directly.
Class CovControl, by class CovControlMcd, distance 2.
No methods defined with class "RobEstControl" in the signature.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147–156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910–927.
Todorov V. and Filzmoser P. (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software 32(3), 1–47.
RobEstControl, fasttle, RobMxtDEst, Roblda, Robqda
RobMxtDEst estimates mixtures of distributions for interval-valued data using robust methods.
## S4 method for signature 'IData'
RobMxtDEst(Sdt, grouping, Mxt=c("Hom","Het"), CovEstMet=c("Pooled","Globdev"),
  CovCase=1:4, SelCrit=c("BIC","AIC"), Robcontrol=RobEstControl(),
  l1medpar=NULL, ...)
Sdt |
An IData object representing interval-valued entities. |
grouping |
Factor indicating the group to which each observation belongs. |
Mxt |
Indicates the type of mixing distributions to be considered. Current alternatives are “Hom” (homoscedastic) and “Het” (heteroscedastic). |
CovEstMet |
Method used to estimate the common covariance matrix. Alternatives are “Pooled” (default) for a pooled average of the robust within-groups covariance estimates, and “Globdev” for a global estimate based on all deviations from the groups' multivariate l_1 medians. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
Robcontrol |
A control object (S4) of class |
l1medpar |
List of named arguments to be passed to the function |
... |
Other named arguments. |
An object of class IdtMxNDRE, containing the estimation results.
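The idea behind the “Pooled” option, a weighted average of within-group covariance estimates, can be sketched in base R. The sketch below is a stand-in only: it uses the classical cov() on the built-in iris data instead of robust estimates on interval data, and the (n_g - 1) weights are an assumption, since the weighting used by RobMxtDEst is not described here.

```r
# Stand-in data: four numeric variables and a grouping factor
X <- as.matrix(iris[, 1:4])
g <- iris$Species

# Group sizes and within-group covariance matrices (classical, not robust)
n_g <- as.vector(table(g))
S_g <- lapply(split(as.data.frame(X), g), cov)

# Pooled estimate: average of the group matrices weighted by (n_g - 1)
# (assumed weights, chosen to match the usual pooled covariance)
weighted <- Map(function(S, n) (n - 1) * S, S_g, n_g)
pooled <- Reduce(`+`, weighted) / (nrow(X) - nlevels(g))

print(round(pooled, 3))
```

With these weights the result coincides with the covariance of the deviations from the group means, divided by the number of observations minus the number of groups.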
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.
Todorov V. and Filzmoser P. (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software 32(3), 1–47.
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

## Not run: 
# Estimate robustly a homoscedastic mixture, with mixture components defined by regions
ChinaHomMxtRobE <- RobMxtDEst(ChinaT,ChinaTemp$GeoReg)
print(ChinaHomMxtRobE)

# Estimate robustly a heteroscedastic mixture, with mixture components defined by regions
ChinaHetMxtRobE <- RobMxtDEst(ChinaT,ChinaTemp$GeoReg,Mxt="Het")
print(ChinaHetMxtRobE)
## End(Not run)
snda performs discriminant analysis of Interval Data based on estimates of mixtures of Skew-Normal models.
## S4 method for signature 'IData'
snda(x, grouping, prior="proportions", CVtol=1.0e-5, subset=1:nrow(x),
  CovCase=1:4, SelCrit=c("BIC","AIC"), Mxt=c("Loc","Gen"), k2max=1e6, ... )

## S4 method for signature 'IdtLocSNMANOVA'
snda( x, prior="proportions", selmodel=BestModel(H1res(x)),
  egvtol=1.0e-10, silent=FALSE, k2max=1e6, ... )

## S4 method for signature 'IdtLocNSNMANOVA'
snda( x, prior="proportions", selmodel=BestModel(H1res(x)@SNMod),
  egvtol=1.0e-10, silent=FALSE, k2max=1e6, ... )

## S4 method for signature 'IdtGenSNMANOVA'
snda( x, prior="proportions", selmodel=BestModel(H1res(x)),
  silent=FALSE, k2max=1e6, ... )

## S4 method for signature 'IdtGenNSNMANOVA'
snda( x, prior="proportions", selmodel=BestModel(H1res(x)@SNMod),
  silent=FALSE, k2max=1e6, ... )
x |
An object of class |
grouping |
Factor specifying the class for each observation. |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels. |
CVtol |
Tolerance level for absolute value of the coefficient of variation of non-constant variables. When a MidPoint or LogRange has an absolute value within-groups coefficient of variation below CVtol, it is considered to be a constant. |
subset |
An index vector specifying the cases to be used in the analysis. |
CovCase |
Configuration of the variance-covariance matrix: a set of integers between 1 and 4. |
SelCrit |
The model selection criterion. |
Mxt |
Indicates the type of mixing distributions to be considered. Current alternatives are “Loc” (location model – groups differ only on the location parameters of a Skew-Normal model) and “Gen” (general model – groups differ on all parameters of a Skew-Normal model). |
silent |
A boolean flag indicating whether a warning message should be printed if the method fails. |
selmodel |
Selected model from a list of candidate models saved in object x. |
egvtol |
Tolerance level for the eigenvalues of the product of the inverse within by the between covariance matrices. When an eigenvalue has an absolute value below egvtol, it is considered to be zero. |
k2max |
Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results. |
... |
Other named arguments. |
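The ‘k2max’ rule can be illustrated with a short base R sketch. For a symmetric positive definite correlation matrix the l2-norm condition number equals the ratio of its largest to its smallest eigenvalue; the helper cond2() and the two example matrices below are hypothetical and are not part of MAINT.Data.

```r
# Hypothetical helper: l2-norm condition number of a symmetric
# positive definite matrix, computed from its eigenvalues
cond2 <- function(R) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  max(ev) / min(ev)
}

k2max <- 1e6   # default threshold in the usage above

# A well-conditioned and a nearly singular 2 x 2 correlation matrix
R_ok  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
R_bad <- matrix(c(1, 1 - 1e-8, 1 - 1e-8, 1), nrow = 2)

cond_ok  <- cond2(R_ok)
cond_bad <- cond2(R_bad)

# TRUE means the matrix would be treated as numerically singular
print(c(ok = cond_ok > k2max, bad = cond_bad > k2max))
```

For a 2 x 2 correlation matrix with correlation r the condition number is (1 + |r|)/(1 - |r|), so correlations very close to 1 in absolute value quickly exceed any fixed threshold.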
Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715–726.
Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.
Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 39(3), 516–541.
lda, qda, Roblda, Robqda, IData, IdtLocSNMANOVA, IdtLocNSNMANOVA, IdtGenSNMANOVA, IdtGenNSNMANOVA, ConfMat
## Not run: 
# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

# Skew-Normal based discriminant analysis, assuming that the different regions differ
# only in location parameters
ChinaT.locsnda <- snda(ChinaT,ChinaTemp$GeoReg,Mxt="Loc")
cat("Temperatures of China -- SkewNormal location model discriminant analysis results:\n")
print(ChinaT.locsnda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.locsnda,ChinaT)$class)

#Estimate error rates by three-fold cross-validation without replication
CVlocsnda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=snda,Mxt="Loc",
  CovCase=CovCase(ChinaT.locsnda),kfold=3,CVrep=1)
summary(CVlocsnda[,,"Clerr"])

# Skew-Normal based discriminant analysis, assuming that the different regions may differ
# in all SkewNormal parameters
ChinaT.gensnda <- snda(ChinaT,ChinaTemp$GeoReg,Mxt="Gen")
cat("Temperatures of China -- SkewNormal general model discriminant analysis results:\n")
print(ChinaT.gensnda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.gensnda,ChinaT)$class)

#Estimate error rates by three-fold cross-validation without replication
CVgensnda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=snda,Mxt="Gen",
  CovCase=CovCase(ChinaT.gensnda),kfold=3,CVrep=1)
summary(CVgensnda[,,"Clerr"])
## End(Not run)
S4 methods for function stdEr. As in the generic stdEr S3 ‘miscTools’ method, these methods extract standard errors of the parameter estimates, for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
stdEr(x, selmodel=BestModel(x), ...)

## S4 method for signature 'IdtSNDE'
stdEr(x, selmodel=BestModel(x), ...)

## S4 method for signature 'IdtNandSNDE'
stdEr(x, selmodel=BestModel(x), ...)
x |
An object representing a model fitted to interval data. |
selmodel |
Selected model from a list of candidate models saved in object x. |
... |
Additional arguments for method functions. |
A vector of the estimated standard deviations of the parameter estimators.
summary methods for the class IdtMclust defined in Package ‘MAINT.Data’.
## S4 method for signature 'IdtMclust'
summary(object, parameters = FALSE, classification = FALSE, model = "BestModel",
  ShowClassbyOBs = FALSE, ...)
object |
An object of class |
parameters |
A boolean flag indicating if the parameter estimates of the optimal mixture should be displayed |
classification |
A boolean flag indicating if the crisp classification resulting from the optimal mixture should be displayed |
model |
A character vector specifying the model whose solution is to be displayed. |
ShowClassbyOBs |
A boolean flag indicating if class membership should be shown by observation or by class (default). |
... |
Other named arguments. |
Idtmclust, IdtMclust, plotInfCrt, pcoordplot
Performs statistical likelihood-ratio tests that evaluate the goodness-of-fit of a nested model against a more general one.
testMod(ModE,RestMod=ModE@ModelConfig[2]:length(ModE@ModelConfig),FullMod="Next")
ModE |
An object of class |
RestMod |
Indices of the restricted models being evaluated in the null hypothesis. |
FullMod |
Either indices of the general models being evaluated in the alternative hypothesis or the strings "Next" (default) or "All". With "Next", a restricted model is always compared against the most parsimonious alternative that encompasses it; with "All", all possible comparisons are performed. |
An object of class ConfTests with the results of the tests performed
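The mechanics of a likelihood-ratio test between nested Gaussian covariance configurations can be sketched in base R. The example below is not testMod and does not use interval data: it simulates two ordinary variables and compares an unrestricted covariance matrix against a diagonal one. The helper gauss_loglik() is hypothetical, and the Chi-squared reference distribution with one degree of freedom is the standard asymptotic result for one restricted parameter.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
X <- cbind(x, y)

# Hypothetical helper: Gaussian log-likelihood evaluated at the sample
# means and a given covariance matrix Sigma
gauss_loglik <- function(X, Sigma) {
  n  <- nrow(X)
  q  <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  dev <- crossprod(Xc)  # sums of squares and cross-products
  -0.5 * (n * q * log(2 * pi) + n * log(det(Sigma)) +
            sum(diag(solve(Sigma, dev))))
}

# Maximum likelihood covariance estimates under each model
S_full <- crossprod(scale(X, center = TRUE, scale = FALSE)) / n  # unrestricted
S_rest <- diag(diag(S_full))                                     # zero covariance

ll_full <- gauss_loglik(X, S_full)
ll_rest <- gauss_loglik(X, S_rest)

# Likelihood-ratio statistic, degrees of freedom and p-value
lr_stat <- 2 * (ll_full - ll_rest)
df      <- 1   # one covariance parameter is set to zero
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(c(lr_stat = lr_stat, df = df, p_value = p_value))
```

In this two-variable case the statistic reduces to -n*log(1 - r^2), where r is the sample correlation, which gives a quick check on the computation.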
# Create an Interval-Data object containing the intervals of temperatures by quarter
# for 899 Chinese meteorological stations.
ChinaT <- IData(ChinaTemp[1:8])

# Estimate by maximum likelihood the parameters of Gaussian models
# for the Winter (1st and 4th) quarter intervals
ChinaWTE <- mle(ChinaT[,c(1,4)])
cat("China maximum likelihood estimation results for Winter quarters:\n")
print(ChinaWTE)

# Perform Likelihood-Ratio tests comparing models with consecutive nested Configuration
testMod(ChinaWTE)

# Perform Likelihood-Ratio tests comparing all possible models
testMod(ChinaWTE,FullMod="All")

# Compare model with covariance Configuration case 3 (MidPoints independent of LogRanges)
# against model with covariance Configuration 1 (unrestricted covariance)
testMod(ChinaWTE,RestMod=3,FullMod=1)
S4 methods for function var. These methods extract estimates of variance-covariance matrices for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
var(x)

## S4 method for signature 'IdtSNDE'
var(x)

## S4 method for signature 'IdtNandSNDE'
var(x)

## S4 method for signature 'IdtMxNDE'
var(x)

## S4 method for signature 'IdtMxSNDE'
var(x)
x |
An object representing a model fitted to interval data. |
For the IdtNDE, IdtSNDE and IdtNandSNDE methods, or the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to TRUE: a matrix with the estimated covariances.
For the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to FALSE: a three-dimensional array with a matrix with the estimated covariances for each group at each level of the third dimension.
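The three-dimensional layout described for the heteroscedastic case can be mimicked in base R. The sketch below is only a stand-in: it builds per-group classical covariance matrices from the built-in iris data, not from a fitted MAINT.Data model, to show how one group's matrix is extracted along the third dimension.

```r
# Stand-in data: four numeric variables split by a grouping factor
X <- iris[, 1:4]

# One covariance matrix per group, stacked along the third dimension
covs <- sapply(split(X, iris$Species), cov, simplify = "array")

# Dimensions: variables x variables x groups
print(dim(covs))

# Covariance matrix of a single group, selected by name
print(round(covs[, , "versicolor"], 3))
```

Indexing the third dimension by group name or position returns an ordinary matrix for that group.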
S4 methods for function vcov. As in the generic vcov S3 ‘stats’ method, these methods extract variance-covariance estimates of parameter estimators, for the models fitted to Interval Data.
## S4 method for signature 'IdtNDE'
vcov(object, selmodel=BestModel(object), ...)

## S4 method for signature 'IdtSNDE'
vcov(object, selmodel=BestModel(object), ...)

## S4 method for signature 'IdtNandSNDE'
vcov(object, selmodel=BestModel(object), ...)

## S4 method for signature 'IdtMxNDE'
vcov(object, selmodel=BestModel(object), group=NULL, ...)

## S4 method for signature 'IdtMxSNDE'
vcov(object, selmodel=BestModel(object), group=NULL, ...)
object |
An object representing a model fitted to interval data. |
selmodel |
Selected model from a list of candidate models saved in object. |
group |
The group for which the estimated parameter variance-covariance will be returned. If NULL (default), “vcov” will return a three-dimensional array with a matrix of the estimated covariances between the parameter estimates for each group at each level of the third dimension. Note that this argument is only used in heteroscedastic models, i.e. in the |
... |
Additional arguments for method functions. |
For the IdtNDE, IdtSNDE and IdtNandSNDE methods, or the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to TRUE: a matrix of the estimated covariances between the parameter estimates.
For the IdtMxNDE and IdtMxSNDE methods with slot “Hmcdt” equal to FALSE: if argument “group” is set to NULL, a three-dimensional array with a matrix of the estimated covariances between the parameter estimates for each group at each level of the third dimension. If argument “group” is set to an integer, the matrix with the estimated covariances between the parameter estimates, for the group chosen.
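A matrix returned by a vcov-style method is conventionally turned into standard errors by taking the square roots of its diagonal. The sketch below shows this with an ordinary lm fit from the ‘stats’ package, chosen only because it is self-contained; whether the MAINT.Data methods relate their standard errors to vcov in exactly this way is an assumption and is not verified here.

```r
# Stand-in model: simple linear regression on a built-in data set
fit <- lm(dist ~ speed, data = cars)

# Variance-covariance matrix of the parameter estimators
V <- vcov(fit)

# Standard errors: square roots of the diagonal of V
se <- sqrt(diag(V))

print(V)
print(se)
```

The values in `se` agree with the "Std. Error" column reported by summary(fit).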