Robust Clustering of Banks in Argentina
DOI:
https://doi.org/10.55444/2451.7321.2018.v56.n1.29385
Keywords:
robust clustering, projection pursuit, common principal components, robust K-means, influence measures, theory of the firm
Abstract
The purpose of this paper is to classify and characterize 64 banks active in Argentina as of 2010, applying robust techniques to information gathered over the period 2001-2010. Seven variables were selected based on the strategy criteria established in (Wang 2007) and (Werbin 2010). In agreement with banking theory, four “natural” clusters were obtained, named “Personal”, “Commercial”, “Typical” and “Other banks”. To understand this grouping, a robust principal component analysis based on projection pursuit was conducted on the whole data set, showing that essentially three variables account for the formation of the different clusters. To reveal the inner structure of each group, we used the R package mclust to fit a finite Gaussian mixture to the data. This revealed an approximately similar component structure across groups, warranting a common principal components analysis as in (Boente and Rodrigues, 2002). This analysis allowed us to identify three variables that suffice for grouping and characterizing each cluster. Boente’s influence measures were used to detect extreme cases in the common principal components analysis.
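As an illustration of the projection pursuit idea mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation (the paper relies on R packages). It follows the general scheme of searching among the centered observations for the direction that maximizes a robust scale of the projected data, then deflating and repeating. The function name `pp_robust_pca`, the use of the coordinatewise median for centering, and the choice of the MAD as the robust scale are assumptions made for this example only.

```python
import numpy as np

def pp_robust_pca(X, n_components=2):
    """Projection-pursuit robust PCA sketch (hypothetical helper).

    Candidate directions are the normalized, robustly centered observations;
    each component is the candidate maximizing the MAD of the projections.
    Assumes n_components does not exceed the rank of the centered data.
    """
    X = np.asarray(X, dtype=float)
    # Robust centering (assumption: coordinatewise median).
    Z = X - np.median(X, axis=0)
    directions, scales = [], []
    for _ in range(n_components):
        norms = np.linalg.norm(Z, axis=1)
        # Drop points that are numerically zero after deflation.
        keep = norms > 1e-8 * norms.max()
        cands = Z[keep] / norms[keep, None]
        # Project all points on every candidate direction.
        proj = Z @ cands.T
        med = np.median(proj, axis=0)
        # MAD of each column, scaled for consistency at the normal model.
        s = 1.4826 * np.median(np.abs(proj - med), axis=0)
        best = cands[np.argmax(s)]
        directions.append(best)
        scales.append(s.max())
        # Deflate: remove the part of the data along the chosen direction.
        Z = Z - np.outer(Z @ best, best)
    return np.array(directions), np.array(scales)
```

Because each new direction is searched in the orthogonal complement of the previous ones, the returned rows are orthonormal up to rounding. A few gross outliers barely move the MAD, which is what makes this kind of search resistant to extreme cases.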
References
Boente, G., and L. Orellana (2001). “A robust approach to common principal components”. Statistics in Genetics and in the Environmental Sciences. Birkhäuser, Basel, pp. 117–147.
Boente, G., A. M. Pires, and I. M. Rodrigues (2002). “Influence functions and outlier detection under the common principal components model: A robust approach”. Biometrika 89.4, pp. 861–875.
Boente, G., A. M. Pires, and I. M. Rodrigues (2010). “Detecting influential observations in principal components and common principal components”. Computational Statistics and Data Analysis 54, pp. 2967–2975.
Chen, Edwin (2010). R Implementation of gap-statistics. [Retrieved from https://github.com/echen/gap-statistic/blob/master/gap-statistic.R].
Croux, C. Personal website. [Retrieved from http://www.econ.kuleuven.be/public/NDBAE06/programs/#pca].
Croux, C., P. Filzmoser, and M. R. Oliveira (2005). “Algorithms for projection-pursuit robust principal component analysis”. Department of Decision Sciences and Information Management (KBI) KBI 0624.
Croux, C., and A. Ruiz-Gazen (1996). “A fast algorithm for robust principal components based on projection pursuit”. Compstat: Proceedings in Computational Statistics. Ed. by A. Prat. Physica-Verlag, Heidelberg, pp. 211-217.
Croux, C., and A. Ruiz-Gazen (2005). “High breakdown estimators for principal components: The projection-pursuit approach revisited”. Journal of Multivariate Analysis 95, pp. 206–226.
Ding, Chris, and Xiaofeng He (2004). “K-means clustering via principal component analysis”. Proceedings of the 21st International Conference on Machine Learning. Banff, Canada.
Ercan, H., and S. Sayaseng (2016). “The cluster analysis of the banking sector in Europe”. Economics and Management of Global Value Chains, pp. 111–127.
Farnè, M., and A. Vouldis (2017). “Business models of the banks in the euro area”. Working Paper Series European Central Bank 2070.
Filzmoser, P. et al. (2012). Package “pcaPP”. Peter Filzmoser, Heinrich Fritz and Klaudius Kalcher. [Retrieved from http://www.statistik.tuwien.ac.at/public/filz/].
Flury, B. K. (1984). “Common principal components in K groups”. Journal of the American Statistical Association 79, pp. 892–898.
Flury, B. K. (1988). Common principal components and related multivariate models, Wiley, New York.
Fraley, C., and A. Raftery (2007). “Model-based methods of classification: Using mclust software in chemometrics”. Journal of Statistical Software 18.6. [Retrieved from: http://www.jstatsoft.org/].
Fraley, C., A. Raftery, and L. Scrucca. R package mclust. Maintainer: Luca Scrucca, luca@stat.unipg.it.
Gordaliza, A. (1991a). “Best approximations to random variables based on trimming procedures”. Journal of Approximation Theory 64.2, pp. 162–180.
Gordaliza, A. (1991b). “On the breakdown point of multivariate location estimators based on trimming procedures”. Statistics & Probability Letters 11.5, pp. 387–394.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc.
Hennig, C. R package fpc. c.hennig@ucl.ac.uk. [Retrieved from http://www.homepages.ucl.ac.uk/~ucakche/].
Kassani, S.H., P. H. Kassani, and S. E. Najafi (2015). “Introducing a hybrid model of DEA and data mining in evaluating efficiency. Case study: Bank Branches”. Academic Journal of Research in Economics and Management 3.2, pp. 72–80.
Kondo, Y. (2011). “Robustification of the sparse K-means clustering algorithm”. University of British Columbia.
Kondo, Yumi, Matias Salibian-Barrera, and Ruben Zamar (2016). “RSKC: An R package for a robust and sparse K-means clustering algorithm”. Journal of Statistical Software 72.5, pp. 1–26. [Retrieved from https://doi.org/10.18637/jss.v072.i05].
Li, G., and Z. Chen (1985). “Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and Monte Carlo”. Journal of the American Statistical Association 80.391, pp. 759–766.
Lloyd, S. P. (1982). “Least squares quantization in PCM”. IEEE Transactions on Information Theory 28.2, pp. 129–136.
Sørensen, C. K., and J. M. Puigvert Gutiérrez (2006). “Euro area banking sector integration using hierarchical cluster analysis techniques”. Working Paper Series European Central Bank 627.
Tibshirani, R., G. Walther, and T. Hastie (2001). “Estimating the number of clusters in a data set via the gap statistic”. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63.2, pp. 411–423.
Wang, D. (2007). “Three Essays on Bank Technology, Cost Structure, and Performance”. PhD Dissertation. State University of New York at Binghamton.
Werbin, Eliana (2010). “Los determinantes de la rentabilidad de los bancos en Argentina (2005 – 2007)”. PhD thesis, Universidad Nacional de Córdoba.
Witten, D. M., and R. Tibshirani (2010). “A framework for feature selection in clustering”. Journal of the American Statistical Association 105.490, pp. 713–726.
License
Copyright (c) 2018 José M. Vargas, Margarita Díaz, Fernando García
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.