Sains Malaysiana 50(6)(2021): 1787-1798
http://doi.org/10.17576/jsm-2021-5006-24
Comparative Study of Clustering-Based Outliers Detection Methods in Circular-Circular Regression Model
(Comparative Study of Cluster-Based Outlier Identification Methods in the Circular Regression Model)
SITI ZANARIAH SATARI¹*, NUR FARAIDAH MUHAMMAD DI¹*, YONG ZULINA ZUBAIRI² & ABDUL GHAPOR HUSSIN³
¹Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, 26300 Kuantan, Pahang Darul Makmur, Malaysia
²Centre for Foundation Studies in Sciences, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
³Faculty of Defence Sciences and Technology, National Defence University of Malaysia, Sungai Besi Camp, 57000 Kuala Lumpur, Federal Territory, Malaysia
Received: 7 May 2019/Accepted: 14 October 2020
ABSTRACT
This paper presents a comparative study of several clustering-based algorithms for detecting multiple outliers in the circular-circular regression model. Three similarity measures based on circular distance were used to construct cluster trees with agglomerative hierarchical methods. A stopping rule based on the mean direction and circular standard deviation of the tree heights served as the cutoff point, and cluster groups exceeding it were classified as potential outliers. The performance of the algorithms was evaluated in simulation studies covering several outlier scenarios with different degrees of contamination. For illustration, the algorithms are applied to real wind data and to a simulated data set. Satari's algorithm (the S-SL algorithm) was found to perform well for all values of the sample size n and the error concentration parameter considered. The algorithms can identify not only one or a few outliers but also multiple outliers present at the same time.
Keywords: Circular distance; circular-circular regression model; clustering; outliers; stopping rule
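The clustering step summarised in the abstract can be illustrated with a small sketch. It assumes each observation is a pair (x, y) of angles in radians and, as a hypothetical similarity measure, sums the circular distance 1 - cos(a - b) over both variables; this is not necessarily one of the paper's three measures, which the abstract does not define. Single linkage is computed through its equivalence with a minimum spanning tree: joining the closest pair of points that lie in different clusters, in increasing order of distance, yields the merge heights of the tree. The helper names (`pair_distance`, `single_linkage_heights`, `clusters_at`), the hand-picked cutoff and the rule that every point outside the largest group is flagged are illustrative assumptions.

```python
import math
from itertools import combinations


def circ_dist(a: float, b: float) -> float:
    """Circular distance 1 - cos(a - b) between two angles in radians; lies in [0, 2]."""
    return 1.0 - math.cos(a - b)


def pair_distance(p, q) -> float:
    """Hypothetical similarity for (x, y) pairs: sum of the per-variable circular distances."""
    return circ_dist(p[0], q[0]) + circ_dist(p[1], q[1])


def _edges(points):
    """All pairwise distances as (distance, i, j), sorted in increasing order."""
    return sorted(
        (pair_distance(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage_heights(points):
    """Merge heights of the single-linkage tree: n - 1 values in ascending order."""
    parent = list(range(len(points)))
    heights = []
    for w, i, j in _edges(points):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # closest remaining pair that joins two different clusters
            parent[ri] = rj
            heights.append(w)
    return heights


def clusters_at(points, cutoff):
    """Cut the single-linkage tree at `cutoff`; groups are returned largest first."""
    parent = list(range(len(points)))
    for w, i, j in _edges(points):
        if w > cutoff:
            break  # edges are sorted, so no later pair is within the cutoff
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for i in range(len(points)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Illustrative data: four nearby observations and one far away.
points = [(0.50, 0.52), (0.55, 0.50), (0.60, 0.63), (0.48, 0.45), (3.60, 3.70)]
heights = single_linkage_heights(points)
groups = clusters_at(points, cutoff=1.0)  # cutoff chosen by hand in this sketch
flagged = sorted(i for g in groups[1:] for i in g)  # points outside the largest group
print(heights, groups, flagged)
```

With these data the last merge height (about 3.99) is far larger than the other three (all below 0.01), which is the pattern a stopping rule on the tree heights is meant to detect.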
ABSTRAK
This paper discusses a comparative study of several clustering-based algorithms for detecting multiple outliers in the circular regression model. Three similarity measures based on circular distance were used to obtain cluster trees using agglomerative hierarchical algorithms. A cutoff value for the cluster tree, based on the mean direction and circular standard deviation of the tree heights, was used to classify cluster groups exceeding this cutoff as outliers. The performance of the algorithms was tested in a simulation study that considered several outlier scenarios at different levels of contamination. For illustration, an application to real wind data and a simulated data set is given. We found that Satari's algorithm (the S-SL algorithm) performs well for all values of the sample size and the concentration parameter considered. The algorithms are effective in identifying single or multiple outliers at one time.
Keywords: Clustering algorithm; circular distance; circular regression model; cutoff value; outliers
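The cutoff value described above is built from the mean direction and circular standard deviation of the tree heights, in the spirit of Mojena's (1977) stopping rule, which compares fusion levels with their mean plus a multiple of their standard deviation. The sketch below assumes the heights are treated as angles in radians and that the cutoff has the form mean direction + k × circular standard deviation; the multiplier k, its value of 1.25 in the example and the function names are hypothetical choices, not the paper's settings. The mean direction is kept in [-π, π] so that a set of mostly small heights gives a mean near zero rather than near 2π.

```python
import math


def _resultant(angles):
    """Sums of the sines and cosines of the angles (radians)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return s, c


def mean_direction(angles) -> float:
    """Sample mean direction in [-pi, pi], so small angles stay near 0."""
    s, c = _resultant(angles)
    return math.atan2(s, c)


def circular_sd(angles) -> float:
    """Circular standard deviation sqrt(-2 ln R_bar), R_bar = mean resultant length."""
    s, c = _resultant(angles)
    r_bar = min(math.hypot(s, c) / len(angles), 1.0)  # guard against rounding above 1
    if r_bar == 0.0:
        return math.inf  # resultant vanishes: no finite circular spread
    return math.sqrt(-2.0 * math.log(r_bar))


def stopping_rule(heights, k):
    """Hypothetical cutoff: mean direction + k * circular standard deviation.
    Returns the cutoff and the indices of merges whose height exceeds it."""
    cutoff = mean_direction(heights) + k * circular_sd(heights)
    above = [i for i, h in enumerate(heights) if h > cutoff]
    return cutoff, above


# Illustrative merge heights: three small merges and one large one.
heights = [0.01, 0.015, 0.02, 3.99]
cutoff, above = stopping_rule(heights, k=1.25)  # k = 1.25 is an arbitrary choice here
print(round(cutoff, 3), above)
```

Cutting a single-linkage tree just below the cutoff leaves one more group than there are merges above it, so with these heights (one merge above a cutoff of roughly 0.95) the tree splits into two groups, and the smaller group would be the candidate outliers.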
REFERENCES
Abuzaid, A.H. 2010. Some problems of outliers in circular data. University of Malaya. Ph.D. Thesis (Unpublished).
Abuzaid, A.H., Hussin, A.G. & Mohamed, I.B. 2013. Detection of outliers in simple circular regression models using the mean circular error statistic. Journal of Statistical Computation and Simulation 83(2): 269-277.
Abuzaid, A.H., Mohamed, I.B. & Hussin, A.G. 2012a. Boxplot for circular variables. Computational Statistics 27(3): 381-392.
Abuzaid, A.H., Hussin, A.G., Rambli, A. & Mohamed, I.B. 2012b. Statistics for a new test of discordance in circular data. Communications in Statistics-Simulation and Computation 41(10): 1882-1890.
Abuzaid, A.H., Hussin, A.G., Rambli, A. & Mohamed, I.B. 2011. COVRATIO statistic for simple circular-circular regression model. Chiang Mai Journal of Science 38(3): 321-330.
Abuzaid, A.H., Hussin, A.G. & Mohamed, I.B. 2009. Identifying single outlier in linear circular-circular regression model based on circular distance. Journal of Applied Probability & Statistics 3(1): 107-117.
Adnan, R. & Mohamad, M.N. 2003. Multiple outliers detection procedures in linear regression. Matematika 19(1): 29-45.
Alkasadi, N.A., Ibrahim, S., Ramli, M.F. & Yusoff, M.I. 2016. A comparative study of outlier detection procedures in multiple circular regression. In AIP Conference Proceedings 1775(1): 1-7.
Blashfield, R.K. & Morey, L.C. 1980. A comparison of four clustering methods using MMPI Monte Carlo data. Applied Psychological Measurement 4(1): 57-64.
Caires, S. & Wyatt, L.R. 2003. A linear functional relationship model for circular data with an application to the assessment of ocean wave measurements. Journal of Agricultural, Biological, and Environmental Statistics 8(2): 153-169.
Chang-Chien, S.J., Hung, W.L. & Yang, M.S. 2012. On mean shift-based clustering for circular data. Soft Computing 16(6): 1043-1060.
Di, N.F.M. & Satari, S.Z. 2017. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model. In AIP Conference Proceedings 1842(1): 1-13.
Di, N.F.M., Satari, S.Z. & Zakaria, R. 2017. Detection of different outlier scenarios in circular regression model using single-linkage method. Journal of Physics: Conference Series 890(1): 1-5.
Downs, T.D. & Mardia, K.V. 2002. Circular regression. Biometrika 89(3): 683-698.
Fisher, N.I. 1995. Statistical Analysis of Circular Data. Cambridge: Cambridge University Press.
Gan, G., Ma, C. & Wu, J. 2007. Data Clustering: Theory, Algorithms, and Applications. United States of America: SIAM.
Hartigan, J.A. 1975. Clustering Algorithms. New York: John Wiley & Sons Inc.
Hussin, A.G. & Abuzaid, A.H. 2012. Detection of outliers in functional relationship model for circular variables via complex form. Pakistan Journal of Statistics 28(2): 205-216.
Hussin, A.G., Abuzaid, A.H., Mohamed, I. & Rambli, A. 2013. Detection of outliers in the complex linear regression model. Sains Malaysiana 42(6): 869-874.
Hussin, A.G., Abuzaid, A.H., Zulkifli, F. & Mohamed, I. 2010. Asymptotic covariance and detection of influential observations in a linear relationship model for circular data with application to the measurements of wind directions. ScienceAsia 36: 249-253.
Hussin, A.G., Fieller, N.R. & Stillman, E.C. 2004. Linear regression model for circular variables with application to directional data. Journal of Applied Science and Technology 9(1): 1-6.
Ibrahim, S. 2013. Some outlier problems in a circular-circular regression model. University of Malaya. Ph.D. Thesis (Unpublished).
Ibrahim, S., Rambli, A., Hussin, A.G. & Mohamed, I. 2013. Outlier detection in a circular-circular regression model using COVRATIO statistic. Communications in Statistics-Simulation and Computation 42(10): 2270-2280.
Jammalamadaka, S.R. & Sarma, Y.R. 1993. Circular regression. In Statistical Sciences and Data Analysis, edited by Matusita, K., Puri, M.L. & Hayakawa, T. Utrecht: VSP. pp. 109-128.
Jammalamadaka, S.R. & SenGupta, A. 2001. Topics in Circular Statistics. Singapore: World Scientific.
Milligan, G.W. & Cooper, M.C. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2): 159-179.
Mojena, R. 1977. Hierarchical grouping methods and stopping rules: An evaluation. The Computer Journal 20(4): 359-363.
Rambli, A. 2011. Outlier detection in circular data and circular-circular regression model. University of Malaya. M.Sc. Thesis (Unpublished).
Rambli, A., Abuzaid, A.H., Mohamed, I.B. & Hussin, A.G. 2016. Procedure for detecting outliers in a circular-circular regression model. PLoS ONE 11(4): e0153074.
Rambli, A., Yunus, R.M., Mohamed, I. & Hussin, A.G. 2015. Outlier detection in a circular-circular regression model. Sains Malaysiana 44(7): 1027-1032.
Rambli, A., Mohamed, I., Abuzaid, A.H. & Hussin, A.G. 2010. Identification of influential observations in circular-circular regression model. In Proceedings of the Regional Conference on Statistical Sciences (RCSS'10). pp. 195-203.
Satari, S.Z. 2015. Parameter estimation and outlier detection for some types of circular model. University of Malaya. Ph.D. Thesis (Unpublished).
Satari, S.Z., Di, N.F.M. & Zakaria, R. 2017. The multiple outliers detection using agglomerative hierarchical methods in circular regression model. Journal of Physics: Conference Series 890(1): 1-5.
Sebert, D.M., Montgomery, D.C. & Rollier, D.A. 1998. A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis 27(4): 461-484.
*Corresponding author; email: zanariah@ump.edu.my