Afnan Al-Subaihin
Empirical comparison of text-based mobile apps similarity measurement techniques
Al-Subaihin, Afnan; Sarro, Federica; Black, Sue; Capra, Licia
Abstract
Context: Code-free software similarity detection techniques have been used to support different software engineering tasks, including clustering mobile applications (apps). The way similarity is measured may affect both the efficiency and the quality of clustering solutions. However, there has been no previous comparative study of the feature extraction methods used to guide mobile app clustering. Objective: In this paper, we investigate different techniques to compute the similarity of apps based on their textual descriptions and evaluate their effectiveness using hierarchical agglomerative clustering. Method: To this end, we carry out an empirical study comparing five different techniques, based on topic modelling and keyword feature extraction, to cluster 12,664 apps randomly sampled from the Google Play App Store. The comparison is based on three main criteria: silhouette width measure, human judgement and execution time. Results: The results of our study show that topic modelling, collocation-based and dependency-based feature extractors perform similarly in detecting app-feature similarity. However, dependency-based feature extraction performs better than any other technique in finding application domain similarity (ρ = 0.7, p-value < 0.01). Conclusions: The current categorisation in the app store studied does not exhibit good classification quality in terms of the claimed feature space. However, better quality can be achieved using a good feature extraction technique and a traditional clustering method.
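The Method paragraph combines three ingredients: a feature representation of each app description, hierarchical agglomerative clustering, and the silhouette width used to score the resulting clusters. The following is a minimal, stdlib-only Python sketch of how these pieces can fit together. It is not the authors' pipeline: it substitutes plain TF-IDF keyword vectors for the paper's topic-modelling, collocation-based and dependency-based extractors, and it assumes cosine distance and average linkage, neither of which the abstract specifies. The function names (`tokenize`, `tfidf_vectors`, `cosine_distance`, `average_linkage`, `silhouette_width`) and the four sample descriptions are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude preprocessing step).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors (term -> weight) using the smoothed IDF
    # ln((1 + n) / (1 + df)) + 1, so every term gets a positive weight.
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine_distance(u, v):
    # 1 - cosine similarity; empty vectors are treated as maximally distant.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (nu * nv))


def average_linkage(dist, k):
    # Agglomerative clustering: start from singletons and repeatedly merge the
    # pair of clusters with the smallest mean pairwise distance until k remain.
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b leaves index a valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette_width(dist, labels):
    # Mean silhouette: s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the
    # mean distance to i's own cluster and b(i) the smallest mean distance to
    # another cluster. Points in singleton clusters score 0. Needs >= 2 clusters.
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Hypothetical app descriptions: two fitness apps and two photo editors.
descriptions = [
    "Track your daily runs and workouts with GPS",
    "Workout tracker for runs, cycling and GPS routes",
    "Edit photos with filters and stickers",
    "Photo editor with filters, collage and stickers",
]
vecs = tfidf_vectors(descriptions)
dist = [[cosine_distance(u, v) for v in vecs] for u in vecs]
labels = average_linkage(dist, k=2)
print(labels)  # [0, 0, 1, 1]
print(round(silhouette_width(dist, labels), 3))
```

With these toy descriptions, the fitness apps and the photo apps end up in separate clusters, and the positive silhouette width indicates that each description sits closer to its own cluster than to the other one. The naive merge loop is roughly cubic in the number of apps, so a sample the size of the paper's 12,664 apps would call for an optimised clustering library rather than this sketch.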
Citation
Al-Subaihin, A., Sarro, F., Black, S., & Capra, L. (2019). Empirical comparison of text-based mobile apps similarity measurement techniques. Empirical Software Engineering, 24(6), 3290-3315. https://doi.org/10.1007/s10664-019-09726-5
Field | Value |
---|---|
Journal Article Type | Article |
Online Publication Date | Jun 24, 2019 |
Publication Date | Dec 31, 2019 |
Deposit Date | Jul 12, 2019 |
Publicly Available Date | Dec 6, 2019 |
Journal | Empirical Software Engineering |
Print ISSN | 1382-3256 |
Electronic ISSN | 1573-7616 |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Volume | 24 |
Issue | 6 |
Pages | 3290-3315 |
DOI | https://doi.org/10.1007/s10664-019-09726-5 |
Public URL | https://durham-repository.worktribe.com/output/1297535 |
Files
Published Journal Article (Advance online publication), PDF (909 Kb)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
Copyright Statement: Advance online publication © The Author(s) 2019. Open Access.
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Published Journal Article, PDF (932 Kb)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
You might also like
Self-Regulated Sample Diversity in Large Language Models
(2024)
Presentation / Conference Contribution
Digital Inclusion in Northern England: Training Women from Underrepresented Communities in Tech: A Data Analytics Case Study
(2020)
Presentation / Conference Contribution
Clustering Mobile Apps Based on Mined Textual Features
(2016)
Presentation / Conference Contribution