
Research Repository


Author profiling: Gender prediction from Tweets and images: Notebook for PAN at CLEF 2018

HaCohen-Kerner, Yaakov; Yigal, Yair; Shayovitz, Elyashiv; Miller, Daniel; Breckon, Toby



Authors

Yaakov HaCohen-Kerner

Yair Yigal

Elyashiv Shayovitz

Daniel Miller

Toby Breckon



Abstract

Author profiling deals with identifying various details about the author of a text (e.g., age, cultural background, gender, native language, personality). In this paper, we describe the participation of our teams (yigal18 and miller18; both teams contain the same people, but in a different order) in the PAN 2018 shared task on author profiling, which required identifying each author's gender from 100 tweets and 10 images per author. The authors were grouped by the language of their tweets: English, Spanish, and Arabic. We describe our pre-processing, feature sets, machine learning methods, and accuracy results. The best results using the textual features were achieved by the MLP method after applying the L normalization, using 9,000 word unigrams for English, 10,000 word unigrams and one orthographic feature for Spanish, and 7,000 word unigrams and one orthographic feature for Arabic. We also tried various additional feature sets, including style-based ones. In most cases, these features did not improve the results, and in a few cases they even degraded them. The best result for the visual features (61.54%) was obtained by the LR method using all the features (SIFT & Color & VGG), and the best single feature set was VGG. The best result for the combined features was achieved by model 2 (miller18), which assigns a weight of 0.75 to the best textual model and a weight of 0.25 to an NN classifier (Keras) that uses only the 1,000 VGG features.
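The combined model mixes a textual classifier and an image classifier with fixed weights (0.75 and 0.25). The sketch below illustrates that kind of weighted late fusion. It assumes, since the abstract does not say, that both models output per-class probabilities and that these are combined by a weighted average. The function names and the label order are hypothetical, and the combination rule actually used in the paper may differ.

```python
from typing import List, Sequence

# Hypothetical label order; the abstract does not specify how classes are encoded.
LABELS = ("female", "male")


def fuse_probabilities(text_probs: Sequence[float],
                       image_probs: Sequence[float],
                       text_weight: float = 0.75) -> List[float]:
    """Weighted average of two per-class probability vectors (late fusion).

    Assumption: both classifiers emit probabilities over the same label order.
    The image model receives the remaining weight, 1 - text_weight.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    image_weight = 1.0 - text_weight
    return [text_weight * t + image_weight * i
            for t, i in zip(text_probs, image_probs)]


def predict_gender(text_probs: Sequence[float],
                   image_probs: Sequence[float],
                   text_weight: float = 0.75) -> str:
    """Return the label whose fused probability is highest."""
    fused = fuse_probabilities(text_probs, image_probs, text_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    # Text model leans "male" and image model strongly "female".
    print(fuse_probabilities([0.4, 0.6], [0.9, 0.1]))  # approx. [0.525, 0.475]
    print(predict_gender([0.4, 0.6], [0.9, 0.1]))      # female
```

With these illustrative inputs, the confident image prediction outweighs the weakly confident text prediction even though text carries three times the weight. Setting `text_weight=1.0` reduces the fusion to the textual model alone.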

Citation

HaCohen-Kerner, Y., Yigal, Y., Shayovitz, E., Miller, D., & Breckon, T. (2018, October). Author profiling: Gender prediction from Tweets and images: Notebook for PAN at CLEF 2018. Presented at CEUR Workshop Proceedings, Torino, Italy

Presentation Conference Type: Conference Paper (published)
Conference Name: CEUR Workshop Proceedings
Start Date: Oct 22, 2018
Acceptance Date: Jan 1, 2018
Publication Date: Jan 1, 2018
Deposit Date: Feb 23, 2025
Publicly Available Date: Feb 27, 2025
Print ISSN: 1613-0073
Peer Reviewed: Yes
Volume: 2125
Public URL: https://durham-repository.worktribe.com/output/3536319
Publisher URL: https://ceur-ws.org/Vol-2125/
