Abstract
To analyze how groups of people describe themselves on social media, existing studies have exploited labels and textual self-descriptions either separately or only in part. In this paper, we leveraged both sources of information about users in an integrated procedure. We first trained a language model on a sample of text documents with balanced characteristics. Then, using a list of both domain-specific and statistically relevant words as a guide, we explored similarities between word and document representations to analyze group differences in self-description. Finally, a bootstrap procedure was employed to assess the reliability of the results. The proposed methodology was applied to data from the StockTwits platform, where people write a bio and declare their experience, approach, and primary holding period in trading. We found that groups of traders differ in semantics. In addition, the relationship between trading approach and holding period still holds in self-descriptions, whereas experience has a transversal influence on bio writing, with professionals using words strictly specific to the domain of stock trading.