Analyzing Audience Interaction and Evaluating Its Influence on Social Media Mental Health Disclosures

Abstract

Self-disclosure of mental illnesses has been shown to offer therapeutic and coping benefits. A critical factor in this process is the audience with whom individuals share their experiences. With the rise of online platforms, more mental illness disclosures occur publicly on social media. However, unlike specialized support groups consisting of empathetic peers, little is known about the effects of sharing with a largely “invisible” audience on general social media platforms. This study focuses on schizophrenia, a highly stigmatized mental illness, and presents the first comprehensive analysis of audience characteristics and their influence on subsequent disclosures of this condition on Twitter. Using a rich dataset spanning one year, including nearly 400 disclosers and approximately 400,000 audience members, we examine temporal patterns of audience engagement. Our findings reveal reciprocity dynamics between disclosers and their audiences. Grounded in Social Penetration Theory, and operationalizing disclosure intimacy, we employ an auto-regressive time series model that demonstrates how audience engagement and content patterns can predict shifts in the intimacy level of disclosures. The paper concludes by exploring the implications for designing socially supportive online environments conducive to mental health disclosures, especially for stigmatized conditions.

Introduction

In recent years, social media platforms have increasingly become spaces where individuals share their experiences with mental illnesses, engage in self-expression, raise awareness, combat stigma, find solidarity, and build communities. Self-disclosure, defined as the act of revealing personal information to others, is a crucial element underpinning this novel use of social media (Archer, 1980). It serves as a foundation for individuals to communicate aspects of their identity, feelings, behaviors, and life experiences. This is especially significant for those facing stigmatized conditions such as mental health disorders, where self-disclosure functions as a common coping strategy (Joinson, 2001).

Various motivations influence why people choose to disclose personal information. A well-established reason involves the need for “sympathetic others,” as theorized by Goffman (2009): individuals who share the same stigmatized identity, have undergone similar experiences, and affirm the discloser’s sense of being human and fundamentally ‘normal’ despite external prejudices and personal doubts. On different social media platforms, the nature of these sympathetic audiences can vary. For example, on Reddit, dedicated support communities often consist of peers and experts who share similar mental health challenges. On Facebook, the audience is more likely to consist of social connections rooted in offline relationships. However, recent research has revealed a phenomenon called “broadcasting self-disclosures,” which describes the act of sharing personal and sensitive information on public platforms such as Twitter with an ill-defined and less tangible audience (Bazarova & Choi, 2014). Unlike tightly knit online support groups, the audience on general social media platforms may be large, heterogeneous, and ‘invisible’, including individuals from diverse backgrounds with varied interests, identities, and reasons for using social media. This audience also often consists of weak social ties, i.e., people the discloser may not know personally or encounter outside the platform (Kwak et al., 2010).

The idea of disclosing stigmatized and sensitive mental health information to such a broad and anonymous audience can initially seem perplexing. Yet, prior studies (De Choudhury et al., 2017; Ernala et al., 2017) demonstrate that this behavior is fairly widespread, suggesting that individuals derive specific social or psychological advantages from sharing with such an audience. This leads to important questions about the composition of these audiences, how they interact with disclosures concerning stigmatized mental illnesses, and how their engagement influences the discloser’s ongoing disclosure behaviors on a broadly accessible social media platform.

Motivated by these questions, this paper presents a quantitative study that characterizes the audience and their engagement with stigmatized self-disclosures on Twitter. We focus on schizophrenia, a mental health condition heavily stigmatized and associated with negative stereotypes, discrimination, and social exclusion (Dickerson et al., 2002). Specifically, our research investigates two core questions:
RQ1: What are the patterns through which social media audiences engage with individuals disclosing their schizophrenia diagnosis?
RQ2: How does audience engagement affect future disclosure practices? More precisely, can audience interaction predict the intimacy level of subsequent disclosures?

To answer these questions, we utilize machine learning methods on a clinically validated dataset to identify individuals (disclosers) who have publicly shared their schizophrenia diagnosis on Twitter. The audience is operationalized as users who interact with these disclosures via retweets, likes, or mentions. We then analyze how audience engagement varies over time and how it aligns with the content shared by disclosers (RQ1). Drawing upon Social Penetration Theory (Altman & Taylor, 1973), we measure intimacy in the disclosure content and apply time series forecasting models to assess whether audience engagement metrics predict changes in the intimacy of future disclosures (RQ2).

Our analysis reveals temporal and topical reciprocity in interactions between disclosers and their audiences, similar to patterns observed in online support groups. Key themes within audience engagement include mental health resources, stigma awareness, and expressions of emotional support. The forecasting models demonstrate that variables such as the number of mentions and emotionally supportive engagement strongly predict the intimacy trajectory of future disclosures. These findings contribute new insights into how audiences on public social media platforms can provide meaningful social support to individuals disclosing stigmatized mental health conditions.

Background & Related Work

Theoretical Framework

A key objective of this study is to examine the phenomenon of broadcasting self-disclosures as an interpersonal dynamic between those who disclose and their audience. The Social Penetration Theory offers a fitting theoretical lens for this investigation. Developed by Altman and Taylor (1973), this theory conceptualizes self-disclosure as an essential precursor and fundamental element in the formation and development of relationships between individuals. It defines self-disclosure as the process through which individuals reveal varying levels of personal information, ranging from superficial details to deeply intimate content, to others. These layers of disclosure, also described as degrees of social penetration, are characterized along two dimensions: breadth and depth. The dimension most relevant to our focus is depth, which denotes the degree of intimacy involved in the disclosed information. Depth reflects how much a person willingly reveals about private and personal aspects of their life that would typically remain hidden from public view.

Applying Social Penetration Theory to the context of broadcasting self-disclosures, such as in this study, involves understanding how relationships evolve between disclosers and their audiences through these varying degrees of disclosure. This requires investigating reciprocal behaviors between the parties involved, which informs our exploration of RQ1. Further, adapting the dimensions of breadth and depth to the social media environment necessitates analyzing the thematic content of the disclosures. Since this study focuses on disclosures related to a single topic (mental illness, and schizophrenia in particular), we concentrate on the depth (intimacy) dimension to model the disclosure process addressed in RQ2. Thus, the Social Penetration Theory framework guides our conceptual approach and underpins our analytical methods.

An extensive body of research has explored self-disclosure within computer-mediated communication (Joinson, 2001). Insights from this literature, particularly those linking disclosure with trust and group identity (Joinson and Paine, 2007), as well as with reducing uncertainty and stigma (Cozby, 1973; Derlega and Berg, 2013), have laid the foundation for contemporary studies on self-disclosure via social media. Enabled by features such as anonymity (De Choudhury and De, 2014; Andalibi et al., 2016) and social connectedness (Bazarova and Choi, 2014), social media platforms have become widely embraced spaces for self-disclosure. Researchers focusing on stigmatized identities and health conditions have increasingly sought to characterize and model self-disclosure behaviors on these platforms (Haimson and Hayes, 2017; Yang et al., 2017), while also examining platform-specific distinctions (De Choudhury et al., 2017). Complementing this quantitative work, several qualitative studies have investigated the motivations, goals, and challenges underlying online self-disclosure (Andalibi et al., 2017).

To date, most research has concentrated on environments where the audiences in self-disclosure contexts are “sympathetic others,” as framed by Goffman (2009). However, the dynamics and effects of engaging with audiences on public, general-purpose social media platforms remain insufficiently explored. This study aims to address this gap by focusing on schizophrenia, one of the most heavily stigmatized mental illnesses. Given the nature of this condition, prior clinical research suggests that individuals with schizophrenia particularly benefit from intimate self-disclosure as a therapeutic mechanism (Shimkunas, 1972).

Another relevant research stream investigates online social capital and social support in relation to self-disclosure and overall well-being (Burke et al., 2010). Social capital refers to the resources accessible to an individual through their social network connections, facilitated by both bonding and bridging relationships (Coleman, 1988). Online social networks have been shown to foster the development and maintenance of these forms of social capital (Ellison et al., 2007). Within this context, the concept of “social support” emerges as particularly salient in studies of self-disclosure and stigma. Extensive literature documents the positive effects of social support derived from interpersonal relationships and social networks on health outcomes, psychological well-being, self-esteem, life satisfaction, and reciprocity (Helliwell and Putnam, 2004). Concerning stigmatized mental health experiences, both qualitative and quantitative research identifies social capital and social support as critical components driving self-disclosure goals and outcomes (De Choudhury and De, 2014; Zhang, 2017). Nonetheless, there remain important gaps in understanding how expectations of social support and social capital benefits manifest when the audience for self-disclosures is largely invisible, public, or composed mainly of weak social ties. Additionally, the role that audience members play, through social support provisioning and feedback mechanisms, in encouraging or inhibiting future disclosures of stigmatized conditions has yet to be examined empirically in depth. This study extends prior work by offering a comprehensive, data-driven analysis of the audience engaging with disclosures of schizophrenia on Twitter.

Data

As the initial stage of data acquisition, we accessed a clinician-validated Twitter dataset of schizophrenia self-disclosures originally compiled by Ernala et al. (2017). This dataset contains the public Twitter timelines—totaling 1,940,921 tweets—of 146 users who publicly disclosed their schizophrenia diagnosis for the first time during 2014. Seed queries used to locate these disclosures included first-person diagnostic statements such as “Diagnosed me with schizophrenia/psychosis.” To ensure data quality, we manually filtered out irrelevant or insincere content, including jokes and inappropriate remarks, with guidance from two psychiatrists specializing in schizophrenia care. Following this, we employed the supervised machine learning approach developed by Birnbaum et al. (2017). Their classifier, trained with clinical evaluations as ground truth, uses linguistic features such as n-grams and psycholinguistic markers from the LIWC dictionary (Pennebaker, Francis, and Booth, 2001) to identify genuine schizophrenia self-disclosures on Twitter, achieving an 88% AUC and 80% precision. By accessing the clinical appraisals and applying this classifier to a new sample of 600 Twitter users, we identified 433 users who had authentically disclosed their illness. Combined with the original 146 users, our expanded dataset comprises 579 Twitter users who self-disclosed schizophrenia. For the purposes of this study, we examine the audience engagement patterns surrounding these users’ Twitter content and how such engagement influences their future disclosure behavior. Consequently, our analysis focuses on the year following each user’s initial disclosure. Of the 579 users, 395 had a complete year of Twitter data available. Across this one-year period, these 395 disclosers posted a total of 1,491,623 tweets, averaging 3,776.26 tweets per user and approximately 17.48 tweets per day. A summary of these descriptive statistics is provided in Table 1.

Table 1: Descriptive statistics of disclosers and audience data

Metric                                  Value
Number of disclosers                    395
Total tweets by disclosers              1,491,623
Mean tweets per discloser               3,776.26
Mean tweets per day per discloser       17.48
Median tweets per discloser             1,338
Distinct retweet audience               124,630
Distinct favorites audience             169,041
Distinct mentions audience              80,090
Total audience                          373,761
Mean distinct audience per discloser    1,218.4

Definitions

We now clarify key concepts related to schizophrenia self-disclosures and audience interactions on Twitter. A ‘discloser’ is defined as an individual who publicly reveals their schizophrenia diagnosis on Twitter on a specific day, denoted as day d (the disclosure date). The ‘audience’ consists of Twitter users who engage with the discloser’s posts through platform features such as retweets, favorites (likes), and mentions, during the one-year period following day d. We define ‘audience engagement’ as any such interaction between an audience member and a discloser. The primary indicators of audience engagement considered here are retweets, mentions, and favorites.

Audience and Audience Engagement Data
To collect data on audience engagement, we gathered information on retweets, favorites, and mentions related to the disclosers’ tweets.

Retweets Data
We identified users who engaged with the disclosers’ tweets by retweeting their content within the year following the disclosure. For each tweet posted by the disclosers in this timeframe, we first recorded the number of retweets it received. Using the official Twitter API, we retrieved the list of unique users who retweeted the tweet. Applying this procedure to all tweets from the 395 disclosers resulted in 124,630 distinct users (the retweet audience) who retweeted disclosers’ content a total of 2,895,118 times.

Favorites Data
Similarly, we collected data on users who interacted with disclosers’ tweets by favoriting (liking) them during the year after disclosure. For each tweet, we noted the number of favorites it received. We then parsed the JSON objects corresponding to the HTML pop-ups displaying users who favorited each tweet. This process yielded a set of 169,041 unique users (the favorites audience) who liked the disclosers’ content a total of 4,592,890 times.

Mentions Data
For mentions, we gathered data on Twitter users who engaged with disclosers via the mentions (or @-replies) feature. On Twitter, when a user replies to another user—say with username B—the reply tweet automatically includes ‘@B’. Leveraging this convention, we created a list of search queries by prefixing the ‘@’ symbol to each discloser’s username. This enabled us to collect all incoming mention tweets directed to the disclosers, capturing the users who mentioned them along with the textual content of those mentions. Our final mentions dataset comprised 80,090 unique users (the mentions audience) who mentioned the disclosers across 348,456 tweets.
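As a minimal sketch of the query-construction step described above, the following shows how '@'-prefixed search queries might be built and how a collected tweet could be checked for a mention of a given discloser. The function names and the example usernames are hypothetical, and the matching rule (case-insensitive, whole-handle match) is an assumption rather than the procedure used in the study.

```python
import re


def build_mention_queries(usernames):
    """Return one '@username' search query per discloser, skipping blanks and duplicates."""
    queries = []
    seen = set()
    for name in usernames:
        handle = name.strip().lstrip("@")
        if not handle:
            continue
        key = handle.lower()  # Twitter handles are case-insensitive
        if key in seen:
            continue
        seen.add(key)
        queries.append("@" + handle)
    return queries


def mentions_user(text, handle):
    """True if the text contains '@handle' as a whole handle (assumed matching rule)."""
    pattern = r"(?<!\w)@" + re.escape(handle) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical usernames, for illustration only.
queries = build_mention_queries(["alice_example", "@Bob_Example", "alice_example"])
```

The lookarounds in `mentions_user` keep e-mail-like strings and longer handles that merely start with the same characters from being counted as mentions.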

Audience Data
To form the comprehensive audience dataset, we combined the unique users from the three engagement categories: 124,630 from retweets, 169,041 from favorites, and 80,090 from mentions, resulting in a total of 373,761 distinct audience members. On average, each discloser had an audience size of 1,218.4. Figure 1 illustrates the distribution of audience sizes, and Table 1 presents the descriptive statistics.
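The combination step can be illustrated with a small sketch. It assumes each engagement category is available as a collection of user identifiers; the function name and the toy identifiers are hypothetical. A set union counts a user once even if they appear in several categories, so its size equals the sum of the category sizes only when the categories do not overlap.

```python
def combine_audiences(retweeters, favoriters, mentioners):
    """Return (union of all audience members, sum of per-category distinct counts)."""
    categories = [set(retweeters), set(favoriters), set(mentioners)]
    combined = set().union(*categories)          # each user counted once
    summed = sum(len(c) for c in categories)     # users in several categories counted repeatedly
    return combined, summed


# Toy identifiers, for illustration only.
combined, summed = combine_audiences(
    {"u1", "u2", "u3"},
    {"u2", "u4"},
    {"u4", "u5"},
)
```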

Figure 1: (a) Distribution of #disclosers over #tweets. (b) Distribution of #disclosers over #distinct audience.

RQ1: Characterizing Audience Engagement

Methods
To address RQ1, we aim to characterize audience engagement around disclosers’ Twitter activity by focusing on two aspects: engagement content and engagement markers.

Thematic Representation of Disclosers’ Data
First, we developed a thematic representation of the tweets shared by disclosers over the year following their disclosure date, d. This framework helps us explore the dynamic interaction between disclosers and their audience based on shared content.

We applied topic modeling on the timelines of all 395 disclosers. After preprocessing—removing URLs and stopwords—we used Latent Dirichlet Allocation (LDA) implemented via MALLET (Machine Learning for Language Toolkit). We optimized hyperparameters and extracted 30 topics. For each tweet, we calculated the posterior probability distribution over these topics.
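The preprocessing step might look like the following minimal sketch. The URL pattern, the tokenization rule, and the short stopword list are all assumptions made for illustration; the text above does not specify them, and the topic model itself (LDA via MALLET) is not reproduced here.

```python
import re

# Assumed patterns and stopword list, for illustration only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9_#@']+")
STOPWORDS = {
    "a", "an", "the", "and", "or", "to", "of", "in", "on",
    "is", "it", "i", "my", "me", "with", "for", "was",
}


def preprocess(tweet):
    """Lowercase a tweet, drop URLs, tokenize, and remove stopwords."""
    text = URL_PATTERN.sub(" ", tweet.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return [t for t in tokens if t not in STOPWORDS]


# Hypothetical tweet text.
tokens = preprocess("I was diagnosed with anxiety, see https://example.org/post")
```

The resulting token lists would then be passed to the topic model, one document per tweet.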

Next, two human raters, experts in social media and mental health, collaboratively performed semi-open coding on the extracted topics to assign semantically meaningful labels. Drawing on their experience with schizophrenia self-disclosures, they examined the top keywords associated with each topic to generate topical descriptors. They then grouped the 30 topics into broader themes, annotating whether each theme pertained to schizophrenia diagnosis and experiences. To analyze temporal trends, we computed z-scores of the average daily probabilities for each theme across all disclosers, enabling us to observe relative fluctuations over time.

Characterizing Engagement Content
Using the same topic modeling and qualitative annotation approach, we characterized the content of the audience’s engagement—specifically, the mention tweets. We built an LDA model with 30 topics based on the text of mention tweets, then determined the topic distribution for each mention. The same two raters analyzed the top keywords per topic to assign interpretable labels and annotate relevance to schizophrenia disclosures. Again, we used z-scores of average daily theme probabilities to capture changes in engagement content themes over time.

Characterizing Engagement Markers
To characterize engagement markers, we analyzed daily counts of retweets, favorites, and mentions received by each discloser throughout the year following their disclosure. For each day d (where d = 0 to d = 365), we computed the average number of retweets, favorites, and mentions across all disclosers and converted these averages into z-scores. This normalization facilitated the examination of temporal variation in engagement markers and allowed for relative comparisons. The result was three time series representing retweets, favorites, and mentions.
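A minimal sketch of this normalization follows, assuming each discloser's daily counts for one marker are stored as equal-length lists indexed by day. The function names and the toy counts are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def daily_mean_across_disclosers(counts_by_discloser):
    """Average one engagement marker over all disclosers, day by day."""
    series = list(counts_by_discloser.values())
    n_days = len(series[0])
    return [mean(s[day] for s in series) for day in range(n_days)]


def z_scores(values):
    """Standardize a series using its mean and population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant series has no variation
    return [(v - mu) / sigma for v in values]


# Toy mention counts for two disclosers over three days.
toy_mentions = {"discloser_a": [4, 2, 0], "discloser_b": [2, 0, 0]}
daily_means = daily_mean_across_disclosers(toy_mentions)
mention_z = z_scores(daily_means)
```

Applying the same two functions to retweet and favorite counts would yield the other two time series.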

Discovering Patterns of Audience Engagement
To investigate how engagement markers and content vary in relation to the disclosers’ thematic activity, we classified themes into two groups: those related to schizophrenia diagnosis and experiences, and those unrelated. Using these theme categories, we applied time series analysis methods, such as cross-correlation, to compare z-score distributions of engagement markers and engagement content themes against the themes present in disclosers’ tweets over time. This approach helped uncover patterns and temporal relationships between audience engagement and discloser content.
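Lagged cross-correlation between two daily series can be sketched as below. This is an illustrative implementation under stated assumptions, not the analysis code used in the study: the lag sign convention (a positive lag pairs x on day t with y on day t + lag) is chosen here for clarity and may differ from the convention behind the reported lags, and the function names and toy series are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def cross_correlation(x, y, max_lag):
    """Return {lag: r}, where r correlates x[t] with y[t + lag] (assumed convention)."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        if len(xs) >= 3:  # skip overlaps too short to correlate meaningfully
            result[lag] = pearson(xs, ys)
    return result


def strongest_lag(ccf):
    """Lag with the largest absolute correlation."""
    return max(ccf, key=lambda lag: abs(ccf[lag]))


# Toy series: y repeats x two days later.
x = [0, 1, 0, 2, 0, 3, 0, 1, 0, 0]
y = [0, 0] + x[:-2]
ccf = cross_correlation(x, y, max_lag=3)
peak = strongest_lag(ccf)
```

Under this convention, a peak at a positive lag means the second series trails the first by that many days.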

Results

Comparing Disclosers’ Themes and Audience’s Themes

We now present findings from the thematic annotation of the audience’s engagement content, interpreted alongside themes identified in the disclosers’ posts. This comparison provides insights into how audience responses align with the content shared by individuals disclosing schizophrenia-related experiences on Twitter.

One of the key overlapping themes between disclosers and their audience is Mental Health Support/Stigma. This theme appears in both datasets and reflects discussions around mental health care and the stigma surrounding conditions like schizophrenia. In the audience’s tweets, we observed terms like ‘hcsmca’, ‘pndhour’, ‘awareness’, and ‘issue’, which point to participation in online communities focused on health care and mental illness awareness. Shared terms such as ‘depression’, ‘anxiety’, ‘meds’, ‘mental health’, and ‘pain’ emphasize the solidarity expressed by the audience. This engagement indicates that audiences often respond to disclosers by offering personal experiences and support resources, helping normalize and validate the mental health challenges discussed.

Another significant theme common to both groups is Functioning, encompassing aspects of everyday life such as ‘people’, ‘life’, ‘work’, ‘money’, ‘love’, and ‘sleep’. This overlap suggests a shared focus on social and cognitive functioning in the context of living with schizophrenia. The theme Appearance also emerges in both datasets, with words like ‘hair’, ‘wear’, ‘clothes’, ‘arms’, and ‘face’ pointing to expressions around personal presentation and identity. Together, these themes suggest that both disclosers and their audience engage in conversations about daily routines and experiences, which may help foster a sense of connection and mutual understanding.

The Emotions theme features prominently in both datasets, but more so in the audience’s content. Common words include ‘love’, ‘happy’, ‘good’, ‘hope’, and ‘beautiful’. The higher prevalence of emotional expression in audience responses indicates a tendency to offer emotional support. This asymmetry aligns with previous literature suggesting that such supportive responses contribute positively to mental health outcomes and can enhance therapeutic benefits for individuals disclosing stigmatized experiences (De Choudhury et al., 2014).

Themes around Sexuality, containing words like ‘girl’, ‘man’, ‘guy’, ‘he’s’, ‘she’s’, ‘cute’, ‘sex’, and ‘fuck’, are observed in both disclosers’ and audience content. This suggests that disclosures often prompt audiences to engage with deeply personal topics, including those related to sexual identity and relationships. These discussions may reflect a mutual openness that emerges through the process of stigma-related disclosure.

Despite the thematic reciprocity identified between disclosers and their audiences, a notable divergence becomes apparent when examining the theme labeled “Symptoms.” Among disclosers, this theme is characterized by references such as “r/paranormal,” “r/creepy,” and “ufo,” which reflect language patterns symptomatic of schizophrenia. However, these expressions are largely absent from the themes derived from the audience’s engagement content. This contrast indicates that while disclosers are openly sharing deeply personal and symptomatic experiences of their illness, the audience generally does not reciprocate with similar disclosures. This highlights a fundamental difference between broadcasting personal content on platforms like Twitter and participating in dedicated support communities. On Twitter, the audience is not necessarily composed of individuals with shared experiences, and thus their responses often lack mutual vulnerability or personal reflection.

Nevertheless, juxtaposing the thematic annotations of disclosers and their audience engagement content reveals instances of alignment around shared topics related to schizophrenia. These patterns resonate with Social Penetration Theory, which emphasizes that self-disclosure is nurtured through a gradual and reciprocal exploration of personal experiences between interacting parties. This mutual thematic resonance, albeit limited in depth, reflects a form of relational engagement wherein audience members participate in discourse prompted by the disclosers’ narratives.

Turning attention to the temporal dynamics of this engagement, we explore how themes associated with schizophrenia and other content co-vary over time between disclosers and audiences. Analysis of the time series, as visualized in Figure 2(a–d), reveals a close temporal alignment in schizophrenia-related themes. Cross-correlation results show that the highest correlation, 0.125, occurs at a lag of minus four days. This positive correlation at a negative lag suggests that as disclosers increasingly discuss their schizophrenia experiences, audience engagement around similar themes tends to follow within a few days. Such a pattern reflects a reciprocal dynamic in the disclosure process, aligning with prior literature that highlights reciprocity as a fundamental principle in self-disclosure. Conversely, when disclosers focus on schizophrenia-related content, audience engagement around unrelated themes declines, with the strongest negative correlation of -0.125 also occurring at a lag of minus four days. This pattern suggests that the audience may steer away from other topics in order to focus on, or adapt their responses to, what disclosers share.

Figure 2: Temporal alignment in schizophrenia-related themes

Further insights emerge when examining engagement markers—retweets, favorites, and mentions—to assess how the audience uses Twitter’s functionalities in response to self-disclosures. The temporal distribution of these markers, as shown in Figure 3a, indicates that the day of disclosure marks a peak in mentions, suggesting an immediate influx of direct audience responses.

Figure 3: (a) Engagement markers over time. (b) Intimacy of disclosure, across all 395 disclosers’ data over time. (c) Predicted and original measures of intimacy over time.

However, retweets and favorites show subdued activity in this same period. This discrepancy may indicate audience hesitation or a lack of clarity on how to engage with sensitive content through endorsement-based actions like retweets and likes. Compared to mentions, which require the user to consciously draft a response, retweets and favorites are lower-effort actions that may feel inappropriate in the context of personal or stigmatized disclosures.

An analysis of the relationships between engagement markers and the themes of disclosers’ content, shown in Figure 2(e–j), reveals stronger alignment with schizophrenia-related themes than with unrelated content. When disclosers discuss schizophrenia, the strongest correlations with retweets and favorites are -0.09 and -0.08, respectively, both at a positive lag of five days. These negative correlations suggest that increasing disclosures about schizophrenia are followed by a decline in passive forms of audience engagement. This could reflect the audience’s reluctance to publicly amplify stigmatized content or signal agreement in ways that might be socially misinterpreted. In contrast, mentions display a stronger and immediate alignment with schizophrenia-related disclosures, with the highest positive correlation of 0.17 occurring at lag zero. This real-time engagement suggests that mentions are perceived as a more appropriate or empathetic form of response to personal disclosures. When analyzing themes unrelated to schizophrenia, mentions from the audience show a delayed response, with a peak correlation of 0.14 at a lag of minus seven days, further underscoring that the audience is more immediately responsive to personal and health-related content than to general discourse.

RQ2: How Audience Engagement Predicts Future Intimacy of Disclosures

To investigate our second research question—whether audience engagement, as characterized by engagement markers and content, can predict the future intimacy of disclosures—we begin by detailing how we operationalize the intimacy of disclosures and proceed to evaluate a time series forecasting model designed to predict this measure using prior audience engagement behavior.

Drawing on Social Penetration Theory, which conceptualizes self-disclosure as a process of deepening intimacy in interpersonal relationships, we operationalize the intimacy of disclosure through the concept of “depth of disclosure.” In the context of mental health-related self-disclosures on Twitter, this depth reflects the extent to which individuals share detailed and personal experiences pertaining to their stigmatized condition, such as schizophrenia. Since no standardized ground truth for measuring disclosure intimacy exists—and given that discrete human ratings of individual tweets may lack generalizability across users—we employ a hybrid approach that combines topic modeling with human annotation to assess the intimacy conveyed through tweet content.

First, we apply topic modeling to the disclosers’ tweets to obtain thematic representations of their content, following the procedure established in our previous analyses. Using the most representative keywords per topic, three human annotators evaluated and labeled the level of intimacy associated with each topic. The intimacy levels were defined on a three-point Likert scale—low (1), medium (2), and high (3)—in line with prior work on disclosure depth (Taylor and Altman, 1975). To ensure consistency, annotators initially reviewed a sample of disclosers’ tweets to develop a shared understanding of the content space. They subsequently formulated annotation rules corresponding to each level of intimacy.

Topics rated as having high intimacy (score of 3) typically included deeply personal expressions related to the experience of schizophrenia. These involved symptomatic narratives, references to social support, and discussions around stigma—content that is rarely disclosed on public platforms such as Twitter. Medium intimacy topics (score of 2) reflected more general behavioral expressions and daily functioning activities, such as managing time, navigating social relationships, or planning routines. These themes were personal but did not directly reference the disclosers’ mental health conditions and were more common in general social media discourse. Low intimacy topics (score of 1) were entirely unrelated to mental health disclosures and consisted of casual or topical content, including discussions of entertainment, politics, or other non-personal matters.

Through this operationalization, we create a time series representation of disclosure intimacy for each discloser, enabling us to model the temporal evolution of their disclosure behavior. We then examine whether features derived from audience engagement—both in terms of markers such as retweets, favorites, and mentions, and the topical alignment of audience responses—can serve as predictors for future shifts in the depth of disclosure. This modeling approach allows us to assess the influence of prior audience feedback on the trajectory of personal and intimate disclosures over time, providing insight into the reciprocal dynamics between disclosers and their online audiences.

Following the manual annotation task conducted on the topic model outputs, we observed a high degree of inter-rater reliability among the annotators. The Fleiss’ kappa coefficient was computed to be 0.78, indicating substantial agreement among raters in assigning intimacy scores to topics. From the pool of 30 topics generated from disclosers’ data, the annotation process resulted in the classification of 8 topics as possessing high intimacy (assigned a score of 3), 7 topics as medium intimacy (score of 2), and 15 topics as low intimacy (score of 1). This annotated topic intimacy framework formed the basis for calculating both tweet-level intimacy scores and time series trends across the study period.
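For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa can be computed for a fixed number of raters per item. The function name and the toy ratings are illustrative assumptions; they are not the study's annotation data or code.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists, e.g. [[3, 3, 2], [1, 1, 1], ...] (hypothetical labels)
    categories: the label set, e.g. (1, 2, 3) for low/medium/high intimacy
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of raters who assigned label c to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement for each item, then averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Agreement expected by chance, from the marginal label proportions
    p_cat = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in p_cat)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: three raters labelling four topics (made-up values)
toy = [[1, 1, 1], [1, 1, 2], [3, 3, 3], [2, 2, 3]]
kappa = fleiss_kappa(toy, categories=(1, 2, 3))
```

A value of 1 means perfect agreement, and values near 0 mean agreement no better than chance.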

  1. Deriving Tweet-Level and Temporal Intimacy Metrics

To estimate the intimacy level expressed in individual tweets authored by the disclosers, we integrated the topic distributions produced by the previously developed topic model (see RQ1) with the corresponding intimacy labels generated during the annotation phase (see RQ2). Specifically, we calculated a single continuous intimacy score for each tweet by taking the weighted sum of the topic probabilities for that tweet, where the weights were the intimacy labels (1, 2, or 3) assigned to each corresponding topic. This method effectively represents each tweet’s intimacy level based on the distributional influence of its underlying topics and their intimacy ratings.
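The weighted sum described above can be sketched as follows. The function name and the three-topic example are hypothetical and chosen only for illustration (the study used 30 topics).

```python
def tweet_intimacy(topic_probs, topic_labels):
    """Continuous intimacy score for one tweet.

    topic_probs: topic probabilities for the tweet (expected to sum to 1)
    topic_labels: intimacy label (1, 2, or 3) of each topic, in the same order
    """
    if len(topic_probs) != len(topic_labels):
        raise ValueError("one intimacy label is required per topic")
    # Weighted sum: each topic's probability is weighted by its intimacy label
    return sum(p * w for p, w in zip(topic_probs, topic_labels))


# Hypothetical tweet dominated by a high-intimacy topic
score = tweet_intimacy([0.7, 0.2, 0.1], [3, 2, 1])
```

When the topic probabilities sum to 1, the score always lies between 1 (all mass on low-intimacy topics) and 3 (all mass on high-intimacy topics).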

Once tweet-level intimacy scores were derived, we aggregated these scores at a daily resolution for each discloser across the entire analysis timeline. To control for inter-individual differences in overall tweeting behavior and to allow comparability over time, we standardized the daily aggregated intimacy scores using z-scores. These standardized scores thus reflected the relative deviations in intimacy expression per day, per discloser, enabling longitudinal tracking of disclosure intensity in a normalized framework.
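A minimal sketch of the daily aggregation and standardization step, for a single discloser, is given below. Two details are assumptions because the text does not specify them: the daily aggregate is taken to be the mean of that day's tweet scores, and the z-score uses the population standard deviation. The function name and input format are also hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_zscores(scored_tweets):
    """Map each day to the z-score of its aggregated intimacy.

    scored_tweets: iterable of (iso_date_string, intimacy_score) for ONE discloser
    """
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)

    days = sorted(by_day)  # ISO date strings sort chronologically
    daily = [mean(by_day[d]) for d in days]  # assumption: daily mean

    mu = mean(daily)
    sigma = pstdev(daily)  # assumption: population standard deviation
    if sigma == 0:
        # A constant series has no deviations to standardize
        return {d: 0.0 for d in days}
    return {d: (v - mu) / sigma for d, v in zip(days, daily)}


# Made-up scores for three days
z = daily_zscores([
    ("2017-01-01", 1.0), ("2017-01-01", 3.0),
    ("2017-01-02", 1.0),
    ("2017-01-03", 3.0),
])
```

Days on which a discloser posted nothing are simply absent here; how such gaps were handled in the study is not stated.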

III. Forecasting Intimacy of Disclosure from Audience Engagement

In the next phase, we explored whether audience engagement metrics could predict future changes in intimacy of disclosure. We conceptualized this task as a time series forecasting problem, wherein the main response variable was the daily time series of intimacy scores derived previously. Given the temporal nature of both the predictor and response variables, we incorporated an auto-regressive component to account for the influence of historical intimacy levels on future disclosures.

The exogenous predictors in our model comprised several time series representing audience engagement behaviors. These included the number of retweets, favorites, and mentions received by disclosers each day, as well as the thematic distribution of content from audience replies or mentions. Each of these variables was transformed into a z-score time series to ensure consistency in scale and interpretation across disclosers and over time.

  1. Data Preparation and Stationarity Checks

Before applying time series models, it was critical to assess and address the assumption of stationarity—a prerequisite for many forecasting techniques. We performed a sequence of preprocessing steps to this end. Initially, we applied a 14-day moving average smoothing to all time series to visually inspect potential non-stationary behavior, such as trends or seasonal shifts in mean and variance.
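The smoothing step can be illustrated with a simple trailing moving average. Whether the study used a trailing or centered window is not stated, so the trailing form below is an assumption, as is the function name.

```python
def moving_average(series, window=14):
    """Trailing moving average over `window` points.

    Positions that do not yet have a full window are returned as None.
    """
    out = [None] * len(series)
    running = 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            # Drop the value that just left the window
            running -= series[i - window]
        if i >= window - 1:
            out[i] = running / window
    return out


# Small window on toy data so the behaviour is easy to follow
smoothed = moving_average([1, 2, 3, 4, 5], window=2)
```

Plotting the smoothed series against time makes drifts in the mean easier to see than in the raw daily values.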

To statistically validate stationarity, we employed the Augmented Dickey-Fuller (ADF) test (Dickey and Fuller, 1981) on all variables. For the original intimacy of disclosure series, the test could not reject the null hypothesis of a unit root at the 0.05 level (t = -2.68, p = 0.07), so we treated the series as non-stationary. Consequently, we applied a first-order differencing transformation (i.e., calculating Yₜ − Yₜ₋₁ for each observation in the series), after which the unit-root null was rejected (t = -9.17, p = 2 × 10⁻¹⁵), indicating that the differenced series was stationary.
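The differencing transformation itself is simple enough to sketch directly. The function name is hypothetical; the ADF test would normally come from a statistics package (for example `adfuller` in statsmodels) and is not reimplemented here.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times: Y_t - Y_{t-1}."""
    values = list(series)
    for _ in range(order):
        # Each pass shortens the series by one observation
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# A series with a steadily growing level...
raw = [1, 4, 9, 16]
# ...has differences with a much more stable level
diffed = difference(raw)  # [3, 5, 7]
```

Note that the differenced series has one fewer observation than the original, which matters when aligning it with the exogenous predictors.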

A similar process was followed for all exogenous variables. While most audience engagement marker time series (retweets, favorites, and mentions) were stationary in their raw form, several thematic content time series failed the ADF test. Specifically, engagement themes related to “Mental Health Support/Stigma,” “Sexuality,” “Communication,” and “Temporal References” were non-stationary initially and were thus differenced to achieve stationarity.

  2. ARIMAX Model Configuration and Training

Given the confirmed stationarity of all series post-differencing, we employed an AutoRegressive Integrated Moving Average with eXogenous variables (ARIMAX) model to forecast daily intimacy of disclosure. This model allows for the incorporation of both internal temporal dependencies (via autoregressive and moving average terms) and external influences (via audience engagement variables).
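For orientation, one common way to write an ARIMAX model with first-order differencing is shown below. This is a generic textbook form and not necessarily the exact parameterization fitted here: software packages differ (some fit a regression with ARIMA errors instead), and the text does not state whether the exogenous series enter contemporaneously or with a lag. In this notation, W_t is the differenced intimacy series, X_{k,t} is the k-th audience engagement series, c is an intercept, and ε_t is white noise.

```latex
W_t = Y_t - Y_{t-1}, \qquad
W_t = c + \sum_{i=1}^{p} \phi_i W_{t-i}
        + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
        + \sum_{k=1}^{K} \beta_k X_{k,t}
        + \varepsilon_t
```

The coefficients φ_i capture dependence on past intimacy levels, θ_j capture dependence on past forecast errors, and β_k capture the contribution of each engagement predictor.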

We conducted a grid search over lag parameters to identify the best-fitting ARIMAX configuration. The search spanned autoregressive terms (p) and moving average terms (q) with a maximum lag of 20 days. The differencing order (d) was set to 1 based on the earlier transformations. Model selection was based on a combination of log-likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). The optimal model was ARIMAX(8,1,3), that is, p = 8, d = 1, and q = 3. This model yielded a log-likelihood of -351.9, an AIC of 751.9, and a BIC of 845.0, the best trade-off between fit and model complexity among the configurations searched.
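The selection criteria can be sketched as below, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. How the three criteria were combined in the study is not stated; ranking by AIC with BIC as a tie-breaker is an assumption, and the candidate values are made up.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_order(candidates, n_obs):
    """Pick the (p, d, q) order with the lowest AIC, breaking ties by BIC.

    candidates: dict mapping (p, d, q) -> (log_likelihood, n_params),
    as would be collected from fitting each configuration in the grid.
    """
    def key(order):
        ll, k = candidates[order]
        return (aic(ll, k), bic(ll, k, n_obs))

    return min(candidates, key=key)


# Hypothetical fit results for two configurations
fits = {(1, 1, 1): (-400.0, 5), (8, 1, 3): (-350.0, 24)}
best = select_order(fits, n_obs=100)
```

Because BIC penalizes parameters more heavily than AIC, the two criteria can disagree on large grids, which is one reason to report both.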

  3. Interpreting Predictor Contributions

To interpret the contribution of exogenous variables in the final ARIMAX model, we analyzed point estimates, 95% confidence intervals, and associated p-values (refer to Table 3). These estimates offer insights into which forms of audience engagement significantly influence future disclosure intimacy.

Among engagement markers, the number of mentions was found to be a significant predictor of future intimacy levels. This finding underscores the importance of interactive audience behaviors, such as replies and direct conversations, which likely create a feedback loop encouraging disclosers to continue revealing personal experiences. Mentions represent high-engagement touchpoints that facilitate information exchange, emotional reinforcement, and mutual storytelling.

Among the thematic content of audience engagement, two themes emerged as statistically significant predictors: ‘Emotions’ and ‘Sexuality’. The ‘Emotions’ theme, characterized by vocabulary such as ‘care’, ‘worry’, ‘trust’, and ‘life’, appears to reinforce disclosure behaviors. Prior research (e.g., Vlahovic et al., 2014) has demonstrated that emotional support is a core pillar of satisfaction in online communities, especially for individuals managing stigmatized conditions. Our findings build upon this by showing that emotional responses not only reflect audience empathy but also shape future intimacy patterns.

The theme of ‘Sexuality’ was also significantly predictive of increased future intimacy. Given the inherently sensitive and personal nature of sexuality, its presence in audience interactions may suggest deeper, more reciprocal engagement. When such topics surface publicly on social media, they could signal a mutual willingness to explore vulnerable aspects of identity and experience. This form of reciprocity might act as a catalyst, motivating disclosers to share more intimate facets of their illness narratives.

IV. Model Validation and Evaluation

To assess model performance, we implemented in-sample rolling forecasts on the final 30 days of the data, serving as a pseudo out-of-sample evaluation period. Since the ARIMAX model predicts the differenced series, comparisons were made against actual differenced intimacy values (see Figure 3c). The predicted trajectory closely aligned with observed values, suggesting the model’s reliability.

Quantitative performance metrics further substantiated the model’s effectiveness. The Root Mean Squared Error (RMSE) was 0.66, the Mean Absolute Error (MAE) was 0.52, and the Symmetric Mean Absolute Percentage Error (SMAPE) was 6.8. These error rates fall within acceptable bounds for social behavioral forecasting and confirm the model’s practical utility.
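The three error metrics can be computed as in the sketch below. SMAPE has several variants in the literature, and the text does not say which was used; the version here (mean of |A − F| divided by the average of |A| and |F|, expressed as a percentage) is an assumption. Function names and inputs are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent (one common variant, assumed here)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error rather than dividing by zero
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)


# Made-up observed and forecast values
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
errors = (rmse(observed, forecast), mae(observed, forecast), smape(observed, forecast))
```

RMSE penalizes large misses more than MAE, so comparing the two gives a rough sense of whether errors are dominated by a few outlying days.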

As a final diagnostic, we tested the model residuals for serial correlation using the Durbin-Watson test. The resulting test statistic was 1.8—close to the ideal value of 2, which indicates the absence of autocorrelation in the residuals. This further validates the robustness and reliability of the forecasting model.
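The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch, with a hypothetical function name and toy residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of model residuals.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive, and values toward 4 negative, autocorrelation.
    """
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Perfectly alternating residuals are strongly negatively autocorrelated
dw = durbin_watson([1, -1, 1, -1])  # 3.0
```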

 

Discussion

Theoretical Implications

This study began with a fundamental inquiry into the perplexing phenomenon of stigmatized self-disclosures made publicly to an invisible audience on microblogging platforms like Twitter. By systematically characterizing audience engagement in response to disclosures related to schizophrenia, we uncovered compelling evidence of reciprocity both in terms of topical content and temporal dynamics within the interactions between disclosers and their audience. Our findings reveal that the audience leverages platform functionalities—such as favorites, retweets, and mentions—to engage with disclosers through diverse mechanisms. These interactions encompass providing emotional support, offering practical advice and solidarity, sharing personal experiences and online mental health resources, and discussing everyday aspects of life that extend beyond the illness itself. While these attributes are typically considered hallmark features of tightly-knit online support communities, their emergence on Twitter is particularly noteworthy, given that Twitter inherently lacks many critical community-building elements such as established norms, moderation, and explicit role definitions. Moreover, strong social ties are conventionally regarded as essential for delivering quality support and enhancing psychological well-being (Burke, Marlow, and Lento 2010). Yet, despite Twitter’s relatively loose social network structure (Kwak et al. 2010), our results suggest that it nonetheless fosters meaningful social benefits for individuals managing a highly stigmatized condition like schizophrenia. This challenges traditional assumptions about the prerequisites for effective online social support and underscores the platform’s unique capacity to facilitate positive psychosocial outcomes even in the absence of formal community architecture. 
Building on this foundation, we examined the extent to which audience engagement influences subsequent disclosure behavior, aiming to understand if disclosers derive interpersonal and social benefits through this public process. Our forecasting model’s results demonstrate that specific audience engagement metrics—including the number of mentions, emotional support, and discussions on sensitive, personal topics—serve as significant predictors of the future intimacy of disclosures. This suggests that disclosure behaviors are not static but evolve in response to audience feedback. These findings align with and extend the conceptualization of social capital in online contexts. The disclosure process appears to support not only bridging social capital—whereby disclosers connect with new acquaintances who provide access to novel information and resources—but also gradually fosters bonding social capital, characterized by reciprocity, mutual support, and companionship (Ellison, Steinfield, and Lampe 2007). Although the precise identities of audience members remain ambiguous—disclosers may not know who comprises their audience but often maintain imagined conceptions of them (Gruzd, Wellman, and Takhteyev 2011)—the consistent reciprocal engagement observed over time validates prior observations about online social platforms facilitating the formation and maintenance of social capital, even in contexts marked by invisibility and anonymity. Nevertheless, extant literature warns of the risks associated with disclosing stigmatized, sensitive issues to invisible and imagined audiences, particularly the phenomenon of context collapse, where diverse social spheres converge unpredictably, potentially jeopardizing the discloser’s self-presentation and willingness to share (Steinfield, Ellison, and Lampe 2008). 
Contrary to expectations that such risks would induce caution or strategic reticence, our analysis reveals that disclosers persistently engage in intimate, schizophrenia-related exchanges without apparent adoption of counteractive strategies. This continued openness suggests a resilience and commitment to disclosure despite inherent risks, underscoring the complex calculus disclosers perform when managing identity and vulnerability in public digital spaces.

 

Practical Implications

In recent years, technology-driven therapeutic interventions, such as the peer-support platform 7 Cups of Tea (7cups.com) and the Crisis Text Line (crisistextline.org), have gained significant traction. These services provide individuals in distress with opportunities to converse confidentially with trained volunteers or supportive listeners. Additionally, artificial intelligence (AI) conversational agents designed for mental health support are increasingly deployed, offering scalable and accessible assistance. While these platforms hold promise, they also pose unique challenges regarding the accommodation of stigmatized self-disclosures and the facilitation of their expected social benefits. The methodological framework developed in this study for analyzing audience engagement toward mental health disclosures offers a principled approach to understanding the social interactions that underpin these exchanges. This framework can be leveraged to optimize technology-assisted therapeutic tools by informing the design of volunteer training programs and AI conversational agents. A critical component in enhancing these interventions lies in providing volunteers or AI agents with timely, actionable feedback on the quality and nature of their engagement with help seekers. Our forecasting methodology (RQ2) enables the capture of such feedback by identifying how different engagement behaviors impact future disclosure intimacy. Interactive systems could thus be developed to allow volunteers and AI agents to adapt their conversational strategies dynamically in response to the evolving needs and disclosure patterns of help seekers. Moreover, the audience engagement patterns identified in our study (RQ1)—particularly markers signaling reciprocity—can be incorporated into training guidelines or embedded algorithmically within conversational agents. 
Promoting these positive engagement markers can help sustain meaningful interactions that encourage disclosure and foster social support. Finally, moderation efforts in online support communities and broader social media platforms could adopt similar methodological approaches to encourage audience behaviors that nurture supportive and therapeutic environments. By motivating audiences to engage thoughtfully with vulnerable, self-disclosing individuals, these efforts can contribute to creating safer and more beneficial online spaces for mental health discourse.

 

Limitations and Future Work

While our findings contribute valuable insights, we acknowledge several limitations that suggest promising directions for future research. First, our analyses are constrained by the scope and nature of our data acquisition methods. Specifically, we do not examine the characteristics of the audience itself, including their social media usage patterns or motivations, which could provide deeper context on the nature of reciprocity observed. Understanding the audience’s profile remains an open area for investigation. Second, we recognize that disclosers may have diverse and complex goals beyond seeking social benefits, such as cultivating trust, managing impressions, or obtaining social validation. Our study focused primarily on uncovering evidence for generalized social benefits associated with disclosure to an invisible audience. Future research should endeavor to disentangle these overlapping motivations by examining how different disclosure goals correlate with patterns of audience engagement and intimacy. Third, the social benefits identified—such as reciprocity and support—require further empirical validation using self-reported data, particularly to assess their impact on psychological outcomes for disclosers. Qualitative approaches, including in-depth interviews and ethnographic studies, would be especially valuable in complementing the quantitative analyses presented here, providing richer, contextualized understanding of the disclosure experience. Finally, our operationalization of intimacy focused primarily on the role of active and incoming audience engagement. However, the impact of non-responsive or non-supportive audiences on subsequent disclosure behaviors remains unexplored. Future work could investigate how silence, neglect, or negative feedback influence disclosers’ willingness to continue sharing sensitive information, further elucidating the complex interplay between audience reactions and self-disclosure dynamics.

 

Conclusion

This study contributes some of the earliest empirical insights into the dynamics of audience engagement elicited by public self-disclosures of schizophrenia on Twitter. By systematically characterizing how audiences respond to disclosers’ content, we uncover clear evidence of reciprocal interactions that extend beyond mere passive reception. Specifically, our analysis reveals that audience engagement is multifaceted, encompassing not only quantitative markers such as mentions but also qualitative aspects including emotional support and conversations relating to personal life topics. Importantly, our time series forecasting model highlights that these components of audience engagement are significant predictors of future disclosure behaviors by individuals living with schizophrenia. This finding underscores the active and iterative nature of disclosure processes, where social feedback loops shape the evolution of intimacy over time. Through these reciprocal engagements, disclosers appear to derive meaningful social benefits that contribute to their psychosocial well-being. Our work thus deepens the understanding of how online platforms like Twitter, despite their inherent limitations and lack of formal community structures, can function as vital spaces for support and connection for individuals confronting highly stigmatized mental health conditions. The implications extend beyond academic inquiry, offering valuable guidance for the design and improvement of technology-mediated support environments on the internet. By illuminating the patterns and predictors of beneficial engagement, this research can inform the development of more responsive, empathetic, and effective digital mental health interventions.

 

Acknowledgments

We would like to express our sincere gratitude to the members of the SocWeB Lab for their constructive feedback and insightful discussions during the preparation of this manuscript. Their expertise and encouragement significantly enhanced the quality of our work.

We also acknowledge the financial support provided by the National Institutes of Health under grant number R01GM112697, which facilitated this research. Special thanks to the collaborative efforts of Ernala and De Choudhury, whose contributions were instrumental in bringing this study to fruition.

 

References

Altman, I., and Taylor, D. 1973. Social penetration theory. New York: Holt, Rinehart & Winston.

Andalibi, N.; Haimson, O. L.; De Choudhury, M.; and Forte, A. 2016. Understanding social media disclosures of sexual abuse through the lenses of support seeking and anonymity. In CHI.

Andalibi, N.; Ozturk, P.; and Forte, A. 2017. Sensitive self-disclosures, responses, and social support on instagram: the case of #depression. In CSCW.

Archer, R. L. 1980. Self-disclosure. The self in social psychology.

Bazarova, N. N., and Choi, Y. H. 2014. Self-disclosure in social media: Extending the functional approach to disclosure motivations and characteristics on social network sites. JCM.

Birnbaum, M. L.; Ernala, S. K.; Rizvi, A. F.; De Choudhury, M.; and Kane, J. M. 2017. A collaborative approach to identifying social media markers of schizophrenia by employing machine learning and clinical appraisals. J. Med. Internet Res.

Burke, M.; Marlow, C.; and Lento, T. 2010. Social network activity and social well-being. In CHI.

Chancellor, S.; Lin, Z.; Goodman, E. L.; Zerwas, S.; and De Choudhury, M. 2016. Quantifying and predicting mental illness severity in online pro-eating disorder communities. In CSCW.

Coleman, J. S. 1988. Social capital in the creation of human capital.

Cozby, P. 1973. Self-disclosure: a literature review. Psychol. Bull.

De Choudhury, M., and De, S. 2014. Mental health discourse on reddit: Self-disclosure, social support, and anonymity. In ICWSM.

De Choudhury, M.; Counts, S.; Horvitz, E. J.; and Hoff, A. 2014. Characterizing and predicting postpartum depression from shared facebook data. In CSCW.

De Choudhury, M.; Sharma, S. S.; Logar, T.; Eekhout, W.; and Nielsen, R. C. 2017. Gender and cross-cultural differences in social media disclosures of mental illness. In CSCW.

Derlaga, V. J., and Berg, J. H. 2013. Self-disclosure: Theory, research, and therapy.

Dickerson, F. B.; Sommerville, J.; Origoni, A. E.; Ringel, N. B.; and Parente, F. 2002. Experiences of stigma among outpatients with schizophrenia. Schizophrenia bulletin.

Dickey, D. A., and Fuller, W. A. 1981. Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica.

Durbin, J. 1970. Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. Econometrica.

Ellison, N. B.; Steinfield, C.; and Lampe, C. 2007. The benefits of facebook “friends:” social capital and college students’ use of online social network sites. JCMC.

Ernala, S. K.; Rizvi, A. F.; Birnbaum, M. L.; Kane, J. M.; and De Choudhury, M. 2017. Linguistic markers indicating therapeutic outcomes of social media disclosures of schizophrenia. Proc. ACM Hum.-Comput. Interact.

Goffman, E. 2009. Stigma: Notes on the management of spoiled identity.

Gruzd, A.; Wellman, B.; and Takhteyev, Y. 2011. Imagining twitter as an imagined community. Am. Behav. Sci.

Haimson, O. L., and Hayes, G. R. 2017. Changes in social media affect, disclosure, and sociality for a sample of transgender americans in 2016’s political climate. In ICWSM.

Helliwell, J. F., and Putnam, R. D. 2004. The social context of well-being. Philos. Trans. Royal Soc. B.

Hobfoll, S. E.; Nadler, A.; and Leiberman, J. 1986. Satisfaction with social support during crisis: intimacy and self-esteem as critical determinants. J. Pers. Soc. Psychol.

Jiang, L. C.; Bazarova, N. N.; and Hancock, J. T. 2013. From perception to behavior: Disclosure reciprocity and the intensification of intimacy in computer-mediated communication. Commun. Res.

Joinson, A. N., and Paine, C. B. 2007. Self-disclosure, privacy and the internet. The Oxford handbook of Internet psychology 237–252.

Joinson, A. N. 2001. Self-disclosure in computer-mediated communication: The role of self-awareness and visual anonymity.

Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, 591–600. ACM.

Pennebaker, J. W.; Francis, M. E.; and Booth, R. J. 2001. Linguistic inquiry and word count: LIWC 2001.

Shimkunas, A. M. 1972. Demand for intimate self-disclosure and pathological verbalization in schizophrenia. J. Abnorm. Psychol.

Sprecher, S.; Treger, S.; Wondra, J. D.; Hilaire, N.; and Wallpe, K. 2013. Taking turns: Reciprocal self-disclosure promotes liking in initial interactions. J. Exp. Soc. Psychol.

Steinfield, C.; Ellison, N. B.; and Lampe, C. 2008. Social capital, self-esteem, and use of online social network sites: A longitudinal analysis. Journal of Applied Developmental Psychology.

Taylor, D. A., and Altman, I. 1975. Self-disclosure as a function of reward-cost outcomes. Sociometry.

Vlahovic, T. A.; Wang, Y.-C.; Kraut, R. E.; and Levine, J. M. 2014. Support matching and satisfaction in an online breast cancer support community. In CHI.

Yang, D.; Yao, Z.; and Kraut, R. E. 2017. Self-disclosure and channel difference in online health support groups. In ICWSM.

Zhang, R. 2017. The stress-buffering effect of self-disclosure on facebook: An examination of stressful life events, social support, and mental health among college students. Comput. Hum. Behav.
