LetterHead
Deep Dive #1: What Goes Into A Letter
Here at the LetterHead Project, we write open letters to organizations and companies to bring attention to mental health. Our letters report findings from our machine learning analysis of data (typically from surveys on mental health) and suggest actionable steps to address the issues we uncover.
Today, in the first Deep Dive of the LetterHead Project, we’re looking under the hood to see what goes into one of our open letters.
From the beginning, Jack and I have grounded our projects in three ideas. First, whether you view it through a social, emotional, medical, or economic lens, addressing mental health elevates everyone and everything; an improvement in mental health would mark an achievement comparable only to the Neolithic, Scientific, and Industrial Revolutions. Second, data science, and specifically machine learning, is a tool uniquely capable of quantitatively pinpointing relationships and interactions among the characteristics of individuals, the characteristics of their organizations or companies, and individuals’ experiences with mental health. Third, science must above all be communicated and translated into action; it cannot exist in a vacuum.
Stemming from these ideas is the next step: identifying a sector where LetterHead can help. We make this decision based on researching different sectors (e.g., tech companies) and gathering information from local sources when possible.
Once a sector is identified, the next step is data collection, arguably the most critical step in any scientific study. The advent of big data has made countless datasets and surveys publicly accessible, but it also means that many datasets holding potentially valuable information are never analyzed. Our first effort, An Open Letter to Tech Companies, used data from a publicly available survey on the mental health of tech employees, while our upcoming projects will collect original survey data in collaboration with institutions such as hospitals. The design of survey questions is particularly important: we want to ensure we collect all the relevant features that may affect mental health and thus should be included in predictive modeling.
Of course, datasets are rarely “clean” (i.e., free of errors, typos, or formatting issues), so we invest significant time into preprocessing data to ensure the reliability of our results. This usually involves making sure all variables are encoded in a standardized way (for example, “female” for gender could be referred to as “F,” “f,” “Female,” or “female,” so we’d take all possibilities along with misspellings and standardize their values). We also stratify certain continuous variables into discrete categories (e.g., age groups).
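As a rough illustration of what this kind of standardization and binning might look like, here is a minimal Python sketch. The alias table, the age cut points, the "unrecognized" fallback, and the function names are all assumptions made for illustration; a real survey would need its own mappings.

```python
from bisect import bisect_right

# Hypothetical alias table: raw spellings (lowercased) -> canonical label.
# A real dataset would need a much longer list, including observed misspellings.
GENDER_ALIASES = {
    "f": "female",
    "female": "female",
    "femail": "female",  # example misspelling
    "m": "male",
    "male": "male",
}

# Hypothetical age cut points; each edge starts a new group.
AGE_EDGES = [18, 25, 35, 45, 55, 65]
AGE_LABELS = ["under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def standardize_gender(raw):
    """Map a free-text entry to a canonical label.

    Entries not found in the alias table are flagged as 'unrecognized'
    so they can be reviewed by hand rather than silently guessed.
    """
    if raw is None:
        return "unknown"
    key = raw.strip().lower()
    return GENDER_ALIASES.get(key, "unrecognized")


def age_group(age):
    """Place a numeric age into one of the discrete groups above."""
    if age is None or age < 0:
        return "unknown"
    # bisect_right counts how many edges are <= age, which is the label index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


if __name__ == "__main__":
    for raw in ["F", " female ", "Femail", "n/a"]:
        print(repr(raw), "->", standardize_gender(raw))
    for age in [17, 18, 24.5, 30, 70]:
        print(age, "->", age_group(age))
```

Flagging unmatched entries instead of forcing them into a category keeps the cleaning step auditable, which matters when the categories themselves are sensitive.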
After preprocessing, the next step is to analyze the data, which we accomplish with univariate statistics and machine learning. We’ll dedicate a future post to the technicalities of our analyses, and we’ll also write about why we choose machine learning over other statistical methods. In short, we employ machine learning classifiers as models that reveal which features (i.e., characteristics of an individual or their employer) are most important for predicting issues related to mental health, while also focusing on interpretability.
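To give a flavor of the univariate side of this step, here is a small Python sketch that computes a Pearson chi-square statistic between one categorical feature and an outcome. This post does not say which univariate tests are used, so the chi-square test is an assumed stand-in, and the function name and toy values below are made up for illustration.

```python
from collections import Counter


def chi_square_statistic(feature_values, outcomes):
    """Pearson chi-square statistic for two equal-length categorical sequences.

    Larger values suggest a stronger association between the feature and
    the outcome; 0 means the observed counts match independence exactly.
    """
    if len(feature_values) != len(outcomes):
        raise ValueError("feature_values and outcomes must have the same length")
    n = len(feature_values)
    observed = Counter(zip(feature_values, outcomes))  # cell counts
    row_totals = Counter(feature_values)               # counts per feature level
    col_totals = Counter(outcomes)                     # counts per outcome level
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Toy, made-up responses: a hypothetical yes/no feature and outcome.
    feature = ["yes"] * 10 + ["no"] * 10
    sought_help = ["yes"] * 10 + ["no"] * 10
    print(chi_square_statistic(feature, sought_help))
```

A statistic like this only looks at one feature at a time, which is part of why it is paired with classifiers that can account for interactions between features.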
Then, we take the findings from our analysis and identify “themes.” One theme we uncovered when studying the mental health of tech employees was that stigma and negative views surrounding mental health deter employees from seeking help. These data-driven themes, coupled with extensive research on the sector we are studying, health care, company policy, and other factors, lead us to design actionable steps for organizations and companies within the sector. From there, we aim to work with local organizations and companies to implement these solutions and catalyze positive change in how mental health is addressed.
Please reach out to team@letterheadproject.org to get in touch, discuss a project idea, or ask any questions. We look forward to sharing more letters and updates in the coming year.