Keith Harrigian is a PhD Student in Computer Science at Johns Hopkins University. With his advisor Mark Dredze, Keith researches computational and statistical methods for modeling natural language. He is particularly interested in developing robust and stable machine learning models for application in the healthcare domain.

Prior to joining Johns Hopkins, Keith was a Senior Quantitative Analyst at Warner Media, where he leveraged machine learning and natural language processing to mine social media data for applications to film and television marketing. He holds an M.S.E. in Computer Science from Johns Hopkins University and a B.S. in Mathematics with minors in Physics and Music from Northeastern University. You can learn more about his past and ongoing work here.

Outside of his academic research, Keith leads data science and machine learning efforts at Unforged. If you’re interested in contributing to the future of digital mental wellness for teenagers and adolescents, please reach out. We’re actively looking for passionate developers and data scientists!

Research Alerts

February 2024 - I gave a talk at the Center for Language and Speech Processing (CLSP) about using AI to fight inequity in healthcare. Slides available here.

November 2023 - New paper alert: “An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping” will be presented at ML4H in New Orleans next month. Our study tempers recent claims that language models pretrained on clinical data are necessary for clinical NLP tasks and highlights the importance of not treating clinical language data as a single homogeneous domain. Find the paper here.

October 2023 - I was invited to speak with the Alzheimer’s Association’s AI working group about opportunities and challenges for AI in the fight against healthcare disparities. Slides available here.

July 2023 - New paper alert: “Characterization of Stigmatizing Language in Medical Records” has been accepted to appear in the main proceedings of ACL 2023. We examine what makes harmful language in clinical documentation different from other forms of abusive language in more widely studied domains. Paper can be found here. Code, data, and models can be found here.

June 2023 - I’ll be giving the closing keynote at the 1st International Workshop on Ethics and Bias of Artificial Intelligence in Clinical Applications. Incredibly grateful for the opportunity! Slides for the presentation on stigmatizing language in medical records can be found here.

June 2023 - I’ve started a 3-month internship at Netflix. I’ll be investigating ways to leverage audio-visual representations to improve audience size forecasting.

March 2023 - New paper alert: “Health Disparities in Lapses in Diabetic Retinopathy Care” has been accepted by Ophthalmology Science. This study uses NLP to augment information contained within the EHR and provides further evidence of (in)equality in the American healthcare system. Find the paper here.

May 2022 - New paper alert: “Then and Now: Quantifying the Longitudinal Validity of Self-Disclosed Depression Diagnoses” has been accepted to CLPsych. Learn about some of the pernicious biases that arise when using diagnosis self-disclosures to annotate mental health status in the paper available here.

March 2022 - New paper alert: Our paper “The Problem of Semantic Shift in Longitudinal Monitoring of Social Media” has been accepted to the 14th ACM Web Science Conference. Read the open-access paper here.

June 2021 - New paper alert: We presented two papers at CLPsych detailing some of the issues with social media data for mental health. Check out our insights regarding fairness of machine learning classifiers as a function of self-reported gender here and insights regarding data access/representation here.

April 2021 - New paper alert: “Gender and Racial Fairness in Depression Research using Social Media” will be included in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. [Paper]

October 2020 - New paper alert: “Do Models of Mental Health Based on Social Media Generalize?” will be included in Findings of ACL: EMNLP 2020. [Paper] [Code]

April 2020 - To support ongoing research efforts related to understanding the spread of information about COVID-19, I have independently reproduced code, data, and models from “Geocoding without geotags: a text-based approach for reddit”. Learn more at the GitHub repository here.

August 2019 - I have officially started my pursuit of a PhD at Johns Hopkins. Reach out if you’re ever in Baltimore!

June 2019 - After 2 owners, 3 years, and 4 positions, today marked my last day with Warner Media Applied Analytics. I will forever be thankful for the opportunities that were made available to me by my wonderful colleagues during that time. I now look forward to taking on new challenges as a Computer Science PhD student at Johns Hopkins University. Starting in August, I will be working alongside the Center for Language and Speech Processing and the Malone Center for Engineering in Healthcare to create new clinical decision support tools.

February 2019 - I’ve been promoted to Senior Quantitative Analyst. Grateful for the support that my co-workers have provided to get me to this point.

November 2018 - I’ll be attending EMNLP in Brussels, Belgium, to present the first-ever approach for geolocating Reddit users. The long paper can be found here. I’ll be giving one of the extended talks within the Workshop on Noisy User-generated Text.

June 2018 - My team was acquired by Warner Media (previously Time Warner). Excited for the new data that is about to come my way!

June 2018 - I’ll be attending NAACL in New Orleans, LA. Look forward to seeing all the interesting papers in the main conference, in addition to CLPsych and PEOPLES.

June 2017 - I’ve started a new position as a Quantitative Analyst at Legendary Entertainment. Appreciate the opportunity to return to my previous co-op employer as a full-time team member.

May 2017 - I graduated from Northeastern University (summa cum laude) with a B.S. in Mathematics. Time to travel a bit before starting a new research position in industry.

March 2017 - I presented work from the Action Lab’s “Pitchers and Pianists” at the New England Sequencing and Timing Meeting alongside my collaborators Dena Guo and Professor Dagmar Sternad. Video from the talk can be found here. Our talk starts at 2:51:51.

April 2016 - I won the award for Outstanding Student Research in Computer and Information Sciences at the Northeastern University RISE conference for my work to infer the gender of Reddit users using language. Poster can be found here.