April 29, 2024
AI in Medicine: Notes from the Mount Sinai Workshop
What the Workshop Was
The AI in Medicine workshop at Mount Sinai was organized by medical students Jen, Katie, and Joy Chen (MD-PhD candidate). Jen called it a "bootcamp"—two days of presentations, case studies, and hands-on sessions designed to give attendees real working knowledge regardless of technical background.
The presenters mixed medical students with physician-scientists. The format mixed presentations with direct tool use. The result: specific, grounded conversations about what AI does well, what it doesn't, and what nobody knows yet.
What AI Is Actually Doing in Medicine
NLP: More Than Chatbots
Anish Kumar made a point of separating NLP from the chatbot hype: "NLP does not equal chatbots... This field has a decades-long history going back all the way to World War II."
The applications he highlighted:
- Sentiment analysis on patient feedback
- Information extraction from unstructured EHR text
- Machine translation for diverse patient populations
The common thread: processing the massive amounts of language that healthcare generates and making it usable.
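
For a sense of what the first item looks like in practice, here is a minimal sentiment-analysis sketch using the off-the-shelf Hugging Face transformers pipeline; the default model and the feedback strings are invented for illustration, not material from the workshop.

```python
# Minimal sketch: sentiment analysis on patient feedback with a general-purpose
# Hugging Face pipeline. The example strings are invented; a clinical deployment
# would use a model tuned for healthcare language.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default English model

feedback = [
    "The nurse explained every step and I felt cared for.",
    "I waited four hours and no one told me what was happening.",
]

for text, result in zip(feedback, classifier(feedback)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {text}")
```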
Computer Vision: Seeing What Humans Miss
Felix Richter's NICU case study showed computer vision using "pose estimation" to quantify infant alertness from video—something too subtle for human observation alone.
Dr. Alisa Ruch presented AI models for echocardiogram interpretation that outperformed human experts on mitral regurgitation assessment. A chest X-ray model, trained on echocardiogram labels, detected severe left ventricular hypertrophy (LVH) and dilated left ventricles better than 15 radiologists in a head-to-head comparison.
The point isn't automation. It's extraction—pulling information from images that human eyes can't reliably detect.
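
As a rough illustration of the pose-estimation idea, here is a sketch that turns per-frame keypoints into a movement signal. The array shapes and the displacement-based proxy are assumptions made for the example, not Richter's actual pipeline, and the keypoints are assumed to come from some upstream pose-estimation model.

```python
# Sketch: quantify movement from pose keypoints that some pose-estimation
# model has already extracted from video. The shapes and the
# "movement = frame-to-frame displacement" proxy are illustrative assumptions.
import numpy as np

def movement_signal(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (n_frames, n_joints, 2) array of (x, y) positions."""
    diffs = np.diff(keypoints, axis=0)          # (n_frames-1, n_joints, 2)
    per_joint = np.linalg.norm(diffs, axis=-1)  # step size of each joint per frame
    return per_joint.mean(axis=-1)              # one movement value per frame transition

# Toy stand-in for real video: 100 frames, 17 joints, small random drift.
rng = np.random.default_rng(0)
fake_keypoints = rng.normal(size=(100, 17, 2)).cumsum(axis=0) * 0.01
signal = movement_signal(fake_keypoints)
print(f"mean movement per frame: {signal.mean():.4f}")
```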
Reinforcement Learning: Adapting in Real Time
Joy Chen and Corbin Matthews introduced reinforcement learning as uniquely suited to medicine's complexity. Matthews gave examples: optimizing CPAP pressure for respiratory patients, adjusting hypertension or diabetes medication timing based on patient state.
RL learns through feedback and adapts to individual patients—a different paradigm from static models.
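
A toy sketch of that feedback loop: an epsilon-greedy bandit that learns which of a few hypothetical CPAP settings a simulated patient responds to. The settings and rewards are made up; this shows the learning pattern, not the clinical systems Matthews described.

```python
# Toy reinforcement-learning sketch: pick a discrete setting, observe noisy
# feedback, update the estimate, repeat. Settings and rewards are invented.
import random

settings = [6, 8, 10, 12]                              # hypothetical pressures (cm H2O)
true_response = {6: 0.3, 8: 0.6, 10: 0.8, 12: 0.5}     # hidden "benefit" per setting

q = {s: 0.0 for s in settings}       # estimated value of each setting
counts = {s: 0 for s in settings}
epsilon = 0.1                        # how often to explore instead of exploit

random.seed(0)
for step in range(1000):
    if random.random() < epsilon:
        s = random.choice(settings)                    # explore
    else:
        s = max(settings, key=lambda x: q[x])          # exploit current best estimate
    reward = true_response[s] + random.gauss(0, 0.1)   # noisy simulated feedback
    counts[s] += 1
    q[s] += (reward - q[s]) / counts[s]                # incremental mean update

print({s: round(q[s], 2) for s in settings})           # estimates approach true_response
```

Real clinical RL has to handle patient state, safety constraints, and delayed effects; this toy ignores all of that, but the adapt-from-feedback loop is the same.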
No-Code ML: Who Gets to Build
Vivian's hands-on session challenged the assumption that AI development requires deep technical expertise. Participants built a chest X-ray classifier using Google Cloud AutoML without writing code.
Her point: "No-code machine learning is really important because it makes ML more accessible to a wider range of people across healthcare, finance, pretty much any industry."
If clinicians can build, they can shape what gets built.
The Hard Problems
The Evidence Gap
Dr. Gaurishan Karni's keynote was direct about the gap between demos and proof. He cited a 2020 BMJ paper on COVID-19 prediction models: "90% of them had high risk of bias and were not generalizable at all."
Dr. Ruch pointed to the CACTIS trial as an example of what's needed—prospective validation of AI safety nets in emergency departments, measuring actual clinical impact.
Working AI isn't the same as proven AI.
Data Problems
Divya's presentation catalogued what's wrong with healthcare data: missing values, biased collection, misdiagnosis, and limited representation. Dr. Lily Chan's research on stigmatizing language in EHRs showed how bias embeds itself in clinical text.
Akil Merchant was blunt about the work: most of the time goes into data collection, cleaning, preprocessing, and understanding limitations. It's not glamorous, but it's the foundation.
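
A minimal sketch of what that foundation work looks like on a toy table. The inline data, column names, and fill rules are invented for the example; real EHR preprocessing is far messier.

```python
# Sketch: inspect missingness and representation, then make cleaning decisions
# explicit. The tiny table and its columns are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "age":       [62, 57, None, 71, 48],
    "lab_value": [1.2, None, 0.9, None, 1.5],
    "race":      ["White", None, "Black", "White", "Asian"],
    "outcome":   [1, 0, None, 1, 0],
})

# 1. How much is missing, and where?
print(df.isna().mean().sort_values(ascending=False))

# 2. Who is represented? Skewed group counts are a modeling problem, not a footnote.
print(df["race"].value_counts(dropna=False))

# 3. Make cleaning decisions explicit instead of silent.
df = df.dropna(subset=["outcome"])                                  # can't train without labels
df["lab_value"] = df["lab_value"].fillna(df["lab_value"].median())  # documented imputation
```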
The Black Box Problem
Anish Kumar raised the transparency question directly: "How transparent do you think we have to be to patients when the efficacy of the tool might hinge on the fact that it's coming from a human agent?"
Dr. Karni pushed back: "The ultimate thing is whether you're going to believe this recommendation or not." He argued trust matters more than full interpretability.
Akil Merchant's neural network explanation aimed at a middle ground—understanding enough about how these systems work to make informed decisions about when to trust them.
Probability, Not Certainty
Akil Merchant emphasized that AI models output probabilities, not diagnoses. "What's the difference between 47% and 51%? It's a purely academic thing at that point."
Clinicians need frameworks for working with probabilistic outputs—not treating them as binary yes/no answers.
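
A tiny sketch of why that matters: two outputs straddling an arbitrary 0.5 cutoff get opposite labels even though the estimates are nearly identical. The numbers and the threshold are illustrative, not from the workshop.

```python
# Sketch: a hard threshold turns nearly identical probabilities into
# opposite binary calls. Probabilities and cutoff are invented for illustration.
def label(prob: float, threshold: float = 0.5) -> str:
    return "positive" if prob >= threshold else "negative"

for p in (0.47, 0.51):
    print(f"model output {p:.2f} -> {label(p)}")
# 0.47 -> negative, 0.51 -> positive: the labels disagree,
# but the underlying estimates are essentially indistinguishable.
```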
Where This Should Go
Responsible AI
Katie outlined core principles: reliability, fairness, safety, transparency, privacy, consent, accountability. Divya argued for granular race data and methods like PRPL to address underreporting bias.
These aren't optional add-ons. They're the difference between AI that helps and AI that amplifies existing problems.
Clinicians as Builders
Dr. Karni's framing: AI "will help us become better physicians," freeing time for "more meaningful physician-patient interaction."
Katie listed the roles clinicians can take: informed users, educators, clinical collaborators, innovators, advocates. Felix Richter's own path—from PhD AI research to NICU practice—showed what clinician-led development looks like.
Keeping Humans Central
Dr. Karni's closing: "The magic ingredients to improving patient outcomes in this AI age will be the same as they have been in the history of medicine and the history of humankind."
Rigor. Compassion. Imagination. Teamwork. Technology serves these. It doesn't replace them.
The Takeaway
This workshop didn't oversell AI or dismiss it. It showed specific applications, named specific limitations, and asked specific questions. The emphasis on validation, data quality, bias, clinician agency, and human values suggests a field trying to get this right rather than just get it done.
The presentations and discussions here are a useful map for anyone trying to understand where AI in medicine actually stands—not the hype, not the backlash, but the work.