Content area
Perceptual and ubiquitous intelligence, which enables computational systems to sense and reason about humans and the world around them using a variety of sensor-based devices, holds immense potential to make healthcare more proactive, personalized, and accessible. This intelligence is central to enabling new frontiers in healthcare, from systems for clinical diagnostics such as remote vital signs monitoring and computer-assisted endoscopy, to all-day personal health management built upon present-day wrist-worn health trackers and next-generation wearables such as smart glasses. However, existing approaches face significant challenges in robust perception, high-level reasoning, and all-day operation. Perceptual methods for health sensing often degrade significantly in real-world conditions, such as when the sensed subject moves or within the visually ambiguous, challenging environments of internal organs. While recent advances in artificial intelligence (AI) have unlocked powerful high-level reasoning with large language models (LLMs), users still struggle to derive actionable insights from the vast data gathered by modern wrist-worn health trackers, a challenge compounded by the significant reliability and safety risks of using general-purpose, LLM-based chatbots for healthcare advice. Furthermore, as camera-equipped smart glasses become more widely adopted, the high power consumption of continuous camera sensing makes all-day personal health management, such as providing memory support for medication adherence, impractical.
To address the challenge of robust perception, my dissertation improves camera-based health sensing for clinical diagnostics. First, I addressed motion artifacts by developing a technique that uses neural motion transfer to create representative training data, significantly improving remote vital sign estimation when the subject moves. Second, I addressed visually ambiguous environments by leveraging subtle photometric cues from the near-field lighting in endoscopy to develop a novel per-pixel shading representation, enabling higher-quality monocular depth estimation from clinical endoscopy videos.
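To make the first of these contributions concrete, the following is a minimal, hypothetical sketch of motion-transfer-based data augmentation for remote vital sign estimation; the names Clip, MotionTransferModel, and augment_with_motion are illustrative placeholders rather than the dissertation's actual implementation. The key assumption shown is that transferring head motion from a driving video changes the appearance of the clip but not the subtle skin-color signal carrying the pulse, so the original physiological labels remain valid for the motion-augmented videos.

```python
# Hypothetical sketch of motion-transfer-based data augmentation for remote
# vital sign estimation. All class and function names here are illustrative
# placeholders, not the dissertation's actual implementation.

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Clip:
    frames: np.ndarray   # (T, H, W, 3) video of a mostly stationary subject
    ppg: np.ndarray      # (T,) synchronized ground-truth pulse waveform


class MotionTransferModel:
    """Placeholder for a pretrained neural motion transfer network that
    re-animates a source face with the head motion of a driving video."""

    def transfer(self, source_frames: np.ndarray,
                 driving_frames: np.ndarray) -> np.ndarray:
        raise NotImplementedError("plug in a real motion transfer model here")


def augment_with_motion(clips: List[Clip],
                        driving_videos: List[np.ndarray],
                        model: MotionTransferModel) -> List[Clip]:
    """Create motion-rich training clips while keeping the original PPG labels.

    The pulse signal lives in subtle skin color changes, so adding transferred
    head motion introduces realistic motion artifacts without changing labels.
    """
    augmented = []
    for clip in clips:
        for driving in driving_videos:
            length = min(len(clip.frames), len(driving))
            moved = model.transfer(clip.frames[:length], driving[:length])
            augmented.append(Clip(frames=moved, ppg=clip.ppg[:length]))
    return augmented
```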
To enable actionable and safer high-level reasoning for personal health management, my dissertation presents several contributions. I first addressed the challenge of interpreting personal health data by co-developing the Personal Health Insights Agent (PHIA), an LLM-based agent that autonomously uses code generation and web search tools to provide data-driven answers to a user's health questions, grounded in their own data from wrist-worn health trackers. To improve the reliability of this reasoning, I further assessed and improved the probabilistic reasoning capabilities of the underlying LLMs on questions involving data distributions. Finally, to better characterize the safety challenges of reasoning about healthcare questions, I curated and analyzed HealthChat-11K, a novel dataset of 11,000 real-world conversations, to systematically identify common failure modes and safety concerns in how users seek health information from AI chatbots.
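As an illustrative sketch only, the loop below shows the general shape of a PHIA-style agent that alternates between generating analysis code over the user's wearable data, issuing web searches, and producing a final answer. The helpers call_llm, execute_python, and search_web, along with the PYTHON:/SEARCH:/FINAL: action conventions, are assumptions made for this example and are not PHIA's actual interface.

```python
# Hypothetical sketch of a PHIA-style agent loop. The helper functions and
# the PYTHON:/SEARCH:/FINAL: action conventions are illustrative assumptions,
# not PHIA's actual interface.

def call_llm(prompt: str) -> str:
    """Placeholder: query an LLM prompted to reply with one action per step,
    prefixed by PYTHON:, SEARCH:, or FINAL:."""
    raise NotImplementedError

def execute_python(code: str, dataframe) -> str:
    """Placeholder: run generated analysis code against the user's wearable
    data (e.g. a pandas DataFrame) in a sandbox and return the result as text."""
    raise NotImplementedError

def search_web(query: str) -> str:
    """Placeholder: return snippets from a web search tool."""
    raise NotImplementedError

def answer_health_question(question: str, dataframe, max_steps: int = 5) -> str:
    """Iteratively reason over the user's own health data until a final,
    data-grounded answer is produced or the step budget runs out."""
    transcript = f"User question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)          # LLM chooses the next action
        if step.startswith("PYTHON:"):       # data-analysis action
            observation = execute_python(step[len("PYTHON:"):], dataframe)
        elif step.startswith("SEARCH:"):     # knowledge-lookup action
            observation = search_web(step[len("SEARCH:"):].strip())
        elif step.startswith("FINAL:"):      # agent has finished reasoning
            return step[len("FINAL:"):].strip()
        else:
            observation = "Unrecognized action; use PYTHON:, SEARCH:, or FINAL:."
        transcript += f"{step}\nObservation: {observation}\n"
    return "No confident answer reached within the step budget."
```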
Finally, to solve the problem of energy-efficient, all-day operation for next-generation wearables, this dissertation extends battery life with algorithms that selectively toggle power-hungry operations. To achieve this, I developed EgoTrigger, a context-aware system that uses a low-power microphone to listen for salient hand-object interaction events and selectively trigger the smart glasses' camera. This audio-driven approach can significantly reduce camera usage, as well as subsequent data storage and transmission, while preserving performance on downstream human memory enhancement tasks, enabling energy-efficient, all-day operation.
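As a concrete illustration only, the sketch below shows the kind of audio-gated capture loop this approach implies: an always-on, low-power audio pipeline scores each window for hand-object interaction sounds and only then briefly wakes the camera. The classes AudioFrontend, HOIClassifier, and Camera, and the threshold, burst, and cooldown parameters, are assumptions for the example rather than EgoTrigger's actual implementation.

```python
# Hypothetical sketch of an EgoTrigger-style gating loop: a low-power,
# always-on microphone pipeline scores each audio window for hand-object
# interaction (HOI) sounds and only then wakes the camera for a short burst.
# AudioFrontend, HOIClassifier, and Camera are illustrative placeholders.

import time

class AudioFrontend:
    def read_window(self):
        """Placeholder: return roughly one second of microphone samples."""
        raise NotImplementedError

class HOIClassifier:
    def prob_hand_object_interaction(self, audio_window) -> float:
        """Placeholder: lightweight on-device model scoring HOI likelihood."""
        raise NotImplementedError

class Camera:
    def capture_burst(self, seconds: float):
        """Placeholder: briefly power up the camera and record frames."""
        raise NotImplementedError

def run_trigger_loop(mic, classifier, camera,
                     threshold: float = 0.7,
                     burst_seconds: float = 3.0,
                     cooldown_seconds: float = 5.0) -> None:
    """Keep the camera off by default; capture only around salient moments
    (e.g. picking up a pill bottle) so downstream memory tasks still work."""
    last_trigger = float("-inf")
    while True:
        window = mic.read_window()
        score = classifier.prob_hand_object_interaction(window)
        now = time.monotonic()
        if score >= threshold and now - last_trigger >= cooldown_seconds:
            camera.capture_burst(burst_seconds)   # brief, targeted capture
            last_trigger = now
```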