Abstract
There is a great and growing need to ascertain the precise state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the sparse information available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, the detection of adverse outcomes, the assessment of off-label drug efficacy, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate for substantial real-world use, primarily because they were trained on medical text corpora that are too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter and show how it offers substantial real-world improvements over prior methods.

1 University of California, San Francisco, Bakar Computational Health Sciences Institute, San Francisco, USA (GRID:grid.266102.1) (ISNI:0000 0001 2297 6811)
2 University of California, San Francisco, Division of Rheumatology, Department of Medicine, San Francisco, USA (GRID:grid.266102.1) (ISNI:0000 0001 2297 6811)
3 University of California, San Francisco, Division of Rheumatology, Department of Medicine, San Francisco, USA (GRID:grid.266102.1) (ISNI:0000 0001 2297 6811); San Francisco Veterans Affairs Medical Center, San Francisco, USA (GRID:grid.410372.3) (ISNI:0000 0004 0419 2775)
4 University of California, San Francisco, Bakar Computational Health Sciences Institute, San Francisco, USA (GRID:grid.266102.1) (ISNI:0000 0001 2297 6811); University of California Health, Center for Data-Driven Insights and Innovation, Oakland, USA (GRID:grid.266102.1)