De-Identification of Laboratory Reports in STEM

  • Alex Rudniy, University of Scranton
Keywords: artificial intelligence, customized tagging, de-identification, machine learning, OpenNLP and NeuroNER toolkits, writing analytics, STEM writing

Abstract

  • Background: The current work, which employed natural language processing and latent semantic analysis, was completed as a constituent part of a larger research project for designing and launching artificial intelligence in the form of deep artificial neural networks. The models were evaluated on a proprietary corpus retrieved from a data warehouse; the data had been extracted from MyReviewers, a web application designed for peer review in written communication that was actively used at several higher education institutions. The corpus of STEM laboratory reports annotated by instructors and students was used to train the models. Under the Common Rule, research ethics required protecting the privacy of subjects and maintaining the confidentiality of data, which mandated de-identification of the corpus.
  • Literature Review: De-identification and pseudonymization of textual data have been actively studied for several decades. Their importance is underscored by numerous laws and regulations in the United States and internationally, including the HIPAA Privacy Rule and FERPA.
  • Research Question: Text de-identification requires a significant amount of manual post-processing to eliminate faculty and student names. This work investigated automated and semi-automated methods for de-identifying student and faculty entities while preserving author names in cited sources and reference lists. It was hypothesized that a natural language processing toolkit and an artificial neural network model with named entity recognition capabilities would facilitate text processing and reduce the manual post-processing still required after matching essays against a list of users’ names. The techniques were applied with their supplied pre-trained models, without additional tagging or training. The goal of the study was to evaluate three approaches (a users’ list, a named entity recognition toolkit, and an artificial neural network) and to identify the most efficient one.
  • Research Methodology: The current work studied de-identification of STEM laboratory reports and evaluated the performance of three techniques: brute-force search with a user list, named entity recognition with the OpenNLP machine learning toolkit, and NeuroNER, an artificial neural network for named entity recognition built on the TensorFlow platform. The task was complicated by a dilemma: names belonging to students, instructors, or teaching assistants must be removed, while all other names (e.g., authors of referenced papers) must be preserved.
  • Results: The evaluation of the three selected methods demonstrated that de-identification of STEM lab reports cannot be automated when named entity recognition methods are employed with pre-trained models. The best results were achieved by the users’ list technique, with 0.79 precision, 0.75 recall, and 0.77 F1 measure, far exceeding OpenNLP (0.06 precision, 0.14 recall, 0.09 F1) and NeuroNER (0.14 precision, 0.56 recall, 0.23 F1).
  • Discussion: The low performance of the OpenNLP and NeuroNER toolkits was attributed to the complexity of the task and to the infeasibility of building customized models under the imposed time constraints. An approach for masking possible de-identification errors is suggested.
  • Conclusion: Unlike the multiple cases described in the related work, de-identification of laboratory reports in STEM remained a non-trivial, labor-intensive task. Applied out of the box, neither the machine learning toolkit nor the artificial neural network improved on the performance of the brute-force approach based on user list matching.
  • Directions for Future Research: Customized tagging and training on the STEM corpus are expected to improve the outcomes of machine learning and, in particular, artificial intelligence methods. Applying other natural language processing toolkits may yield a more effective solution.
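
The brute-force users’ list technique named in the Research Methodology bullet can be illustrated with a short Python sketch. It is not the author's implementation: it assumes the user list is available as plain name strings and that a report's reference list begins at a heading line such as "References"; the heading set, the [NAME] placeholder, and the function names are hypothetical. Names in the body are replaced through one word-bounded, case-insensitive regular expression, and every line from the reference heading onward is copied unchanged so that cited authors survive.

```python
import re
from typing import Iterable, List

# Hypothetical headings that open a reference list; lines from the heading
# onward are copied unchanged so that cited authors are preserved.
REFERENCE_HEADINGS = {"references", "works cited", "bibliography"}


def build_name_pattern(user_names: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive, word-bounded regex for all user names."""
    names = {n.strip() for n in user_names if n.strip()}
    if not names:
        return re.compile(r"(?!)")  # an empty lookahead that never matches
    # Longest names first, so "Mary Ann Lee" is preferred over "Mary Ann".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def deidentify(report: str, user_names: Iterable[str], tag: str = "[NAME]") -> str:
    """Replace user names in a report body; leave the reference list intact."""
    pattern = build_name_pattern(user_names)
    out: List[str] = []
    in_references = False
    for line in report.splitlines():
        if line.strip().lower().rstrip(":") in REFERENCE_HEADINGS:
            in_references = True
        out.append(line if in_references else pattern.sub(tag, line))
    return "\n".join(out)
```

For example, deidentify("TA: John Smith\nReferences\nSmith, J. (2015).", ["John Smith", "Smith"]) masks the teaching assistant but keeps the cited Smith. Because matching is purely string-based, a surname shared by a student and a cited author would still be masked in in-text citations, which is one instance of the dilemma described above.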
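
The F1 values in the Results bullet are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Python sketch below computes all three measures from gold and predicted sets of name mentions and recomputes F1 from the reported precision and recall. The helper names are hypothetical, and the unit of counting (a mention keyed by document and character offsets) is an assumption, since the abstract does not state whether scoring was done per mention or per token.

```python
from typing import Hashable, Set, Tuple


def f1(precision: float, recall: float) -> float:
    """F1 measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for sets of name mentions.

    Each mention is any hashable key, e.g. a hypothetical
    (document_id, start_offset, end_offset) tuple.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Precision and recall as reported in the abstract (rounded to two decimals).
REPORTED = {
    "users' list": (0.79, 0.75),
    "OpenNLP": (0.06, 0.14),
    "NeuroNER": (0.14, 0.56),
}

if __name__ == "__main__":
    for method, (p, r) in REPORTED.items():
        print(f"{method}: F1 = {f1(p, r):.2f}")
```

Recomputing from the two-decimal values gives 0.77, 0.08, and 0.22. The reported 0.09 and 0.23 are consistent with these once the rounding of the underlying precision and recall is taken into account (for instance, P = 0.064 and R = 0.144 already yield an F1 of about 0.09).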
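
The Discussion bullet mentions an approach for masking possible de-identification errors but does not detail it here. One widely used technique, shown below in Python purely as an illustration and not necessarily the approach the paper proposes, is surrogate substitution. Each detected name is replaced by a consistent, realistic fake name, so any real name the detector missed is hard to distinguish from the surrogates around it. The surrogate pool, the seed, and the function names are hypothetical.

```python
import random
import re
from typing import Dict, Iterable, List


def surrogate_map(names: Iterable[str], pool: List[str], seed: int = 0) -> Dict[str, str]:
    """Give every distinct detected name its own surrogate drawn from pool.

    The pool is assumed to contain realistic names that do not belong to
    any real user; a fixed seed makes the assignment reproducible.
    """
    distinct = sorted(set(names))  # sorted so the seed fully determines the mapping
    if len(distinct) > len(pool):
        raise ValueError("surrogate pool is smaller than the number of names")
    rng = random.Random(seed)
    return dict(zip(distinct, rng.sample(pool, len(distinct))))


def mask_with_surrogates(text: str, names: Iterable[str], pool: List[str], seed: int = 0) -> str:
    """Replace each detected name with its surrogate, consistently across the text."""
    mapping = surrogate_map(names, pool, seed)
    if not mapping:
        return text
    # Longest names first so a full name wins over a shorter name it contains.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Consistency is the design choice that matters here: the same real name always receives the same surrogate, so a report stays readable, while a reader cannot easily tell a leaked real name from a substituted one.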

Author Biography

Alex Rudniy, University of Scranton
Assistant Professor in Computer Science

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Devin, M. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Retrieved from https://arxiv.org/abs/1603.04467.

Adding a new entity type [Online forum comment]. (2017, October 23). Retrieved from https://github.com/Franck-Dernoncourt/NeuroNER/issues/73.

Administrative Procedure Act of 1946. Retrieved from https://www.justice.gov/sites/default/files/jmd/legacy/2014/05/01/act-pl79-404.pdf.

Anson, I.G., & Anson, C.M. (2017). Assessing peer and instructor response to writing: A corpus analysis from an expert survey. Assessing Writing, 33, 12–24.

Apache OpenNLP Developer Documentation. (2017). Retrieved from https://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#intro.description

APEC Privacy Framework. (2005). Retrieved from https://www.apec.org/-/media/APEC/Publications/2005/12/APEC-Privacy-Framework/05_ecsg_privacyframewk.pdf.

Aull, L. (2017). Corpus analysis of argumentative versus explanatory discourse in writing task genres. Journal of Writing Analytics, 1, 1–47.

Bayardo, R.J., & Agrawal, R. (2005). Data privacy through optimal k-anonymization. Proceedings: IEEE 21st International Conference on Data Engineering, 217–228. Retrieved from https://doi.org/10.1109/ICDE.2005.42.

Beckwith, B.A., Mahaadevan, R., Balis, U.J., & Kuo, F. (2006). Development and evaluation of an open source software tool for deidentification of pathology reports. BMC Medical Informatics and Decision Making, 6(12), 1–10.

Boyd, D. (2008). Facebook’s privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1), 13–20. Retrieved from http://dx.doi.org/10.1177/1354856507084416.

Carafe [Computer software] (2005). Available from https://sourceforge.net/projects/carafe/

Children's Online Privacy Protection Act of 1998 (COPPA). Retrieved from https://www.epic.org/privacy/kids/.

Coalition Letter against DOJ's XBD Bill. Retrieved from https://www.eff.org/document/2017-09-20-coalition-letter-against-dojs-xbd-bill.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12, 2493–2537.

Daries, J.P., Reich, J., Waldo, J., Young, E.M., Whittinghill, J., Ho, A.D., … Chuang, I. (2014). Quality social science research and the privacy of human subjects require trust. Communications of the ACM, 57(9), 56–63.

Decree for Federal Law of Protection of Personal Data in Possession of Individuals. Retrieved from http://www.dof.gob.mx/nota_detalle.php?codigo=5150631&fecha=05/07/2010.

Dehghan, A., Kovacevic, A., Karystianis, G., Keane, J.A., & Nenadic, G. (2015). Combining knowledge- and data-driven methods for de-identification of clinical narratives. Journal of Biomedical Informatics, 58(5), 53–59.

Dernoncourt, F., Lee, J. Y., & Szolovits, P. (2017). NeuroNER: An easy-to-use program for named-entity recognition based on neural networks. Retrieved from https://arxiv.org/abs/1705.05487.

Dernoncourt, F., Lee, J. Y., Uzuner, O., & Szolovits, P. (2016). De-identification of patient notes with recurrent neural networks. Journal of the American Medical Informatics Association, 24(3), 596–606. doi: 10.1093/jamia/ocw156.

Directive (EU) 2016/680 of the European Parliament and of the Council. Retrieved from http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016L0680&from=EN.

Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. (1995). Retrieved from https://eur-lex.europa.eu/eli/dir/1995/46/oj.

Donahue, C., Elliot, N., Ross, V., & Moxley, J. (2017, January). At the intersection of writing program administration, the digital humanities, and STEM education: Corpus methods as a lens for reader response. Paper presented at the Modern Language Association Convention. Philadelphia, PA.

Douglass, M., Clifford, G., Reisner, A., Moody, G., & Mark, R. (2005). Computer-assisted de-identification of free text in the MIMIC II database. Computers in Cardiology, 32, 331–334.

Electronic Communications Privacy Act of 1986 (ECPA), 18 U.S.C. § 2510-22. Retrieved from https://it.ojp.gov/PrivacyLiberty/authorities/statutes/1285.

Elliot, N., Walkup, K., & Moxley, J. (2016). Preface to workshop two: Writing analytics, data mining, and writing studies. Proceedings of the 9th International Conference on Educational Data Mining. Raleigh, NC: EDM.

Fair Credit Reporting Act. 15 U.S.C. § 1681. Retrieved from https://www.ftc.gov/system/files/fcra_2016.pdf.

Family Educational Rights and Privacy Act of 1974 (FERPA). Retrieved from https://epic.org/privacy/student/ferpa/.

Federal Policy for the Protection of Human Subjects ('Common Rule'). Retrieved from https://www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule/index.html.

Federal Trade Commission Enforcement of the U.S.-EU and U.S.-Swiss Safe Harbor Frameworks. Retrieved from https://www.ftc.gov/tips-advice/business-center/guidance/federal-trade-commission-enforcement-us-eu-us-swiss-safe-harbor.

Foufi, V., Gaudet-Blavignac, C., Chevrier, R., & Lovis, C. (2017). De-identification of medical narrative data. In R. Engelbrecht, R. Balicer, & M. Hercigonja-Szekeres (Eds.), The practice of patient centered care (pp. 23–27). Amsterdam: IOS Press.

Freedom of Information Act of 1966. Retrieved from http://congressionaldata.org/the-original-text-of-the-freedom-of-information-act/.

GAO. (2008). Privacy: alternatives exist for enhancing protection of personally identifiable information: Report to congressional requesters (Report # GAO-08-536). Washington, DC: US Govt. Accountability Office. Retrieved from http://purl.access.gpo.gov/GPO/LPS111810.

Gellman, R. (2017). Fair information practices: A basic history. Version 2.18. Retrieved from https://bobgellman.com/rg-docs/rg-FIPshistory.pdf.

Gilbert, F. (2009). Global privacy and security law. Austin, TX: Wolters Kluwer Law & Business.

Greller, W., & Drachsler, H. (2012). Translating learning into numbers: A generic framework for learning analytics. Educational Technology & Society, 15(3), 42–57.

Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Retrieved from https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#protected.

Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22), 2402–2410.

Gupta, D., Saul, M., & Gilbertson, J. (2004). Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research. American Journal of Clinical Pathology, 121(2), 176–186. Retrieved from https://doi.org/10.1309/E6K33GBPE5C27FYU.

H.R. 395 — 108th Congress: Do-Not-Call Implementation Act. Retrieved from https://www.govtrack.us/congress/bills/108/hr395.

H.R. 4709 — 109th Congress: Telephone Records and Privacy Protection Act of 2006. Retrieved from https://www.govtrack.us/congress/bills/109/hr4709.

H.R.3103 - Health Insurance Portability and Accountability Act of 1996. Retrieved from https://www.congress.gov/bill/104th-congress/house-bill/3103/text.

H.R.493 - Genetic Information Nondiscrimination Act of 2008. Retrieved from https://www.congress.gov/bill/110th-congress/house-bill/493/text.

H.R.4943 - CLOUD Act. Retrieved from https://www.congress.gov/bill/115th-congress/house-bill/4943/text.

Haber, S., Hatano, Y., Honda, Y., Horne, W., Miyazaki, K., Sander, T., … Yao, D. (2007). Efficient signature schemes supporting redaction, pseudonymization, and data deidentification (Report # HPL-2007-191). Retrieved from http://hpl.hp.com/techreports/2007/HPL-2007-191.pdf.

Hay, M., Miklau, G., Jensen, D., Towsley, D., & Weis, P. (2008). Resisting structural re-identification in anonymized social networks. Proceedings of the VLDB Endowment, 1(1), 102–114.

Health Insurance Portability and Accountability Act of 1996. Retrieved from https://www.gpo.gov/fdsys/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf

Hinton, G. E., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.

Ingersoll, G.S., Morton, T.S., & Farris, A.L. (2013). Taming text: How to find, organize, and manipulate it. New York: Manning Publications.

Jiang, J. (2012). Information extraction from text. In C.C. Aggarwal & C. Zhai (Eds.), Mining text data (pp. 11–41). New York: Springer Verlag.

Jolly, I. (2017). Data protection in the United States: Overview. Retrieved from https://uk.practicallaw.thomsonreuters.com/6-502-0467?transitionType=Default&contextData=(sc.Default)&firstPage=true&bhcp=1

Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. Retrieved from https://arxiv.org/abs/1404.2188

Kauppinen, A., Leijen, D., Moxley, J., & Wärnsby, A. (2016). Pre-conference workshop: Responsible Action: International Higher Education Writing Research Exchange. Presented at the Conference on College Composition and Communication Convention, Houston, TX.

Khalil, M., & Ebner, M. (2016). De-identification in learning analytics. Journal of Learning Analytics, 3(1), 129–138. Retrieved from https://doi.org/10.18608/jla.2016.31.8.

Kudo, T., & Matsumoto, Y. (2004). A boosting algorithm for classification of semi-structured text. Proceedings: Empirical Methods in Natural Language Processing 2004, 301–308.

Kumar, U., & Helmy, A. (2009). Human behavior and challenges of anonymizing WLAN traces. Proceedings from IEEE Globecom’09: Global Communications Conference.

Labeau, M., Löser, K., & Allauzen, A. (2015). Non-lexical neural architecture for fine-grained POS tagging. Proceedings: 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 232–237.

Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. Retrieved from https://arxiv.org/abs/1603.01360.

Leahy, P. (2015). Bipartisan coalition led by Senators Lee and Leahy introduce legislation to ban bulk collection under Section 215 USA FREEDOM Act of 2015 would bring historic reforms to surveillance authorities. Retrieved from https://www.leahy.senate.gov/press/bipartisan-coalition-led-by-senators-lee-and-leahy-introduce-legislation_to-ban-bulk-collection-under-section-215.

LeCun, Y. (1986). Learning processes in an asymmetric threshold network. In E. Bienenstock, F. Fogelman-Soulié, & G. Weisbuch (Eds.), Disordered systems and biological organizations (pp. 233–240). Les Houches, France: Springer.

Lee, J.Y., Dernoncourt, F., Uzuner, O., & Szolovits, P. (2016). Feature-augmented neural networks for patient note de-identification. Proceedings of the Clinical Natural Language Processing Workshop, 17–22, Osaka, Japan.

Leijen, D., & Moxley, J. (2017, June). The value of peer review across different institutional, national, and curricular contexts. Paper presented at the 9th Conference of the European Association for Teaching Academic Writing (EATAW). University of London, London, UK.

Li, M. (2018). Scalable natural language de-identification based on machine learning approaches (Doctoral dissertation). Retrieved from https://etd.library.vanderbilt.edu/available/etd-03262018-113355/unrestricted/Li.pdf.

Lopez-Otero, P., Docio-Fernandez, L., Abad, A., & Garcia-Mateo, C. (2017, August). Depression detection using automatic transcriptions of de-identified speech. Paper presented at INTERSPEECH 2017, Stockholm, Sweden.

Ludwig, J. (2009). Australian government. Enhancing national privacy protection. Australian government first stage response to the Australian law reform commission (Report # 108). Retrieved from https://www.alrc.gov.au/sites/default/files/pdfs/government_1st_stage_response.pdf.

Machanavajjhala, A., Gehrke, J., & Kifer, D. (2006). l-diversity: Privacy beyond k-anonymity. Proceedings from ICDE’06: 22nd International Conference on Data Engineering, 24–24.

McCulloch, W.S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.

Meldau, E. (2018). Deep neural networks for inverse de-identification of medical case narratives in reports of suspected adverse drug reactions. Retrieved from https://kth.diva-portal.org/smash/get/diva2:1185934/FULLTEXT01.pdf.

Menikoff, J., Kaneshiro, J., & Pritchard, I. (2017). The Common Rule, updated. New England Journal of Medicine, 376, 613–615.

Meystre, S.M., Friedlin, F.J., South, B.R., Shen, S., & Samore, M.H. (2010). Automatic de-identification of textual documents in the electronic health record: A review of recent research. BMC Medical Research Methodology, 10(70), 1–16. Retrieved from https://doi.org/10.1186/1471-2288-10-70

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. Retrieved from https://arxiv.org/abs/1301.3781.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. Proceedings: Advances in neural information processing systems (NIPS 2013), 3111–3119.

Mikolov, T., Yih, W., & Zweig, G. (2013c). Linguistic regularities in continuous space word representations. Proceedings: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.

Minsky, M., & Papert, S. (1969). Perceptrons. An introduction to computational geometry. Cambridge, Mass: M.I.T. Press.

Modifications to the HIPAA Privacy, Security, Enforcement, and Breach Notification Rules under the Health Information Technology for Economic and Clinical Health Act and the Genetic Information Nondiscrimination Act; Other modifications to the HIPAA Rules. Retrieved from https://www.federalregister.gov/documents/2013/01/25/2013-01073/modifications-to-the-hipaa-privacy-security-enforcement-and-breach-notification-rules-under-the.

Moxley, J. (2013). Big data, learning analytics, and social assessment. The Journal of Writing Assessment, 6(1). Retrieved from http://www.journalofwritingassessment.org/article.php?article=68.

Moxley, J. (2016). From FYC to professional & technical communication, partnerships with other U.S. post-secondary programs, NSF funding, and STEM education: Rethinking document critique, collaboration, and research across cultures and contexts. Paper presented at the Computers and Writing, Rochester, NY.

Moxley, J. (2017a, January). e-Portfolios and digital learning: The future of corpus studies in the domain of writing analytics. Paper presented at the 8th Annual Forum on Digital Learning and ePortfolios, San Francisco, CA.

Moxley, J. (2017b, January). Welcome to writing analytics. Paper presented at the 4th International Conference on Writing Analytics: Writing Analytics, Data Mining, and Student Success 2017, St. Petersburg, FL.

Moxley, J. (2017c, March). Evidence-based framework for structuring learning opportunities: Peer review, STEM, and digital feedback’s new culture. Poster presented at the Carnegie Foundation Summit, San Francisco, CA.

Moxley, J., & Eubanks, D. (2016). On keeping score: Instructors' vs. students' rubric ratings of 46,689 essays. Journal of the Council of Writing Program Administrators, 39(2), 53–80.

Moxley, J., Ross, V., Elliot, N., Rudniy, A., & Trauth, E. (2016). The role of instructor and peer feedback in improving the cognitive, interpersonal, and intrapersonal competencies of student writers in STEM courses. Presented at the International Writing Across the Curriculum Conference, University of Michigan, MI.

Moxley, J., Ross, V., & Trauth, E. (2016). Making friends in STEM: Experiences, challenges, and triumphs in collaboration, data mining, peer review, and assessment. Presented at the Council for Writing Program Administrators, Raleigh, NC.

Moxley, J., & Walkup, K. (2016). Mapping writing analytics. In Proceedings of the 9th International Conference on Educational Data Mining. Raleigh, NC: EDM.

Moxley, J., Wärnsby, A., Kauppinen, A., Leijen, D., Aull, L., Anderson, L., & Walkup, K. (2017, February). Politeness, social and intrapersonal presence in student peer reviews: A cross-cultural analysis. Roundtable held at Writing Research Across Borders IV, Bogotá, Colombia.

MUC-7 dataset. Available from https://www-nlpir.nist.gov/related_projects/muc/proceedings/muc_7_toc.html.

Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. Proceedings: 2008 IEEE Symposium on Security and Privacy, 111–125.

Neamatullah, I., Douglass, M. M., Lehman, L. H., Reisner, A., Villarroel, M., Long, W. J., … Clifford, G.D. (2008). Automated de-identification of free-text medical records. BMC Medical Informatics and Decision Making, 8(32), 1–17. Retrieved from https://doi.org/10.1186/1472-6947-8-32.

O’Keefe, C. M., Otorepec, S., Elliot, M., Mackey, E., & O’Hara, K. (2017). The de-identification decision-making framework, 1–76. Retrieved from https://publications.csiro.au/rpr/download?pid=csiro:EP173122&dsid=DS2.

OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data. Retrieved from http://www.oecd.org/internet/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm.

OECD Privacy Guidelines (2013). Retrieved from http://www.oecd.org/internet/ieconomy/privacy-guidelines.htm.

OECD. (2013). Privacy Expert Group Report on the Review of the 1980 OECD Privacy Guidelines, OECD Digital Economy Papers, No. 229, Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/5k3xz5zmj2mx-en.

Package ‘openNLP’. (2016). Retrieved from https://cran.r-project.org/web/packages/openNLP/openNLP.pdf.

Parker, D. B. (1985). Learning-logic: Casting the cortex of the human brain in silicon. Technical Report Tr-47, Center for Computational Research in Economics and Management Science. Cambridge, MA: MIT.

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. Proceedings: EMNLP 2014 Conference on Empirical Methods in Natural Language Processing, 1532–1543.

Phelps-Hillen, J. (2017). Institutional Review Boards and Writing Studies research: A justice-oriented study (Doctoral dissertation). Retrieved from Scholar Commons Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/6742.

Pinto, A., Oliveira, H.G., & Alves, A.O. (2016). Comparing the performance of different NLP toolkits in formal and social media text. Proceedings: 5th Symposium on Languages, Applications and Technologies (SLATE 2016), 3:1–3:16. doi: 10.4230/OASIcs.SLATE.2016.3.

Power, S. (2008). Privacy (Cross-border Information) Amendment Bill. Government Bill. Retrieved from http://www.legislation.govt.nz/bill/government/2008/0221/latest/whole.html.

Privacy Protection Act of 1980. Retrieved from https://www.justice.gov/usam/criminal-resource-manual-661-privacy-protection-act-1980.

Public Law 111–5. Title XIII—Health Information Technology. Retrieved from https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/hitechact.pdf.

Public Law 98-549. Retrieved from https://transition.fcc.gov/Bureaus/OSEC/library/legislative_histories/1286.pdf.

Quinlan, J.R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

Records, computers and the rights of citizens. Report of the Secretary's Advisory Committee on Automated Personal Data Systems. (1973). Retrieved from https://epic.org/privacy/hew1973report/default.html

Regulation (EU) 2016/679 of the European Parliament and of the Council. Retrieved from http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN.

Right to Financial Privacy Act of 1978. Chapter 35—Right to Financial Privacy. Retrieved from http://uscode.house.gov/view.xhtml?path=/prelim@title12/chapter35&edition=prelim.

Ross, V., Elliot, N., Rudniy, A., & Moxley, J. (2016). Writing analytics, data mining, & writing studies. Presented at the 9th International Conference on Educational Data Mining (EDM 2016), Raleigh, NC.

Ross, V., & LeGrand, R. (2017). Assessing writing constructs: Toward an expanded view of inter-rater reliability. The Journal of Writing Analytics, 1, 227–275.

Ross, V., Liberman, M., Ngo, L., & LeGrand, R. (2017). Weighted log-odds-ratio, informative Dirichlet prior method to compare peer review feedback for top and bottom quartile college students in a first-year writing program. Retrieved from http://ceur-ws.org/Vol-1633/ws2-paper4.pdf

Ruch, P., Baud, R., Rassinoux, A., Bouillon, P., & Robert, G. (2000). Medical document anonymization with a semantic lexicon. Proceedings from AMIA Symposium, 729–733.

Rudniy, A. (2018). Case study—Data warehouse design for evidence-based research. IEEE Transactions on Professional Communication. Forthcoming.

S. 1490 — 111th Congress: Personal Data Privacy and Security Act of 2009. Retrieved from https://www.govtrack.us/congress/bills/111/s1490.

Saeed, M., Lieu, C., Raber, G., & Mark, R.G. (2002). MIMIC II: A massive temporal ICU patient database to support research in intelligent patient monitoring. Computers in Cardiology, 29, 641–644. doi: 10.1109/CIC.2002.1166854.

Sang, E.F.T.K., & De Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. Proceedings: Seventh Conference on Natural Language Learning at HLT-NAACL. Association for Computational Linguistics, 142–147.

Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). BRAT: A Web-based tool for NLP-assisted text annotation. Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 102–107.

Stubbs, A., Kotfila, C., & Uzuner, O. (2015). Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task track 1. Journal of Biomedical Informatics, 58(S), 11–19.

Summary of the HIPAA Security Rule. Retrieved from https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html.

Sweeney, L. (1996). Replacing personally-identifying information in medical records, the Scrub system. Proceedings: A conference of the American Medical Informatics Association. AMIA Fall Symposium, 333–337.

Sweeney, L. (1998). Datafly: A system for providing anonymity in medical data. In T. Lin, & S. Qian (Eds.), Database security, XI: Status and prospects (pp. 356–381). Amsterdam: Elsevier Science.

Sweeney, L. (2000). Uniqueness of simple demographics in the U.S. Population (Report # LIDAPWP4). Pittsburgh, PA: Carnegie Mellon University.

Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557–570.

Szarvas, G., Farkas, R., & Busa-Fekete, R. (2007). State-of-the-art anonymization of medical records using an iterative machine learning framework. Journal of the American Medical Informatics Association, 14(5), 574–580.

Taguchi, K., & Aramaki, E. (2018). Novel location de-identification for machine and human. UISTDA 2018. Retrieved from http://ceur-ws.org/Vol-2068/uistda5.pdf.

Telephone Consumer Protection Act 47 U.S.C. § 227. Retrieved from https://transition.fcc.gov/cgb/policy/TCPA-Rules.pdf.

The Drivers Privacy Protection Act (DPPA) and the privacy of your state motor vehicle record. Retrieved from https://epic.org/privacy/drivers/.

The Privacy Act of 1974 (As Amended). Public Law 93-579, as codified at 5 U.S.C. 552a. Retrieved from http://www.dodig.mil/Portals/48/Documents/Programs/Privacy%20Program/pa1974.pdf?ver=2017-04-14-103528-910.

The Stanford Natural Language Processing Group. Retrieved from https://nlp.stanford.edu/software/.

Troyano, J., Díaz, V., Enríquez, F., & Romero, L. (2004). Improving the performance of a named entity extractor by applying a stacking scheme. In C. Lemaitre, C.A. Reyes, & J.A. Gonzalez (Eds.), IBERAMIA 2004, LNAI 3315, 295–304.

USA FREEDOM Act of 2015. Uniting and Strengthening America by Fulfilling Rights and Ensuring Effective Discipline Over Monitoring Act of 2015. Retrieved from https://www.congress.gov/bill/114th-congress/house-bill/2048.

USA PATRIOT Act of 2001. Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act of 2001. Retrieved from https://www.gpo.gov/fdsys/pkg/BILLS-107hr3162enr/pdf/BILLS-107hr3162enr.pdf.

Uzuner, O., Luo, Y., & Szolovits, P. (2007). Evaluating the state-of-the-art in automatic de-identification. Journal of the American Medical Informatics Association, 14(5), 550–563. Retrieved from https://doi.org/10.1197/jamia.M2444.

Video Privacy Protection Act of 1988. Retrieved from https://epic.org/privacy/vppa/

Wachtler, J., Khalil, M., Taraghi, B., & Ebner, M. (2016). On using learning analytics to track the activity of interactive MOOC videos. In M. Giannakos, D.G. Sampson, L. Kidzinski, & A. Pardo (Eds.), Proceedings of the LAK 2016 Workshop on Smart Environments and Analytics in Video-Based Learning (pp. 8–17). Edinburgh, Scotland: CEUR-WS. Retrieved from http://ceur-ws.org/Vol-1579/paper3.pdf.

Wellner, B., Huyck, M., Mardis, S., Aberdeen, J., Morgan, A., Peshkin, L., … Hirschman, L. (2007). Rapidly retargetable approaches to de-identification in medical records. Journal of the American Medical Informatics Association, 14(5), 564–573.

Yoose, B. (2017). Balancing privacy and strategic planning needs: A case study in de-identification of patron data. Journal of Intellectual Freedom and Privacy, 2(1). DOI: http://dx.doi.org/10.5860/jifp.v2i1.

Zeide, E. (2016). Student privacy principles for the age of big data: Moving beyond FERPA and FIPPs. Drexel Law Review, 8(2), 339–394.

Zhao, Y., Zhang, K., Ma, H., & Li, K. (2018). Leveraging text skeleton for de-identification of electronic medical records. BMC Medical Informatics and Decision Making, 18(1), 1–18.

Zhou, G., Zhang, J., Su, J., Shen, D., & Tan, C. (2005). Recognizing names in biomedical texts: A machine learning approach. Bioinformatics, 20(7), 1178–1190.

Published
2018-12-09