Data Mining Meets Medical & Educational Records – Privacy is Gone

25 Apr, 2015 by in data mining, FERPA, HIPAA, personally identifiable information, privacy, RGE 15 comments

A few weeks ago I opened my mail and this letter was enclosed. I was astounded. Could this be real?

It’s like a page ripped out of a dystopian futuristic novel.

Uglies series by Scott Westerfeld anyone?

Genetic modifications for thinness?

We need 100 thin people?

What concerned me most, however, was how they got my information in the first place.

“We are contacting you for this study through a medical research organization at the University of Utah called the Resource for Genetic and Epidemiologic Research (RGE). The RGE helps medical researchers at the University of Utah to access information from State records. You are being contacted to participate in this study because it appears that from the recorded pre-pregnancy weights of one of your children’s birth certificates that you may meet our definition of being thin and that other members in your extended family are also thin.”

How do they know this?

From mining State records and medical information…

What about HIPAA?

What about Utah Code Annotated, Title 26, Chapter 2, Section 22 that protects birth certificates as PRIVATE information?

Extended family members?

Are they making a pedigree chart?

Yes, yes they are. By executive order. Since 1982.

The Utah Resource for Genetic and Epidemiologic Research (RGE) was established by Executive Order of the Governor of Utah on July 14, 1982, as a “data resource for the collection, storage, study, and dissemination of medical and related information”…”RGE governs access to the Utah Population Database (UPDB), which includes family history records, vital records, cancer registry records, driver license records, and others. These records are linked together to form multi-generational pedigrees as well as longitudinal person-level data.”

This letter is where we are as a society. It’s like some creepy futuristic dystopian novel.



Genetically modified traits for “thinness” based on data mining and research. Datapalooza? Data mining?

Hard to believe….until it showed up in my mailbox.


People ask me all the time why I am concerned about data mining in our schools and elsewhere. Here is a prime example of just how Orwellian things have become.


Each time you register your child for public school, you bring in a birth record, you offer up their Social Security number in some cases, the exact place to find them, and their confidential medical info: their personally identifiable information (PII). This PII is now being shared and stored in a statewide longitudinal data system, or SLDS. (See page 13 onward of the UTREx data file specs to get a glimpse of just HOW MUCH info is being collected and stored.)

The data is maintained by the Utah Data Alliance, and ultimately it gets served up from your local school nightly, via the data portals UTREx and TIDE, to the State and to Washington, D.C., and is available to private corporate vendors with a “research interest.”

The Utah Data Alliance is a multi-agency collaborative partnership organized to:

Develop and maintain Utah’s only comprehensive statewide longitudinal data system (SLDS) to enable examination of educational progress and outcomes over time, from preschool, and K12 through postsecondary public education and into the workforce;

This is a national effort. A data mining goldmine where all states and systems align. If each state has an SLDS and they can collaborate, imagine how far our data can spread.





“Better decisions require better information. This principle lies at the heart of the Statewide Longitudinal Data Systems (SLDS) Grant Program. Through grants and a growing range of services and resources, the program has helped propel the successful design, development, implementation, and expansion of K12 and P-20W (early learning through the workforce) longitudinal data systems. These systems are intended to enhance the ability of States to efficiently and accurately manage, analyze, and use education data, including individual student records. The SLDSs should help states, districts, schools, educators, and other stakeholders to make data-informed decisions to improve student learning and outcomes;”

“The long-term goal of the [SLDS] program is to enable all States to create comprehensive systems that permit the generation and use of accurate and timely data, support analysis and informed decision-making at all levels of the education system, increase the efficiency with which data may be analyzed to support the continuous improvement of education services and outcomes, facilitate research to improve student academic achievement and close achievement gaps, support education accountability systems, and simplify the processes used by State educational agencies to make education data transparent through Federal and public reporting.” – U.S. Department of Education, 2009

See the correlation between what is happening in my letter and what is happening in most schools across the nation?

Each time your child logs on to Utah Compose to practice writing or to take a SAGE (PARCC) exam, their data and responses are being stored and shared.





Do you want the American Institutes for Research (AIR), one of the world’s largest behavioral and social science research and evaluation organizations, to have your child’s personally identifiable data? Because they have it.

So does The Department of Workforce Services, The Utah College of Applied Technology, Utah Education Network, the Utah Education Policy Center (at the University of Utah), and the Utah State Office of Education.

Our state has already pre-loaded all student information into the SLDS. Any time a student signs into the SAGE Portal for a formative assignment, practice session or any other testing mechanism, they are providing additional information to their files. The only way to stop the flow of information is to not have them participate at any level. There is no opt out of the statewide longitudinal data system allowed under current law.

The Utah State Office of Education and AIR both contract with other researchers and organizations. That data could fall into almost anyone’s hands. AIR’s data sharing and research goals are international in reach:

AIR’s international work improves the quality of life in developing countries by using rigorous research and evaluation to enhance education and social development. With a wide variety of local partners and multinational organizations, we develop, pilot and implement field-based international development activities in lower- and middle-income countries. We provide governments and international aid agencies with the educational assessment tools and expertise they need to measure progress in student achievement and school effectiveness. Our international work is conducted by AIR’s International Development, Evaluation and Research program, as well as our Education and Health and Social Development programs.

Data being used, all without your knowledge, without informed consent. It’s legal under FERPA.

(Read more here and at the Washington Post.)

The Family Educational Rights and Privacy Act (FERPA) governs the protection and permissible uses by authorized representatives of student administrative data, including the disclosure and transfer of personally identifiable information (PII) in education records. The U.S. Department of Education (ED) released revised regulations in December 2011 to reconcile the statute with other federal laws that incented the development and use of state longitudinal data systems (SLDS).

Opt out.

Slow the data machine.

15 Responses to “Data Mining Meets Medical & Educational Records – Privacy is Gone”

  1. Brooke

    The only question I have is that the letter references “pre-pregnancy weights of one of your children’s birth certificates.” I don’t recall pre-pregnancy weight being listed on birth certificates. Did they misstate that somehow?

      • April

        I am a birth certificates clerk for the hospital I work for in Kentucky. The birth certificate worksheet the parents fill out does include a question about the pre-pregnant weight and the current weight.

        • Heather

          Good to know April, thank you! My question is are birth certificate worksheets public record or would they be covered under HIPAA? Because it seems to me, when a mother gives this information to a health care provider (or in the care of a hospital after birth) they are assuming it will be used for health care purposes and protected/ private, not released to public research companies.

    • LeAnn

      Data Mining Meets Medical Records … I would have thought we were safe with HIPAA and all, but now I’m beginning to wonder. A few weeks ago I was given a prescription for an anti-inflammatory drug that is mostly used for RA patients. I am in a cast from the knee down and experiencing a great deal of swelling when it is not elevated. About a week later, I received a letter from a totally different drug company trying to convince me that their drug could help me with my RA. I don’t have RA, but apparently my prescription from my local Walgreens tells drug companies otherwise. How in the world did they get my name, address, etc.? Is it my doctor selling that information or the Walgreens Pharmacy? Either one, I’m not too happy about it!

  2. Brooke Anderson

    What personally identifiable information is included in the SLDS? You provided a link to the data dictionary for Utrex, but not one for the UDA. They are separate entities and databases.

    Out of the information included in Utrex, which information do you think should not be included?

    • Heather

      Brooke, there is also a link to Utah Data Alliance included in the post. If someone can find their data dictionary, that would be helpful as I have been unable to find it online. I do not think records that are sensitive, for instance, disability information and detailed diagnoses should be kept online electronically. Medical records are not covered under HIPAA when they are listed in a child’s school record; they are only protected under FERPA. They are now available for research companies. There is too much risk of a hack or data breach. Also page 114 of the UTREX data dictionary “Incident Association Record” is a big eye opener as it discusses and documents involvement of a youth with a weapon.

      • Brooke Anderson

        “I do not think records that are sensitive, for instance, disability information and detailed diagnoses should be kept online electronically.” Well, I think you are fighting the tide on that one. Medical, educational, legal, and financial records are electronic now. It is extremely unlikely that all organizations with sensitive information will choose to return to paper-based record-keeping.

        Utrex does not include detailed medical information. It includes information on disability because students who are enrolled in school receive services for disabilities, and depending on the service (like braille readers for the blind or classroom audio systems for the hard-of-hearing or self-contained placement) that is part of their educational record. That record, and those services, need to transfer when students change schools, either moving up grade levels or moving between districts.

        “114 of the UTREX data dictionary “Incident Association Record” is a big eye opener as it discusses and documents involvement of a youth with a weapon.” Incidents where students bring weapons to school are documented as per safe school laws. Is this documentation something that shouldn’t happen?

        “They are now available for research companies” Are you saying Utrex records are available to research companies? Which companies?

      • Brooke Anderson

        “I found this link for the SLDS data dictionary”
        I’m afraid this isn’t the UDA’s SLDS data dictionary. This is the data dictionary for USOE data warehouse, which is a separate entity and database from the UDA. This data dictionary is for the warehouse that was started in 1998. This particular warehouse is the one used for NCLB and in-house reporting.

        I can’t find a data dictionary for the UDA’s SLDS either, but I do know it does not contain PII. The purpose of the SLDS is to house aggregate and de-identified information to analyze the performance of the state education system as a whole, not individual students. For example, what happens to Utah’s high school dropouts? As it turns out, most HS dropouts will re-engage with education in less than a year, usually in the form of adult ed or community college. Without an SLDS, there’s no opportunity to accurately answer big-picture policy questions. It’s not an Orwellian plot to monitor everyone; it’s a way to answer basic questions about UT education with something better than hunches. You can find more about the UDA and data privacy here:

        “Also you can see that personally identifiable information and individual student (not aggregated or de-identified) data is given to AIR , the proctors of SAGE formative, interim and summative as they have a contract with USOE” This document doesn’t reference AIR.

        I’m looking through this document, the AIR contract, and it does not indicate the data is given to AIR except to build the system indicated in the contract. Is there an indication within the contract that AIR is allowed to retain or mine data collected through SAGE?

        • Heather

          Nothing prevents the SLDS from collecting all 400 data points listed at the national data collection model of the federal NCES site. Proper protections are missing. It’s not our responsibility to prove what SIS and SLDS collect. It’s the state’s responsibility to prove to parents that the children’s data is properly protected, and that parental consent is honored.

          If the point of computer adaptive testing is to track a student’s learning through their school year, over their school career, and into the workforce, then the point is to collect personally identifiable information. Look at again on page 2. In the diagram it illustrates that anyone who contracts with USOE will have access to individual student data. AIR contracted with USOE and they have access to PII and data. They serve up formative assignments which are now available and the data uploads continuously to them via TIDE and the SAGE portal. In my mind this compromises the State databases and children’s privacy.

  3. David Gillespie

    I thought I should point out that without the RGE, many of the advances in cancer treatment and other genetic diseases would not have happened. I know this because I worked on the epidemiology of BRCA1 in the 90’s (a gene involved in breast and ovarian cancer that was discovered with the RGE), and now do research at Huntsman Cancer Institute in Glioblastoma. The Utah Population Database provides the multigenerational data necessary to hunt down multi-gene interactions and the foundation for personalized treatments that are finally starting to have clinical impact. I can’t comment on how all this is linked into the public schools and the Utah Data Alliance, but before you decry this as a huge Orwellian government control plot, remember that the reason it was established was to give medical research the data needed to save lives, and that is what it is doing.

    • Dr. Gary Thompson

      I save lives with intimate personal data also….with one exception….

      I ask permission.

      It’s called “informed consent”.

      That said, I agree with your post comments. Data can and does save lives. So do ethics.

      “Above ALL else, do no harm”.

      • David Gillespie

        I’m only speaking about the Utah Population Database, because I know that the information we use is only collected by informed consent, and is strictly protected. Most people sign the consent form without really thinking about it, or have given permission at some point for their medical records to be accessed for research without realizing it.

