Welcome to Component 24, Healthcare and Data Analytics, Unit 1, Introduction to Healthcare Data Analytics. This is Lecture A. This unit introduces the basics of working with healthcare data for the novice.
The different types of data are explored, as well as the array of technology and tools available for working with data. Big data is defined, and the special challenges related to working with data are discussed. The objectives for this unit, Introduction to Healthcare Data Analytics, Lecture A, are to give a basic overview of data analytics in healthcare and describe the nine steps of the data analytics process.
In 2011, Peter Sondergaard, Senior Vice President and Global Head of Research for the worldwide information technology research and advisory company Gartner, stated that information is the oil of the 21st century and analytics is the combustion engine. So, what exactly is analytics, and why is it so important to 21st-century healthcare? The Institute of Medicine, in their 2012 report titled Best Care at Lower Cost: The Path to Continuously Learning Healthcare in America, stated that America's healthcare system has become far too complex and costly to continue business as usual. Pervasive inefficiencies, an inability to manage a rapidly deepening clinical knowledge base, and a reward system poorly focused on key patient needs all hinder improvements in the safety and quality of care and threaten the nation's economic stability and global competitiveness. Achieving higher quality care at lower cost will require fundamental commitments to the incentives, culture, and leadership that foster continuous learning, as the lessons from research and each care experience are systematically captured, assessed, and translated into reliable care.
They define a learning healthcare system as a system designed to generate and apply the best evidence for the collaborative healthcare choices of each patient and provider, to drive the process of discovery as a natural outgrowth of patient care, and to ensure innovation, quality, safety, and value in healthcare. Consider the various information systems you've learned about so far. A hospital will likely have an electronic health record system, as well as specialized departmental systems for laboratory, diagnostic imaging, pharmacy, nutrition services, billing, anatomic pathology, and so on. Each of these systems is designed and intended for clinical use.
In other words, patient care. And so they capture specific data about the patient.
However, none of these systems has a complete set of data that can be used for analysis and reporting, either for an individual patient or for a group of patients, such as all patients who were admitted in January with a certain diagnosis. Obtaining deep insight into what is happening with individual patients, as well as across groups of patients, requires aggregating data together from many systems and performing statistical analyses of this aggregated data.
In contrast to the various clinical systems discussed on the previous slide, a clinical data warehouse brings together data for a patient into a single, coordinated location, and this location is used for analysis and reporting purposes. This is accomplished via a process known as Extract, Transform, Load, or ETL, which retrieves data from the various clinical systems, synchronizes data formats and cleans up the data in a process called transformation, and then imports the data into the database of the clinical data warehouse. The transformation process is especially important, as data can be stored in a variety of forms across systems.
For example, a laboratory system might use the letters M, F, or U for patient gender, male, female, or unknown, while the radiology information system might use 1, 2, or 9 instead. However, they must match the designations used in the clinical data warehouse, and that process of converting them to match is called transformation. Another important step is ensuring that all of a patient's records from the various systems are linked together. This typically requires a master patient index, sometimes called a master person index, to link a patient's various identifiers across systems.
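To make the transformation and linking steps concrete, here is a minimal sketch in Python using the pandas library. The gender codes follow the example above, but the table layout, column names, and master patient index values are hypothetical assumptions for illustration, not the structure of any particular vendor system.

```python
import pandas as pd

# Hypothetical extract from a radiology information system that codes gender as 1/2/9.
radiology = pd.DataFrame({
    "local_patient_id": ["R-001", "R-002"],
    "gender_code": [1, 2],          # 1 = male, 2 = female, 9 = unknown (assumed)
    "exam": ["chest x-ray", "head CT"],
})

# Transformation: map the source codes to the warehouse convention (M / F / U, assumed).
gender_map = {1: "M", 2: "F", 9: "U"}
radiology["gender"] = radiology["gender_code"].map(gender_map)

# Hypothetical master patient index linking each system's local identifier
# to a single enterprise identifier used by the clinical data warehouse.
mpi = pd.DataFrame({
    "local_patient_id": ["R-001", "R-002"],
    "source_system": ["radiology", "radiology"],
    "enterprise_id": ["E-1001", "E-1002"],
})

# Link the transformed records to the enterprise identifier before loading.
warehouse_ready = radiology.merge(mpi, on="local_patient_id", how="left")
print(warehouse_ready[["enterprise_id", "gender", "exam"]])
```

In practice, dedicated ETL tools perform these steps at scale, but the logic is the same: map each source code to the warehouse convention, then link records through a shared identifier from the master patient index.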
Now that you have an understanding of the need for a centralized, coordinated location for patient data that can be used for analysis and reporting, we'll define the term analytics and explore the different types of analytics. What is analytics? Isn't it the same thing as statistics? The term analytics has been used in a variety of ways and with different meanings. In fact, Gartner stated that analytics has emerged as a catch-all term for a variety of different business intelligence (BI) and application-related initiatives. In 2015, the National Institute of Standards and Technology (NIST) issued a formal definition of analytics as follows.
The term analytics refers to the discovery of meaningful patterns in data, and is one of the steps in the data lifecycle of collection of raw data, preparation of information, analysis of patterns to synthesize knowledge, and action to produce value. As shown in this diagram, analytics is the entire process of data collection, extraction, transformation, analysis, interpretation, and reporting. It includes statistical analysis as one of the steps.
Further, the NIST stated that analytics is used to refer to the methods, their implementations and tools, and the results of the use of the tools as interpreted by the practitioner. The analytics process is the synthesis of knowledge from information. IBM in 2013 categorized analytics into three types.
Descriptive uses business intelligence and data mining to ask, what has happened? Predictive uses statistical models and forecasts to ask, what could happen? Prescriptive uses optimization and simulation to ask, what should we do?
To these three types, Gartner adds a fourth type, diagnostic analytics, which they define as a form of advanced analytics which examines data or content to answer the question, why did it happen? As shown in this diagram, the simplest type of analytics starts in the lower left-hand corner with descriptive analytics. Diagnostic analytics are more valuable to the institution but also more difficult to perform.
Even more difficult and also more valuable are predictive analytics. Finally, the most difficult and also the most valuable are prescriptive analytics. Let's look at each of these now. Descriptive analytics are the simplest type of analytics and simply describe the data.
Simple statistics are used, such as the number of laboratory tests, the average age of patients, or the average length of stay in the hospital for patients with a particular diagnosis. Descriptive analytics are often presented as pie charts, bar or column charts, tables, or written narratives.
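As a minimal sketch of descriptive analytics, assuming a small, made-up admissions table (every diagnosis, age, and length of stay below is fabricated for illustration), the counts and averages mentioned above could be computed as follows.

```python
import pandas as pd

# Hypothetical admissions data; the diagnoses, ages, and lengths of stay are made up.
admissions = pd.DataFrame({
    "diagnosis": ["pneumonia", "pneumonia", "heart failure", "heart failure"],
    "age": [67, 74, 81, 59],
    "length_of_stay_days": [4, 6, 7, 5],
})

# Descriptive analytics: simple counts and averages that describe what has happened.
print("Number of admissions:", len(admissions))
print("Average patient age:", admissions["age"].mean())
print("Average length of stay by diagnosis:")
print(admissions.groupby("diagnosis")["length_of_stay_days"].mean())
```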
Gartner defines diagnostic analytics as a form of advanced analytics which examines data or content to answer the question, why did it happen? Tools used for diagnostic analytics include drill-down techniques, data discovery, and correlations. Let's start with an example before going into the formal definitions. Kaiser Permanente analyzed data on infants to develop an algorithm for classifying which babies were at risk for developing sepsis and, conversely, which babies did not need to be treated. Sepsis is described by the Mayo Clinic as a potentially life-threatening complication of an infection.
Sepsis occurs when chemicals released into the bloodstream to fight the infection trigger inflammatory responses throughout the body. This inflammation can trigger a cascade of changes that can damage multiple organ systems, causing them to fail. If sepsis progresses to septic shock, blood pressure drops dramatically, which may lead to death. Kaiser Permanente stated that judicious application of our scheme could result in decreased antibiotic treatment in 80,000 to 240,000 U.S. newborns each year.
With that example in mind, let's now look at a definition of predictive analytics and how the Kaiser Permanente case is an example of predictive analytics. Gartner states that predictive analytics has the following four attributes. First, an emphasis on prediction rather than description, classification, or clustering.
In the Kaiser Permanente example, they were trying to predict which newborns were at risk of developing a life-threatening condition so that they could treat the babies to prevent it. The second attribute defined by Gartner is rapid analysis, often in hours or days. Consider again the sepsis example. Sepsis is a rapidly progressing condition that, if it progresses to the most severe stage of septic shock, can have a 50% mortality rate. Therefore, analysis of the data to predict which infants are at risk of developing this condition must be done rapidly, not over a period of weeks or months.
The third attribute defined by Gartner is an emphasis on the business relevance of the resulting insights. Consider the word relevance and how that would apply to the example of infants with a life-threatening infection. Information that would directly affect the care and prevent infants from dying is relevant. And finally, the fourth attribute defined by Gartner is an emphasis on ease of use, thus making the tools accessible to business users. In other words, these tools should be available to the clinical staff to use.
However, it is important to note that, as Michael Wu of Lithium states, the purpose of predictive analytics is not to tell you what will happen in the future. It cannot do that. In fact, no analytics can do that.
Predictive analytics can only forecast what might happen in the future, because all predictive analytics are probabilistic in nature. This brings us then to the highest level of analytics, which is prescriptive analytics. Gartner defines prescriptive analytics as a form of advanced analytics which examines data or content to answer the question, what should be done, or what can we do to make something happen, and is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning.
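To illustrate the probabilistic nature of predictive analytics, here is a generic sketch of a classifier that outputs a risk probability rather than a certainty. It uses entirely synthetic data and a plain logistic regression; it is not the Kaiser Permanente algorithm or any real clinical model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: two made-up features per newborn (for example, a vital-sign score
# and a lab value) and a label indicating whether sepsis later developed. Entirely synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 1).astype(int)

model = LogisticRegression().fit(X, y)

# The model forecasts what *might* happen: a probability of risk, not a guarantee.
new_cases = np.array([[1.8, 0.2], [-0.5, 0.1]])
print(model.predict_proba(new_cases)[:, 1])  # estimated risk for each new case
```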
Now let's look at the steps in data analysis in more detail. Data analytics involves a sequence of steps. Number one, identify the problem. Number two, identify what data are needed and where those data are located.
Number three, develop a plan for analysis and a plan for retrieval. Number four, extract the data. Number five, check, clean, and prepare the data for analysis.
Number six, analyze and interpret the data. Number seven, visualize the data. Number eight, disseminate the new knowledge. Number nine, implement the knowledge into the organization. We will go into each of these in more detail on the next slides.
The first step is to define the problem to be studied, or in business terms, identify the business case. Why is this important to study? How will the results impact patient care or the institution?
You must have a clearly stated problem or question to guide the rest of the process. You also need to identify any stakeholders, people who have a direct interest in this problem and who need to receive the results of the analysis at the end of the process. Next, the data needed for the analysis need to be identified.
Where are the data elements located? In what system or systems, and in what database tables? Who is the contact person for each system? Who will be responsible for retrieving the data? Is there a clinical data warehouse? If not, the required data elements may be stored in different systems, requiring multiple extraction steps. A plan for retrieving the data from the various systems, along with a plan for checking that all the data required were actually retrieved, should be developed. There needs to be some way to determine how many records are expected and then actually retrieved. This may involve cross-checking against other systems. This step will require the participation of the individuals who normally perform data retrieval from the systems involved.
An analysis plan needs to be developed. A statistician should be consulted, and questions to be addressed here include: What is the population? What size does the sample need to be? What statistical tests should be performed?
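As a minimal sketch of the sample-size question in the analysis plan, here is a standard two-sample approximation; the effect size, significance level, and power below are illustrative assumptions that a statistician would set for the real study.

```python
from scipy.stats import norm

# Illustrative planning values; a statistician would choose these for the real study.
effect_size = 0.5   # standardized difference in means (Cohen's d, assumed)
alpha = 0.05        # two-sided significance level
power = 0.80        # desired statistical power

# Standard approximation for the sample size per group in a two-sample comparison of means.
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)
n_per_group = 2 * ((z_alpha + z_beta) / effect_size) ** 2
print(f"Approximately {n_per_group:.0f} patients per group")
```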
The next step is the actual extraction of the data from the system or systems involved. After the data are retrieved, the data need to be checked for completeness. Is the set of data complete? Were all the records that should have been retrieved actually retrieved? At a minimum, descriptive statistics, such as counts, must be performed at this step. At this point, changes to the extraction plan may be needed, and another extraction from the source systems may need to take place.
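Here is a minimal sketch of that completeness check, assuming expected daily record counts are available from some reference system; all dates and counts below are made up.

```python
import pandas as pd

# Expected record counts per day, obtained from a reference system (values are made up).
expected = pd.DataFrame({
    "admit_date": ["2016-01-01", "2016-01-02"],
    "expected_count": [42, 38],
})

# Records actually retrieved by the extraction step (also made up).
extracted = pd.DataFrame({
    "admit_date": ["2016-01-01"] * 40 + ["2016-01-02"] * 38,
})

# Descriptive counts: compare retrieved records against the expected totals.
actual = extracted.groupby("admit_date").size().rename("actual_count").reset_index()
check = expected.merge(actual, on="admit_date", how="left")
check["missing"] = check["expected_count"] - check["actual_count"].fillna(0)
print(check)  # any nonzero 'missing' value means the extraction plan needs revisiting
```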
Once a complete set of records is extracted from the source systems, errors in the records need to be identified and corrected. All data have errors, such as transposed letters in names and incorrect values. Decisions must be made about how to handle empty fields. Next, data must also be synchronized or transformed.
For example, patient gender in one system in the hospital may be stored as M, F, or U, while another system might use 1, 2, or 9. One set of values must be changed so that all the records are using the same values. After all necessary transformation steps have been completed, the data are then imported into the destination system where the actual data analysis and reporting will take place. This may be a system as complex as a clinical data warehouse or as simple as a desktop computer.
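As a minimal sketch of the cleaning and empty-field decisions described above (the code-mapping step itself was sketched earlier), here is one illustrative approach; the records, column names, and handling choices are assumptions, not recommendations.

```python
import pandas as pd

# Hypothetical extracted records with typical data-quality problems (all values made up).
records = pd.DataFrame({
    "patient_name": ["Smtih, John", "Doe, Jane", None],
    "gender": ["M", None, "F"],
    "length_of_stay_days": [3, None, 4],
})

# Identify empty fields so the team can decide how to handle each one.
print(records.isna().sum())

# One possible set of decisions (illustrative only):
cleaned = records.copy()
cleaned["gender"] = cleaned["gender"].fillna("U")      # treat missing gender as Unknown
cleaned = cleaned.dropna(subset=["patient_name"])      # drop rows with no patient name
cleaned["length_of_stay_days"] = cleaned["length_of_stay_days"].fillna(
    cleaned["length_of_stay_days"].median()            # impute a missing numeric field
)
print(cleaned)
```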
The data are now in the system where the analysis will be run, and it should be a complete set of data. You need to check that everything is ready for analysis. Did you get what you needed? Check and verify this against the analysis plan that was developed in step three, and confirm that you have everything needed to address the problem that was identified in step one. Now you are ready to do the actual analysis, to execute the analysis plan that was developed earlier.
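For example, if the analysis plan called for comparing average length of stay between two patient groups, a two-sample t-test is one possible choice; this minimal sketch uses made-up values and is purely illustrative.

```python
from scipy import stats

# Made-up lengths of stay (in days) for two hypothetical patient groups.
group_a = [3, 4, 4, 5, 6, 6, 7]
group_b = [5, 6, 7, 7, 8, 9, 10]

# Two-sample t-test, one illustrative choice from an analysis plan.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```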
Perform the statistical analyses and enlist the assistance of the statistician to confirm the interpretations and conclusions of your analysis. Now you need to be able to communicate the results of your analysis and how the results address the problem from step one. This communication must be very clear and rapidly understandable to the decision-makers involved
in the institution, so selecting an appropriate representation for your findings is essential. Choose a visualization that is appropriate for the type of data. For example, categorical data can be represented with column or bar charts, tables, and pivot tables, while quantitative data can be shown with histograms and a wide variety of other types of graphics, such as scatter plots and star plots.
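Here is a minimal visualization sketch using matplotlib, with made-up values: a bar chart for a categorical variable and a histogram for a quantitative one.

```python
import matplotlib.pyplot as plt

# Made-up summary values for illustration only.
diagnoses = ["pneumonia", "heart failure", "sepsis"]
admission_counts = [42, 35, 12]                    # categorical data -> bar chart
lengths_of_stay = [2, 3, 3, 4, 4, 5, 5, 6, 7, 9]   # quantitative data -> histogram

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(diagnoses, admission_counts)
ax1.set_title("Admissions by diagnosis")
ax1.set_ylabel("Number of admissions")

ax2.hist(lengths_of_stay, bins=5)
ax2.set_title("Length of stay")
ax2.set_xlabel("Days")
ax2.set_ylabel("Number of patients")

plt.tight_layout()
plt.show()
```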
Some common tools are Tableau and the Microsoft Excel chart function. Once the analysis, interpretation, and any visualizations are complete, a report must be developed. It might be a formal written document, an email, or a presentation.
Regardless of the delivery method, the report needs to clearly state the original problem, the process that was used to address the problem, and then the results of the analysis along with the supporting visualization. This represents new knowledge and needs to be distributed to the stakeholders that were identified in step one. Finally, the new knowledge needs to be implemented to address the original problem. This will require the participation of the stakeholders. For more information on these topics, read the articles Six Steps of an Analytics Project by Jaideep Khanduja and The Seven Key Steps of Data Analysis by Gwen Shapira.
Article URLs are mentioned in the reference slide at the end of this presentation. This concludes Lecture A of Component 24, Healthcare and Data Analytics, Unit 1, Introduction to Healthcare Data Analytics.
To summarize, analytics is the entire process of data collection, extraction, transformation, analysis, interpretation, and reporting. It can be categorized into three types, descriptive, predictive, and prescriptive, with diagnostic analytics sometimes added as a fourth.