Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.

Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.

The insurance company has helpfully compiled data on a large number of patients. They have included a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked based upon which medications were filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medications. For this study, you may assume that the patients qualified for our study and reasonably could have been expected to be prescribed all of the medicines we are tracking.

In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.

This project is divided into 4 parts:

  • Part 1: Summarizing the data.
  • Part 2: Answering specific question about medication adherence.
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
  • Part 4: Identifying opportunities.

Details

Part 1: Summary

How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.

This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

Part 2: Specific Questions

In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).

This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.

Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.

These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.

Questions

  1. What was the median length of followup?  What percentage of the patients had at least 1 year of records?
  2. For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
  3. How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days that the patients are taking 0, 1, 2, and all 3 medications.
  4. What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit separate linear regression models for each medicine.  Then briefly comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  1. For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
  2. Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
    1. Age
    1. Baseline Conditions
  3. How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  4. For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  5. How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
    1. Identify which patients filled a prescription in the first two weeks.
    1. Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
    1. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

Perform this analysis for each medicine and comment on the results.

  • ACE Inhibitors
    • Beta Blockers
    • Statins
  1. Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Questions

  1. Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
  2. If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
  3. How would you build on the reporting capabilities that you have created? What would you design next?

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Questions

  1. What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
  2. What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
  3. How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?

Notes

This project involves extensive coding on large data sets. As a training exercise, we are asking you to develop your skills in R and to use the data.table package for processing applications.

Part of your work is to understand what is measured in the tables. The fields in the tables should make intuitive sense. However, the organization has not provided a data dictionary. It will be up to you to gain this understanding of what is measured.

The goal of this project is to present a report that can be delivered to managers of the client’s organization. To that end, the final output should not show your code or calculations. Make the explanations clear and concise.

Assessment

  • Part 1: Summarizing the adherence data: 10 points, evenly divided among each table.
  • Part 2: Answering specific question about adherence: 50 points, evenly divided among each question. Partial credit may be awarded for modest effort in the right direction (1-2 points) or a largely correct approach with small mistakes (3-4 points).
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond this month’s report: 15 points, evenly divided among each question. Partial credit may be awarded based on judgments.
  • Part 4: Identifying opportunities: 25 points, with 5 points apiece for Questions 1-3 and 10 points for Question 4.

Submission

Please turn in the following files:

  • Your reporting code (.Rmd file);
  • Output file showing your answers to parts 1-4 (.html file);

Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.

Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.

The insurance company has helpfully compiled data on a large number of patients. They have included a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked based upon which medications were filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medications. For this study, you may assume that the patients qualified for our study and reasonably could have been expected to be prescribed all of the medicines we are tracking.

In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.

This project is divided into 4 parts:

  • Part 1: Summarizing the data.
  • Part 2: Answering specific question about medication adherence.
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
  • Part 4: Identifying opportunities.

Details

Part 1: Summary

How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.

This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

Part 2: Specific Questions

In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).

This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.

Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.

These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.

Questions

  1. What was the median length of followup?  What percentage of the patients had at least 1 year of records?
  2. For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
  3. How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days that the patients are taking 0, 1, 2, and all 3 medications.
  4. What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit separate linear regression models for each medicine.  Then briefly comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  1. For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
  2. Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
    1. Age
    1. Baseline Conditions
  3. How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  4. For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  5. How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
    1. Identify which patients filled a prescription in the first two weeks.
    1. Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
    1. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

Perform this analysis for each medicine and comment on the results.

  • ACE Inhibitors
    • Beta Blockers
    • Statins
  1. Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Questions

  1. Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
  2. If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
  3. How would you build on the reporting capabilities that you have created? What would you design next?

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Questions

  1. What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
  2. What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
  3. How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?

Notes

This project involves extensive coding on large data sets. As a training exercise, we are asking you to develop your skills in R and to use the data.table package for processing applications.

Part of your work is to understand what is measured in the tables. The fields in the tables should make intuitive sense. However, the organization has not provided a data dictionary. It will be up to you to gain this understanding of what is measured.

The goal of this project is to present a report that can be delivered to managers of the client’s organization. To that end, the final output should not show your code or calculations. Make the explanations clear and concise.

Assessment

  • Part 1: Summarizing the adherence data: 10 points, evenly divided among each table.
  • Part 2: Answering specific question about adherence: 50 points, evenly divided among each question. Partial credit may be awarded for modest effort in the right direction (1-2 points) or a largely correct approach with small mistakes (3-4 points).
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond this month’s report: 15 points, evenly divided among each question. Partial credit may be awarded based on judgments.
  • Part 4: Identifying opportunities: 25 points, with 5 points apiece for Questions 1-3 and 10 points for Question 4.

Submission

Please turn in the following files:

  • Your reporting code (.Rmd file);
  • Output file showing your answers to parts 1-4 (.html file);

Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.

Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.

The insurance company has helpfully compiled data on a large number of patients. They have included a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked based upon which medications were filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medications. For this study, you may assume that the patients qualified for our study and reasonably could have been expected to be prescribed all of the medicines we are tracking.

In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.

This project is divided into 4 parts:

  • Part 1: Summarizing the data.
  • Part 2: Answering specific question about medication adherence.
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
  • Part 4: Identifying opportunities.

Details

Part 1: Summary

How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.

This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

Part 2: Specific Questions

In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).

This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.

Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.

These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.

Questions

  1. What was the median length of followup?  What percentage of the patients had at least 1 year of records?
  2. For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
  3. How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days that the patients are taking 0, 1, 2, and all 3 medications.
  4. What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit separate linear regression models for each medicine.  Then briefly comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  1. For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
  2. Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
    1. Age
    1. Baseline Conditions
  3. How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  4. For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  5. How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
    1. Identify which patients filled a prescription in the first two weeks.
    1. Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
    1. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

Perform this analysis for each medicine and comment on the results.

  • ACE Inhibitors
    • Beta Blockers
    • Statins
  1. Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Questions

  1. Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
  2. If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
  3. How would you build on the reporting capabilities that you have created? What would you design next?

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Questions

  1. What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
  2. What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
  3. How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?

Notes

This project involves extensive coding on large data sets. As a training exercise, we are asking you to develop your skills in R and to use the data.table package for processing applications.

Part of your work is to understand what is measured in the tables. The fields in the tables should make intuitive sense. However, the organization has not provided a data dictionary. It will be up to you to gain this understanding of what is measured.

The goal of this project is to present a report that can be delivered to managers of the client’s organization. To that end, the final output should not show your code or calculations. Make the explanations clear and concise.

Assessment

  • Part 1: Summarizing the adherence data: 10 points, evenly divided among each table.
  • Part 2: Answering specific question about adherence: 50 points, evenly divided among each question. Partial credit may be awarded for modest effort in the right direction (1-2 points) or a largely correct approach with small mistakes (3-4 points).
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond this month’s report: 15 points, evenly divided among each question. Partial credit may be awarded based on judgments.
  • Part 4: Identifying opportunities: 25 points, with 5 points apiece for Questions 1-3 and 10 points for Question 4.

Submission

Please turn in the following files:

  • Your reporting code (.Rmd file);
  • Output file showing your answers to parts 1-4 (.html file);

Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.

Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.

The insurance company has helpfully compiled data on a large number of patients. They have included a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked based upon which medications were filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medications. For this study, you may assume that the patients qualified for our study and reasonably could have been expected to be prescribed all of the medicines we are tracking.

In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.

This project is divided into 4 parts:

  • Part 1: Summarizing the data.
  • Part 2: Answering specific question about medication adherence.
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
  • Part 4: Identifying opportunities.

Details

Part 1: Summary

How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.

This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data.  To that end, we recommend providing some instructions that will help other consultants use the information more effectively.

Part 2: Specific Questions

In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).

This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.

Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.

These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.

Questions

  1. What was the median length of followup?  What percentage of the patients had at least 1 year of records?
  2. For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
  3. How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days that the patients are taking 0, 1, 2, and all 3 medications.
  4. What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit separate linear regression models for each medicine.  Then briefly comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  1. For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
  2. Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
    1. Age
    1. Baseline Conditions
  3. How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  4. For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins
  5. How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
    1. Identify which patients filled a prescription in the first two weeks.
    1. Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
    1. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

Perform this analysis for each medicine and comment on the results.

  • ACE Inhibitors
    • Beta Blockers
    • Statins
  1. Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
    1. ACE Inhibitors
    1. Beta Blockers
    1. Statins

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Questions

  1. Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
  2. If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
  3. How would you build on the reporting capabilities that you have created? What would you design next?

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Questions

  1. What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
  2. What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
  3. How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?

Notes

This project involves extensive coding on large data sets. As a training exercise, we are asking you to develop your skills in R and to use the data.table package for processing applications.

Part of your work is to understand what is measured in the tables. The fields in the tables should make intuitive sense. However, the organization has not provided a data dictionary. It will be up to you to gain this understanding of what is measured.

The goal of this project is to present a report that can be delivered to managers of the client’s organization. To that end, the final output should not show your code or calculations. Make the explanations clear and concise.

Assessment

  • Part 1: Summarizing the adherence data: 10 points, evenly divided among each table.
  • Part 2: Answering specific question about adherence: 50 points, evenly divided among each question. Partial credit may be awarded for modest effort in the right direction (1-2 points) or a largely correct approach with small mistakes (3-4 points).
  • Part 3: Generalizing and automating the reporting infrastructure for use beyond this month’s report: 15 points, evenly divided among each question. Partial credit may be awarded based on judgments.
  • Part 4: Identifying opportunities: 25 points, with 5 points apiece for Questions 1-3 and 10 points for Question 4.

Submission

Please turn in the following files:

  • Your reporting code (.Rmd file);
  • Output file showing your answers to parts 1-4 (.html file);

Q7

How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.

##    (Intercept)   sexMale regionNortheast regionSouth  regionWest  diabetes
## 1:    2.022171 -0.141174     -0.01168398  -0.1026476 -0.03691874 0.1332725
##    baseline.conditionmoderate symptoms or light procedure
## 1:                                             -0.3478524

ACE Inhibitors

##
## Call:
## glm(formula = ace ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records2.followup1)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.6662  -0.4312  -0.3518   0.5429   0.7087 
##
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                             0.852146   0.097394
## age                                                    -0.004460   0.001454
## sexMale                                                -0.058348   0.017690
## regionNortheast                                        -0.033400   0.025736
## regionSouth                                            -0.044898   0.028417
## regionWest                                             -0.018559   0.025064
## diabetes                                                0.028163   0.020484
## baseline.conditionmoderate symptoms or light procedure -0.114196   0.018543
##                                                        t value Pr(>|t|)   
## (Intercept)                                              8.749  < 2e-16 ***
## age                                                     -3.067 0.002179 **
## sexMale                                                 -3.298 0.000983 ***
## regionNortheast                                         -1.298 0.194467   
## regionSouth                                             -1.580 0.114224   
## regionWest                                              -0.740 0.459067   
## diabetes                                                 1.375 0.169270   
## baseline.conditionmoderate symptoms or light procedure  -6.159 8.28e-10 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2423127)
##
##     Null deviance: 771.64  on 3130  degrees of freedom
## Residual deviance: 756.74  on 3123  degrees of freedom
## AIC: 4457.1
##
## Number of Fisher Scoring iterations: 2

Beta Blockers

##
## Call:
## glm(formula = bb ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records2.followup1)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.7930  -0.5352   0.3004   0.4337   0.6198 
##
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                             1.124517   0.096306
## age                                                    -0.006749   0.001438
## sexMale                                                -0.060498   0.017492
## regionNortheast                                         0.002688   0.025449
## regionSouth                                            -0.014593   0.028100
## regionWest                                              0.008513   0.024784
## diabetes                                                0.058104   0.020255
## baseline.conditionmoderate symptoms or light procedure -0.138941   0.018335
##                                                        t value Pr(>|t|)   
## (Intercept)                                             11.676  < 2e-16 ***
## age                                                     -4.693 2.81e-06 ***
## sexMale                                                 -3.459  0.00055 ***
## regionNortheast                                          0.106  0.91589   
## regionSouth                                             -0.519  0.60356   
## regionWest                                               0.344  0.73124   
## diabetes                                                 2.869  0.00415 **
## baseline.conditionmoderate symptoms or light procedure  -7.578 4.60e-14 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2369302)
##
##     Null deviance: 763.03  on 3130  degrees of freedom
## Residual deviance: 739.93  on 3123  degrees of freedom
## AIC: 4386.8
##
## Number of Fisher Scoring iterations: 2

Statins

##
## Call:
## glm(formula = statin ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records2.followup1)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.8643  -0.6227   0.2567   0.3247   0.4542 
##
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                             1.079680   0.089994
## age                                                    -0.004810   0.001344
## sexMale                                                -0.021788   0.016345
## regionNortheast                                         0.025104   0.023781
## regionSouth                                            -0.044068   0.026258
## regionWest                                             -0.021163   0.023159
## diabetes                                                0.055488   0.018928
## baseline.conditionmoderate symptoms or light procedure -0.096550   0.017134
##                                                        t value Pr(>|t|)   
## (Intercept)                                             11.997  < 2e-16 ***
## age                                                     -3.579  0.00035 ***
## sexMale                                                 -1.333  0.18263   
## regionNortheast                                          1.056  0.29121    
## regionSouth                                             -1.678  0.09340 . 
## regionWest                                              -0.914  0.36088   
## diabetes                                                 2.932  0.00340 **
## baseline.conditionmoderate symptoms or light procedure  -5.635  1.9e-08 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2068897)
##
##     Null deviance: 658.99  on 3130  degrees of freedom
## Residual deviance: 646.12  on 3123  degrees of freedom
## AIC: 3962.3
##
## Number of Fisher Scoring iterations: 2

Q8

For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.

ACE Inhibitors

##      Average standard_deviation Median_value
## 1: 0.4404344          0.4965186            0

Beta Blockers

##      Average standard_deviation Median_value
## 1: 0.5793676          0.4937394            1

Statins

##      Average standard_deviation Median_value
## 1: 0.6988183          0.4588448            1

Q9

How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:

  1. Identify which patients filled a prescription in the first two weeks.
  2. Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
  3. Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.

Perform this analysis for each medicine and comment on the results.

##
## Call:
## glm(formula = ace + bb + statin ~ age + sex + region + diabetes +
##     baseline.condition, data = Year_records.m)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -2.1529  -0.7236   0.1453   0.9349   1.5359 
##
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                             2.4164913  0.0083321
## age                                                    -0.0079752  0.0001245
## sexMale                                                -0.0435096  0.0014781
## regionNortheast                                         0.0319705  0.0021330
## regionSouth                                            -0.0259724  0.0023992
## regionWest                                              0.0328137  0.0020657
## diabetes                                                0.0863697  0.0017296
## baseline.conditionmoderate symptoms or light procedure -0.2130398  0.0015346
##                                                        t value Pr(>|t|)   
## (Intercept)                                             290.02   <2e-16 ***
## age                                                     -64.04   <2e-16 ***
## sexMale                                                 -29.44   <2e-16 ***
## regionNortheast                                          14.99   <2e-16 ***
## regionSouth                                             -10.83   <2e-16 ***
## regionWest                                               15.88   <2e-16 ***
## diabetes                                                 49.94   <2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -138.83   <2e-16 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.8490106)
##
##     Null deviance: 1345795  on 1558160  degrees of freedom
## Residual deviance: 1322888  on 1558153  degrees of freedom
## AIC: 4166834
##
## Number of Fisher Scoring iterations: 2

Age, region North East, North West and diabetes had a positive association towards patient medication followup. #### ACE Inhibitors

##
## Call:
## glm(formula = ace ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records.m)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.6412  -0.4979   0.4022   0.4937   0.5874 
##
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                             0.7007178  0.0045031
## age                                                    -0.0022726  0.0000673
## sexMale                                                -0.0114760  0.0007988
## regionNortheast                                         0.0120714  0.0011528
## regionSouth                                            -0.0093569  0.0012966
## regionWest                                              0.0175783  0.0011164
## diabetes                                                0.0320078  0.0009348
## baseline.conditionmoderate symptoms or light procedure -0.0763906  0.0008294
##                                                        t value Pr(>|t|)   
## (Intercept)                                            155.608  < 2e-16 ***
## age                                                    -33.766  < 2e-16 ***
## sexMale                                                -14.366  < 2e-16 ***
## regionNortheast                                         10.472  < 2e-16 ***
## regionSouth                                             -7.216 5.34e-13 ***
## regionWest                                              15.745  < 2e-16 ***
## diabetes                                                34.242  < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -92.107  < 2e-16 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2479834)
##
##     Null deviance: 389211  on 1558160  degrees of freedom
## Residual deviance: 386396  on 1558153  degrees of freedom
## AIC: 2249190
##
## Number of Fisher Scoring iterations: 2

Region North East, West, and diabetes had a positive influence on the patient’s likelihood for making a medication followup within a year, for ACE inhibitors. #### Beta Blockers

##
## Call:
## glm(formula = bb ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records.m)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.7435  -0.5734   0.3482   0.4142   0.5322 
##
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                             8.815e-01  4.414e-03
## age                                                    -3.668e-03  6.597e-05
## sexMale                                                -2.195e-02  7.830e-04
## regionNortheast                                         1.576e-02  1.130e-03
## regionSouth                                            -9.224e-03  1.271e-03
## regionWest                                              9.869e-03  1.094e-03
## diabetes                                                2.817e-02  9.163e-04
## baseline.conditionmoderate symptoms or light procedure -7.444e-02  8.130e-04
##                                                        t value Pr(>|t|)   
## (Intercept)                                            199.703  < 2e-16 ***
## age                                                    -55.599  < 2e-16 ***
## sexMale                                                -28.027  < 2e-16 ***
## regionNortheast                                         13.951  < 2e-16 ***
## regionSouth                                             -7.258 3.94e-13 ***
## regionWest                                               9.018  < 2e-16 ***
## diabetes                                                30.739  < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -91.571  < 2e-16 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2382653)
##
##     Null deviance: 374435  on 1558160  degrees of freedom
## Residual deviance: 371254  on 1558153  degrees of freedom
## AIC: 2186899
##
## Number of Fisher Scoring iterations: 2

Region North East and South, and diabetes influenced the patient’s likelihood to make a medication followup with the 365 days, for Beta Blockers. #### Statins

##
## Call:
## glm(formula = statin ~ age + sex + region + diabetes + baseline.condition,
##     data = Year_records.m)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max 
## -0.7682  -0.6358   0.3019   0.3520   0.4163 
##
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                             8.343e-01  4.254e-03
## age                                                    -2.035e-03  6.358e-05
## sexMale                                                -1.009e-02  7.547e-04
## regionNortheast                                         4.135e-03  1.089e-03
## regionSouth                                            -7.391e-03  1.225e-03
## regionWest                                              5.367e-03  1.055e-03
## diabetes                                                2.620e-02  8.831e-04
## baseline.conditionmoderate symptoms or light procedure -6.221e-02  7.835e-04
##                                                        t value Pr(>|t|)   
## (Intercept)                                            196.118  < 2e-16 ***
## age                                                    -31.999  < 2e-16 ***
## sexMale                                                -13.367  < 2e-16 ***
## regionNortheast                                          3.797 0.000147 ***
## regionSouth                                             -6.034 1.60e-09 ***
## regionWest                                               5.089 3.61e-07 ***
## diabetes                                                29.666  < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -79.397  < 2e-16 ***
## —
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## (Dispersion parameter for gaussian family taken to be 0.2213089)
##
##     Null deviance: 346677  on 1558160  degrees of freedom
## Residual deviance: 344833  on 1558153  degrees of freedom
## AIC: 2071867
##
## Number of Fisher Scoring iterations: 2

Region North East, West, and diabetes had signficant p-values hence an influence of the patient’s likelihood to seek statin medication followup within a year provided they made refilling within 2 first weeks after diagnosis.

Q10

Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.

##    Average standard_deviation Median_b
## 1: 31.2253           34.22754       17

ACE Inhibitors

##    ace Average standard_deviation Median_b
## 1:   1 31.2253           34.22754       17
## 2:   0 31.2253           34.22754       17

Beta Blockers

##    bb Average standard_deviation Median_b
## 1:  0 31.2253           34.22754       17
## 2:  1 31.2253           34.22754       17

Statins

##    statin Average standard_deviation Median_b
## 1:      0 31.2253           34.22754       17
## 2:      1 31.2253           34.22754       17

Part 3: Generalization

This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.

Q1

Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?

The problems in the data include missing values and data for adherence and baseline separated. The data should be collected in one form to help easily link the baseline conditions to patients and conduct a summary for each region. It will benefit by having a lesser workload. This should be reported to the data clerk or collection technicians.

Q2

If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?

The organization will need reporting capabilities by having the data analysis conducted in this exercise turned into an automated script whereby it will keep capturing data through time and generating the reports. The frequency should be daily and the analysis carried out for every 365 rolling window days. That is, an analysis report generated for every 365 days.

Q3

How would you build on the reporting capabilities that you have created? What would you design next?

The reporting should include a word report and a data visualization demonstrationion. For instance, the visualization will show the effect of variables like age, sex, region, baseline condition, and dabetes on the patient’s likelihood to make a medication followup or refill for ACE Inhibitors, Beta Blockers, and Statins. The next design will be reports that comprise of both the data analysis and insightful comments based on the output.

Part 4: Opportunities

This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.

Q1

What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.

The opportunities to learn value information will give insights for strategic decision-making. This include establishing the effect of age on health condition and sex-based discrepancies in health.

Does age influence baseline condition? What is the effect of diabetes on baseline condition? What are the discrepancies in health condition based on age and sex?

Q2

What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?

Based on the analysis report, the intervention to improve adherence to medication will include reminders to take medication, deploying home care nurses , and educating family members to offer supportive services to the individuals. working with individuals aged below 13 and those above 50 years. It will help them adhere to medication patterns and maintain a health lifestyle by ensuring they get medical checkups using their insurance covers.

Q3

How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?

The information will be presented in graphs showing the adherence patterns for the different user groups based on age and sex. This will help them note the areas of concern such as individuals below the age of 13 and those above age of 50 or the elderly. Graphical representation of information with demonstration will make it easier for them to review and understand since they do not have the data analysts technical skills.

Q4

Video Submission: Make a 2-minute pitch to the client with a proposal for the next phase of work. Include in your request a budget, a time frame, and staffing levels. Explain why this proposal would be valuable for the client and worth the investment in your consulting services. Please submit this answer as a short video recording. You may use any video recording program you feel comfortable with. The only requirements are that you are visible and audible in the video. You may also submit a file of slides if that is part of your pitch.

Subscribe For Latest Updates
Let us notify you each time there is a new assignment, book recommendation, assignment resource, or free essay and updates