HAP 464 – Homework Assignment

Instructions:

There are 2 sections in the homework.

• Section 1 is worth 40 points. This section is purely theory. Submit a clear, concise answer. Code is not required; you may submit code, but it will not earn any points.
• Section 2 is worth 60 points. This section is purely mathematical or coding related. Answer each question either through mathematical steps or through code, and submit whichever approach you follow. Show every step and submit the results as well.
• When submitting a formula as your answer, DO NOT copy/paste it from another resource; create it with Word’s equation tools instead.

Section I:

Q1. This question has 2 sub parts which are as follows:

1. What is the difference between prognosis and diagnosis? – 3 points

Diagnosis is the process of identifying the disease or condition a patient has now, based on signs, symptoms, test results, and history. It answers the question “what is wrong at present?”

Prognosis is the prediction of the likely future course and outcome of a patient’s condition, such as recovery, complications, or death, given the patient’s current diagnoses and characteristics. In short, diagnosis describes the patient’s current state, while prognosis forecasts what is likely to happen next.

2. What is the assumption made while calculating the multimorbidity index? – 2 points

Assuming the multimorbidity index meant here is the one built from the likelihood ratios of a patient’s diagnoses (as in Section II), the main assumption is that the diagnoses are independent of one another given the outcome. Under this assumption, the likelihood ratios of the individual diagnoses can simply be multiplied to obtain the combined effect of all the patient’s conditions.

In practice, diagnoses often co-occur, so the independence assumption is only an approximation, and the index can overstate or understate risk when conditions are strongly related.

Q2. Explain the contingency matrix used to calculate LR and deduce the formula? – 10 points

A contingency matrix summarizes how a binary test or predictor, such as the presence of a diagnosis code, relates to a binary outcome.

The contingency matrix is a two-dimensional table that cross-tabulates the actual values of a binary classification problem with the predicted values of a model. It is also called a confusion matrix, as it shows how many correct and incorrect predictions a model made compared to the actual results.

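Here is an example contingency matrix for a binary classification problem:

                      Actual Negative        Actual Positive
Predicted Negative    True Negative (TN)     False Negative (FN)
Predicted Positive    False Positive (FP)    True Positive (TP)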

In this example, the model makes predictions of “Negative” and “Positive” outcomes, while the actual outcomes can be “Negative” or “Positive” as well. The four possible outcomes are:

• True Negative (TN): The model predicted “Negative,” and the actual outcome was also “Negative.”
• False Positive (FP): The model predicted “Positive,” but the actual outcome was “Negative.”
• False Negative (FN): The model predicted “Negative,” but the actual outcome was “Positive.”
• True Positive (TP): The model predicted “Positive,” and the actual outcome was also “Positive.”
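
The four cells described above can be tallied directly from paired predictions and outcomes. The following is a minimal Python sketch, not part of the assignment; the function name `contingency_counts` and the sample lists are hypothetical and only illustrate the counting.

```python
from collections import Counter


def contingency_counts(predicted, actual):
    """Count TN, FP, FN and TP from paired boolean sequences.

    True means "Positive" and False means "Negative".
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1   # predicted Positive, actually Positive
        elif p and not a:
            counts["FP"] += 1   # predicted Positive, actually Negative
        elif not p and a:
            counts["FN"] += 1   # predicted Negative, actually Positive
        else:
            counts["TN"] += 1   # predicted Negative, actually Negative
    # Return all four cells, including any that are zero.
    return {cell: counts[cell] for cell in ("TN", "FP", "FN", "TP")}


# Hypothetical data for illustration only.
predicted = [True, True, False, False, True, False]
actual = [True, False, False, True, True, False]
cells = contingency_counts(predicted, actual)
print(cells)
```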

Using the contingency matrix, we can calculate several performance metrics, including sensitivity (also called recall), specificity, and precision. One such metric is the likelihood ratio (LR) of a positive test, which is the ratio of the probability of a positive test result in people with the condition to the probability of a positive test result in people without the condition.

The formula follows directly from the matrix:

Probability of a positive result in people with the condition = TP / (TP + FN) = sensitivity
Probability of a positive result in people without the condition = FP / (FP + TN) = 1 - specificity

LR = [TP / (TP + FN)] / [FP / (FP + TN)] = sensitivity / (1 - specificity)
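
As a check on the arithmetic, here is a minimal Python sketch that computes the likelihood ratio exactly as defined in words above: the probability of a positive result among people with the condition divided by the probability of a positive result among people without it. It also returns the corresponding ratio for a negative result. The function name and the cell counts are hypothetical, chosen only for illustration.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR of a positive result, LR of a negative result)."""
    with_condition = tp + fn       # everyone who actually has the condition
    without_condition = fp + tn    # everyone who actually does not
    if with_condition == 0 or without_condition == 0:
        raise ValueError("need people both with and without the condition")

    sensitivity = tp / with_condition
    specificity = tn / without_condition
    false_positive_rate = fp / without_condition   # equals 1 - specificity
    false_negative_rate = fn / with_condition      # equals 1 - sensitivity

    # Simplification: report infinity when a denominator is zero.
    lr_positive = sensitivity / false_positive_rate if fp > 0 else math.inf
    lr_negative = false_negative_rate / specificity if tn > 0 else math.inf
    return lr_positive, lr_negative


# Hypothetical counts for illustration only.
lr_pos, lr_neg = likelihood_ratios(tp=40, fp=20, fn=10, tn=80)
print(lr_pos, lr_neg)
```

With these made-up counts, sensitivity and specificity are both 0.8, so a positive result multiplies the odds of the condition by about 4 and a negative result multiplies them by about 0.25.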

The LR tells us how much a positive result changes the odds of the condition. An LR greater than 1 means a positive result makes the condition more likely, an LR less than 1 means it makes the condition less likely, and an LR of 1 means the result carries no information. When the data are stored in a database, the four cell counts can be obtained with SQL by counting patients grouped by predicted and actual status.

Q3. Tabulate the differences between ICD9 and ICD10 coding systems (at least 5 differences)? – 10 points

ICD9 (International Classification of Diseases, 9th Revision) and ICD10 (International Classification of Diseases, 10th Revision) are both medical classification systems used to code diseases, symptoms, and procedures. Five differences between the two systems are:

1. Number of codes: ICD9 has approximately 14,000 diagnosis codes, while ICD10 has approximately 69,000. This means that ICD10 is more detailed and allows for more specific coding of medical conditions.
2. Structure of codes: ICD9 codes are three to five characters long and mostly numeric, while ICD10 codes are three to seven characters long and alphanumeric, beginning with a letter. The first three characters in ICD10 codes represent the category of disease, while the remaining characters provide more detailed information about the condition.
3. Level of detail after the decimal point: Both systems place a decimal point after the third character, but ICD9 allows at most two characters after it, while ICD10 allows up to four. For example, ICD9 category 813 covers fractures of the radius and ulna, while ICD10 has separate subcategories for a fracture of the upper (proximal) end of the radius (S52.1) and a fracture of the lower (distal) end of the radius (S52.5).
4. Combination codes: ICD10 includes many more combination codes, which allow for the coding of both the condition and any related complications or manifestations. For example, there is a single ICD10 code for diabetes with peripheral circulatory disorders, while in ICD9, separate codes would be needed for the diabetes and the peripheral circulatory disorder.
5. Implementation date: ICD9 was first published in 1977 and was used in the United States until October 1, 2015, when it was replaced by ICD10. ICD10 has been used internationally since 1994, but the United States was one of the last countries to adopt it. The delay in implementation in the US was due to concerns about the increased complexity and potential costs associated with the transition.

Q4. Explain the steps performed during body system adjustment? – 5 points

Q5. What do Sensitivity and Specificity mean? What do they signify? – 10 points

Section II:

Q1. Calculate the likelihood ratios associated with each of the ICDs listed in the table below. Consider the outcome as having depression – 20 points

Q2. Consider a male patient who has all the conditions listed in the table above and calculate his multimorbidity index of developing depression – 20 points
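
One possible way to approach this calculation in code is sketched below. It assumes, and this is an assumption rather than something stated in the assignment, that the index is obtained by multiplying the likelihood ratios of the patient’s conditions, which treats the conditions as independent given the outcome. It does not apply any body system adjustment. The prior odds and likelihood ratios shown are made up, since the table is not reproduced here, and the function names are hypothetical.

```python
import math


def combined_odds(prior_odds, likelihood_ratios):
    """Posterior odds = prior odds x product of likelihood ratios.

    ASSUMPTION: the conditions are independent given the outcome,
    so their likelihood ratios can be multiplied.
    """
    if prior_odds <= 0 or any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("odds and likelihood ratios must be positive")
    # Summing logarithms is equivalent to multiplying and avoids overflow.
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    return math.exp(log_odds)


def odds_to_probability(odds):
    """Convert odds to a probability between 0 and 1."""
    return odds / (1 + odds)


# Hypothetical values for illustration only, not taken from the table.
example_lrs = [2.0, 1.5, 0.8]
example_prior_odds = 0.25
posterior_odds = combined_odds(example_prior_odds, example_lrs)
posterior_probability = odds_to_probability(posterior_odds)
print(posterior_odds, posterior_probability)
```

With these made-up inputs the product of the likelihood ratios is 2.4, so the odds move from 0.25 to about 0.6, which corresponds to a probability of about 0.375.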

Q3. Calculate the area under ROC associated with the sensitivity and specificity values attached in the below table and report it. Submit the picture of how ROC looks too – 20 points
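
A minimal Python sketch of one common way to estimate the area under the ROC curve from paired sensitivity and specificity values is the trapezoidal rule, shown below. The function name and the sample values are hypothetical, since the table is not reproduced here. The sketch also assumes the curve starts at (0, 0) and ends at (1, 1).

```python
import math


def roc_auc(sensitivities, specificities):
    """Estimate the area under the ROC curve with the trapezoidal rule."""
    if len(sensitivities) != len(specificities):
        raise ValueError("need one specificity per sensitivity")
    # Each ROC point is (false positive rate, true positive rate),
    # that is (1 - specificity, sensitivity).
    points = {(1 - spec, sens) for sens, spec in zip(sensitivities, specificities)}
    # ASSUMPTION: the curve is anchored at (0, 0) and (1, 1).
    points |= {(0.0, 0.0), (1.0, 1.0)}
    ordered = sorted(points)

    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # area of one trapezoid
    return area


# Hypothetical values for illustration only, not taken from the table.
example_sensitivities = [0.5, 0.8, 0.95]
example_specificities = [0.9, 0.7, 0.4]
auc = roc_auc(example_sensitivities, example_specificities)
print(round(auc, 4))
```

To produce the picture of the curve, the same (1 - specificity, sensitivity) points can be passed to a plotting tool, with 1 - specificity on the horizontal axis and sensitivity on the vertical axis.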