- a. Randomly remove 5% of values and mark the removed values as missing
- b. Employ 3 nearest-neighbor imputation methods (1-NN, k-NN and weighted k-NN) [hint: distance from complete cases]. Choose any k >= 5. For continuous features, use 2 different distance measures; for categorical features, use 1 distance measure. You cannot use any existing library to calculate distance or to implement imputation.
- c. For each feature, calculate the accuracy of imputation. You will define your own accuracy measures for both continuous and categorical features [hint: how different the imputed values are from the original values].
- d. For each feature, conduct the above steps (a-c) on the original data.csv file.
- Repeat the above steps (a-d), this time removing 10% and then 20% of the values from each feature to create missingness. The program should write its output (imputation accuracy) to “results.csv”. The output file will list the features and the corresponding imputation accuracy for all possible combinations of methods, such as 1-NN & distance measure 1, 1-NN & distance measure 2, etc.
- 2) Repeat the experiments in Step 1, but calculate distances after scaling the continuous features with 2 different feature scaling methods of your choice. You are allowed to use existing libraries for feature scaling. Note that imputation should be done using the original values, not the scaled ones. Add the imputation accuracies from the experiments that use scaled values to “results.csv”.
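
Here is a minimal sketch of step (a). It assumes the data are loaded as a list of rows (lists) and that a missing cell is represented by `None`. The function name `mask_values` is hypothetical. The function removes the same fraction from every feature, matching the 10%/20% repetition. It also records the masked positions so that the original values can later be compared with the imputed ones.

```python
import copy
import random


def mask_values(rows, fraction, seed=None):
    """Return (masked_rows, positions).

    masked_rows is a deep copy of `rows` in which round(fraction * n_rows)
    randomly chosen cells of every column are replaced by None.
    positions lists the (row_index, column_index) pairs that were masked.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    masked = copy.deepcopy(rows)
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_missing = round(fraction * n_rows)
    positions = []
    for col in range(n_cols):
        # Sample rows independently for each feature (column).
        for row in rng.sample(range(n_rows), n_missing):
            masked[row][col] = None
            positions.append((row, col))
    return masked, positions
```

Calling `mask_values(rows, 0.05, seed=1)`, then again with `0.10` and `0.20`, produces the three missingness levels. Each column is masked independently, so the number of complete cases (rows with no missing value) tends to shrink as the fraction and the number of features grow. This matters because step (b) draws its donors from the complete cases.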
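
No library may be used for distances, so the three measures have to be written by hand. This sketch assumes Euclidean and Manhattan distance for continuous features and Hamming (mismatch count) distance for categorical features; the assignment leaves the choice open. `mixed_distance` is a hypothetical helper. It compares an incomplete row with a complete case using only the columns observed in the incomplete row. It then adds the continuous and categorical parts together, which is one possible way to combine them.

```python
import math


def euclidean(xs, ys):
    """Square root of the summed squared differences (continuous features)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def manhattan(xs, ys):
    """Sum of absolute differences (continuous features)."""
    return sum(abs(x - y) for x, y in zip(xs, ys))


def hamming(xs, ys):
    """Number of positions whose categories differ (categorical features)."""
    return sum(1 for x, y in zip(xs, ys) if x != y)


def mixed_distance(target, donor, cont_cols, cat_cols, cont_metric=euclidean):
    """Distance between an incomplete `target` row and a complete `donor` row.

    Only columns observed in `target` (not None) are compared; the continuous
    part uses `cont_metric` and the categorical part uses Hamming distance.
    """
    cont = [i for i in cont_cols if target[i] is not None]
    cat = [i for i in cat_cols if target[i] is not None]
    d_cont = cont_metric([target[i] for i in cont], [donor[i] for i in cont])
    d_cat = hamming([target[i] for i in cat], [donor[i] for i in cat])
    return d_cont + d_cat
```

Without scaling, a feature measured in large units dominates the continuous part, and the 0/1 mismatch counts carry little weight. That imbalance is one reason to compare against scaled distances in step 2.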
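
Following the hint, this sketch implements the three imputation methods using complete cases as donors. A few choices here are assumptions, since the assignment does not fix them:

- Continuous values are imputed with the (weighted) mean of the neighbors' values.
- Categorical values are imputed with a (weighted) majority vote.
- Weighted k-NN uses inverse-distance weights `1 / (d + eps)`.

Setting `k=1` gives 1-NN. `dist_fn(a, b)` can be any row-to-row distance that ignores columns missing in `a`. The optional `dist_rows` argument lets distances be computed on a scaled copy while the values are still copied from the original rows. All names are hypothetical.

```python
from collections import defaultdict


def knn_impute(rows, dist_fn, cont_cols, k=5, weighted=False, dist_rows=None, eps=1e-9):
    """Return a copy of `rows` with every None cell imputed from complete cases.

    rows      -- list of rows (lists); None marks a missing value
    dist_fn   -- dist_fn(a, b) -> float, skipping columns where a is None
    cont_cols -- indices of continuous columns (all others are categorical)
    k         -- number of neighbours (k=1 is 1-NN)
    weighted  -- if True, weight neighbours by 1 / (distance + eps)
    dist_rows -- optional same-shaped rows used only for distances (e.g. scaled)
    """
    dist_rows = rows if dist_rows is None else dist_rows
    donors = [j for j, r in enumerate(rows) if all(v is not None for v in r)]
    if not donors:
        raise ValueError("no complete cases available as donors")
    cont = set(cont_cols)
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [c for c, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank complete cases by distance; ties are broken by row index.
        ranked = sorted((dist_fn(dist_rows[i], dist_rows[j]), j) for j in donors)
        neighbours = ranked[:k]
        weights = [1.0 / (d + eps) if weighted else 1.0 for d, _ in neighbours]
        for c in missing:
            # Values always come from the original (unscaled) rows.
            values = [rows[j][c] for _, j in neighbours]
            if c in cont:
                total = sum(w * v for w, v in zip(weights, values))
                result[i][c] = total / sum(weights)
            else:
                votes = defaultdict(float)
                for w, v in zip(weights, values):
                    votes[v] += w
                result[i][c] = max(votes, key=votes.get)  # (weighted) majority vote
    return result
```

Passing the distance as a parameter means each imputation method can be paired with each distance measure without changing this function.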
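
The assignment asks you to define your own accuracy measures, so the following is only one possible choice:

- **Continuous features:** 1 minus the mean absolute error divided by the feature's range in the original data. A score of 1.0 means every imputed value is exact.
- **Categorical features:** the fraction of imputed values that match the original.

Only the masked cells are scored, using the `(row, column)` positions recorded when the values were removed. The function names are hypothetical.

```python
from collections import defaultdict


def continuous_accuracy(original, imputed, value_range):
    """1 - MAE / range; 1.0 is a perfect imputation."""
    mae = sum(abs(o - p) for o, p in zip(original, imputed)) / len(original)
    if value_range == 0:
        return 1.0 if mae == 0 else 0.0
    return 1.0 - mae / value_range


def categorical_accuracy(original, imputed):
    """Fraction of imputed categories equal to the original category."""
    return sum(1 for o, p in zip(original, imputed) if o == p) / len(original)


def per_feature_accuracy(original_rows, imputed_rows, positions, cont_cols):
    """Score each feature using only the cells listed in `positions`."""
    pairs = defaultdict(list)  # column index -> [(original, imputed), ...]
    for r, c in positions:
        pairs[c].append((original_rows[r][c], imputed_rows[r][c]))
    scores = {}
    for c in sorted(pairs):
        orig = [o for o, _ in pairs[c]]
        imp = [p for _, p in pairs[c]]
        if c in cont_cols:
            column = [row[c] for row in original_rows]
            scores[c] = continuous_accuracy(orig, imp, max(column) - min(column))
        else:
            scores[c] = categorical_accuracy(orig, imp)
    return scores
```

When imputed continuous values are means of complete-case values, they stay inside the feature's observed range. In that case this continuous measure stays between 0 and 1, which keeps it comparable with the categorical one.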
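
This sketch assumes one possible layout for “results.csv”: a long table with one row per combination of missing percentage, scaling setting, imputation method, continuous distance measure, and feature. There are 3 missing percentages, 3 scaling settings (none plus two scalers), 3 methods, and 2 distance measures. That gives 54 combinations, each scored on every feature. The column names, scaler labels, and k=5 are illustrative choices.

```python
import csv
import itertools

# Hypothetical experiment grid; k=5 satisfies the assignment's k >= 5 rule.
MISSING_FRACTIONS = [0.05, 0.10, 0.20]
SCALINGS = ["none", "min-max", "z-score"]
METHODS = {"1-NN": (1, False), "k-NN": (5, False), "weighted k-NN": (5, True)}
CONT_DISTANCES = ["euclidean", "manhattan"]
FIELDS = ["missing_pct", "scaling", "method", "distance", "feature", "accuracy"]


def experiment_grid():
    """Yield every (missing fraction, scaling, method, distance) combination."""
    return itertools.product(MISSING_FRACTIONS, SCALINGS, METHODS, CONT_DISTANCES)


def write_results(path, records):
    """Write one CSV row per record (a dict keyed by FIELDS)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
```

To use it, iterate over `experiment_grid()` and look up `k` and `weighted` in `METHODS`. For each combination, run masking, imputation, and scoring, then append one dict per feature with the keys in `FIELDS`. Finally, call `write_results("results.csv", records)`. Because the scaling setting is one of the grid dimensions, the step 2 results end up in the same file.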
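
For step 2, this sketch assumes min-max scaling and z-score standardization as the two scaling methods; the assignment lets you choose. It is written by hand so it can run without extra packages. Each scaler is fitted on the observed values of a continuous column, and missing cells are left as `None`. Since the assignment allows libraries for scaling, scikit-learn's `MinMaxScaler` and `StandardScaler` are alternatives. `scale_columns` is a hypothetical name.

```python
import statistics


def scale_columns(rows, cont_cols, method="min-max"):
    """Return a scaled copy of `rows` for distance calculation only.

    Each continuous column is fitted on its observed (non-None) values.
    method="min-max": (x - min) / (max - min)   -> values in [0, 1]
    method="z-score": (x - mean) / std          -> mean 0, std 1
    None cells and categorical columns are copied unchanged.
    """
    scaled = [list(r) for r in rows]
    for c in cont_cols:
        observed = [r[c] for r in rows if r[c] is not None]
        if method == "min-max":
            shift = min(observed)
            divisor = (max(observed) - shift) or 1.0  # guard constant columns
        elif method == "z-score":
            shift = statistics.mean(observed)
            divisor = statistics.pstdev(observed) or 1.0  # population std
        else:
            raise ValueError(f"unknown scaling method: {method!r}")
        for r in scaled:
            if r[c] is not None:
                r[c] = (r[c] - shift) / divisor
    return scaled
```

Use the scaled copy only when computing distances. As the assignment requires, the values written into missing cells must still come from the unscaled rows.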

- a. Data: Describe features, instances and the source.
- b. Methods: Explain the 3 imputation methods, the 3 distance measures, the 2 feature scaling methods, and the imputation accuracy measures used in your experiments.
- c. Tools: Mention which language (R or Python), version, and IDE you used for the implementation. State whether any library needs to be installed to run your code, and provide installation instructions.
- d. Results: Present imputation accuracies for each feature for all 3 missing percentages (5%, 10%, and 20%) and all imputation methods.
- Use tables and graphs to present the results. Also, present a comparative analysis of imputation performance when original values are used for distance calculation vs. when scaled values are used. All of these should be reported for the two different distance measures.
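
For the Methods section, the measures can be stated formally. The block below assumes the following choices, made for illustration; substitute your own where they differ:

- Euclidean ($d_{\mathrm{E}}$) and Manhattan ($d_{\mathrm{M}}$) distance over the observed continuous features $C$.
- Hamming distance ($d_{\mathrm{H}}$) over the observed categorical features $D$, added to the continuous part.
- Inverse-distance weights for weighted k-NN.
- Range-normalized and exact-match accuracy.
- Min-max and z-score scaling.

Here $\mathbf{y}^{(1)},\dots,\mathbf{y}^{(k)}$ are the $k$ nearest complete cases to the incomplete row $\mathbf{x}$, and $M$ is the set of masked cells of one feature.

```latex
\begin{aligned}
d_{\mathrm{E}}(\mathbf{x},\mathbf{y}) &= \sqrt{\textstyle\sum_{j \in C} (x_j - y_j)^2},
\qquad
d_{\mathrm{M}}(\mathbf{x},\mathbf{y}) = \textstyle\sum_{j \in C} |x_j - y_j|,
\qquad
d_{\mathrm{H}}(\mathbf{x},\mathbf{y}) = \textstyle\sum_{j \in D} \mathbb{1}[x_j \neq y_j] \\[4pt]
d(\mathbf{x},\mathbf{y}) &= d_{\mathrm{cont}}(\mathbf{x},\mathbf{y}) + d_{\mathrm{H}}(\mathbf{x},\mathbf{y}),
\qquad d_{\mathrm{cont}} \in \{d_{\mathrm{E}}, d_{\mathrm{M}}\} \\[4pt]
\hat{x}_m &= \frac{\sum_{i=1}^{k} w_i \, y^{(i)}_m}{\sum_{i=1}^{k} w_i} \ \text{(continuous)},
\qquad
\hat{x}_m = \arg\max_{v} \textstyle\sum_{i=1}^{k} w_i \, \mathbb{1}[y^{(i)}_m = v] \ \text{(categorical)} \\[4pt]
w_i &= \frac{1}{d(\mathbf{x},\mathbf{y}^{(i)}) + \varepsilon} \ \text{(weighted k-NN)},
\qquad w_i = 1 \ \text{(k-NN; 1-NN when } k = 1\text{)} \\[4pt]
\mathrm{acc}_{\mathrm{cont}} &= 1 - \frac{1}{|M|} \sum_{t \in M} \frac{|x_t - \hat{x}_t|}{\max(x) - \min(x)},
\qquad
\mathrm{acc}_{\mathrm{cat}} = \frac{1}{|M|} \sum_{t \in M} \mathbb{1}[x_t = \hat{x}_t] \\[4pt]
x'_{\text{min-max}} &= \frac{x - \min(x)}{\max(x) - \min(x)},
\qquad
x'_{z} = \frac{x - \mu}{\sigma}
\end{aligned}
```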

- Source code: Add enough comments in the code explaining your program. Commands to load necessary libraries must be included in your code.
- data.csv
- results.csv
- Report (both Word and PDF documents)
