Part 1: Identify the Questions
Q1. Thinking about different loan applicants in general, how would you expect them to fall into different groups?
I’d expect them to fall into groups based on their income or credit scores. Applicants with higher incomes often have higher credit scores, making them more likely to be approved for a loan. Applicants with lower incomes may not have enough funds to pay off debt, which leads to a poorer credit score and makes approval harder. Lenders normally do not lend to applicants who seem unlikely to repay the debt.
Q2. When evaluating previous loan data, what would you expect your target variable to be?
As far as I can tell, the target variable is whether or not the loan was approved. The remaining attributes will be used to predict whether a loan is approved or declined.
Q3. What factors do you think would affect whether a loan will be accepted or rejected?
I believe an applicant’s history of repaying prior debts is a significant factor. As a result, a credit score is an excellent indicator of whether a loan will be approved or declined. Other considerations include income, the purpose of the loan, age, and employment status.
Q4. Identify the data you would need to answer your questions and validate your hypothesis?
I’d want data on past loan applicants, including their income, credit score, employment status, and demographics, along with the outcome of each application.
Part 2: Master the Data
Q5. Given this list of attributes, what concerns do you have with the data’s ability to predict answers to the questions you’ve identified before?
One concern is that the listed attributes may not provide enough data to represent loan approvals and denials accurately.
Q6. What does the lack of attributes in the Reject Data files tell us about the data that Lending Club retains on rejected loans?
It tells us that Lending Club does not retain much data on rejected loans. One could conclude that Lending Club prioritizes approved loans and keeps little information about rejected ones.
Q7. How will that affect a classification analysis?
With fewer attributes, the groups become larger and more varied. Classification analysis assigns each record to a class in order to predict an outcome. With fewer variables, you risk placing records in a class whose criteria they do not truly meet, and you would not be able to tell, because there are no other attributes to check against.
Part 3: Perform an Analysis of the Data
Q8. Which model has the highest accuracy? How do you know?
1- Random Forest
4- Bayes Net
The Bayes Net had the highest accuracy because it correctly classified 91.6 percent of the instances.
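As a rough illustration of what the accuracy figure means, the sketch below computes the share of correctly classified instances from two lists of labels. The labels and the `accuracy` helper are hypothetical and are not taken from the Lending Club data or from Weka's output.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for illustration only; not real Lending Club records.
actual = ["approved", "rejected", "approved", "rejected", "approved"]
predicted = ["approved", "rejected", "rejected", "rejected", "approved"]

# Four of the five predictions match, so this prints "Accuracy: 80.0%".
print(f"Accuracy: {accuracy(actual, predicted):.1%}")
```

Comparing this single number across models is how the highest-accuracy model is picked, assuming every model is evaluated on the same instances.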
Part 4: Address and Refine the Results
Q9. How useful is your classification model in predicting which applicants will be approved or rejected? How do you know?
After running the classification model, you save it to your desktop and reload the original data into Weka. You then return to the Classify tab and load the saved model. At that point you can re-evaluate the model, which adds a prediction column. My correctly classified instances dropped dramatically to 31%, indicating that my model does not reliably predict which applications will be approved or rejected.
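One way to see where a re-evaluated model goes wrong is to count each combination of actual and predicted outcome. The sketch below does this for a handful of hypothetical labels. The helper names and the data are assumptions made for illustration, not Weka's API or the lab's results.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy_from_counts(counts):
    """Share of instances whose predicted label equals the actual label."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no instances to evaluate")
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return correct / total


# Hypothetical re-evaluation results; not taken from the Lending Club data.
actual = ["approved", "approved", "rejected", "rejected", "rejected", "approved"]
predicted = ["rejected", "approved", "approved", "approved", "rejected", "rejected"]

counts = confusion_counts(actual, predicted)
for (a, p), n in sorted(counts.items()):
    print(f"actual={a:<9} predicted={p:<9} count={n}")
print(f"Correctly classified: {accuracy_from_counts(counts):.0%}")
```

A breakdown like this shows whether the errors are mostly approvals predicted as rejections or the reverse, which the single accuracy percentage hides.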
Q10. How would you interpret the results of your analysis in plain English?
I would say that the Bayes Net classification model was ineffective at predicting which applications would be approved or rejected, since it classified only 31 percent of the data correctly.