Main problem: Why is the predicted rent for customers at a certain income level higher than the rent of customers at both higher and lower income levels, with the other variables held fixed? And how can I address that?
Details: I have data on customers' rent, income, age and number of dependents. I binned the rent into 10 categories and fitted a KNN model (K = 3) to the data. Using this model, I tried to predict the rent of customers in groups that the data does not cover, where each feature value still lies within the range of that feature in the data. For example, if we have information about customers with \$5000-5500 and \$6000-6500 in income, ages 50-55 and 60-65, and 3 and 5 dependents, I would like to predict the rent of customers with \$5500-6000 in income, aged 55-60, with 4 dependents.
What I did was generate random samples from these groups (i.e. instances with incomes within 5500-6000, ages 55-60 and 4 dependents, following the above example), get the predicted rent class for each sample, and then average the mean rents of the predicted classes over the samples. (I didn't find an easy way to predict the rent directly in SAS, so I predict the class first and then average the class means.)
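To make the procedure concrete, here is a minimal Python sketch of it. This is not my SAS code: the toy data, the bin width of 500, the unscaled Euclidean distance, uniform sampling within the group, and the helper names `knn_predict_class` and `predicted_mean_rent` are all assumptions made up for illustration.

```python
import random
from collections import Counter, defaultdict

def knn_predict_class(train_X, train_y, x, k=3):
    """Majority class among the k nearest training points (unscaled Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    # Ties are broken in favour of the smallest class label (an arbitrary choice).
    return min(votes, key=lambda c: (-votes[c], c))

def predicted_mean_rent(train_X, train_y, class_means, cell, n_samples=200, k=3, seed=0):
    """Sample points uniformly inside an unobserved group and average the class means."""
    rng = random.Random(seed)
    (inc_lo, inc_hi), (age_lo, age_hi), dependents = cell
    total = 0.0
    for _ in range(n_samples):
        x = (rng.uniform(inc_lo, inc_hi), rng.uniform(age_lo, age_hi), dependents)
        total += class_means[knn_predict_class(train_X, train_y, x, k)]
    return total / n_samples

# Toy data (made up): (income, age, dependents) and the observed rent.
train_X = [(5200, 52, 3), (5300, 53, 3), (5400, 54, 3),
           (6100, 62, 5), (6200, 63, 5), (6400, 64, 5)]
rents = [900, 950, 980, 1400, 1450, 1490]

# Bin the rent into classes (hypothetical bin width) and record each class's mean rent.
bin_width = 500
train_y = [int(r // bin_width) for r in rents]
by_class = defaultdict(list)
for r, c in zip(rents, train_y):
    by_class[c].append(r)
class_means = {c: sum(v) / len(v) for c, v in by_class.items()}

# Group not covered by the data: income 5500-6000, age 55-60, 4 dependents.
cell = ((5500, 6000), (55, 60), 4)
estimate = predicted_mean_rent(train_X, train_y, class_means, cell)
print(estimate)
```

Each sampled point contributes the mean rent of whichever class its 3 nearest neighbours vote for, so the estimate for a group is always an average of class means.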
However, some of the predicted rents were not very intuitive: if I fix age and number of dependents, I can get a predicted mean rent for one income group that is higher than the mean rents for groups with both higher and lower income. I'm not sure how to address that and would really appreciate some help. Thanks.