I need to run an experiment with customer support agents who are rated by customers on two metrics: average issue resolution rate (proportion of issues resolved) and average star rating (1-5).
Whenever an agent gets a new issue, I plan to show a feature to the agent (or withhold it) at random, with 50% probability each way.
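For concreteness, the assignment I have in mind is just an independent coin flip per issue, roughly like this (the function name is made up; seeding is only so the sketch is reproducible):

```python
import random

def assign_feature(issue_id: str) -> bool:
    """Hypothetical helper: show the feature on this issue with 50% probability."""
    return random.random() < 0.5

# Sanity check: roughly half of a large batch of issues should get the feature.
random.seed(0)
shown = sum(assign_feature(f"issue-{i}") for i in range(10_000))
print(f"{shown} of 10,000 issues got the feature")
```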
Once the experiment is over, I would like to test whether there is
1. A difference in the average star rating between the issues where the feature was shown and the issues where the feature was not shown
2. A difference in the resolution rate
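For what it's worth, my naive guess for the two comparisons was a two-sample (Welch's) t-test on the star ratings and a two-proportion z-test on the resolution rate, along these lines (all data below is made up, just to show the calls I mean):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)

# Fake star ratings for the two arms (1-5, skewed toward 5 like my data).
stars_shown = rng.choice([1, 2, 3, 4, 5], size=400,
                         p=[0.08, 0.02, 0.03, 0.07, 0.80])
stars_hidden = rng.choice([1, 2, 3, 4, 5], size=400,
                          p=[0.10, 0.02, 0.03, 0.08, 0.77])

# Welch's t-test on the mean star rating (unequal variances allowed).
t_stat, t_p = stats.ttest_ind(stars_shown, stars_hidden, equal_var=False)

# Two-proportion z-test on resolution counts per arm (fake counts).
resolved = np.array([230, 210])  # issues resolved, per arm
totals = np.array([400, 400])    # issues with resolution feedback, per arm
z_stat, z_p = proportions_ztest(resolved, totals)

print(f"stars:      t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"resolution: z = {z_stat:.2f}, p = {z_p:.3f}")
```

Is that the right pair of tests given how skewed the ratings are, or should I be doing something else (e.g. a rank-based test) for the stars?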
A couple of additional points:
1. Customers do not always leave feedback: on average they report whether the issue was resolved ~15% of the time and leave a star rating only ~7% of the time.
2. The star ratings are discrete (1-5 stars) and heavily skewed: most are 5 stars, a smaller share are 1 star, and 2-4 stars are rare. The average star rating is around 4.3 in the general population, and the resolution rate is around 55%.
1. What are the correct statistical tests for the above two comparisons?
2. How do I determine the sample size needed to detect an effect size of 0.1 in the star rating and a 1-percentage-point difference in the resolution rate (e.g. 55% vs. 56%) with 90% power?
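Here is my rough attempt at question 2 so far, using statsmodels. I'm assuming alpha = 0.05 two-sided, treating the 0.1 star effect as a standardized (Cohen's d) effect size, and inflating the issue counts by the feedback rates above; please correct me if any of that is wrong:

```python
from statsmodels.stats.power import NormalIndPower, TTestIndPower
from statsmodels.stats.proportion import proportion_effectsize

alpha, power = 0.05, 0.90

# Star rating: treating 0.1 as a standardized (Cohen's d) effect size.
# If 0.1 is meant as a raw difference in stars, it should be divided
# by the star-rating SD first -- this is one of my open questions.
n_stars = TTestIndPower().solve_power(effect_size=0.1, alpha=alpha, power=power)

# Resolution rate: 55% vs. 56% (a 1-percentage-point lift), converted
# to Cohen's h via the arcsine transform.
es = proportion_effectsize(0.56, 0.55)
n_resolution = NormalIndPower().solve_power(effect_size=es, alpha=alpha, power=power)

# Only ~7% of issues get a star rating and ~15% get resolution feedback,
# so the number of issues I have to run is much larger than the number
# of feedback responses the power calculation asks for.
issues_stars = n_stars / 0.07
issues_resolution = n_resolution / 0.15

print(f"star-rating responses needed per arm: {n_stars:,.0f}")
print(f"resolution responses needed per arm:  {n_resolution:,.0f}")
print(f"issues needed per arm (stars):        {issues_stars:,.0f}")
print(f"issues needed per arm (resolution):   {issues_resolution:,.0f}")
```

If this is the right shape of calculation, I'd still love a sanity check on the numbers, and on whether dividing by the feedback rate is a legitimate way to account for the missing responses.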
If someone could work the math out here, I'd be super delighted!
Thanks in advance for the help!