I have a dataset of a variable $x$ that takes values between 0 and 6. I would like a function that defines the empirical CDF of $x$. Since $x$ does not follow a specific distribution (such as a Gaussian), I need to rely on the data values to create this function. Using the following code, I can plot the empirical CDF:
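
    import matplotlib.pyplot as plt
    import seaborn as sns

    max_diam = 6
    # Cumulative histogram plus cumulative KDE curve
    ax = sns.distplot(x, hist_kws=dict(cumulative=True), kde_kws=dict(cumulative=True)).set(xlim=(0, max_diam))
    # Cumulative KDE curve on its own
    ax = sns.kdeplot(x, bw=.1, cumulative=True).set(xlim=(0, max_diam), ylim=(0, 1.0))#, color="r")
    plt.show()
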
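To be concrete about what I mean by "a function": the raw step-function ECDF can be written as a callable. This is only a minimal sketch, using a small synthetic array in place of my data; `make_ecdf` and `demo` are illustrative names, not part of my actual code.

```python
import numpy as np

def make_ecdf(data):
    """Return a callable F(t) = fraction of samples <= t (step-function ECDF)."""
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size

    def F(t):
        # side="right" counts every sample that is <= t
        return np.searchsorted(xs, t, side="right") / n

    return F

# Synthetic stand-in for the real data (assumption, for illustration only)
demo = np.array([0.5, 1.0, 1.0, 2.5, 6.0])
F = make_ecdf(demo)
print(F(1.0))                        # 0.6
print(F(np.array([0.0, 3.0, 6.0])))  # values 0.0, 0.8 and 1.0
```

This gives a step function, whereas the curve I am after is the smooth one.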
Now I would like to find the function that `kdeplot` uses to plot the CDF. I have tried polynomial regression on the ECDF points, but the fit is poor: there is only a single point after 4.9 (at 6.0), which makes the polynomial overfit at high degrees and underfit at low degrees.
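
    import numpy as np
    import sklearn.linear_model as skl  # assumed meaning of the `skl` alias
    from sklearn.preprocessing import PolynomialFeatures

    def ecdf(data):
        # Compute ECDF: sorted values and their cumulative fractions
        x = np.sort(data)
        n = x.size
        y = np.arange(1, n+1) / n
        return(x,y)

    x, y = ecdf(x)
    degree=7
    lw = 2
    plt.scatter(x=x, y=y, s=10);
    plt.xlabel('x', fontsize=16)
    plt.ylabel('y', fontsize=16)

    # Fit a polynomial of the given degree to the ECDF points
    polynomial_features= PolynomialFeatures(degree)
    x_poly = polynomial_features.fit_transform(x.reshape(-1, 1))
    model = skl.LinearRegression()
    model.fit(x_poly, y)

    # x_plot was not defined in my snippet; an evenly spaced grid is assumed here
    x_plot = np.linspace(x.min(), x.max(), 200)
    x_test = polynomial_features.transform(x_plot.reshape(-1, 1))
    y_test = model.predict(x_test)
    plt.plot(x_plot, y_test, color='yellowgreen', linewidth=lw, label="degree %d" % degree)
    plt.show()
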
So my question is: what function is `kdeplot` using to plot the orange line?
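For comparison, here is one candidate for a smooth CDF that I can write in closed form. If one assumes a Gaussian kernel density estimate, its CDF is the average of normal CDFs centred on the data points, $\hat F(t) = \frac{1}{n}\sum_i \Phi\big((t - x_i)/h\big)$. This is a sketch under that assumption, not a confirmed description of what `kdeplot` does internally; `kde_cdf`, the synthetic `sample`, and the bandwidth value `h = 0.1` are illustrative.

```python
import math
import numpy as np

# Vectorised error function, so that only NumPy and the standard library are needed
_erf = np.vectorize(math.erf)

def kde_cdf(data, bandwidth):
    """Return a callable smooth CDF: the mean of Gaussian CDFs centred on the data."""
    data = np.asarray(data, dtype=float)

    def cdf(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # z[j, i] = (t_j - x_i) / h
        z = (t[:, None] - data[None, :]) / bandwidth
        # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
        return (0.5 * (1.0 + _erf(z / math.sqrt(2.0)))).mean(axis=1)

    return cdf

# Synthetic stand-in for the real data (assumption, for illustration only)
rng = np.random.default_rng(0)
sample = rng.uniform(0, 6, size=200)

smooth_F = kde_cdf(sample, bandwidth=0.1)
grid = np.linspace(-1, 7, 50)
values = smooth_F(grid)  # rises smoothly from about 0 to about 1
```

Unlike the polynomial fit, a function of this form is non-decreasing and bounded between 0 and 1 by construction, but I do not know whether it matches the orange line exactly.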