# Comparing Power Law with other Distributions

I'm using the Python powerlaw package (by Jeff Alstott and co-authors, implementing the methods of Clauset, Shalizi and Newman) to try fitting my data to a power law.

First, some details on my data:

1. It is discrete (word count data).
2. It is heavily right-skewed, i.e. it has a long right tail (skewness is approx. 16).
3. It is leptokurtic (kurtosis is approx. 300).

## What I have done so far

df_data is my pandas DataFrame, and word_count is a Series in it containing word count data for around 1000 word tokens.

First, I generated a fit object:

```
import powerlaw
fit = powerlaw.Fit(data=df_data.word_count, discrete=True, verbose=False, xmin=1, xmax=200)
```

Next, I compare the power law fit for my data against other distributions, namely lognormal, exponential, lognormal_positive, stretched_exponential and truncated_power_law, using the fit.distribution_compare(distribution_one, distribution_two) method.

From distribution_compare, I obtained the following (R, p) tuples for each of the comparisons:

• fit.distribution_compare('power_law', 'lognormal') = (0.35617607052907196, 0.73466960075186816)
• fit.distribution_compare('power_law', 'exponential') = (481.35250943681206, 4.3450007097178692e-05)
• fit.distribution_compare('power_law', 'lognormal_positive') = (89.186233734863649, 4.1315378698322223e-08)
• fit.distribution_compare('power_law', 'stretched_exponential') = (1.7564708682020371, 0.2974294888802046)
• fit.distribution_compare('power_law', 'truncated_power_law') = (-0.003684604382383605, 0.93159035254165268)
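The five calls above could also be gathered in a single loop. This is only a minimal sketch: `compare_all` and `ALTERNATIVES` are hypothetical names of my own, not part of the powerlaw package, and the helper assumes it is handed a fit object exposing `distribution_compare(dist1, dist2)` returning an (R, p) tuple, as described in this post.

```python
# Hypothetical helper (not part of powerlaw): run every pairwise comparison
# of a baseline distribution against a list of alternatives.
ALTERNATIVES = (
    "lognormal",
    "exponential",
    "lognormal_positive",
    "stretched_exponential",
    "truncated_power_law",
)


def compare_all(fit, baseline="power_law", alternatives=ALTERNATIVES):
    """Return {alternative: (R, p)} for the baseline vs. each alternative."""
    results = {}
    for alt in alternatives:
        # distribution_compare returns the (R, p) tuple for baseline vs. alt
        R, p = fit.distribution_compare(baseline, alt)
        results[alt] = (R, p)
    return results


# Usage, assuming `fit` is the powerlaw.Fit object created earlier:
# for name, (R, p) in compare_all(fit).items():
#     print(f"power_law vs {name}: R = {R:.4g}, p = {p:.3g}")
```

Passing the fit object in as an argument keeps the helper independent of how the fit was built (xmin, xmax, discrete and so on).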

From the powerlaw documentation:

R : float

The loglikelihood ratio of the two sets of likelihoods. If positive, the first set of likelihoods is more likely (and so the probability distribution that produced them is a better fit to the data). If negative, the reverse is true.

p : float

The significance of the sign of R. If below a critical value (typically .05) the sign of R is taken to be significant. If above the critical value the sign of R is taken to be due to statistical fluctuations.
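To make the rule in the quoted documentation concrete, here is a minimal sketch that applies it to the five (R, p) tuples listed above. The function name `interpret` and the labels it returns are my own (hypothetical), and the 0.05 threshold is just the "typical" critical value the documentation mentions.

```python
def interpret(R, p, alpha=0.05):
    """Apply the documented rule: the sign of R only counts when p < alpha."""
    if p >= alpha:
        # Sign of R may be due to statistical fluctuations
        return "inconclusive"
    return "first" if R > 0 else "second"


# (R, p) tuples copied from the comparisons above; 'first' is power_law
results = {
    "lognormal": (0.35617607052907196, 0.73466960075186816),
    "exponential": (481.35250943681206, 4.3450007097178692e-05),
    "lognormal_positive": (89.186233734863649, 4.1315378698322223e-08),
    "stretched_exponential": (1.7564708682020371, 0.2974294888802046),
    "truncated_power_law": (-0.003684604382383605, 0.93159035254165268),
}

verdicts = {name: interpret(R, p) for name, (R, p) in results.items()}
for name, verdict in verdicts.items():
    print(f"power_law vs {name}: {verdict}")
```

Under this rule, only the exponential and lognormal_positive comparisons come out as significant (both in favour of the power law); for the other three, the test cannot tell the two candidates apart.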

From the results of comparing the power law against the exponential and lognormal distributions, I feel inclined to say that my data follow a power law distribution.

Would this be a correct interpretation/assumption about the test results? Or perhaps I'm missing something?