Testing significance of rate of change in nutrient ratios against compound Y - GLM

I have been working out the rate of change in nutrient ratios against a compound (Y).
Graphically, I can see that the rate of nutrient change decreases as the compound Y rate of change increases.
I'm just wondering what the best tests are to establish significance, please? I was using GLMs, but I have both negative and positive values, so it is hard to find an appropriate family.
Here is one of my graphs:
[Figure: nutrient ratio rate of change against compound Y rate of change]
Any help you can give would be highly appreciated.
Kind regards,
A
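One note on the family question: a Gaussian GLM with an identity link (i.e. ordinary least-squares regression) places no sign restriction on the response, so it can handle rates of change that are both negative and positive. A minimal sketch in R, with hypothetical column names nutrient_rate and Y_rate in a data frame rates:

# Gaussian family with identity link: equivalent to ordinary linear regression
# (column and data-frame names below are hypothetical)
fit <- glm(nutrient_rate ~ Y_rate, data = rates, family = gaussian(link = "identity"))
summary(fit)   # the Y_rate coefficient tests whether the slope differs from zero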

Related

Use contrast as measure of effect size glmmTMB

I'm investigating the effect of an experimental treatment with the response variable being Weight (Wt). I have 2 control treatments (A,B) & 2 experimental treatments (C,D). I'm looking to see if the effect of treatment D relative to the control treatments increases with temperature (med_Hobo), and the same for treatment C relative to treatment D. (Additional fixed effects: Atl.Pac, NS, and random effect of Site)
Contrast matrix:
library(MASS)   # for ginv()
# Row 1: D vs. mean of the controls (A, B); row 2: D vs. C (levels assumed ordered A-D)
mat <- rbind(c(-0.5, -0.5,  0, 1),
             c(   0,    0, -1, 1))
Cmat <- ginv(mat)   # generalised inverse gives the corresponding coding matrix
Model:
library(glmmTMB)
WW <- glmmTMB(Wt ~ med_Hobo * Treatment * Atl.Pac * NS + (1 | Site) + (1 | Site:Treatment),
              data = WetWtsEnv, contrasts = list(Treatment = Cmat),
              family = Gamma(link = "log"))
Using emmeans and emtrends I can demonstrate that mean contrast 1 is significant while mean contrast 2 is not, but I'm struggling to assess how the contrast changes with temperature. In the figure below, treatment D (blue) clearly diverges as temperature increases; how do I demonstrate this in numbers/figures? E.g. I'd expect the contrast at med_Hobo = 10 to be about 1 and to increase to a factor of ~2 at med_Hobo = 30, with a corresponding increase in the confidence interval.
[Figure: weight predictions based on the glmmTMB model]
Thanks in Advance!
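One way to put numbers on that divergence, sketched under the assumptions that the Treatment levels are ordered A, B, C, D (matching mat above) and that averaging over Atl.Pac and NS is acceptable, is to compare the med_Hobo slopes with emtrends() and to evaluate the same custom contrast at chosen temperatures with emmeans():

library(emmeans)

# Per-treatment slopes of the linear predictor (log scale) against temperature,
# averaged over the other fixed effects
trends <- emtrends(WW, ~ Treatment, var = "med_Hobo")

# Do the slopes differ in the same way as the mean contrasts?
contrast(trends, method = list("D vs mean(A,B)" = c(-0.5, -0.5, 0, 1),
                               "D vs C"         = c(0, 0, -1, 1)))

# The same contrast evaluated at med_Hobo = 10 and 30; with the log link a
# zero-sum contrast back-transforms to a ratio, so values near 1 at 10 and
# near 2 at 30 (with their CIs) would quantify the divergence in the figure
emm <- emmeans(WW, ~ Treatment | med_Hobo, at = list(med_Hobo = c(10, 30)))
confint(contrast(emm, method = list("D vs mean(A,B)" = c(-0.5, -0.5, 0, 1))),
        type = "response")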

Geometric Mean instead of Simple Mean

I would appreciate it if someone could explain some use cases for the geometric mean instead of the simple (arithmetic) mean.
Use of geometric mean:
A geometric mean is useful in machine learning when comparing items with a different number of properties and different numerical ranges. The geometric mean normalizes the number ranges, giving each property equal weight in the average. This contrasts with the arithmetic mean, where a property with a larger number range affects the average more than one with a smaller range. To better understand this, try a geometric mean calculation compared with an arithmetic mean calculation using two numbers: choose one number from 0 to 5 and the other from 0 to 100, then vary the two numbers to see how each affects the average.
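A quick version of that exercise in R, with arbitrary example values (one on a 0-5 scale, one on a 0-100 scale):

gm <- function(v) exp(mean(log(v)))   # geometric mean computed via logarithms

mean(c(4, 10)); gm(c(4, 10))   # 7   vs. ~6.32
mean(c(2, 10)); gm(c(2, 10))   # halve the small-scale value: 6   vs. ~4.47
mean(c(4, 5));  gm(c(4, 5))    # halve the large-scale value: 4.5 vs. ~4.47

# The geometric mean moves identically for a proportional change in either
# number, while the arithmetic mean is dominated by the larger-range number.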
Harmonic mean: the harmonic mean is a type of average generally used for numbers that represent a rate or ratio, such as precision and recall in information retrieval. It can be described as the reciprocal of the arithmetic mean of the reciprocals of the data.
Use of Harmonic mean:
The harmonic mean is used in machine learning to calculate the F-score (or F-measure). The F-score is a measure for evaluating the performance of algorithms in information retrieval.
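For example (the numbers are illustrative only), the F1 score is the harmonic mean of precision and recall:

precision <- 0.9
recall    <- 0.5
2 / (1 / precision + 1 / recall)   # harmonic mean = F1 ≈ 0.64
mean(c(precision, recall))         # the arithmetic mean, 0.7, hides the low recall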

Fast Fourier Transform with negative amplitude

I am using wavesurfer to analyse bird song data and I am getting negative amplitudes in the FFT analysis, and I cannot find the reason why.
I open my data with the spectrogram, and in the spectrum section it is possible to read frequency and amplitude (using FFT analysis). My amplitudes are negative, but I cannot find a way to justify that. I don't know why this is, and I have significant results in my data, meaning that everything needs to be justified. The software's forum does not work and there is literally no answer to my question on the internet. I have even emailed the creators asking for help. A screenshot of the window is attached.
dB are on a logarithmic scale. The log of a small enough positive FFT magnitude can be negative.
e.g. 20*log10(0.1) = -20
We express values in dB in order to say something like "it's n times as large." A dB quantity is based on the log10 of a ratio; it is used to compare two measurements, for example output vs. input. Sometimes a single measurement is compared with a fixed reference; in that case we do not use the plain dB unit but a variant, e.g. if the reference is 1 mW, we use dBm. The logarithmic scale facilitates calculations of attenuation and gain along a processing chain. A negative value indicates that the ratio is less than 1 and the signal is attenuated (or smaller than the other); conversely, a positive value indicates the signal is amplified (or larger than the other).
A dB (deci + Bel) is 0.1 B, so -60.1 dB is -6.01 B.
Power ratio
Originally the dB scale relates to power. Assuming the value is a power, you have to find the number whose log10 is -6.01, which is a matter of raising 10 to both sides:
log10(x) = -6.01 (after conversion to B)
10^(log10(x)) = 10^(-6.01)
x ≈ 9.8e-07 ≈ 1/1,000,000
So your measurement is a power about a million times smaller than another power. Which one? It isn't stated, but it's likely the spectral line with the maximum value.
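A quick check of that arithmetic in R (assuming, as above, that the value is a power ratio expressed in dB):

10^(-60.1 / 10)   # ≈ 9.8e-07, about one millionth of the reference power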
Voltage ratio
When the dB value relates to a ratio of voltages, the number is doubled for the same ratio. The reason is:
It is assumed that power is increased (or decreased) by increasing (decreasing) some voltage; according to Ohm's law, the current changes in the same proportion.
Power is voltage × current, so to change the power by a ratio k it is sufficient to change the voltage by a ratio sqrt(k), and log10(sqrt(k)) = 1/2 * log10(k).
E.g.
A power increase from 10 W to 20 W, a ratio of 2, is log10(2) ≈ 0.3 B, or 3 dB.
A voltage increase from 5 V to 10 V is also a ratio of 2, but 0.3 B × 2, or 6 dB.
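Both conventions are easy to verify numerically, e.g. in R:

10 * log10(20 / 10)   # power ratio of 2   -> ~3 dB
20 * log10(10 / 5)    # voltage ratio of 2 -> ~6 dB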
Power quantities vs. root-power quantities
The rule actually separates power quantities (like power, energy or acoustic intensity) from root-power quantities (like voltage, sound pressure and electric field).

What should be the threshold for modified z score?

I am trying to find outliers in my data set. I was previously using the z-score for that, with a 99% confidence level, which corresponds to +/- 2.576 in the z-score table. However, I realized that calculating the z-score using the median absolute deviation (MAD) would be better. I have the modified z-score based on
0.6745*(x - median)/MAD
My problem is that I am not sure what a good cutoff is for the modified z-score, or whether it depends on the kind of data I have.
This depends on the type of data you have. In general, median-based operations lose a little outlier information. However, the results for sufficiently large data sets should be similar, with the centroid shifted from the mean to the median; in a skewed data set, this will likely give you better results.
As for the cut-off point, here's a starting hint.
Think about the math: traditional Z-scores are based on a root-sum-square computation. Think about the root(N) factor in this. How would that affect your 99% point for the median computation, which is a simple linear computation?
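For a concrete starting point, here is a minimal sketch of the modified z-score in R; the 0.6745 factor and the |M| > 3.5 cutoff are the values commonly attributed to Iglewicz and Hoaglin, and the cutoff should be treated as a rule of thumb rather than something fixed:

set.seed(1)
x  <- c(rnorm(100), 8)                                  # toy data with one planted outlier
mz <- 0.6745 * (x - median(x)) / mad(x, constant = 1)   # constant = 1 gives the raw MAD
x[abs(mz) > 3.5]                                        # points flagged by the rule of thumb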

Word2Vec: Number of Dimensions

I am using Word2Vec with a dataset of roughly 11,000,000 tokens, looking to do word similarity (as part of synonym extraction for a downstream task), but I don't have a good sense of how many dimensions I should use with Word2Vec. Does anyone have a good heuristic for the range of dimensions to consider based on the number of tokens/sentences?
The typical range is 100-300 dimensions. I would say you need at least 50 dimensions to achieve a minimally useful accuracy; if you pick fewer dimensions, you will start to lose properties of high-dimensional spaces. If training time is not a big deal for your application, I would stick with 200 dimensions, as it gives nice features. The best accuracy can be obtained with 300 dimensions; beyond 300, word features won't improve dramatically and training will be much slower.
I do not know of a theoretical explanation or strict bounds for dimension selection in high-dimensional spaces (and there might not be an application-independent explanation for that), but I would refer you to Pennington et al., Figure 2a, where the x-axis shows the vector dimension and the y-axis shows the accuracy obtained. That should provide empirical justification for the argument above.
I think that the number of dimensions for word2vec depends on your application. The most common empirical value is about 100; with that it can perform well.
The number of dimensions affects over/underfitting. 100-300 dimensions is the common rule of thumb. Start with one number and check the accuracy on your test set versus your training set. The larger the dimension size, the easier it is to overfit the training set and get poor performance on the test set. Tuning this parameter is needed when you have high accuracy on the training set and low accuracy on the test set; that means the dimension size is too large, and reducing it may solve the overfitting problem of your model.
