Statistical Comparison of Folate Content Across Food Groups

Use hypothesis testing and distribution analysis to compare folate levels between two groups of foods

Folate is an essential nutrient involved in DNA synthesis and cell growth. Compare folate levels in berries and green vegetables using statistics.

Define two groups of Food entities, berries and green vegetables:

In[1]:=

berries = {«food: food type exactly strawberry, added food types exactly none», «food: food type exactly blueberry, added food types exactly none», «14 more food entities»};

greens = {«food: food type exactly spinach, added food types exactly none», «food: food type exactly kale, added food types exactly none», «14 more food entities»};

Retrieve the folate values for each food group by querying the corresponding entity property, removing missing entries, and converting any quantity-valued results into plain numerical values suitable for statistical analysis:

In[2]:=

berriesFolate=NQuantityMagnitudeDeleteMissingEntityValueberries,

relative food folate content



Out[2]=

{0.24,0.07,0.25,0.21,0.01,0.06,0.08,0.26,0.06,0.63,0.06,0.17}

In[3]:=

greensFolate=NQuantityMagnitudeDeleteMissingEntityValuegreens,

relative food folate content



Out[3]=

{0.98,0.14,0.27,0.38,0.87,0.09,0.12,1.36,0.745,0.715,0.09,0.43,0.37,0.78,0.77}

Compute basic descriptive statistics for each group (Min, Max, Mean, Median, Variance and StandardDeviation) to summarize and compare the distribution and variability of folate levels in berries and green vegetables:

In[4]:=

stats[data_] := <|"Min" -> Min[data], "Max" -> Max[data], "Mean" -> Mean[data], "Median" -> Median[data], "Variance" -> Variance[data], "StandardDeviation" -> StandardDeviation[data]|>;

In[5]:=

berriesStats=stats[berriesFolate]

Out[5]=

Min0.01,Max0.63,Mean0.175,Median0.125,Variance0.0283909,StandardDeviation0.168496

In[6]:=

greensStats=stats[greensFolate]

Out[6]=

Min0.09,Max1.36,Mean0.540667,Median0.43,Variance0.145639,StandardDeviation0.381627

At this stage, one might want to use TrimmedMean to reduce the influence of outliers that may be skewing the mean. Here the outlying 10% of the data is dropped from each end of the distribution before averaging:

In[7]:=

N[TrimmedMean[berriesFolate,0.10]]

Out[7]=

0.146

In[8]:=

N[TrimmedMean[greensFolate,0.10]]

Out[8]=

0.512308
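The trimming step can also be reproduced by hand, which shows exactly which values the 10% fraction removes. The following is a minimal sketch, assuming that the trimmed mean drops Floor[f n] values from each end of the sorted data. The name manualTrimmedMean is a hypothetical helper, not a built-in function, and the data lists are copied from Out[2] and Out[3]:

```wolfram
(* Folate values copied from Out[2] and Out[3] *)
berriesFolate = {0.24, 0.07, 0.25, 0.21, 0.01, 0.06, 0.08, 0.26, 0.06, 0.63, 0.06, 0.17};
greensFolate = {0.98, 0.14, 0.27, 0.38, 0.87, 0.09, 0.12, 1.36, 0.745, 0.715, 0.09, 0.43, 0.37, 0.78, 0.77};

(* Hypothetical helper: sort, drop Floor[f n] values from each end, then average *)
manualTrimmedMean[data_List, f_] := Module[{k = Floor[f Length[data]]},
  Mean[Take[Sort[data], {k + 1, Length[data] - k}]]
]

manualTrimmedMean[berriesFolate, 0.10]  (* drops 0.01 and 0.63 *)
manualTrimmedMean[greensFolate, 0.10]   (* drops one 0.09 and 1.36 *)
```

With 12 and 15 values, a 10% fraction removes exactly one value from each end, and the two results agree with Out[7] and Out[8].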

To compare the mean folate content of berries and green vegetables, we use a two-sample TTest. This test is designed to determine whether the means of two independent groups differ in a statistically meaningful way. It assumes that the data in each group are approximately normally distributed and that the population variances are unknown.

Before applying the t-test, we check whether the normality assumption is reasonable for each group.

To assess normality, we apply DistributionFitTest separately to each group using a default significance level 𝛼 = 0.05. The test returns a p-value, which is compared to 𝛼. If the p-value is greater than 𝛼, we do not have strong evidence to reject the null hypothesis (that the data come from a normal distribution). If the p-value is below 𝛼, the null hypothesis is rejected, indicating that the normality assumption may be questionable:

In[9]:=

berriesNormality=DistributionFitTest[berriesFolate]

Out[9]=

0.11161

In[10]:=

greensNormality=DistributionFitTest[greensFolate]

Out[10]=

0.0718978
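The decision rule described above can be written out explicitly. This is a small sketch rather than part of the original notebook: normalityDecision is a hypothetical helper, alpha is set to the 0.05 level used in the text, and the p-values are copied from Out[9] and Out[10]:

```wolfram
alpha = 0.05;

(* Hypothetical helper: compare a p-value against the significance level *)
normalityDecision[p_?NumericQ] :=
  If[p > alpha, "normality not rejected", "normality rejected"]

(* p-values copied from Out[9] and Out[10] *)
normalityDecision /@ <|"berries" -> 0.11161, "greens" -> 0.0718978|>
```

Writing the rule this way keeps the threshold in one place, so the same check can be rerun with a stricter or looser alpha.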

Both p-values exceed 0.05, so the normality assumption is not rejected for either group, although the result for green vegetables (about 0.072) is close to the threshold. We therefore evaluate whether the observed difference in mean folate content between berries and green vegetables is statistically significant using the t-test introduced above. The test can return a p-value quantifying the strength of the evidence, as well as a short textual conclusion summarizing the result:

In[11]:=

tTestPValue=TTest[{berriesFolate,greensFolate},0,"PValue"]

Out[11]=

0.00333388

In[12]:=

tTestConclusion=TTest[{berriesFolate,greensFolate},0,"TestConclusion"]

Out[12]=

The null hypothesis that the mean difference is 0 is rejected at the 5 percent level based on the T test.
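The p-value in Out[11] can be cross-checked by computing a two-sample t statistic directly. The sketch below assumes the unequal-variance (Welch) form of the test. Whether TTest uses this form or a pooled-variance form for a given dataset is not stated in this example, so agreement with Out[11] is something to verify rather than assume. The name welchTTest is a hypothetical helper, and the data lists are copied from Out[2] and Out[3]:

```wolfram
(* Folate values copied from Out[2] and Out[3] *)
berriesFolate = {0.24, 0.07, 0.25, 0.21, 0.01, 0.06, 0.08, 0.26, 0.06, 0.63, 0.06, 0.17};
greensFolate = {0.98, 0.14, 0.27, 0.38, 0.87, 0.09, 0.12, 1.36, 0.745, 0.715, 0.09, 0.43, 0.37, 0.78, 0.77};

(* Hypothetical helper: Welch two-sample t-test, two-sided *)
welchTTest[x_List, y_List] := Module[{vx, vy, se2, t, df},
  vx = Variance[x]/Length[x];   (* squared standard error of Mean[x] *)
  vy = Variance[y]/Length[y];   (* squared standard error of Mean[y] *)
  se2 = vx + vy;
  t = (Mean[x] - Mean[y])/Sqrt[se2];
  (* Welch-Satterthwaite approximation to the degrees of freedom *)
  df = se2^2/(vx^2/(Length[x] - 1) + vy^2/(Length[y] - 1));
  <|"TStatistic" -> t, "DegreesOfFreedom" -> df,
    "PValue" -> 2 (1 - CDF[StudentTDistribution[df], Abs[t]])|>
]

welchTTest[berriesFolate, greensFolate]
```

For these data the statistic comes out at about -3.33 on roughly 20 degrees of freedom. The negative sign reflects that the mean for berries is lower than the mean for green vegetables.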

To visually compare the folate distributions of berries and green vegetables, we use a paired histogram:

In[13]:=

PairedHistogram[greensFolate, berriesFolate, 6, PlotLabel -> "Folate Content Distributions: Greens vs. Berries", AxesLabel -> {"xxxxxxx", "relative folate content", "zzzzzz"}, ChartStyle -> {"RoseColors"}, Ticks -> {{1, 2, 3, 4, 5, 6, 7}, Automatic}, ChartLegends -> {"greens", "berries"}, ChartLabels -> Placed[{"number of greens", "number of berries"}, Below]]

Out[13]=

[Paired histogram of relative folate content, with legend entries "greens" and "berries"]

See Nutrients by the Numbers: Food and Nutrition Statistics with Wolfram Language for nutrition examples that use MannWhitneyTest, for data that is not normally distributed, and ANOVA (analysis of variance), for comparing the means of three or more groups.

External Links

Nutrients by the Numbers: Food and Nutrition Statistics with Wolfram Language

Publisher Information

Contributed by: Wolfram Staff

Wolfram Language Example Repository
