Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext
正態分佈的幾個例子

例子程序normal_misc_examples.cpp 舉例說明了它們的使用方法。

Traditional Tables

首先我們需要包含一些頭文件來訪問正態分佈 (以及標準輸出)。

#include <boost/math/distributions/normal.hpp> // normal_distribution
  using boost::math::normal; // typedef 提供的缺省類型為 double.

#include <iostream>
  using std::cout; using std::endl; using std::left; using std::showpoint; using std::noshowpoint;
#include <iomanip>
  using std::setw; using std::setprecision;
#include <limits>
  using std::numeric_limits;

int main()
{
  cout << "Example: Normal distribution, Miscellaneous Applications.";

  try
  {
    { // Traditional tables and values.

讓我們以打印一些 traditional tables開始

double step = 1.; // in z 
double range = 4; // min and max z = -range to +range.
int precision = 17; // traditional tables 僅計算到非常低的精度.

// Construct a standard normal distribution s
  normal s; // (缺省均值 = 0,且標準差 = 1)
  cout << "Standard normal distribution, mean = "<< s.mean()
    << ", standard deviation = " << s.standard_deviation() << endl;

首先是概率分佈函數(probability distribution function)。

cout << "Probability distribution function values" << endl;
cout << "  z " "      pdf " << endl;
cout.precision(5);
for (double z = -range; z < range + step; z += step)
{
  cout << left << setprecision(3) << setw(6) << z << " " 
    << setprecision(precision) << setw(12) << pdf(s, z) << endl;
}
cout.precision(6); // default

以及從 -∞ 到 z的正態曲線之下的區域 ,累積分佈函數( cumulative distribution function)。

// 對於一個標準正態分佈( standard normal distribution ) 
cout << "Standard normal mean = "<< s.mean()
  << ", standard deviation = " << s.standard_deviation() << endl;
cout << "Integral (area under the curve) from - infinity up to z " << endl;
cout << "  z " "      cdf " << endl;
for (double z = -range; z < range + step; z += step)
{
  cout << left << setprecision(3) << setw(6) << z << " " 
    << setprecision(precision) << setw(12) << cdf(s, z) << endl;
}
cout.precision(6); // default

把你在一納秒的時間內的工作量比作由在 US National Bureau of Standards (now NIST)的Milton Abramovitz and Irene Stegen 辛苦從事的人力計算機組(team of human computers)(And all this you can do with a nanoscopic amount of work compared to the team of human computers toiling with Milton Abramovitz and Irene Stegen at the US National Bureau of Standards (now NIST))。 從1938年開始,他們的"Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables", 最終於 1964 年出版,自從那時候起已經被重印了許多次。 (一個主要的替代計劃為Digital Library of Mathematical Functions)。

打印一個傳統的二維表留作學生的練習,現在為什麼煩惱呢,這個數學工具包( Math Toolkit)允許你寫出下面的代碼:

double z = 2.; 
cout << "Area for z = " << z << " is " << cdf(s, z) << endl; // 獲取 z的區域.

相應地,我們可以獲得顯著性水平(significance levels)的傳統關鍵值(traditional 'critical' value)。對於95%的置信水平(confidence level),顯著性水平(significance level )通常稱作 alpha,為 0.05 = 1 - 0.95 (對於一個單側測試),所以我們可以寫出下面的代碼:

  cout << "95% of area has a z below " << quantile(s, 0.95) << endl;
// 95% 的區域有一個小於 1.64485 的z值

一個雙側測試(two-sided test) (兩個水平之間的比較,而不是一個單側測試)

  cout << "95% of area has a z between " << quantile(s, 0.975)
    << " and " << -quantile(s, 0.975) << endl;
// 95% 的區域有一個在 1.95996 和 -1.95996之間的z值

首先定義一個顯著性水平表(table of significance levels):這是真實的發生頻率位於計算的區間之外的概率。

有一個z值超出範圍一個標準差值的概率的alpha水平是很方便的。這不會是一些很好的類似於0.05的簡潔的數字,但是我們可以很容易地計算它。

double alpha1 = cdf(s, -1) * 2; // 0.3173105078629142
cout << setprecision(17) << "Significance level for z == 1 is " << alpha1 << endl;

並且放在我們喜愛的alpha值數組中。

double alpha[] = {0.3173105078629142, // z for 1 standard deviation.
  0.20, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

信度值(Confidence value)作為百分數是 (1 - alpha) * 100 (因此 alpha 0.05 == 95% 信度值),真實的發生頻率位於計算的區間之內

cout << "level of significance (alpha)" << setprecision(4) << endl;
cout << "2-sided       1 -sided          z(alpha) " << endl;
for (int i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
  cout << setw(15) << alpha[i] << setw(15) << alpha[i] /2 << setw(10) << quantile(complement(s,  alpha[i]/2)) << endl;
  // 使用 quantile(complement(s, alpha[i]/2)) 來避免潛在的精度損失}
cout << endl;

注意:單側測試 (one-sided),其中我們使用 a > < test (不是兩個) 並考慮從z 到+∞的尾部的區域 (積分),與一個雙側測試( two-sided test),其中我們使用> < tests,因此考慮從 -∞ 到 z 以及從 z到 +∞的兩個尾部區域之間的差別。

所以兩側值(2-sided values) alpha[i] 通過使用 alpha[i]/2計算。

如果我們考慮一個 alpha = 0.05 的簡單例子,那麼對於一個雙側測試( two-sided test),尾部區域從 -∞ 到 -1.96 是 0.025 (alpha/2) 而+z 到 +1.96 的尾部區域也是 0.025 (alpha/2), 而在 -1.96 到 12.96 之間的區域的值是 alpha = 0.95。兩個尾部區域的和是 0.025 + 0.025 = 0.05。

均值的任一邊的標準差(Standard deviations either side of the Mean)

有累積分佈函數幫助,我們可以很容易地計算容易記住的比例值:與均值的距離為1,2,3個標準差。

cout.precision(3);
cout << showpoint << "cdf(s, s.standard_deviation()) = "
  << cdf(s, s.standard_deviation()) << endl;  // from -infinity to 1 sd
cout << "cdf(complement(s, s.standard_deviation())) = "
  << cdf(complement(s, s.standard_deviation())) << endl;
cout << "Fraction 1 standard deviation within either side of mean is "
  << 1 -  cdf(complement(s, s.standard_deviation())) * 2 << endl;
cout << "Fraction 2 standard deviations within either side of mean is "
  << 1 -  cdf(complement(s, 2 * s.standard_deviation())) * 2 << endl;
cout << "Fraction 3 standard deviations within either side of mean is "
  << 1 -  cdf(complement(s, 3 * s.standard_deviation())) * 2 << endl;

對於一個有用的精度, 1, 2 & 3 百分比是 68, 95 和 99.7,並且這些值得記作有用的 'rules of thumb',比如, 在 標準差中:

Fraction 1 standard deviation within either side of mean is 0.683
Fraction 2 standard deviations within either side of mean is 0.954
Fraction 3 standard deviations within either side of mean is 0.997

通過使用cout.precision(15),對於置信區間(confidence intervals) 我們可以獲得一些更精值。

Fraction 1 standard deviation within either side of mean is 0.682689492137086
Fraction 2 standard deviations within either side of mean is 0.954499736103642
Fraction 3 standard deviations within either side of mean is 0.997300203936740

但在你為這個給人印象深刻的精度面前開始興奮之前,不要忘記標準差的置信區間(confidence intervals of the standard deviation) 是非常寬的,尤其是當你僅使用幾個測量值來計算標準差。

一些簡單的例子
燈泡的壽命(Life of light bulbs)

例子來自於 K. Krishnamoorthy, Handbook of Statistical Distributions with Applications, ISBN 1 58488 635 8, page 125... ,使用這個數學工具庫實現。

在這裡顯示了一些簡單的例子:

// K. Krishnamoorthy, Handbook of Statistical Distributions with Applications,
 // ISBN 1 58488 635 8, page 125, example 10.3.5

100 W 燈泡的平均壽命是 1100 h 標準差為 100 h。假定,或許是一點證據和更多的信心,分佈是正態的,我們使用這些值來構造一個稱之為bulbs 的正態分佈:

double mean_life = 1100.;
double life_standard_deviation = 100.;
normal bulbs(mean_life, life_standard_deviation); 
double expected_life = 1000.;

然後我們就可以使用累積分佈函數來預測持續不同時間的燈泡百分比。

cout << "Fraction of bulbs that will last at best (<=) " // P(X <= 1000)
  << expected_life << " is "<< cdf(bulbs, expected_life) << endl;
cout << "Fraction of bulbs that will last at least (>) " // P(X > 1000)
  << expected_life << " is "<< cdf(complement(bulbs, expected_life)) << endl;
double min_life = 900;
double max_life = 1200;
cout << "Fraction of bulbs that will last between "
  << min_life << " and " << max_life << " is "
  << cdf(bulbs, max_life)  // P(X <= 1200)
   - cdf(bulbs, min_life) << endl; // P(X <= 900)

[Note] 注意

Real-life failures are often very ab-normal, with a significant number that 'dead-on-arrival' or suffer failure very early in their life: the lifetime of the survivors of 'early mortality' may be well described by the normal distribution.

需要多少個洋蔥?

一個商店每週需求為 5 lb 袋洋蔥,這種需求是一個均值為140袋,標準差為10袋的正態分佈。

double mean = 140.; // 每週的袋數.
double standard_deviation = 10; 
normal sacks(mean, standard_deviation);

double stock = 160.; // 每週.
cout << "Percentage of weeks overstocked "
  << cdf(sacks, stock) * 100. << endl; // P(X <=160)
// Percentage of weeks overstocked 97.7

所以這就會導致有很多發霉的洋蔥!因此我們必須回答:多大的庫存水平才可以滿足在95%的星期的需求。

double stock_95 = quantile(sacks, 0.95);
cout << "Store should stock " << int(stock_95) << " sacks to meet 95% of demands." << endl;

並且可以很容易地估算如何滿足80%的需求,那麼浪費的就更少一些。

double stock_80 = quantile(sacks, 0.80);
cout << "Store should stock " << int(stock_80) << " sacks to meet 8 out of 10 demands." << endl;

打包牛肉(Packing beef)

一台機器用於在每個盒子中裝3 kg牛肉(ground beef)。經過很長一段時間發現每個盒子中的牛肉平均重量為3kg且標準差為0.1kg。假定將牛肉裝進盒子的過程為正態分佈,我們可以計算大於3.1kg的盒子的百分比。

double mean = 3.; // kg
double standard_deviation = 0.1; // kg
normal packs(mean, standard_deviation);

double max_weight = 3.1; // kg
cout << "Percentage of packs > " << max_weight << " is "
<< cdf(complement(packs, max_weight)) << endl; // P(X > 3.1)

double under_weight = 2.9;
cout <<"fraction of packs <= " << under_weight << " with a mean of " << mean 
  << " is " << cdf(complement(packs, under_weight)) << endl;
// <= 2.9 且均值為 3 的百分比是 0.841345
// 也就是 0.84 - 多於目標的 0.95
// 要求 95% 的盒子超過這個重量,那麼我們應當將均值設為多少呢 ?
// KK StatCalc says:
double over_mean = 3.0664;
normal xpacks(over_mean, standard_deviation);
cout << "fraction of packs >= " << under_weight
<< " with a mean of " << xpacks.mean() 
  << " is " << cdf(complement(xpacks, under_weight)) << endl;
// f>= 2.9 且均值為 3.06449 的百分比是 0.950005
double under_fraction = 0.05;  //  95% 超過這個重量的最小平均重量是 mean - sd = 2.9
double low_limit = standard_deviation;
double offset = mean - low_limit - quantile(packs, under_fraction);
double nominal_mean = mean + offset;

normal nominal_packs(nominal_mean, standard_deviation);
cout << "Setting the packer to " << nominal_mean << " will mean that "
  << "fraction of packs >= " << under_weight 
  << " is " << cdf(complement(nominal_packs, under_weight)) << endl;

設置打包機為 3.06449 將意味著 >= 2.9 的百分比為 0.95。

設置打包機為 3.13263 將意味著>= 2.9的百分比為 0.99,但那將會使平均損失從 0.0644 增長兩倍到 0.133。

另一方面,我們可以投資一個更好的標準差更低的打包機。

為了估算新的打包機到底有多好(標準差到底有多小),我們必須使用得5%的分位點位於 under_weight limit, 2.9 處。

double p = 0.05; // 需要第 p 個分位點.
cout << "Quantile of " << p << " = " << quantile(packs, p)
  << ", mean = " << packs.mean() << ", sd = " << packs.standard_deviation() << endl; //

Quantile of 0.05 = 2.83551, mean = 3, sd = 0.1

對於當前的打包機 (mean = 3, sd = 0.1), 5% 分位點在 2.8551 kg處,比我們2.9kg的目標稍低一些。所以,標準差將會變小

讓我們通過假設將其對分(現在為0.1)為標準差為0.05kg開始。

normal pack05(mean, 0.05); 
cout << "Quantile of " << p << " = " << quantile(pack05, p) 
  << ", mean = " << pack05.mean() << ", sd = " << pack05.standard_deviation() << endl;

cout <<"Fraction of packs >= " << under_weight << " with a mean of " << mean 
  << " and standard deviation of " << pack05.standard_deviation()
  << " is " << cdf(complement(pack05, under_weight)) << endl;
//

>= 2.9 且均值為3 ,標準差為 0.05 的盒子的百分比是 0.9772

所以0.05 是一個非常好的估計,但對於2.9的目標稍高了一點,所以標準差會稍大一點。所以,我們可以進行更多的估計以使得結果更接近一些,假設把標準差提高到 0.06 kg。

normal pack06(mean, 0.06); 
cout << "Quantile of " << p << " = " << quantile(pack06, p) 
  << ", mean = " << pack06.mean() << ", sd = " << pack06.standard_deviation() << endl;

cout <<"Fraction of packs >= " << under_weight << " with a mean of " << mean 
  << " and standard deviation of " << pack06.standard_deviation()
  << " is " << cdf(complement(pack06, under_weight)) << endl;

重量大於 2.9 ,均值為 3,標準差為0.06的盒子的百分比為 0.9522

現在我們的計算結果真的非常接近了,為了更好的完成這個工作,我們可能需要使用根查找算法( root finding method),例如已經提供的工具,參考 不使用導數查找根(Root Finding Without Derivatives)

但在這個分佈(正態分佈)情形中,我們應當更靈活並進行一個直接的計算

normal s; // 標準正態分佈(standard normal distribution), 
double sd = 0.1;
double x = 2.9; // 我們要求的界限.
// 那麼概率 p = N((x - mean) / sd)
// 如果我們想要找到滿足要求的標準差,
// 使得第 p 個分位點位於 x,
// 在這種開發部下是 0.95 (95%) 分位點在 2.9 kg 處, 當均值為 3 kg.

double prob =  pdf(s, (x - mean) / sd);
double qp = quantile(s, 0.95);
cout << "prob = " << prob << ", quantile(p) " << qp << endl; // p = 0.241971, quantile(p) 1.64485
// 重新整理,我們可以直接計算要求的標準差:
double sd95 = abs((x - mean)) / qp;

cout << "If we want the "<< p << " th quantile to be located at "  
  << x << ", would need a standard deviation of " << sd95 << endl;

normal pack95(mean, sd95);  // Distribution of the 'ideal better' packer.
cout <<"Fraction of packs >= " << under_weight << " with a mean of " << mean 
  << " and standard deviation of " << pack95.standard_deviation()
  << " is " << cdf(complement(pack95, under_weight)) << endl;

// 重量大於 2.9 ,均值為 3的包的百分比為 0.95

注意這兩個簡單的問題 (我們是否過度填充或者可以測量得更好一些) 實際上是很常見的。測量牛肉重量可以被其它的任何東西的測量替代。但是這個計算的精度依賴於標準差的精度-標準差幾乎一直都比我們希望的要差一些,尤其是基於很少量的測量的時候。

螺栓的長度(Length of bolts)

如果一個螺栓的長度在3.9 到 4.1 之間,那麼它才是可用的。對於大量的螺栓,一個大小為50的樣本顯示平均長度為3.95,標準0.1。假定是一個正態分佈, 那麼哪種比例是可用的?真正的樣本均值是未知的,但我們可以使用樣本均值和標準差來找到近似的結果。

    normal bolts(3.95, 0.1);
    double top = 4.1;
    double bottom = 3.9; 

cout << "Fraction long enough [ P(X <= " << top << ") ] is " << cdf(bolts, top) << endl;
cout << "Fraction too short [ P(X <= " << bottom << ") ] is " << cdf(bolts, bottom) << endl;
cout << "Fraction OK  -between " << bottom << " and " << top
  << "[ P(X <= " << top  << ") - P(X<= " << bottom << " ) ] is "
  << cdf(bolts, top) - cdf(bolts, bottom) << endl;

cout << "Fraction too long [ P(X > " << top << ") ] is "
  << cdf(complement(bolts, top)) << endl;

cout << "95% of bolts are shorter than " << quantile(bolts, 0.95) << endl;
 


PrevUpHomeNext