
6.1 Introduction

We introduced several statistical techniques for the analysis of data in Chap. 3, most
of which were descriptive or exploratory. But, we also got our first glimpse of
another form of statistical analysis known as Inferential Statistics. Inferential statistics
is how statisticians use inductive reasoning to move from the specific, the data
contained in a sample, to the general, inferring characteristics of the population from
which the sample was taken.

Many problems require an understanding of population characteristics; yet, it can
be difficult to determine these characteristics because populations can be very large
and difficult to access. So rather than throw our hands into the air and proclaim that
this is an impossible task, we resort to a sample: a small slice or view of a
population. It is not a perfect solution, but we live in an imperfect world and we
must make the best of it. Mathematician and popular writer John Allen Paulos sums
it up quite nicely—“Uncertainty is the only certainty there is, and knowing how to
live with insecurity is the only security.”

So, what sort of imperfection do we face? Sample data can result in measurements
that are not representative of the population from which they are taken, so there is
always uncertainty as to how well the sample represents the population. We refer to
these circumstances as sampling error: the difference between the measurement
results of a sample and the true measurement values of a population. Fortunately,
through carefully designed sampling methods and the subsequent application of
statistical techniques, statisticians can infer population characteristics from results
found in a sample. If performed correctly, the sampling design will provide a
measure of reliability about the population inference we will make.

Let us carefully consider why we rely on inferential statistics:

1. The size of a population often makes it impossible to measure characteristics for
every member of the population—often there are just too many members of
populations. Inferential statistics provides an alternative solution to this problem.

2. Even if it is possible to measure characteristics for the entire population, the cost can be
prohibitive. Measuring every member of a population is called a census, and a census
can be costly.

3. Statisticians have developed techniques that can quantify the uncertainty associ-
ated with sample data. Thus, although we know that samples are not perfect,
inferential statistics provides a reliability evaluation of how well a sample
measure represents a population measure.

This was precisely what we were attempting to do in the survey data on the four
webpage designs in Chap. 5; that is, to make population inferences from the
webpage preferences found in the sample. In the descriptive analysis we presented
a numerical result. With inferential statistics we will make a statistical statement
about our confidence that the sample data is representative of the population. For the
numerical outcome, we hoped that the sample did in fact represent the population,
but it was mere hope. With inferential statistics, we will develop techniques that
allow us to quantify a sample’s ability to reflect a population’s characteristics, and

180 6 Inferential Statistical Analysis of Data

this will all be done within Excel. We will introduce some often used and important
inferential statistics techniques in this chapter.

6.2 Let the Statistical Technique Fit the Data

Consider the type of sample data we have seen thus far in Chaps. 1–5. In just about
every case, the data has contained a combination of quantitative and qualitative data
elements. For example, the data for teens visiting websites in Chap. 3 provided the
number of page-views for each teen, and described the circumstances related to the
page-views, either new or old site. This was our first exposure to sophisticated
statistics and to cause and effect analysis: one variable causing an effect on another.
We can think of these categories, new and old, as experimental treatments, and the
page-views as a response variable. Thus, the treatment is the assumed cause and the
effect is the number of views. To determine if the sample means of the two
treatments were different or equal, we performed an analysis called a paired
t-Test. This test permitted us to consider complicated questions.

So, when do we need this more sophisticated statistical analysis? Some of the
answers to this question can be summarized as follows:

1. When we want to make a precise mathematical statement about the data’s
capability to infer characteristics of the population.

2. When we want to determine how closely these data fit some assumed model of
behavior.

3. When we need a higher level of analysis to further investigate the preliminary
findings of descriptive and exploratory analysis.

This chapter will focus on data that has both qualitative and quantitative compo-
nents, but we will also consider data that is strictly qualitative (categorical), as you
will soon see. By no means can we explore the exhaustive set of statistical tech-
niques available for these data types; there are thousands of techniques available and
more are being developed as we speak. But, we will introduce some of the most often
used tools in statistical analysis. Finally, I repeat that it is important to remember that
the type of data we are analyzing will dictate the technique that we can employ. The
misapplication of a technique on a set of data is the most common reason for dismissing
or ignoring the results of an analysis; the analysis just does not match the data.

6.3 χ²: Chi-Square Test of Independence for Categorical Data

Let us begin with a powerful analytical tool applied to a frequently occurring type of
data—categorical variables. In this analysis, a test is conducted on sample data, and
the test attempts to determine if there is an association or relationship between two
categorical (nominal) variables. Ultimately, we would like to know if the result can
be extended to the entire population or is due simply to chance. For example,
consider the relationship between two variables: (1) an investor’s self-perceived
behavior toward investing, and (2) the selection of mutual funds made by the
investor. This test is known as the Chi-square, or Chi-squared, test of indepen-
dence. As the name implies, the test addresses the question of whether or not the two
categorical variables are independent (not related).

Now, let us consider a specific example. A mutual fund investment company
samples a total of 600 potential investors who have indicated their intention to invest
in mutual funds. The investors have been asked to classify themselves as either risk-
taking or conservative investors. Then, they are asked to identify a single type of
fund they would like to purchase. Four fund types are specified for possible purchase
and only one can be selected—bond, income, growth, and income and growth. The
results of the sample are shown in Table 6.1. This table structure is known as a
contingency table, and this particular contingency table happens to have 2 rows and
4 columns—what is known as a 2 by 4 contingency table. Contingency tables show
the frequency of occurrence of the row and column categories. For example, 30 (first
row/first column) of the 150 (row total for risk-takers) investors in the sample that
identified themselves as risk-takers said they would invest in a bond fund, and
51 (second row/second column) investors considering themselves to be conservative
said they would invest in an income fund. These values are counts or the frequency
of observations associated with a particular cell.
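
As a minimal sketch of the structure just described, the counts reported in Table 6.1 can be held in a small Python dictionary and the marginal totals recomputed from the cells. Python is used here only for illustration (the chapter itself works in Excel), and the variable names are assumptions of this sketch.

```python
# Observed counts from Table 6.1 (rows: risk preference, columns: fund type).
fund_types = ["Bond", "Income", "Income/Growth", "Growth"]
observed = {
    "Risk-taker":   [30, 9, 45, 66],
    "Conservative": [270, 51, 75, 54],
}

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(fund_types, col_totals)))
print(grand_total)
```

Recomputing the totals from the cells is a quick check that the table was entered correctly before any test is run on it.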

6.3.1 Tests of Hypothesis—Null and Alternative

The mutual fund investment company is interested in determining if there is a
relationship between an investor’s perception of his or her own risk tolerance and the selection of a fund
that the investor actually makes. This information could be very useful for marketing
funds to clients and also for counseling clients on risk-tailored investments. To make
this determination, we perform an analysis of the data contained in the sample. The
analysis is structured as a test of the null hypothesis. There is also an alternative to
the null hypothesis called, quite appropriately, the alternative hypothesis. As the
name implies, a test of hypothesis, either null or alternative, requires that a hypoth-
esis is posited, and then a test is performed to see if the null hypothesis can be:
(1) rejected in favor of the alternative, or (2) not rejected.

Table 6.1 Results of mutual fund sample

                            Fund types frequency
Investor risk preference    Bond    Income    Income/Growth    Growth    Totals
Risk-taker                    30         9               45        66       150
Conservative                 270        51               75        54       450
Totals                       300        60              120       120       600

In this case, our null hypothesis assumes that self-perceived risk preference is
independent of a mutual fund selection. That suggests that an investor’s self-
description as an investor is not related to the mutual funds he or she purchases, or
more strongly stated, does not cause a purchase of a particular type of mutual fund. If
our test suggests otherwise, that is, the test leads us to reject the null hypothesis,
then we conclude that the two variables are likely to be dependent (related).

This discussion may seem tedious, but if you do not have a firm understanding of
tests of hypothesis, then the remainder of the chapter will be very difficult, if not
impossible, to understand. Before we move on to the calculations necessary for
performing the test, the following summarizes the general procedure just discussed:

1. an assumption (null hypothesis) that the variables under consideration are inde-
pendent, or that they are not related, is made

2. an alternative assumption (alternative hypothesis) relative to the null is made that
there is dependence between variables

3. the chi-square test is performed on the data contained in a contingency table to
test the null hypothesis

4. the result, a statistical calculation, is used to attempt to reject the null hypothesis

5. if the null is rejected, then this implies that the alternative is accepted; if the null is
not rejected, then the sample does not provide sufficient evidence to support the alternative hypothesis

The chi-square test is based on a null hypothesis that assumes independence of
relationships. If we believe the independence assumption, then the overall fraction of
investors in a perceived risk category and fund type should be indicative of the entire
investing population. Thus, an expected frequency of investors in each cell can be
calculated. We will have more to say about this later in the chapter. The expected
frequency, assuming independence, is compared to the actual (observed) and the
variation of expected to actual is tested by calculating a statistic, the χ² statistic (χ is
the lower case Greek letter chi). The variation between what is actually observed and
what is expected is based on the formula that follows. Note that the calculation
squares the difference between the observed frequency and the expected frequency,
divides by the expected value, and then sums across the two dimensions of the i by
j contingency table:

$$\chi^2 = \sum_i \sum_j \frac{\left(\text{obs}_{ij} - \text{exp val}_{ij}\right)^2}{\text{exp val}_{ij}}$$

where:
obs_ij = frequency or count of observations in the ith row and jth column of the
contingency table
exp val_ij = expected frequency of observations in the ith row and jth column of
the contingency table, when independence of the variables is assumed.¹

¹ Calculated by multiplying the row total and the column total and dividing by total number of
observations; e.g. in Fig. 6.1 the expected value for the Conservative/Growth cell is (120 × 450)/600 = 90.
Note that 120 is the marginal total for Growth and 450 is the marginal total for Conservative.
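
The expected-frequency rule in the footnote and the χ² formula above can be sketched in a few lines of Python. This is an illustration only, not the Excel worksheet of Fig. 6.1; the function names `expected_frequencies` and `chi_square_statistic` are hypothetical, and the counts are those of Table 6.1.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Table 6.1 counts: rows are Risk-taker and Conservative;
# columns are Bond, Income, Income/Growth, Growth.
observed = [
    [30, 9, 45, 66],
    [270, 51, 75, 54],
]

expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed, expected)

print(expected)
print(round(chi2, 1))
```

With the Table 6.1 counts, the Conservative/Growth cell has an expected value of 90, as in the footnote, and the statistic comes to about 106.8.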

Once the χ² statistic is calculated, then it can be compared to a benchmark value
of χ²_α that sets a limit, or threshold, for rejecting the null hypothesis. The value of χ²_α
is the limit the χ² statistic can achieve before we reject the null hypothesis. These
values can be found in most statistics books. To select a particular χ²_α, the α (the
level of significance of the test) must be set by the investigator. It is closely related to
the p-value: the probability of obtaining a particular statistic value or more extreme
by chance, when the null hypothesis is true. Investigators often set α to 0.05; that is,
there is a 5% chance of obtaining a χ² statistic at least as large as χ²_α when the null is true.

So, our decision-maker only wants a 5% chance of erroneously rejecting the null
hypothesis. That is relatively conservative, but a more conservative (less chance of
erroneously rejecting the null hypothesis) stance would be to set α to 1%, or even less.

Thus, if our χ² is greater than or equal to χ²_α, then we reject the null. Alternatively,
if the p-value is less than or equal to α we reject the null. These tests are equivalent. In
summary, the rules for rejection are either:

Reject the null hypothesis when χ² ≥ χ²_α

or

Reject the null hypothesis when p-value ≤ α
(Note that these rules are equivalent)

Fig. 6.1 Chi-squared calculations via contingency table

Figure 6.1 shows a worksheet that performs the test of independence using the
chi-square procedure. The figure also shows the typical calculation for contingency
table expected values. Of course, in order to perform the analysis, both tables are
needed to calculate the χ² statistic since both the observed frequency and the
expected are used in the calculation. Using the Excel CHISQ.TEST(actual
range, expected range) cell function permits Excel to calculate the data’s χ² and
then return a p-value (see cell F17 in Fig. 6.1). You can also see from Fig. 6.1 that the
actual range is C4:F5 and does not include the marginal totals. The expected range
is C12:F13 and the marginal totals are also omitted. The internally calculated χ²
value takes into consideration the number of variables for the data, 2 in our case, and
the possible levels within each variable: 2 for risk preference and 4 for mutual fund
types. These values are derived from the range data information (rows and
columns) provided in the actual and expected tables.

From the spreadsheet analysis in Fig. 6.1 we can see that the calculated χ² value in
F18 is 106.8 (a relatively large value), and if we assume α to be 0.05, then χ²_α is
approximately 7.82 (from a table in a statistics book). Thus, we can reject the null
since 106.8 > 7.82.² Also, the p-value from Fig. 6.1 is extremely small (5.35687E-23)³
indicating a very small probability of obtaining a χ² value of 106.8 or larger when the
null hypothesis is true. The p-value returned by the CHISQ.TEST function is shown
in cell F17, and it is the only value that is needed to reject, or not reject, the null
hypothesis. Note that the cell formula in F18 is the calculation of the χ² given in the
formula above and is not returned by the CHISQ.TEST function. This result leads
us to conclude that the null hypothesis is likely not true, so we reject the notion
that the variables are independent. Instead, there appears to be a strong dependence
given our test statistic. Earlier, we summarized the general steps in performing a test
of hypothesis. Now we describe in detail how to perform the test of hypothesis
associated with the χ² test. The steps of the process are:

1. Organize the frequency data related to two categorical variables in a contingency
table. This is shown in Fig. 6.1 in the range B2:G6.

2. From the contingency table values, calculate expected frequencies (see Fig. 6.1
cell comments) under the assumption of independence. The calculation of χ² is
simple and performed by the CHISQ.TEST(actual range, expected range) function.
The function returns the p-value of the calculated χ². Note that it does not
return the χ² value, although it does calculate the value for internal use. I have
calculated the individual cell contributions to χ² in cells C23:F24 and their sum in G25 for completeness of
calculations, although it is unnecessary to do so.

² Tables of χ²_α can be found in most statistics texts. You will also need to calculate the degrees
of freedom for the data: (number of rows − 1) × (number of columns − 1). In our example:
(2 − 1) × (4 − 1) = 3.
³ Recall this is a form of what is known as “scientific notation”. E-17 means 10 raised to the −17
power, or the decimal point moved 17 decimal places to the left of the current position for 3.8749.
Positive (E + 13 e.g.) powers of 10 move the decimal to the right (13 decimal places).

3. By considering an explicit level of α, the decision to reject the null can be made
on the basis of determining if χ² ≥ χ²_α. Alternatively, α can be compared to the
calculated p-value: p-value ≤ α. Both rules are interchangeable and equivalent. It
is often the case that an α of 0.05 is used by investigators.
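
The two equivalent rejection rules can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reproduction of CHISQ.TEST: it hard-codes the χ² value of 106.8 and the table value of 7.82 quoted in the text, and it uses a closed-form expression for the upper-tail probability of a chi-square distribution that holds only for 3 degrees of freedom, which is the (2 − 1) × (4 − 1) case of this example. The function name is hypothetical.

```python
import math


def chi_square_p_value_df3(chi2):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees
    of freedom (closed form; not valid for other degrees of freedom)."""
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))


alpha = 0.05            # level of significance chosen by the investigator
chi2 = 106.8            # calculated statistic reported in the text
chi2_critical = 7.82    # table value for alpha = 0.05 and 3 degrees of freedom

p_value = chi_square_p_value_df3(chi2)

reject_by_statistic = chi2 >= chi2_critical   # rule 1: compare the statistics
reject_by_p_value = p_value <= alpha          # rule 2: compare p-value to alpha

print(p_value)
print(reject_by_statistic, reject_by_p_value)
```

Both rules lead to the same decision here; for tables of other sizes a general chi-square distribution routine would be needed in place of the 3-degrees-of-freedom shortcut.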

6.4 z-Test and t-Test of Categorical and Interval Data

Now, let us consider a situation that is similar in many respects to the analysis just
performed, but it is different in one important way. In the χ² test the subjects in our
sample were associated with two variables, both of which were categorical. The cells
provided a count, or frequency, of the observations that were classified in each cell.
Now, we will turn our attention to sample data that contains categorical and interval
or ratio data. Additionally, the categorical variable is dichotomous, and thereby can
take on only two levels. The categorical variable will be referred to as the experi-
mental treatment, and the interval data as the response variable. In the following
section, we consider an example problem related to the training of human resources
that considers experimental treatments and response variables.

6.5 An Example

A large firm with 12,000 call center employees in two locations is experiencing
explosive growth. One call center is in South Carolina (SC) and the other is in Texas
(TX). The firm has done its own standard, internal training of employees for
10 years. The CEO is concerned that the quality of call center service is beginning
to deteriorate at an alarming rate. The firm is receiving many more complaints from
customers, and when the CEO disguised herself as a customer requesting call center
information, she was appalled at the lack of courtesy and the variation of responses
to a relatively simple set of questions. She finds this to be totally unacceptable and
has begun to consider possible solutions. One of the solutions being considered is
a training program to be administered by an outside organization with experience
in the development and delivery of call center training. The hope is to create a
systematic and predictable customer service response.

A meeting of high level managers is held to discuss the options, and
some skepticism is expressed about training programs in general: many ask the
question—Is there really any value in these outside programs? Yet, in spite of the
skepticism, managers agree that something has to be done about the deteriorating
quality of customer service. The CEO contacts a nationally recognized training firm,
EB Associates. EB has considerable experience and understands the concerns of
management. The CEO expresses her concern and doubts about training. She is not
sure that training can be effective, especially for the type of unskilled workers they
hire. EB listens carefully and has heard these concerns before. EB proposes a test to
determine if the special training methods they provide can be of value for the call
center workers. After careful discussion with the CEO, EB makes the following
suggestion for testing the effectiveness of special (EB) versus standard (internal)
training:

1. A test will be prepared and administered to all the customer service representatives
working in the call centers, 4000 in SC and 8000 in TX. The test is designed to
assess the current competency of the customer service representatives. From this
overall data, specific groups will be identified and a sample of 36 observations
(test scores) for each group will be taken. This will provide a baseline score for
call center personnel with standard training.

2. Each customer service representative will receive a score from 0 to 100.

3. A special training course by EB will be offered to a selected group of customer
service representatives in South Carolina: 36 incarcerated women. The competency
test will be re-administered to this group after training to detect changes in
scores, if any.

4. Analysis of the difference in performance between representatives specially
trained and those standard trained will be used to consider the application of
the training to all employees. If the special training indicates significantly better
performance on the exam, then EB will receive a large contract to administer
training for all employees.

As mentioned above, the 36 customer service representatives selected to receive
special training are a group of women that are incarcerated in a low security prison
facility in the state of South Carolina. The CEO has signed an agreement with the
state of South Carolina to provide the SC women with an opportunity to work as
customer service representatives and gain skills before being released to the general
population. In turn, the firm receives significant tax benefits from South Carolina.
Because of the relative ease with which these women can be trained, they are chosen
for the special training. They are, after all, a captive audience. There is a similar
group of customer service representatives that also are incarcerated women. They are
in a similar low security Texas prison, but these women are not chosen for the special
training.

The results of the tests for employees are shown in Table 6.2. Note that the data
included in each of five columns is a sample of personnel scores of similar sizes (36):
(1) non-prisoners in TX, (2) women prisoners in TX, (3) non-prisoners in SC,
(4) women prisoners in SC before special training, and (5) women prisoners in SC
after special training. All the columns of data, except the last, are scores for customer
service representatives that have only had the internal standard training. The last
column is the re-administered test scores of the SC prisoners that received special
training from EB. Additionally, the last two columns are the same individual sub-
jects, matched as before and after special training, respectively. The sample sizes for
the samples need not be the same, but it does simplify the analysis calculations. Also,
there are important advantages to samples greater than approximately 30 observa-
tions that we will discuss later.

Table 6.2 Special training and no training scores

Observation   36 Non-prisoner  36 Women          36 Non-prisoner  36 Women SC          36 Women SC
              scores TX        prisoners TX      scores SC        (before special      (with special
                                                                  training)*           training)*
1             81               93                89               83                   85
2             67               68                58               75                   76
3             79               72                65               84                   87
4             83               84                67               90                   92
5             64               77                92               66                   67
6             68               85                80               68                   71
7             64               63                73               72                   73
8             90               87                80               96                   98
9             80               91                79               84                   85
10            85               71                85               91                   94
11            69               101               73               75                   77
12            61               82                57               62                   64
13            86               93                81               89                   90
14            81               81                83               86                   89
15            70               76                67               72                   73
16            79               90                78               82                   84
17            73               78                74               78                   80
18            81               73                76               84                   85
19            68               81                68               73                   76
20            87               77                82               89                   91
21            70               80                71               77                   79
22            61               62                61               64                   65
23            78               85                83               85                   87
24            76               84                78               80                   81
25            80               83                76               82                   84
26            70               77                75               76                   79
27            87               83                88               90                   93
28            72               87                71               74                   75
29            71               76                69               71                   74
30            80               68                77               80                   83
31            82               90                86               88                   89
32            72               93                73               76                   78
33            68               75                69               70                   72
34            90               73                90               91                   93
35            72               84                76               78                   81
36            60               70                63               66                   68
Averages =    75.14            80.36             75.36            79.08                81.06
Variance =    72.12            80.47             78.47            75.11                77.31

Total TX (8000 obs.)         Av = 74.29    VAR = 71.21
Total SC (4000 obs.)         Av = 75.72    VAR = 77.32
Total TX&SC (12,000 obs.)    Av = 74.77    VAR = 73.17

*Same 36 SC women prisoners that received training

Every customer service representative at the firm was tested at least once, and the
SC women prisoners were tested twice. Excel can easily store these sample data and
provide access to specific data elements using the filtering and sorting capabilities we
learned in Chap. 5. The data collected by EB provides us with an opportunity to
thoroughly analyze the effectiveness of the special training.

So, what are the questions of interest and how will we use inferential statistics to
answer them? Recall that EB administered special training to 36 women prisoners in
SC. We also have a standard trained non-prisoner group from SC. EB’s first question
might be: Is there any difference between the average score of a randomly selected
SC non-prisoner sample with no special training and the SC prisoners’ average score
after special training? Note that our focus is on the aggregate statistic of average
scores for the groups. Additionally, EB’s question involves SC data exclusively.
This is done to not confound results, should there be a difference between the
competency of customer service representatives in TX and SC. We will study the
issue of the possible difference between Texas and SC scores later in our analysis.

EB must plan a study of this type very carefully to achieve the analytical goals she
has in mind. It will not be easy to return to these customer representatives and
re-administer the competency exams.

6.5.1 z-Test: 2 Sample Means

To answer the question of whether or not there is a difference between the average
scores of SC non-prisoners without special training and prisoners with special
training, we use the z-Test: Two Sample for Means option found in Excel’s Data
Analysis tool. This analysis tests the null hypothesis that there is no difference
between the two sample means and is generally reserved for samples of 30 observations
or more. Pause for a moment to consider this statement. We are focusing on the
question of whether two means from sample data are different; different in statistics
suggests that the samples come from different underlying population distributions
with different means. For our problem, the question is whether the SC non-prisoner
group and the SC prisoner group with special training have different population
means for their sample scores. Of course, the process of calculating sample means
will very likely lead to different values. If the means are relatively close to one
another, then we will conclude that they came from the same population; if the
means are relatively different, we are likely to conclude that they are from different
populations. Once calculated, the sample means will be examined and a probability
estimate will be made as to how likely it is that the two sample means came from the
same population. But, the question of importance in these tests of hypothesis is
related to the populations—are the averages of the population of SC non-prisoners
and of the population of SC prisoners with special training the same, or are they
different?

If we reject the null hypothesis that there is no difference in the average scores,
then we are deciding in favor of the training indeed leading to a difference in scores.
As before, the decision will be made on the basis of a statistic that is calculated from
the sample data, in this case the z-Statistic, which is then compared to a critical
value. The critical value incorporates the decision maker’s willingness to commit an
error by possibly rejecting a true null hypothesis. Alternatively, we can use the
p-value of the test and compare it to the level of significance which we have
adopted—as before, frequently assumed to be 0.05. The steps in this procedure are
quite similar to the ones we performed in the chi-square analysis, with the exception
of the statistic that is calculated, z rather than chi-square.
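
For reference, the statistic behind a two-sample z-Test for means is commonly written as shown below. The notation is an assumption of this note rather than the chapter’s own: $\bar{x}_1$ and $\bar{x}_2$ are the two sample means, $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two populations, $n_1$ and $n_2$ are the sample sizes, and $D_0$ is the difference in means assumed under the null hypothesis (0 when the null states that there is no difference).

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The further z lies from 0, the smaller the p-value and the stronger the case for rejecting the null hypothesis.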

6.5.2 Is There a Difference in Scores for SC Non-prisoners
and EB Trained SC Prisoners?

The procedure for the analysis is shown in Figs. 6.2 and 6.3. Figure 6.2 shows the
Data Analysis dialogue box in the Analysis group of the Data ribbon used to select
the z-Test. We begin data entry for the z-Test in Fig. 6.3 by identifying the range
inputs, including labels, for the two samples: 36 SC non-prisoner standard trained
scores (E1:E37) and 36 SC prisoners that receive special training (G1:G37). Next,
the dialog box requires a hypothesized mean difference. Since we are assuming there
is no difference in the null hypothesis, the input value is 0. This is usually the case,
but you are permitted to designate other differences if you are hypothesizing a
specific difference in the sample means. For example, consider the situation in
which management is willing to purchase the training, but only if it results in
some minimum increase in scores. The desired difference in scores could be tested
as the Hypothesized Mean Difference.
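
The calculation that a two-sample z-Test for means performs can be sketched outside Excel. The following Python fragment is an illustration under stated assumptions, not the output of the Data Analysis tool: it takes the rounded sample averages and variances reported in Table 6.2 for the SC non-prisoners and the specially trained SC prisoners, treats the sample variances as the known variances, and computes a two-tailed p-value from the standard normal distribution. The function name `two_sample_z_test` and its parameter names are hypothetical.

```python
import math


def two_sample_z_test(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (z, two-tailed p-value) for the difference of two sample means."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / standard_error
    # P(|Z| >= |z|) for a standard normal Z.
    p_two_tail = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tail


# Summary statistics from Table 6.2 (36 observations per sample).
z, p_value = two_sample_z_test(
    mean1=75.36, var1=78.47, n1=36,   # SC non-prisoners, standard training
    mean2=81.06, var2=77.31, n2=36,   # SC women prisoners, after special training
)

alpha = 0.05
print(round(z, 2), p_value)
print("reject null" if p_value <= alpha else "do not reject null")
```

Passing a nonzero `hypothesized_diff` corresponds to entering a value other than 0 as the Hypothesized Mean Difference; its sign follows the order mean1 minus mean2 used in the function.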

The variances for the variables can be estimated to be the variances of the
samples, if the samples are greater than approximately 30 observations. Recall
earlier that I suggested that a sample size of at least 30 was advantageous; this is
why! We can also …