All Crime,,,All Crime,,,Aggravated Assault,,,Burglary,,,Homicide,,,Larceny,,,Motor Vehicle Theft,,,Robbery,
Offense Type,,,Day of the Week,,,Day of the Week,,,Day of the Week,,,Day of the Week,,,Day of the Week,,,Day of the Week,,,Day of the Week,
,,,,,,,,,,,,,,,,,,,,,,
Aggravated Assault,2124,,Monday,2218,,Monday,271,,Monday,286,,Monday,7,,Monday,1328,,Monday,171,,Monday,155
Burglary,2014,,Tuesday ,2391,,Tuesday ,286,,Tuesday ,307,,Tuesday ,9,,Tuesday ,1416,,Tuesday ,180,,Tuesday ,193
Homicide,50,,Wednesday,2333,,Wednesday,272,,Wednesday,310,,Wednesday,7,,Wednesday,1406,,Wednesday,172,,Wednesday,166
Larceny,9630,,Thursday,2728,,Thursday,294,,Thursday,266,,Thursday,5,,Thursday,1351,,Thursday,180,,Thursday,182
Motor Vehicle Theft,1270,,Friday,2418,,Friday,285,,Friday,343,,Friday,6,,Friday,1442,,Friday,181,,Friday,161
Robbery,1192,,Saturday,2387,,Saturday,352,,Saturday,270,,Saturday,4,,Saturday,1394,,Saturday,188,,Saturday,179
,16290,,Sunday,2255,,Sunday,364,,Sunday,232,,Sunday,12,,Sunday,1293,,Sunday,198,,Sunday,156
,,,,16730,,,2124,,,2014,,,50,,,9630,,,1270,,,1192
Worked Example for Week 10
Using the same data from our previous Worked Examples, along with new data, I will show you
how you can test the difference between means using some basic descriptive statistics of the two
samples being compared. As a result, you will see an example of hypothesis testing: a statistical
test that examines the possible significant difference between two groups. The process can be
done for proportions as well, but for our purposes we will focus on means.
To perform this test we will require descriptive statistics for two groups. The first group will be
the sample of prisoners we have used in all of our Worked Examples. The second group will be a
new hypothetical group from a different prison. Let's assume the first group is male prisoners
and the second group is female prisoners. Thus, we are testing to see if there is a significant
difference in the mean number of months incarcerated for males and females.
Our null hypothesis is always written so that the two means are equal. To reject this, we need to
find a significant difference. Similar to "innocent until proven guilty", we assume equal means
until proven different. The null hypothesis can be written as follows:
$H_0: \mu_1 = \mu_2$ or $\bar{X}_1 = \bar{X}_2$
Our alternative hypothesis is written so that the two means are not equal (significantly different).
$H_1: \mu_1 \neq \mu_2$ or $\bar{X}_1 \neq \bar{X}_2$
Notice the subscript numbers next to the symbols for means: this is simply referring to each
group. Each symbol with a "1" subscript requires the descriptive data from Group 1 and each
symbol with a "2" subscript requires the descriptive data from Group 2. It does not necessarily
matter which group you refer to as "1" and "2"; what is important is that you are consistent
throughout the hypothesis test processes*. If you mix the groups throughout the steps, you will
end up with incorrect and invalid results. Now that we have stated our hypothesis, we must list
the descriptive statistics needed for the hypothesis test.
Group 1, as we stated, is the male prisoners. Since we have used this data throughout our
Worked Examples, we already have all the information needed.
$N_1 = 10 \quad \bar{X}_1 = 4 \quad s_1^2 = 3.4$
Recall that $N$ is our sample size, $\bar{X}$ is the mean, and $s^2$ is the variance. This is very important: the standard
deviation is not required in this test; we need to use the variance (the standard deviation squared).
Group 2, as we stated, is the female prisoners. For our purposes, I will simply supply the
descriptive statistics needed for the hypothesis test.
$N_2 = 12 \quad \bar{X}_2 = 2 \quad s_2^2 = 2$
Now that we have the required information we can begin to test the hypothesis that the mean
number of months incarcerated is equal between male and female prisoners. From our two small
samples we see that males have an average of 4 months and females have an average of 2
months. Remember, these are very small samples so we must use that information in
combination with the variability to determine if we have enough information to conclude the
difference is significant.
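Step 1: Compute the standard error.
This step is tedious and requires a lot of information.
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(10)(3.4) + (12)(2)}{10 + 12 - 2}\right)\left(\frac{10 + 12}{(10)(12)}\right)}$
$s_{\bar{X}_1 - \bar{X}_2} = .73$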
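The Step 1 arithmetic can be checked with a short Python sketch. The function name `standard_error_of_difference` is my own, and the formula is an assumption: a pooled form that weights each variance by its sample size, chosen because it reproduces the .73 reported above.

```python
import math

def standard_error_of_difference(n1, var1, n2, var2):
    """Standard error of the difference between two sample means.

    Assumed formula (not confirmed by the handout):
    sqrt(((n1*var1 + n2*var2) / (n1 + n2 - 2)) * ((n1 + n2) / (n1 * n2)))
    """
    pooled = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    size_term = (n1 + n2) / (n1 * n2)
    return math.sqrt(pooled * size_term)

# Group 1 (males): N = 10, variance = 3.4
# Group 2 (females): N = 12, variance = 2
se = standard_error_of_difference(10, 3.4, 12, 2)
print(round(se, 2))  # 0.73
```

Note that the function takes variances, not standard deviations, which is the point stressed earlier in this example.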
Step 2: Compute the test statistic (t-value). We simply subtract the mean of group 2 from the
mean of group 1 and divide by the standard error, the value we just calculated.
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
$t = \frac{4 - 2}{.73}$
$t = 2.74$
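Step 2 is a single division, sketched below in Python. The function name `t_statistic` is hypothetical; the inputs are the two means and the rounded standard error from this example. The second call swaps the groups to show how the order of subtraction only changes the sign.

```python
def t_statistic(mean1, mean2, standard_error):
    """Difference between the two sample means divided by the standard error."""
    return (mean1 - mean2) / standard_error

# Males (mean 4) minus females (mean 2), standard error .73
t = t_statistic(4, 2, 0.73)
print(round(t, 2))  # 2.74

# Same data with the groups swapped: same size, opposite sign
t_reversed = t_statistic(2, 4, 0.73)
print(round(t_reversed, 2))  # -2.74
```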
Step 3: Determine the critical value. This step requires knowledge of what alpha level you will
be using and a T-distribution table of values. As is common in criminal justice research we will
look at alpha levels of .05 and .01.
The critical value for our hypothesis test with an alpha level of .05 is 2.086
The critical value for our hypothesis test with an alpha level of .01 is 2.845
Step 4: Compare test statistic (t-value) and critical value. Interpret.
We computed a test statistic of 2.74 which is larger than the first critical value of 2.086; we reject
the null hypothesis that the mean number of months incarcerated is equal between males and
females. We are stating that based on the information provided to us we can say at the .05 alpha
level (or with 95% confidence) the means are different.
When examining the test statistic of 2.74 at the .01 alpha level we fail to reject the null
hypothesis that the mean number of months incarcerated is equal between males and females.
This is due to the test statistic being lower than the critical value of 2.845. We are stating that
based on the information provided to us we can't say at the .01 alpha level the means are
different. Essentially, we can be 95% confident that the observed difference is true or not due to
sampling error/chance but we cannot be 99% confident.
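The comparison in Step 4 can be written as a small decision rule. This is a minimal sketch: the function name `decide` is my own, and the critical values are simply copied from the text rather than looked up in a t-table. The absolute value is used so that a negative t-value is handled the same way as a positive one.

```python
# Critical values quoted in the text for this test (10 + 12 - 2 = 20 degrees of freedom)
CRITICAL_VALUES = {0.05: 2.086, 0.01: 2.845}

def decide(t_value, critical_value):
    """Compare the size of the t-value to the critical value."""
    if abs(t_value) > critical_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

t_value = 2.74
for alpha, critical in CRITICAL_VALUES.items():
    print(f"alpha = {alpha}: {decide(t_value, critical)}")
```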
*As a note, depending on the values of the means and the order in which you subtract one from
the other you may end up with a negative test statistic (t-value). That is fine. When this happens
simply compare the numerical value itself to the critical value just as you would if the value was
positive. The negative only implies directionality, a component we are not focusing on. The
interpretation of actual difference is what is important.
Steps to test the null hypothesis that the mean number of months incarcerated for
males is equal to the mean number of months incarcerated for females:
Find the standard error of the difference between means using:
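$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$
Compute the test statistic (t-value) by dividing the difference between means by
the standard error of the difference between means using:
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
Determine the critical value (based on alpha level and degrees of freedom).
$df = N_1 + N_2 - 2$
Compare our t-value and our critical value. Interpret. (If the t-value exceeds the critical value,
reject the null hypothesis.)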
McCormack
CRIM.3950.031
Tests of Equality II
Thus far, we have extended our description of data; most recently, through Chapter 5, we learned about
measures of variability and dispersion. However, usually, the main objective in data gathering is not
description. The most common research initiatives revolve around hypothesis testing.
In hypothesis testing, we are estimating a population value (of a specific variable) based on sample data.
Hypothesis testing, in part, allows us to test the difference between means, frequencies, proportions, etc.
We can examine different subgroups with the same variable: number of crimes by adults, juveniles, etc.
We can examine different components of the same variables: number of assaults, murders, etc.
When we are hypothesis testing, we do so by analyses that result in inferential statistics. An important
facet of inferential statistics is the probability of empirical outcomes: how likely we are to see that result
in the population. When a sample statistic is not equal to a population parameter, this can be
conceptualized in two ways.
• It may be a product of random fluctuations in sample statistics called sampling error. The disparity
could represent a legitimate statistical effect.
• The sample may not be appropriate to generalize to the population. A genuine discrepancy exists
between the sample statistic and population parameter.
The overarching purpose of hypothesis testing is to determine which of the above explanations is more
valid.
Perhaps the most commonly used inferential test is the t-test. This test is used to compare means across
groups, e.g. do males, on average, commit more crimes than females? A t-test uses estimation to compare
ONE value across TWO groups. We are asking, are those values significantly different?
What is the average age of our class? Let's use hypothetical data.
Mean of 32 years overall: 35 males, 29 females (even gender split)
In this case, we could use a t-test to answer the question, is there a significant difference between the
average age of males and females? Using our common levels of confidence, we can be 95/99% confident
in our answer, that there is a significant difference or there is not a significant difference. On the surface,
we see there is a difference: 35 ≠ 29
Why do we need to statistically test for a difference? What if we had just 2 folks in the class, 1 male (35
years old) and 1 female (29 years old)? Should two people be representative of a population? Doubtful!
We need to take into account some of the groups' descriptive information to assess whether the value of 35 is
significantly different from the value of 29, based on their variability and how many cases were
used to calculate them (e.g., is our sample big enough?). Here is what we need to know for each group.
• What are the averages?
o ($\bar{X}_1$) and ($\bar{X}_2$)
• What are the variations?
o ($s_1^2$) and ($s_2^2$)
• How many cases?
o ($n_1$) and ($n_2$)
That information is usually readily available or simple to obtain. We already know how to calculate
those values! What is next?
Essentially, we end up calculating (or the software does it for us), a t-value that represents the amount of
difference between the two values (mean of group 1 and mean of group 2). Then, based on our level of
confidence (95% or 99%) and the number of cases, we compare it to a critical value. If/when the t-value
is greater than the critical value, we have enough information (the observed difference is large enough)
to say there is a significant difference between groups. How do we get a difference that is large enough?
• A large numerical difference and/or
• A large number of cases
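To see why the number of cases matters, here is a minimal Python sketch that holds the two class means (35 and 29) fixed and varies only the group size. The variance of 25 for each group is a made-up value for illustration, the function names are my own, and the standard error uses an assumed pooled formula that may differ from the one used in class.

```python
import math

def standard_error(n1, var1, n2, var2):
    # Assumed pooled formula; each variance is weighted by its sample size
    pooled = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled * (n1 + n2) / (n1 * n2))

def t_value(mean1, var1, n1, mean2, var2, n2):
    return (mean1 - mean2) / standard_error(n1, var1, n2, var2)

# Same means and (hypothetical) variances, different numbers of cases per group
results = {}
for n in (2, 5, 20, 100):
    results[n] = t_value(35, 25, n, 29, 25, n)
    print(f"n per group = {n:>3}: t = {results[n]:.2f}")
```

With the difference in means held constant, the t-value rises as the groups get larger, which is why the same 6-year gap can be non-significant in a tiny class and significant in a large one.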
If the t-value (we calculated) is greater than the critical value (determined via a statistical table):
There is a significant difference in the average _____ between groups.
If the t-value (we calculated) is less than the critical value:
There is not a significant difference in the average _____ between groups.
The t-value we calculated represents the difference between the groups. The critical value that is
predetermined represents how different the groups' values must be to be significantly different. There are
only a few short formulae needed if we were to compute this by hand.
Let's see an example.
1) Calculate/obtain descriptives
Number of months: Is there a significant difference in sentence length between male and female offenders?
Males: mean = 4, variance = 3.4, N = 10
Females: mean = 2, variance = 2, N = 12
2) Calculate the standard error
3) Calculate the test statistic (t-value)
4) Compare to critical t-value
Our t-value is 2.74. To be 95% confident we have a significant difference in sentence length between
male and female inmates, our t-value needs to be greater than 2.086. Achieved.
We are 95% confident the sentence length is significantly different between males and females.
Our t-value is 2.74. To be 99% confident we have a significant difference in sentence length between male
and female inmates, our t-value needs to be greater than 2.845. Not achieved.
We are not 99% confident the sentence length is significantly different between males and females.
If the 99% level of confidence was our threshold, we would fail to reject the null hypothesis that the sentence
lengths are equal between the two groups (male and female).
In this example, the number of cases and the size of the difference are only enough to be 95% confident the
difference exists in the population, but not 99% confident. This can happen.
If you achieve significance at the 99% level, you will achieve it at the 95% level. If you don't
achieve significance at the 95% level, you won't achieve it at the 99% level.
Assignment 2 McCormack
CRIM.3950.01
Open the Excel data file in the Week 8 folder.
Your spreadsheet should look like this:
This week we will complete the calculation and explanation of descriptive statistics with the help
of Excel. Below are the commands needed to compute the respective values.
Measures of Variability/Dispersion [1]:
Min. =min(array of values)
Max. =max(array of values)
Range =(Max-Min)
Variance =VAR.P(array of values) [2]
Standard Deviation =STDEV.P(array of values)
Quartiles:
Q1: =QUARTILE.INC(array of values,1) [3]
Q3: =QUARTILE.INC(array of values,3)
[1] In order to calculate the Range in Excel, we need to calculate the High and Low value(s) in the data set(s).
[2] In Excel, like all statistics programs, you have the option of calculating some measures for a "sample" or for a
"population". We are going to use the Population form. The alternative form of the function in Excel for variance
and standard deviation would have ".S" instead of ".P".
[3] In Excel, there is the option to INClude the median in the calculation of the quartiles or EXClude the median. Including,
as we have done, will affect the Q1 and Q3 calculations and provide more symmetrical quartiles. The disadvantage
to this is that it will make it more difficult to identify outliers. Thus, we assume more normality in our distribution.
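For readers who want to cross-check the Excel results outside the spreadsheet, the same measures can be sketched with Python's standard `statistics` module. The list `values` is hypothetical data, not the Boston file, and the mapping of `method="inclusive"` to QUARTILE.INC is my assumption based on both interpolating between the minimum and maximum.

```python
import statistics

# Hypothetical values standing in for one column of the spreadsheet
values = [2, 4, 4, 5, 7, 9, 10, 12]

minimum = min(values)                    # Min.
maximum = max(values)                    # Max.
value_range = maximum - minimum          # Range = Max - Min
variance = statistics.pvariance(values)  # population form, like VAR.P
std_dev = statistics.pstdev(values)      # population form, like STDEV.P

# Three cut points split the data into quarters: Q1, median, Q3.
# method="inclusive" is assumed here to correspond to QUARTILE.INC.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                            # IQR = Q3 - Q1

print(minimum, maximum, value_range)
print(variance, round(std_dev, 2))
print(q1, q3, iqr)
```

The sample forms (the ".S" versions in Excel) would be `statistics.variance` and `statistics.stdev`.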
IQR [4]: =(Q3-Q1)
Normality:
Skewness: =SKEW.P(array of values)
Kurtosis: =KURT(array of values)
The measures of skewness and kurtosis may be unfamiliar and/or new to you. These values can
also be used to describe how normally distributed a set of data is. Each also has a threshold (or
rule of thumb) value, which one can use to determine how normal (or not) a distribution is.
Skewness measures the symmetry of the distribution. The closer the value is to 0 the more
symmetrical the distribution is, i.e. the less skewed. The value measures the relative size of the
two tails. Data with an absolute skewness greater than 1 are highly skewed (the positive or negative
sign indicates the direction of the skew). Kurtosis measures the "peakedness" of the
distribution. Excel calculates this value using the "minus 3" rule, a correction under which
a normal distribution has a value of 0. Thus, in Excel, the closer the value is to 0, the more
normally distributed the distribution is. Both values, like many of our descriptive measures, are
heavily influenced by our sample size.
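The two measures can also be written out by hand. The sketch below uses only the Python standard library and follows the formulas as I understand them from Excel's documentation: a population skewness for SKEW.P and a sample-based excess kurtosis for KURT. Treat both function bodies as assumptions to verify against Excel, and note that the data lists are hypothetical.

```python
import math

def skew_p(values):
    """Population skewness: average cubed z-score, using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sigma) ** 3 for x in values) / n

def kurt(values):
    """Sample excess kurtosis; the subtracted term is the 'minus 3' style correction."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetrical
right_skewed = [1, 1, 2, 2, 10]  # hypothetical, long right tail

print(abs(skew_p(symmetric)) < 1e-9)  # True: no skew
print(skew_p(right_skewed) > 1)       # True: highly skewed to the right
print(round(kurt(symmetric), 2))      # -1.2
```

The kurtosis function needs at least four values, since the formula divides by (n - 3).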
Like before, add these measures in the Excel spreadsheet. Begin below "mean", in cell A49. Your
spreadsheet should look like this (open your previous assignment if needed):
Using the commands above, let's calculate the values. Ultimately, you should come up with the
following values, formatted in tabular form:
[4] The Interquartile Range (IQR) describes where the middle 50% of the data is located.
All Reported Crimes
Annual Number in Boston 1985-2014
Mode N/A (multimodal)
Median 35,788
Mean 43,069
Min. 22,018
Max. 70,003
Range 47,985
Variance 258,567,142.6
Standard Deviation 16,080.02
Q1 31,718.75
Q3 56,188
IQR 24,452.25
Skewness 0.47
Kurtosis -1.27
How would we explain these results? First, we see a fairly large range. Year over year, we have seen a
nearly consistent decrease in the overall number of reported crimes in Boston. The peak was in 1989,
with just over 70,000 crimes reported, and the low was in 2014, with just over 22,000 crimes reported.
With a large range usually comes a large standard deviation. A value of just over 16,000 indicates
that the typical distance each annual total is away from the mean is about 16,000. So, the annual
values tend to differ. We know the IQR represents where the middle 50% of the data lie, so half
of all years had between about 31,000 and 56,000 reported crimes.
Variability and Dispersion
Calculate the measures of variability and dispersion for the same 3 offenses you have worked with
previously. Create a similar table for those same 3 offenses.
Copy/paste them into or create in a Word document (.doc or .docx) which will be
submitted.
Beneath each table (3 you picked and created), write a 100-word paragraph describing the
measures of variability and dispersion.
Ensure all of your tables and write-ups are submitted in one Word (.doc or .docx) file for
Assignment 3.
This pie chart represents crimes committed in Boston by the day of the week. On Monday, 13% of crime happens. On Tuesday, Wednesday, Saturday, and Sunday, 14% of crimes occur. On Friday, 15% of crime takes place. On Thursday, 16% of crime happens. Thursday has the highest percentage of crimes committed. To determine why crime is highest on Thursday, we have to decide what factors may differ from the other days of the week. Once we figure out the factors, we can try to eliminate them to reduce crime on Thursdays. I would have thought crime would be highest on Friday and Saturday because that's the time most people seem to be outside. When there are more people out, more people can commit crimes.
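The percentages behind a pie chart like this one are each day's count divided by the total. A minimal Python sketch, using the all-crime counts from the data table (the variable names are my own):

```python
# Counts of all reported crimes by day of the week, from the data table
counts = {
    "Monday": 2218, "Tuesday": 2391, "Wednesday": 2333, "Thursday": 2728,
    "Friday": 2418, "Saturday": 2387, "Sunday": 2255,
}

total = sum(counts.values())  # 16730
shares = {day: 100 * n / total for day, n in counts.items()}

# Print each day's share of the weekly total to one decimal place
for day, share in shares.items():
    print(f"{day}: {share:.1f}%")

busiest = max(shares, key=shares.get)
print(busiest)  # Thursday
```

Showing one decimal place makes it easier to see how close several of the days are before the chart rounds them to whole percentages.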
This pie chart represents aggravated assault crimes committed in Boston by the day of the week. On Monday, Tuesday, Wednesday, and Friday, 13% of aggravated assaults happen. On Thursday, 14% occur. On Saturday and Sunday, 17% take place. You can clearly see on the chart that Saturday and Sunday have more aggravated assaults than the rest of the days.
This bar graph displays data for all crimes committed in Boston by the day of the week. On Monday, there were 2218 crimes. On Tuesday, there were 2391 crimes. On Wednesday, there were 2333 crimes. On Thursday, there were 2728 crimes committed. On Friday, there were 2418 crimes committed. On Saturday, there were 2387 crimes committed. On Sunday, there were 2255 crimes committed. In total, there were 16,730 crimes committed. I am not a fan of this graph because you can't tell the exact number of offenses for each day. You have to estimate how many crimes each bar represents. Since you cannot label every number on the axis, it's important to label each bar with its number. Without the number on the bar, someone could report the wrong data by estimating.
This bar graph displays data for aggravated assault crimes committed in Boston by the day of the week. On Monday, there were 271 aggravated assaults. On Tuesday, there were 286 aggravated assaults. On Wednesday, there were 272 aggravated assaults. On Thursday, there were 294 aggravated assaults. On Friday, there were 285 aggravated assaults. On Saturday, there were 352 aggravated assaults. On Sunday, there were 364 aggravated assaults.
This bar graph displays data for burglary crimes committed in Boston by the day of the week. On Monday, there were 286 burglaries. On Tuesday, there were 307 burglaries. On Wednesday, there were 310 burglaries. On Thursday, there were 266 burglaries. On Friday, there were 343 burglaries. On Saturday, there were 270 burglaries. On Sunday, there were 232 burglaries. On the bar graph you can see which bar is the highest.
All Crime
Days Of The Week | Number Of Offenses
Monday | 2218
Tuesday | 2391
Wednesday | 2333
Thursday | 2728
Friday | 2418
Saturday | 2387
Sunday | 2255
Total | 16730
This table represents the data for aggravated assault crimes committed in Boston by the day of the week. On Monday, there were 271 aggravated assaults. On Tuesday, there were 286 aggravated assaults. On Wednesday, there were 272 aggravated assaults. On Thursday, there were 294 aggravated assaults. On Friday, there were 285 aggravated assaults. On Saturday, there were 352 aggravated assaults. On Sunday, there were 364 aggravated assaults. The table is very clear; you can determine how many crimes occurred on each day. One side of the table shows the number of aggravated assaults. The other side shows the day of the week.
All Crime
Day Of Week | Number Of Offenses
Monday | 2218
Tuesday | 2391
Wednesday | 2333
Thursday | 2728
Friday | 2418
Saturday | 2387
Sunday | 2255
Total | 16730
This table displays data for all crimes committed in Boston by the day of the week. On Monday, there were 2218 crimes. On Tuesday, there were 2391 crimes. On Wednesday, there were 2333 crimes. On Thursday, there were 2728 crimes committed. On Friday, there were 2418 crimes committed. On Saturday, there were 2387 crimes committed. On Sunday, there were 2255 crimes committed. In total, there were 16,730 crimes committed. "All Crime" is in bold in the table, and the days of the week are in italics. The table also consists of three lines that organize the two titles.
Burglary
Day Of Week | Number Of Offenses
Monday | 286
Tuesday | 307
Wednesday | 310
Thursday | 266
Friday | 343
Saturday | 270
Sunday | 232
Total | 2014
This table displays data for burglary crimes committed in Boston by the day of the week. On Monday, there were 286 burglaries. On Tuesday, there were 307 burglaries. On Wednesday, there were 310 burglaries. On Thursday, there were 266 burglaries. On Friday, there were 343 burglaries. On Saturday, there were 270 burglaries. On Sunday, there were 232 burglaries. The table was created using Word. The table has no axes like the graphs do. There are 2 columns, which represent the day of the week and the number of burglaries. Across the 10 rows, one column lists the specific day of the week and the other lists the number of burglaries.
[Chart: Burglary Crime in Boston by Day of the Week, 2017. Categories: Monday through Sunday.]
[Chart: Crime Committed in Boston by Day of Week. X-axis: Days of Week. Y-axis: Number of Offenses.]
[Chart: Aggravated Assault Crime in Boston by Day of the Week. Categories: Monday through Sunday.]
[Chart (untitled): Monday through Sunday. Y-axis: Number of Burglary Offenses.]
[Chart: Boston Crime by Offense Type, 2017. X-axis: Offense Type. Y-axis: Number of Offenses.]
[Chart: Boston Crime by Offense Type, 2017. Categories: Aggravated Assault, Burglary, Homicide, Larceny, Motor Vehicle Theft, Robbery.]
[Chart: Crime Committed in Boston by Day of the Week, 2017. Categories: Monday through Sunday.]
[Chart: Aggravated Assault Crime in Boston by Day of the Week, 2017. Categories: Monday through Sunday.]