Many companies and organisations collect data to improve their information and products. Skills in collecting data can make this process more efficient and reliable. Data is a collective name for information recorded for statistical purposes. There are many different types of data:

Discrete data - numerical data that can only take certain values, for example, the number of children in a classroom or a shoe size.

Continuous data - numerical data that can take any value within a given range, for example, the masses of 10 babies or the heights of some adults.

Primary data - data that has been collected from the original source for a specific purpose, for example, if a school wanted to know what their students thought of the school canteen service they would question the pupils directly.

Secondary data - data that was originally collected by someone else, or for a different purpose, rather than by the group now using it, for example, finding out the average cost of cars in a car park by using national statistics.

We are looking at collecting data. There are four topics we'll cover: types of data, random sampling, stratified sampling and questionnaires. So the first topic, types of data. We've got three different categories we can use to define different types of data.

So the first one, qualitative versus quantitative. Well, qualitative is non-numerical data. It could be things like diaries, video transcripts, interviews with people, or open-ended survey questions. So things where normally it's the written word that we try to glean information from. You may get this more in social sciences, right?

If you're doing a kind of psychology study, you might be looking at the impacts of war on mental health or something. And you might read the diaries of people who were at war. Quantitative is numerical. So quantitative could be, for example, the number of students in a class, or it could be the number of years a prime minister has served, or something like that. So it's very easy to quantify.

Those are the two main differences. Moving on to continuous and discrete types of data. So this way of categorising data is slightly different. Continuous means that we can continue to get more and more precise with our measurements. So, for example, the length of a piece of string or someone's height: we can continue to get more and more precise in how we measure someone's height, right? I could say someone is 1.85 metres tall, but I could also say 1.8543 metres tall.

Or I could get even more precise: 1.85431679 metres tall. I could keep going and be more and more precise. There's no end to it. Likewise with time, we could say Usain Bolt ran 100 metres in 9.58 seconds, but we could get more and more precise, right? We could say it's actually 9.581.

There's no end in sight, really. With continuous data we can get more and more precise. Discrete data is the opposite. Discrete data is already in nice categories for us.

So again, this could be the number of students in a class or it could be the number of cars that go under a bridge in an hour. We very clearly count up in chunks. It's one car, then another car, then another car. There's no question of the precision that we have to give our answer to. So that's those two. If we now look at primary versus secondary, well, primary data is data that we collect ourselves.

We could go out into the field and survey people, or we could go over to the bridge and count the cars driving under the bridge. Secondary is when we use other people's data. So, for example, we could go to a public library and read through newspaper articles, and we could use those articles to help us in our study of people who are affected by war. We could read the articles from the time. Or we may have data from the highway traffic control. They may have already counted the cars.

They use those weird hoses to count the cars as they go over the hose. We could use that data from a secondary source to help us work out the frequency of cars passing a point. So that explains those two. So those are our three different ways of categorising data. Let's move on to random sampling.

Random sampling is really as the name suggests. This is where we have a sample picked at random. We could, say, have a school full of kids and we want to work out, say, the average time that students wake up in the morning. And so we could just pick 50 kids at random out of this school, and ask those 50 kids, what time do you wake up in the morning? That would be a random sample.
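As a rough illustration of picking 50 kids at random, here is a minimal Python sketch. The roster of 600 numbered students and the fixed seed are assumptions made up for the example; only the sample size of 50 comes from the lesson.

```python
import random

# Hypothetical roster: 600 students, identified only by a number.
students = [f"student_{i}" for i in range(1, 601)]

# A fixed seed (an arbitrary choice) makes the sketch repeatable.
rng = random.Random(42)

# sample() picks without replacement, so nobody is asked twice.
sample = rng.sample(students, k=50)

print(len(sample))       # how many students were picked
print(sample[:5])        # the first few students in the sample
```

Every student on the roster has the same chance of being picked, which is what makes this a simple random sample.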

Stratified sampling is a little bit more accurate, often, with the results that we get from it. Say in our school we've got Year 7s, Year 8s, Year 9s, Year 10s and Year 11s. Well, let's say that, for some reason, loads of people were having kids when the Year 11s were born. And so there are 100 Year 11s and there are only like 40 Year 7s. I could fill in the others too, just making up some numbers.

What if we know that there are far more Year 11s than Year 7s? Well, surely we should take that into account, right? We shouldn't pick, say, ten students from Year 7 and ten students from Year 11 because we're going to skew our results. We should take into account the fact that there are loads of Year 11s, probably getting up super late, right? Teenagers always sleep in, so we're unlikely to get that information if we only sample ten Year 11s.

If we just randomly sample 50 kids, nothing guarantees that each year group shows up in proportion to its size, so it's not likely to give us the most accurate results. What we should do instead is weight our sample: we should sample twice as many Year 11s as we do Year 8s because there are twice as many Year 11s in the school as there are Year 8s. That's stratified sampling. It's taking into account how the population is split into groups when you pick your sample.

Questionnaires. So, response options for questionnaires.

This is something that comes up quite often.

Well, let's say we want to conduct this survey, right? We want to ask people when they wake up in the morning. Well, we could say, do you wake up between 8:00 and 8:15, 8:16 and 8:30, or 8:31 and 8:45?

Some people might sleep in right until the last minute. Well, is this the right way to go about this? What if someone wakes up at 7:45? If we give them boxes that they can tick to pick each of these options, which box would someone tick if they wake up at 7:45? There is no box. Likewise if someone wakes up at 6:00 in the morning.

There is no box for them to tick. And so, whilst most people will fit into one of these three boxes, we also need to provide a box that captures all those people who wake up before eight. So we could say anyone who wakes up before eight, tick this box. And likewise, if someone wakes up at 8:55, there's no box for them to tick. And so before eight is one box, and to capture the rest of the people, after 8:45 would be another option.
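One way to check that the boxes leave nobody out is to write them down as a small function. This is only a sketch: it assumes the three middle boxes are 8:00 to 8:15, 8:16 to 8:30 and 8:31 to 8:45, that times are given to the nearest minute, and the function name `wake_up_box` is made up for the example.

```python
from datetime import time


def wake_up_box(t: time) -> str:
    """Return the tick box for a wake-up time given to the nearest minute."""
    if t < time(8, 0):
        return "Before 8:00"
    if t <= time(8, 15):
        return "8:00 to 8:15"
    if t <= time(8, 30):
        return "8:16 to 8:30"
    if t <= time(8, 45):
        return "8:31 to 8:45"
    # Anything later than 8:45 falls through to the last box.
    return "After 8:45"


# The awkward cases from the lesson, plus one ordinary one.
for t in (time(6, 0), time(7, 45), time(8, 10), time(8, 55)):
    print(t.strftime("%H:%M"), "->", wake_up_box(t))
```

Because the first and last boxes are open-ended, the function always returns a box, whatever time is passed in.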

And that means that whatever time someone wakes up, there's always a box for them to tick. Bias. So bias is in the way that we structure our questions.

With this questionnaire, we may say: experts have concluded that waking up early is better for your health. When do you wake up? Well, would this be a good question to ask people? No, because we've just told them that if you wake up early, that's good. And if they've now got that in their minds, they're more likely to inadvertently lie on the questionnaire.

If they get up at 8:30-ish, rather than ticking this box, they may say, well, sometimes I get up at ten past, so I'll tick this box and I'll convince myself that I'm healthy. And so we always need to take care over bias by making sure that we don't influence anyone's answers when we're asking them the questions. Let's look at a couple of example questions. So we've got a table showing the number of students in each year group at a school.

Jenny is carrying out a survey for her GCSE mathematics project. She uses a stratified sample of 60 students according to year group. Calculate the number of Year 11 students that should be in her sample. This is very similar to the example we looked at on the previous page, right? So we can see here we've got 130 Year 11s in total.

We need to work out, if we're sampling 60 students in total, exactly how many we should pick from Year 11 to include in our sample. Well, first of all, we need to know how many students there are in total, right, to know what 130 actually represents as a fraction of the total number of students. So we can just add up the number in each year group from the table: 190, 145, 140 and so on, including the 130 in Year 11.

And if I put that into my calculator, what do I get?

We have 750 students in total at the school. Well, how can we work out what Year 11 is as a fraction of the total? Well, all we do is say we've got 130 out of a total of 750. So this fraction represents the fraction of students that are in Year 11. So if we want to make sure that we have a representative sample of Year 11, we need the same fraction.

We need to sample the same fraction of students, right? So we need this fraction of 60 students. And the answer to this will then give us the number of students in Year 11 that we need to sample. So if I put this into my calculator, 130 over 750 times 60, we end up with 10.4.
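The calculator step above can be sketched in Python. The function name `stratum_share` is made up for the example; the numbers (130 Year 11s, 750 students, a sample of 60) are the ones from the question.

```python
def stratum_share(group_size: int, population: int, sample_size: int) -> float:
    """Exact share of the sample that a group should get, before rounding."""
    return group_size / population * sample_size


# Numbers from the question: 130 Year 11s out of 750 students, sample of 60.
exact = stratum_share(130, 750, 60)

print(round(exact, 1))  # 10.4
print(round(exact))     # 10, since we can only sample whole students
```

The same function works for any other year group: swap 130 for that group's size and keep the total and the sample size the same.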

Obviously we can't sample just the legs of a student. So we're going to round this to the nearest student, which is going to be ten students, right? Question two, final question. Alison wants to find out how much time people spend reading books. She's going to use a questionnaire to figure this out.

So she wants us to design a suitable questionnaire for this study. So, how much time do you spend reading books? Well, first of all, we don't know: is that per day or per week? I would say, roughly, given how people tend to read books, they might read a book one day and then not read it for a couple of days, and then pick it up again a couple of days after that and read it more. And so let's say, on average, we'll sample over a week.

Let's say on average, how long do you spend reading per week?

And this is our question. We also need to give people options to tick. Some people don't spend any time at all reading, right? Or they read very little. And so 30 minutes or less could be our first option, right?

Some people don't read much at all. Then we could say next option could be say 31 to 60 minutes, give them a box to tick there.

Some people might do 61 to 90 minutes.

We might do one more like this.

If you read more than an hour and a half or 2 hours a week, it's probably not that obvious to most people exactly how long they spend. So we could do 91 minutes to 3 hours.

And the people who have the luxury of reading for more than 3 hours a week, we could say more than 3 hours.
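To double check that these options have no gaps and no overlaps, the boxes can be written as minute ranges and tested. This is a sketch under two assumptions: answers are whole numbers of minutes, and the first option is read as 30 minutes or less so that exactly 30 minutes has a box. The names `OPTIONS` and `boxes_for` are made up for the example.

```python
# (label, lowest minute, highest minute); None means no upper limit.
OPTIONS = [
    ("30 minutes or less", 0, 30),
    ("31 to 60 minutes", 31, 60),
    ("61 to 90 minutes", 61, 90),
    ("91 minutes to 3 hours", 91, 180),
    ("More than 3 hours", 181, None),
]


def boxes_for(minutes: int) -> list[str]:
    """Return every option that a whole number of minutes falls into."""
    return [
        label
        for label, low, high in OPTIONS
        if minutes >= low and (high is None or minutes <= high)
    ]


# Count the answers from 0 to 600 minutes that do not match exactly one box.
problems = [m for m in range(0, 601) if len(boxes_for(m)) != 1]
print(problems)  # an empty list means no gaps and no overlaps

print(boxes_for(30), boxes_for(180), boxes_for(200))
```

If two options shared a boundary, for example 30 to 60 and 60 to 90, the check would list 60 as a problem because it would match two boxes.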

And again, that will allow us to capture everyone's results. Right. There won't ever be a case where someone doesn't have a box to tick for this. So that would be a fairly good way to lay our question out. And that's the end of the chapter.


Oli W
