This is the second article in our series, What is Data Science? Today, let’s talk about data collection for data science. As a data scientist, you usually don’t have to collect the data yourself, but it’s important to know the sources it comes from and the process used to collect it. One reason is interviews. I have conducted interviews and asked this question of people who are good with technology and know how it works, yet they tend to struggle and get confused when I ask them what data collection is, or what primary and secondary data are. Not knowing this may prove to be one of the mistakes people make when aiming for a data science career.
So, basically, they are missing a basic point that is part of data science. You should know the source of the data you’re using to build models and algorithms. Again, you don’t have to collect it yourself, but you must know how data collection is done and what the process is, because you will usually get the data from the client.
So, there are several ways to collect data:
- Online Surveys
- Door to door collection of data
- Collection of data from secondary sources
- Government Organizations
- Research done by the researchers
Quality Control: Quality control activities take place during or after data collection, so all of the details should be carefully documented. A clearly defined communication structure is a precondition for establishing monitoring systems. There should be no uncertainty about the flow of information, because a poorly organized communication structure leads to lax monitoring and can also limit the opportunities for detecting errors. Quality control is also responsible for identifying the actions needed to correct faulty data collection practices and to minimize such occurrences in the future. A team is more likely to overlook the need for these actions if its procedures are written vaguely and are not based on feedback or education.
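To make the idea of detecting errors after collection a bit more concrete, here is a minimal Python sketch of what an automated check on collected records might look like. Everything in it is an assumption made for illustration: the `SurveyResponse` layout, the allowed answer options, the 0 to 120 age range, and the `find_issues` helper are all hypothetical, and a real project would take its rules from the written collection procedures.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout for one survey response (illustrative field names).
@dataclass
class SurveyResponse:
    respondent_id: str
    age: Optional[int]
    answer: Optional[str]  # chosen option for a single multiple-choice question


# Assumed set of valid options for the question.
ALLOWED_ANSWERS = {"A", "B", "C", "D"}


def find_issues(responses):
    """Return (respondent_id, problem) pairs for records that fail a check."""
    issues = []
    seen_ids = set()
    for r in responses:
        # The same respondent should not appear twice.
        if r.respondent_id in seen_ids:
            issues.append((r.respondent_id, "duplicate respondent"))
        seen_ids.add(r.respondent_id)

        # Age must be present and within an assumed plausible range.
        if r.age is None:
            issues.append((r.respondent_id, "missing age"))
        elif not 0 <= r.age <= 120:
            issues.append((r.respondent_id, "age out of range"))

        # The answer must be one of the allowed options (None also fails here).
        if r.answer not in ALLOWED_ANSWERS:
            issues.append((r.respondent_id, "invalid or missing answer"))
    return issues


if __name__ == "__main__":
    sample = [
        SurveyResponse("r1", 34, "A"),
        SurveyResponse("r2", None, "B"),
        SurveyResponse("r1", 200, "E"),
    ]
    for respondent_id, problem in find_issues(sample):
        print(respondent_id, problem)
```

The point of a check like this is not the specific rules but that they are written down explicitly, so problems in the collection process can be spotted and fed back to whoever is collecting the data.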
I am sure all of you have filled out some sort of survey or questionnaire, whether online or offline. You read the questions and choose an option as your answer. That is a form of data collection that is very common these days.