Tuesday, 20 February 2018

What is Data Science?

Data science is a multidisciplinary combination of data inference, algorithm development, and technology to solve analytically complex problems.

At the core is data: troves of raw information, streaming in and stored in enterprise data warehouses. There is much to learn by mining it, and there are advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value:

FIG1.1 DATA SCIENCE LEARNING 


Data science: data insight discovery
This aspect of data science is about uncovering findings from data: diving in at a granular level to mine and understand complex behaviors, trends, and inferences. It is about surfacing hidden insight that can help companies make smarter business decisions. For example:

Netflix mines movie viewing patterns to understand what drives user interest, and uses that to decide which Netflix original series to produce.

Target identifies the major customer segments within its base and the unique shopping behaviors within those segments, which helps guide messaging to different market audiences.

Procter & Gamble uses time series models to better understand future demand, helping to plan production levels more optimally.
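To give a feel for what a demand-forecasting time series model does, here is a minimal Python sketch of simple exponential smoothing. It is only an illustration of one basic technique, not Procter & Gamble's actual method (the text does not say which models they use); the function name, the alpha smoothing parameter, and the monthly unit figures are hypothetical.

```python
def exponential_smoothing_forecast(demand, alpha=0.3):
    """Return the next-period forecast from simple exponential smoothing.

    demand: past observations, oldest first (hypothetical monthly units).
    alpha:  smoothing factor in (0, 1]; higher values weight recent data more.
    """
    if not demand:
        raise ValueError("demand must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = float(demand[0])  # start the smoothed level at the first observation
    for observed in demand[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * observed + (1 - alpha) * level
    # Simple exponential smoothing produces a flat forecast: the final level.
    return level


monthly_units = [120, 132, 128, 140, 151, 149]  # hypothetical demand history
print(exponential_smoothing_forecast(monthly_units, alpha=0.5))  # prints 145.625
```

Larger alpha values make the forecast react faster to recent demand; real production planning would typically use richer models that also capture trend and seasonality.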

How do data scientists mine out insights? It starts with data exploration. When presented with a challenging question, data scientists become detectives. They investigate leads and try to understand patterns or characteristics within the data. This requires a great deal of analytical creativity.

Then, as needed, data scientists may apply quantitative techniques to go a level deeper, e.g., inferential models, segmentation analysis, time series forecasting, synthetic control experiments, etc. The intent is to scientifically piece together a forensic view of what the data are really saying.

This data-driven insight is central to providing strategic guidance. In this sense, data scientists act as consultants, guiding business stakeholders on how to act on their findings.

FIG1.2 DATA INFORMATION DISCOVERY

Data science: data product development

A "data product" is a technical asset that (1) uses data as input and (2) processes that data to return algorithmically generated results. The classic example of a data product is a recommendation engine, which ingests user data and makes personalized recommendations based on that data. Here are some examples of data products:

Amazon's recommendation engines suggest items for you to buy, as determined by their algorithms. Netflix recommends movies to you. Spotify recommends music to you.

Gmail's spam filter is a data product: an algorithm behind the scenes processes incoming mail and determines whether a message is junk or not.

The computer vision used for self-driving cars is also a data product: machine learning algorithms are able to recognize traffic lights, other cars on the road, pedestrians, etc.
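The recommendation engines in these examples ingest user data and return personalized suggestions. As a minimal sketch of that loop, here is an item co-occurrence recommender in Python, a very simple form of item-based collaborative filtering. It is not how Amazon, Netflix, or Spotify actually build theirs; the purchase histories, item names, and the build_co_occurrence and recommend helpers are hypothetical.

```python
from collections import Counter, defaultdict


def build_co_occurrence(purchases):
    """Count how often each pair of items appears in the same user's history."""
    co_counts = defaultdict(Counter)
    for items in purchases.values():
        for a in items:
            for b in items:
                if a != b:
                    co_counts[a][b] += 1
    return co_counts


def recommend(user, purchases, co_counts, k=3):
    """Return up to k unseen items, scored by co-occurrence with the user's items."""
    seen = purchases.get(user, set())
    scores = Counter()
    for item in seen:
        for other, count in co_counts.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase histories (user -> set of items bought).
purchases = {
    "alice": {"camera", "tripod", "sd_card"},
    "bob": {"camera", "sd_card"},
    "carol": {"camera", "tripod", "lens"},
    "dave": {"laptop", "mouse"},
    "erin": {"camera"},
}

co_counts = build_co_occurrence(purchases)
print(recommend("erin", purchases, co_counts))  # prints ['sd_card', 'tripod', 'lens']
```

A user who has only bought a camera is offered the items other camera buyers bought most often. Production systems would normally normalize these raw counts (for example with cosine similarity) so that very popular items do not dominate every list.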

This is different from the "data insight" section above, where the outcome might be advice to an executive for making a smarter business decision. In contrast, a data product is a technical functionality that encapsulates an algorithm and is designed to integrate directly into core applications. Examples of applications that incorporate data products behind the scenes: Amazon's homepage, Gmail's inbox, and autonomous driving software.
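As an illustration of an algorithm packaged to sit behind an application, here is a sketch of a spam filter like the one in the Gmail example, built as a multinomial naive Bayes classifier, a classic textbook technique for this task. It is not Gmail's actual algorithm; the class name, the whitespace tokenizer, and the tiny training set are hypothetical, and the sketch assumes at least one spam and one ham (non-spam) message have been trained before is_spam is called.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        # log P(label) + sum over words of log P(word | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in words:
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        words = tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")


# Hypothetical training messages.
spam_filter = NaiveBayesSpamFilter()
for msg in ["win cash now", "claim your free prize now", "free cash offer"]:
    spam_filter.train(msg, "spam")
for msg in ["meeting moved to monday", "lunch at noon tomorrow", "project update for monday"]:
    spam_filter.train(msg, "ham")

print(spam_filter.is_spam("free cash prize"))        # prints True
print(spam_filter.is_spam("monday meeting update"))  # prints False
```

The application only needs to call is_spam; the scoring logic stays encapsulated, which is the sense in which the filter is a data product rather than a one-off analysis.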

Data scientists play a central role in developing data products. This involves building out algorithms, as well as testing, refinement, and technical deployment into production systems. In this sense, data scientists serve as technical developers, building assets that can be leveraged at wide scale.
FIG1.3 Data Science - data product development


Mathematical knowledge

At the heart of mining data insight and building data products is the ability to view data through a quantitative lens. There are textures, dimensions, and correlations in data that can be expressed mathematically. Finding solutions using data becomes a brain teaser of heuristics and quantitative technique. Solutions to many business problems involve building analytic models grounded in hard math, where understanding the underlying mechanics of those models is key to building them successfully.

Also, a common misconception is that data science is all about statistics. While statistics is important, it is not the only type of math used. First, there are two branches of statistics: classical statistics and Bayesian statistics. When most people refer to statistics, they generally mean classical statistics, but knowledge of both types is helpful. Furthermore, many inferential techniques and machine learning algorithms lean on linear algebra. For example, a popular method for discovering hidden features in a dataset is SVD (singular value decomposition), which is grounded in matrix math and has much less to do with classical statistics.
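To make the SVD idea concrete, the sketch below approximates the largest singular value and its singular vectors for a small, hypothetical user-by-item ratings matrix, using power iteration on M^T M. In practice you would call a library routine such as numpy.linalg.svd; this pure-Python version (all function names are hypothetical) only shows the matrix mechanics.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    """Return (unit vector, original Euclidean norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm


def top_singular_triplet(m, iterations=200, seed=0):
    """Approximate the largest singular value sigma and singular vectors u, v of m.

    Power iteration on M^T M converges to the top right singular vector v;
    then M v = sigma * u gives sigma and the top left singular vector u.
    """
    rng = random.Random(seed)
    mt = transpose(m)
    # Random positive start vector (it must not be orthogonal to the answer).
    v, _ = normalize([rng.random() + 0.1 for _ in range(len(m[0]))])
    for _ in range(iterations):
        v, _ = normalize(matvec(mt, matvec(m, v)))  # v <- M^T M v, rescaled
    u, sigma = normalize(matvec(m, v))
    return sigma, u, v


# Hypothetical ratings: rows are users, columns are items.
ratings = [
    [5, 5, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 5, 4],
    [5, 4, 0, 0],
]

sigma, u, v = top_singular_triplet(ratings)
print(round(sigma, 3), [round(x, 3) for x in v])
```

The top right singular vector weights the first two items equally and gives the last two essentially zero weight, so it surfaces a hidden "taste" feature shared by users 0, 1, and 4 that no single column of the raw matrix states explicitly.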

In general, it is helpful for data scientists to have both breadth and depth in their knowledge of mathematics.

Technology and hacking

First, let's clarify that we are not talking about hacking as in breaking into computers. We are referring to the tech programmer subculture meaning of hacking: creativity and ingenuity in using technical skills to build things and find clever solutions to problems. Why is hacking ability important? Because data scientists use technology to wrangle enormous datasets and work with complex algorithms, and that requires tools far more sophisticated than Excel.

Data scientists need to be able to code, prototype quick solutions, and integrate with complex data systems. Core languages associated with data science include SQL, Python, R, and SAS. On the periphery are Java, Scala, Julia, and others. But it is not just about knowing language fundamentals. A hacker is a technical ninja, able to creatively navigate technical challenges in order to make their code work. Along these lines, a data science hacker is a solid algorithmic thinker, able to break down messy problems and recompose them in ways that are solvable. This is critical because data scientists operate within a lot of algorithmic complexity.

They need a strong mental grasp of high-dimensional data and tricky data control flows, with full clarity on how all the pieces come together to form a cohesive solution.

Strong business acumen

It is important for a data scientist to be a tactical business consultant. Working so closely with data, data scientists are positioned to learn from it in ways no one else can.


FIG1.4 DATA SCIENCE KNOWLEDGE 

This creates the responsibility to translate observations into shared knowledge and to contribute to strategy on how to solve core business problems. This means that a core competency of data science is using data to tell a story convincingly. No data dumps: rather, present a cohesive narrative of problem and solution, using data insights as supporting pillars, that leads to guidance. Having this business acumen is just as important as having acumen for technology and algorithms.

There needs to be clear alignment between data science projects and business goals. Ultimately, the value does not come from data, math, and technology by themselves. It comes from leveraging all of the above to build valuable capabilities and have strong business influence.

What makes a data scientist: curiosity and training

The mindset

A common personality trait of data scientists is that they are deep thinkers with intense intellectual curiosity. Data science is all about being inquisitive: asking new questions, making new discoveries, and learning new things.

Ask the data scientists most obsessed with their work what drives them, and they will not say "money." The real motivator is being able to use their creativity and ingenuity to solve hard problems and to constantly indulge their curiosity. Deriving complex insights from data goes beyond simply making an observation; it is about uncovering the "truth" that lies hidden beneath the surface. Problem solving is not a chore, but an intellectually stimulating journey to a solution.

Data scientists are passionate about what they do and take great satisfaction in accepting a challenge.

Training

There is a misconception that you need a doctorate in science or mathematics to become a legitimate data scientist. This view ignores the fact that data science is multidisciplinary. Highly focused academic study is certainly useful, but it does not guarantee that graduates have the full set of experience and skills needed to succeed. For example, a PhD statistician may still need to pick up a lot of programming skills and gain business experience to complete the trifecta.

In fact, data science is a relatively new and growing discipline, and universities have not yet caught up in developing comprehensive data science degree programs, which means no one can really claim to have "done all the schooling" to become a data scientist. Where does much of the training come from? The insatiable intellectual curiosity of data scientists makes them motivated self-learners, driven to teach themselves the right skills, guided by their own determination.

Analytics and machine learning: how they relate to data science

There are many terms closely related to data science; we hope to add some clarity around them.

What is Analytics?

Analytics has risen quickly in popular business jargon over the past several years; the term is used loosely, but it is generally meant to describe critical thinking that is quantitative in nature. Technically, analytics is the "science of analysis," that is, the practice of analyzing information to make decisions. Is "analytics" the same thing as data science? It depends on the context. Sometimes it is synonymous with the definition of data science we have described, and sometimes it represents something else. A data scientist using raw data to build a predictive algorithm falls within the scope of analytics. At the same time, a non-technical business user interpreting pre-built dashboard reports (e.g., Google Analytics) is also in the realm of analytics, but does not cross into the skill set needed for data science. Analytics has come to have a fairly broad meaning.

At the end of the day, as long as you understand the term beyond the buzzword level, the exact semantics do not matter much.

What is the difference between an analyst and a data scientist?

"Analyst" is a somewhat ambiguous job title that can represent many different types of roles (data analyst, marketing analyst, operations analyst, financial analyst, etc.). What does this mean in comparison to a data scientist?

Data scientist: a specialized role with skills in math, technology, and business acumen. Data scientists work at the raw database level to derive insights and build data products.

Analyst: this can mean a lot of things.


The common thread is that analysts look at data to try to gain insights. Analysts may interact with data at both the database level and the summarized report level. Thus, "analyst" and "data scientist" are not exactly synonyms, but they are also not mutually exclusive. Here is our interpretation of how these job titles map to skills and scope of responsibilities:

FIG1.5 DATA ANALYTICS

What is Machine Learning?

Machine learning is a term closely associated with data science. It refers to a broad class of methods that revolve around data modeling to (1) algorithmically make predictions and (2) algorithmically decipher patterns in data.

Machine learning for making predictions: the core concept is to use labeled data to train predictive models.

Labeled data are observations for which the ground truth is already known. Training a model means automatically characterizing the labeled data in a way that can predict the labels of unseen data points. For example, a credit card fraud detection model can be trained using a historical record of labeled fraudulent purchases. The resulting model estimates the likelihood that new purchases are fraudulent. Common methods for training models range from basic regressions to complex neural networks.
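The fraud-detection example can be sketched as logistic regression, one of the basic regressions mentioned above, trained by batch gradient descent on a handful of labeled purchases. Everything here is a hypothetical illustration: the two features (purchase amount in hundreds of dollars and a foreign-merchant flag), the labels, and the learning-rate and epoch settings are made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that purchase x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=3000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical labeled history: [amount in $100s, foreign merchant (1/0)]; 1 = fraud.
X = [[0.2, 0], [0.5, 0], [1.0, 0], [0.3, 1], [8.0, 1], [6.5, 1], [9.0, 0], [7.5, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(X, y)
for purchase in ([0.4, 0], [8.5, 1]):
    print(purchase, round(predict_proba(w, b, purchase), 3))
```

The model outputs a probability rather than a hard yes/no, which matches the idea of estimating how likely a new purchase is to be fraudulent; a real system would use many more features and far more labeled history.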

All of these methods, from basic regressions to neural networks, follow the same paradigm, known as supervised learning.

Machine learning for pattern discovery: another modeling paradigm, known as unsupervised learning, attempts to surface underlying patterns and associations in data when no ground truth is known (i.e., none of the observations are labeled). Within this broad category of methods, the most common techniques are clustering algorithms, which detect the natural groupings that exist within a dataset.
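Clustering can be sketched with k-means, one common clustering algorithm (the text does not single out a specific one). In the minimal Python version below, the customer features (annual spend in thousands and visits per month) are hypothetical, and the deterministic farthest-first initialization is a simple stand-in for the usual randomized k-means++ seeding.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    # Farthest-first seeding: start from the first point, then repeatedly add the
    # point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids

    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical customers: (annual spend in $1000s, visits per month). No labels given.
customers = [(1.0, 2.0), (1.2, 3.0), (0.8, 2.0),
             (9.5, 12.0), (10.1, 11.0), (9.8, 13.0),
             (5.0, 25.0), (5.5, 24.0), (4.8, 26.0)]

centroids, labels = kmeans(customers, k=3)
print(labels)  # prints [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

No labels are supplied: the algorithm groups the nine customers into three segments purely from the structure of the data, which is the kind of customer segmentation this section describes. The cluster numbers themselves are arbitrary; only the grouping matters.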

VIDEO1.1 DATA SCIENCE TRAINING 


For example, clustering can be used to programmatically learn natural customer segments in a company's user base. Other unsupervised methods for extracting underlying features include Principal Component Analysis, Hidden Markov Models, topic models, and more. Not all machine learning methods fit neatly into the two categories above. For example, collaborative filtering is a type of recommendation algorithm with elements of both supervised and unsupervised learning. Contextual bandits are a twist on supervised learning, where predictions are adaptively modified in real time by feedback. This wide range of machine learning techniques is an important part of the data science toolkit. It is up to the data scientist to determine which tool to use in which circumstance (and how to use the tool correctly) to solve analytically open-ended problems.

What is Data Munging?

Raw data can be unstructured and messy, with information coming from disparate data sources, mismatched or missing records, and a slew of other tricky issues. Data munging is a term for the data wrangling needed to bring data together into cohesive views, as well as the janitorial work of cleaning up data so that it is polished and ready for downstream use. This requires a good sense for pattern recognition and clever hacking skills to merge and transform masses of information at the database level. If not done properly, dirty data can obscure the "truth" hidden in the dataset and completely mislead results.
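A few typical munging steps (normalizing join keys, parsing messy numeric strings, filling missing values, and merging two sources into one view) are sketched below in plain Python. The two record sources, their field names, and the munge helper are hypothetical; real pipelines would usually do this in SQL or a dataframe library such as pandas.

```python
def normalize_email(raw):
    """Trim and lowercase an email so the same person matches across sources."""
    if raw is None or not raw.strip():
        return None
    return raw.strip().lower()


def parse_amount(raw):
    """Turn strings like '1,250.50' into floats; blanks become None (missing)."""
    if raw is None or not raw.strip():
        return None
    return float(raw.replace(",", ""))


def munge(crm_rows, billing_rows):
    """Outer-join two hypothetical sources on normalized email."""
    merged = {}
    dropped = 0  # rows with no usable join key
    for row in crm_rows:
        key = normalize_email(row.get("email"))
        if key is None:
            dropped += 1
            continue
        merged[key] = {
            "email": key,
            "name": (row.get("name") or "").strip() or None,
            "city": row.get("city") or "unknown",  # fill a missing city
            "total_spend": None,
        }
    for row in billing_rows:
        key = normalize_email(row.get("email"))
        if key is None:
            dropped += 1
            continue
        record = merged.setdefault(
            key, {"email": key, "name": None, "city": "unknown", "total_spend": None}
        )
        record["total_spend"] = parse_amount(row.get("total_spend"))
    return sorted(merged.values(), key=lambda r: r["email"]), dropped


# Hypothetical messy inputs from two systems.
crm = [
    {"email": "Ana@Example.com ", "name": "Ana Lopez", "city": "Madrid"},
    {"email": "ben@example.com", "name": " Ben Ito", "city": None},
    {"email": None, "name": "No Email", "city": "Lima"},
]
billing = [
    {"email": "ana@example.com", "total_spend": "1,250.50"},
    {"email": "BEN@example.com", "total_spend": ""},
    {"email": "cara@example.com", "total_spend": "80"},
]

records, dropped = munge(crm, billing)
for record in records:
    print(record)
print("dropped rows:", dropped)
```

Choices such as keeping unmatched billing rows (an outer join) or dropping rows with no email are judgment calls that change downstream results, which is why munging deserves the care the text describes.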

Therefore, any data scientist must be skillful and nimble at munging data in order to have accurate, usable data before applying more sophisticated analytical tactics.

Final word

For any company that wants to improve its business by being more data-driven, data science is the secret sauce. Data science projects can have multiplicative returns on investment, both from guidance through data insight and from the development of data products.

However, hiring people who carry this potent blend of diverse skills is easier said than done. There is simply not enough supply of data scientists in the market to meet the demand (which is why data scientist salaries are so high). So when you do manage to hire data scientists, nurture them. Keep them engaged. Give them autonomy to be their own architects in how to solve problems. This sets them up within the company to be highly motivated problem solvers, ready to tackle the toughest analytical challenges.