About Data Science

Data science is the ability to take data and understand it, process it, extract value from it, visualize it, and communicate it.

Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies.

Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization's competitive advantage.

Data is the new oil. This statement reflects how every modern IT system is driven by capturing, storing and analysing data for various needs, be it making business decisions, forecasting weather, studying protein structures in biology or designing a marketing campaign.

All of these scenarios involve a multidisciplinary approach of using mathematical models, statistics, graphs, databases and of course the business or scientific logic behind the data analysis.

So we need a programming language which can cater to all these diverse needs of data science.

Python shines bright as one such language, as it has numerous libraries and built-in features that make it easy to tackle the needs of data science.

Data Science is the practice of:

1. Asking questions (formulating hypotheses), the answers to which solve known problems or unearth unknown solutions that in turn drive business value.

2. Defining the data needed, or working with an existing data set, and employing computer science based tools to collect, store and explore such data, generally in huge volume and variety (often more than 1 TB and thousands of dimensions).

3. Identifying the type of analysis needed to get to the answers, and performing that analysis by implementing various statistics based algorithms and tools, often in a distributed and parallel architecture.

4. Communicating the insights gathered from the analysis in the form of simple stories, visualizations or dashboards (the Data Product) that a non-data scientist can understand and build a conversation around. It should be kept in mind that a product can also be a piece of code that is internal to a company and is used by various departments. The presentation, maintenance, scalability, etc. of the code are then the product features, a practice that many organizations neglect.

5. Building a higher-level abstraction that performs steps 2, 3 and 4 autonomously, analyzing and acting on new data as it is fed to the system.
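
The collect, analyze and communicate steps listed above can be sketched in a few lines of Python. This is only an illustration on a tiny in-memory data set: the records, the field names and the function names collect, analyze and communicate are all hypothetical, and a real project would read far larger data from files or databases.

```python
import statistics

# Hypothetical records; in practice these would come from a database or file.
records = [
    {"region": "north", "sales": 120.0},
    {"region": "north", "sales": 135.0},
    {"region": "south", "sales": 90.0},
    {"region": "south", "sales": 110.0},
]


def collect(rows):
    """Group the sales values by region (collect and explore)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["region"], []).append(row["sales"])
    return grouped


def analyze(grouped):
    """Compute the mean sales per region (the analysis)."""
    return {region: statistics.mean(values) for region, values in grouped.items()}


def communicate(summary):
    """Turn the numbers into one plain sentence per region (the insight)."""
    return [
        f"{region}: average sales {mean:.1f}"
        for region, mean in sorted(summary.items())
    ]


report = communicate(analyze(collect(records)))
for line in report:
    print(line)
```

Chaining the three functions into one call is a very small version of the autonomous pipeline described in the last step: new records can be passed through the same chain without changing the code.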

How Data Science Works

Data science incorporates tools from multiple disciplines to gather a data set, process it and derive insights from it, extract meaningful data, and interpret that data for decision-making purposes.

The disciplines that make up the data science field include data mining, statistics, machine learning, analytics, and some programming.

Data mining applies algorithms to a complex data set to reveal patterns that are then used to extract usable and relevant data from the set.
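
As a rough illustration of what revealing a pattern can mean, the sketch below counts which pairs of items appear together in several shopping baskets. The baskets, the function name frequent_pairs and the min_count threshold are assumptions made up for this example; real data mining tools use more elaborate algorithms on far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(transactions, min_count):
    """Return the item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for items in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = frequent_pairs(baskets, min_count=2)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Pairs that pass the threshold are the "usable and relevant" part of the data; the rare pairs are filtered out as noise.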

Statistical techniques such as predictive analytics use this extracted data to estimate events that are likely to happen in the future, based on what the data shows happened in the past.
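
One simple way to predict from the past is to fit a straight line to historical values and extend it forward. The sketch below does this with ordinary least squares; the monthly figures and the names fit_line and predict are hypothetical, and a straight line is only one of many possible models.

```python
# Hypothetical history: a value observed in each of five past months.
months = [1, 2, 3, 4, 5]
values = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate the value at x using the fitted line."""
    return slope * x + intercept


slope, intercept = fit_line(months, values)
forecast = predict(6, slope, intercept)
print(f"forecast for month 6: {forecast:.1f}")
```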

Machine learning is an artificial intelligence tool that processes quantities of data so large that a human would be unable to process them in a lifetime.

Machine learning refines the decision model produced by predictive analytics by comparing the predicted likelihood of an event with what actually happened at the predicted time.
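
Comparing predicted likelihoods with actual outcomes can be made concrete with a score. The sketch below uses the Brier score, the mean squared difference between each predicted probability and the observed outcome (1 if the event happened, 0 if not); lower is better. The Brier score is chosen here only as an illustration, and the probabilities, outcomes and model names are made up.

```python
# Hypothetical outcomes: 1 means the event happened, 0 means it did not.
outcomes = [1, 0, 1, 0]

# Hypothetical predicted probabilities from two candidate models.
model_a = [0.9, 0.2, 0.7, 0.4]
model_b = [0.6, 0.5, 0.5, 0.5]


def brier_score(predicted, actual):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - o) ** 2 for p, o in zip(predicted, actual)) / len(actual)


score_a = brier_score(model_a, outcomes)
score_b = brier_score(model_b, outcomes)

# The model whose predictions sit closer to what actually happened wins.
better = "model_a" if score_a < score_b else "model_b"
print(f"model_a: {score_a:.4f}, model_b: {score_b:.4f}, better: {better}")
```

A learning procedure repeats this kind of comparison many times, adjusting the model so that its score keeps improving.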

So it is evident that data science has a very promising future and a lot of scope.

Breaking down Data Science

Data is drawn from different sectors and platforms including cell phones, social media, e-commerce sites, healthcare surveys, internet searches, etc.

The increase in the amount of data available opened the door to a new field of study called Big Data: extremely large data sets that can help produce better operational tools in all sectors.

The continually growing volume of data, and the ease of access to it, are made possible by a collaboration of companies known as fintech, which use technology to innovate and enhance traditional financial products and services.

The data produced creates even more data, which is easily shared across entities thanks to emergent fintech products like cloud computing and storage. However, the interpretation of vast amounts of unstructured data for effective decision making may prove too complex and time-consuming for companies, hence the emergence of data science.