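Free, high-quality beginner-to-intermediate documentation in Chinese on big data processing, big data visualization, data science, and data intelligence is scarce. Having followed the DataFlair tutorial series for a long time, I am bringing some of that material over for colleagues who are preparing to learn big data or who already work in the field.
Recommendations and principles from 胡巴特:
- If your English is good, read the official site whenever possible: https://data-flair.training/
- I aim to translate one document every two days, and never fewer than one per week (machine translation plus manual proofreading)
- Depending on how widely the documents are read, I will add practical, day-to-day big data experience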
000 What is Big Data – Importance and Use Cases
1. Objective
This tutorial answers questions such as what Big Data is, why you should learn it, and why no one can escape it. We will also discuss why industries are investing heavily in this technology, why Big Data professionals are paid so well, why the industry is shifting from legacy systems to Big Data, and why it is the biggest paradigm shift the IT industry has ever seen.
2. Why Learn Big Data?
Why should you learn Big Data? Let's start with what industry leaders say about it:
Gartner – Big Data is the new oil.
IDC – The Big Data market will grow 7 times faster than the overall IT market.
IBM – It is not just a technology; it is a business strategy for capitalizing on information resources.
IBM – Big Data is the biggest buzzword because technology makes it possible to analyze all the available data.
McKinsey – There will be a shortage of 1,500,000 Big Data professionals by the end of 2018.
Industries today are searching for new and better ways to maintain their position and prepare for the future. According to experts, Big Data analytics gives leaders a path to capture insights and ideas and stay ahead of tough competition.
3. What is Big Data Analytics?
So, what is Big Data? Different publishers have offered their own definitions to explain this buzzword.
**According to Gartner –** Big Data is high-volume, high-velocity, and high-variety information assets that demand an innovative platform for enhanced insight and decision making.
**The authors of A Revolution explain it as –** a way to solve the previously unsolved problems of data management and handling, problems the industry used to simply live with. With Big Data analytics, you can also uncover hidden patterns, build a 360-degree view of your customers, and better understand their needs.
i. Big Data Definition
In other words, Big Data is generated in multi-terabyte quantities. It changes fast and comes in a variety of forms that are difficult to manage and process with an RDBMS or other traditional technologies. Big Data solutions provide the tools, methodologies, and technologies used to capture, store, search, and analyze the data in seconds, revealing relationships and insights for innovation and competitive gain that were previously unavailable.
80% of the data generated today is unstructured and cannot be handled by traditional technologies. Earlier, the amount of data generated was not that high, and because only historical analysis was needed, the data was simply archived. Today data is generated in petabytes, so it is no longer practical to archive it and retrieve it each time it is needed: data scientists need to work with the data continually for predictive analysis, unlike the historical analysis done with traditional systems.
As the saying goes, "a picture is worth a thousand words." Hence we have also provided a video tutorial to help you better understand what Big Data is and why it is needed.
Refer to the best books to learn Big Data and its technologies.
4. Big Data Use-Cases
Having learned what Big Data analytics is, let us now discuss its use cases. Below are some Big Data use cases from different domains:
Netflix uses Big Data to improve customer experience
Promotion and campaign analysis by Sears Holding
Sentiment analysis
Customer churn analysis
Predictive analysis
Real-time ad matching and serving
Read about Big data Use cases in detail.
5. Big Data Technologies
Many technologies address the problem of Big Data storage and processing, among them Apache Hadoop, Apache Spark, and Apache Kafka. Let's look at them one by one.
i. Apache Hadoop
Big Data is having a big impact on industries today. As a result, 50% of the world's data has already been moved to **Hadoop**. It is predicted that by 2017 more than 75% of the world's data will be moved to Hadoop, and that this technology will remain the most in-demand in the market, as it is now.
ii. Apache Spark
Further enhancement of this technology led to the development of Apache Spark, a lightning-fast, general-purpose computation engine for large-scale processing. It can process data up to 100 times faster than **MapReduce**.
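To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use the Hadoop or Spark APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration. The idea it shows is that a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, small enough to check by hand.
    sample = ["big data is big", "data is the new oil"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network; this sketch only illustrates the programming model, not that distribution.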
iii. Apache Kafka
Apache Kafka is another addition to the Big Data ecosystem: a high-throughput distributed messaging system frequently used with Hadoop.
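As a rough illustration of the messaging idea, the sketch below models a topic as an append-only log that several consumer groups read independently, each tracking its own offset. This is a toy in-memory class, not the Kafka client API; `ToyTopic`, its methods, and the sample messages are hypothetical names introduced only for this example.

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._log = []      # append-only list of messages
        self._offsets = {}  # next offset to read, tracked per consumer group

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def consume(self, group, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    topic = ToyTopic()
    topic.produce("click:home")
    topic.produce("click:pricing")

    # Two groups read the same log without affecting each other.
    print(topic.consume("analytics"))
    print(topic.consume("billing", max_messages=1))
```

Because messages stay in the log after being read, a new consumer group can start from offset 0 and replay everything, which is one reason this design pairs well with batch systems such as Hadoop.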
IT organizations have started considering Big Data initiatives to manage their data better, visualize it, gain insights from it as and when required, and find new business opportunities to accelerate growth. Every CIO wants to transform the company, enhance its business models, and identify potential revenue sources, whether in the **telecom domain**, **banking domain**, retail, or **healthcare domain**. Such business transformation requires the right tools and the right people to ensure that the right insights are extracted from the available data at the right time.
6. Conclusion
Hence, Big Data is a big deal and a new competitive advantage that can boost your career and help you land your dream job in the industry.
We hope this blog helped you understand what Big Data is and why its technologies are worth learning. If you have any other questions, please let us know by leaving a comment in the section below.