The landscape for Big Data continues to change, but not at the pace seen in previous years.  Highlighted here are a few points of interest from 2016 and predictions for 2017.

2016-

Enterprise Adoption of Open Source Solutions

  • According to Gartner [1], 48 percent of companies invested in Big Data in 2016.  The number of companies continues to rise, and these adopters drive the features and requirements that Big Data solutions are expected to implement.

Pace of Releases Slowed

  • The rise of adoption by enterprise businesses has caused the distributors to slow their release schedule.  Enterprises could not afford to upgrade at the previous pace of releases.  New features are expected to be secure, stable, and complete.  The distributions must respect these expectations if they are to be utilized by large companies.

Security Moves to the Forefront

  • This is a second effect of enterprise adoption.  Enterprises have established security practices and procedures.  For Hadoop to be used within these businesses, vendors had to deliver solutions for the most common and standard security requirements.

In-Memory Analytics Increasingly Important

  • The desire for in-memory analytics continues to grow.  Spark is one of the most active Apache projects, following Ambari and Hadoop [2].  Hive has released an in-memory feature called LLAP (Live Long and Process) [3].  Kudu was promoted to a top-level Apache project, demonstrating a readiness for widespread adoption [4].

 2017-

Refocus on the Meaning of Big Data

  • While Big Data has always represented Volume, Velocity, and Variety, the focus has historically been on Volume.  As many adopters are beginning to realize, the volume of data typically available does not always necessitate the use of a Big Data solution.  What these adopters are finding, though, is that the functionality provided by some Big Data solutions is solving other problems within the enterprise related to Velocity and Variety.

Data Governance

  • This is a requirement driven by the enterprise.  There is a need to provide truth in data, data lineage, and very specific data security.  As 2017 unfolds, there will be an increasing number of solutions implementing some level of data governance.

Cloud, Virtualization, and Automation

  • An increasing number of companies are seeing the benefits of the Cloud.  The benefits include lower cost of implementation, dynamic sizing, and shorter time to deployment.  The security and benefits of these solutions have been proven.  Automation will see related growth as it is used to magnify the benefits offered by a cloud deployment.

Push Towards Standardization

  • The increasing number of solutions for Big Data has led to diverging APIs, with custom features and interfaces being created to meet very specific requirements.  ODPi hopes to change that with a set of standard interfaces [5].  This will allow a consistent user experience when utilizing solutions from mixed vendors.  No longer will an integrated stack be required.  Instead, the various layers of the solution can easily be exchanged.

References

  1. http://www.gartner.com/newsroom/id/3466117
  2. http://projects.apache.org/statistics.html
  3. http://cwiki.apache.org/confluence/display/Hive/LLAP
  4. http://kudu.apache.org/2016/07/25/asf-graduation.html
  5. http://www.odpi.org/