TechTrade Asia: Cloudera introduces Apache Spark 2.0, Kudu 1.0

Cloudera, the global provider of data management and analytics built on Apache Hadoop and open source technologies, has announced a release built on Apache Spark 2.0 (Beta), with enhancements to the API experience, performance improvements, and enhanced machine learning capabilities.

In addition, Cloudera is working with the community to continue developing the Apache Kudu 1.0 open source storage engine, recently released by the Apache Software Foundation. Cloudera’s latest contributions to these open source projects, alongside deeper integration with its platform, recognise the growing need for streaming and analysing real-time data in high-demand workloads, including machine learning models deployed in production by Cloudera’s enterprise customers.

Apache Spark
Cloudera was the first Hadoop big data analytics vendor to deliver a commercially supported version of Spark, and has participated actively in the open source community to enhance Spark for the enterprise through its One Platform Initiative. With Spark 2.0, organisations are better able to take advantage of streaming data, develop richer machine learning models and deploy them in real time, enabling more workloads to go into production.

Spark 2.0 features include:

● Structured streaming for better performance and easier ingest of traditional structured data, for time series, tabular and Internet of Things (IoT) data

● Compile-time type safety for user-defined functions, for improved reliability in mission-critical applications

● Machine learning model and pipeline persistence, plus newly supported machine learning libraries, to take on new data sets and analytic applications

"Cloudera was the first vendor to offer a commercially supported version of Apache Spark in our big data platform. In the years since then, Spark has become a standard for stream processing and machine learning workloads across the industry," said Mike Olson, founder and Chief Strategy Officer at Cloudera. "As a component of a Cloudera enterprise data hub, Spark benefits from the security, manageability, data governance and compliance services that customers demand. It can handle high-scale, high-performance workloads reliably. We are a part of the global Spark community, and committed to continued enhancements for demanding enterprises."

Apache Kudu

In September 2015, Cloudera announced the public beta release of Apache Kudu, its high-performance columnar store for Hadoop that enables fast analytics on fast data. Two months later, Cloudera donated Kudu to the Apache Software Foundation (ASF) to open it to the broader developer community and expand the types and variety of fast analytic use cases. While Spark 2.0 will give businesses better access to streaming data, Kudu 1.0 will enable enterprises to adopt real-time use cases at a greater pace.

“Kudu is a response to the increase in prevalence of real-time analytic use cases in the market,” said Charles Zedlewski, VP, Products at Cloudera. “As far back as 2012, Cloudera recognised the analytic gap in the Hadoop ecosystem that was leading architects to create complex hybrid architectures for real-time analytics. With the Apache Kudu 1.0 launch, the original vision is coming to fruition as users can now rely on a single, simplified project for fast analytics on fast data. We’ve seen the community quickly adopt Kudu and apply it to numerous high-scale, real-time analytic use cases.”

Kudu offers fast scans across data for analytics, and instant read/write capabilities for frequent updates and searches. Kudu also enables enterprises to adopt real-time use cases at a greater rate. Along with its integration with Spark, Kudu 1.0 is also tightly integrated with MapReduce and Impala to enable best-in-class processing.

Kudu 1.0 features include:

● A simplified architecture that enables very fast batch and stream processing

● Fault tolerance and scalability into the hundreds of nodes

● A columnar structure that enables analysis of the latest data, for real-time use cases such as time series data, machine data analytics and online reporting
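
To illustrate why a columnar layout suits analytic scans, here is a minimal Python sketch. It is a conceptual toy, not Kudu's implementation or API: the sensor records, field names and helper functions are all hypothetical. It pivots row-oriented records into one list per column, so that an aggregate over a single field reads only that field's values.

```python
from typing import Dict, List

# Hypothetical time series readings; the field names are illustrative only.
rows = [
    {"sensor": "a", "ts": 1, "temp": 20.0},
    {"sensor": "b", "ts": 1, "temp": 22.0},
    {"sensor": "a", "ts": 2, "temp": 21.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column.

    Assumes every record carries the same fields.
    """
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def mean_of_column(columns: Dict[str, list], name: str) -> float:
    """Average one column without touching any other column."""
    values = columns[name]
    return sum(values) / len(values)


columns = to_columns(rows)
average_temp = mean_of_column(columns, "temp")
print(average_temp)
```

In the row layout, computing the same average means visiting every record and skipping past the fields that are not needed; in the column layout the scan reads one contiguous list. Real columnar stores add encoding, compression and indexing on top of this idea.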

Interested?

Download Spark 2.0 or Kudu 1.0

06 October, 2016