Hadoop Delegation Tokens Explained
Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the …
Tag: CDH
New in Cloudera Data Science Workbench 1.2: Usage Monitoring for Administrators
Cloudera Data Science Workbench (CDSW) provides data science teams with a self-service platform for quickly developing machine learning workloads in their preferred language, with secure access to enterprise data and simple provisioning of compute. Individuals can request schedulable resources (e.g. compute, memory, GPUs) on a shared cluster that is managed centrally. While self-service provisioning of …
Deep Learning with Intel’s BigDL and Apache Spark
… how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing that line of work, in this post we discuss a viable alternative that is specifically designed to be used with Spark, and data available in Spark and Hadoop clusters via a …
Introducing S3Guard: S3 Consistency for Apache Hadoop
Synopsis: This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges with running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works. Problem: Although Apache Hadoop has support for using …