Yahoo! has released a beta version of Hadoop with built-in security. It has also open sourced the latest version of its in-house workflow engine for the number-crunching platform, which is modelled on Google’s. At the Yahoo!-sponsored Hadoop Summit in Santa Clara, California, Eric Baldeschwieler said that both the Hadoop with Security beta and the Oozie workflow engine have been open sourced at Apache. According to him, the security release is designed to allow sensitive data to be shared with adequate permissions and to ease compliance with regulations.

Basic software similar to Google’s

The Hadoop clusters that Yahoo! maintains for research organisations now run the security beta. Baldeschwieler told The Reg that it is now part of Yahoo!’s Hadoop distro. Hadoop is modelled on software infrastructure that is proprietary to Google; it is a way of crunching large amounts of data across a distributed network of machines. It gets its name from a stuffed elephant that belongs to the son of the project’s founder, Doug Cutting.
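To make the idea of crunching data in map and reduce steps concrete, here is a minimal single-process sketch of the MapReduce pattern as a word count. It is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, nothing here is Hadoop's actual API, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical helper, one process)."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the pattern suit large clusters.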

License granted by Google itself

Google granted Apache Hadoop the necessary license three months after securing the MapReduce patent itself. This has eased infringement concerns over the open source project, which closely resembles MapReduce.

Based on Google’s research papers [2004]

Hadoop is an open source platform that underpins many online services, from Facebook, Yahoo! and Twitter to even Microsoft. It is based on Google’s distributed file system (GFS) and its distributed number-crunching platform, MapReduce. By 2004, Google had published a couple of research papers on these technologies, which Cutting then used to build the platform to back Nutch, his own open source web crawler. Hadoop is open sourced at Apache and still has Yahoo! as its largest contributor.

At the Yahoo!-sponsored Hadoop Summit in Santa Clara, California, Baldeschwieler, a senior Hadoop executive at Yahoo!, said that the security-enabled Hadoop integrates the platform with Kerberos, an open source authentication standard, and adds enhanced logging features. According to him, it is designed to allow sensitive data to be shared with adequate permissions and to make regulatory compliance easier. It is also designed to prevent unauthorized access to information stored on Hadoop clusters, and it helps businesses co-locate their sensitive data and cut costs through cluster consolidation.

The Elephant keeper

Named after the Burmese term for elephant keeper, Oozie is designed specifically for complex workflows and data pipelines. It is the de facto ETL (Extract, Transform and Load) processing standard at Yahoo!. It is an open source workflow engine that also forms part of the latest Hadoop distro from the start-up Cloudera. Baldeschwieler said that Oozie has been in use at Yahoo! for about six months.
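As a rough illustration of what a workflow engine does for an ETL pipeline, the sketch below runs a set of named actions in dependency order. It is not Oozie's API or configuration format. The `run_workflow` helper and the action names are hypothetical, and it assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_workflow(actions, dependencies):
    """Run each action once, only after all of its prerequisites have run.

    actions: maps an action name to a zero-argument callable.
    dependencies: maps an action name to the set of names that must finish first.
    Returns the names in the order they were executed.
    """
    graph = {name: dependencies.get(name, set()) for name in actions}
    executed = []
    for name in TopologicalSorter(graph).static_order():
        actions[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical three-step ETL pipeline.
    steps = {
        "extract": lambda: print("extracting"),
        "transform": lambda: print("transforming"),
        "load": lambda: print("loading"),
    }
    needs = {"transform": {"extract"}, "load": {"transform"}}
    run_workflow(steps, needs)
```

Describing the pipeline as a dependency graph, and not as a fixed script, lets an engine reject cycles up front and, in a real system, run independent branches in parallel.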

SUMMARY

Oozie 2.0 is the version now open sourced; version 1.0 had been open sourced previously.