The Media Industry is changing, so stay tuned!

Jeff Grassinger

Sales Alliance Manager

The IBC trade show (12-16 Sep) is just around the corner!  At this year’s show we will see consumer technology trends driving rapid changes in the media industry.  One of the key trends is multiplatform content delivery in our non-stop “TV Everywhere” world. This one trend alone will have dramatic effects throughout the media industry, as it will necessitate significant transformation in the workflows in use today.

According to eMarketer, the average time spent with digital media per day will surpass TV viewing time for the first time this year. In addition, Ooyala’s Global Video Index reports that mobile and tablet viewing increased 133% between Q1 2013 and Q1 2014. Digital media consumption is not only growing fast; video on mobile devices in particular is an area of dynamic growth.

As with any change that comes to the industry, media professionals are asking, “What workflow changes do I need to make to engage this new audience, and how does that change the way we do business?”

In the short term, it’s about leveraging the workflows you use today.  In the long term, success depends on adapting your workflows to new technology. Read on to find out how.

What are the goals and keys to success?

Workflows in TV Everywhere applications allow users to access their favorite content on their favorite devices, and to discover new interests by searching for content or accepting recommendations. Your TV Everywhere goal is to create “stickiness”, the engagement that encourages consumers to stay longer, driving your bottom-line advertising and/or subscription revenue.

However, unlike traditional broadcast workflows, TV Everywhere requires an array of codecs, streaming formats, and digital platforms, and new, vastly different workflow architectures.

  • One of the most important goals when planning and building for the TV Everywhere future is workflow agility. Codecs, streaming formats and devices will change and evolve at a much more rapid pace than broadcast technology. To anticipate TV Everywhere technology, the foundation of your workflow must be agile enough to meet the requirements of today while supporting new workflows with ease. Media organizations with legacy technology based on proprietary infrastructure, proprietary protocols, and inflexible technology will find themselves at a distinct disadvantage to their competitors.
  • Another important goal is to create simplified, consolidated file-based workflows on a data center infrastructure. The technologies you choose need to work together simply, with a focus on a dramatic reduction in operational costs. File-based workflows will significantly reduce the need to manually create and deliver media, maximizing top-line growth and bottom-line results. Data center technologies such as networking, virtualization, software-defined architectures, data analytics, and cloud solutions will increasingly be part of tomorrow’s delivery of TV Everywhere.

There is no arguing that the media industry is in transformation. While the living room TV screen continues to be the screen of choice for many viewers, it is clear that smaller screens will continue to see dramatic increases in video traffic. To seize the opportunity to engage and entertain this new viewership, media organizations need to focus on technologies that enable business agility and data center file-based workflows.

Have you started your transition to agile, simplified and consolidated data center file-based media workflows? How has this industry transformation influenced your visit to this year’s IBC trade show?

Stop by the EMC Isilon booth #7.H10 to see our new storage software, platforms, and media solutions. I look forward to talking to you, and showing you how EMC Isilon can help you incorporate new technologies, design adaptable workflows, and improve your bottom line.

Free Hadoop… pun intended!

John Nishikawa

Director, Business Development & Alliances

free (adjective, verb, adverb) – to make free; release from imprisonment; to unlock your data

free (adjective, verb, adverb) – no charge; Isilon HDFS license key

Isilon is the #1 Enterprise Shared Storage Provider for Hadoop. We have more customers and more capacity in our storage infrastructure used for Hadoop than any other enterprise shared storage provider.  Are you looking to get more business insight out of your data to drive innovation, provide competitive advantage, improve customer satisfaction, accelerate time-to-market, or, in some cases, even save actual lives?  If so, the power of your data is sitting right there in your Isilon cluster.  All you need to do is free Hadoop and bring it to your data.  Join the Free Hadoop revolution!  Here’s how.


In five easy and free steps, you can join the Free Hadoop revolution with Isilon at http://www.emc.com/campaign/isilon-hadoop/index.htm

  • Step #1: Request a free HDFS license key to Free Hadoop on Isilon
  • Step #2: Download a free community trial edition of Pivotal PHD or Cloudera CDH to kick the tires on the power of Hadoop
  • Step #3: Download the free Hadoop Starter Kit (HSK) for step-by-step instructions on deploying Hadoop on your existing Isilon and VMware infrastructure in about an hour
  • Step #4: Conduct a free TCO analysis of a Hadoop DAS architecture versus a Hadoop Isilon architecture and see why many customers are choosing Isilon for enterprise-ready Hadoop
  • Step #5: Enjoy the power of data and recruit others to join the Free Hadoop revolution!

So what are you waiting for?  Join the Free Hadoop revolution now and we’ll also send you this t-shirt, so we can show our unity as we spread the Free Hadoop mantra across the globe.

[Image: the Free Hadoop t-shirt]

So, let the chanting begin in your data center, “Free Hadoop!  Free Hadoop!  Free Hadoop!  Free Hadoop!”

Bringing the Scale-Out Data Lake to Life

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications

Today is an exciting day for us here at Isilon. This is one of our biggest launches, helping bring the industry’s first enterprise-grade scale-out Data Lake to life. Here is a quick summary of what we are launching today to reinforce our scale-out Data Lake strategy.

We are announcing two new platforms, the S210 and the X410. The S210 is our new high-transaction platform, ideal for workloads such as special effects and content rendering in media & entertainment, real-time ticker analytics in financial services, and chip design in EDA. The X410 is our new high-throughput platform, ideal for workloads such as content streaming, Hadoop analytics, and home directories.

We are also announcing a new version of our operating system, OneFS. The new version includes features such as SmartFlash, up to 1 PB of globally coherent flash cache to help accelerate read performance. We are also announcing updates and new access methods to better enable our multi-protocol scale-out Data Lake. The HDFS enhancements include access zones, enabling secure multi-tenancy for your big data analytics workloads, and support for future Hadoop versions. The second access method announcement is SMB Multichannel, with 1.4 GB/s single-stream throughput for emerging media workloads such as 4K. And finally, we are natively integrating OpenStack Swift, making Isilon the first and only file-and-object scale-out NAS.
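
To make the file-and-object idea concrete, here is a minimal sketch of a Swift-style object upload with curl; the host name, port, account, container, and token below are hypothetical placeholders for illustration, not documented Isilon endpoints:

# Upload an object with a Swift-style PUT (all values are placeholders).
curl -X PUT -H "X-Auth-Token: <token>" -T report.csv \
     "https://cluster.example.com:28080/v1/AUTH_media/ingest/report.csv"
# List the container to confirm the upload landed.
curl -H "X-Auth-Token: <token>" "https://cluster.example.com:28080/v1/AUTH_media/ingest"

The point of the integration is that the same bytes remain visible as a regular file over NFS or SMB, so object and file workflows can share one copy of the data.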

We also announced several innovative solutions in partnership with the EMC Federation, including two new big data analytics solutions and a comprehensive scale-out VDI solution with XtremIO. Check out our blog here.

My team helped put together a series of videos that we call “Learning Bites” to help you better understand our scale-out Data Lake strategy and how the new announcements reinforce it. Check them out.

EMC Isilon – New Product Overview

EMC Isilon – Data Lake

EMC Isilon – New Hardware Platforms

EMC Isilon – SmartFlash Overview

EMC Isilon – SMB3 Multichannel

Building the Exploration & Production Data Lake

John Nishikawa

Director, Business Development & Alliances

When combined, oil and water seem to have only negative connotations.  In the kitchen, water and hot oil is a recipe for disaster, just as an oil spill in the ocean is an environmental disaster.  Is there a positive outcome when oil is combined with water?  One comes to mind, as long as that body of water is your Data Lake and the oil is in the form of seismic interpretation and reservoir modeling data.


In talking to customers, more and more of them are being asked by their internal customers (namely geoscientists) to make more pre- and post-stack data available online, accessible for analysis and interpretation. A majority of this data is offline and not readily accessible because it is on tape.  We are talking hundreds to thousands of terabytes of valuable data sitting idly on tape.  What if you could have all this data online via a Data Lake?

EMC has been an important part of infrastructure management in oil & gas for over 25 years, in an industry where companies have been drilling for hydrocarbons even longer. That’s an enormous amount of potential data. Think of all the subsurface pattern-matching we could be using to accelerate discovery. Think of all the best practices for running operations efficiently and safely, and the models for optimized logistics, that could have been created and implemented much earlier if we had been able to harness that data in a Data Lake. It’s true that 25 years ago we did not have the advancements in technology we enjoy today – the ability to use sensors to capture and analyze real-time data, computing power to store and crunch terabytes of information in a fraction of the time, global communications and mobility to bring new levels of collaboration to drive business agility – but think about the next 5, 10 or 25 years. The trajectory of innovation possible from harnessing an affordable Data Lake now could be exponential. We could simulate large parts of oil & gas operations and make more economically, financially and environmentally sound decisions, quicker, before a well is even drilled. The results will never be perfect, but every order of magnitude we move away from imperfection leaves us better off in the continued pursuit of energy.

I shall pass on providing my definition of a Data Lake.  I will leave that to the analysts and others.  Instead I will give you a few characteristics that I would want from my Data Lake.  I would want my Oil & Gas Data Lake to:

  • Scale to double-digit petabytes
  • Scale easily and be simple to manage
  • Support multiple protocols for broad data ingestion and analysis
  • Run Hadoop analytics via HDFS
  • Deliver a strong TCO advantage

Our Isilon exploration and production customers are quickly realizing that their Isilon storage investment is in fact the foundation of their Data Lake, and they are seeing all of these benefits and much more today.  If you are using Isilon, you may have a Data Lake already and not even know it!

EAGE

EMC Isilon will be showcasing our E&P solutions at the 76th European Association of Geoscientists and Engineers (EAGE) Conference & Exhibition in Amsterdam from June 16th – 19th.  Please stop by booth #3206 for presentations, demos, or discussions with our staff of subject-matter experts, or just to ask questions. We look forward to meeting you!

EMC WORLD 2014 – Hadoop Roundup

John Nishikawa

Director, Business Development & Alliances

“I’m gonna knock you out!  Mama said knock you out!  I’m gonna knock you out!” blared over the speakers as Isilon entered the building at EMC World to capture the Enterprise HDFS Storage World Championship title belt.  The ring announcer’s introduction went like this: “From Seattle, Washington, weighing in at over 150 petabytes of HDFS storage capacity sold to date, Iiiisssilllonnn.  Let’s get ready to rumble!”

In the first round, Isilon came out swinging with six Hadoop breakout sessions attended by well over 1,000 people in total.  The breakout session punches ranged from Hadoop-with-Isilon introduction jabs to deep technical uppercut presentations.  Check out a Hadoop session in progress below!

[Photo: a Hadoop breakout session in progress]

Over the next couple of rounds, Isilon toyed with its opponent in the Expo Hall: our Big Data analytics partners spoke about our joint solutions in the Isilon booth theater, our Hands-on Lab showed how to easily deploy Hadoop with Isilon using our step-by-step Hadoop Starter Kit (HSK) instructions (download here), and the EMC Federation booth demo featured a Big Data use case and architecture with Isilon.

The knockout punch came fast and sudden in the 4th round.  As part of his keynote, Isilon President Bill Richter included a thought-leadership panel on stage with two of our most strategic ‘corner men’ partners: Paul Maritz, CEO of Pivotal, and Tom Reilly, CEO of Cloudera, two recognized leaders in this space.  They discussed Big Data trends and their perspectives on when and how Hadoop will REDEFINE how enterprises approach their data and the valuable insights it will provide the organization.  It was then the opponent’s knees buckled and a loud thump was heard as it hit the canvas.  “7, 8, 9, 10,” was heard, and the match ended.

[Photos: Bill Richter’s keynote and the Data Lake announcement]

The arena erupted with cheers as the ring announcer raised Isilon’s right arm and told the crowd of thousands, “And the winner, by knockout, and still undisputed Enterprise HDFS Storage Champion of the World: Isilon!”

On a related note, a mass gathering of people congregated outside the arena.  Many had shirts and signs reading, “Free Hadoop” with an image of a yellow cartoon elephant behind jail bars (see below).  We can’t confirm, but it looked to be some type of movement.  A “Free Hadoop” movement?

[Image: “Free Hadoop” – a yellow cartoon elephant behind jail bars]

More on this breaking news next time!

Musings on EMC World 2014

Robert Peglar

CTO Americas

EMC World 2014 was an outstanding event.  I don’t say that because I am an employee; rather, because during my four days there, the reaction from customers, partners and everyone else – including the virtual attendees – was consistent and strong.  The event was well worth their time and effort.  Hats off to all who worked so hard to make this event world-class, the hundreds of people ‘behind the scenes’ that you never see, the unsung heroes.  Well done.

OK – now, let’s get into some technical topics.  During his keynote at the show, Isilon President Bill Richter touched on a topic in front of many a CIO today – that of moving both technology and organizations/people towards third-platform operation and behavior.  The EMC portfolio is entirely geared towards this transformation, not only in product but services, consulting, training, and many other aspects.  But what of Isilon – rightly known as the leader in scale-out NAS?  How does Isilon help move an organization into third-platform operation and behavior?

In my view, this is best answered in two parts.  First, the product itself – OneFS – has been orienting itself towards third-platform operation for more than two years now.  OneFS natively supports both HDFS and RESTful access to the filesystem (and to the command space as well, for operations).  These two OneFS behaviors cannot be over-emphasized.  They represent a break from second-platform protocol stacks such as SMB and NFS.  Used in parallel, HDFS and REST form a ‘bridge’ (as Bill Richter put it) into third-platform applications and behaviors.

For example, I can store a file on OneFS using this very simple syntax (the example is from macOS):

curl -k -u "peglarr:<redacted>" -X PUT "https://$1:8080/namespace/ifs/foo" -H "x-isi-ifs-target-type:container" -H "x-isi-ifs-access-control:0755"

This one line of curl, sending a PUT command, created a directory on OneFS called ‘foo’ in the directory /ifs with POSIX permissions of ‘rwx’ for myself and ‘rx’ for everyone else.

Then, I can store a file (foo.txt) with another simple line:

curl -k -u "peglarr:<redacted>" -X PUT "https://$1:8080/namespace/ifs/foo/foo.txt" -H "x-isi-ifs-target-type:object" -T "foo.txt"

That’s it.  Simple, straightforward, and I didn’t need a second-platform, client-server protocol session.  But if I need to use that file, foo.txt, in a second-platform application through SMB or NFS, it’s there in OneFS, in that directory, waiting for me to read it.

Better yet, I can read that file via HDFS, immediately (no copying needed) and use it as part of an analytic workflow.
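
As a minimal sketch of that immediacy, assuming a Hadoop client whose configuration points at the Isilon cluster and the default HDFS root of /ifs (the host name is a placeholder):

# Read the file just written over REST, this time through the HDFS protocol;
# no copy or ingest step sits between the two views of the same data.
hdfs dfs -cat hdfs://isilon.example.com:8020/foo/foo.txt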

The key here is the immediacy of data.  In third-platform, data not only has business value but time value as well.  If I am company X and can analyze the same data (say, today’s Twitter feed) faster than company Y, I gain competitive advantage.  Everyone knows about the time value of money; I believe in the time value of data.

Hopefully, you’ve enjoyed reading this.  It’s certainly not my first blog overall – but it is my first blog on the new isilonblog.emc.com site.  There will be many more over time.

Lastly, hats off to all the EMC Elect.  The peer-selected nature of this group distinguishes it from other recognition; I am honored to be a part of it, and thoroughly enjoyed my interactions at EMC World with many other Elect members.

Safe travels to all, and best wishes.  Have fun using third-platform tools and apps with Isilon.  Please let me know about your experiences with it – feel free to post blog comments, tweet me (@peglarr) or on other social media such as LinkedIn.

Hadoop roundup at EMC World 2014

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications

At EMC World 2014, EMC Isilon is redefining scale-out storage for big data Hadoop analytics. Continuing from my previous blog, here is a quick run-down of the momentum we have at EMC World.

Come check out:

  • The Isilon Keynote with Bill Richter and an amazing panel of executives from the industry
  • Isilon Big Data Sessions with Ryan Peterson, Claudio Fahey and the experts from Pivotal and Cloudera
  • Ask-the-expert sessions to help you get started and address your questions
  • Hands-On Labs: Hadoop Starter Kit for the first time demonstrating a pre-configured solution
  • Meet the Isilon Big Data experts at the amazing Isilon Booth #123  

EMC Isilon Keynote with Bill Richter │Wednesday, May 7, 2:30 – 3:30pm, Palazzo L, Level 5

Join Bill Richter, EMC Isilon’s President, along with a customer panel and executives from Cloudera and Pivotal, for a discussion of pressing storage challenges and industry views.  Circle the date today for this ‘cannot miss’ presentation!


Isilon: Hadoop Analytics + Isilon Scale-Out Storage | Tuesday 8:30 am, Wednesday 4:00 pm

Cover the major benefits of using Isilon as a storage infrastructure for Hadoop.  We review unique feature sets, reduced footprint, cost advantages, and performance improvements.

Isilon: Hadoop Analytics on Isilon Deep Dive | Monday 1:30 pm, Wednesday 8:30 am

Take the covers off Isilon’s Hadoop-as-a-protocol implementation with the Isilon development team.  Deep experts explore advanced concepts and special considerations when separating storage from compute, with Q&A.

Isilon: Bringing the Power of Cloudera’s Enterprise Data Hub Edition to Isilon | Tuesday 3:00 pm, Wednesday 11:30 am

Learn how to take advantage of deep capabilities from the Hadoop ecosystem with enterprise-level support and management capabilities in this session. Also see how Isilon and Cloudera’s jointly supported solution can allow you to bring capabilities from the Hadoop ecosystem to your data without proliferating redundant infrastructure.

Launching for the first time: Hands-on Lab | Hadoop Starter Kit

Test drive a pre-configured environment encompassing Active Directory, VMware vCenter, ESXi, and Big Data Extensions to deploy a virtualized Hadoop cluster quickly.


Ask the expert session with Ryan Peterson (Twitter Q&A) | Tuesday 1:00 pm

Ryan Peterson takes the stage to cover open-source HDFS vs. Isilon OneFS as a Hadoop file system, the advantages of in-place analytics, the scale-out Data Lake based on Isilon, and how you can get started!

Doesn’t this excite you? It certainly excites me as these amazing events unfold at EMC World 2014, starting next week! I am going to be there. Will you?

Operating at the convergence of Life Science and Healthcare

Sasha Paegle

Sr. Business Development Manager, Life Sciences

Well, it’s that time again – time to prepare for the Bio-IT World Conference & Expo, where our own CTO of Life Sciences, Sanjay Joshi, will be hitting the stage.  It’s always a scramble to prepare for all the customer meetings, the exhibits, and (of course) the anticipation of eating fresh New England lobster!

Isilon has “grown up” in life sciences. And as we’ve grown, it’s been incredibly fulfilling to be a part of the advancements in the “must-haves” of life: life sciences, healthcare and technology.

Big Data and the convergence of life sciences and healthcare

Today’s advancements make it cost effective to have your personal genome sequenced at your local lab, and the relevant genomic and medical data sent to you and your doctor packaged in an email. On the healthcare side, it’s all about personalization and improving your health through diagnoses and treatment that fits your individual genetic profile.

At the heart (excuse the pun) of all this convergence is big data.  At Isilon we’re all about big data, including new ways of addressing the challenges it brings in cataloging, analysis strategies, and security.

Continuing the Journey

In parallel with the amazing advances in technology, Isilon has cultivated and attracted a great mix of customers and industry partners. It’s what keeps us on the leading edge—helping the industry blaze new trails in life sciences and healthcare. A perfect example is the blending of Isilon’s native implementation of Hadoop, iRODS (Integrated Rule-Oriented Data System), and Intel’s life sciences-enabling technologies (see the white paper “Life Sciences at RENCI”).

During events like Bio-IT World, it’s gratifying to meet with Isilon users and see the ways in which our products have helped them solve their challenges and improve lives. In turn, we’re appreciative of their contribution to our growth and to improvements in our products.  It’s a win-win!

Our continuing investments in life sciences and healthcare are not just on the product side. A meaningful solution requires an entire portfolio of technologies, partnerships, and collaboration venues.  For example, we’re a contributing member of the iRODS Consortium, which stewards iRODS, open-source data management software that abstracts data control away from storage devices and allows users to improve how they use their data. We’re also a member of the Global Alliance for Genomics and Health, and heavily involved in its security working group.

So you can see that we’re in this technology race for the long haul. We bring dedication, knowledge, and ingenuity along to help our customers and industry partners keep pace with new trends and technology.  Come see us at Booth 257 at Bio-IT World – oh, and get some lobster while you’re in Boston!

An Interview with Doug Cutting, the Founder of Hadoop

Ryan Peterson

Chief Solutions Strategist

I recently got a chance to catch up with Doug Cutting, the founder of Hadoop, Nutch, Lucene, and various other open-source technologies.  Doug is the Chief Architect at Cloudera, the current leader in the Hadoop marketplace.  He spends his time supporting the growth of Big Data and concerns himself with the advancement of governmental regulation, so that data is used to help the world and not in mischievous ways. Having been in meetings with various enterprise companies deciding how they will implement Big Data technologies, I was surprised to hear that Doug has an open mind about open source: he believes open-source and closed-source technologies often marry quite well to solve challenges and create opportunities. As EMC Isilon adds capabilities to Hadoop by using OneFS as the underlying file system, connecting via HDFS as an RPC over-the-wire protocol, questions naturally came up as to Doug’s perspective on this solution.

Here is the transcript from that discussion:

Ryan: Can you tell us about the origins of the Hadoop Distributed File Systems (HDFS)?

Doug: It was modelled after Google’s GFS paper.  Mike Cafarella & I were working on Nutch, a scalable, open-source web search engine.  We knew that to scale to the billions of pages on the web we had to distribute computation to economical PCs.  We had a distributed crawler, web analyzer, indexer, and search system working.  This ran on five machines, but it was hard to operate even at that scale, involving a lot of manual steps.  When the GFS and MapReduce papers from Google were published we immediately saw their relevance to our work.

Algorithmically, MapReduce was nearly identical to what we were already doing in Nutch.  But Google showed how these kinds of distributed computations could be automated, so they could scale farther than five machines, to tens, hundreds or even thousands, without requiring much manual operation.  GFS’s reliable, distributed storage was a critical sub-component of this, so we reproduced it in Nutch.  We called it NDFS at first and renamed it HDFS in 2006 when Hadoop split out of Nutch, separating the distributed computing parts from the web-search ones.

Ryan: Many people say that HDFS was purpose-built for Hadoop and that the replication count of three was required for performance reasons and not for protection purposes.  We have been able to prove comparable performance without the need for three copies using today’s technologies.  What is the real reason for the replication count?

Doug: Google suggested a replication count of three primarily for reliability, not performance.  The odds of losing data with a replication count of two are too high, while with three replicas they’re acceptable.

Ryan: Following up to that question, we hear that open-source HDFS has plans to follow suit with Isilon with respect to Erasure Coding [where cross-node parity is used for data protection] instead of 3X replication.  Why do you believe the community has decided to take that approach?

Doug: Google’s original GFS paper mentioned erasure coding as a possible optimization, so the idea’s not new.  Facebook implemented erasure coding for HDFS years ago, but their implementation never got merged back into the Apache version.

The rationale for erasure coding is simply to save more on storage costs.  Affordability is a big component of scalability.  If a system uses fewer drives per petabyte then folks can afford to store & analyze more petabytes.

Ryan: Isilon customers worry about the added cost to migrate their content from Isilon to a dedicated Hadoop environment.  What is your take on an architecture with Isilon and Hadoop working together on tasks such as ETL, archiving, etc.?

Doug: Each case needs to be evaluated on its own merits.  I’m sure there are lots of cases where it makes sense to put a Hadoop cluster next to an Isilon cluster and use it to analyze the data in Isilon.

Ryan: The term Data Lake has become popular in the industry.  What is your opinion on the data lake strategy to big data storage?

Doug: We call it the Enterprise Data Hub.  A defining factor is that you can bring multiple workloads to a shared dataset repository, providing a wide variety of tools for both exploration and production uses. Instead of designing and building solutions for each data problem, you build a general-purpose data storage and processing facility where your solutions can develop and evolve.

Ryan: You made a comment to me in the past that I’d like to dig deeper into (and I am paraphrasing): “there is no one specific Hadoop stack that must be used as long as it solves the problem a customer has to solve”.  Can you elaborate on what you mean by that?

Doug: I will say that not only does the Hadoop ecosystem support evolving, exploratory, agile applications, but the ecosystem itself is designed to evolve.  It’s built on a loose confederation of open-source projects, which is a key strength.  If some component is superseded by a superior technology, there’s no single organization that can stop its replacement for its own interest at the expense of the ecosystem.  This may sound like the crazy wild-west.  That’s where vendors come in.  A vendor will commit to long-term support of components so that each production system need not evolve at the pace of the ecosystem.

Ryan: Where do you see Hadoop 5 years from now, and 10 years from now?

Doug: In five years it will be equal to the RDBMS in adoption.  In ten years it will have eclipsed the RDBMS.  It will be the center of every major enterprise’s data system.

Ryan: In your opinion, what is needed to accelerate adoption of Hadoop in the enterprise?

Doug: Attention to detail.  We need security facilities that make adoption easy in each industry, each with their different compliance needs.  We need industry-specific applications and tools.  We need broader familiarity with the technology stack.  The Hadoop stack is still young relative to most enterprise technologies.  But it’s growing fast and we’re seeing it meet the needs of more and more applications each quarter.

Ryan: Knowing what you know now versus 2005, would you have done anything differently having some of today’s technologies at your disposal?

Doug: If I had had today’s technologies then, it wouldn’t have been 2005!  I was attracted to the technologies that Google described because they were clearly useful and there was nothing similar available to developers outside of Google.  I knew that open source would be a great way to make these tools widely available, so I put two and two together and started building Hadoop.  If I saw another opportunity, where there was a broadly applicable technology that wasn’t generally available, then I might do the same thing again.  Or I might leave it to younger, more energetic folks next time around!

Ryan: So Doug, most people know already that the name Hadoop comes from the name of your son’s stuffed Elephant. Do you still have the elephant? Do you think it will end up in the Smithsonian someday?

Doug: Smithsonian?  Wow!  I don’t think he’s that famous.  Computer History Museum, perhaps! I still have him.  He lives in my drawer now.

Ryan: Doug, thanks for taking the time to meet with me and discuss Hadoop!  I look forward to speaking to you more often about the industry as it grows and matures.

To all you readers out there, if you are interested in learning more about the budding relationship between Cloudera and EMC Isilon, I invite you to join us for the keynote at EMC World.

EMC Isilon Building Blocks – Isilon as the Foundation of Hadoop Analytics

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications

In my previous blog, I referred to the EMC Digital Universe Study, which reveals that the digital universe will double in size every two years, opening new opportunities. In 2013, only 5% of this was valuable, rich data; that share is expected to double by 2020 with the emergence of new Big Data analytics technologies as well as new data sources. The study has a section on the high value of data, about finding the prize fish in the data lake, which it calls the key to success in the era of the third platform. According to Harvard Business Review, companies in the top third of their industry in the use of data-driven decision making are, on average, 5% more productive and 6% more profitable than their competitors.

In this blog, I will talk about how EMC Isilon, as the foundation of your Hadoop analytics, enables you to be more productive and profitable than your industry peers.


Let me start with a quick introduction to the basics – What is Hadoop? What is HDFS? – and then delve into how Isilon leverages this framework and protocol to meet your analytics needs.

Hadoop, an open-source framework from Apache, enables parallel batch processing of very large data sets to extract information that can drive business benefits. Hadoop is built on top of the Hadoop Distributed File System (HDFS), which provides data protection and locality through multiple mirrors (usually three copies) and runs a brand-new protocol different from those used by traditional applications. Hadoop enables massive scalability using a shared-nothing architecture, depicted below.

[Diagram: Hadoop architecture – name node, back-up name node, and data nodes with direct-attached storage]

You have a name node and a back-up name node (in case the name node fails), plus data nodes combining compute and direct-attached storage (DAS). Although this architecture scales well, an infrastructure that speaks a completely new protocol and has limited data protection and compliance abilities adds risk and cost for enterprises.
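
For readers new to the stack, here is a minimal sketch of the batch model using the stock WordCount example that ships with Hadoop; the input file and paths are placeholders:

# Stage input into the distributed file system.
hdfs dfs -mkdir -p /input
hdfs dfs -put access.log /input/
# Run the bundled WordCount job; the work is split across the data nodes.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /input /output
# Read back the aggregated word counts.
hdfs dfs -cat /output/part-r-00000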

Isilon is the first and only scale-out NAS platform to support HDFS natively alongside NFS, SMB, HTTP and FTP, the file protocols supported by traditional enterprise systems. This enables you to build your big data strategy starting with the right storage foundation. A big part of “Big Data” is “DATA”. Capture all the unstructured data that you think is valuable from various sources, consolidate it into one multi-protocol data lake where you store and secure it, and then run analytics in place using Hadoop compute that you build as necessary. This strategy reduces the risks associated with IT projects by decoupling compute from storage. You can run simultaneous pilots from various vendors, starting with Pivotal, and get to insights faster by running in-place analytics, where you don’t move data around.
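
As a rough sketch of what running analytics in place looks like, a compute-only Hadoop cluster can simply point its default file system at the Isilon data lake; the SmartConnect zone name below is a made-up placeholder:

# Hypothetical core-site.xml aiming fs.defaultFS at Isilon's native HDFS
# interface; jobs then read existing data in place, with no ingest copy.
cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://isilon.example.com:8020</value>
  </property>
</configuration>
EOF
hdfs dfs -ls /    # existing Isilon data is immediately visible to Hadoop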

IDC classifies big data as greater than 100 TB, which on a 10 Gbps network would take over 24 hours to move (100 TB is 8×10^14 bits; even at the full theoretical 10 Gbps that is roughly 22 hours, and well over 24 hours at realistic throughput). With Isilon such a huge movement of data is not necessary, as your analytics happen in place within Isilon’s scale-out data lake. Check out our white paper on in-place Hadoop data analytics here. With Isilon you stay compliant with privacy and regulatory requirements such as SEC 17a-4, and you do not have to worry about hardware failures. You can also scale up to 20 petabytes in a single cluster, which means lower management overhead.

This strategy is not limited to Hadoop; it can be successfully applied to your other big data analytics deployments, such as Splunk. Are you interested in learning more? I will be posting another blog this week on the Isilon-Hadoop momentum at EMC World, giving you the details of all sessions, events and activities around Isilon and Hadoop. Don’t forget to register for EMC World. I will see you there!