Wal-Mart probably knows more about you than your doctor…

James Han

Sr. Business Development Manager-Healthcare

When you walk into a Wal-Mart, they likely know more about you than your local hospital. They know when and what you’ve purchased, your income, your family members, your political affiliation, and probably even the habitual route you take while walking through the store.  Like many other companies, Wal-Mart mines tons of big data to improve their marketing campaigns, sell more, and generally improve their bottom line.

Healthcare is behind in employing big data analytics tools. For example, when you go to the hospital or clinic, it is often treated as a single visit—you may even have to update all of your demographic information each time.  Your data is generally only important during your visit and is often archived immediately after your visit—essentially making it inaccessible for subsequent visits. What if healthcare could employ big data analytics to the level of commercial enterprises like Wal-Mart?

Let’s look at some statistics related to healthcare spending.  A 2012 report (of 2009 data) from the National Institute for Health Care Management (NIHCM) reveals that spending for healthcare services is extremely uneven—a small proportion of the population is responsible for a very high portion of spending. The study finds that the top 5% of spenders account for almost half of spending ($623 billion), and the top 1% of spenders account for over 20% of spending ($275 billion)[1] (See Figure).


It wouldn’t take much improvement in efficiency in caring for that 1% of the population to yield a substantial payoff. If trends could be identified, or procedures developed, that keep those few heavy utilizers healthier and lower their consumption, the impact could be dramatic.
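To put rough numbers on that payoff, here is a minimal sketch that uses only the NIHCM spending figures quoted above. The reduction percentages and the function name are illustrative assumptions, not findings from the report.

```python
# Spending figures quoted above from the NIHCM data brief (2009 data),
# in billions of US dollars.
TOP_5_PERCENT_SPEND = 623.0
TOP_1_PERCENT_SPEND = 275.0


def savings_from_reduction(spend_billions: float, reduction: float) -> float:
    """Return the savings (in billions) if spending falls by `reduction`.

    `reduction` is a fraction between 0 and 1 (0.05 means a 5% reduction).
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return spend_billions * reduction


# Hypothetical efficiency gains: these percentages are assumptions
# chosen only to illustrate the scale of the opportunity.
for pct in (0.01, 0.05, 0.10):
    saved = savings_from_reduction(TOP_1_PERCENT_SPEND, pct)
    print(f"{pct:.0%} reduction in top-1% spending saves about ${saved:.2f} billion")
```

Under these assumptions, even a single-digit percentage improvement for the top 1% of spenders would be measured in billions of dollars.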

Unfortunately, many healthcare providers are still trying to figure out what data they need to perform the equivalent of Wal-Mart’s analytics. Or they have the data, but can’t figure out how to get it all in one place.

EMC Isilon can help. Isilon is in the business of big data—making big data analytics more cost-effective and—perhaps most importantly with respect to healthcare—easier to implement. Isilon provides the foundation for a scale-out data lake—a key capability that provides simplicity, agility, and efficiency to store and manage unstructured data. Starting with a scale-out data lake, healthcare organizations can:

  • Invest in the infrastructure they need today to get started today,
  • Store, process, and analyze their data in the most cost-effective manner to realize its value, and
  • Grow capabilities as needs grow in the future.

In short, EMC Isilon can help healthcare organizations get on the road to leveraging their data to improve patient comfort, lower costs, and streamline healthcare procedures.



Source: [1]“The Concentration of Healthcare Spending: NIHCM Foundation Data Brief July 2012” http://www.nihcm.org/component/content/article/326-publications-health-care-spending/679-the-concentration-of-health-care-spending

Converged Infrastructure for Big Data Storage and Analytics

Michael Noble

Sr. Product Marketing Manager

It’s no secret that unstructured data is growing rapidly and poses significant challenges to organizations across virtually every industry segment to store, manage, secure and protect their data. According to IDC, the total amount of data storage world-wide will reach 133 exabytes by the year 2017, of which 80 percent will be required for unstructured data.

Unstructured Data Growth

To meet this challenge, VCE has just introduced a compelling new converged infrastructure (CI) platform – VCE™ Technology Extension for EMC® Isilon® – for organizations looking to consolidate and modernize their big data environment. With this technology extension, existing VCE Vblock® Systems can leverage Isilon’s massive scalability and built-in multi-protocol data access capabilities to easily expand capacity to address large-scale data storage needs while supporting a wide range of applications and traditional and next-gen workloads including Hadoop data analytics.

Isilon Scale Out Data Lake

With VCE technology extension for EMC Isilon as a foundational element of a scale-out data lake infrastructure, organizations can eliminate costly storage silos, streamline management, increase data protection and gain more value from their big data assets. A great example of this is in the area of big data analytics and Hadoop. As the first and only scale-out NAS platform that natively integrates with the Hadoop Distributed File System (HDFS), Isilon is a game-changing big data storage and analytics platform.

Before Isilon came into the picture, Hadoop deployments were largely implemented on dedicated infrastructure, not integrated with any other applications, and based on direct attached storage (DAS) that is typically mirrored up to three times or more. In addition to requiring a separate capital investment and added management resources, this approach introduces a number of other inefficiencies. By leveraging Isilon’s native HDFS support and in-place analytics capabilities, organizations can extend their Vblock System-based environments and avoid the capital expenditures, risks, and costs associated with a separate, dedicated Hadoop infrastructure.
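The capacity cost of mirroring can be sketched with simple arithmetic. The snippet below is illustrative only: the 100 TB data set size and the function name are assumptions, and it models plain full-copy mirroring as described above, not the protection scheme of any particular product.

```python
def raw_capacity_tb(usable_tb: float, copies: int) -> float:
    """Raw disk capacity needed when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


# Assumed data set size, chosen only for illustration.
DATASET_TB = 100

for copies in (1, 2, 3):
    raw = raw_capacity_tb(DATASET_TB, copies)
    print(f"{copies} cop{'y' if copies == 1 else 'ies'}: {raw:.0f} TB raw for {DATASET_TB} TB of data")
```

With three full copies, two thirds of the raw capacity purchased holds redundant data. This is one of the inefficiencies of a dedicated DAS-based deployment.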

Isilon’s in-place analytics approach also eliminates the time and resources required to replicate big data into a separate infrastructure. For example, it can take over 24 hours to copy 100 TB of data over a 10GE line. Instead, with VCE technology extension for EMC Isilon, data analytics projects can be initiated immediately to get results in a matter of minutes. And when data changes, analytics jobs can quickly be re-run with no need to re-ingest updated data.
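The copy-time figure above can be checked with a back-of-the-envelope calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 10 GbE = 10^10 bits per second). The `efficiency` parameter is a hypothetical stand-in for protocol overhead and other real-world losses, and the 0.9 value is an assumption rather than a measured number.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to copy `data_terabytes` over a network link.

    `efficiency` is the fraction of the nominal line rate actually achieved
    (1.0 means the full line rate with no overhead).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    bits_to_move = data_terabytes * 1e12 * 8          # terabytes -> bits
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / effective_bits_per_second / 3600


# Ideal case: 100 TB over 10 GbE at the full line rate.
print(f"At line rate:      {transfer_hours(100, 10):.1f} hours")

# Assumed 90% effective throughput (illustrative value only).
print(f"At 90% efficiency: {transfer_hours(100, 10, efficiency=0.9):.1f} hours")
```

Even at the theoretical line rate the copy takes roughly 22 hours, so any realistic overhead pushes it past a full day. That is consistent with the "over 24 hours" estimate.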

VCE technology extension for EMC Isilon also leverages Isilon’s ability to support multiple instances of Apache Hadoop distributions from different vendors simultaneously, including Pivotal HD, Cloudera Enterprise and Hortonworks Data Platform. The same data sets on EMC Isilon can also be used with other analytics tools such as SAS and Splunk. This means that organizations gain the flexibility to use whichever tools they need for their analytics projects.

Along with these powerful big data storage and analytics capabilities, VCE technology extension for EMC Isilon brings the convenience and assurance of proven VCE engineering expertise. VCE’s technology extensions are tightly integrated and fully tested and validated. This allows organizations to quickly increase processing power or add storage capacity, without typical technology risks.

All in all, VCE technology extension for EMC Isilon does indeed look like a compelling approach to help tame the big data storage challenge and unlock the value of big data assets. Please let us know what you think.


Converged Infrastructure for Scale-out Data Lakes

Carl Washington

Sr. Business Development Manager

For many organizations today, the rapid growth of unstructured data has put a spotlight on the challenges of a decentralized IT infrastructure burdened with data storage “silos” across the enterprise. These limitations include:

  • Complex management of multiple silo data sets
  • Inefficient storage utilization
  • Storage “hot spots” and related performance issues
  • Data protection and security risks
  • Inability to support emerging workloads

As many enterprises have experienced, converged infrastructure (CI) systems are a great way to eliminate silos and to consolidate and simplify IT environments while addressing increasing demands on IT. Additionally, with the rapid growth of unstructured data, many organizations are attracted to a CI strategy for implementing a scale-out data lake. As used here, a “data lake” is a large reservoir of unstructured and semi-structured data consolidated from different traditional and next generation application workload sources.  These next generation applications, including mobile computing, social media, cloud computing, and big data, are fueling the enormous growth of this unstructured data.

Foundation for Scale-Out Data Lake

The storage infrastructure to support these next gen applications and scale-out data lakes must be highly scalable in both capacity and performance. It must also have the operational flexibility to support a wide range of applications and workloads. One of the other great advantages of a data lake is the potential to leverage powerful analytics like Hadoop to gain new insight and uncover new opportunities by harnessing big data assets. To address these needs, VCE has just announced an exciting new product, VCE™ technology extension for EMC® Isilon, which allows organizations to quickly implement an efficient converged infrastructure for scale-out data lakes.

Designed to be deployed as a pooled resource with Vblock® Systems, VCE technology extension for EMC Isilon enables a scale-out data lake approach to the modern datacenter. This provides enterprises and service providers an infrastructure to consolidate unstructured data, support traditional workloads as well as next-gen applications, and enable in-place big data analytics using Isilon’s native HDFS support. These capabilities help organizations to reduce costs, simplify management, gain new efficiencies and accelerate time to insight while avoiding the need for separate infrastructure investments. Organizations that already rely on VCE to run their mission-critical applications and manage data can easily augment their existing environments with this new offering, VCE technology extension for EMC Isilon.

Isilon Scale Out Data Lake

The EMC Isilon Scale-out Data Lake can collect unstructured data from multiple workloads such as HPC, Mobile Home Directories, Video Surveillance, Large Scale Archives, File Shares, and more. A reservoir of unstructured data from different sources can become immediately available to have analytics performed against it.

What’s more, the source data can be written to Isilon using one protocol and then accessed using a different protocol. Isilon’s support for multiple protocols, such as NFS, SMB, HDFS, HTTP, etc., is a compelling feature that provides enormous flexibility and agility for the next generation of applications. Given these points, Isilon’s ability to consolidate data from multiple sources and run in-place analytics strengthens the advantages provided by VCE Vblock Systems while extending their applicable use cases and accelerating adoption for the next generation of applications.



In sum, VCE Technology Extension for Isilon allows enterprises to implement a scale-out data lake infrastructure that provides a number of advantages including:

  • Lower cost and increased efficiency
  • Simplified management
  • Increased operational flexibility
  • Faster time to service and market
  • Robust security and data protection

Another Important Advantage: The VCE Experience

By extending the full VCE Experience – engineered, built, validated, shipped and supported by VCE – to EMC Isilon, VCE is delivering virtualized cloud infrastructure systems optimized for traditional and next generation application workloads with scale-out data lakes.

In addition to the technology and infrastructure capabilities, the VCE Experience helps IT executives drive agility into their operations and enable IT as-a-service, which makes IT more responsive to new business applications while shortening the time to provision new infrastructure.

VCE Technology Extension for Isilon offers a great value for organizations looking to implement an IT infrastructure for their own scale-out data lakes. Please let us know what you think.

An interview with Innovator, Hugh Williams

Ryan Peterson

Chief Solutions Strategist

I recently got a chance to sit down with Hugh Williams, the SVP of R&D for Pivotal. Hugh was previously at eBay and Microsoft and comes with an impressive background with respect to Big Data technologies that includes industry patents and numerous publications.  Here is a transcript of our discussion:

Ryan: Hi Hugh, thanks for taking the time to sit down with me.  Getting straight to the questions I have for you:  How would you define Data Lake?

Hugh: Great question. The basic premise of a Data Lake is that you have one place where all of your data is stored, and it allows everyone in an organization to extract value from the data using the tools they want to use.  Without a data lake, data is usually siloed in legacy systems where only a couple of applications or a subgroup of users have access to the data.

Ryan: What would you consider to be the most important attributes of a data lake?

Hugh: Having all of the data in one place.  Of course, you need the right tools to be able to accomplish that – ingestion and connection to existing sources is still more challenging than it should be.

Ryan: How do customers build data lakes?

Hugh: Most companies start out a data lake with a set of folks who build out a small Hadoop capability, they demonstrate that capability, the noise gets louder, and the company says that rather than having all of these solutions throughout the organization, let’s look at collecting all of that into one place.

Ryan: I call those Data Puddles!  What have you seen inhibit adoption?

Hugh: I think a few things come to mind: Ingestion and egestion are problematic.  How am I going to get all of that data from various places into the central place?  The second thing is that the Hadoop ecosystem is relatively immature.  Although it is an impressive toolbox, there is still a barrier to setting up the infrastructure: the standing up, the training, getting all the right pieces.  The last thing I’ll say is that using Hadoop to extract business value is not easy.  You have to employ Data Science folks.  Pivotal is making SQL much more mature on Hadoop to help solve this issue.

Ryan: What interests you about the Isilon partnership with Pivotal?

Hugh: Hadoop will rule the world, but its maturity is a problem today.   Isilon is mature and companies bet their businesses on it.  If you want one thing to be reliable, it has to be the storage – and so the partnership between Pivotal and Isilon really matters.

Ryan: Customers often lump HAWQ with Stinger, Impala, and even Hive.  How do you differentiate HAWQ from other SQL solutions?

Hugh: Hive is a relatively elementary implementation of SQL access to Hadoop with basic features of SQL.  It was revolutionary when it happened, but it doesn’t have what a Data Scientist would need.  Impala is a nice step forward from Hive.  The really interesting thing about HAWQ is that we took 10+ years of experience with SQL from the Data Warehouse space and ported that to work with Hadoop.  What you get with HAWQ is Greenplum database heritage adapted to Hadoop.  Pivotal has the most advanced solution for SQL access to Hadoop.

Ryan: Can you provide an example of something you can do with HAWQ that cannot be done with the others?

Hugh: There are benchmarks such as TPC-DS that help validate whether various typical SQL queries can be evaluated and optimized on different systems.  In rough terms, when we used TPC-DS to test SQL compliance, 100% of queries were successful with HAWQ, only 30% with Impala, and around 20% with Hive. We published an independently peer-reviewed study that shows these results in this year’s SIGMOD, the leading database conference.

Ryan: You recently announced GemXD, a new product in the GemFire family.  What is an example of a problem that GemXD solves?

Hugh: You can think of it as Cassandra or HBase done really, really well – with a SQL interface, full ACID capabilities, the ability to upgrade with no downtime, the ability to read and write to Hadoop’s HDFS storage layer when there’s too much data for memory, and much more.

Ryan: What’s your favorite “Big Data changed the world” story?

Hugh: Here’s a fun story. When I was at Microsoft, I decided to research what drugs caused stomach cramps by looking at what queries customers ran in sessions on Microsoft’s search engine. I reverse engineered a list of drugs that caused stomach cramps, and checked the FDA literature – and, sure enough, it was right.

Ryan: How does Cloud Foundry fit into the Big Data / Hadoop storyline?

Hugh: Today they’re somewhat separate stories, but they won’t be for long.  It’s of critical importance to the future of PaaS and the future of Big Data that they converge.  In the future, most applications will be data-centric, and great companies will be built on those applications. Developers are demanding the convergence. PaaS and Big Data exist within Pivotal to build the future platform for software.

Video Surveillance: A Tale of Two Markets

Christopher Chute

Vice President, Global Imaging Practice, IDC
Christopher Chute is a Vice President with IDC's Global Imaging Practice. His research portfolio focuses on transformative technology trends impacting the imaging market. Mr. Chute's specific coverage includes digital imaging and video technology adoption across professional, broadcast/creative, consumer and physical security/surveillance. He conducts forecasting, product and service analysis and user segmentation for these vertical markets through a variety of supply and demand-side studies and consulting engagements. Mr. Chute's research has often centered around charting the disruption caused by technology transformations on status-quo industries, such as the migration from film to digital in the photography market, commoditization/democratization of broadcast/cinema video capture and photofinishing, and the impact of cloud services and tablet usage on the imaging industry. Writers from a variety of publications rely on Mr. Chute for a deep understanding of these markets, including Time Magazine, The Wall Street Journal, Fortune Magazine, USA Today, Investor’s Business Daily, San Jose Mercury News, Bloomberg, and The Financial Times. His television and radio appearances include ABC World News Tonight, Fox & Friends, CNBC, National Public Radio, and Into Tomorrow With Dave Graveline. Mr. Chute also speaks regularly at a variety of international trade shows and technical industry group meetings, including ASIS, CES, Creative Storage, Computex and Photokina. Mr. Chute holds both undergraduate and MBA degrees from Boston College.

As a longtime video surveillance industry observer, I remember when it was fully reliant on analog technology to secure people, places and things. However, it seems the industry is still reliant on similar solutions, whether they are analog cameras capturing video that’s digitized and stored on reused tape, or security personnel who act as both preventative and post-event resources. It became clear to me that the market was bifurcating into two types of deployments: large, fully-IP installations that are fully IT-led, and analog-prone, fragmented, traditional eyeballs-to-screens installations.

IP surveillance is often thought of in terms of image quality, megapixels and other visually related terms, yet the IT-led side of the market has centered itself more on workloads, specifically a combination of video content and analytics that are woven into a broader physical security initiative. These deployments aim to be far more preventative than forensic. IT-led, multi-dimensional installations like these (cities, transportation, government facilities, and large public complexes) tend to be physically large and span several facilities over long distances. This requires both fixed and mobile surveillance cameras and other sensors connected to local edge storage that communicates with a core system. The resultant workloads are collated into a large data set that undergoes extensive analytics processes at a centralized command facility. This workflow is modeled on an enterprise datacenter rather than a security room.

In many ways, workloads and analytics now define physical security more than cops and cameras.

Thus, IT storage leader EMC’s announcements at ASIS 2014 are timely in addressing market growing pains. For instance, to complement its enterprise-class core EMC Isilon storage platform, EMC is now offering EMC VNX-VSS100 (built on proven VNX technology), a purpose-built storage system that can act in an edge capacity with EMC Isilon, or serve as a cost-effective, scalable hub for smaller network-based security installations. The company is offering lab-level validation for Video Management Software (VMS) partners like Genetec, Verint, and Milestone, which will allow system integrators to deploy solutions more quickly. EMC is also creating greater partner enablement through training resource sponsorship and partner investment in high-growth countries.

IDC’s Digital Universe Project forecasts that video surveillance workloads will grow an average of 22% by 2020. And while system integrators have been successful partnering with a wide range of surveillance hardware, VMS, storage and analytics vendors, what’s been lacking is a strong, experienced third-party IT leader like EMC that can create a foundation for surveillance-specific vendors and integrators to work and partner with – while keeping pace with surveillance trends.

From “Edge-to-Core”: Redefining Video Surveillance

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications

You’re probably familiar with the terms “Edge” and “Core” as they apply to a networking infrastructure—but video surveillance?  It turns out that the edge and core terms are a perfect description of today’s larger scale IP-based surveillance architectures. Out on the edge are the cameras themselves, as well as local, lighter-weight processing and storage.  At the core is the majority of the capacity (PBs), archiving capabilities, and, of increasing importance, analytics.

Organizations adopting edge-to-core architectures are asking for their surveillance technology to be scalable, flexible, open, and future-proof.

In my last blog (see “Bringing the Scale-out Data Lake to Life,” July 2014) I talked about the significance of the new capabilities we launched in July. These new capabilities bring the scale-out data lake model to surveillance, and directly hit the needs of edge-to-core video surveillance requirements.

That’s why the Isilon scale-out data lake storage model is so relevant to edge-to-core video surveillance.  For example, when surveillance video moves from edge cameras into the core data lake it can be actively used and leveraged simultaneously by multiple applications (e.g. Hadoop can be used to deliver valuable intelligence). At the same time, surveillance data can be securely accessed using Syncplicity on mobile devices, or used within cloud-based applications.

Today, over one million surveillance cameras capture their image data on Isilon, and Isilon protects and manages this image data on over 160 PB of capacity.  With the Isilon scale-out data lake at the core, we’re able to satisfy the current and future needs of video surveillance in industries such as Transportation, Federal and Local Government, enabling them to become more vigilant in their efforts to protect both people and property.

Check out this video, where I provide an overview of the Video Surveillance market and dive deep into EMC Isilon’s Core solutions that address customer needs.

Please let me know about your experiences with these solutions – feel free to post blog comments, tweet me (@sureshcs), or reach me on other social media such as LinkedIn.

The Media Industry is changing, so stay tuned!

Jeff Grassinger

Sales Alliance Manager

The IBC trade show (12-16 Sep) is just around the corner!  At this year’s show we will see consumer technology trends driving rapid changes in the media industry.  One of the key trends is multiplatform content delivery in our non-stop “TV Everywhere” world. This one trend alone will have dramatic effects throughout the media industry, as it will necessitate significant transformation in the workflows in use today.

According to eMarketer, the average time spent with digital media per day will surpass TV viewing time for the first time this year. In addition, Ooyala’s Global Video Index reports that mobile and tablet viewing increased 133% year over year between Q1 2013 and the end of Q1 2014. So digital media consumption is not only growing fast, but specifically video on mobile devices is a significant area of dynamic growth.

As with any change that comes to the industry, media professionals are asking, “What workflow changes do I need to make to engage this new audience, and how does that change the way we do business?”

In the short term, it’s about leveraging the workflows you use today.  In the long term, success depends on adapting your workflows to new technology. Read on to find out how.

What are the goals and keys to success?

Workflows in TV Everywhere applications allow users to access their favorite content on their favorite devices, and discover new interests by searching for content or accepting recommendations. Your TV Everywhere goal is to create “stickiness” or engagement that encourages consumers to stay longer to drive your bottom line advertising and/or subscription revenue.

However, unlike traditional broadcast workflows, TV Everywhere requires an array of codecs, streaming formats, digital platforms, and new and vastly different workflow architectures.

  • One of the most important goals when planning and building for the TV Everywhere future is workflow agility. Codecs, streaming formats and devices will change and evolve at a much more rapid pace than broadcast technology. To keep up with TV Everywhere technology, the foundation of your workflow must be agile enough to meet the requirements of today, while also being able to support new workflows with ease. Media organizations with legacy technology based on proprietary infrastructure, proprietary protocols, and inflexible technology will find themselves at a distinct disadvantage to their competitors.
  • Another important goal is to create simplified and consolidated file-based workflows based on a data center infrastructure. The technologies you choose need to work together simply, with a focus on an exponential reduction in operational costs. File-based workflows will significantly reduce the need to manually create and deliver media, maximizing topline growth and bottom line results. Data center technologies such as networking, virtualization, software-defined architectures, data analytics, and cloud solutions will increasingly be part of tomorrow’s delivery of TV Everywhere.

There is no arguing that the media industry is in transformation. While the living room TV screen continues to be the screen of choice for many viewers, it is clear that smaller screens will continue to see dramatic increases in video traffic. To seize the opportunity to engage and entertain this new viewership, media organizations need to focus on technologies that enable business agility and data center file-based workflows.

Have you started your transition to agile, simplified and consolidated data center file-based media workflows? How has this industry transformation influenced your visit to this year’s IBC trade show?

Stop by the EMC Isilon booth #7.H10 to see our new storage software, platforms, and media solutions. I look forward to talking to you, and showing you how EMC Isilon can help you incorporate new technologies, design adaptable workflows, and improve your bottom line.

Free Hadoop….pun intended!

John Nishikawa

Director, Business Development & Alliances

free (adjective, verb, adverb) – to make free; release from imprisonment; to unlock your data

free (adjective, verb, adverb) – no charge; Isilon HDFS license key

Isilon is the #1 Enterprise Shared Storage Provider for Hadoop. We have more customers and more capacity in our storage infrastructure used for Hadoop than any other enterprise shared storage provider.  Are you looking to get more business insight out of your data to drive innovation, provide competitive advantage, improve customer satisfaction, accelerate time-to-market, or even in some cases – save actual lives?  If so, the power of your data is sitting right there in your Isilon cluster.  All you need to do is free Hadoop and bring it to your data.  Join the Free Hadoop revolution!  Here’s how.

In five easy and free steps, you can join the Free Hadoop revolution with Isilon at http://www.emc.com/campaign/isilon-hadoop/index.htm

  • Step #1: Request a free HDFS license key to Free Hadoop on Isilon
  • Step #2: Download a free community trial edition of Pivotal PHD or Cloudera CDH to kick the tires on the power of Hadoop
  • Step #3: Download the free Hadoop Starter Kit (HSK) to get step by step instructions on how to deploy Hadoop to your existing Isilon and VMware infrastructure in about an hour
  • Step #4: Conduct a free TCO analysis of a Hadoop DAS architecture versus a Hadoop Isilon architecture and see why many customers are choosing Isilon for Enterprise ready Hadoop
  • Step #5: Enjoy the power of data and recruit others to join the Free Hadoop revolution!

So what are you waiting for?  Join the Free Hadoop revolution now and we’ll also send you this t-shirt to demonstrate unity as we together spread the Free Hadoop mantra across the globe.

Hadoop Blog - Pic 3

So, let the chanting begin in your data center, “Free Hadoop!  Free Hadoop!  Free Hadoop!  Free Hadoop!”

Bringing the Scale-Out Data Lake to Life

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications

Today is an exciting day for us here at Isilon. This is one of our biggest launches, and it helps bring the industry’s first enterprise-grade Scale-out Data Lake to life. Here is a quick summary of what we are launching today to reinforce our Scale-Out Data Lake strategy.

We are announcing two new Platforms, the S210 and the X410. The S210 is our high transactional platform ideal for workloads such as special effects & content rendering in Media & Entertainment, Real Time Ticker Analytics in Financial Services and Chip Design in EDA. The X410 will be our new high throughput platform ideal for workloads such as content streaming, Hadoop Analytics and home directories.

We are also announcing the new version of our operating system, OneFS. The new version will include features such as SmartFlash – 1 PB of globally coherent flash cache to help accelerate read performance. We are also announcing support for updates and new access methods to better enable our multi-protocol scale-out Data Lake. First, we are announcing enhancements to HDFS, including access zones that enable secure multi-tenancy for your big data analytics workloads, and support for future versions. The second access method announcement is SMB Multichannel, with 1.4GBPS single-stream throughput for emerging media workloads such as 4K. And finally, we are natively integrating OpenStack Swift to make Isilon the first and only File & Object Scale-out NAS.

We also announced several innovative solutions partnering with EMC Federation including two new Big Data Analytics Solutions and a comprehensive Scale-Out VDI solution with XtremIO. Check out our blog here.

My team helped put together a series of videos that we call “Learning Bites” to help you better understand our Scale out Data Lake Strategy and how the new announcements reinforce the strategy. Check it out.

EMC Isilon – New Product Overview

EMC Isilon – Data Lake

EMC Isilon – New Hardware Platforms

EMC Isilon – SmartFlash Overview

EMC Isilon – SMB3 Multichannel

Building the Exploration & Production Data Lake

John Nishikawa

Director, Business Development & Alliances

When combined, oil and water seem to have only negative connotations associated with them.  In the kitchen, water and hot oil is a recipe for disaster, just as an oil spill in the ocean would be deemed as an environmental disaster.  Is there a positive outcome when oil is combined with water?  One comes to mind as long as that body of water is your Data Lake and that the oil is in the form of seismic interpretation and reservoir modeling data.

In talking to customers, I hear that more and more of them are being asked by their internal customers (namely geoscientists) to make more pre- and post-stack data available online, where it can be analyzed and interpreted. A majority of this data is offline and not readily accessible because it is on tape.  We are talking hundreds to thousands of terabytes of valuable data sitting idly on tape.  What if you could have all this data online via a Data Lake?

EMC has been an important part of infrastructure management in oil & gas for over 25 years, in an industry where companies have been drilling for hydrocarbons even longer. That’s an enormous amount of potential data. Think of all the subsurface pattern-matching we could be using to accelerate discovery. Think of all the best practices for running operations efficiently and safely, and models for optimized logistics, that could have been created and implemented much earlier if we had been able to harness that data in a Data Lake. It’s true that 25 years ago we did not have the advancements in technology we enjoy today – the ability to use sensors to capture and analyze real-time data, computing power to store and crunch terabytes of information in a fraction of the time, global communications and mobility to bring new levels of collaboration to drive business agility – but think about the next 5, 10 or 25 years. The trajectory of innovation possible from harnessing an affordable Data Lake now could be exponential. We could simulate large parts of oil & gas operations and make more economically, financially and environmentally sound decisions more quickly, before a well is even drilled. The results will never be perfect, but with every order of magnitude by which we reduce imperfection, the better off we’ll be in the continued pursuit of energy.

I shall pass on providing my definition of a Data Lake.  I will leave that to the analysts and others.  Instead I will give you a few characteristics that I would want from my Data Lake.  I would want my Oil & Gas Data Lake to:

  • Scale to double-digit petabytes
  • Scale easily and stay simple to manage
  • Support multiple protocols to provide broad data ingestion and analysis capabilities
  • Enable Hadoop analytics via HDFS
  • Deliver a strong TCO advantage

Our Isilon Exploration and Production customers are quickly realizing that their Isilon storage investment is in fact the foundation of their Data Lake, and they are already realizing all these benefits and more today.  If you are using Isilon, you may have a Data Lake already and not even know it!


EMC Isilon will be showcasing our E&P solutions at the 76th European Association of Geoscientists and Engineers (EAGE) Conference & Exhibition in Amsterdam from June 16th – 19th.  Please stop by our booth, #3206, for presentations, demos, and discussions with our staff of subject-matter experts, or with any questions. We look forward to meeting you!