Third Platform Applications in a Mixed Workload Environment

Carl Washington

Sr. Business Development Manager

The implications for what IDC calls the “3rd platform” of computing — social, mobile, cloud and big data — are clear and compelling:  enterprises and service providers must advance to the next generation of applications or expect disruption from new entrants and reinvented incumbents. But making this transition can be easier said than done. Many organizations are discovering that providing an IT infrastructure to enable the transformation to the 3rd platform can be complex, time consuming, and expensive.

A recent IDC Technology Spotlight report, The Importance of a Modular and Flexible Approach to Integrated Infrastructure[1], sheds light on one approach that can be used to overcome these challenges. This paper examines the benefits of using converged infrastructure – an information technology deployment strategy that combines server, storage, network, virtualization technologies and the software to manage the overall system as one block of infrastructure – to support 3rd platform applications.  The benefits of this approach are described and include “optimize operations, increase risk and compliance posture, reduce cost, and innovate effectively from increased use of 3rd platform applications.”

Essential elements for IT infrastructure to effectively support 3rd platform applications include how well it stores, provides access to, and manages the massive amount of unstructured data generated from these applications.  Simplified deployment and operations of infrastructure to support 3rd Platform applications is also an important factor that is mentioned in another recent blog, Advancing 3rd Platform Strategies on Modern Infrastructures in 2015.

The IDC report features VCE Vblock Systems and several of its technology components, including VCE™ technology extension for EMC® Isilon® storage. The report provides a fascinating view of how VCE technology extension for EMC Isilon storage embodies the capabilities to satisfy the storage requirements for 3rd platform applications, particularly applications that generate unstructured data. A representation of this approach is provided in Figure 1.

Figure 1. VCE Vblock Systems and VCE technology extension for EMC Isilon mixed workloads

The IDC report also cites the ability of EMC Isilon storage to “deliver native Hadoop Distributed File System (HDFS) integration with the ability to support multiple Hadoop distributions and versions.” This is important because many organizations are looking to consolidate their unstructured data assets, eliminate costly silos of storage and then use Hadoop analytics to gain new insight that can benefit their business. To achieve this, organizations need an infrastructure that can easily support multiple workloads with different capacity and performance requirements. Such an infrastructure is the foundation for an enterprise data lake.  With its built-in multi-protocol capabilities, including native HDFS support, EMC Isilon provides a highly efficient and flexible foundation for a vibrant enterprise data lake that supports both traditional workloads and 3rd platform applications. With Isilon, organizations can consolidate unstructured data from a wide range of mixed workloads, and ingest and access data from multiple sources. In this way, Isilon can be used to eliminate silos of storage, lower costs, simplify management and meet the demands of unstructured data from 3rd platform applications.

Deploying mixed 3rd platform workloads on a single infrastructure platform can be extended to include block-based data when supported by Vblock Systems with EMC XtremIO flash arrays and VCE technology extension for EMC Isilon.  A great example of this is the expanded use of Virtual Desktop Infrastructure (VDI) for mobile and social applications, both 3rd platform cloud-based applications, as shown in Figure 2, with user data such as documents, spreadsheets, videos, pictures, and other rapidly growing unstructured data.  In this use case, scale-out flash meets scale-out NAS.  By combining the power of Vblock System 540 (or Vblock System for Extreme Applications) based on EMC XtremIO and extended with EMC Isilon storage, organizations can get the ultimate in performance, flexibility, scalability and efficiency.  This “hot edge/cold core” approach allows for independent scalability of performance and file capacity.  For example, if more IOPS performance is required, you can easily scale out with more XtremIO bricks. And if more user data storage capacity is needed, you simply add more Isilon nodes.  Ultimately, this approach dramatically reduces the cost and complexity of owning and managing a VDI infrastructure.

Figure 2. VCE Vblock Systems and VCE technology extension for EMC Isilon for VDI workload

Just as their notion of the 3rd platform of computing is compelling, IDC’s perspective on the advantages of VCE Vblock Systems with VCE technology extension for Isilon to support 3rd platform applications is clear:  VCE Vblock Systems with VCE technology extension for Isilon is an enterprise-class converged infrastructure system for 3rd platform applications running in a mixed workload environment.  The benefits to enterprises and service providers are faster time to deployment, simplified operations, lower total cost of ownership, and ultimately, a smoother transition to the 3rd platform and the transformation of their business.

Source:

[1] IDC Technology Spotlight, The Importance of a Modular and Flexible Approach to Integrated Infrastructure, November 2014

A New Paradigm for Hadoop

Michael Noble

Sr. Product Marketing Manager

A new ESG Lab Review is a “must read” for any organization looking to consolidate unstructured data, eliminate infrastructure ‘silos’ and leverage Hadoop analytics to gain insight (And who isn’t these days?). The ESG Lab Review: VCE Vblock Systems with EMC Isilon for Enterprise Hadoop,[1] documents lab testing of a converged infrastructure (CI) solution based on VCE Vblock Systems and VCE technology extension for EMC Isilon storage. The ESG Lab Review also describes how VCE Vblock Systems and EMC Isilon storage can be combined with VMware vSphere Big Data Extensions (BDE) to provide a fully integrated platform that easily supports growing big data and analytics requirements. This platform is also easily extensible for a wide range of traditional and next-generation workloads.

Figure 1. Enterprise Hadoop with VCE Vblock Systems and VCE technology extension for EMC Isilon storage

 

As shown in Figure 1, Hadoop compute resources are provided by the VCE Vblock Systems while EMC Isilon shared storage is used to store the unstructured data and provide Hadoop storage functionality. In addition to streamlining the analytic workflow, this approach provides breakthrough efficiency and cost savings relative to a “traditional” DAS-based approach. With Isilon, there is no need to create a separate environment to ingest data into a Hadoop cluster because the data can be written directly to Isilon using NFS, SMB, HTTP, or FTP and read by the Hadoop cluster using HDFS. Isilon allows Hadoop analytics to be done on data in place, while eliminating the need for the 3x replication required with traditional direct attached storage (DAS). This lowers costs and simplifies management, which is especially important for organizations looking to expand from R&D or POC environments to full production. (To get an idea of how much your organization can save with in-place Hadoop analytics on Isilon, be sure to check out the on-line TCO analysis tool here).
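To make the replication point concrete, here is a rough back-of-the-envelope sketch in Python. The 100 TB data set and the 25 percent protection overhead for shared storage are illustrative assumptions, not figures from the ESG Lab Review or from Isilon documentation. Only the 3x replication factor comes from the text above.

```python
def raw_capacity_tb(usable_tb, copies=1, protection_overhead=0.0):
    """Raw capacity needed to hold `usable_tb` of data.

    copies: number of full copies kept (3 for classic DAS-based HDFS).
    protection_overhead: extra fraction consumed by parity-style protection
    on shared storage (hypothetical value, set by the caller).
    """
    return usable_tb * copies * (1 + protection_overhead)


dataset_tb = 100  # assumed size of the data set to analyze

# Traditional DAS-based Hadoop: three full copies of every block.
das_raw = raw_capacity_tb(dataset_tb, copies=3)

# Shared scale-out storage: one copy plus an ASSUMED 25% protection overhead.
shared_raw = raw_capacity_tb(dataset_tb, protection_overhead=0.25)

print(f"DAS with 3x replication: {das_raw:.0f} TB raw")
print(f"Shared storage (assumed overhead): {shared_raw:.0f} TB raw")
```

Plug in your own data set size and the protection overhead of your actual configuration; the point is only that full replication multiplies raw capacity in a way that in-place analytics avoids.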

Along with very appealing money saving prospects, in-place analytics with Isilon scale-out storage provides a number of other important advantages relative to a “traditional” Hadoop infrastructure utilizing direct-attached storage. As described in the new ESG Lab Review as well as in a previous EMC white paper, EMC Isilon Scale-Out NAS for In-Place Hadoop Analytics, these include increased resiliency, improved data protection and security.  In their analysis, ESG found that with VCE Vblock Systems and EMC Isilon, security and compliance were robust, enabling multi-tenancy and read-only access to data when needed. ESG Lab also validated that vSphere Big Data Extensions (BDE) allow the automatic provisioning of Hadoop nodes, as needed, for both virtual Hadoop clusters and as virtualized node additions to existing bare-metal clusters. This enables Hadoop clusters to be expanded quickly and easily.

This is all great, but for me, the most exciting finding in the ESG Lab Review is in their analysis of Hadoop performance. Using three different tests, ESG Lab used the Hadoop TeraSort suite to validate the HDFS and MapReduce layers of a joint VCE Vblock Systems and EMC Isilon solution. In the testing process, the data set size was scaled from 100GB to 1TB and job completion time was monitored in each test case. The results from these tests were then compared to the performance of a traditional Hadoop cluster consisting of commodity servers and DAS. An example of these test results is summarized in Table 2.

Table 2. Performance Comparison: Traditional Hadoop versus Hadoop on VCE Vblock Systems with EMC Isilon Scale-out NAS

 

These results show that VCE Vblock Systems with EMC Isilon are well suited to deliver levels of virtualized Hadoop performance comparable to bare-metal installations in a scalable, flexible package. In ESG Lab’s TeraSort Suite tests, the VCE Vblock Systems and EMC Isilon solution delivered significant performance benefits, completing Hadoop jobs in as little as half the time compared to a traditional Hadoop configuration. Obviously, this is one of those “your mileage may vary” things but the evidence is compelling.

Interestingly, the ESG Lab Review also describes how ESG Lab measured the performance impact of the loss of a node in the Isilon cluster by intentionally powering down one of the eight Isilon nodes in the tested configuration. They observed Isilon’s data resiliency (attributable to the built-in data protection of Isilon’s OneFS operating system) and confirmed a performance difference of only about 12 percent, leaving roughly seven-eighths of the performance of the healthy eight-node cluster (this also demonstrates the linear scalability of Isilon with respect to capacity and performance).
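The seven-eighths figure follows from simple proportionality. The sketch below assumes performance scales linearly with the number of healthy nodes, which is an idealization rather than a measured result, and the function name is made up for illustration.

```python
def expected_performance_fraction(total_nodes, failed_nodes):
    """Fraction of healthy-cluster performance expected after node loss,
    ASSUMING performance scales linearly with the number of active nodes."""
    return (total_nodes - failed_nodes) / total_nodes


fraction = expected_performance_fraction(total_nodes=8, failed_nodes=1)
drop_percent = (1 - fraction) * 100

print(f"Expected performance: {fraction:.3f} of the healthy cluster")
print(f"Expected drop: {drop_percent:.1f} percent")
```

Under this assumption an eight-node cluster that loses one node should run at 7/8 of its original performance, a drop of 12.5 percent, which is in line with the roughly 12 percent difference reported in the review.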

Taking all of these factors into account – efficiency, cost savings, management simplicity, data protection, security and performance – it seems clear that organizations interested in consolidating their Big Data and using Hadoop analytics to accelerate time to insight should check out this ESG Lab Review and learn more about how VCE Vblock Systems with EMC Isilon can benefit their business and transform the way they run their analytics.

Welcome to the new paradigm for Hadoop!

 

Source:

[1] ESG Lab Review, VCE Vblock Systems with EMC Isilon for Enterprise Hadoop, November 2014

Reliability, Flexibility & Speed to Screen: The benefits of an all-IP infrastructure core to Fox Sports Australia

Charles Sevior

EMC Isilon CTO (Asia Pacific & Japan)
Charles Sevior is a member of the Office of CTO for the Isilon Storage Division of EMC². With a strong background in the M&E sector, he also provides focus on solutions for Life Sciences, Healthcare, EIS and other sectors across the Asia-Pacific-Japan region. Charles has had almost 30 years of engineering experience, most recently as an independent media technology consultant, and before that as head of Engineering and IT for leading commercial television broadcaster Nine Network Australia. There, he oversaw the business and technology transformation required to adopt a fully digital file-based workflow across News and Presentation, including construction and fitout of state of the art broadcast facilities. He has enjoyed hands-on experience at major live events such as the Australian Grand Prix and Olympics, and was Project Director of the first terrestrial free-to-air broadcast of a live 3D sporting event in Australia (and the world) in May 2010. He also served as Chair of the Free TV Australia Engineering Committee during the period of government negotiation and development of the analogue TV switch-off / digital frequency restacking process that is now nearing completion in Australia. Charles has held positions of Director of Global Television, the leading Australian outside broadcast and studio facilities provider, and Director of TX Australia, the owner and provider of television broadcast transmission facilities in Australia’s major metropolitan markets. Charles enjoys working with adopters of leading technology solutions for digital media and associated analytics of massively scalable big data repositories. He prefers an approach favouring collaborative solutions with leading application partner vendors to yield excellent results for EMC’s customers. Charles Sevior holds a Bachelor of Engineering (Hons) degree from the University of Melbourne, and a Master of Business & Technology (MBT) from the University of NSW / AGSM in Sydney. 
He is also a standards member of the Society of Motion Picture & Television Engineers (SMPTE). http://au.linkedin.com/in/csevior/

The impacts of a rapidly changing consumer landscape, an increasing HD channel count and the need to relocate to a brand new facility created the perfect storm that allowed Fox Sports Australia to implement an innovative IT-based live television operation in Sydney, Australia. This facility and operation is a great example of the next generation of media facilities – built upon a generic and expandable IT core with all components loosely coupled and abstracted via an Enterprise Service Bus approach.

Fox Sports CTO Michael Tomkins – with a unique background spanning live television, post-production, network design and high-speed radio data systems – reflects on the result of the solution his team has developed to address the needs of a multi-channel HD live sports broadcaster servicing one of the most sports-mad markets in the world.

The trend to a converged IT infrastructure at the core of media and broadcast facilities is now very clear, with most specialist media application vendors adopting standards-based product designs capable of running on virtualised IT systems. Leading industry analysts Devoncroft Partners recently documented this industry change extensively in their report – IBC 2014 – Observations and Analysis of the Media Technology Industry. A diagram from that report (reproduced below) illustrates very well how companies such as Fox Sports Australia are making the best use of IT technology and creating flexible and efficient content engines for the business operation.

There is still a strong need for specialised equipment (like cameras and vision switchers) and specialised applications (such as editing and media asset management) which the media industry vendors continue to develop, compete with and innovate on. But the process of rebadging server, storage and network solutions and reselling these to the media industry with very little added intellectual property and innovation is becoming unpopular. In many cases also, customers are looking at solutions that are not bound to physical infrastructure but can be virtualised and run in public, private and hybrid cloud environments.

Next Focus

Taking this approach to infrastructure solution architecture allows you to leverage the enormous investment in R&D and the benefits of Moore’s Law that the IT industry can bring to market sooner than would be the case if you were purely dependent on specialist vendors building kit designed only for the media industry. As long as the technology domains are abstracted but loosely coupled, using well-defined standards, each domain can be upgraded or expanded as the business needs evolve – with fewer future dead-ends requiring a complete “rip and replace” response.

This also leads to an infrastructure solution that is effectively a private/hybrid cloud solution. This helps to balance the cost and control benefits of privately owned & managed infrastructure, with the opportunity for burst compute as required to support special events.

One of the most important infrastructure requirements is reliability. Live sports broadcasting is one of the most demanding environments for continuous up-time and careful change control processes. Any errors or instability can lead to instant loss of transmission or incorrect content to air, which can impact on the business & revenue with immediate effect! Selecting a core storage technology based on fault-tolerant scale-out clustered Isilon nodes is an important step towards an inherently reliable system.

Speed of operations and efficiency is derived from the benefits of using a single volume media content and business information Data Lake, handling the storage and I/O workloads of multiple different processes. We see many customers experiencing the benefits of collaborative editing, where all incoming content is available to all editors (even with live growing clips), and rapid production of highlights, promos and alternate versions can be managed in parallel – with no file locking issues! Having all of the content online, available, and deliverable with scale-out distribution metrics dramatically simplifies origin server design for large-scale content delivery networks (CDN). This opens up the content to monetisation via new distribution platforms, and to capture and store usage clickstreams to derive Data-Driven Business Insights which can open up further revenue opportunities to media companies.

Content Delivery

Isilon’s flexibility led to an innovative solution at Fox Sports Australia when it came time to transition a live broadcast facility and staff from the old location to the new facility. By temporarily breaking the storage cluster into two, maintaining bi-directional synchronous replication across the two sites, staff and workloads could be transitioned in stages with no data jockeying or manual migration processes.

In addition to EMC Isilon NAS storage, Fox Sports Australia also adopted the VCE Vblock converged infrastructure solution for their business-IT core, and EMC Data Domain backup & recovery solutions.

Fox Sports Australia – an efficient and profitable media business keeping Aussie sports fans happy!

If you liked this post, please share it on LinkedIn, Facebook and Twitter. We can help you take the next steps with an IP infrastructure for file-based workflows in your media or broadcast facility. Please reach out to me or your local EMC contact.

Think you’re cut out for an efficient edit workflow?

Jeff Grassinger

Sales Alliance Manager

Why one media organization loves their video post-production workflow (And you should, too!)

Do you think your editing workflow is efficient? The question of an efficient edit workflow continues to come up in many of our conversations with media professionals. Edit workflows are a challenging landscape because in most environments there’s a lot to consider: codecs, network interfaces, switches, shared storage and much more – not to mention the never-ending debate on which is the “best” edit solution – Avid, Adobe, or Apple.

Efficient edit made easy

While this edit debate continues and we wait for the emergence of the dominant edit platform, many media organizations are taking advantage of a key opportunity to drive efficient, flexible video editing workflows. Collaboration within your edit workflow is a great example of how a media organization can become much more efficient.

Recently we had the opportunity to talk with Martin de Bernardo, the Manager of Technical Services at Sheridan College (the second largest art college in North America), about the challenges they faced in their media environment and the success they had in becoming more efficient. Specifically, Sheridan was looking to give students unlimited access to media so they could collaborate on projects with fellow classmates and instructors.

See why they love their workflow

When you watch the video you’ll understand how the team at Sheridan addressed their access and collaboration challenges in an environment that includes the editing applications Apple Final Cut Pro, Adobe Premiere Pro and Avid Media Composer. You’ll see why Sheridan is breaking new ground in how an educational institution uses shared storage and collaboration tools from MXFserver.

 

Let us know if we can help in your environment to build efficient, collaborative workflows and if this video was helpful, please share it with your network.

 

To read more about Sheridan College’s Isilon solutions please click here.

Strata & Hadoop World 2014 Recap

Ryan Peterson

Chief Solutions Strategist

Preventing terrorist attacks, feeding the hungry, capturing bad guys, and enabling cubic splined segmented continuous polynomial non-linear regression models. I promise to try and explain the last one later in this blog.

This week was Strata + Hadoop World, a fast-growing convention and exposition aimed squarely at statisticians, engineers, and data scientists. The topics were diverse and ranged from machine learning to the Michael J. Fox Foundation’s use of Hadoop to help discover Parkinson’s disease earlier in the cycle for patients.

What is clear from the messaging this year is that Hadoop has made it into the mainstream technology people are using in their organizations. Customers from all walks of life spoke about their projects, planned projects, or how they changed their business, the economy, the world, or even saved lives.

One of our customers discussed a scenario they were involved in where their software with Cloudera Hadoop and Isilon was used to save a life: “A child contacted a helpline service online, indicating that he had self-harmed and was intending to commit suicide. This was passed on to CEOP who acquired the communications data to reconcile the IP address to an individual. They did so in a very short space of time and passed it on to the local police force. When they got into the address the child had already hanged himself, but was still breathing. If there had been any delay, or if the child had been unlucky enough to be using one of those service providers that do not keep subscriber data relating to IP addresses, that child would now be dead.” – Page 11 at http://www.publications.parliament.uk/pa/jt201213/jtselect/jtdraftcomuni/79/79.pdf

We also saw Mark Grabb of GE explain their use of EMC technology to create the Industrial Internet and what that means to the innovation engine (pardon the pun) at GE.

What we are most excited about this year is a fundamental transition that flips the thinking that data must be moved into a new repository in order for that data to be included in analysis operations. Don’t get me wrong, data lakes simplify the management and correlation of data by getting as much into one place as possible. With that in mind, there are some fundamental issues we are starting to address. Take this math from a real customer: 130PB of object storage used to house video and images + 8PB of file data used for home directories, weblogs, click stream, and more. Add in a desire to run analysis on ALL of that data and you’ll need 3-4x the capacity in a central Hadoop system. Do you want to build a 400-500PB raw capacity hyper-converged Hadoop cluster? What if we can flip the process and offer the right storage solution for the data being stored at the location where that data needs to be stored, and for the primary workload that originally captures and uses that data? That changes the conversation to creating a highly capable platform full of all of the ecosystem applications and pointing it at the data. I had the opportunity to discuss this flipping of the process with customers during a session at Strata and it was met with great enthusiasm.
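The customer math above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph; the function name is invented for illustration, and the 3x and 4x multipliers are the "3-4x" range quoted in the text.

```python
def central_cluster_raw_pb(source_capacities_pb, capacity_multiplier):
    """Raw capacity (PB) needed if every source data set is copied into
    one central Hadoop cluster that needs `capacity_multiplier` times
    the size of the data it holds."""
    return sum(source_capacities_pb) * capacity_multiplier


object_store_pb = 130  # video and images
file_data_pb = 8       # home directories, weblogs, click stream

low = central_cluster_raw_pb([object_store_pb, file_data_pb], 3)
high = central_cluster_raw_pb([object_store_pb, file_data_pb], 4)

print(f"Central cluster needs roughly {low}-{high} PB of raw capacity")
```

The result lands in the same several-hundred-petabyte range as the 400-500PB figure quoted above, which is the motivation for analyzing the data where it already lives instead of copying it.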

Mike Olson announced the partnership with EMC and the enablement of Isilon as the first platform to be certified with CDH. See his blog at http://vision.cloudera.com/turn-your-data-lake-into-an-enterprise-data-hub/. It reflects on the idea of bringing an Enterprise Data Hub to layer above all of the data in data lakes to enable a central system for correlating data from many sources. Mike Olson and I discussed our newly found partnership with David Vellante on theCube.

We could not be happier about these announcements and look forward to a long and mutually prosperous relationship. Let me say here that the Cloudera team encompasses some of the most humble and talented people in the world and they are a joy to work with. Tom Reilly, Cloudera CEO, and Mike Olson both took multiple stage opportunities to talk about the new partnership, from Mike’s keynote to Tom and Mike’s Cloudera Partner Summit discussions.

During the event, I had the great privilege to hold a joint “Jam Session” with Ailey Crow from Pivotal’s Data Science team. The goal of the session was to riff on projects we have worked on that range from Healthcare and Life Sciences to Government, Telco, and Banking. With a packed house, she and I had an incredible time answering questions, discussing use cases around Big Data and more. Ailey is one of the smartest people I have met and I am truly honored to have shared the stage. One example from the discussion was banks using social sentiment analysis to look at stock trends, giving traders one more data point before investing in particular securities. In another, Ailey spoke about correlating air quality information with patients who experience asthma and have not refilled their prescriptions; the result enables notifications to those patients to refill their prescriptions when air quality drops below certain thresholds.

An offshoot conversation with Ailey and B. Scott Cassell (Director, Solutions Architecture for EMC) went into an idea B. Scott has for modeling performance of storage. As he explained what he was doing, Ailey explained that what he wanted to do was create a “cubic splined segmented continuous polynomial non-linear regression model”. Roughly what that means is to create a specific model of performance based on specific plot points, but in order to keep that model as accurate as possible, break it into multiple chunks (segmented), but in order to connect those segments, use a cubic spline (I have no idea what that is – but they did), and ultimately graph a continuous polynomial. Here is what one looks like:

Yep, that hurts my brain. And I bring it all up for good reason. This year at Hadoop World we began to see new products that do all of that for you and put together the neat graph or chart, or even turn it into an application (perhaps an easy-to-use performance predictor is in our future). Hadoop is becoming an underlying toolset that will be the base for the next generation of technology. Similar to the RDBMS, Hadoop will soon become more of a general term and less of “the application”.
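For readers who, like me, want to see the cubic spline idea in something more concrete than words, here is a minimal Python sketch. It is not B. Scott's model: it builds an interpolating natural cubic spline (piecewise cubic segments joined smoothly at the measured points) rather than a fitted regression, and the node counts and throughput values are made-up sample data.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function S(x): a natural cubic spline through the points
    (xs[i], ys[i]). xs must be strictly increasing."""
    n = len(xs) - 1                      # number of segments
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # second derivatives; ends stay 0

    if n >= 2:
        # Tridiagonal system for the interior second derivatives.
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i]
                      - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Forward elimination (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        inner = [0.0] * (n - 1)
        inner[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
        m[1:n] = inner

    def spline(x):
        # Pick the segment containing x (clamped to the first/last segment).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] - m[i] * h[i] ** 2 / 6.0) * a / h[i]
                + (ys[i + 1] - m[i + 1] * h[i] ** 2 / 6.0) * b / h[i])

    return spline


# Hypothetical measurements: storage nodes vs. throughput (GB/s).
nodes = [2, 4, 6, 8]
throughput = [1.0, 1.9, 2.6, 3.1]
predict = natural_cubic_spline(nodes, throughput)
print(f"Estimated throughput at 5 nodes: {predict(5):.2f} GB/s")
```

Each pair of neighboring measurements gets its own cubic segment, and the segments are stitched together so the curve and its first two derivatives stay continuous. That is the "segmented" and "continuous" part of the mouthful above.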

The EMC Federation was there in full force. The EMC booth displayed Isilon, ECS, and DCA. Experts on each platform manned the booth and hundreds of attendees came through to learn more about our HDFS storage solutions. Sitting just across from VMware and down the hall from Pivotal, I was reminded how strong a force the federation already is in the Big Data space. With the newly announced DCA+Isilon+Pivotal bundle v2, the federation is able to provide the “Data Lake in a box” that so many have been asking for. See the Press Release at http://wallstreetpr.com/emc-corporation-nyseemc-and-pivotal-unveils-data-lake-hadoop-bundle-2-0-34175

Aidan O’Brien (Head of Global Big Data Solutions for the EMC Federation) and Sam Grocott (SVP, Emerging Technologies Marketing & Product Management) discuss the newly formed Emerging Technologies Division and the plans for EVP solutions around Big Data.

People often ask me what excites me about working for a storage company. I like to answer them with a couple of key points. EMC is no longer a storage company in my mind. We are a data company. And we’re tackling challenges that had previously gone unsolved. EMC stores data, but with protocol access to that data such as HDFS (Hadoop), EMC is able to unlock the potential for that data and allow new harder questions to be asked. So whether you’re trying to prevent terror, increase food production, return kids to their parents, or answer a complex technology performance question in an easier way, EMC has the tools and rich partnerships to help you do that.

Wal-Mart probably knows more about you than your doctor…

James Han

Sr. Business Development Manager-Healthcare

When you walk into a Wal-Mart, they likely know more about you than your local hospital. They know when and what you’ve purchased, your income, your family members, your political affiliation, and probably even the habitual route you take while walking through the store.  Like many other companies, Wal-Mart mines tons of big data to improve their marketing campaigns, sell more, and generally improve their bottom line.

Healthcare is behind in employing big data analytics tools. For example, when you go to the hospital or clinic, it is often treated as a single visit—you may even have to update all of your demographic information each time.  Your data is generally only important during your visit and is often archived immediately after your visit—essentially making it inaccessible for subsequent visits. What if healthcare could employ big data analytics to the level of commercial enterprises like Wal-Mart?

Let’s look at some statistics related to healthcare spending.  A 2012 report (of 2009 data) from the National Institute for Health Care Management (NIHCM) reveals that spending for healthcare services is extremely uneven—a small proportion of the population is responsible for a very high portion of spending. The study finds that the top 5% of spenders account for almost half of spending ($623 billion), and the top 1% of spenders account for over 20% of spending ($275 billion)[1] (See Figure).

Healthcare_1

It wouldn’t take much improvement in efficiency when dealing with that 1% of the population to make a substantial payoff. If trends could be identified, or procedures developed, that keep those few high utilizers healthier and lower their consumption, the impact could be dramatic.
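As a rough illustration of the scale involved, the short Python sketch below applies a couple of hypothetical efficiency gains to the NIHCM spending figures quoted above. The 5 and 10 percent improvements are assumptions chosen purely for illustration, not findings from the report.

```python
TOP_1_PERCENT_SPEND_BN = 275  # USD billions, from the NIHCM data brief
TOP_5_PERCENT_SPEND_BN = 623  # USD billions, from the NIHCM data brief


def savings_bn(spend_bn, improvement_percent):
    """Savings in USD billions for a given percentage efficiency gain."""
    return spend_bn * improvement_percent / 100


for pct in (5, 10):  # hypothetical efficiency improvements
    print(f"{pct}% gain on the top 1% of spenders: "
          f"${savings_bn(TOP_1_PERCENT_SPEND_BN, pct):.2f} billion")

# Share of the top-5% spending that comes from the top 1% alone.
share = TOP_1_PERCENT_SPEND_BN / TOP_5_PERCENT_SPEND_BN
print(f"Top 1% share of top-5% spending: {share:.0%}")
```

Even a single-digit percentage improvement for this small group translates into tens of billions of dollars, which is why targeted analytics is so attractive here.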

Unfortunately, many healthcare providers are still trying to figure out what data they need to perform the equivalent of Wal-Mart’s analytics. Or they have the data, but can’t figure out how to get it all in one place.

EMC Isilon can help. Isilon is in the business of big data—making big data analytics more cost-effective and—perhaps most importantly with respect to healthcare—easier to implement. Isilon provides the foundation for a scale-out data lake—a key capability that provides simplicity, agility, and efficiency to store and manage unstructured data. Starting with a scale-out data lake, healthcare organizations can:

  • Invest in the infrastructure they need today to get started today,
  • Realize the value of their data by storing, processing, and analyzing it in the most cost-effective manner, and
  • Grow capabilities as needs grow in the future.

In short, EMC Isilon can help healthcare organizations get on the road to leveraging their data to improve patient comfort, lower costs, and streamline healthcare procedures.

 

 

Source: [1]“The Concentration of Healthcare Spending: NIHCM Foundation Data Brief July 2012” http://www.nihcm.org/component/content/article/326-publications-health-care-spending/679-the-concentration-of-health-care-spending

Converged Infrastructure for Big Data Storage and Analytics

Michael Noble

Sr. Product Marketing Manager

It’s no secret that unstructured data is growing rapidly, and organizations across virtually every industry segment face significant challenges storing, managing, securing and protecting it. According to IDC, total data storage worldwide will reach 133 exabytes by 2017, of which 80 percent will be required for unstructured data.

Unstructured Data Growth

To meet this challenge, VCE has just introduced a compelling new converged infrastructure (CI) platform – VCE™ Technology Extension for EMC® Isilon® – for organizations looking to consolidate and modernize their big data environment. With this technology extension, existing VCE Vblock® Systems can leverage Isilon’s massive scalability and built-in multi-protocol data access capabilities to easily expand capacity to address large-scale data storage needs while supporting a wide range of applications and traditional and next-gen workloads including Hadoop data analytics.

Isilon Scale Out Data Lake

With VCE technology extension for EMC Isilon as a foundational element of a scale-out data lake infrastructure, organizations can eliminate costly storage silos, streamline management, increase data protection and gain more value from their big data assets. A great example of this is in the area of big data analytics and Hadoop. As the first and only scale-out NAS platform that natively integrates with the Hadoop Distributed File System (HDFS), Isilon is a game-changing big data storage and analytics platform.

Before Isilon came into the picture, Hadoop deployments were largely implemented on dedicated infrastructure, not integrated with any other applications, and based on direct attached storage (DAS) with data typically mirrored three times or more. In addition to requiring a separate capital investment and added management resources, this approach introduces a number of other inefficiencies. By leveraging Isilon’s native HDFS support and in-place analytics capabilities, organizations can extend their Vblock System-based environments and avoid the capital expenditures, risks and costs associated with a separate, dedicated Hadoop infrastructure.

Isilon’s in-place analytics approach also eliminates the time and resources required to replicate big data into a separate infrastructure. For example, it can take over 24 hours to copy 100 TB of data over a 10GbE link. Instead, with VCE technology extension for EMC Isilon, data analytics projects can be initiated immediately to get results in a matter of minutes. And when data changes, analytics jobs can quickly be re-run with no need to re-ingest updated data.
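The copy-time figure above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 10 gigabits per second = 10^10 bits per second). The `efficiency` parameter is a hypothetical stand-in for protocol overhead and other real-world losses, not a measured value.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy data_tb terabytes over a link_gbps link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    efficiency is the fraction of line rate actually achieved (hypothetical).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600.0


ideal = transfer_hours(100, 10)                      # full line rate, no overhead
realistic = transfer_hours(100, 10, efficiency=0.9)  # assumed 90% of line rate
print(f"At full line rate: {ideal:.1f} hours")
print(f"At 90% of line rate: {realistic:.1f} hours")
```

Even at a theoretical full line rate the copy takes roughly 22 hours, so any realistic overhead pushes it past the 24-hour mark cited above.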

VCE technology extension for EMC Isilon also leverages Isilon’s ability to support multiple instances of Apache Hadoop distributions from different vendors simultaneously including Pivotal HD, Cloudera Enterprise and Hortonworks Data Platform. These same data sets on EMC Isilon can be extended to other analytics such as SAS and Splunk. This means that organizations gain the flexibility to use whichever tools they need for their analytics projects.

Along with these powerful big data storage and analytics capabilities, VCE technology extension for EMC Isilon brings the convenience and assurance of proven VCE engineering expertise. VCE’s technology extensions are tightly integrated and fully tested and validated. This allows organizations to quickly increase processing power or add storage capacity, without typical technology risks.

All-in-all, VCE technology extension for EMC Isilon does indeed look like a compelling approach to help tame the big data storage challenge and unlock the value of big data assets. Please let us know what you think.

 

Converged Infrastructure for Scale-out Data Lakes

Carl Washington

Sr. Business Development Manager

For many organizations today, the rapid growth of unstructured data has put a spotlight on the challenges of a decentralized IT infrastructure burdened with data storage “silos” across the enterprise. These challenges include:

  • Complex management of multiple silo data sets
  • Inefficient storage utilization
  • Storage “hot spots” and related performance issues
  • Data protection and security risks
  • Inability to support emerging workloads

As many enterprises have experienced, converged infrastructure (CI) systems are a great way to eliminate silos and to consolidate and simplify IT environments while addressing increasing demands on IT. Additionally, with the rapid growth of unstructured data, many organizations are attracted to a CI strategy to implement a scale-out data lake. As used here, a “data lake” is a large reservoir of unstructured and semi-structured data consolidated from different traditional and next generation application workload sources. These next generation applications, including mobile computing, social media, cloud computing, and big data, are fueling the enormous growth of this unstructured data.

Foundation for Scale-Out Data Lake

The storage infrastructure to support these next gen applications and scale-out data lakes must be highly scalable in both capacity and performance. It must also have the operational flexibility to support a wide range of applications and workloads. One of the other great advantages of a data lake is the potential to leverage powerful analytics like Hadoop to gain new insight and uncover new opportunities by harnessing big data assets. To address these needs, VCE has just announced an exciting new product, VCE™ technology extension for EMC® Isilon®, that allows organizations to quickly implement an efficient converged infrastructure for scale-out data lakes.

Designed to be deployed as a pooled resource with Vblock® Systems, VCE technology extension for EMC Isilon enables a scale-out data lake approach to the modern datacenter. This provides enterprises and service providers an infrastructure to consolidate unstructured data, support traditional workloads as well as next-gen applications, and enable in-place big data analytics using Isilon’s native HDFS support. These capabilities help organizations to reduce costs, simplify management, gain new efficiencies and accelerate time to insight while avoiding the need for separate infrastructure investments. Organizations that already rely on VCE to run their mission-critical applications and manage data can easily augment their existing environments with this new offering, VCE technology extension for EMC Isilon.

Isilon Scale Out Data Lake_1

The EMC Isilon Scale-out Data Lake can collect unstructured data from multiple workloads such as HPC, Mobile, Home Directories, Video Surveillance, Large Scale Archives, File Shares, and more. This reservoir of unstructured data from different sources becomes immediately available for analytics.

What’s more, the source data can be written to Isilon using one protocol and then accessed using a different protocol. Isilon’s support for multiple protocols, such as NFS, SMB, HDFS, and HTTP, is a compelling feature that provides enormous flexibility and agility for the next generation of applications. Given these points, Isilon’s ability to consolidate data from multiple sources and run in-place analytics strengthens the advantages provided by VCE Vblock Systems while extending their applicable use cases and accelerating adoption for the next generation of applications.

 

In sum, VCE Technology Extension for Isilon allows enterprises to implement a scale-out data lake infrastructure that provides a number of advantages including:

  • Lower cost and increased efficiency
  • Simplified management
  • Increased operational flexibility
  • Faster time to service and market
  • Robust security and data protection

Another Important Advantage: The VCE Experience

By extending the full VCE Experience – engineered, built, validated, shipped and supported by VCE — to EMC Isilon, VCE is delivering virtualized cloud infrastructure systems optimized for traditional and next generation application workloads with scale-out data lakes. 

In addition to the technology and infrastructure capabilities, the VCE Experience helps IT executives drive agility into their operations and enable IT as-a-service, which makes IT more responsive to new business applications while shortening the time to provision new infrastructure.

VCE Technology Extension for Isilon offers a great value for organizations looking to implement an IT infrastructure for their own scale-out data lakes. Please let us know what you think.

An interview with Innovator, Hugh Williams

Ryan Peterson

Chief Solutions Strategist

I recently got a chance to sit down with Hugh Williams, the SVP of R&D for Pivotal. Hugh was previously at eBay and Microsoft and comes with an impressive background with respect to Big Data technologies that includes industry patents and numerous publications.  Here is a transcript of our discussion:

Ryan: Hi Hugh, thanks for taking the time to sit down with me.  Getting straight to the questions I have for you:  How would you define Data Lake?

Hugh: Great question. The basic premise of a Data Lake is that you have one place where all of your data is stored, and it allows everyone in an organization to extract value from the data using the tools they want to use.  Without a data lake, data is usually siloed in legacy systems where only a couple of applications or a subgroup of users have access to the data.

Ryan: What would you consider to be the most important attributes of a data lake?

Hugh: Having all of the data in one place.  Of course, you need the right tools to be able to accomplish that – ingestion and connection to existing sources is still more challenging than it should be.

Ryan: How do customers build data lakes?

Hugh: Most companies start a data lake with a set of folks who build out a small Hadoop capability. They demonstrate that capability, the noise gets louder, and the company says that rather than having all of these solutions throughout the organization, let’s look at collecting all of that into one place.

Ryan: I call those Data Puddles!  What have you seen inhibit adoption?

Hugh: I think a few things come to mind: Ingestion and egestion are problematic.  How am I going to get all of that data from various places into the central place?  The second thing is that the Hadoop ecosystem is relatively immature.  Although it is an impressive toolbox, there is still a barrier to setting up the infrastructure: the standing up, the training, getting all the right pieces.  The last thing I’ll say is using Hadoop to extract business value is not easy.  You have to employ Data Science folks.  Pivotal is making SQL much more mature on Hadoop to help solve this issue.

Ryan: What interests you about the Isilon partnership with Pivotal?

Hugh: Hadoop will rule the world, but its maturity is a problem today.  Isilon is mature and companies bet their businesses on it.  If you want one thing to be reliable, it has to be the storage, and so the partnership between Pivotal and Isilon really matters.

Ryan: Customers often lump HAWQ with Stinger, Impala, and even Hive.  How do you differentiate HAWQ from other SQL solutions?

Hugh: Hive is a relatively elementary implementation of SQL access to Hadoop with basic features of SQL.  It was revolutionary when it happened, but it doesn’t have what a Data Scientist would need.  Impala is a nice step forward from Hive.  The really interesting thing about HAWQ is that we took 10+ years of experience with SQL from the Data Warehouse space and ported that to work with Hadoop.  What you get with HAWQ is Greenplum database heritage adapted to Hadoop.  Pivotal has the most advanced solution for SQL access to Hadoop.

Ryan: Can you provide an example of something you can do with HAWQ that cannot be done with the others?

Hugh: There are benchmarks such as TPC-DS that help validate whether various typical SQL queries can be evaluated and optimized on different systems.  In rough terms, when we used TPC-DS to test SQL compliance, 100% of queries were successful with HAWQ, only 30% with Impala, and around 20% with Hive. We published an independently peer reviewed study that shows these results in this year’s SIGMOD, the leading database conference.

Ryan: You recently announced GemXD, a new product in the GemFire family.  What is an example of a problem that GemXD solves?

Hugh: You can think of it as Cassandra or HBase done really, really well – with a SQL interface, full ACID capabilities, the ability to upgrade with no downtime, the ability to read and write to Hadoop’s HDFS storage layer when there’s too much data for memory, and much more.

Ryan: What’s your favorite “Big Data changed the world” story?

Hugh: Here’s a fun story. When I was at Microsoft, I decided to research what drugs caused stomach cramps by looking at what queries customers ran in sessions on Microsoft’s search engine. I reverse engineered a list of drugs that caused stomach cramps, and checked the FDA literature – and, sure enough, it was right.

Ryan: How does Cloud Foundry fit into the Big Data / Hadoop storyline?

Hugh: Today they’re somewhat separate stories, but they won’t be for long.  It’s of critical importance to the future of PaaS and the future of Big Data that they converge.  In the future, most applications will be data-centric, and great companies will be built on those applications. Developers are demanding the convergence. PaaS and Big Data exist within Pivotal to build the future platform for software.

Video Surveillance: A Tale of Two Markets

Christopher Chute

Vice President, Global Imaging Practice, IDC
Christopher Chute is a Vice President with IDC's Global Imaging Practice. His research portfolio focuses on transformative technology trends impacting the imaging market. Mr. Chute's specific coverage includes digital imaging and video technology adoption across professional, broadcast/creative, consumer and physical security/surveillance. He conducts forecasting, product and service analysis and user segmentation for these vertical markets through a variety of supply and demand-side studies and consulting engagements. Mr. Chute's research has often centered around charting the disruption caused by technology transformations on status-quo industries, such as the migration from film to digital in the photography market, commoditization/democratization of broadcast/cinema video capture and photofinishing, and the impact of cloud services and tablet usage on the imaging industry. Writers from a variety of publications rely on Mr. Chute for a deep understanding of these markets, including Time Magazine, The Wall Street Journal, Fortune Magazine, USA Today, Investor’s Business Daily, San Jose Mercury News, Bloomberg, and The Financial Times. His television and radio appearances include ABC World News Tonight, Fox & Friends, CNBC, National Public Radio, and Into Tomorrow With Dave Graveline. Mr. Chute also speaks regularly at a variety of international trade shows and technical industry group meetings, including ASIS, CES, Creative Storage, Computex and Photokina. Mr. Chute holds both undergraduate and MBA degrees from Boston College.

As a longtime video surveillance industry observer, I remember when it was fully reliant on analog technology to secure people, places and things. However, it seems the industry is still reliant on similar solutions, whether they are analog cameras capturing video that’s digitized and stored on reused tape, or security personnel who act as both preventative and post-event resources. It became clear to me that the market was bifurcating into two types of deployments: large, fully-IP installations that are fully IT-led, and analog-prone, fragmented, traditional eyeballs-to-screens installations.

IP surveillance is often thought of in terms of image quality, megapixels and other visually related terms, yet the IT-led side of the market has centered itself more on workloads, specifically a combination of video content and analytics that are woven into a broader physical security initiative. These deployments aim to be far more preventative than forensic. IT-led, multi-dimensional installations like these, such as cities, transportation networks, government facilities, and large public complexes, tend to be physically large and span several facilities over long distances. This requires both fixed and mobile surveillance cameras and other sensors connected to local edge storage sources that communicate with a core system. The resultant workloads are collated into a large data set that undergoes extensive analytics processes at a centralized command facility. This workflow is modeled on an enterprise datacenter rather than a security room.

In many ways, workloads and analytics now define physical security more than cops and cameras.

Thus, IT storage leader EMC’s announcements at ASIS 2014 are timely in addressing market growing pains. For instance, to complement its enterprise-class core EMC Isilon storage platform, EMC is now offering EMC VNX-VSS100 (built on proven VNX technology), a purpose-built storage system that can act in an edge capacity with EMC Isilon, or serve as a cost-effective, scalable hub for smaller network-based security installations. The company is offering lab-level validation for Video Management Software (VMS) partners like Genetec, Verint, and Milestone, which will allow system integrators to deploy solutions more quickly. EMC is also creating greater partner enablement through training resource sponsorship and partner investment in high-growth countries.

IDC’s Digital Universe Project forecasts that video surveillance workloads will grow an average of 22% by 2020. And while system integrators have been successful partnering with a wide range of surveillance hardware, VMS, storage and analytics vendors, what’s been lacking is a strong, experienced third-party IT leader like EMC that can create a foundation for surveillance-specific vendors and integrators to work and partner with – while keeping pace with surveillance trends.