What are some cool projects using Cloudera?

Updated on: December 3, 2021 by Duncan Baldwin



Below are some examples from our library of case studies [1]. Many of these have both a video and a whitepaper, while some have only one or the other. Be sure to check the "Related Resources" box on the right of each page for all related media.

You can also check out the "Use Case" tag on our developer blog [2] to read more cool use cases. I have listed a few below.

Finally, you can browse our Customers page and hover over a customer's logo to find information related to that customer's use case.

Named Clients

  • Explorys Medical: Improving Healthcare Quality and Costs Using a Big Data Platform
  • Nokia: Using Big Data to Bring the Virtual and Physical Worlds Together
  • Opower Uses Big Data to Help Consumers Save $320 Million on Utility Bills
  • Patterns and Predictions Uses Big Data to Predict Veterans Suicide Risk
  • Gravity creates a personalized web experience
  • DataSift: Enabling Businesses to Benefit from Social Data Using Hadoop
  • Experian - 360 degree view
  • Rapleaf works smarter with Cloudera
  • NetApp Improves Customer Support by Deploying Cloudera Enterprise
  • SecureWorks Slashes Storage Cost with Dell | Cloudera Solution
  • Persado Supports Marketing Language Engineering with Data Analyst Training


Unnamed Clients

  • Global Payment Processor Detects Fraud with Hadoop While Saving $30 Million
  • Optimizing Healthcare Connectivity with Big Data
  • Shift Center of Gravity for Data Management to Hadoop


Blog use cases

  • BinaryPig: Scalable Static Binary Analysis on Hadoop
  • Email indexing with Cloudera Search and HBase
  • Cloudera Support Secrets: Impala and Search Make the Customer Experience Even Better
  • Customer Support: Motorola Mobility's Award-Winning Unified Data Repository
  • How Treato Analyzes Health-Related Social Media Big Data with Hadoop and HBase
  • Take a look at Skybox Imaging's Cloudera-powered satellite system
  • Customer Support: Six3 Systems' Wayne Wheeles Drives Cybersecurity Innovation With Impala
  • How Apache Hadoop Helps Scan the Internet for Security Risks
  • Apache Hadoop Developer Training Helps Query Big Telecom Data
  • The 2012 Government Big Data Solutions Award winner is the National Cancer Institute.
  • Rat Brain Neural Signal Processing Using an Apache Hadoop Computing Cluster - Part I
  • Rat Brain Neural Signal Processing Using an Apache Hadoop Computing Cluster - Part II
  • Rat Brain Neural Signal Processing Using an Apache Hadoop Computing Cluster - Part III
  • Building case-control studies with Apache Hadoop
  • Using Apache Hadoop to Find Signals in Noise - Drug Adverse Event Analysis
  • Evolution of the Hadoop Ecosystem - AOL Ad Experience
  • High Energy Hadoop
  • Seismic Data Science: Reflection Seismology and Hadoop


[1] http://www.cloudera.com/content/cloudera/en/resources/library.html?category=cloudera-resources:why-cloudera/case-studies
[2] http://blog.cloudera.com/blog/category/use-case/

MapR has developed a highly differentiated Hadoop distribution with many benefits over other distributions. Here are some examples:

  1. NFS. You can mount the cluster over NFS, so your applications can write directly to the cluster. You can use all the file-based applications that have been developed over the last 20 years, from BI tools to file browsers and command-line utilities (e.g., grep, rsync).
  2. Transparent compression. MapR automatically compresses data that is not already compressed. You no longer have to compress it yourself before copying it to the cluster. You no longer have to run a MapReduce indexer job (DistributedLzoIndexer). You no longer have to use a special InputFormat (like LzoTextInputFormat, which prevents you from using other useful InputFormats).
  3. An integrated monitoring and heat map interface that makes it easy to detect and treat problems with the cluster.
  4. Distributed HA. The MapR distribution provides distributed high availability. This is not available in any other distro (they all suffer from single points of failure, in both the NameNode and the JobTracker), and as far as I know, it is not on any other distro's roadmap.
  5. Snapshots. With other Hadoop distributions, if a user accidentally deletes a file, the deletion is replicated three times and the data is lost. The same applies to data corruption caused by an application. Many Hadoop users, including Yahoo!, have lost valuable data due to such incidents. With MapR, you can easily recover your data to a point in time, just as you would on any enterprise-class file system (e.g., NetApp) or database (e.g., Oracle). MapR snapshots are similar to NetApp snapshots in that the underlying storage services use redirect-on-write, so there is no storage or performance penalty (you can take a petabyte snapshot in seconds).
  6. Mirroring. With MapR you can mirror data between clusters. This is useful for disaster recovery or remote backup, as well as for scenarios where you need both a production cluster and a research cluster. MapR mirroring is differential, so only the actual deltas are transferred each time (and they are automatically compressed).
  7. Data location control. With MapR, you can control the location of your data. This allows you to be more efficient in the way you use hardware. For example, if you have some nodes with more drives than others, or some nodes with more memory / CPU than others, you can specify which subset of your data should live in each node set.
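
To make item 1 concrete: once the cluster is mounted over NFS, ordinary file APIs are enough to write to it. The following is a minimal sketch in Python, not MapR-specific code. The mount point `/mapr/my.cluster.com` and the helper name `append_event` are hypothetical, chosen only for illustration.

```python
import os


def append_event(mount_root: str, relative_path: str, line: str) -> str:
    """Append one line to a file under an NFS-mounted cluster path.

    Uses only standard file I/O: no Hadoop client library is involved.
    `mount_root` is wherever the cluster is mounted locally (an assumption
    of this sketch, e.g. the hypothetical "/mapr/my.cluster.com").
    """
    target = os.path.join(mount_root, relative_path)
    # Create the parent directories on the mounted file system if needed.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return target
```

A call such as `append_event("/mapr/my.cluster.com", "logs/app/events.log", "user=42 action=login")` would then behave like any local append, and tools such as grep could read the same file through the mount.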

In addition to all these features, the MapR distribution is 2 to 5 times faster, not only on standard benchmarks like terasort and DFSIO, but more importantly on typical Hadoop workloads (see the video at http://www.mapr.com/resources/introducing-m3-and-m5-editions.html for real users talking about their experience). Due to this performance advantage, most Hadoop users can reduce their hardware (and power) costs by 50% simply by using the MapR distribution.
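
As a back-of-the-envelope check of the 50% figure, here is a small sketch that assumes throughput scales linearly with node count. That linear-scaling assumption and the function names are mine, not something stated in the answer.

```python
import math


def nodes_needed(baseline_nodes: int, speedup: float) -> int:
    """Nodes required to match the baseline cluster's throughput,
    assuming each node does `speedup` times as much work (linear scaling)."""
    if baseline_nodes <= 0 or speedup <= 0:
        raise ValueError("baseline_nodes and speedup must be positive")
    return math.ceil(baseline_nodes / speedup)


def hardware_reduction(baseline_nodes: int, speedup: float) -> float:
    """Fraction of nodes saved relative to the baseline cluster."""
    return 1 - nodes_needed(baseline_nodes, speedup) / baseline_nodes
```

Under this assumption, a 2x speedup turns a 100-node cluster into a 50-node one, which matches the 50% reduction quoted above. Real savings depend on whether the workload is bound by CPU, disk, or network.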

After a long beta period, MapR is now publicly available, and many companies (e.g., comScore, Narus/Boeing) have already migrated to this distribution and are running it in production. Additionally, EMC recently decided to enter the Hadoop market with its own distribution and chose MapR technology because of all these capabilities.

Finally, a few words about the future. The MapR engineering team is currently working on the next version of MapR, which will be as powerful as the first. Unfortunately, I can't say much at this point, but you can certainly expect the gap to grow rapidly over the next 6 to 12 months.

(Regarding Allen's answer: MapR will certainly support MAPREDUCE-279 once it is ready. At this point MAPREDUCE-279 is not ready and is therefore not available in any Hadoop distribution, including MapR.)
