Leveraging the Full Potential of the Data Lake

DDS IRIS™ is geared to serve a large biopharma team with a wide range of data transformation and wrangling functions to slice and dice the data.

3X faster pipeline creation


reduction in deploy time

“The visual wrangling feature and ability to save workflows greatly improved collaboration within our team. Reuse has drastically increased and our time to insight has gone down significantly.”


Data science teams at a large biopharma organization face multiple challenges when creating data packs, which are a prerequisite for their models. Data pack creation takes a long time because there is no intuitive way to build data packs and the work depends on advanced skill sets.

Data analysts spend a lot of time on code-driven feature engineering and exploratory data analysis because data profiles are rarely available.

They also face additional challenges, such as:

  • Pulling data from varied data sources
  • Acquiring the technical skills needed to build a model or generate the desired insights
  • Maintaining and tracking end-to-end processes from building pipelines to scheduling them
  • Collaborating with fellow data scientists on the developed pipelines
  • Keeping infrastructure costs low throughout the process


D Cube Analytics customized a product, which we call DDS IRIS™, with many additional features that not only address the foregoing challenges but also cover governance and security of the data.

The DDS IRIS™ solution components are based on the following major pillars:

  1. Infrastructure & Security
  2. Data Management
  3. Collaboration


  • Building data pipelines through an intuitive UI makes data onboarding, pipeline creation, and pipeline scheduling quicker and more efficient. The heavy lifting and processing of the data is handled by Apache Spark running on Amazon EMR; DDS IRIS™ can leverage Databricks as well. With DDS IRIS™, pipeline creation was 3X faster, which turned out to be a big boost for the customer
  • Processed data was persisted in Amazon Simple Storage Service (Amazon S3) and Databricks; data can be written to Amazon Redshift as well. DDS IRIS™ also lets users apply optimization techniques such as partitioning and file compression with just a few clicks. This meant that even non-technical users could build efficient data wrangling pipelines, a big win for the client, who already had a large team of analysts specializing in the pharma domain
  • To avoid wait time on reporting while the published layer was still under construction, we used DDS IRIS™ to export curated data sets directly to Tableau Server. This capability bypasses multiple hops and intermediate storage and makes data available on the reporting server in a short time
  • DDS IRIS™ comes bundled with a wide range of data transformation and wrangling functions to slice and dice the data as the user wants. These functions perform at scale because DDS IRIS™ pushes the computation down to Amazon EMR; here again, Databricks could be used too. The key differentiator was that we could build key domain-specific and client-specific computations into DDS IRIS™ for use by the broader team
  • Users can bring in the SQL workflows they have already created, onboard them onto DDS IRIS™ in a few clicks, and share them easily with the desired users
  • DDS IRIS™ gives you an end-to-end picture of the entire workflow, from creation through publishing results to scheduling, all in one place
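
For readers unfamiliar with the partitioning and file compression mentioned above, the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how DDS IRIS™ or Spark implements it: the function names `write_partitioned` and `read_partition`, the `key=value` directory layout, the `part-0000.csv.gz` file name, and the sample rows are all hypothetical.

```python
import csv
import gzip
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, out_dir, partition_key):
    """Write rows as gzip-compressed CSV, one directory per partition value.

    Hypothetical helper for illustration only.
    """
    # Group rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    written = []
    for value, group in sorted(groups.items()):
        # Encode the partition value in the directory name (key=value).
        part_dir = Path(out_dir) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0000.csv.gz"
        # The partition column lives in the path, so it is left out of the file.
        fields = [k for k in group[0] if k != partition_key]
        with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


def read_partition(out_dir, partition_key, value):
    """Read back a single partition without touching the others."""
    path = Path(out_dir) / f"{partition_key}={value}" / "part-0000.csv.gz"
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


if __name__ == "__main__":
    sample = [
        {"region": "EU", "product": "A", "units": "10"},
        {"region": "US", "product": "B", "units": "5"},
        {"region": "EU", "product": "C", "units": "7"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_partitioned(sample, tmp, "region")
        print(read_partition(tmp, "region", "EU"))
```

The benefit is that a query filtered on the partition column only has to open the matching directory, and compression reduces the bytes stored and scanned. A point-and-click tool hides these choices behind a form, but the trade-offs are the same.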
