Analyze clickstream data with IBM EventStore for customer insights

Software Architect - IBM Analytics, IBM

This is the final blog in the 3-part series on IBM EventStore. Part 1 introduces IBM EventStore, and part 2 describes ingesting event data into IBM EventStore using the scenario of Clickstream analysis for a retail business. In this blog, we will look at analyzing the Clickstream data with IBM EventStore to derive timely insights into the interests of retail customers.

Typically, ingesting streaming event data, persisting it with low latency, and analyzing it alongside historical event data requires integrating multiple analytic systems. IBM EventStore is purpose-built to reduce that complexity by handling event data in a single system. Its unique architecture enables analytics on both newly arrived and historical event data.

Analyze Web Events using IBM EventStore OLAP API

As the web events are ingested and persisted, they can be analyzed using the EventStore OLAP API in Scala. IBM EventStore is integrated with IBM Data Science Experience in a single distribution. IBM Data Science Experience offers a notebook environment that data scientists and data analysts can use to interactively explore, analyze, and visualize the event data in IBM EventStore.

The Scala notebook built to query and visualize web metrics based on a sample Clickstream dataset is available in a GitHub repository.

Here are the steps and insights derived from analyzing Clickstream data for CYBERSHOP. 

  1. Connect to the database in EventStore and access web events data from a notebook

  2. Query and aggregate web metrics across all product lines

Customer browsing behavior is best understood with two metrics: ‘page views’ and ‘time on page’. ‘Page views’ indicates how many times customers have visited a page, and ‘time on page’ indicates how much time they spent exploring its content.

The OLAP queries aggregate these metrics from the Clickstream data of all users. Here are the visualized metrics from the query results.

The visualized metrics show that ‘Smart Phones’ is the leading category of interest across all product lines.
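To make the aggregation concrete, here is a minimal sketch in plain Python. It is not the EventStore OLAP API and not the code from the notebook; the event tuples, field order, and values are invented for illustration. It only shows the shape of the computation: count page views and sum time on page per product line.

```python
from collections import defaultdict

# Hypothetical click events: (user, product_line, seconds_on_page).
# Field names and values are invented for illustration only.
events = [
    ("u1", "Smart Phones", 120),
    ("u2", "Smart Phones", 45),
    ("u1", "Laptops", 30),
    ("u3", "Tablets", 60),
    ("u3", "Smart Phones", 90),
]


def aggregate_by(events, key_index, time_index):
    """Return {key: (page_views, total_seconds_on_page)}."""
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        entry = totals[event[key_index]]
        entry[0] += 1                  # each event counts as one page view
        entry[1] += event[time_index]  # accumulate time on page
    return {key: tuple(value) for key, value in totals.items()}


metrics = aggregate_by(events, key_index=1, time_index=2)

# The product line with the most page views in this sample.
leader = max(metrics, key=lambda line: metrics[line][0])
print(leader, metrics[leader])
```

In a real deployment the same grouping would be pushed down to the database as an aggregate query rather than computed in the notebook process.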

  3. Drill down to the ‘Smart Phones’ category and aggregate metrics for all products with a new query

The OLAP query aggregates metrics for all products in the ‘Smart Phones’ category. Here are the visualized metrics from the query results.

 

The results indicate that ‘A-Phone’ is the leading product of interest in ‘Smart Phones’.

Though ‘X-Phone’ trails the other phones in ‘page views’, its relatively high ‘time on page’ suggests strong interest in the X-Phone among a section of customers.
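One way to surface this kind of pattern is to compare raw page views with the average time spent per view. The sketch below is an assumption about how such a comparison could be done, not the notebook's method; the totals are invented, and ‘B-Phone’ is a hypothetical product added only to fill out the sample.

```python
# Hypothetical per-product totals for one category:
# product -> (page_views, total_seconds_on_page). Values are invented.
product_metrics = {
    "A-Phone": (500, 30000),
    "B-Phone": (300, 12000),
    "X-Phone": (100, 15000),
}


def average_time_per_view(metrics):
    """Return {product: mean seconds per page view}, skipping zero-view rows."""
    return {
        product: total_seconds / views
        for product, (views, total_seconds) in metrics.items()
        if views > 0
    }


avg = average_time_per_view(product_metrics)

# Most page views overall versus longest average visit.
most_viewed = max(product_metrics, key=lambda p: product_metrics[p][0])
most_engaging = max(avg, key=avg.get)
print(most_viewed, most_engaging)
```

A product that ranks low on views but high on time per view points to a smaller, more engaged audience, which is the reading applied to the X-Phone above.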

  4. Drill down to ‘Features’ of ‘A-Phone’ and aggregate metrics for all features with a new query

The OLAP query aggregates metrics for all features of ‘A-Phone’. Here are the visualized metrics from the query results.

 

 

From the visualized metrics, we can infer that ‘Color’ and ‘Camera’ are the leading features of interest in the ‘A-Phone’ among all customers.
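Picking out the leading features amounts to a top-N ranking over the aggregated metrics. The following sketch shows that ranking in plain Python under stated assumptions: the events and the ‘Battery’ feature are invented, and the code does not use the EventStore API.

```python
from collections import Counter

# Hypothetical feature-page events for one product: (feature, seconds_on_page).
# Values are invented for illustration only.
feature_events = [
    ("Color", 40),
    ("Camera", 70),
    ("Color", 35),
    ("Battery", 20),
    ("Camera", 50),
    ("Color", 25),
]


def rank_features(events, n):
    """Return the top-n features by page views and by total time on page."""
    views = Counter()
    seconds = Counter()
    for feature, secs in events:
        views[feature] += 1
        seconds[feature] += secs
    return views.most_common(n), seconds.most_common(n)


by_views, by_time = rank_features(feature_events, n=2)
print(by_views)
print(by_time)
```

Ranking by both metrics is useful because the two lists can disagree on order even when they agree on which features lead.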

  5. Drill down to an individual user and aggregate metrics with a new query

Moving beyond aggregated metrics across all users, the queries can target the data of individual users over different time ranges. Drilling down to user-level data helps in understanding the interests of an individual customer.

Here are the visualized metrics for user ‘David’ over the past week.

The interactive visual lets you select each day's timeline to see which product pages ‘David’ visited and how much time he spent exploring those products. From the visualized results, we can infer that ‘David’ is a repeat visitor and has spent significant time exploring ‘Smart Phones’ in the last few days.
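The per-user, per-day view can be sketched as a filter on user and time window followed by a grouping by day and category. This is a plain-Python illustration with invented events and timestamps (the user ‘Maria’ and the reference date are hypothetical); it does not reproduce the notebook's queries.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events: (user, category, ISO timestamp, seconds_on_page).
# All values are invented for illustration only.
events = [
    ("David", "Smart Phones", "2018-03-05T10:00:00", 200),
    ("David", "Smart Phones", "2018-03-06T21:15:00", 340),
    ("David", "Laptops", "2018-03-06T21:30:00", 40),
    ("David", "Tablets", "2018-02-20T09:00:00", 60),  # older than one week
    ("Maria", "Smart Phones", "2018-03-06T12:00:00", 90),
]


def daily_time_for_user(events, user, now, days=7):
    """Return {date: {category: seconds}} for one user's last `days` days."""
    cutoff = now - timedelta(days=days)
    daily = defaultdict(lambda: defaultdict(int))
    for name, category, stamp, seconds in events:
        when = datetime.fromisoformat(stamp)
        if name == user and cutoff <= when <= now:
            daily[when.date().isoformat()][category] += seconds
    return {day: dict(categories) for day, categories in daily.items()}


now = datetime(2018, 3, 7)  # fixed reference time so the example is repeatable
david_week = daily_time_for_user(events, "David", now)
print(david_week)
```

Visits on more than one day within the window correspond to the repeat-visitor signal described above.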

The analysis of recent and historical web events using IBM EventStore gives CYBERSHOP insights into trending interests, both across the general customer base and for individual users. These insights will help the business target customers with personalized promotions.

Conclusion

IBM EventStore offers a well-integrated system to ingest, persist, and analyze event data at scale. The system supports high-speed ingest while enabling high-performance analytics on both the most recent and historical event data. Delivered on Docker containers and Kubernetes, the system offers the ease of deployment and elastic scale associated with the cloud experience. Integrated with IBM Data Science Experience, EventStore enables data scientists to use the familiar notebook interface to explore, analyze, and visualize event data and to build ML models with it.

In summary, with IBM EventStore, enterprises can harness their event data with ease in on-premises environments, without needing to expose data to public cloud services.

Resources

The sample dataset and the notebooks shared in the blog are available here. A short video demo of the use case is available on YouTube. A free technology preview of IBM EventStore can be downloaded from https://www.ibm.com/us-en/marketplace/project-eventstore

Acknowledgements

Thanks to my colleagues Loic Julien, Adam Storm, David Thomason and Avijit Chatterjee for their key contributions in implementing the use case.