Competition in the eCommerce market is fierce. With numerous brands racing to win more shelf space and market share, e-retailers must do everything they can to win over customers and stand out from the crowd. One of the best ways to do so, they believe, and rightly, is to provide consumers with a personalized shopping experience.
Tailoring such an experience requires collecting and managing a huge amount of data to glean customer insights. The sheer volume of clickstream and transactional data is what makes data collection, cleansing, transformation, and management a challenge for the eCommerce domain. This article sets out a few of the data collection, data cleansing, and data processing challenges in the eCommerce industry.
Data collection challenges in the eCommerce domain
1. User sessions and storage requirements
Large eCommerce websites generate 5-10 million page views on an average day, and that is at the lower end. The figure tends to nearly double during festive seasons. Logging these user sessions on backend servers, while cost-effectively managing the storage needed for such a data load, is one of the primary challenges. Many eCommerce players have lately resorted to sampling at the source, that is, recording only a fraction of the clickstream. Sampled clickstream collection has proved effective in addressing both the logging and the storage issue.
This remedy, too, has some hidden challenges. Sampled data cannot accurately capture rare events, such as a customer searching for a particular term, or a credit card authorization failure and its reasons. In addition, the exact statistics needed to pay for advertising click-through referrals are not readily available.
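One way to reconcile sampling with the need for exact counts of rare events is to sample by session while always logging a short list of critical event types. The sketch below is a minimal illustration under stated assumptions, not a description of any particular retailer's pipeline: the 10% rate, the event type names, and the `session_id` and `type` fields are all hypothetical.

```python
import hashlib

SAMPLE_RATE = 0.10  # assumption: keep roughly 10% of sessions

# Hypothetical event types that must never be dropped by sampling.
ALWAYS_KEEP = {"search", "payment_auth_failure", "ad_click_through"}


def session_in_sample(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically map a session id to a bucket in [0, 1) and compare to the rate.

    Hashing the session id (rather than drawing a random number per event)
    keeps every event of a sampled session together.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def should_log(event: dict, rate: float = SAMPLE_RATE) -> bool:
    """Log critical events unconditionally; sample everything else by session."""
    if event["type"] in ALWAYS_KEEP:
        return True
    return session_in_sample(event["session_id"], rate)
```

With this split, page-view volume is cut to roughly the sampling rate, while searches, authorization failures, and ad click-throughs are still recorded in full.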
2. Ever-changing customer demographics
People get married, their children grow up, salaries change, home and office addresses change, and so on. Along with these changes, customers’ needs change too. They do not remain as they were modeled at the time of strategic planning. How should eCommerce companies keep track of these changes? What these e-retailers need are data collection experts who not only do in-depth web research to collect data but also cleanse, validate, and classify that data at regular intervals.
3. Data mining algorithms
To make searching and product filtering convenient, eCommerce players tag the products in their inventory with attributes like color, size, and weight. These attributes are very useful for data mining, as they can be used to find generalizations and patterns in user behavior.
Not every attribute applies to every class of product, however. Size is fine for clothes and shoes, but what about books? Books would have a “Null” value for the size attribute, where “Null” means not applicable rather than unknown. This makes it necessary to treat such values differently, with the help of metadata. Not many data mining algorithms can correctly accommodate this subtle difference.
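The distinction between "not applicable" and "unknown" can be made explicit before the data reaches a mining algorithm. The following is a minimal sketch, assuming a hypothetical metadata table (`APPLICABLE_ATTRIBUTES`) that records which attributes apply to which product category; the category names and the product dictionary layout are illustrative only.

```python
from enum import Enum


class Missing(Enum):
    NOT_APPLICABLE = "not_applicable"  # the attribute has no meaning for this product
    UNKNOWN = "unknown"                # the attribute applies, but no value was recorded


# Hypothetical metadata: attributes that are meaningful for each category.
APPLICABLE_ATTRIBUTES = {
    "shoes": {"color", "size", "weight"},
    "books": {"weight"},
}


def resolve_attribute(product: dict, attribute: str):
    """Return the attribute value, or a marker saying why it is missing."""
    value = product.get(attribute)
    if value is not None:
        return value
    if attribute in APPLICABLE_ATTRIBUTES.get(product["category"], set()):
        return Missing.UNKNOWN
    return Missing.NOT_APPLICABLE
```

For example, `resolve_attribute({"category": "books"}, "size")` yields `Missing.NOT_APPLICABLE`, whereas a shoe record with no size yields `Missing.UNKNOWN`, so downstream code can handle the two cases differently instead of seeing the same null.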
Data cleansing challenges in the eCommerce domain
1. Inaccurate clickstream measurement
Bots, also known as spiders and crawlers, are programmed to visit websites automatically. Depending on the kind of traffic generated by web search engines (like Google), site monitoring software, and price and email harvesters, bots can dramatically change clickstream patterns at a website and skew clickstream statistics.
Page views with bot visits included are 1.5 to 2 times the average page views with bot visits excluded. E-commerce websites receive anywhere between 5% and 40% of their traffic from bots and other automated sources. Many of these bots do not identify themselves as bots and pretend to be real visitors, creating hurdles for clickstream analytics. Applying heuristics and manual labeling on a continuous basis is the only way eCommerce sites can keep bot visitors from skewing clickstream analysis.
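A heuristic filter of the kind described above might combine a user-agent check with a behavioral check. This is only a sketch under assumptions: the `Session` fields, the keyword list, and the threshold of 60 pages per minute are hypothetical and would need tuning against manually labeled traffic. The user-agent check catches only crawlers that announce themselves; the rate check is aimed at those that pretend to be real visitors.

```python
from dataclasses import dataclass

BOT_UA_HINTS = ("bot", "spider", "crawler")  # substrings seen in self-identifying crawlers
MAX_PAGES_PER_MINUTE = 60.0                  # assumption: faster than a human would browse


@dataclass
class Session:
    user_agent: str
    page_views: int
    duration_seconds: float


def looks_like_bot(session: Session) -> bool:
    """Flag a session if its user agent or its request rate looks automated."""
    user_agent = session.user_agent.lower()
    if any(hint in user_agent for hint in BOT_UA_HINTS):
        return True
    # Treat very short sessions as one second long to avoid dividing by zero.
    minutes = max(session.duration_seconds, 1.0) / 60.0
    return session.page_views / minutes > MAX_PAGES_PER_MINUTE


def human_sessions(sessions):
    """Keep only the sessions that no heuristic flagged."""
    return [s for s in sessions if not looks_like_bot(s)]
```

Sessions flagged this way are candidates for manual labeling rather than a final verdict, since any fixed threshold will misclassify some traffic.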
2. De-duplicating customer accounts regularly
Use of transactional systems is increasing, even though preventing duplicate customer records in them remains a challenge. Several businesses have customers with multiple accounts, and consolidating all the activity of one customer into a single record is a recurring pain.
That is the challenge on the customer side. A related challenge arises when the same kiosk or workstation is used by multiple people to log on to a website. If the website or platform is not equipped with tracking parameters that uniquely identify each individual, the clickstream analysis of one user could be contaminated with information from several other users. Raw data collected from eCommerce sites requires not only de-duplication but also comprehensive data cleansing, including standardization, validation, and correction, to maximize its integrity and value and improve its quality.
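As a simple illustration of consolidating multiple accounts, records can be grouped on a standardized key. The sketch below assumes each account record carries hypothetical `account_id` and `email` fields and uses the normalized email address as the matching key; real de-duplication typically also compares names, phone numbers, and addresses.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Standardize an email address so trivially different spellings compare equal."""
    return email.strip().lower()


def group_accounts(accounts):
    """Map each normalized email to the list of account ids that share it."""
    groups = defaultdict(list)
    for account in accounts:
        groups[normalize_email(account["email"])].append(account["account_id"])
    return dict(groups)


def duplicate_groups(accounts):
    """Return only the groups that contain more than one account."""
    return {key: ids for key, ids in group_accounts(accounts).items() if len(ids) > 1}
```

Any key that maps to more than one account id marks a set of records to review and merge into a single customer record.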
Data processing challenges in the eCommerce domain
1. Algorithms do not scale to large hierarchies
In practice, support for hierarchical attributes is mandatory. Yet algorithms, even those designed with the latest strategies, fail to scale to large hierarchies. Automating the efficient use of hierarchies is still a challenge, and the eCommerce domain is struggling with it.
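To make "hierarchical attributes" concrete, consider a product category tree in which activity on a leaf category should also count toward every ancestor. The sketch below is a toy illustration only: the `PARENT` table and category names are hypothetical, and it does not attempt to solve the scaling problem described above.

```python
from collections import Counter

# Hypothetical category hierarchy, stored as child -> parent links.
PARENT = {
    "running shoes": "shoes",
    "shoes": "footwear",
    "sandals": "footwear",
}


def ancestors(category: str) -> list:
    """Return the category followed by each of its ancestors up to the root."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def roll_up(views: dict) -> Counter:
    """Add each category's view count to itself and to all of its ancestors."""
    totals = Counter()
    for category, count in views.items():
        for level in ancestors(category):
            totals[level] += count
    return totals
```

Each record here touches every level above it, so the work grows with the depth of the tree; with catalogues of many thousands of categories, that overhead is one reason hierarchy-aware processing becomes expensive.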
2. Drive the average order value up
E-commerce websites cannot resist promoting “related products” to drive the average order value up; it is one of the most common strategies they put to work. Data processing algorithms use customer journey analytics to map the purchasing behavior of every customer and identify purchasing patterns. To attain the desired results, they extrapolate from the purchasing history of other customers who made similar purchases. This is where human intervention, the instinct of data scientists, can help predict more precisely the kind of products shoppers may find interesting.
The aforementioned challenges should be addressed to gain an edge over the competition. Resolving them will help both the service providers and the customers in several ways. A growing number of eCommerce websites need accurate data management processes, spanning web research for data collection, data entry, data processing, categorization, and validation. Data management and analytics experts assisting these websites should be equipped with deep industry knowledge and scalable operations to deliver data management solutions that answer real business challenges.
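Extrapolating from other customers' purchase histories can be sketched, in its simplest form, as counting which products appear together in the same order. This is a minimal co-purchase counter, not the method any specific site uses; the function names and the representation of an order as a list of product names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how many orders contain each unordered pair of products."""
    counts = Counter()
    for order in orders:
        # sorted(set(...)) removes repeats and gives each pair one canonical order.
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts


def related_products(product, counts, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]
```

Given the orders `["phone", "case"]`, `["phone", "case", "charger"]`, and `["charger", "cable"]`, the products related to `"phone"` come out as `["case", "charger"]`. Raw counts like these favor popular items, which is one place where a data scientist's judgment comes in.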