First, we need to load the raw weblog data, together with the customer and product data, into Hadoop HDFS. The data set combines user data in multiple formats from several sources:
- Omniture logs:
Website log files containing information such as URL, timestamps, IP address, geo-coded IP address, and user ID (SWID).
- User logs:
CRM user data listing SWIDs (Software User IDs) along with date of birth and gender.
- Product logs:
CMS data that maps product categories to website URLs.
Download the sample data from the links below.
Upload the files through the File Browser in the same way you uploaded the CSV file in Chapter 3.
After this, you should be able to see all three gzip-compressed data files in the File Browser: Omniture.0.tsv.gz, users.tsv.gz, and products.tsv.gz.
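If you want to confirm that the downloads are intact before uploading them, a quick local check can help. The following is a minimal Python sketch, not part of the original walkthrough: the helper name preview_tsv_gz is hypothetical, and it assumes the three files sit in the current working directory and are gzip-compressed, tab-separated text. It only prints the first few rows of each file and makes no assumption about the column layout.

```python
import csv
import gzip
import os
from itertools import islice


def preview_tsv_gz(path, n_rows=3):
    """Return the first n_rows of a gzip-compressed, tab-separated file.

    Hypothetical helper for a local sanity check; each row is a list of strings.
    """
    # errors="replace" keeps the preview going if a log line has odd bytes.
    with gzip.open(path, mode="rt", encoding="utf-8",
                   errors="replace", newline="") as handle:
        # QUOTE_NONE treats quote characters in the logs as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(islice(reader, n_rows))


if __name__ == "__main__":
    # File names as listed in this chapter; the local directory is an assumption.
    for name in ("Omniture.0.tsv.gz", "users.tsv.gz", "products.tsv.gz"):
        if not os.path.exists(name):
            print(f"{name}: not found in the current directory")
            continue
        rows = preview_tsv_gz(name)
        print(f"{name}: first row has {len(rows[0]) if rows else 0} fields")
```

A file that fails to open here is probably a truncated or corrupted download, and it is cheaper to catch that locally than after the upload to HDFS.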