SapHanaTutorial.Com HOME     Learning-Materials Interview-Q&A Certifications Quiz Online-Courses Forum Jobs Trendz FAQs  
     Explore The World of Hana With Us     
About Us
Contact Us
 Apps
X
HANA App
>>>
Hadoop App
>>>
Tutorial App on SAP HANA
This app is an All-In-One package to provide everything to HANA Lovers.

It contains
1. Courses on SAP HANA - Basics, Modeling and Administration
2. Multiple Quizzes on Overview, Modelling, Architeture, and Administration
3. Most popular articles on SAP HANA
4. Series of Interview questions to brushup your HANA skills
Tutorial App on Hadoop
This app is an All-In-One package to provide everything to Hadoop Lovers.

It contains
1. Courses on Hadoop - Basics and Advanced
2. Multiple Quizzes on Basics, MapReduce and HDFS
3. Most popular articles on Hadoop
4. Series of Interview questions to brushup your skills
Apps
HANA App
Hadoop App
';
Search
Stay Connected
Search Topics
Topic Index
+
-
SAP HANA Overview
+
-
SAP HANA Architecture
+
-
SAP HANA Studio
+
-
Reporting in HANA
+
-
SAP HANA Text analysis
+
-
SAP BW on HANA
+
-
Miscellaneous
Implement SAP HANA Text Analysis in 10 minutes



In SAP HANA Text Analysis - One of the coolest features of SAP HANA we explained what is Text Analysis and why it is so important for business now-a-days.
In this article we will show you how you can easily implement Text Analysis in SAP HANA.

Use-case:

Suppose I am planning to buy a new iPhone 5 and I want to know the review of this over internet. I wanted to get a pulse of the iPhone 5 before I buy it not just from the critics but actual users like me. I also want to search the blogs, news and social media to find out whether people's review are positive, negative or neutral.
Lets see how we can do this with the help of SAP HANA Text Analysis.

Prerequisites:

Download unstructured data (iPhone-News.pdf)
To save time, I have created a pdf file which contains news and blog articles on iPhone 5. Download this from here .

Create Table in SAP HANA
Create a table in SAP HANA which will contain this unstructured data. Replace <SCHEMA_NAME> with your schema.
CREATE COLUMN TABLE <SCHEMA_NAME>."IPHONE_NEWS" (
          "File_Name" NVARCHAR(20),
          "File_Content" BLOB ,
          PRIMARY KEY ("File_Name"));

Upload pdf file to SAP HANA using Python
Use the below Python code to upload pdf file to SAP HANA.

Note: Check below article to configure Python before running the Python code.
Power of Python Integrated with SAP HANA

import dbapi
# assume HANA host id is abcd1234 and instance no is 00
# and SAP HANA user id is USER1 and password is Password1
conn = dbapi.connect('abcd1234', 30015, 'USER1', 'Password1')

#Check if database connection was successful or not
print conn.isconnected()

#Open a cursor
cur = conn.cursor()
#Open file in read-only and binary
file = open('iPhone-News.pdf', 'rb')
#Save the content of the file in a variable
content = file.read()

#Save the content to the table - Replace SCHEMA1 with your schema
cur.execute("INSERT INTO SCHEMA1.IPHONE_NEWS VALUES(?,?)", ('iPhone-News.pdf',content))
print 'pdf file uploaded to HANA'

#Close the file
file.close()
#Close the cursor
cur.close()
#Close the connection
conn.close()

After executing the above Python script the pdf data will be uploaded in HANA table.

Implement Text Analysis in SAP HANA:

The most impressive thing about Text Analysis is how easy it is to implement it.
The only thing we need to do is run the following statement:

Create FullText Index "PDF_FTI" On <SCHEMA_NAME>."IPHONE_NEWS"("File_Content")
TEXT ANALYSIS ON
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER';

This will create a full text index called "PDF_FTI" (you can use any name) on the BLOB column "File_Content" of the table "IPHONE_NEWS".
With the execution of this script a new column table is created called $TA_PDF_FTI ($TA_<Index_Name>) that contains the result of our Text Analysis Process.

Note: If you do not see this table under your schema, try to refresh that.

SAP HANA Text Analysis

That's it. Yes, Text Analysis is implemented. Rest everything is done by SAP HANA.

Further Analysis:

2 columns of the table $TA_PDF_FTI is very important for us.
TA_TOKEN
This column contains the extracted entity or element (for example, an identifiable person, place, topic, organization, or sentiment).
TA_TYPE
This is the category the entity falls under. For example PERSON, PLACE, PRODUCT etc.

To know people's review and sentiments about iPhone, we can query the table $TA_PDF_FTI like this.

SELECT "TA_TYPE", ROUND("SENTIMENT_VALUE"/ "TOTAL_SENTIMENT_VALAUE" * 100,2) AS "SENTIMENT_VALAUE_PERCENTAGE"
FROM
(
SELECT "TA_TYPE", SUM("TA_COUNTER") AS "SENTIMENT_VALUE"
     FROM <SCHEMA_NAME>."$TA_PDF_FTI"
    where TA_TYPE in('WeakPositiveSentiment','StrongPositiveSentiment','NeutralSentiment',
       'WeakNegativeSentiment','StrongNegativeSentiment','MajorProblem','MinorProblem')
    GROUP BY "TA_TYPE"
) AS TABLE1,

(
SELECT SUM("TA_COUNTER") AS "TOTAL_SENTIMENT_VALAUE"
    FROM <SCHEMA_NAME>."$TA_PDF_FTI"
    where TA_TYPE in('WeakPositiveSentiment','StrongPositiveSentiment','NeutralSentiment',
       'WeakNegativeSentiment','StrongNegativeSentiment','MajorProblem','MinorProblem')
) AS TABLE2

You will get the output like this.
SAP HANA Text Analysis

The result shows that more percentage of people are giving positive review of this product.
Good, now i can go ahead and buy my new iPhone 5.

What's Next:

We can use this full text index table to get a lot of information other than just sentiments.
Lets take a look into the structure of this table.
Column Name Key Description Data Type
File_Name Yes This is the primary key of my table. If you have more than one column in your primary key, the $TA table will include every single column Same as in source table. In this case: NVARCHAR(20)
RULE Yes Stores the rule package that yielded the token. In my case: "Entity Extraction" NVARCHAR(200)
COUNTER Yes Counts all tokens across the document BIGINT
TOKEN No The token that was extracted (the "who", "what", "where", "when" and "how much") NVARCHAR(250)
LANGUAGE No You can either specify a language column when you create the fulltext index or it can be derived from the text. In my case it was derived from the text and is English (en) NVARCHAR(2)
TYPE No The Token Type, whether it is a "who", a "what", a "where", etc. NVARCHAR(100)
NORMALIZED No Stores a normalized representation of the token. This becomes relevant e.g. for German with umlauts, or ./ss. Normalization with regards to capitalization would not be as important as to justify this column. NVARCHAR(250)
STEM No Stores the linguistic stemming information, e.g. the singular nominative for nouns, or the indicative for verbs. If text analysis yields several stems, only the first stem will be stored, assuming this to be the best match. NVARCHAR(300)
PARAGRAPH No The paragraph number where my token is located in the document INTEGER
SENTENCE No The sentence number where my token is located in the document INTEGER
CREATED_AT No Creation timestamp TIMESTAMP


Hope you liked this article. If you have any question please leave a comment.
Continue reading:





Support us by sharing this article.



Explore More
Close X
Close X

5 thoughts on “Implement SAP HANA Text Analysis in 10 minutes

  1. RichardLi says:

    Hi,
    Thanks for sharing this nice and neat piece for Text Analysis in HANA.
    I noticed there is this parameter ‘EXTRACTION_CORE_VOICEOFCUSTOMER’ in below SQL script which seems to be the key on how to analysis the given text content. Is pre-defined by HANA? Otherwise can you please go a little bit detail into how & where to build this parameter?

    Create FullText Index “PDF_FTI” On .”IPHONE_NEWS”(“File_Content”)
    TEXT ANALYSIS ON
    CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’;

    Thanks & Regards,
    Richard

  2. sipun says:

    Is there any alternative to upload file not using python?

  3. Kuldeep says:

    Hi,
    I have resume paragraph, when an applicant has mentioned some technical/domain skill names. Now from that text we need to fetch, what employee knows and is s/he suitable for job description or not. Resume paragraph looks like as below. How we can use HANA to get relevant information
    “20 Years of IT Experience covering all the phases of software development life cycle, with experience ranging from Business analysis,Design, Development , Deployment and post production support in the areas of Web based enterprise applications. I worked around 6 yrs at onsite for major Clients like x, y, z in US & UK. I have extensive experience in onsite coordination, business requirements gathering and subsequent conversion into tangible software deliveries. I have significant hands on experience in Building Enterprise Web based Applications, Client – Server technology(n-tier architecture), requirements, analysis to design, development, and implementation of software applications. Worked and managed projects of different size through onsite-offshore model. I have managed team and have analytical and domain skills in People Supply, Logistics,Finance, Human Resources, Logistics, Physical Security domains. I have worked extensively on Java and oracle based projects.”

  4. LucaTo says:

    Amazing, very well done and really useful. Congratulation.

  5. Aditya says:

    Hello,

    Amazing Info.. but one question can we create the index table using configuration file in some other way other than SQL Console.?

    Thanks & Regards,
    Aditya

Leave a Reply

Your email address will not be published. Required fields are marked *

Current day month ye@r *

 © 2017 : saphanatutorial.com, All rights reserved.  Privacy Policy