SAP HANA Text Analysis using Twitter Data

In this tutorial, we will do the following:
  1. Use the Twitter API to fetch tweets
  2. Save the tweets into an SAP HANA system over a JDBC connection
  3. Run HANA Text Analysis on the stored tweets
By the end of this tutorial, you will have learned about:
  • SAP HANA integration with Twitter
  • Programming against SAP HANA using JDBC in Java
  • SAP HANA Text Analysis

Prerequisites:

Register an Application at Twitter Developers:
Because we will use the Twitter API to extract data from Twitter, you must create an application at Twitter Developers. The application's authentication information is needed later to invoke the APIs.

If you have not used Twitter before, create a Twitter account first.
You can then register an application and create your OAuth tokens at Twitter Developers by following the steps below.
  1. Log on with your Twitter account, click your profile picture, and click "My applications".
  2. Click the button "Create a new application".
  3. Provide the requested information. You can choose any name and description.
  4. Follow the instructions and finally click "Create your Twitter application".
  5. Scroll down until you see the button "Create my access token" and click it to generate the token.
  6. You will now see the OAuth settings. Save the values of Consumer key, Consumer secret, Access token and Access token secret.

Download Twitter API Java library - Twitter4J
Twitter4J is an unofficial open source Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter services.
The link to download it is http://twitter4j.org/en/index.html

Download "twitter4j-3.0.5.zip" and save it. We will need it later.


Prepare the HANA JDBC library
To access SAP HANA from Java, we need the JDBC library ngdbc.jar, which you can find at
  • C:\Program Files\SAP\hdbclient\ngdbc.jar on Windows
  • /usr/sap/hdbclient/ngdbc.jar on Linux


Download Eclipse IDE for Java Developers
In this exercise, we will use Eclipse IDE for Java Developers to run the Java project.
You can either add the plugins to your HANA Studio or download the IDE directly from here.

Now we are ready! Let's fetch data from Twitter and save it in HANA.


Create a column table in HANA:

Before running the Java program, we need to create a table in HANA to store the tweets fetched from the Twitter services.
Copy the script below into the SQL editor and execute it.
Note: Replace <SCHEMA_NAME> with your own schema.

CREATE COLUMN TABLE <SCHEMA_NAME>.TWEETS(
      "ID" INTEGER NOT NULL,
      "USER_NAME" NVARCHAR(100),
      "CREATED_AT" DATE,
      "TEXT" NVARCHAR (140),
      "HASH_TAGS" NVARCHAR (100),
      PRIMARY KEY("ID")
);

CREATE SEQUENCE <SCHEMA_NAME>."TWEET_SEQUENCE"
     INCREMENT BY 1 START WITH 1 NO CYCLE;
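
To show how the table and the sequence from the script above work together, here is a minimal sketch of the INSERT statement a Java data access object could prepare. The class and method names are hypothetical and are not taken from the downloadable project; only the table, column and sequence names come from the script. The ID is drawn from TWEET_SEQUENCE, and the remaining four columns are left as JDBC parameters.

```java
// Sketch only: TweetInsertSql and its methods are hypothetical names.
// Table, column and sequence names come from the CREATE script above.
final class TweetInsertSql {

    // Returns a parameterized INSERT for <schema>.TWEETS.
    // The ID comes from TWEET_SEQUENCE; the other columns are bound later
    // with PreparedStatement.setString / setDate.
    static String build(String schema) {
        String s = quote(schema);
        return "INSERT INTO " + s + ".\"TWEETS\" "
             + "(\"ID\", \"USER_NAME\", \"CREATED_AT\", \"TEXT\", \"HASH_TAGS\") "
             + "VALUES (" + s + ".\"TWEET_SEQUENCE\".NEXTVAL, ?, ?, ?, ?)";
    }

    // Wraps an identifier in double quotes and doubles embedded quotes.
    // Quoted identifiers are case-sensitive, so pass the schema name exactly
    // as it is stored (upper case if it was created without quotes).
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // MY_SCHEMA is a placeholder schema name.
        System.out.println(build("MY_SCHEMA"));
    }
}
```

Using a parameterized statement rather than concatenating tweet text into the SQL keeps quotes and special characters in tweets from breaking the statement.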

Create and configure the Java program:

  1. Download the Java project "TwitterAnalysis.zip" from here and save it to your local computer.
  2. Open Eclipse and create a Java project called "TwitterAnalysis".
  3. Go to File -> Import and select "Archive File".
  4. Click Browse and select the "TwitterAnalysis.zip" file you downloaded in step 1. Click Finish.
  5. You will now see the imported project and its files.

Understanding the Java Project:

TwitterConnection.java
Builds the connection to the Twitter services

HDBConnection.java
Builds the JDBC connection to HANA

Configurations.java
The public interface that holds the network, HANA and Twitter authentication settings; override the values with your own account details and settings

Tweet.java
The Java bean class for the tweet objects

TweetDAO.java
The data access object

ngdbc.jar
SAP HANA JDBC library

twitter4j-core-3.0.3.ja
Twitter4j library for twitter services in java


Update the configurations

To keep the configuration easy to maintain, all required settings are kept in a single interface. You must update it with your own account details and settings before you can connect to either HANA or Twitter.

Open the file Configurations.java in your project. There are four categories of settings you can override:
Network Proxy Settings:
The proxy host and port. Set HAS_PROXY to false if you do not need a proxy.
To find the proxy host, open a command prompt and type "ping proxy". If your network resolves the host name "proxy", the output shows the proxy host.

HANA Connection Settings:
Replace the HANA URL with your own HANA host and port, and set the user, password and the schema where you created your table.

Twitter Authentication Settings:
Replace these with the authentication information of your own Twitter application, as described in the prerequisites.

Search Term:
We search Twitter for the term "HANA Training" to find out what people are saying about HANA training. You can replace it with your own term if you are interested in other topics.
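
As a rough illustration of the four categories, a configuration interface could look like the sketch below. Apart from HAS_PROXY and the search term, every constant name and value is an assumption made for this example; check the actual names in the Configurations.java file of the downloaded project. The helper class shows one common way to apply a proxy in Java, through the standard JVM networking properties. Whether the project uses these properties or configures Twitter4J directly is not shown in this tutorial.

```java
// Sketch only: all constant names except HAS_PROXY are hypothetical,
// and every value is a placeholder that you must replace.
interface ConfigurationsSketch {
    // 1. Network proxy settings
    boolean HAS_PROXY = false;
    String PROXY_HOST = "proxy.example.com";
    int PROXY_PORT = 8080;

    // 2. HANA connection settings
    String HANA_URL = "jdbc:sap://hana.example.com:30015";
    String HANA_USER = "YOUR_USER";
    String HANA_PASSWORD = "YOUR_PASSWORD";
    String HANA_SCHEMA = "YOUR_SCHEMA";

    // 3. Twitter authentication settings (from your Twitter application)
    String CONSUMER_KEY = "YOUR_CONSUMER_KEY";
    String CONSUMER_SECRET = "YOUR_CONSUMER_SECRET";
    String ACCESS_TOKEN = "YOUR_ACCESS_TOKEN";
    String ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET";

    // 4. Search term
    String SEARCH_TERM = "HANA Training";
}

// Hypothetical helper: applies the proxy through standard JVM properties.
final class ProxySetup {
    static void apply(boolean hasProxy, String host, int port) {
        if (!hasProxy) {
            return; // direct connection, nothing to configure
        }
        String portText = String.valueOf(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }
}
```

A call such as ProxySetup.apply(ConfigurationsSketch.HAS_PROXY, ConfigurationsSketch.PROXY_HOST, ConfigurationsSketch.PROXY_PORT) would then run once before any connection is opened.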


Test Connection to Twitter

Once the Twitter authentication settings from the previous step are correct, open TwitterConnection.java and run it.

You will see the message "Connection to Twitter Successfully!" followed by your Twitter user id in the console.

Test Connection to SAP HANA

Now open the file HDBConnection.java and run it.
You will see the message "Connection to HANA Successfully!" in the console.
Check Configurations.java if you encounter any issues.
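
For readers who want to see what such a connection check involves, here is a minimal sketch using plain JDBC. It is not the code of HDBConnection.java; the class and method names are hypothetical. It assumes the usual HANA JDBC URL form jdbc:sap://<host>:<port> and that ngdbc.jar is on the classpath at run time. The host, the port 30015 (the SQL port of instance 00) and the credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch only: HanaConnectionCheck is a hypothetical class.
final class HanaConnectionCheck {

    // Builds a HANA JDBC URL of the form jdbc:sap://<host>:<port>.
    static String jdbcUrl(String host, int port) {
        return "jdbc:sap://" + host + ":" + port;
    }

    // Opens a connection, validates it and closes it again.
    // Returns false if no driver is available or HANA rejects the login.
    static boolean canConnect(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(5); // allow up to 5 seconds for the check
        } catch (SQLException e) {
            System.err.println("Connection to HANA failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own settings.
        String url = jdbcUrl("hana.example.com", 30015);
        System.out.println(canConnect(url, "YOUR_USER", "YOUR_PASSWORD")
                ? "Connection to HANA Successfully!"
                : "Check your connection settings.");
    }
}
```

If the error message mentions that no suitable driver was found, ngdbc.jar is most likely missing from the project's build path.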

Invoke Twitter API and save the tweets into HANA:

Now it's time to do the real work. Open the file SearchTweets.java and run it. It searches for tweets matching the search term specified in Configurations.java and saves every result to the HANA table.
The console messages indicate that the tweets have been inserted into HANA successfully.

After that, you can run the data preview in HANA Studio to see the contents of the table TWEETS in your schema.
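
Before a tweet can be stored, its values have to fit the columns of the TWEETS table. The sketch below shows one possible way to prepare the USER_NAME, TEXT and HASH_TAGS values. It is an illustration under assumptions, not the logic of SearchTweets.java: the class is hypothetical, the hashtags are extracted with a simple regular expression as a stand-in for whatever the project reads from the Twitter API, and the comma-separated format of HASH_TAGS is a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TweetRowSketch is a hypothetical helper.
final class TweetRowSketch {
    // '#' followed by word characters; by default \w covers ASCII letters,
    // digits and underscore, so non-ASCII hashtags are cut short.
    private static final Pattern HASH_TAG = Pattern.compile("#(\\w+)");

    // Collects the hashtags in the tweet text, without the leading '#'.
    static List<String> hashTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = HASH_TAG.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group(1));
        }
        return tags;
    }

    // Cuts a value to the column width so the INSERT does not fail on length.
    static String fit(String value, int maxLength) {
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    // Returns the USER_NAME, TEXT and HASH_TAGS values for one row,
    // sized for NVARCHAR(100), NVARCHAR(140) and NVARCHAR(100).
    static String[] toRow(String userName, String text) {
        return new String[] {
            fit(userName, 100),
            fit(text, 140),
            fit(String.join(",", hashTags(text)), 100)
        };
    }
}
```

For example, toRow("alice", "Great #HANA training with #SAP") yields "HANA,SAP" as the third value.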


Run text analysis in HANA:

The tweets are now stored in the HANA table. Next, we run text analysis to see what people are saying about "HANA Training" on Twitter.

To run text analysis, we only need to create a full-text index on the table column we want to analyze. HANA then performs linguistic analysis, entity extraction and stemming for us and saves the results in a generated table, named $TA_ followed by the index name, in the same schema.
After that, you can build views on top of this table and use the existing analysis tools around HANA for visualization or even predictive analysis.

Copy the SQL statement below and execute it in the SQL console:
Note: Replace <SCHEMA_NAME> with your own schema.

CREATE FULLTEXT INDEX <SCHEMA_NAME>."TWEETS_FTI"
ON <SCHEMA_NAME>."TWEETS"("TEXT")
TEXT ANALYSIS ON
CONFIGURATION 'EXTRACTION_CORE';

You will see the generated table $TA_TWEETS_FTI under your schema.
If you do not see it, refresh the folder.


Text analysis is done! Yes, it was that simple.


Open the data preview of $TA_TWEETS_FTI and go to the Analysis tab.
Select the chart type "Other" - "Tag Cloud" to get a better view.
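
The tag cloud is essentially a count of how often each extracted token occurs. As a sketch, the same numbers could be read from Java with a query like the one built below. The class is hypothetical, and the column names TA_TOKEN and TA_TYPE are assumed to be the standard columns of the generated $TA table; verify them in the data preview before relying on this. Note that the table name must be quoted because it starts with "$".

```java
// Sketch only: TokenCountSql is a hypothetical helper.
final class TokenCountSql {

    // Builds a query that counts occurrences per extracted token and type
    // in the $TA table generated for the given full-text index.
    static String build(String schema, String indexName) {
        String table = quote(schema) + "." + quote("$TA_" + indexName);
        return "SELECT \"TA_TOKEN\", \"TA_TYPE\", COUNT(*) AS \"OCCURRENCES\" "
             + "FROM " + table + " "
             + "GROUP BY \"TA_TOKEN\", \"TA_TYPE\" "
             + "ORDER BY \"OCCURRENCES\" DESC";
    }

    // Wraps an identifier in double quotes and doubles embedded quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // MY_SCHEMA is a placeholder; TWEETS_FTI is the index created above.
        System.out.println(build("MY_SCHEMA", "TWEETS_FTI"));
    }
}
```

The resulting statement can be pasted into the SQL console or executed through a JDBC Statement on an open HANA connection.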


Reference: This example was taken from the SAP Startup Focus Program.
If you work at a startup and are interested in developing on top of SAP HANA, the in-memory database and application platform, you may want to check the SAP Startup Focus program for help.



13 thoughts on “SAP HANA Text Analysis using Twitter Data”

  1. Chinmay Padhi says:

    Hi

    While running TwitterConnection.java,i am getting error ‘Selection does not contain a main type’.
    *I did clean and build

    • Admin says:

      Hi Chinmay,
      This is probably because your project is not built properly. There is one work around which may work for you.
      Create a new project and copy the files manually to the new project. Then most probably this problem will be gone.

  2. Asif says:

    I’m having issues with this tutorial and the error message when trying to run TwitterConnection.java:
    message – SSL is required
    code – 92

    Anyone know how to resolve this?

  3. Dahmani says:

    Hi,

    I’m analysing Twitter Data with SAP HANA,I can work only with Tweets from users that I follow, can I search tweets with a keyword in all Twitter (not only from users that I follow)?

    • Admin says:

      Hello,

      You would find authorization issue while doing it, but yes if you have the complete privilege of performing this, you could.

      Thanks,
      Admin

  4. gayatri says:

    how to add ngdbc.jar in my hana studio in windows7

    • Admin says:

      Hello,

      Send us the screenshots at “admin@saphanatutorial.com” along with mentioning steps you were following.
      The screenshots would help us to understand better.

      Thanks,
      Admin

  5. gayatri says:

    i have ngdbc file but now when i run my project the selection does not contain main type error is occure

  6. priyanka says:

    when i run SearchTweets.java following error is occure
    Connection refused: connect
    Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=d35baff5 or
    http://www.google.co.jp/search?q=12c94143
    TwitterException{exceptionCode=[d35baff5-12c94143 43208640-465ee2e3], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.3}
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:192)
    at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
    at twitter4j.internal.http.HttpClientWrapper.get(HttpClientWrapper.java:81)
    at twitter4j.TwitterImpl.get(TwitterImpl.java:1835)
    at twitter4j.TwitterImpl.search(TwitterImpl.java:282)
    at com.saphanatutorial.search.SearchTweets.search(SearchTweets.java:34)
    at com.saphanatutorial.search.SearchTweets.main(SearchTweets.java:58)
    Caused by: java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient$1.run(Unknown Source)
    at sun.net.www.http.HttpClient$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.http.HttpClient.privilegedOpenServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.<init>(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at twitter4j.internal.http.HttpResponseImpl.<init>(HttpResponseImpl.java:34)
    at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:156)

  7. Anurag kulkarni says:

    when i try to create a table i’m getting error . i’m using trial landscape database . i have created a schema in hana cloud cockpit and able to access it through sap hana development with my trial account , but when i create a table i’m getting the following errror ??? please help

    Could not execute ‘CREATE COLUMN TABLE Twitter.TWEETS( “ID” INTEGER NOT NULL, “USER_NAME” NVARCHAR(100), “CREATED_AT” …’
    SAP DBTech JDBC: [362]: invalid schema name: STUDENTS: line 1 col 21 (at pos 20)
    Could not execute ‘CREATE SEQUENCE Twitter.”TWEET_SEQUENCE” INCREMENT BY 1 START WITH 1 NO CYCLE’
    SAP DBTech JDBC: [362]: invalid schema name: STUDENTS: line 1 col 17 (at pos 16)

 © 2017 : saphanatutorial.com, All rights reserved.  Privacy Policy