SapHanaTutorial.Com HOME     Learning-Materials Interview-Q&A Certifications Quiz Online-Courses Forum Jobs Trendz FAQs  
     Explore The World of Hana With Us     
About Us
Contact Us
 Apps
X
HANA App
>>>
Hadoop App
>>>
Tutorial App on SAP HANA
This app is an All-In-One package to provide everything to HANA Lovers.

It contains
1. Courses on SAP HANA - Basics, Modeling and Administration
2. Multiple Quizzes on Overview, Modelling, Architeture, and Administration
3. Most popular articles on SAP HANA
4. Series of Interview questions to brushup your HANA skills
Tutorial App on Hadoop
This app is an All-In-One package to provide everything to Hadoop Lovers.

It contains
1. Courses on Hadoop - Basics and Advanced
2. Multiple Quizzes on Basics, MapReduce and HDFS
3. Most popular articles on Hadoop
4. Series of Interview questions to brushup your skills
Apps
HANA App
Hadoop App
';
Search
Stay Connected
Search Topics
Topic Index
+
-
SAP HANA Overview
+
-
SAP HANA Architecture
+
-
SAP HANA Studio
+
-
Reporting in HANA
+
-
SAP HANA Text analysis
+
-
SAP BW on HANA
+
-
Miscellaneous
How to use the Fuzzy Search in SAP HANA



In this article we will talk about
    • What is Fuzzy Search?
    • Why Fuzzy Search is important?
    • Real Time Example of Fuzzy Search Based Applications.
    • How to Implement Fuzzy Search in SAP HANA?

What is Fuzzy Search?

Also known as approximate string matching.
Fuzzy search is the technique of finding strings that match a pattern approximately (rather than exactly).
It is a type of search that will find matches even when users misspell words or enter in only partial words for the search.
purpose:
With the help of Fuzzy Search Misspellings and typos still provide relevant results.

A Real World Example:
If a user types "SAP HANA Tutorl" into Yahoo or Google (both of which use fuzzy matching), a list of hits is returned along with the question, "Did you mean "SAP HANA Tutorial"?"

Fuzzy Search in SAP HANA:

In SAP HANA, you can call the fuzzy search by using the CONTAINS predicate with the FUZZY option in the WHERE clause of a SELECT statement.
Syntax:
SELECT * FROM <tablename>
WHERE CONTAINS (<column_name>, <search_string>, FUZZY (0.8))

A search with FUZZY(x) returns all values that have a fuzzy score greater than or equal to x.

The SCORE() Function
The fuzzy search algorithm calculates a fuzzy score for each string comparison. The higher the score, the more similar the strings are. A score of 1.0 means the strings are identical. A score of 0.0 means the strings have nothing in common.

You can request the score in the SELECT statement by using the SCORE() function.

You can sort the results of a query by score in descending order to get the best records first (the best record is the record that is most similar to the user input). When a fuzzy search of multiple columns is used in a SELECT statement, the score is returned as an average of the scores of all columns used.

So not only does it find a "fault tolerant" match, it also puts a score behind it.

Example:
When searching with 'SAP', a record like 'SAP AG' gets a high score, because the term 'SAP' exists in the texts. A record like "BSAP Corp" gets a lower score, because 'SAP' is only a part of the longer term 'BSAP Corp'.

Create the table and data:
-- REPLACE <Schema_Name> WITH YOUR SCHEMA NAME
CREATE COLUMN TABLE <Schema_Name>.COMPANIES(
               ID INTEGER PRIMARY KEY,
               COMPANY_NAME SHORTTEXT(200) FUZZY SEARCH INDEX ON);

INSERT INTO <Schema_Name>.COMPANIES VALUES (1, 'SAP');
INSERT INTO <Schema_Name>.COMPANIES VALUES (2, 'SAP in Walldorf');
INSERT INTO <Schema_Name>.COMPANIES VALUES (3, 'SAP AG');
INSERT INTO <Schema_Name>.COMPANIES VALUES (4, 'ASAP Corp');
INSERT INTO <Schema_Name>.COMPANIES VALUES (5, 'BSAP orp');
INSERT INTO <Schema_Name>.COMPANIES VALUES (6, 'IBM Corp');

Perform the search on one column:
SELECT SCORE() AS score, * FROM <Schema_Name>.COMPANIES
        WHERE CONTAINS(COMPANY_NAME,'SAP',
                FUZZY(0.7,'textSearch=compare,bestMatchingTokenWeight=0.7'))
        ORDER BY score DESC;

The output of fuzzy search contains 5 entries. Based on the fuzzy search factor (which is 0.7 in this case), it will also consider the similar words. In this case "SAP AG", "BSAP orp" etc.

SAP HANA Fuzzy Search

A Real Time Example of Fuzzy Search:

Use Case
A call center agent who receives an order by phone needs to know the customer number or, in the case of a new entry, the system has to inform him about a potentially duplicate entry.
There are chances that name can be misspelled or there can be different person with same name but different spellings. For example "Jimi Hendricks" can be misspelled as "Jimy Hendricks" or "Jimi Hendrix". Or the address can also be spelled differently. For example "Berliner Platz 43" or "Berliner Plats 43" or "Berliner Platz"

Without fuzzy search system can only find the exact match means the only entries that are 100% identical. But with fuzzy search system can find the misspelled words too.

Create table and some data:
-- REPLACE <Schema_Name> WITH YOUR SCHEMA NAME
create column table <Schema_Name>."CUSTOMERS"(
                "CUSTOMER_ID" VARCHAR (5) not null default '',
                "FIRST_NAME" VARCHAR (20) null default '',
                "LAST_NAME" VARCHAR (20) null default '',
                "STREET" VARCHAR (20) null default '',
                "CITY" VARCHAR (20) null default '',
                "COUNTRY" VARCHAR (20) null default '',
                "POSTAL_CODE" VARCHAR (20) null default '',
                primary key ("CUSTOMER_ID"));

insert into <Schema_Name>."CUSTOMERS" values('00001','Jimi','Hendricks','Berliner Platz 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00002','Jimy','Hendricks','Berlinr Platz 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00003','Jimi','Hendrix','Berliner Plats 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00004','Jimy','Feuer','Berliner','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00006','Sven','Ottlieb','Walserweg 21','Aachen','Germany','52066');
insert into <Schema_Name>."CUSTOMERS" values('00007','Philip','Cramer','Maubelstr. 90','Brandenburg','Germany','14776');
insert into <Schema_Name>."CUSTOMERS" values('00008','Renate','Messner','Magazinweg 7','Frankfurt','Germany','60528');
insert into <Schema_Name>."CUSTOMERS" values('00009','Alexander','Feuer','Heerstr. 22','Leipzig','Germany','04179');
insert into <Schema_Name>."CUSTOMERS" values('00010','Antonio','Moreno','Mataderos 2312','Mexico','Mexico','05023');
insert into <Schema_Name>."CUSTOMERS" values('00011','Thomas','Hardy','120 Hanover','London','UK','WA1 1DP');
insert into <Schema_Name>."CUSTOMERS" values('00012','Christina','Berglund','Berguvsvagen 8','Lulea','Sweden','S-958 22');


Without Fuzzy Search:
Suppose you want to search a customer with name "Jimi".
SQL Query:
SELECT * FROM <Schema_Name>."CUSTOMERS"
                WHERE CONTAINS(FIRST_NAME, 'Jimi')
                ORDER BY "CUSTOMER_ID" DESC;

The output will contain only one entry which contains exact match of "Jimi".
SAP HANA Fuzzy Search

Now let us try the fuzzy search function.
SQL Query:
SELECT SCORE() AS score, * FROM <Schema_Name>."CUSTOMERS"
                WHERE
               CONTAINS(FIRST_NAME, 'Jimi', FUZZY(0.7))
               ORDER BY score DESC;

The output of fuzzy search contains 4 entries. Based on the fuzzy search factor (which is 0.7 in this case), it will also consider the similar words. In this case "Jimy".
SAP HANA Fuzzy Search
We can also do fuzzy search on 2 columns. For example First Name and Last Name.
SQL Query:
SELECT SCORE() AS score, * FROM <Schema_Name>."CUSTOMERS"
                WHERE
                          CONTAINS(FIRST_NAME, 'Jimi', FUZZY(0.7))
                          and CONTAINS(LAST_NAME, 'Hendricks', FUZZY(0.7))
                ORDER BY score DESC;

The output contains 3 entries. Based on the fuzzy search factor (which is 0.7 in this case), it will also consider the similar names. In this case "Jimy Hendricks" and "Jimi Hendrix".
SAP HANA Fuzzy Search





Support us by sharing this article.



Explore More
Close X
Close X

16 thoughts on “SAP HANA Fuzzy Search

  1. Mahesh says:

    You are extraordinary.

    You published great documents.

  2. manjeet says:

    GREAT WORK!!

  3. keerthipati sudheer says:

    Excellent !!!! Great explanation.

  4. Nachappa says:

    The explanation of the complicated concepts are provided in a simple way! Thanks.

  5. Akash says:

    One of the very simplified and structured way of explaining FUZZY logic…
    great work mate..
    ::)

  6. tarak says:

    Great work,thank you!

  7. vipin saini says:

    Excellent !

  8. Kasiap says:

    Keep up the good work,great job @admin

  9. Amit says:

    I would like to thank you for your great work !!..
    I have a query in above article can you please explain the difference between lines FUZZY(0.7,’textSearch=compare,bestMatchingTokenWeight=0.7′)) and FUZZY(0.7)) because both gives different result in score column.

  10. Rahul says:

    Thank you for the post!!

  11. Jonathan says:

    Great. Thanks for the tutorial. Awesome job!

Leave a Reply

Your email address will not be published. Required fields are marked *

Current day month ye@r *

 © 2017 : saphanatutorial.com, All rights reserved.  Privacy Policy