Database.ca SQL Server Blog

My Blog

Monday, March 10 2014

Keywords: find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

Find # of times a substring is found within a string, and differences between len() and datalength()

You could actually see this in action if you search my site for something...I sort the search results first by descending order of occurrence of the sought word...and then by descending publication date...so between the blogs with matching occurrences, the newer ones would show first.

 

Searching the web for such script...you will run into some identical scripts as displayed below in many sites...be careful before using them however...they do have various bugs or shortcomings.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

In fact...start searching for a space character ' ' and soon you will encounter one of its problems!

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

In T-SQL, if you do len(' ') with a space character in between the two single quotes...you don't get 1! You get back a zero! I have seen an corrected many bugs caused by this fact.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

The solution to this shortcoming would be the use of datalength() function. it returns the correct number of space characters contained in a string. But wait...len() has another limitation as well! It ignores trailing zeros too! But again datalength() function returns the correct value in both cases.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

Let's also do a quick test with Unicode values.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

So as strange it might look to some...in T-SQL ' ' is equal to ''...a string comprised of 10 space characters is actually equivalent to an empty string...but they are NOT equivalent to a NULL.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

So let's go ahead and make some changes to that common script...and make it handle spaces in the search words, trailing spaces in the long strings, as well as Unicode!

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

But that script still has some issues...per example it doesn't behave well when either of the two strings (the long or the short) is a NULL. So I went ahead and wrote myself a function that wraps all the above notions in it...and it handles the NULLs as well.

 

find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences

You could download and use the script for my function down below.

 

Written by Ramin Haghighat