Monday, March 10 2014
Keywords: find a string within another, number of occurrences, sql server, t-sql, len() string function, datalength() function differences
Find # of times a substring is found within a string, and differences between len() and datalength()
You could actually see this in action if you search my site for something...I sort the search results first by descending order of occurrence of the sought word...and then by descending publication date...so between the blogs with matching occurrences, the newer ones would show first.
Searching the web for such script...you will run into some identical scripts as displayed below in many sites...be careful before using them however...they do have various bugs or shortcomings.
In fact...start searching for a space character ' ' and soon you will encounter one of its problems!
In T-SQL, if you do len(' ') with a space character in between the two single quotes...you don't get 1! You get back a zero! I have seen an corrected many bugs caused by this fact.
The solution to this shortcoming would be the use of datalength() function. it returns the correct number of space characters contained in a string. But wait...len() has another limitation as well! It ignores trailing zeros too! But again datalength() function returns the correct value in both cases.
Let's also do a quick test with Unicode values.
So as strange it might look to some...in T-SQL ' ' is equal to ''...a string comprised of 10 space characters is actually equivalent to an empty string...but they are NOT equivalent to a NULL.
So let's go ahead and make some changes to that common script...and make it handle spaces in the search words, trailing spaces in the long strings, as well as Unicode!
But that script still has some issues...per example it doesn't behave well when either of the two strings (the long or the short) is a NULL. So I went ahead and wrote myself a function that wraps all the above notions in it...and it handles the NULLs as well.
You could download and use the script for my function down below.
Written by Ramin Haghighat