This week I have a lot of good challenges in SQL as we are building an SSIS package to unify data from different systems like Solomon, Raisers Edge, Goldmine, etc. A part of that SSIS Package is to cleanse the data and the toughest to cleanse is the addresses (which means putting the correct data on the correct fields like Post Code and Suburbs). Now I was quite challenged in one scenario where I need to extract Suburb field which I need to get that data form a consolidated address field (everything in there) an I want to do it in one Select statement!
So what are my components? I have a table of correct suburbs and a table of unclean addresses.
So how would I extract the correct suburb in one Select? Initially what I was doing was
SELECT a.CompleteAddress, s.Suburb FROM tblCrap_Addresses a INNER JOIN tblLookup_Suburb s ON a.CompleteAddress LIKE (SELECT TOP 1 '%' + s.Suburb + '%' )
But there’s a problem with this one as if I have an address like 999 X Street, Mount Albert and in my suburbs I have Mount, Mount Al, and Mount Albert I will definitely get a wrong join on the suburb and will get the Mount as the suburb instead of Mt Albert. I tried a lot of stuff like indexing the Suburb table to the length of Mt Albert so it gets the longest length DESC so it joins on the longest string first but still does not work, Its also impossible to put a condition on the LIKE while doing a JOIN. Then I suddenly remember the APPLY Operator.
So what is an APPLY Operator? According to Technet.
“The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.”
So my new command now is:
SELECT a.CompleteAddress s1.Suburb FROM tblCrap_Address a OUTER APPLY (SELECT TOP 1 s2.Suburb FROM tblLookup_Suburb s2 where a.CompleteAddress LIKE '%' + s2.Suburb + '%' ORDER BY LEN(s2.Suburb) DESC) s1