How to find a certain string on all columns on all tables on all of your databases using TSQL

By | March 26, 2015

You might be in a scenario where you were assigned to find a certain text on all the records on your database.  Well you might think that’s easy, well yes if you have only one database with few tables and records but what if you have a large database with tons of tables then its quite next to impossible if you do that manually.

In real world application for this might be something like finding Credit Card Numbers in all columns on all tables on all of your databases because these numbers should never ever be stored in the database.  Now think of how you can do this and contemplate on that thought, see how you would want to achieve that task.


Let’s see.

Lets start from the smallest unit of operation, lets begin with selecting that record with Credit Card information from a query.

You might have thought of using the LIKE operator because using this operation would be the simplest and fastest way, you just need some wildcards(%) and square brackets([]) to get your result.   If you are heading this direction you might do a Select Query like this.

SELECT Something FROM YourTable where Something like '%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'

Well this is good but not good enough as wildcards and brackets while it can be powerful it will be really difficult to write a code by using only SQL Server inbuilt string functions.  For example SQL Pattern Matching does not support Grouping Constructs or Word Boundaries, so you end up making simple ones like the one above which show results where there are at least 12 digits in that column called Something, this means even Phone Numbers and Other Numbers with length of 12 will also be displayed.   If you want to further level it up to grab only Credit Card Number you will end up making multiple ‘OR‘ for all possibilities.

Have you ever thought of using regular expressions? Well this is our answer but complex Regular Expressions cannot be used on the LIKE operator.  So a regular expression to get Credit Card numbers like this

\b(?:4[0-9]{12}(?:[0-9]{3})?|5[12345][0-9]{14}|3[47][0-9]{13}|3(?:0[012345]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35[0-9]{3})[0-9]{11})\b

cannot be used on the Select statement above, try it to see the result for yourself.  BTW that’s the most effective Regular Expression I used for checking credit cards thanks to this site, it captures most of the numbers used in the Luhn algorithm, meaning cards for Visa, MasterCard, Discover, American Express, Diner’s Club and JCB are properly validated.


Now how would I use that regular expression? You have two options first one is by CLR Integration, meaning using Visual Studio to code the Function and use native Regular Expression class so if you are not a programmer and don’t know how to deploy CRL Stored Procedures in SQL then it will be tricky for you.  This means you are left with the only option, accessing a COM Object to use Regular Expression via TSQL.

Yes you can do that! by using  sp_OACreate and importing the ‘VBScript.RegExp’ COM Object but before we start we need to activate Ole Automation Procedures in SQL Server by running the script below.

sp_configure 'Show Advanced Options', 1
GO 
RECONFIGURE; 
GO 
sp_configure 'Ole Automation Procedures', 1
GO 
RECONFIGURE; 
GO 
sp_configure 'Show Advanced Options', 1
GO 
RECONFIGURE;

Now that you activated it lets start coding.

This first part is to validate credit card numbers using TSQL

DECLARE @InputParameter nvarchar(4000) = '4123456789012' --Sample Valid Credit Card Number 
DECLARE @RegularExpression VARCHAR(4000) = '\b(?:4[0-9]{12}(?:[0-9]{3})?|5[12345][0-9]{14}|3[47][0-9]{13}|3(?:0[012345]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35[0-9]{3})[0-9]{11})\b' --Pattern from http://www.regexmagic.com/manual/xmppatterncreditcard.html
DECLARE @RegularExpressionObject INT, @OutputParameter INT, @ReturnValue VARCHAR(4000)
 
--Create and set properties to the VBScript.RegExp COM object
EXEC sp_OACreate 'VBScript.RegExp', @RegularExpressionObject OUT
EXEC sp_OASetProperty @RegularExpressionObject, 'Pattern', @RegularExpression
EXEC sp_OASetProperty @RegularExpressionObject, 'MultiLine', 1
EXEC sp_OASetProperty @RegularExpressionObject, 'IgnoreCase', 1
EXEC sp_OASetProperty @RegularExpressionObject, 'CultureInvariant', true
EXEC sp_OASetProperty @RegularExpressionObject, 'Global', false
 
--Perform Execute and Assign to the OutputParameter
EXEC sp_OAMethod @RegularExpressionObject, 'Execute', @OutputParameter OUT, @InputParameter
 
--Get the First Value of the Output Parameter, this is the matching value
EXEC sp_OAGetProperty @OutputParameter, 'Item(0).Value' , @ReturnValue OUT
 
--Destroy COM object 
EXEC sp_OADestroy @RegularExpressionObject
EXEC sp_OADestroy @OutputParameter
 
SELECT @ReturnValue

As you can see above we created an input parameter for your credit card and a regular expression, you can use this with any regular expressions like Phone Numbers, SSS Number, Zip Code, etc but in this instance since we are looking for credit card numbers we are using that expression.

Then after that you call that COM object and set the properties like Pattern, Multiline, IgnoreCase, CultureInvariant and Global, to see what each properties mean head over to MSDN for a detailed explanation.

Finally we Execute, get the results and clear the COM Object.

Now you have that info all you need to do is to loop through all your tables in the database and to achieve that is to just query sys.Tables and sys.Columns to find out all you tables.


This part then lists all columns on all tables in your SQL database

SELECT 
    Tables.name, Columns.name, Types.name
FROM 
sys.tables AS Tables 
INNER JOIN sys.columns as Columns 
    ON Tables.OBJECT_ID = Columns.OBJECT_ID
INNER JOIN sys.types as Types 
    ON Columns.system_type_id = Types.system_type_id

You can copy and paste this and see all your columns and their data types in the database you decide to run this.  Since we only need certain data types where Credit Card information can possibly be stored lets limit it by a WHERE clause not to include data types like boolean, date, etc.

So you end up in something like this

SELECT 
    Tables.name, Columns.name, Types.name
FROM 
sys.tables AS Tables 
INNER JOIN sys.columns as Columns 
    ON Tables.OBJECT_ID = Columns.OBJECT_ID
INNER JOIN sys.types as Types 
    ON Columns.system_type_id = Types.system_type_id
WHERE 
    Types.name in ('text','real','money','float','sql_variant','ntext','numeric','bigint','varchar','char','nvarchar','nchar','xml') 

 

Now your nearly there, what you do next is to create loads of SQL Select Queries for each column on each table which we will be basing on that results the query above gives us.   But before we do that we need to use that Regular Expression validator above by making it a Function that we will call CreditCardMatch, this means we can now easily use that on any query.   So let convert the above query to make it a function.

CREATE FUNCTION CreditCardMatch 
(
    @InputParameter nvarchar(4000)
)
RETURNS nvarchar(4000)
AS
BEGIN
 
    DECLARE @RegularExpression VARCHAR(4000) = '\b(?:4[0-9]{12}(?:[0-9]{3})?|5[12345][0-9]{14}|3[47][0-9]{13}|3(?:0[012345]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35[0-9]{3})[0-9]{11})\b' --Pattern from http://www.regexmagic.com/manual/xmppatterncreditcard.html
 
    DECLARE @RegularExpressionObject INT, @OutputParameter INT, @ReturnValue VARCHAR(4000)
 
    --Create and set properties to the VBScript.RegExp COM object
    EXEC sp_OACreate 'VBScript.RegExp', @RegularExpressionObject OUT
    EXEC sp_OASetProperty @RegularExpressionObject, 'Pattern', @RegularExpression
    EXEC sp_OASetProperty @RegularExpressionObject, 'MultiLine', 1
    EXEC sp_OASetProperty @RegularExpressionObject, 'IgnoreCase', 1
    EXEC sp_OASetProperty @RegularExpressionObject, 'CultureInvariant', true
    EXEC sp_OASetProperty @RegularExpressionObject, 'Global', false
 
    --Perform Execute and Assign to the OutputParameter
    EXEC sp_OAMethod @RegularExpressionObject, 'Execute', @OutputParameter OUT, @InputParameter
 
    --Get the First Value of the Output Parameter, this is the matching value
    EXEC sp_OAGetProperty @OutputParameter, 'Item(0).Value' , @ReturnValue OUT
 
    --Destroy COM object 
    EXEC sp_OADestroy @RegularExpressionObject
    EXEC sp_OADestroy @OutputParameter
 
    RETURN @ReturnValue
 
END
GO

Now using the Table and Column list above we need to create an individual query which we can Execute on a loop and store the results somewhere later on.  The output individual query we are targeting to build and to execute must be something like this.

SELECT 'YourColumn' as ColumnName, 'YourTable' as TableName, REPLACE(CONVERT(nvarchar,ID,4000), '-', '')  AS RecordWithCreditCardInformation FROM dbo.YourTable WHERE 
LTRIM(RTRIM(CONVERT(nvarchar,YourColumn,4000))) <> '' AND 
YourColumn IS NOT NULL AND 
LEN(REPLACE(CONVERT(nvarchar,YourColumn,4000), '-', '')) > 12 AND 
dbo.CreditCardMatch(REPLACE(CONVERT(nvarchar,YourColumn,4000), '-', '')) IS NOT NULL

You see from the query we had used the CreditCardMatch function where the condition is null, this is because that function gives out a result when it sees the credit card number in the column if not then it will return null, we also limited the result where the column is more 12 or more in length which is the minimum length a Credit Card Number can have and we also need to make sure the Column is not empty.  You might think of some other tweaks to this one, let us know by commenting below.

Now you have that in mind lets generate that programatically via a query, it will now look like this.

SELECT 
'SELECT ''' + Columns.name + ''' as ColumnName, ''' + Tables.name + ''' as TableName, REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', '''')  AS RecordWithCreditCardInformation FROM ' + SCHEMA_NAME(Tables.schema_id) + '.' + Tables.name + 
' WHERE LTRIM(RTRIM(CONVERT(nvarchar,' + Columns.name +',4000))) <> '''' AND 
' + Columns.name + ' IS NOT NULL 
AND LEN(REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', '''')) > 12 AND 
dbo.CreditCardMatch(REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', ''''))  IS NOT NULL'
FROM 
sys.tables AS Tables 
INNER JOIN sys.columns as Columns 
    ON Tables.OBJECT_ID = Columns.OBJECT_ID
INNER JOIN sys.types as Types 
    ON Columns.system_type_id = Types.system_type_id
WHERE 
    Types.name in ('text','real','money','float','sql_variant','ntext','numeric','bigint','varchar','char','nvarchar','nchar','xml') 

Now if you want to test and run you will see that it will create Select queries for each column on all your tables in the database

01 Dynamic Query

All you got left is to execute all of that generated select statement in a loop and also save all results to a separate table because this process might take ages specially if you have a very large database, waiting for the result on screen is pointless.  For the tables to be created we only need two, one is a repository of all the dynamically generated queries we will then call it ZZ_AuditQueries and another for all the results the query will return, we call it ZZ_AuditResults.

IF EXISTS (SELECT * FROM sys.objects WHERE type = 'U' AND name = 'ZZ_AuditQueries')
DROP TABLE ZZ_AuditQueries
GO
 
CREATE TABLE ZZ_AuditQueries
(
[ID] [int] IDENTITY(1,1) NOT NULL, [Query] [varchar](4000) NOT NULL,
)
 
IF EXISTS (SELECT * FROM sys.objects WHERE type = 'U' AND name = 'ZZ_AuditResults')
DROP TABLE ZZ_AuditResults
CREATE TABLE ZZ_AuditResults
(
[ID] [int] IDENTITY(1,1) NOT NULL, [ColumnName] [varchar](4000) NOT NULL, [TableName] [varchar](4000) NOT NULL, [RecordWithCreditCardInformation] [varchar](4000) NOT NULL
)

We have prefixed the table ZZ_ so its easy to find later because it will be in the bottom of the list

00 ZZ Tables

and if you notice, the structure of both tables is quite simple to understand, no need to explain.

Now lets bring them all together, the script below can be copied and pasted without any manipulation, it will work straight out of the box.

------------------------------------------------------------------------------------------------------------------------ 
--Create Stored Procedure to Use
------------------------------------------------------------------------------------------------------------------------ 
IF EXISTS (SELECT * FROM sys.objects WHERE type = 'FN' AND name = 'CreditCardMatch')
DROP FUNCTION  CreditCardMatch 
GO
 
CREATE FUNCTION CreditCardMatch 
(
    @InputParameter nvarchar(4000)
)
RETURNS nvarchar(4000)
AS
BEGIN
 
    DECLARE @RegularExpression VARCHAR(4000) = '\b(?:4[0-9]{12}(?:[0-9]{3})?|5[12345][0-9]{14}|3[47][0-9]{13}|3(?:0[012345]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35[0-9]{3})[0-9]{11})\b' --Pattern from http://www.regexmagic.com/manual/xmppatterncreditcard.html
 
    DECLARE @RegularExpressionObject INT, @OutputParameter INT, @ReturnValue VARCHAR(4000)
 
    --Create and set properties to the VBScript.RegExp COM object
    EXEC sp_OACreate 'VBScript.RegExp', @RegularExpressionObject OUT
    EXEC sp_OASetProperty @RegularExpressionObject, 'Pattern', @RegularExpression
    EXEC sp_OASetProperty @RegularExpressionObject, 'MultiLine', 1
    EXEC sp_OASetProperty @RegularExpressionObject, 'IgnoreCase', 1
    EXEC sp_OASetProperty @RegularExpressionObject, 'CultureInvariant', true
    EXEC sp_OASetProperty @RegularExpressionObject, 'Global', false
 
    --Perform Execute and Assign to the OutputParameter
    EXEC sp_OAMethod @RegularExpressionObject, 'Execute', @OutputParameter OUT, @InputParameter
 
    --Get the First Value of the Output Parameter, this is the matching value
    EXEC sp_OAGetProperty @OutputParameter, 'Item(0).Value' , @ReturnValue OUT
 
    --Destroy COM object 
    EXEC sp_OADestroy @RegularExpressionObject
    EXEC sp_OADestroy @OutputParameter
 
    RETURN @ReturnValue
 
END
GO
 
------------------------------------------------------------------------------------------------------------------------ 
--Create Table Temp Tables for Queries and Table for Results
------------------------------------------------------------------------------------------------------------------------ 
IF EXISTS (SELECT * FROM sys.objects WHERE type = 'U' AND name = 'ZZ_AuditQueries')
DROP TABLE ZZ_AuditQueries
GO
 
CREATE TABLE ZZ_AuditQueries
(
[ID] [int] IDENTITY(1,1) NOT NULL, [Query] [varchar](4000) NOT NULL,
)
 
IF EXISTS (SELECT * FROM sys.objects WHERE type = 'U' AND name = 'ZZ_AuditResults')
DROP TABLE ZZ_AuditResults
CREATE TABLE ZZ_AuditResults
(
[ID] [int] IDENTITY(1,1) NOT NULL, [ColumnName] [varchar](4000) NOT NULL, [TableName] [varchar](4000) NOT NULL, [RecordWithCreditCardInformation] [varchar](4000) NOT NULL
)
 
 
------------------------------------------------------------------------------------------------------------------------ 
--Populate Temp Table with Queries to Run
------------------------------------------------------------------------------------------------------------------------ 
INSERT INTO ZZ_AuditQueries
SELECT 
'INSERT INTO ZZ_AuditResults SELECT ''' + Columns.name + ''' as ColumnName, ''' + Tables.name + ''' as TableName, REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', '''')  AS RecordWithCreditCardInformation FROM ' + SCHEMA_NAME(Tables.schema_id) + '.' + Tables.name + 
' WHERE LTRIM(RTRIM(CONVERT(nvarchar,' + Columns.name +',4000))) <> '''' AND 
' + Columns.name + ' IS NOT NULL 
AND LEN(REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', '''')) > 12 AND 
dbo.CreditCardMatch(REPLACE(CONVERT(nvarchar,' + Columns.name + ',4000), ''-'', ''''))  IS NOT NULL'
FROM 
sys.tables AS Tables 
INNER JOIN sys.columns as Columns 
    ON Tables.OBJECT_ID = Columns.OBJECT_ID
INNER JOIN sys.types as Types 
    ON Columns.system_type_id = Types.system_type_id
WHERE 
    Types.name in ('text','real','money','float','sql_variant','ntext','numeric','bigint','varchar','char','nvarchar','nchar','xml') 
    AND Tables.name not in ('ZZ_AuditQueries', 'ZZ_AuditResults')
-- Select all data type where CC can be stored
 
 
 
------------------------------------------------------------------------------------------------------------------------ 
--Run the Auto Generated Queries
------------------------------------------------------------------------------------------------------------------------ 
DECLARE @SqlQuery as varchar(4000)
DECLARE @QueryCursor as CURSOR
 
SET @QueryCursor = CURSOR FOR
SELECT Query FROM ZZ_AuditQueries
 
OPEN @QueryCursor
FETCH NEXT FROM @QueryCursor INTO @SqlQuery
 
WHILE @@FETCH_STATUS = 0
BEGIN
 EXEC(@SqlQuery)
 FETCH NEXT FROM @QueryCursor INTO @SqlQuery
END
 
CLOSE @QueryCursor;
DEALLOCATE @QueryCursor;

Now if you notice there are minor changes like the INSERT because we need to insert the result of the queries as well as the ones we found out to the tables we created on the Database.  We also limited the result set of querying all tables by not including the ones we created (the ZZ tables).  Finally you notice this will work with only one database, I guess I leave that to you on whats your strategy to run this on all databases.

Just a final note, this might not be the fastest solution but it works and its free no plugins necessary or 3rd party applications needed.  You might make this as a SQL Server Agent Job because on large databases this will be slow and you don’t want to affect the performance of your databases on peak times.  And yes this is the same query I used for my  database and I did found lots of results which is good because sometimes users just save sensitive information like Credit Cards in any way they can without you knowing.

All you need to do after running this is to wait for the results and it will look similar to this

02 Sample Result

BTW you cannot use those credit card numbers they are test numbers which I got from this site.


Leave a Reply