When I wrote about FileTables earlier I mentioned the new search capabilities you have with Semantic Search. What I forgot to mention was another new SQL 2012 Full Text Search feature: property lists.
Simply put, property lists allow you to use a full text index to search document properties such as Author, Title, or Category. As long as you have a filter in place that can extract the property from the types of documents you’re storing, you’re good to go. Older Microsoft Office document filters, like the one for .doc Word files, are in place by default, and you can add the filters for newer types like .docx by installing and registering the Office Filter Pack SP1. (The Semantic database is a separate install that Semantic Search itself depends on; it doesn’t supply document filters.) There are also 3rd party filters, known as IFilters. One of the most popular is available from Adobe and allows you to search .pdf documents.
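If you’re not sure which filters your instance can see, something like the following may help. It’s a minimal sketch rather than a required step: it lists the registered document types from sys.fulltext_document_types, then asks SQL Server to pick up filters registered with the operating system. The three extensions in the WHERE clause are just the ones discussed in this post.

```sql
-- List the filters the full-text engine currently knows about
-- for the document types discussed in this post.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type IN (N'.doc', N'.docx', N'.pdf');

-- After installing a new filter (Office Filter Pack, Adobe IFilter, ...),
-- tell SQL Server to load filters registered with the operating system
-- and restart the filter daemon hosts so they become visible.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
EXEC sp_fulltext_service @action = 'restart_all_fdhosts';
```

If an extension is still missing from the first query after the reload, the filter most likely isn’t registered correctly with Windows.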
One way you can check if a document has any extractable properties is by using the command line tool filtdump.exe, available from the Windows SDK. Running filtdump and passing the file name will show you properties that can be extracted from that file. In the screenshot below I’ve run filtdump against a simple .pdf file I created. Looking at Attribute, in the Chunk section, you’ll notice a GUID\integer for System.Title. Under Value you’ll see “proof”, which is the title of this small pdf.
You’ll need that GUID and integer when it comes time to create your property list. You don’t need to generate your own values; they’re already available from Microsoft. In the Resources section at the end of this post I’ve listed a link to where you can find the values for Author, Title, Keywords, and PerceivedType. There’s a second link to a complete list of properties.
For this demo I’m using the same database used in my FileTable demos. I’ve installed and registered the Adobe iFilter, and I’ve inserted 2 SQL 2012 whitepapers into my FileTable. You can read my earlier posts on FileTables and get my full demo script here.
First I create my full text catalog, called Documents_Catalog. Next I create my property list, called DocumentProperties. I add Title and Author to DocumentProperties using the GUID and integer ID I found on the Microsoft site. Finally I create a full text index and link the property list to it with the SEARCH PROPERTY LIST option in the WITH clause. The complete script is below.
-- Full text catalog that will hold the index
CREATE FULLTEXT CATALOG Documents_Catalog
    WITH ACCENT_SENSITIVITY = ON;

-- Empty property list, then one ADD per property to extract
CREATE SEARCH PROPERTY LIST DocumentProperties;

ALTER SEARCH PROPERTY LIST DocumentProperties
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 2,
          PROPERTY_DESCRIPTION = 'System.Title - Title of the item.');

ALTER SEARCH PROPERTY LIST DocumentProperties
    ADD 'Author'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 4,
          PROPERTY_DESCRIPTION = 'System.Author - Author or authors of the item.');

-- Full text index on the FileTable, linked to the property list
CREATE FULLTEXT INDEX ON dbo.Documents
(
    name LANGUAGE 1033 STATISTICAL_SEMANTICS,
    file_type LANGUAGE 1033 STATISTICAL_SEMANTICS,
    file_stream TYPE COLUMN file_type LANGUAGE 1033 STATISTICAL_SEMANTICS
)
KEY INDEX PK_Documents
ON Documents_Catalog
WITH SEARCH PROPERTY LIST = DocumentProperties,
     CHANGE_TRACKING AUTO,
     STOPLIST = SYSTEM;
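Once the script has run, you may want to confirm that the property list looks the way you expect and that the index is actually extracting properties. The queries below are a sketch of one way to check, using the catalog views and the sys.dm_fts_index_keywords_by_property function. They assume the object names from the script above (DocumentProperties and dbo.Documents) and that you’re connected to the demo database. The TOP (50) limit is arbitrary.

```sql
-- Properties registered in the list, with their GUID and integer ID
SELECT pl.name AS PropertyList,
       p.property_name,
       p.property_set_guid,
       p.property_int_id,
       p.string_description
FROM sys.registered_search_property_lists AS pl
JOIN sys.registered_search_properties AS p
    ON p.property_list_id = pl.property_list_id
WHERE pl.name = N'DocumentProperties';

-- Terms the index has extracted per property, once population has run.
-- An empty result usually means population is still in progress or
-- no installed filter exposes these properties for the stored files.
SELECT TOP (50)
       p.property_name,
       k.display_term,
       k.document_id
FROM sys.dm_fts_index_keywords_by_property(DB_ID(), OBJECT_ID(N'dbo.Documents')) AS k
JOIN sys.fulltext_indexes AS fi
    ON fi.object_id = OBJECT_ID(N'dbo.Documents')
JOIN sys.registered_search_properties AS p
    ON p.property_list_id = fi.property_list_id
   AND p.property_id = k.property_id
ORDER BY p.property_name, k.display_term;
```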
Using full text searches I can now search my FileTable for all related titles or all documents by a particular author. This is a screenshot of the properties of one of the .pdf whitepapers I’m storing.
Searching all titles that contain SQL returns both whitepapers, and searching by author returns the one in my screenshot above.
-- Using Full Text Search for Title
SELECT name AS DocumentName,
       file_stream.GetFileNamespacePath() AS Path
FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Title'), 'SQL');

-- Using Full Text Search for Author
SELECT name AS DocumentName,
       file_stream.GetFileNamespacePath() AS Path
FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), '"Jenn Louras"');
One thing to note: when I search by author I’ve put the author name in double quotes inside the single quotes. The name is a multi-word phrase, and SQL Server will return a syntax error if you leave the double quotes out. If you just want to search by first or last name, the double quotes aren’t required.
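To make the quoting rule concrete, here is a small sketch against the same dbo.Documents FileTable. The variable names @AuthorName and @SearchTerm are made up for illustration. The first query searches on a single word, so no inner double quotes are needed. The second builds the quoted phrase from a variable, which can be handy when the name comes from an application.

```sql
-- Single word: inner double quotes are optional
SELECT name AS DocumentName,
       file_stream.GetFileNamespacePath() AS Path
FROM dbo.Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), 'Louras');

-- Multi-word phrase supplied in a variable: wrap it in double quotes.
-- Any double quotes inside the name are stripped first so the
-- resulting search condition stays well formed.
DECLARE @AuthorName nvarchar(200) = N'Jenn Louras';
DECLARE @SearchTerm nvarchar(4000) =
    N'"' + REPLACE(@AuthorName, N'"', N'') + N'"';

SELECT name AS DocumentName,
       file_stream.GetFileNamespacePath() AS Path
FROM dbo.Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), @SearchTerm);
```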
Resources:
Here are a few links where you can get more information on Property Lists.
Property List Overview on Books Online
http://technet.microsoft.com/en-us/library/ee677637.aspx
Finding GUID for Property Lists
http://technet.microsoft.com/en-us/library/ee677618.aspx
Check the Windows Properties
http://msdn.microsoft.com/en-us/library/ff521735(v=vs.85).aspx