Channel: SQL – SQL Servings

Using Property Lists in SQL 2012 Full Text Searches

When I wrote about FileTables earlier I mentioned the new search capabilities you have with Semantic Search. What I forgot to mention was another new SQL 2012 feature of Full Text searches, Property Lists.

Simply put, property lists allow you to use a full text index to search document properties such as Author, Title, or Category. As long as you have a filter in place that can extract the property from the types of documents you’re storing, you’re good to go. Filters for older Microsoft Office formats, such as .doc Word files, are in place by default, and you can add filters for the newer formats like .docx by installing and registering the Semantic database and the Office Filter Pack SP1. Third-party filters, known as iFilters, are available as well. One of the most popular comes from Adobe and lets you search .pdf documents.
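If you're not sure which filters your instance can see, a quick check like the sketch below may help. It assumes the extensions you care about are .doc, .docx, and .pdf; adjust the list for your own documents. The second statement is the usual way to ask the full-text service to pick up filters registered with the operating system, and in my experience the new document types only show up after the instance is restarted.

```sql
-- List the file extensions that currently have a filter registered
-- with this SQL Server instance (extension list is just an example).
SELECT document_type, manufacturer, version, path
FROM sys.fulltext_document_types
WHERE document_type IN ('.doc', '.docx', '.pdf')
ORDER BY document_type;

-- After installing a new iFilter, tell the full-text service to load
-- filters and word breakers registered with the operating system.
-- A restart of the SQL Server service is typically needed afterwards.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
```

If an extension doesn't appear in the first result set, documents of that type won't have their properties (or content) extracted.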

One way you can check if a document has any extractable properties is by using the command line tool filtdump.exe, available from the Windows SDK. Running filtdump and passing the file name will show you properties that can be extracted from that file. In the screenshot below I’ve run filtdump against a simple .pdf file I created. Looking at Attribute, in the Chunk section, you’ll notice a GUID\integer for System.Title. Under Value you’ll see “proof”, which is the title of this small pdf.

image

You’ll need that GUID and integer when it comes time to create your property list. You don’t need to generate your own values; they’re already available from Microsoft. In the Resources section at the end of this post I’ve listed a link to where you can find the values for Author, Title, Keywords, and PerceivedType. There’s a second link to a complete list of properties.

For this demo I’m using the same database used in my FileTable demos. I’ve installed and registered the Adobe iFilter, and I’ve inserted two SQL 2012 whitepapers into my FileTable. You can read my earlier posts on FileTables and get my full demo script here.

First I create my full text catalog, called Documents_Catalog. Next I create my property list, called DocumentProperties. I add Title and Author to DocumentProperties using the GUID and integer ID I found on the Microsoft site. Finally I create a full text index and link the property list with the SEARCH PROPERTY LIST option in the WITH clause. The complete script is below.

CREATE FULLTEXT CATALOG Documents_Catalog WITH ACCENT_SENSITIVITY = ON;

CREATE SEARCH PROPERTY LIST DocumentProperties;

ALTER SEARCH PROPERTY LIST DocumentProperties
   ADD 'Title'
   WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 2,
   PROPERTY_DESCRIPTION = 'System.Title - Title of the item.' );

ALTER SEARCH PROPERTY LIST DocumentProperties
   ADD 'Author'
   WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 4,
   PROPERTY_DESCRIPTION = 'System.Author - Author or authors of the item.' );

CREATE FULLTEXT INDEX ON dbo.Documents
(
   name
   LANGUAGE 1033
   STATISTICAL_SEMANTICS,
   file_type
   LANGUAGE 1033
   STATISTICAL_SEMANTICS,
   file_stream
   TYPE COLUMN file_type
   LANGUAGE 1033
   STATISTICAL_SEMANTICS
)
KEY INDEX PK_Documents
ON Documents_Catalog
WITH SEARCH PROPERTY LIST = DocumentProperties, CHANGE_TRACKING AUTO, STOPLIST = SYSTEM;
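Before running any searches it can be worth confirming that everything landed where you expect. The queries below are a minimal sketch that uses the object names from the script above; they read the full-text catalog views and check whether the catalog is still populating.

```sql
-- Properties registered in the DocumentProperties list.
SELECT l.name AS property_list,
       p.property_name,
       p.property_set_guid,
       p.property_int_id,
       p.property_description
FROM sys.registered_search_property_lists AS l
JOIN sys.registered_search_properties AS p
    ON p.property_list_id = l.property_list_id
WHERE l.name = N'DocumentProperties';

-- Which property list (if any) is attached to the index on dbo.Documents.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       l.name AS property_list
FROM sys.fulltext_indexes AS i
LEFT JOIN sys.registered_search_property_lists AS l
    ON l.property_list_id = i.property_list_id
WHERE i.object_id = OBJECT_ID(N'dbo.Documents');

-- Population status of the catalog; 0 means idle (population finished).
SELECT FULLTEXTCATALOGPROPERTY(N'Documents_Catalog', 'PopulateStatus') AS populate_status;
```

Property searches only return rows once the population has completed, so an empty result right after creating the index doesn't necessarily mean something is wrong.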

Using full text searches I can now search my FileTable for all related titles or all documents by a particular author. This is a screenshot of the properties of one of the .pdf whitepapers I’m storing.

image

Searching all titles that contain SQL returns both whitepapers, and searching by author returns the one in my screenshot above.

-- Using Full Text Search for Title
SELECT name AS DocumentName, file_stream.GetFileNamespacePath() AS Path FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Title'), 'SQL');

-- Using Full Text Search for Author
SELECT name AS DocumentName, file_stream.GetFileNamespacePath() AS Path FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), '"Jenn Louras"');

image

One thing to note: when I search by author I’ve put the author name in double quotes inside the single quotes. A search term of more than one word is treated as a phrase, and SQL will return an error if you don’t quote it. If you just want to search by first or last name, the double quotes aren’t required.
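If the author name comes from a parameter rather than being typed into the query, one way to handle the quoting is to wrap the value yourself. This is only a sketch: @author and @term are names I made up for the example, and it assumes a plain phrase search is what you want.

```sql
-- Hypothetical input: a raw author name, possibly more than one word.
DECLARE @author nvarchar(200) = N'Jenn Louras';

-- Strip any embedded double quotes, then wrap the whole value in
-- double quotes so CONTAINS treats it as a single phrase.
DECLARE @term nvarchar(210) = N'"' + REPLACE(@author, N'"', N'') + N'"';

SELECT name AS DocumentName, file_stream.GetFileNamespacePath() AS Path
FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), @term);

-- A single word needs no double quotes.
SELECT name AS DocumentName, file_stream.GetFileNamespacePath() AS Path
FROM Documents
WHERE CONTAINS(PROPERTY(file_stream, 'Author'), 'Louras');
```

Wrapping a single word in double quotes is harmless, so the first form works for both cases.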

Resources:

Here are a few links where you can get more information on Property Lists.

Property List Overview on Books Online
http://technet.microsoft.com/en-us/library/ee677637.aspx

Finding GUID for Property Lists
http://technet.microsoft.com/en-us/library/ee677618.aspx

Check the Windows Properties
http://msdn.microsoft.com/en-us/library/ff521735(v=vs.85).aspx

