About
DiamondSilk

What is DiamondSilk?
Why is it called DiamondSilk?
What is a filter?
Why should I add a filter?
How do I add a filter?
What's so special about searching the DiamondSilk database?
How do I search the DiamondSilk database?
Where did DiamondSilk come from?
Acknowledgements

What is DiamondSilk?
DiamondSilk applies structure to the web. Most data on the web today is unstructured: the HTML for a page on buy.com doesn't tell a program what item is being sold or how much it costs. However, most of these sites generate their HTML pages from a database using a simple mechanism. With a user's help, DiamondSilk learns how to invert that mechanism and turn the HTML back into knowledge. Our intelligent spider then periodically revisits the site to scrape new data into our knowledge warehouse, and our query engine allows users (or other websites) to exploit this information to create anything from a price comparison engine to a news service. For more information, check out the DiamondSilk Technical Documentation.

Why is it called DiamondSilk?
The purpose of DiamondSilk is to create a solid structure out of the unstructured data on the Web, and to offer a smooth, clean search experience. Hence, it is tough as diamonds and smooth as silk. We hope. :)

What is a filter?
A filter is an integral part of the DiamondSilk system. It does two things automatically: it determines which URLs in a site link to valid content in a given category, and it "filters" data from each such page into the DiamondSilk database, with the proper category and attribute name attached. You "train" a filter to do these things by giving the system some examples of suitable links and showing it which parts of a page belong in the database.

Why should I add a filter?
The more filters that exist within the DiamondSilk system, the more complete the database will be. This is because DiamondSilk uses the filters to decide which sites to visit and what information to pull from them. By creating a new filter, you can help DiamondSilk offer more useful data to its users.

How do I add a filter?
There are two training steps required to add a filter. The first is to teach the filter about the site you want filtered. You choose a site (or an area of a large site) and specify the category that suits this site. The best page in a site to use for this is the front page or an index that contains links to the content that you want to filter. Specifying how often the site gets updated tells the system how often to look at it for new data. Once you submit this information, you are shown a random series of pages that were linked from the site and asked to specify whether they contain valid content. If the page shown does not contain content that you want searched by the filter you are creating, click No. If the page shown contains attributes that you want to filter, click Yes. Be careful with this step, as a mistake will confuse the system and may cause it to try to filter pages that don't contain the right kind of data. (Note: You can watch the system "learn" by noting its guesses after giving it a few examples to train on!)
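The Yes/No training in this first step can be pictured with a small sketch. It is only an illustration: this page does not describe how DiamondSilk actually learns from your answers, so the token-voting scheme, the LinkFilter class, and the example.com URLs below are all assumptions.

```python
import re
from collections import Counter


def url_tokens(url):
    """Split a URL into lowercase alphabetic tokens; digits are dropped so IDs generalize."""
    return set(re.findall(r"[a-z]+", url.lower()))


class LinkFilter:
    """Hypothetical Yes/No link classifier; not DiamondSilk's actual algorithm."""

    def __init__(self):
        self.yes = Counter()  # token counts from pages the user marked Yes
        self.no = Counter()   # token counts from pages the user marked No

    def train(self, url, is_valid):
        target = self.yes if is_valid else self.no
        target.update(url_tokens(url))

    def score(self, url):
        # Positive when the URL's tokens appeared more often in Yes examples.
        return sum(self.yes[t] - self.no[t] for t in url_tokens(url))

    def guess(self, url):
        return self.score(url) > 0


link_filter = LinkFilter()
link_filter.train("http://shop.example.com/product/1234.html", True)
link_filter.train("http://shop.example.com/product/5678.html", True)
link_filter.train("http://shop.example.com/about/contact.html", False)
link_filter.train("http://shop.example.com/help/faq.html", False)

print(link_filter.guess("http://shop.example.com/product/9999.html"))
print(link_filter.guess("http://shop.example.com/help/shipping.html"))
```

In a scheme like this, a single wrong answer shifts the token counts in the wrong direction, which is one way to see why a mistake in this step can lead the system to filter the wrong pages.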

The second step is to teach the filter what kind of data to look for. You are given a page that the system judged valid, based on the patterns it detected in the pages you classified in the previous step. (If the page given does not contain valid content, click Reject to be given a different one.) For each attribute associated with this site's category, you are asked for the content that fits the attribute. Simply highlight this text with your mouse and click Submit. The highlighted text will be used to teach the filter how to find that attribute in a page. You will also be shown what you have submitted; if you find you have made a mistake, click Try Again to go back a step and redo it.
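One simple way to learn from a highlighted example is to remember the text that surrounds it and look for the same surroundings on other pages. The sketch below assumes that approach; the page does not say this is what DiamondSilk does, and learn_wrapper, apply_wrapper, the fixed 20-character context, and the sample pages are all hypothetical.

```python
def learn_wrapper(page, highlighted, context=20):
    """Record the text immediately before and after the highlighted example."""
    start = page.find(highlighted)
    if start < 0:
        raise ValueError("highlighted text not found in page")
    end = start + len(highlighted)
    return page[max(0, start - context):start], page[end:end + context]


def apply_wrapper(page, wrapper):
    """Return the text between the learned prefix and suffix, or None if absent."""
    prefix, suffix = wrapper
    start = page.find(prefix)
    if start < 0:
        return None
    start += len(prefix)
    end = page.find(suffix, start)
    if end < 0:
        return None
    return page[start:end]


# Two pages assumed to come from the same site template.
page_a = '<html><body><h1>Blue Widget</h1><p class="price">$19.99</p></body></html>'
page_b = '<html><body><h1>Red Gadget</h1><p class="price">$4.50</p></body></html>'

# "Highlighting" on page_a teaches one wrapper per attribute.
title_wrapper = learn_wrapper(page_a, "Blue Widget")
price_wrapper = learn_wrapper(page_a, "$19.99")

# The filter then guesses the same attributes on a page it has not seen.
print(apply_wrapper(page_b, title_wrapper))
print(apply_wrapper(page_b, price_wrapper))
```

A fixed-width context like this only works when the markup around each attribute is identical from page to page; sites whose layout varies would need something more robust.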

After you have submitted a complete example page, the filter will try to guess the attributes on another page so you can check its answers. Click Correct if the highlighted text fits the attribute shown, or Incorrect if the wrong text is highlighted. When the filter guesses correctly for all the attributes in the category, it has finished training and is ready to filter automatically. Unfortunately, in this version of DiamondSilk, an incorrect guess means that this site is not suitable for filtering.
Add a filter

What's so special about searching the DiamondSilk database?
Most search engines on the Web only search for one thing: a keyword found in the content of a page. DiamondSilk is different because of its automated categorization. Instead of searching the whole Web for a certain word or phrase, DiamondSilk allows you to search within certain categories of sites and certain parts of pages, enabling a more refined and informative search. Once you've found the content you want, you can also sort the results by their different attributes so you can quickly and easily get the data you want, without wading through the results of a traditional search engine.

How do I search the DiamondSilk database?
You first choose which category to search. Subcategories are shown indented; clicking a parent category searches all of its subcategories. You may then choose which attributes of that category to search, what kind of matching to apply, and what word or phrase to match. The "harvested" time is how long ago the content was first found on its site, and you can choose how far back in the records to search. You may also choose which sites within this category to search. Clicking Select All will check all of the checkboxes for you, and Select None will uncheck them; be sure at least one box is checked, otherwise you will get no results!
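The search options above combine roughly as in the following sketch. It is not the DiamondSilk query engine: the Record type, the category tree, the three match kinds, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical category tree: parent -> subcategories.
SUBCATEGORIES = {"Shopping": ["Books", "Electronics"], "Books": [], "Electronics": []}


@dataclass
class Record:
    category: str
    site: str
    harvested: datetime
    attributes: dict


def expand(category):
    """Return the category plus all of its subcategories, recursively."""
    found = [category]
    for child in SUBCATEGORIES.get(category, []):
        found.extend(expand(child))
    return found


# Assumed match kinds; the page does not list the real ones.
MATCHERS = {
    "contains": lambda value, phrase: phrase.lower() in value.lower(),
    "exact": lambda value, phrase: value.lower() == phrase.lower(),
    "starts with": lambda value, phrase: value.lower().startswith(phrase.lower()),
}


def search(records, category, attributes, match, phrase, sites, max_age, now):
    """Filter records by category, checked sites, harvest age, and attribute match."""
    categories = set(expand(category))
    matcher = MATCHERS[match]
    results = []
    for record in records:
        if record.category not in categories or record.site not in sites:
            continue
        if now - record.harvested > max_age:
            continue
        if any(matcher(record.attributes.get(a, ""), phrase) for a in attributes):
            results.append(record)
    return results


now = datetime(2000, 6, 1)
records = [
    Record("Books", "books.example.com", datetime(2000, 5, 30), {"title": "Silk Road"}),
    Record("Electronics", "gadgets.example.com", datetime(2000, 5, 1), {"title": "Silk Screen Kit"}),
    Record("Books", "books.example.com", datetime(2000, 5, 29), {"title": "Diamond Age"}),
]
all_sites = {"books.example.com", "gadgets.example.com"}

hits = search(records, "Shopping", ["title"], "contains", "silk", all_sites, timedelta(days=7), now)
print([r.attributes["title"] for r in hits])

# With no site checked, nothing can match.
print(search(records, "Shopping", ["title"], "contains", "silk", set(), timedelta(days=7), now))
```

Searching the parent category "Shopping" covers both subcategories, the seven-day limit excludes the record harvested a month earlier, and passing an empty set of sites returns nothing, mirroring the warning about leaving every box unchecked.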

Once you receive your search results, you may sort them by attribute by clicking on the corresponding column heading. To view a page found in the search, simply click on the link to its URL and it will open in a new window. If a result is too long to fit completely in the table, click on the [...] to see that entry in its entirety. If more than ten pages match your criteria, click "Next 10 Matches" to see the next ten; click "Last 10 Matches" to go back to the previous ten.
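For sites that consume DiamondSilk results programmatically, the sorting, ten-at-a-time paging, and [...] truncation described above might look like the sketch below. The function names, the 30-character cell width, and the sample rows are assumptions, not part of DiamondSilk.

```python
PAGE_SIZE = 10  # the results table shows ten matches at a time


def sort_results(rows, attribute):
    """Sort result rows by one attribute, as clicking a column heading would."""
    return sorted(rows, key=lambda row: row[attribute])


def page_of(rows, page):
    """Return the rows for a zero-based page number."""
    start = page * PAGE_SIZE
    return rows[start:start + PAGE_SIZE]


def truncate(text, width=30):
    """Shorten a long cell and mark it with [...]; the width is an assumed value."""
    return text if len(text) <= width else text[:width] + " [...]"


# Twenty-three made-up rows, enough for three pages.
rows = [{"title": f"Item {i:02d}", "price": float((i * 7) % 25)} for i in range(23)]

by_price = sort_results(rows, "price")
first_page = page_of(by_price, 0)   # what you see first
last_page = page_of(by_price, 2)    # after two clicks on "Next 10 Matches"

print(len(first_page), len(last_page))
print(truncate("A very long attribute value that will not fit in one table cell"))
```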
Search the database

Where did DiamondSilk come from?
DiamondSilk is the Stanford University Computer Science Senior Project of David Weekly and Valerie Kucharewski. It was conceived by David Weekly in the autumn of 1999 under the guidance of Stanford CS professor Armando Fox. Valerie Kucharewski joined the team in January 2000. David and Valerie worked together through the spring of 2000 to implement the complete system.

Acknowledgements
Thank you to Armando Fox for taking a chance on some undergrad advisees and for being our guiding light and voice of reason. Thanks to Grant for the food and love, and to Vanessa for the love and seaweed.

Valerie would especially like to thank Grant for his devotion, Mari for the hugs, Jen for the understanding tolerance, Lauren for the interest, Hannah and Craig for the help, Cam and Lindsay for the surprise visit, Dad for caring, Mom for listening, Nick for the encouragement, and finally David for not hating her when she cracked the whip.

David would like to thank Vanessa, who provided encouragement (electronic and non) at all times; his family, ceaselessly praying for him; the friendly residents of Maison Francaise with an ever-so-tasty assortment of things to eat; and Valerie, without whom