How often have you struggled to find a fragment of data buried in mountains of full text? Whether the information is on the web or your intranet, it can be frustrating to locate exactly what you need without having some context for the content. Whether you're looking for an address, a phone number, an ingredient in a recipe, or an event location, searching for that information can result in a lot of false positives. One way around this is for someone to create the mother of all databases for events, reviews, items for sale, contact information, and all other information under the sun. Even if this were technically feasible, imagine trying to convince everyone to add their content to this site and keep it updated in addition to adding this content to their own sites. It's not going to happen.
How can we make discovering needles in haystacks more like finding elephants in a zoo? One way is by publishing content on our sites that is meaningful to search engines, specifically by using microformats. According to the microformats website (http://microformats.org/about), microformats are "a set of simple open data formats" that humans can read and from which computers can also read and extract information. They're "small bits of HTML that represent things like people, events, tags, etc. in web pages." Microformats are developed collaboratively and through open discussion, and their development is guided by the following principles:
* Solve a specific problem.
* Start as simple as possible.
* Design for humans first, machines second.
* Reuse building blocks from widely adopted standards.
* Aim for modularity and embeddability.
* Enable and encourage decentralized development, content, and services.
Think of it as making information understandable to computers, not just machine readable.
FINDING BURIED TREASURE
The web is filled with millions of pages with buried treasure: pieces of information that describe people, events, and things. What if we could mine this information from full-text documents and know when a name is a name and an event location really is an event location? We really could build the mother of all databases of events, recipe ingredients, reviews, or whatever, after the fact. Microformats encode meaning into full text by taking advantage of the class attribute in HTML to label...
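To make the idea of class-based labels concrete, here is a minimal sketch in Python. It assumes a small hCard-style snippet in which the class names vcard, fn, org, and tel mark a contact's name, organization, and phone number. The sample markup, the PROPERTIES set, and the LabelExtractor and extract_labels names are all hypothetical, invented for this illustration. It is not a full microformats parser, only a demonstration of how a program might know that a name is a name.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: an hCard-style snippet in which class names
# ("vcard", "fn", "org", "tel") label the pieces of a contact.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span> works at
  <span class="org">Example Library</span>.
  Call <span class="tel">555-0100</span>.
</div>
"""

# Class names treated as labeled properties in this sketch (an assumption;
# real microformats define many more).
PROPERTIES = {"fn", "org", "tel"}


class LabelExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a property.

    Assumes well-nested markup with no void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.found = {}    # property name -> raw text collected so far
        self._stack = []   # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        label = next((c for c in classes if c in PROPERTIES), None)
        self._stack.append(label)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit text only to the innermost open element, and only if labeled.
        if self._stack and self._stack[-1] is not None:
            key = self._stack[-1]
            self.found[key] = self.found.get(key, "") + data


def extract_labels(html):
    """Return a dict mapping each labeled property to its trimmed text."""
    parser = LabelExtractor()
    parser.feed(html)
    return {key: text.strip() for key, text in parser.found.items()}


if __name__ == "__main__":
    for name, value in extract_labels(SAMPLE_HTML).items():
        print(name, "=", value)
```

A human reader of the page sees an ordinary sentence about Jane Doe, while the script pulls out the name, organization, and phone number without guessing from the surrounding words. The unlabeled text ("works at", "Call") is simply ignored.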





