The Future Is the Semantic Web

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A 'Semantic Web', which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize.
~Tim Berners-Lee

In 1999, Tim Berners-Lee described his vision for the future of the Internet. He imagined computers able to parse and understand the same information that we can, and to act as personalized virtual assistants on our behalf. Imagine the calendar application on your mobile device noticing a schedule conflict the moment you begin to book a trip on Expedia. Or imagine tasking your computer with a decision, and finding a detailed research report justifying its recommendation waiting for you in the morning. To many, this is the true destination of the Internet, and all of the social sharing that caught everyone's attention in Web 2.0 is a mere preview of what is to come. One blog author I came across described it by saying, "It's a data orgy and your server is invited."

For roughly a decade now, a movement called the Semantic Web (aka HyperMedia) has been seeking to enable this vision. The idea is that by properly annotating an HTML document, you enable a computer to consume and understand the information much like a human would, and thus to make informed decisions and act on our behalf. To accomplish this, the content needs structure and relational definitions, expressed through a handful of annotation protocols and descriptive vocabularies. So far, adoption has been anemic, for a couple of reasons: the effort has been splintered by factions creating competing technologies, and limited support from browsers and search engines has left early adopters with little incentive to explore the technology more deeply.

But recently this has been changing. With the introduction of HTML5, all modern browsers now support semantic tagging; search engines have begun to reward websites with rich snippets in their search results when semantic data is present; and published reports detail how Best Buy and others have seen traffic lift as much as 30% as a result of their RDFa implementations and the resulting rich snippets in search results.

Others have adopted semantic markup as well. Facebook has been using RDF in its social graph implementation since 2009 and recently began using the hCard and hCalendar markups for user profiles and events, respectively. Google has been making a push to 'authenticate' authors and their works with the XFN rel="me" annotation (shown below), and Yahoo Tech and LinkedIn are both using Microformats in their data. So now we have browser support, peer adoption, and recent evidence of positive ROI, and we are starting to see results: RDFa adoption increased 500% in the last 12 months alone! The only thing missing was significant corporate backing.
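To give a sense of how lightweight some of these annotations are, here is a minimal sketch of the XFN rel="me" approach mentioned above: it is nothing more than an attribute on an ordinary link asserting that the linked profile belongs to the same person (the URL here is hypothetical):

    <p>
        Written by Neal Cabage.
        <!-- rel="me" asserts that the linked page is another profile of this page's author -->
        <a href="http://example.com/profiles/nealcabage" rel="me">My profile</a>
    </p>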

In response to this confusion and complexity, a consortium of the major search engines, including Google, Yahoo, Microsoft, and Yandex, came together to create a standardized approach called Schema.org. It assumes the use of Microdata rather than RDF/RDFa and provides a single resource for the major semantic vocabularies. The hope is that by simplifying the technology and standardizing it across the industry, adoption barriers will fall and the average development team will begin to embrace it.

So how does one implement semantic annotation? The two primary annotation frameworks are the Resource Description Framework (RDF/RDFa) and Microformats. RDF can be used to mark up an HTML document with what are called Subject-Predicate-Object triples. It is a robust language, most commonly used for large datasets that require deep data linking. It has been criticized for its complexity, however, so a more recent revision called RDFa (RDF in attributes) focuses on making it easier to use by embedding the annotations in HTML attributes. Here is an example of what it might look like if you were to mark up a simple object in RDFa:

    <div xmlns:v="http://rdf.semantic-vocabulary.org/#" typeof="v:Person">
        <p>Name: <span property="v:name">Neal Cabage</span></p>
        <p>Title: <span property="v:title">Technologist</span></p>
    </div>
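Each property in that block maps directly onto a triple: the div (the person being described) is the subject, v:name and v:title are the predicates, and the text inside each span, "Neal Cabage" and "Technologist", is the object.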

Microformats, meanwhile, were developed with simplicity in mind, as a more natural extension of the HTML document. Microdata, the similar attribute-based format that Schema.org has settled on, is part of the HTML5 spec and even has its own DOM API, so it is the presumed future standard for most websites:

    <div itemscope itemtype="http://semantic-vocabulary.org/Person">
        <p>Name: <span itemprop="name">Neal Cabage</span></p>
        <p>Title: <span itemprop="title">Technologist</span></p>
    </div>
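In Microdata terms, itemscope declares that the div describes a single item, itemtype points to the vocabulary that defines what a Person is, and each itemprop names one of that item's properties.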

At some level the concepts are quite simple; however, there are numerous ontologies and semantic vocabularies that have been defined and are referenced in order to give meaning to your RDF or Microdata annotations. Notice the semantic-vocabulary.org reference above? It calls out to an externally defined schema for the Person being described. Schema.org defines many of these vocabularies directly for Microdata, but there are others such as Dublin Core, DocBook, and GoodRelations, the last of which is especially popular for eCommerce. There are also plugins available for many of the major CMS and eCommerce platforms to assist with semantic markup.
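To illustrate why the vocabulary matters, here is a minimal sketch of what eCommerce markup might look like using the schema.org Product and Offer types (the product details are made up for the example):

    <div itemscope itemtype="http://schema.org/Product">
        <h1 itemprop="name">Example LED Television</h1>
        <p itemprop="description">A hypothetical product used to illustrate the markup.</p>
        <!-- The nested Offer item carries the pricing details a search engine
             can surface as a rich snippet -->
        <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
            Price: $<span itemprop="price">499.99</span>
            <meta itemprop="priceCurrency" content="USD" />
        </div>
    </div>

GoodRelations covers similar ground in RDFa, with a richer vocabulary aimed specifically at commerce.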

The Semantic Vision
In the near term, the Semantic Web (or "Linked Data") is merely an exercise in annotating your content more precisely, in hopes of getting the Google carrot at the end of the proverbial stick (i.e. rich snippets).  As a result, any short-term concrete gain may seem a bit hollow.  The real promise, however, lies in what becomes possible once semantic annotations reach critical mass.  This chart shows what some have been predicting about the direction of Internet technology and its proliferation.  The real intelligence and interoperation still very much lie ahead, in what could be described as a Web 4.0 world.  This assumes that HTML5 marks the cusp of Web 3.0 and that we all begin working on proper annotations now, laying the foundation for those achievements in the future.

In the meantime, there are companies already beginning to do interesting things, experimenting with the sort of predictive intelligence one might expect from a linked-data, Web 4.0 world.  Hunch.com in particular attempts to help you parse the Internet's excess of data by looking at your past interests and those of your friends to determine what you might like, and to narrow your choices accordingly.  Hunch.com's CEO points to research showing that consumers make fewer purchases when presented with too much information or too many choices.  It is thus a worthwhile technology for retailers to pursue, in an effort to predictively put fewer, more precise choices in front of each consumer.

And beyond all of the data that already exists in the world, we continue to collect exponentially more via social interactions, tags, user profiles, online transactions, and analytics.  The buzzword in many organizations now is "big data," and they are looking to new tools such as Hadoop to help them cope.  As this mass of data grows, we will eventually outgrow search as our most useful paradigm for accessing it. And what will replace it look like?

I was asking myself this question the other day when I looked down at my iPhone and realized the interface may already be here!  What would a computer interface look like that is largely Internet-driven, but whose user experience does not begin with a browser or Google.com?  In fact, if the real vision of the Semantic Web is intelligent consumption of data, with that intelligence applied to specific applications acting as virtual agents, wouldn't it manifest as lightweight, albeit more mature, mashup-style apps, similar to what we already have on mobile phones and tablets today?  It's an interesting thought.  I can imagine a progression in that direction, with these applications continuing to grow in sophistication, intelligence, and awareness.

In closing, here is a video I found while researching that provides an excellent introduction to the topic.  If you're looking to introduce these concepts to your team or stakeholders, this is a great place to start: