Planning a Semantic Web site

mirror of http://www.ibm.com/developerworks/xml/library/x-plansemantic/index.html?ca=drs-


... it just follows its internal rules for displaying the page. It's up to you to understand the information on the page.

 

Structuring data adds value to that data: with a consistent structure, it can be used in more ways. You can see the demand for structured data today in the proliferation of APIs that have sprung up around Web sites as part of the Web 2.0 trend. An _API_ delivers structured data, and structured data from a variety of sources is what powers mashups. The idea behind _mashups_ is that data is pulled from various sources on the Web and combined and displayed in a unified manner, so that the combination adds value over and above the source information alone.

 

The individual APIs that everyone is busy building are intended to solve exactly the same problem that the Semantic Web addresses: expose the content of the Web as data, then combine disparate data sources in different ways to create new value. Rather than build and maintain your own API, you can build your Web site to take full advantage of the Semantic Web infrastructure that is already in place. If your Web site is your API, you reduce your overall development and maintenance effort. Similarly, rather than build custom solutions for every Web site you want to pull data from, you can implement one solution based on Semantic Web technologies and have it work interchangeably across many Web sites, including Web sites you weren't even aware of before you began development.

 



 

Semantic Web technology overview

 

Semantic Web technologies can be considered in terms of layers, each layer resting on and extending the functionality of the layers beneath it. Although the Semantic Web is often talked about as if it were a separate entity, it is an extension and enhancement of the existing Web rather than a replacement of it.

Figure 1. The Semantic Web technology stack

 

As shown in Figure 1, the base layer of the Semantic Web is HTTP and URIs. These are commonly considered 'Web' rather than 'Semantic Web', but every proposed Semantic Web technology rests upon these Web fundamentals. URIs are the nouns of the Semantic Web. HTTP supplies the verbs (GET, PUT and POST), along with a number of thoroughly tested solutions for authentication and encryption.

 

The Resource Description Framework (RDF) is the workhorse of the Semantic Web. It is a grammar for encoding relationships. An RDF triple has three components: a subject, a predicate (or verb), and an object. Each component is usually identified by a URI, that is, a resource on the Web (an object can also be a literal value, such as a string). This is far less ambiguous than encoding data in arbitrary XML documents. Compare the different ways of expressing a simple relationship in XML given in Listing 1 with the RDF triple in Listing 2.

Listing 1. Ambiguous relationships in XML

 

page Rob page

 

Listing 2 shows the RDF triple.

Listing 2. Expressing relationships in RDF

 

.

 

The relationship expressed in all the examples shown in Listing 1 is 'Rob is the author of page'. It is a fairly simple statement, yet XML offers several ways to express it. It would be very difficult to build software that can derive that relationship from every possible XML encoding. RDF, by contrast, expresses the relationship in only one way, so it becomes feasible to build generic parsers.
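To make the contrast concrete, here is a minimal Python sketch. The three XML shapes are hypothetical, invented only for illustration, and each one needs its own hard-coded extraction rule. A statement in N-Triples (a plain-text RDF serialization), by contrast, always has the same subject, predicate, object shape. `parse_ntriple` is a deliberately minimal, hypothetical helper that accepts only three `<URI>` terms; real N-Triples also allows literals and blank nodes. The example URIs are placeholders, and `http://purl.org/dc/elements/1.1/creator` is the Dublin Core creator element.

```python
import xml.etree.ElementTree as ET

# Three hypothetical XML encodings of the same fact: "Rob is the author of page".
VARIANTS = [
    '<document href="page" author="Rob"/>',
    '<document><uri>page</uri><author>Rob</author></document>',
    '<author name="Rob"><wrote>page</wrote></author>',
]


def extract_author(xml_text):
    """Return (author, page), using a rule hard-coded for each known XML shape."""
    root = ET.fromstring(xml_text)
    if root.tag == "document" and "author" in root.attrib:
        return root.get("author"), root.get("href")
    if root.tag == "document":
        return root.findtext("author"), root.findtext("uri")
    if root.tag == "author":
        return root.get("name"), root.findtext("wrote")
    # A fourth vocabulary would need yet another branch.
    raise ValueError("unrecognized XML shape: " + root.tag)


def parse_ntriple(line):
    """Parse one N-Triples statement whose three terms are all <URI>s."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("an N-Triples statement ends with '.'")
    terms = body[:-1].split()
    if len(terms) != 3 or not all(t.startswith("<") and t.endswith(">") for t in terms):
        raise ValueError("expected exactly three <URI> terms")
    # Strip the angle brackets: (subject, predicate, object).
    return tuple(t[1:-1] for t in terms)


if __name__ == "__main__":
    for variant in VARIANTS:
        print(extract_author(variant))
    print(parse_ntriple(
        "<http://example.org/page> "
        "<http://purl.org/dc/elements/1.1/creator> "
        "<http://example.org/people/rob> ."
    ))
```

Every new XML vocabulary adds another branch to `extract_author`. `parse_ntriple` needs no changes, and that is the practical meaning of "generic parsers" here.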

 

In the early days of the Semantic Web, it was hoped that content producers would publish all their content in RDF, quickly creating a plethora of machine-readable data. Unfortunately, uptake was slow, perhaps because the main XML serialization of RDF looked unnecessarily complex. More succinct RDF representations, such as Notation3 (N3) and the Terse RDF Triple Language (Turtle), are now available but have not overcome the inertia. (For more on N3 and Turtle, see Resources.) A solution to the problem was inspired by the Microformats approach. With Microformats, semantic value is added to existing HTML content by using consistent patterns of standard HTML elements and attributes. Microformats exist for narrow but common items of data, such as contact information and calendar items. The W3C equivalent is RDFa, which embeds RDF data in XHTML. RDFa is slightly more complex to implement than Microformats, but it is far more generic: anything that you can express in RDF, you can add to XHTML documents using RDFa. Through this technique, existing Web content can bootstrap the Semantic Web.
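The following sketch illustrates the difference in verbosity between RDF serializations. It writes the same triples as N-Triples, with every URI in full, and as Turtle, which declares prefixes once with `@prefix` and then abbreviates URIs. The `ex:` namespace and the URIs are illustrative assumptions. The compaction is deliberately conservative: it abbreviates a URI only when the local part consists of letters, digits and underscores, because Turtle requires some other characters to be escaped.

```python
import re

# Illustrative prefix bindings; "ex:" is a hypothetical namespace.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/",
}
# Only local names this simple are abbreviated (no escaping needed in Turtle).
SIMPLE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

TRIPLES = [
    ("http://example.org/page",
     "http://purl.org/dc/elements/1.1/creator",
     "http://example.org/rob"),
]


def to_ntriples(triples):
    """Serialize (s, p, o) URI triples as N-Triples: every URI written in full."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def compact(uri):
    """Abbreviate a URI to prefix:local when a declared namespace matches."""
    for prefix, ns in PREFIXES.items():
        local = uri[len(ns):]
        if uri.startswith(ns) and SIMPLE_LOCAL.fullmatch(local):
            return f"{prefix}:{local}"
    return f"<{uri}>"


def to_turtle(triples):
    """Serialize the same triples as Turtle with @prefix declarations."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines += [f"{compact(s)} {compact(p)} {compact(o)} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(to_turtle(TRIPLES))
```

The saving in this example is small. It grows quickly when a document contains many triples that share the same few namespaces.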

 

Of course, RDF embedded in XHTML documents as RDFa is of no direct use to Semantic Web tools that require standalone RDF as input. There needs to be an automatic method to recognize the presence of RDFa content and extract the RDF from it. The W3C solution for this is Gleaning Resource Descriptions from Dialects of Languages (GRDDL). The idea is that you run an existing XHTML document through an XSL transformation to generate RDF. A document can be associated with its GRDDL transformation either directly, by including a reference to it, or indirectly, through its profile and namespace documents.
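The sketch below shows only the discovery step of GRDDL for XHTML, under simplifying assumptions. It assumes the document is well-formed XHTML and that its head declares the GRDDL profile `http://www.w3.org/2003/g/data-view` along with `link` elements whose `rel` includes `transformation`. It does not follow the indirect route through profile or namespace documents, and it does not run the transformation. Applying the XSLT would need an XSLT processor, which the Python standard library lacks. The page content and stylesheet path are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Profile URI that GRDDL defines for XHTML documents.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(xhtml_text, base_uri):
    """Return absolute URIs of transformations declared directly in an XHTML head."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # rel="transformation" only counts when the head declares the GRDDL profile.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [
        urljoin(base_uri, link.get("href", ""))
        for link in head.iter(XHTML_NS + "link")
        if "transformation" in link.get("rel", "").split()
    ]


# Hypothetical page; the stylesheet path is illustrative only.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Rob Crowther</title>
    <link rel="transformation" href="/xsl/rdfa2rdf.xsl"/>
  </head>
  <body><p>Web hacker at Example.org.</p></body>
</html>"""

if __name__ == "__main__":
    print(grddl_transformations(PAGE, "http://example.org/staff/robertc"))
```

Relative `href` values are resolved against the page's own URI. A GRDDL-aware agent would then fetch each stylesheet and apply it to the page to obtain RDF.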

 


Expressing semantics unambiguously with RDF is valuable, but even if every site did so, the result would be of little use if you had no idea how the RDF from different sites is related. The RDF triple in Listing 2 expresses an author relationship in the predicate, and while the meaning might seem obvious to you, computers still need some help. If I expressed an author relationship in an RDF file on my site and you expressed one on yours, could a computer assume they were the same thing? What if you had used a writer relationship instead? What you need is a way to express a common vocabulary: to say that my author and your author are the same thing, or that 'author' and 'writer' are analogous. On the Semantic Web this problem is solved by _ontologies_, and the W3C standard for expressing ontologies is the Web Ontology Language (OWL). OWL is a large subject in its own right, and because this article is concerned only with applying it, see Resources for more information.
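OWL lets an ontology state that two properties mean the same thing with `owl:equivalentProperty`. The sketch below is a toy, not an OWL reasoner. It reads only that one kind of statement and groups equivalent predicates with a union-find structure, so equivalence is treated as symmetric and transitive. It then rewrites data to use one representative predicate per group. The `author` and `writer` predicate URIs are hypothetical.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Hypothetical predicates published by two different sites.
MY_AUTHOR = "http://example.org/vocab#author"
YOUR_WRITER = "http://example.net/terms#writer"


def equivalence_map(ontology):
    """Map each predicate to one representative of its owl:equivalentProperty group.

    Uses union-find, so the relation is treated as symmetric and transitive.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in ontology:
        if p == OWL_EQUIVALENT_PROPERTY:
            parent[find(s)] = find(o)
    return {x: find(x) for x in list(parent)}


def normalize(triples, mapping):
    """Rewrite predicates to their representative so equivalent data compares equal."""
    return {(s, mapping.get(p, p), o) for s, p, o in triples}


ONTOLOGY = [(MY_AUTHOR, OWL_EQUIVALENT_PROPERTY, YOUR_WRITER)]
MINE = [("http://example.org/page", MY_AUTHOR, "http://example.org/people/rob")]
YOURS = [("http://example.org/page", YOUR_WRITER, "http://example.org/people/rob")]

if __name__ == "__main__":
    mapping = equivalence_map(ONTOLOGY)
    print(normalize(MINE, mapping) == normalize(YOURS, mapping))
```

Without the ontology statement the two data sets share no triples. With it, they describe the same fact.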

 

Once you have some sources of data in RDF, and ontologies that let you determine the relationships between them, you need a way to get useful information out of them. The SPARQL Protocol and RDF Query Language (SPARQL, pronounced 'sparkle') provides an SQL-like syntax for expressing queries against RDF data, and the queries themselves are built from triple patterns that look like RDF data. The fundamental paradigm for SPARQL is pattern matching. It is designed to be flexible and to work across the Web on data combined from disparate sources. For example, parts of a match can be marked as optional, which makes SPARQL much better than SQL at querying ragged data. _Ragged data_ has an unpredictable and unreliable structure, which is what you might expect if your data is combined from various sources on the Web rather than drawn from a single well-contained SQL database.
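The following sketch shows what an optional match does with ragged data. It emulates the SPARQL query in its opening comment in plain Python. Required triple patterns must match. The `OPTIONAL` group adds bindings only where it can; otherwise the row is kept without them, which is a left join. This is a teaching sketch that assumes terms starting with `?` are variables; it is not a SPARQL engine. The people and URIs are made up. `foaf:name` and `foaf:mbox` are real FOAF properties.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# The SPARQL query being emulated:
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?name ?mbox WHERE {
#     ?person foaf:name ?name .
#     OPTIONAL { ?person foaf:mbox ?mbox }
#   }


def match(pattern, triple, binding):
    """Extend binding so pattern equals triple; terms starting with '?' are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None
    return extended


def solve(patterns, graph, binding):
    """Yield every binding that satisfies all patterns (a basic graph pattern)."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(patterns[1:], graph, extended)


def query(required, optional, graph):
    """Required patterns must match; the optional group only adds bindings (left join)."""
    rows = []
    for binding in solve(required, graph, {}):
        extras = list(solve(optional, graph, binding))
        rows.extend(extras or [binding])
    return rows


# Ragged data: Ann has no mailbox. URIs are illustrative.
GRAPH = [
    ("http://example.org/rob", FOAF + "name", "Rob"),
    ("http://example.org/rob", FOAF + "mbox", "mailto:rob@example.org"),
    ("http://example.org/ann", FOAF + "name", "Ann"),
]
REQUIRED = [("?person", FOAF + "name", "?name")]
OPTIONAL = [("?person", FOAF + "mbox", "?mbox")]

if __name__ == "__main__":
    for row in query(REQUIRED, OPTIONAL, GRAPH):
        print(row["?name"], row.get("?mbox"))
```

Running it prints Rob with his mailbox and Ann with `None`. Moving the mailbox pattern into the required list (an inner join) would drop Ann entirely, and the optional match exists to avoid exactly that.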

 


 

Things you need to know when planning a Semantic Web site

 

As you've already seen, if you are building the next great Web 2.0 site, you can save time by planning from the start to embrace Semantic Web technologies and turn your Web site into an API, rather than creating a separate API for it. A Semantic Web approach gives you API-like functionality for free. Usually an API is a way to get structured data, in XML or JSON format, out of an otherwise unstructured Web site. This leads to a dual approach: you have Web pages for human consumption, and you have an API from which computers can pull structured information for automatic processing. However, this creates extra work for you; if you expect people to use your API, you have to document it, support it, and keep it synchronized with new features on your Web site. With a Semantic Web approach, your Web site is the structured data. You don't need a separate implementation, and you and your users can take advantage of other Semantic Web tools to do automatic processing.

 

This does raise some planning issues. With an API, you are free to define your own data format for each item of information you want to deliver; in the Semantic Web, this is analogous to defining your own ontology. Ontology design is hard to get right without experience, so consider whether any of the large array of existing ontologies suits the types of data you plan to use, as discussed in the next section. When you design an API, you also usually consider an object model for conceptual organization, so that developers can understand when they get collections of items or single items, and which collections their items belong in. On a Semantic Web site, this is determined partly by your ontology choices and partly by your URI scheme. Later in this article, you'll look at approaches to making your URIs usable as part of your API.

 

Finally, even on an existing Web site, you and your users can benefit from the Semantic Web if you update your content to take advantage of GRDDL, RDFa and Microformats.

 


 

Evaluate your data in the context of existing ontologies

 

Designing an ontology that matches your data is one of the more complex parts of the Semantic Web. Arriving at the right ontology is usually critical to the success of a Semantic Web project. Fortunately, many ontologies already exist; Table 1 lists some of them.

Table 1. Some ontologies in use on the Web today

Ontology | Description
Dublin Core | This metadata element standard for cross-domain information resource description provides a simple and standardised set of conventions for describing things online in ways that make them easier to find.
SIOC | The Semantically-Interlinked Online Communities Project is an ontology that expresses the information contained both explicitly and implicitly in Internet discussion channels, such as blogs, forums and mailing lists.
FOAF | The Friend of a Friend ontology describes individuals, their activities and their relations to other people and objects. FOAF allows the description of social networks in a distributed fashion.
DOAP | Description Of A Project is an ontology to describe open-source projects.
ResumeRDF | This ontology expresses a Resume or Curriculum Vitae (CV), including information such as work and academic experience or skills.

 

In addition, many domain-specific ontologies exist in fields such as technology, environmental science, chemistry and linguistics, although these apply to fewer Web sites than those listed above. Much of your data is likely to fit into at least one of the areas covered by the ontologies in Table 1, in which case you can incorporate them in your planning.
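Incorporating an existing ontology often comes down to mapping your own field names to its terms. The sketch below takes a hypothetical person record and maps three of its fields to the FOAF properties `foaf:name`, `foaf:mbox` and `foaf:weblog`. Fields with no suitable term are skipped, which marks them as candidates for a vocabulary of your own. The field names, the e-mail address and the decision about which predicates hold literals are all assumptions made for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from one site's internal field names to existing FOAF terms.
FIELD_TO_FOAF = {
    "full_name": FOAF + "name",   # value is a plain literal
    "email": FOAF + "mbox",       # value must be a mailto: URI
    "blog": FOAF + "weblog",      # value is a URI
}
LITERAL_PREDICATES = {FOAF + "name"}

SUBJECT = "http://example.org/staff/robertc"
RECORD = {
    "full_name": "Rob Crowther",
    "email": "rob@example.org",           # hypothetical address
    "blog": "http://www.boogdesign.com/b2evo/",
    "shoe_size": "10",                    # no FOAF term: skipped
}


def record_to_triples(subject, record):
    """Describe a person record with FOAF terms, skipping fields FOAF does not cover."""
    triples = [(subject, RDF_TYPE, FOAF + "Person")]
    for field, value in record.items():
        predicate = FIELD_TO_FOAF.get(field)
        if predicate is None:
            continue  # no existing term: a candidate for your own vocabulary
        if predicate == FOAF + "mbox":
            value = "mailto:" + value
        triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples):
    """Write URIs as <...> and literal values as quoted, escaped strings."""
    lines = []
    for s, p, o in triples:
        if p in LITERAL_PREDICATES:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(record_to_triples(SUBJECT, RECORD)))
```

Because FOAF is widely used, any tool that already understands it can read this output without knowing anything about the site's internal field names.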

 


 

Choose a Semantic URI scheme

 

If your Web site is your API, then your URIs are the methods that programmers will call to get data. A sensible, succinct and consistent structure is therefore very important, and you need to think about it in advance, because frequent changes after launch will cost you the goodwill of your target audience. Remember also that the components of an RDF triple are usually URIs, so changing your URIs invalidates most existing RDF that refers to your Web site.

 

In the early days of the Web, the structure of a URI usually reflected the organization of files on the Web server. If you sold a particular type of widget as part of a range of gadgets, its URI might be similar to: http://www.mysite.com/products/gadgets/widget.html.

 

The advantage of this approach is that it is relatively semantically clear; if you also sold a doodad, an obvious URI at which you might expect to find its product details is: http://www.mysite.com/products/gadgets/doodad.html.

 

The relationship between the widget and the doodad is fairly clear. The main problem is that this approach is inflexible; the categorization hierarchy is fixed.

 

As the Web advanced, dynamically generated sites became the norm. But while the sites became more flexible, with structure no longer tied to a particular layout of files, the amount of semantic information in the URI decreased. The page you are shown is determined by some rather cryptic information in the query string. For instance, the URI of the widget might be: http://www.mysite.com/inventory.cgi?pid=12345 and the URI of the doodad might be: http://www.mysite.com/inventory.cgi?pid=67890.

 

Suddenly the URI gives you very little semantic value; it's certainly not clear that these two products might be in the same category. More recently, content management systems and Web development frameworks have started to address this issue, making it much easier to have semantically structured URIs while retaining the flexibility of dynamic pages. This is achieved through URIs that refer not to a physical file on the server, but to content that can be delivered by a script or page in a different location. In the trend-setting Ruby on Rails framework, this is achieved through _routes_ (rules that map matching URLs to specific controllers and actions). In CMS packages, the feature usually depends on Apache's mod_rewrite (or its equivalent on other Web servers) and is often referred to as "Search Engine Friendly URIs" or something similar. When you choose a CMS or development framework for your site, be sure to investigate what it is capable of in this regard.
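The idea behind routes can be sketched in a few lines of Python. The route table, the handler names and the `{placeholder}` template syntax below are hypothetical; they are not taken from Rails or any CMS. The point is that a semantic URI such as /products/gadgets/widget is mapped to code by pattern, independent of where any file lives.

```python
import re

# Hypothetical route table: URI template -> handler name.
ROUTES = [
    ("/products/{category}/{item}", "show_product"),
    ("/products/{category}", "list_category"),
]


def compile_template(template):
    """Turn '/products/{category}' into a regex with a named group per placeholder."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            parts.append(f"(?P<{segment[1:-1]}>[^/]+)")  # one path segment
        else:
            parts.append(re.escape(segment))
    return re.compile("/" + "/".join(parts))


COMPILED = [(compile_template(t), handler) for t, handler in ROUTES]


def route(path):
    """Return (handler, parameters) for the first matching route, or (None, {})."""
    for regex, handler in COMPILED:
        found = regex.fullmatch(path)
        if found:
            return handler, found.groupdict()
    return None, {}


if __name__ == "__main__":
    print(route("/products/gadgets/widget"))
    print(route("/products/gadgets"))
```

The first matching route wins, so more specific templates are listed first. Because the templates contain no file extensions, the same table keeps working if the code behind the handlers is rewritten in another language.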

 

One final note: if possible, consider removing file name extensions from your URIs. Extensions such as .html and .cgi provide no semantic information that is relevant to the user and actually cause problems in the long run. If you switched your Web site from CGI scripts to PHP, you would suddenly have different URIs while serving exactly the same content. This is bad for the semantic value of your URIs, as well as for your Google ranking! A more semantically elegant method is to take advantage of the HTTP headers to do content negotiation. Consider the following URI: http://www.mysite.com/products/gadgets/widget.

 

A Web browser generally indicates the content types it prefers in the Accept HTTP header. When asked for this resource, the Web server can check that header, note that text/html is one of the options, and serve an HTML page. If a mashup application wants RDF, it should send application/rdf+xml in the Accept header of its HTTP request, and the Web server can then serve an RDF version of the page from the same URI.
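Here is a simplified sketch of both sides of content negotiation for one URI. On the server side, `choose_representation` reads the Accept header, orders media types by their q-values, and returns the first one it can serve, falling back on `*/*` and `type/*` wildcards. It ignores some finer HTTP rules (for example, an explicit `q=0` on one type alongside a wildcard), and the list of available formats is an assumption. On the client side, `rdf_request` builds a `urllib.request.Request` carrying `Accept: application/rdf+xml` but does not send it.

```python
from urllib.request import Request

# Representations this hypothetical server can produce for one resource.
AVAILABLE = ["text/html", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept_header, available=AVAILABLE):
    """Pick which representation of the same URI to serve, or None."""
    if not accept_header:
        return available[0]
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for offer in available:
            if media in ("*/*", offer) or (
                media.endswith("/*") and offer.startswith(media[:-1])
            ):
                return offer
    return None  # a real server would answer 406 Not Acceptable


def rdf_request(uri):
    """Build, but do not send, a GET asking for the RDF representation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    print(choose_representation("text/html,application/xml;q=0.9,*/*;q=0.8"))
    print(choose_representation("application/rdf+xml"))
```

A browser-style header selects the HTML page, and a mashup's `application/rdf+xml` header selects the RDF version. Both come from the same extension-free URI.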

 

At present, this content negotiation functionality is not available in many off-the-shelf CMS solutions, but it should be possible to configure many of them to use URIs without file extensions now. That way, you can add content negotiation in the future without upsetting your URI scheme.

 


 

Take advantage of existing semantic add tools

 

Whether you fully embrace the Semantic Web in your Web site infrastructure or just want to make your existing content more useful, there are probably several opportunities to add structure to existing content on your Web site. This is the domain of Microformats, RDFa and GRDDL. Table 2 lists the more common information types that you can easily mark up as structured data.

Table 2. Opportunities for structured markup and automatic transformation

Information type | Structured markup
People and organizations | hCard, RDF vCard
Calendars and events | hCalendar, RDF Calendar
Opinions, ratings and reviews | VoteLinks, hReview
Social networks | XFN, FOAF
Licenses | rel-license
Tags, keywords, categories | rel-tag
Lists and outlines | XOXO

 

Adding the structured markup to your page is fairly simple. Listings 3 and 4 show a fragment of HTML containing contact information without, and then with, the additional markup required for the RDF vCard.

Listing 3. Unstructured contact information

 

Rob Crowther. Web hacker at Example.org . You can contact me via e-mail or on my work phone at 0123 456789.

 

Listing 4 shows the contact information with the additional markup required for the RDF vCard.

Listing 4. Contact information using vCard

 

Rob Crowther. Web hacker at Example.org . You can contact me via e-mail or on my work phone at 0123 456789 .

 

In Listing 4, span elements are added to delimit the semantically significant bits of text, along with attributes that indicate what they mean. First, the namespace prefix "contact" is declared and bound to the RDF vCard vocabulary. Next, the markup indicates that this element is _about_ the resource represented by the URI http://example.org/staff/robertc. Then, metadata is added using the rel attribute for link relationships and the property attribute on non-links. The only slightly complex part is the telephone number, because you need to specify a type as well as the number; to achieve this, you nest the type and value elements inside the tel element. Adding this structure allows users to add the contact details to their address book with a single click of the mouse.
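Markup like Listing 4 is the kind of input an RDFa processor consumes. The sketch below collects triples from a tiny subset of RDFa using Python's `html.parser`. It handles `xmlns:` prefix declarations, `about` for the subject, `rel` with `href` for links, and `property` for text values. The markup is a hypothetical example in the spirit of Listing 4, and it assumes the vCard vocabulary namespace `http://www.w3.org/2006/vcard/ns#`. The nested telephone type and value structure is left out. Real RDFa processing has many more rules (prefix scoping, chaining, datatypes) than this sketch implements.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MiniRDFaParser(HTMLParser):
    """Collect triples from a tiny RDFa subset: xmlns:prefix, about, rel+href, property.

    Simplifications: prefixes are global rather than scoped, and a property
    element nested inside another property element is not supported.
    """

    def __init__(self, base_uri):
        super().__init__()
        self.prefixes = {}
        self.subjects = [base_uri]  # innermost 'about' is the current subject
        self.stack = []             # (tag, pushed_subject, property) per open element
        self.text = None            # text collected for the open property, if any
        self.triples = []

    def expand(self, curie):
        """Expand 'prefix:local' using declared prefixes; leave unknown ones as-is."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        pushed = "about" in a
        if pushed:
            self.subjects.append(a["about"])
        if a.get("rel") and a.get("href"):
            self.triples.append((self.subjects[-1], self.expand(a["rel"]), a["href"]))
        if tag in VOID_TAGS:
            if pushed:
                self.subjects.pop()
            return  # no end tag will follow
        prop = a.get("property")
        if prop:
            self.text = []
        self.stack.append((tag, pushed, prop))

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if not self.stack or self.stack[-1][0] != tag:
            return  # ignore stray or mismatched end tags
        _, pushed, prop = self.stack.pop()
        if prop:
            value = "".join(self.text).strip()
            self.triples.append((self.subjects[-1], self.expand(prop), value))
            self.text = None
        if pushed:
            self.subjects.pop()


def extract_rdfa(markup, base_uri):
    parser = MiniRDFaParser(base_uri)
    parser.feed(markup)
    parser.close()
    return parser.triples


VCARD = "http://www.w3.org/2006/vcard/ns#"

# Hypothetical markup in the spirit of Listing 4 (not a copy of it).
MARKUP = """<p xmlns:contact="http://www.w3.org/2006/vcard/ns#"
   about="http://example.org/staff/robertc">
  <span property="contact:fn">Rob Crowther</span>.
  <span property="contact:title">Web hacker</span> at Example.org.
  You can contact me <a rel="contact:email" href="mailto:rob@example.org">via e-mail</a>.
</p>"""

if __name__ == "__main__":
    for triple in extract_rdfa(MARKUP, "http://example.org/"):
        print(triple)
```

The visible text a reader sees is unchanged. The extra attributes alone turn it into three statements about http://example.org/staff/robertc.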

 

The other structured formats enable other kinds of automatic processing; for example, Technorati uses the rel-tag microformat to categorize its vast aggregation of blog posts. A rel-tag is shown in Listing 5. As you can see, it is simply a link that uses the rel attribute. The significant part is the last segment of the URI, after the final /. This is the tag, written using the normal URI encoding conventions, in which a space is represented by a plus sign.

Listing 5. rel-tag for Technorati for the tag 'Semantic Web'

 

Semantic Web

 

If you write a blog post related to the Semantic Web that includes the code from Listing 5 and then ping Technorati to let it know you made a new post (a lot of blog software can be configured to do this automatically), Technorati's crawler will index your post and add a summary of it to the page that your tag links to, along with any other posts with the same tag (see Figure 2).

Figure 2. The 'Semantic Web' page on Technorati, generated from rel-tag
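A consumer of rel-tag, such as a crawler, needs only to find links whose `rel` includes `tag` and read the last path segment of each `href`, decoding `+` as a space as described for Listing 5. The sketch below does that with Python's `html.parser`. The sample markup and URLs are illustrative, and ignoring a trailing slash is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlsplit


class RelTagCollector(HTMLParser):
    """Collect tags from links marked rel="tag" (the rel-tag microformat)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag == "a" and "tag" in rels and href:
            # The tag is the last path segment (trailing slash ignored here);
            # '+' decodes to a space.
            path = urlsplit(href).path.rstrip("/")
            self.tags.append(unquote_plus(path.rsplit("/", 1)[-1]))


def collect_tags(markup):
    collector = RelTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Illustrative blog-post fragment.
SAMPLE = """<p>Posted in
  <a href="http://technorati.com/tag/Semantic+Web" rel="tag">Semantic Web</a>,
  <a href="http://technorati.com/tag/rdf" rel="tag">Resource Description Framework</a>
  and <a href="http://example.org/about">about me</a>.</p>"""

if __name__ == "__main__":
    print(collect_tags(SAMPLE))
```

The link text plays no part in this process. The second link yields `rdf`, not `Resource Description Framework`, and the ordinary link without `rel="tag"` is ignored.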

 


 

Conclusion

 


In this article, you saw how Semantic Web technologies address the need for structured data on the Web in a standard and consistent manner, in contrast to the currently popular method of each Web site defining its own API. You looked at how Semantic Web technologies add value in layers on top of the HTTP and URIs of the existing Web: first allowing the unambiguous expression of relationships with RDF, then allowing shared meaning through OWL-based ontologies, and finally allowing the distributed Web of knowledge to be queried with SPARQL. The article also looked at how you can take advantage of existing ontologies to define what your data is, and use a semantic URI scheme to enable your Web site to also be your API. Finally, the article looked at how you can upgrade the content of your existing Web site using RDFa and Microformats so that GRDDL services can automatically extract RDF from your pages.

 

Although the promise of Tim Berners-Lee's Semantic Web is yet to be fully realized, the years of thinking and research that have gone into it are starting to bear fruit as solutions to practical problems that people face today. The strong collaboration trends in Web 2.0 will only increase the demand for structured, semantically encoded data on the Web. With some planning, you can be in a position to take advantage of the Semantic Web tools that help meet that need.

 

Resources

Learn

The ultimate mashup--Web services and the semantic Web (Nicholas Chase, developerWorks, August 2006): Practice using Semantic Web techniques with this six-part tutorial series.

Introduction to Jena: Use RDF models in your Java applications with the Jena Semantic Web Framework (Philip McCarthy, developerWorks, June 2004): Find out how to use the Jena Semantic Web Toolkit to exploit RDF data models in your Java applications.

Programmable Web: Stay up to date with the latest on mashups and the new Web 2.0 APIs.

The Structured Web - A Primer: Read a general introduction to the value of structured data.

The W3C's RDF Primer: Learn the basics of RDF and how to use it effectively.

A Semantic Web Primer for Object-Oriented Software Developers: Read how to use ontology languages, such as RDF Schema and OWL, in the context of OOP.

The W3C's OWL Overview: Get an understanding of what OWL can do for applications that process information content instead of just presenting it to humans.

The SPARQL Query Language for RDF specification: Explore the syntax and semantics of this query language for RDF.

Notation3: Read about N3, a compact and readable alternative to RDF's XML syntax.

Terse RDF Triple Language: Check out Turtle, a textual syntax for RDF that writes RDF graphs in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle is compatible with the existing N-Triples and Notation 3 formats and with the triple pattern syntax of SPARQL.

Cool URIs for the Semantic Web: Read guidelines for effective URIs as the link between RDF and the Semantic Web.

University of Southampton Department of Electronics and Computer Science: See a Semantic Web site in action.

RDFa or Microformats: Embed semantic information in your Web pages.

IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.

XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

developerWorks technical events and webcasts: Stay current with technology in these sessions.

The IBM developerWorks XML zone: Learn more about XML and the Semantic Web.

The technology bookstore: Browse for books on these and other technical topics.

Get products and technologies

IBM trial software: Build your next development project with trial software available for download directly from developerWorks.

Discuss

Participate in the discussion forum.

XML zone discussion forums: Participate in any of several XML-related discussions.

developerWorks XML zone: Share your thoughts: After you read this article, post your comments and thoughts in this forum. The XML zone editors moderate the forum and welcome your input.

developerWorks blogs: Check out these blogs and get involved in the developerWorks community.

 

About the author

 

Rob Crowther is a Web Developer from London. He has a keen interest in Web Standards and blogs sporadically at http://www.boogdesign.com/b2evo/.

 
