Closed Bug 73992 (dublinCore) Opened 24 years ago Closed 15 years ago

Page Info dialog should support Dublin Core metadata

Categories

(SeaMonkey :: Page Info, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME
Future

People

(Reporter: karl, Assigned: db48x)

Details

(Keywords: helpwanted)

The View Page Info dialog should support Dublin Core metadata <URL: 
http://dublincore.org/documents/1999/07/02/dces/ >. Dublin Core defines 15 
elements (these are not elements in the SGML/XML sense):

Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: The topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of 
the resource.
Date: A date associated with an event in the life cycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A Reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.

This metadata is included in the HTML like this as defined in <URL: 
http://www.ietf.org/rfc/rfc2731.txt >:

       <meta name    = "DC.Creator"
             content = "Engels, F.">
       <meta name    = "DC.Title"
             content = "Capital">

       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.1/">

Dublin Core also has a set of qualifiers <URL: 
http://dublincore.org/documents/dcmes-qualifiers/ >, which "narrow" the meaning 
of different elements. Example:

    <meta name    = "DC.Date.Created"
          content = "1998-05-14">

    <meta name    = "DC.Date.Available"
          content = "1998-05-21">

    <meta name    = "DC.Date.Valid"
          content = "1998-05-28">

More examples can also be found in <URL: 
http://dublincore.org/documents/2000/07/16/usageguide/qualified-html.shtml > 
(not normative). Note that an element can be repeated several times (e.g., when 
there are several authors). All elements should be displayed in the UI, possibly 
using several tabs/categories.
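
To make the "repeated elements" point concrete, here is a rough sketch in Python (not Mozilla code) of collecting the 'meta' values while keeping every repetition. The names DCMetaCollector and collect_dc are made up for illustration, and the sketch simply assumes the prefix is literally "DC.":

```python
from collections import defaultdict
from html.parser import HTMLParser


class DCMetaCollector(HTMLParser):
    """Collect <meta name="DC.*"> values, keeping repeated elements."""

    def __init__(self):
        super().__init__()
        # Maps e.g. "DC.Subject" -> ["heart attack", "Friendship"]
        self.values = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        # Assumption: the prefix is hard-coded as "DC." in this sketch.
        if name.startswith("DC.") and attr.get("content") is not None:
            self.values[name].append(attr["content"])


def collect_dc(html):
    """Return a dict of element name -> list of all content values."""
    parser = DCMetaCollector()
    parser.feed(html)
    return dict(parser.values)


sample = """
<meta name="DC.Title" content="Capital">
<meta name="DC.Subject" content="heart attack">
<meta name="DC.Subject" content="Friendship">
<meta name="keywords" content="ignored">
"""
result = collect_dc(sample)
```

Each element maps to a list, so a UI could show one row per value instead of silently dropping duplicates.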

All of these should be supported in the Page Info dialog. Here's a 
*complete* "walk-through" of what we can expect to find in HTML documents, 
including all qualifiers (we should support these, and no others) and schemes:


TITLE:

    <meta name    = "DC.Title"
          content = "Hamlet in Iceland; being the Icelandic romantic Ambales 
saga">

    <meta name    = "DC.Title.Alternative"
          content = "Ambales saga">

    <meta name    = "DC.Title"
          lang    = "nn"
          content = "Hamlet på Island&#xa0;&ndash; Ambales saga">

Note the 'Alternative' qualifier. This should be marked as such in the UI.

The language of the first two titles is defined by the document, i.e.:

    <html xml:lang="en"> (or <head ...> or another parent)
    <html lang="en">
    HTTP header 'Content-Language'
    <meta http-equiv="Content-Language" content="en">

xml:lang overrides lang, which overrides the HTTP header, which overrides meta 
http-equiv. (This is the normal way of getting the language of an 
element/attribute: inheritance. I don't know if this information is available in 
Mozilla, but it *should* be, as CSS 2 requires it.)

Language can be explicitly defined on each 'meta' element or implicitly, by 
inheritance from the parent or HTTP header.
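
The precedence above could be sketched roughly like this (Python, purely illustrative; resolve_language and its parameter names are hypothetical, and each argument is assumed to hold the nearest value already found for that source, or None):

```python
def resolve_language(xml_lang=None, lang=None, http_header=None,
                     meta_http_equiv=None):
    """Return the first language source that is set, in priority order.

    Order: xml:lang, then lang, then the HTTP Content-Language header,
    then <meta http-equiv="Content-Language">.
    """
    for candidate in (xml_lang, lang, http_header, meta_http_equiv):
        if candidate:
            return candidate
    return None  # no language information available
```

For example, resolve_language(lang="en", http_header="nn") gives "en", because the attribute outranks the header.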


CREATOR:

    <meta name    = "DC.Creator"
          content = "Hufthammer, Karl Ove">

The creator name is usually written in the form 'Last name, First Name', but not 
always, e.g.:

    <meta name    = "DC.Creator"
          content = "Mao Tse Tung">

They should *always* be displayed as 'First Name Last Name', e.g. 'Hufthammer, 
Karl Ove' should be displayed as 'Karl Ove Hufthammer'.
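
A minimal sketch of that reordering, in Python for illustration only (display_name is a made-up helper). It assumes that exactly one comma means 'Last, First'; that heuristic would misfire on corporate names containing a comma, so treat it as a starting point:

```python
def display_name(creator):
    """Turn 'Last, First' into 'First Last'; leave other forms untouched."""
    # Assumption: exactly one comma marks the 'Last, First' form.
    if creator.count(",") != 1:
        return creator.strip()
    last, first = (part.strip() for part in creator.split(","))
    if not last or not first:
        return creator.strip()
    return f"{first} {last}"
```

With this, 'Hufthammer, Karl Ove' becomes 'Karl Ove Hufthammer', while 'Mao Tse Tung' is passed through unchanged.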


SUBJECT:

    <meta name    = "DC.Subject"
          content = "heart attack">

    <meta name    = "DC.Subject"
          scheme  = "MeSH"
          content = "Myocardial Infarction; Pericardial Effusion">

    <meta name    = "DC.Subject"
          content = "Vietnam War">

    <meta name    = "DC.Subject"
          scheme  = "LCSH"
          content = "Vietnamese Conflict, 1961-1975">

    <meta name    = "DC.Subject"
          content = "Friendship">

Note the 'scheme' attribute. This can take one of the values:

LCSH
MeSH
DDC
LCC
UDC

When presented in the UI, the name of the scheme should also be shown, but 
expanded to the following (not including the text in []):

Library of Congress Subject Headings
Medical Subject Headings [See <URL: http://www.nlm.nih.gov/mesh/meshhome.html >]
Dewey Decimal Classification [See <URL: http://www.oclc.org/dewey/index.htm >]
Library of Congress Classification [See <URL: 
http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html >]
Universal Decimal Classification
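
As an illustration of the expansion, a lookup table along these lines would do (Python sketch; SUBJECT_SCHEMES and subject_label are hypothetical names, and the "value (scheme name)" layout is only one possible presentation). It assumes scheme values are matched exactly as written above:

```python
SUBJECT_SCHEMES = {
    "LCSH": "Library of Congress Subject Headings",
    "MeSH": "Medical Subject Headings",
    "DDC": "Dewey Decimal Classification",
    "LCC": "Library of Congress Classification",
    "UDC": "Universal Decimal Classification",
}


def subject_label(content, scheme=None):
    """Format a subject for display, expanding a known scheme name."""
    if scheme is None:
        return content
    # Unknown schemes are shown as written rather than dropped.
    name = SUBJECT_SCHEMES.get(scheme, scheme)
    return f"{content} ({name})"
```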


DESCRIPTION:

    <meta name    = "DC.Description"
          content = "A tutorial and reference manual for Java.">

    <meta name    = "DC.Description.TableofContents"
          lang    = "en"
          content = "The Author gives some Account of Himself and Family
                     -- His First Inducements to Travel -- He is
                     Shipwrecked, and Swims for his Life -- Gets safe on
                     Shore in the Country of Lilliput -- Is made a
                     Prisoner, and carried up the Country">

    <meta name    = "DC.Description.Abstract"
          content = "The kinematics of the jaws and hyolingual apparatus in 
                     Caiman crocodilus were examined by cineradiography and
                     electromyography. After catching, caimans position their 
                     prey between the teeth by a series of inertial bites and 
                     then kill and crush it by a forceful bite.">

Note the 'TableofContents' and 'Abstract'. These should be marked as such in 
the UI.
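
To mark a qualifier in the UI, the 'name' value first has to be split into its parts. A rough sketch (Python, illustrative; split_dc_name is a made-up helper that assumes the dot-separated form used in the examples above):

```python
def split_dc_name(name):
    """Split 'DC.Description.Abstract' into (prefix, element, qualifier).

    Returns None when the name has no prefix.element structure at all.
    """
    parts = name.split(".", 2)
    if len(parts) < 2:
        return None
    prefix, element = parts[0], parts[1]
    qualifier = parts[2] if len(parts) == 3 else None
    return prefix, element, qualifier
```

The qualifier comes back as None for an unqualified name such as 'DC.Title'.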


PUBLISHER:


    <meta name    = "DC.Publisher"
          content = "O'Reilly">

    <meta name    = "DC.Publisher"
          content = "Digital Equipment Corporation">

This is pretty straightforward. There could be more than one publisher of a 
document.


CONTRIBUTOR:

    <meta name    = "DC.Contributor"
          content = "Curie, Marie">

Again, pretty straightforward.


DATE:

    <meta name    = "DC.Date"
          scheme  = "W3CDTF"
          content = "1998-05-14">

    <meta name    = "DC.Date.Created"
          scheme  = "W3CDTF"
          content = "1998-05-14">

    <meta name    = "DC.Date.Available"
          content = "1998-05-21">

    <meta name    = "DC.Date.Valid"
          scheme  = "W3CDTF"
          content = "1998">

    <meta name    = "DC.Date.Valid"
          scheme  = "W3CDTF"
          content = "1999-09-25T14:20+10:00/">

    <meta name    = "DC.Date.Issued"
          scheme  = "W3CDTF"
          content = "1998-05-29">

    <meta name    = "DC.Date.Modified"
          scheme  = "W3CDTF"
          content = "1998-05-29">

Note the qualifiers 'Created', 'Available', 'Valid', 'Issued' and 'Modified'.
If the values of 'Created' and 'Modified' aren't available, they can be taken from
the HTTP headers.

The W3CDTF scheme is a profile of ISO 8601, and is specified in <URL: 
http://www.w3.org/TR/NOTE-datetime >. This is the default if no scheme is 
specified.

There is also a scheme="Period" defined in <URL: 
http://dublincore.org/documents/dcmi-period/ >, though this can't be used as an 
attribute value (as far as I can see).
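
For what it's worth, recognizing the W3CDTF granularities (year, year-month, full date, and date plus time with a time zone designator) can be sketched with one regular expression. This is Python and purely illustrative; parse_w3cdtf is a hypothetical helper, it does not range-check months or days, and it does not handle the open-ended form with a trailing "/" shown in one of the examples above:

```python
import re

# Nested optional groups mirror the W3CDTF levels of granularity.
W3CDTF = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d+))?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})"
    r")?)?)?$"
)


def parse_w3cdtf(value):
    """Return the date parts that are present, or None if not W3CDTF."""
    match = W3CDTF.match(value)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

Returning only the parts that are present lets the UI display "1998" as a year rather than inventing a month and day.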


TYPE:

    <meta name    = "DC.Type"
          scheme  = "DCMIType"
          content = "Software">

    <meta name    = "DC.Type"
          scheme  = "DCMIType"
          content = "Dataset">

    <meta name    = "DC.Type"
          scheme  = "DCMIType"
          content = "Event">

    <meta name    = "DC.Type"
          scheme  = "DCMIType"
          content = "Service">

The DCMIType scheme is defined in <URL: 
http://dublincore.org/documents/dcmi-type-vocabulary/ >. There are nine 
different DCMI types. There can be a button to get a description of a type, or 
this can be presented as a tooltip (localizable of course). For 'Service':

A service is a system that provides one or more functions of value to the end-
user. Examples include: a photocopying service, a banking service, an 
authentication service, interlibrary loans, a Z39.50 or Web server.


FORMAT:

    <meta name    = "DC.Format.Medium"
          scheme  = "IMT"
          content = "text/xml">

    <meta name    = "DC.Format.Extent"
          content = "14 minutes">

    <meta name    = "DC.Format"
          content = "A text file with mono-spaced tables and diagrams.">

    <meta name    = "DC.Format"
          content = "video/mpeg; 14 minutes">

The IMT scheme is defined in <URL: 
http://www.isi.edu/in-notes/iana/assignments/media-types/media-types >.


IDENTIFIER:

    <meta name    = "DC.Identifier"
          scheme  = "URI"
          content = "http://catalog.loc.gov/67-26020">

The URI scheme is defined in <URL: http://www.ietf.org/rfc/rfc2396.txt >. All 
URIs should be clickable.
(An identifier is *not* a URI, and should not be treated as one, unless the 'URI' 
scheme is used, even if it has the form of a valid URI. We should always 
honor the scheme and never assume a particular scheme is used if it isn't 
explicitly defined in the 'meta' element (an exception is 'DC.Date', which is 
W3C Date/Time if no scheme is chosen).)
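
The "honor the scheme, never guess" rule might look like this in code (Python sketch; effective_scheme and is_clickable are made-up names, and the element is assumed to be the bare element name such as 'Date' or 'Identifier'):

```python
def effective_scheme(element, scheme=None):
    """Return the scheme to apply; only 'Date' gets a default (W3CDTF)."""
    if scheme is not None:
        return scheme
    return "W3CDTF" if element == "Date" else None


def is_clickable(element, content, scheme=None):
    """A value is a link only when the 'URI' scheme is stated explicitly.

    The content is deliberately ignored: looking like a URI is not enough.
    """
    return effective_scheme(element, scheme) == "URI"
```

So an identifier such as "urn:isbn:1-56592-149-6" with no scheme stays plain text.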


SOURCE:

    <meta name    = "DC.Source"
          content = "Shakespeare's Romeo and Juliet">

    <meta name    = "DC.Source"
          scheme  = "URI"
          content = "http://a.b.org/manon/">

The scheme 'URI' is a URI. The default is plain text.


LANGUAGE:

    <meta name    = "DC.Language"
          scheme  = "rfc1766"
          content = "en">

    <meta name    = "DC.Language"
          scheme  = "ISO639-2"
          content = "eng">

    <meta name    = "DC.Language"
          scheme  = "rfc1766"
          content = "en-US">

ISO639-2: <URL: http://lcweb.loc.gov/standards/iso639-2/langhome.html >.
RFC 1766: <URL: http://www.ietf.org/rfc/rfc1766.txt >.

The name, not the language code of the language should be displayed in the UI. 
Mozilla already has a list of language code/language name pairs 
(see 'Preferences' | 'Language') built-in.

Also, for backwards compatibility, 'ISO639-1' should be treated as synonym 
for 'rfc1766'. (The Nordic Metadata Template uses this.)

When no language is specified, the language should be taken from the HTTP 
header, a http-equiv meta element or lang="xx" or xml:lang="xx" on the 'html' 
element (it should not be taken from any other elements -- only language 
specified on the top-level element defines the document language).
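
A sketch of the code-to-name lookup, including the 'ISO639-1' synonym (Python, illustrative only). The two small tables stand in for Mozilla's built-in language list, which this sketch does not use; language_name and the table names are hypothetical, and region subtags such as "-US" are simply dropped here:

```python
# Stand-in tables; a real implementation would use the built-in list.
RFC1766_NAMES = {"en": "English", "nn": "Norwegian Nynorsk"}
ISO639_2_NAMES = {"eng": "English", "nno": "Norwegian Nynorsk"}

# 'ISO639-1' is treated as a synonym for 'rfc1766' (backwards compatibility).
SCHEME_SYNONYMS = {"iso639-1": "rfc1766"}


def language_name(code, scheme=None):
    """Return a display name for a language code, or the code itself."""
    if scheme is None:
        return code  # never assume a scheme that was not stated
    key = scheme.lower()
    key = SCHEME_SYNONYMS.get(key, key)
    if key == "rfc1766":
        primary = code.split("-")[0].lower()  # "en-US" -> "en"
        return RFC1766_NAMES.get(primary, code)
    if key == "iso639-2":
        return ISO639_2_NAMES.get(code.lower(), code)
    return code
```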


RELATION:

    <meta name    = "DC.Relation.IsVersionOf"
          scheme  = "URI"
          content = "http://foo.bar.org/draft9.4.4.2">

    <meta name    = "DC.Relation.HasVersion"
          scheme  = "URI"
          content = "http://foo.bar.org/draft9.4.4.2">

    <meta name    = "DC.Relation.IsReplacedBy"
          scheme  = "URI"
          content = "http://foo.bar.org/draft9.4.4.2">

    <meta name    = "DC.Relation.Replaces"
          scheme  = "URI"
          content = "http://foo.bar.org/draft9.4.4.2">

    <meta name    = "DC.Relation.IsRequiredBy"
          scheme  = "URI"
          content = "http://foo.bar.org/draft9.4.4.2">

    <meta name    = "DC.Relation.Requires"
          content = "LWP::UserAgent; HTML::Parse; URI::URL;
                     Net::DNS; Tk::Pixmap; Tk::Bitmap; Tk::Photo">

    <meta name    = "DC.Relation.IsPartOf"
          scheme  = "URI"
          content = "http://foo.bar.org/abc/proceedings/1998/">

    <meta name    = "DC.Relation.HasPart"
          scheme  = "URI"
          content = "http://foo.bar.org/abc/proceedings/1998/">

    <meta name    = "DC.Relation.IsFormatOf"
          scheme  = "URI"
          content = "http://foo.bar.org/cd145.sgml">

    <meta name    = "DC.Relation.IsReferencedBy"
          scheme  = "URI"
          content = "http://foo.bar.org/cd145.sgml">

    <meta name    = "DC.Relation.References"
          content = "urn:isbn:1-56592-149-6">

    <meta name    = "DC.Relation.IsFormatOf"
          content = "Shakespeare's Romeo and Juliet">

    <meta name    = "DC.Relation.HasFormat"
          scheme  = "URI"
          content = "Shakespeare's Romeo and Juliet">

The scheme 'URI' is a URI. The default is plain text.

I *think* I remembered all qualifiers. Descriptions of them can be found at <URL: 
http://dublincore.org/documents/dcmes-qualifiers/#relation >.


COVERAGE:

    <meta name    = "DC.Coverage.Temporal"
          content = "US civil war era; 1861-1865">

    <meta name    = "DC.Coverage.Temporal"
          scheme  = "W3CDTF"
          content = "1998">

    <meta name    = "DC.Coverage.Spatial"
          content = "Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W">

    <meta name    = "DC.Coverage.Spatial"
          scheme  = "TGN"
          content = "Columbus (C,V)">

Note to author: This is the spatial or temporal features of the intellectual 
content. A document about the Eiffel Tower, written in English, by a Norwegian, 
living in Turkey, stored on a server in Brazil should have a coverage 
of 'Paris' or 'France' or the equivalent geographical coordinates.

This has the qualifiers 'Temporal' and 'Spatial'. There are tons of schemes for 
these. See <URL: http://dublincore.org/documents/dcmes-qualifiers/#coverage >.


RIGHTS:

    <meta name    = "DC.Rights"
          lang    = "en"
          content = "Copyright Acme 1999 - All rights reserved.">

    <meta name    = "DC.Rights"
          scheme  = "URI"
          content = "http://foo.bar.org/cgi-bin/terms">


*** IMPORTANT ***

The 'DC' "elements" should *only* be recognized if one of the following lines 
is included in the HTML document:

       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.1/">

       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.0/">

       <link rel     = "schema.DC"
             href    = "http://purl.org/metadata/dublin_core_elements">

       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.0/#fragment-identifier">

(You can compare these to 'namespaces' in XML.)

The 'http://purl.org/metadata/dublin_core_elements' should only be supported 
for backwards compatibility and its use is discouraged. *All* URLs can contain 
fragment identifiers, e.g.:

       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.1/#date">

Here, only the 'date' element should be supported.

I'm not completely sure of this, but I *think* using

       <link rel     = "schema.TEST"
             href    = "http://purl.org/DC/elements/1.1/">

should enable:

        <meta name    = "TEST.Description"
             content = "A tutorial and reference manual for Java.">

to work (i.e., the prefix has no meaning in itself, only when connected to 
a "namespace").
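
Roughly, the 'link' handling described above could be sketched as follows (Python, not Mozilla code; SchemaLinkCollector and dc_prefixes are made-up names). It maps each declared prefix to the fragment it was restricted to, or None when the whole element set is enabled. If the same prefix is declared twice, the last declaration wins, which is an assumption on my part:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# The namespace URLs listed above (the last one is the discouraged legacy form).
DC_NAMESPACES = {
    "http://purl.org/DC/elements/1.1/",
    "http://purl.org/DC/elements/1.0/",
    "http://purl.org/metadata/dublin_core_elements",
}


class SchemaLinkCollector(HTMLParser):
    """Find <link rel="schema.PREFIX"> declarations that point at DC."""

    def __init__(self):
        super().__init__()
        # e.g. {"DC": None} for all elements, {"DC": "date"} for one element
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        href = attr.get("href") or ""
        if not rel.lower().startswith("schema."):
            return
        base, fragment = urldefrag(href)
        if base in DC_NAMESPACES:
            prefix = rel[len("schema."):]
            self.prefixes[prefix] = fragment or None


def dc_prefixes(html):
    """Return {prefix: fragment-or-None} for every DC schema declaration."""
    parser = SchemaLinkCollector()
    parser.feed(html)
    return parser.prefixes
```

A 'meta' name would then only be treated as Dublin Core if its prefix appears in the returned mapping.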
Blocks: 68410
Blocks: 52730
Karl: db48x is rewriting Page Info, as far as I know. You may want to contact 
him. The word infobot in #mozillazine has his email address.

Gerv
What if someone has:
       <link rel     = "schema.DC"
             href    = "http://purl.org/DC/elements/1.2/">
[and that url exists]
> What if someone has:
>       <link rel     = "schema.DC"
>             href    = "http://purl.org/DC/elements/1.2/">

Hmm, you have a point. OK, I think we can safely assume that all schemas 
beginning with "http://purl.org/DC/elements/" are part of the DC. (The DC is 
pretty stable, and I doubt it will change much.)
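
In code, the relaxed rule is just a prefix test. A Python sketch (is_dc_namespace is a hypothetical name; keeping the legacy dublin_core_elements URL as an exact match is my assumption about how the two rules combine):

```python
DC_ELEMENTS_PREFIX = "http://purl.org/DC/elements/"
LEGACY_NAMESPACE = "http://purl.org/metadata/dublin_core_elements"


def is_dc_namespace(href):
    """Accept any version under the DC elements prefix, e.g. a future 1.2."""
    base = href.split("#", 1)[0]  # ignore any fragment identifier
    return base.startswith(DC_ELEMENTS_PREFIX) or base == LEGACY_NAMESPACE
```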

> [and that url exists]

404?
well i picked 1.2 because you didn't list it, but i'm assuming it doesn't exist 
yet (or you would have). if someone references 1.2 and it returns 404 do we 
still honor it? [I was more concerned w/ honoring a present file that matched 
the naming convention but not your list -- thanks for your revised answer, 
it's much more acceptable]
Keywords: helpwanted
This information will already be picked up by the new page info stuff when it
lists the contents of all meta tags. I'm just displaying the contents of the
name and content/http-equiv attributes, so it won't pretty print anything. Will
this be enough for the shipping mozilla? Anything further is easily included as
an extension with an overlay. That overlay could modify the contents of the tree
I'm showing meta tags in, or add a completely separate tab for displaying the DC
metadata. I'd recommend the latter.

db48x

Now if I could just get mozilla to stop crashing on form submission...

> I'm just displaying the contents of the
> name and content/http-equiv attributes, so it won't
> pretty print anything. Will
> this be enough for the shipping mozilla?

Well, it will be better than nothing, but it won't be enough for marking this 
bug as 'fixed' (any more than displaying a DOM tree of an HTML document can be 
seen as *supporting* HTML).
Dublin Core metadata should also be accessible through the W3C's RDF
recommendation, like so:

<html><head>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdf:Description about="http://www.dlib.org">
<dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
<dc:Description>The D-Lib program supports the community of people
     with research interests in digital libraries and electronic
     publishing.</dc:Description>
<dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
<dc:Date>1995-01-07</dc:Date>
<dc:Subject>
<rdf:Bag>
<rdf:li>Research; statistical methods</rdf:li>
<rdf:li>Education, research, related topics</rdf:li>
<rdf:li>Library use Studies</rdf:li>
</rdf:Bag>
</dc:Subject>
<dc:Type>World Wide Web Home Page</dc:Type>
<dc:Format>text/html</dc:Format>
<dc:Language>en</dc:Language>
</rdf:Description>
</rdf:RDF>
</head></html>

or the RDF abbreviated syntax:

<html><head>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdf:Description about="http://www.dlib.org"
    dc:Title="D-Lib Program - Research in Digital Libraries"
    dc:Description="The D-Lib program supports the community of people
     with research interests in digital libraries and electronic
     publishing."
    dc:Publisher="Corporation For National Research Initiatives"
    dc:Date="1995-01-07"/>
</rdf:RDF>
</head></html>

or through links to external RDF files:

<link rel="meta" href="mydocMetadata.DC.RDF">

...These examples were taken from the W3C recommendation at
http://www.w3.org/TR/REC-rdf-syntax/ -- see that document for more details.

I know that Mozilla has support for RDF datasources, but I don't know how different
this usage of RDF is from the current implementation.  As support for RDF as a
vehicle for metadata and the "Semantic Web" grows, Mozilla needs to be able to
put it to use.
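
As a rough idea of what reading this would involve, here is a Python sketch using the standard XML parser (not Mozilla's RDF code). dc_from_rdf is a made-up name, and it assumes the rdf:RDF block has already been extracted from the HTML as well-formed XML and uses the namespace URI from the examples above. It handles both the full and the abbreviated syntax:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/metadata/dublin_core#"  # as used in the examples


def dc_from_rdf(rdf_xml):
    """Return a list of (element, value) pairs found in rdf:Description."""
    root = ET.fromstring(rdf_xml)
    dc_tag = f"{{{DC_NS}}}"  # ElementTree spells names as "{namespace}local"
    found = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Abbreviated syntax: properties given as attributes.
        for key, value in desc.attrib.items():
            if key.startswith(dc_tag):
                found.append((key[len(dc_tag):], value))
        # Full syntax: properties given as child elements.
        for child in desc:
            if not child.tag.startswith(dc_tag):
                continue
            name = child.tag[len(dc_tag):]
            items = child.findall(f".//{{{RDF_NS}}}li")
            if items:  # an rdf:Bag yields one value per rdf:li
                found.extend((name, (li.text or "").strip()) for li in items)
            else:
                found.append((name, " ".join((child.text or "").split())))
    return found


sample = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdf:Description about="http://www.dlib.org" dc:Date="1995-01-07">
<dc:Format>text/html</dc:Format>
<dc:Subject><rdf:Bag>
<rdf:li>Library use Studies</rdf:li>
</rdf:Bag></dc:Subject>
</rdf:Description>
</rdf:RDF>"""
pairs = dc_from_rdf(sample)
```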
No longer blocks: 52730
Status: NEW → ASSIGNED
Target Milestone: --- → Future
mass moving open bugs pertaining to page info to pmac@netscape.com as qa contact.

to find all bugspam pertaining to this, set your search string to
"BigBlueDestinyIsHere".
QA Contact: sairuh → pmac
Component: XP Apps: GUI Features → Page Info
Alias: dublinCore
Assignee: bugs → db48x
Status: ASSIGNED → NEW
QA Contact: pmac
Product: Browser → Seamonkey
Bug 268343 is about Live Bookmarks better supporting Dublin Core metadata, related?
Test cases at http://www.codestyle.org/test/DCTestCases.shtml
WFM with the current pageinfo implementation in Build identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.1b4pre) Gecko/20090422 SeaMonkey/2.0b1pre

All the attributes show up in the General->Meta list.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
This is a TEST!!!