Documentation: Difference between revisions

From Almeda
Latest revision as of 15:12, 2 March 2026

Decisions

Author description

The description field for an Author has the formula: Country + occupation (e.g. Ghanaian writer). If only ALMEDA data is available, we add to the description "writer with published work in [name of the publication]".
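The formula can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical and not part of any ALMEDA tooling, and it assumes the fallback wording is used on its own when no country or occupation is known.

```python
from typing import Optional


def author_description(
    nationality: Optional[str],
    occupation: Optional[str],
    publication: Optional[str] = None,
) -> str:
    """Build an Author description following the formula above (illustrative only)."""
    if nationality and occupation:
        # Country + occupation, e.g. "Ghanaian writer"
        return f"{nationality} {occupation}"
    if publication:
        # Assumed fallback when only ALMEDA data is available
        return f"writer with published work in {publication}"
    raise ValueError("need nationality and occupation, or a publication name")
```

For example, `author_description("Ghanaian", "writer")` gives the description from the example above.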

Author name string

Author name string is used for records on authors where the source material gives only a name that is clearly an alias or fictional name (such as ‘Badger’ or ‘Man'J Mam '96’), initials, or an abbreviated name.

In such cases, Author name string is applied to both the Work and Version records. 

Until such time as the identity of the author is known to the project, 'author name string' will remain in the record.

When only initials are given, the name of the publication in which the work appeared is given in square brackets after the author name string (e.g. C. M. [Drum Magazine]).
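A minimal sketch of the initials rule, assuming that "only initials" means one or more capital letters each followed by a full stop. The regular expression and the function name are assumptions made for illustration, not an agreed ALMEDA definition.

```python
import re

# Assumption: initials are capital letters, each followed by a full stop,
# optionally separated by spaces (e.g. "C. M.").
INITIALS_ONLY = re.compile(r"(?:[A-Z]\.\s*)+")


def author_name_string(name: str, publication: str) -> str:
    """Return the author name string, adding the publication when only initials are given."""
    name = name.strip()
    if INITIALS_ONLY.fullmatch(name):
        # Initials alone are ambiguous, so the publication is appended in square brackets
        return f"{name} [{publication}]"
    # Aliases and abbreviated names are kept as they appear in the source
    return name
```

Aliases such as ‘Badger’ pass through unchanged; only initials-only strings receive the bracketed publication name.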

Awards

Multiple awards given under the auspices of one organisation (e.g. The Bank Windhoek Doek Literary Award for Fiction, The Bank Windhoek Doek Literary Award for Poetry, etc.) are grouped under an item (e.g. The Bank Windhoek Doek Literary Award), which is given as an instance of 'Group of awards'. 'Group of awards' is itself an instance of 'ALMEDA category'.

Content type cartoons/comic strips

Use 'image' as the content type for cartoons and comic strips.

Country of citizenship

All countries referred to in any biographical information on authors are included. Database notes on usage will include the proviso: 'The legal status of any given person's citizenship cannot be guaranteed.'

Deducing occupation from 'Form of creative work'

When the form of the creative work is fotonovella, the occupation of the author is writer, and the photographer (if a different person) is entered as illustrator (see Illustrator below).

Excerpts

'Excerpt' is stated under 'originality of the item'. An excerpt is an instance of 'version, edition, translation'. Add the original work if possible; otherwise add "to be defined" under the statement 'edition or translation of'.

Genre

Genre will not routinely be added to ALMEDA records. If a dataset includes a genre column, it can be recorded under P44.

Human settlement

We have included all human settlements in Africa with a population of 500 and above (with the exception of duplicate names of human settlements that we could not assign to a specific district). All capital cities of non-African countries are included, as are all human settlements with populations above 5000 in countries that have the following national languages: Dutch, English, French, German, Italian, Portuguese, and Spanish. All other human settlements will be added as the need arises.
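The inclusion rule can be read as a simple predicate. The sketch below is illustrative only: the `Settlement` fields and the `is_preloaded` name are hypothetical, and the thresholds are taken directly from the paragraph above ("500 and above" for Africa, "above 5000" elsewhere).

```python
from dataclasses import dataclass

# National languages named in the rule above
INCLUDED_LANGUAGES = frozenset(
    {"Dutch", "English", "French", "German", "Italian", "Portuguese", "Spanish"}
)


@dataclass(frozen=True)
class Settlement:
    name: str
    population: int
    in_africa: bool
    is_capital: bool
    national_languages: frozenset
    # False for duplicate names that could not be assigned to a specific district
    district_resolved: bool = True


def is_preloaded(s: Settlement) -> bool:
    """Return True if the settlement falls under the bulk-inclusion rule."""
    if s.in_africa:
        return s.population >= 500 and s.district_resolved
    if s.is_capital:
        return True
    return s.population > 5000 and bool(INCLUDED_LANGUAGES & s.national_languages)
```

Settlements for which this returns False are the ones added "as the need arises".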

Inception date

The property 'latest date' (P147) has been added as a qualifier so that we can now express, for example, 'born on or before 1960' or 'published on or before 1960'. For an example with a (folk) tale see: https://wikibase.almeda.engelska.uu.se/wiki/Item:Q71510. Similarly, a property 'earliest date' has been added to express 'on or after this date'.
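To illustrate what the two qualifiers express, here is a small sketch that treats them as optional bounds on a year. The `DateBounds` class is hypothetical and works with whole years only, which is an assumption; the actual Wikibase statements carry full time values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DateBounds:
    earliest: Optional[int] = None  # 'earliest date' qualifier: on or after this year
    latest: Optional[int] = None    # 'latest date' qualifier (P147): on or before this year

    def allows(self, year: int) -> bool:
        """Check whether a candidate year is compatible with the recorded bounds."""
        if self.earliest is not None and year < self.earliest:
            return False
        if self.latest is not None and year > self.latest:
            return False
        return True

    def describe(self) -> str:
        """Render the bounds in the wording used above."""
        parts = []
        if self.earliest is not None:
            parts.append(f"on or after {self.earliest}")
        if self.latest is not None:
            parts.append(f"on or before {self.latest}")
        return " and ".join(parts) or "no bound recorded"
```

`DateBounds(latest=1960)` corresponds to "born on or before 1960": it allows 1960 and any earlier year, and rejects 1961.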

Illustrator

To be used for the creator of any graphic medium used for the purpose of illustrating a text (such as an artist, an illustrator or a photographer). If one person has written and drawn the text, add them as both writer and illustrator of the work. When adding the related item on the creator, add occupation: illustrator, photographer, etc., as applicable.

Items with missing information

Any collective will be an instance of collective agent (Q27754), e.g. Drum authors. Individual members will be connected to the item of the collective agent via a TBD property.

Works without authors will have "to be defined" as author, and the description of the work will be "work by unattributed author".

Works without a title will get an invented title in square brackets [ ]. To make up the title, one can use the information that is available, such as the first line of the poem; otherwise the title will be similar to the description. See, for example, https://wikibase.almeda.engelska.uu.se/wiki/Item:Q64141. In that case the anthology does not mention the original Juǀ’hoansi title and it could not be found online, so a title was created and placed in square brackets to indicate that it does not appear as such in the source.

In bibliographic, academic, and cataloging contexts, square brackets [ ] are used to indicate information that has been added or supplied by the researcher, editor, or cataloger, rather than information appearing on the source material itself.
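As a rough sketch of the untitled-work rule, assuming the first line (when available) is preferred over the description; the function name and parameters are hypothetical:

```python
from typing import Optional


def supplied_title(first_line: Optional[str] = None, description: Optional[str] = None) -> str:
    """Build a cataloguer-supplied title, marked with square brackets."""
    # Prefer information taken from the work itself, then fall back to the description
    text = (first_line or description or "").strip()
    if not text:
        raise ValueError("a first line or a description is needed to supply a title")
    # Square brackets signal that the title does not appear on the source
    return f"[{text}]"
```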

Literary adaptation

Literary Adaptation is the only Form of the Creative Work belonging to the class Film. Other forms of film are outside of the domain of ALMEDA. All other filmic material in ALMEDA is modelled as content type within a version of the work. (e.g. https://wikibase.almeda.engelska.uu.se/wiki/Item:Q28756)

In the case of a Literary Adaptation, be sure to enter 'Originality of the item' as 'adaptation' in the version of the Filmic work.

Performances

The modelling of performances of a work follows the LRMoo model, in which the work is linked to the performance via the property 'is performed in' (P146) and the performance is linked to the work via the property 'performed' (P145) (e.g. the work 'Arme hulpbehoewende monsters!', Q27324, and its performance, Q74133).
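The reciprocal links can be sketched as a pair of statements. The tuple representation and the function name are assumptions for illustration; only the property and item identifiers come from the text above.

```python
from typing import List, Tuple

IS_PERFORMED_IN = "P146"  # work -> performance
PERFORMED = "P145"        # performance -> work


def performance_statements(work_id: str, performance_id: str) -> List[Tuple[str, str, str]]:
    """Return the two (subject, property, object) statements linking a work and a performance."""
    return [
        (work_id, IS_PERFORMED_IN, performance_id),
        (performance_id, PERFORMED, work_id),
    ]
```

Calling `performance_statements("Q27324", "Q74133")` yields both directions of the link for the example above.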

Reconciliation

Works are considered likely to be by the same person when the following conditions are met: the name or author string is the same; the date of birth of the author aligns with the date of the publication; the same name (initials etc.) is repeated in the same venue of publication; the same creative form is produced in a contiguous publication (e.g. poetry in Swahili appearing in two Swahili newspaper venues, or crime stories in English appearing in two popular South African magazines).

References

References are applied universally across the database for all statements from all data sources. Two properties are used: 'reference URL' and 'stated in'.

When data comes from a common source that ALMEDA has not collected or published, and which does not have an external identifier or URL, the sourcing information can be found in a Provenance field.

Provenance indicates whether the data has been donated or retrieved and from what source. An item on the source is created.

Reference URL (P10) versus full work available at URL (P62)

'Reference URL' is used for provenance, whereas 'full work available at URL' points users to the place where the full work is available.

Serialisation

Creative works serialised in Serial Works only use classes under creative work for ‘instance of’ (e.g. dramatic work, or literary work). The version record then includes: 'originality of the item': 'born serial'. 

The class 'Serial Work' refers to the container level: i.e. the magazine 'Drum Magazine' or the Newspaper 'Taifa Weekly'.

We distinguish between serialised works with continuous content (one item with parts) and works with repeated titles but different content. In the latter case, each work will get a unique item, with the same label/title but a different description.

If the serialisation crosses multiple years, the first and last issue get the qualifier 'point in time' alongside 'page(s)' and 'object of statement has role'.

Title in another language

Title is used as label in the original language and in the English language with square brackets (Ask Ursula)

Ideally we would put the title in the language of the work as the default value in Almeda (see https://www.wikidata.org/wiki/Help:Default_values_for_labels_and_aliases and https://www.wikidata.org/wiki/Q18507561). Since this is not implemented yet, we add the non-English title to both the English label and the non-English label. See, for example, Item:Q71488.

Translated works

The Work field for a translated work (where the original language text is not available) will be labelled with the translation within square brackets. E.g. Author: 'Luis Carlos Patraquim', Title: [Summer], Translator: 'Luis Raphael'. The version level of the English translation of this Portuguese text uses the translated title (not in square brackets). (e.g links to be uploaded)

Using the entity 'unknown' (Q29141) vs the entity 'to be defined' (Q17817)

In the event that data may still be findable on a particular statement, or when ALMEDA plans to make a concerted effort to add this data, use 'to be defined'. When the data can never be ascertained, use 'unknown'.
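The choice between the two placeholder entities reduces to a single decision. A minimal sketch, with hypothetical function and parameter names:

```python
UNKNOWN = "Q29141"        # the data can never be ascertained
TO_BE_DEFINED = "Q17817"  # the data may still be found or added


def placeholder_entity(may_still_be_found: bool, enrichment_planned: bool = False) -> str:
    """Pick the placeholder entity for a statement whose value is missing."""
    if may_still_be_found or enrichment_planned:
        return TO_BE_DEFINED
    return UNKNOWN
```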

Version, Edition, Translation

The description field for a Version, Edition, Translation, has the formula: Date of publication, Title of work: e.g. 1987 edition of When Rain Clouds Gather.
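As an illustration of this formula, the sketch below builds the description from a year and a title. The `kind` parameter (edition, translation, version) is an assumption added so that the same helper could cover all three cases; the source only gives the "edition" example.

```python
def version_description(year: int, title: str, kind: str = "edition") -> str:
    """Build a Version, Edition, Translation description: date of publication, then title."""
    return f"{year} {kind} of {title}"
```

`version_description(1987, "When Rain Clouds Gather")` reproduces the example above.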

When a version of a work is published in a periodical issue that crosses multiple months and/or years, the first date given is used, e.g.

Serialisation | Year | Month | Volume | Pages | Author | Title | Form of the Creative Work | Language
MOTO Magazine | 1982/1983 | December/January | 8 | 58-60 | Sarah Kawaza | Kamuchacha | Story | chiShona

The year is given as 1982 and the Month as December.
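A small sketch of the "first date given" rule, assuming that ranges in the source data are written with a slash as in the MOTO Magazine row; the helper name is hypothetical.

```python
def first_given(value: str) -> str:
    """Return the first part of a slash-separated range such as '1982/1983'."""
    return value.split("/")[0].strip()


# Values from the MOTO Magazine example
year = first_given("1982/1983")
month = first_given("December/January")
```

Values without a slash are returned unchanged, so the helper can be applied to every row.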

Work flow for uploading data on Literary/Small Magazines:

Prior to uploading:

  1. PI to check data follows the templates given to researchers and data collectors
  2. EITHER, Data uploader to clean the data and send major problems back to the PI for referral to researcher or data collector, OR, Dataset goes back to PI for publishing in ZENODO
  3. DOI created for ZENODO publication and sent back to data uploader to use as reference
  4. Data uploader proceeds with upload

Process of uploading:

(see documentation by GA)

Post uploading:

Once a data set has been uploaded, a brief report is sent to the PI marking the upload as complete. This report includes a list of author name strings that will be sent out by the PI for further research or assigned to student interns.

Authors without existing data to be routinely checked by PI and assigned to student interns for enrichment.

Works

All creative works and versions (even one-version works) will be modelled at two levels: Work and Version, Edition, and Translation, or in the case of performed works, Work and Performance.

Scholarly Works will only be used as references; when we need to use ‘stated_in’, we will create an item that has only the version level.

The subclasses of Work are: Creative Work, Scholarly Work, Edited Collection, and Serial Work. The subclasses of the Creative Work are:

Creative Work
Discursive, instructional or educational work
Dramatic work
Everyday expression
Graphic Work
Literary Work

When writing a description for a Work, the formula is: Form of Creative Work plus name of Author. E.g. A novel by Ngũgĩ wa Thiong’o
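A minimal sketch of the Work description formula. The choice between "A" and "An" is an assumption based on the first letter of the form; the source only shows the "A novel by" example, and the function name is hypothetical.

```python
def work_description(form: str, author: str) -> str:
    """Build a Work description: form of creative work plus name of author."""
    # Assumed rule: use "An" before a vowel letter, otherwise "A"
    article = "An" if form[:1].lower() in "aeiou" else "A"
    return f"{article} {form} by {author}"
```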

Scholarly Works, Edited Collections and Serial Works do not have subclasses.

Works published in parts

Serialisations: the publication date has a start date (first part) and an end date (last part); 'number of parts of this work' is recorded; and 'issue' has two values, the first and the last issue of the series. See Magaidi wa Dr. Shulla: Item:Q28743.

Untitled columns/regular features/cartoon strips in magazines and newspapers: create one work and enter dates and editions under one VET as per example: (ADD GIRL ABOUT TOWN e.g. when complete)

Writing systems

Writing systems should be noted when the item is not in Roman/Latin script (e.g. Karibu Mwambie: https://wikibase.almeda.engelska.uu.se/wiki/Item:Q21781). Writing system is classified as an 'ALMEDA category'.

Topics awaiting resolution

For a Federated search: If we don’t use literary award, and just award, do we lose anything? This determines whether we call awards 'awards' or include a sub-class of 'literary award'.

Events: define event vs activity/performance and consolidate the property list

'Default for language' field: can we include this in the language field of the label/description box?

Inception: used for work level, but can only be presumed in almost every case. Do we either 1. delete it, 2. put in 'sourcing circumstances' as 'presumed', 3. use 'publication date' as first known published version of the work, or 4. put in the date under the statement 'inception' and write a proviso for the whole dataset that these are all presumed?

A matter similar to Inception and Country of Citizenship arises for many instances of 'sex/gender', where the GaB of the author is assumed.

Using 'main subject' in both the work and version levels: this is a question of searchability (which links to the issue of inception, too). If the link between the work and its version enables search across the levels, then these resolve more easily.

Data on publishers?