Sunday, December 11, 2011

Subjects


Jessica R. Buchanan
Dr. Kazmer
LIS5703 – Information Organization
December 10, 2011
Subjects

Introduction
When the goal of a comprehensive database is user findability, it is vital to take user expectations of information retrieval into account. With the increased use of online search engines, the average user anticipates certain results based upon their own understanding (Rolla). These expectations are derived from the ever-growing use of folksonomies created through social tagging. In this paper, I will examine the folksonomy created by my classmates and me, the problems our folksonomy faced compared with the controlled vocabulary of subject headings, and how other subject access systems compare with our folksonomy in their use of controlled vocabulary versus social tagging.

RefWorks Folksonomy
            Throughout the semester, my classmates and I have contributed bibliographic information for resources we believed would be useful in writing our two class papers. Along with each contribution, we were asked to submit descriptors, or tags, identifying what the resource covered. Together, these descriptors form our own folksonomy, and when it is examined closely, one can see the common issues found in social tagging.
Subjectivity in Social Tagging
Without the guidelines of a controlled vocabulary, folksonomies can “suffer from a certain degree of messiness and inconsistency” (Thomas, Caudle, and Schmitz). A search is based on the searcher’s own understanding, so it stands to reason that our descriptors reflect how we each interpreted the articles we contributed (Taylor and Joudrey 333; Bates and Rowley). Folksonomies are strongly influenced by the person creating the tag, which can lead to deviations and ambiguities (Thomas, Caudle, and Schmitz). In “Folksonomies: Path to a Better Way?” Cosentino found that the majority of tags in folksonomies were highly subjective and based on the worldview of the taggers themselves. Our class folksonomy shows a similar subjectivity in how we arrived at our descriptors. I may read an article and decide that its most important aspects are its coverage of MARC, Dublin Core, and cataloging, while a classmate may read the same article and decide that metadata, cataloging systems, and controlled vocabulary are the best descriptors for it. Without a controlled vocabulary, we are left to our own understandings rather than a set of standards that would lead to greater consistency (Lu, Park and Hu).
Specific vs. General Terms
            Taylor and Joudrey note that choosing between specific and general terms is a challenge even for controlled vocabularies (336). The class folksonomy shows that social tagging faces the same challenge. Peter J. Rolla discusses at length how social tags tend to be either very specific or very broad compared with most subject headings in a controlled vocabulary. He gives several examples of how Library of Congress Subject Headings (LCSH) differ from how he would tag the books himself. Here is one such example: “The LC catalogs Bean’s Aegean Turkey, a guide to the archeological sites of Turkey’s western coast, under the single subject, ‘Ionia.’ For me, however, the book is about Turkey and archaeology” (Rolla). This example shows the gap between the specific and general terms that can be used to find a book. Our class folksonomy shows the same mix of general and specific descriptors. One of the best examples is the descriptor “metadata.” Our class used this single, broad descriptor 27 times in our Contribute assignments. In addition to these 27 uses, several more specific descriptors were built on the word metadata: metadata creation, metadata creation templates and editors, metadata format conversion, metadata generation, metadata harvesting, metadata implementation strategies, metadata objects description schema, metadata optimization techniques, metadata quality control, metadata schemas, metadata sharing, metadata standards, metadata structure, and metadata/evaluation. These show that some contributors did not feel “metadata” was specific enough and added a more precise account of what they believed their article was about.
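The general-versus-specific pattern in the metadata descriptors can be made visible by grouping each descriptor under its first word. The following is a minimal Python sketch, assuming descriptors are stored as plain strings; the function name `group_by_head_term` is hypothetical, and the sample list is illustrative (a few uses of the bare term plus some of the specific descriptors quoted above), not the full class data.

```python
import re
from collections import defaultdict


def group_by_head_term(descriptors):
    """Map each head term (the first word of a descriptor, split on a space
    or '/') to the exact descriptors that use it, counting repeats.
    Matching is case-insensitive. Hypothetical helper for illustration."""
    groups = defaultdict(lambda: defaultdict(int))
    for tag in descriptors:
        t = tag.strip().lower()
        if not t:
            continue
        head = re.split(r"[ /]", t, maxsplit=1)[0]
        groups[head][t] += 1
    return groups


# Illustrative sample only, not the class data.
sample = ["metadata", "Metadata", "metadata",
          "metadata harvesting", "metadata quality control",
          "metadata/evaluation", "controlled vocabulary"]

meta = group_by_head_term(sample)["metadata"]
print(meta["metadata"])  # 3
print(sorted(k for k in meta if k != "metadata"))
```

Grouping by the first word is deliberately crude: it treats "metadata harvesting" as a narrower form of "metadata", which matches the pattern described here, but it would not catch specific terms that do not begin with the broad one.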
Number of Descriptors Assigned
            For our Contribute assignments, we were limited in the number of descriptors we could assign: three descriptors were specified for us, and the other three were left to our discretion, for a total of six. Taylor and Joudrey discuss limits on descriptors and note that such a limit can reduce indexing time but does not always benefit the user, because descriptors may be left out to save time, which can hinder the user’s ability to find what they are looking for (344). Too many descriptors can, of course, have a similar effect. In many social tagging environments, people tend to over-tag, producing an overwhelming number of descriptors that can again limit the user’s ability to find an item by offering too many options.

Subject Access Systems
            With the growth of social tagging through sites like Delicious, Flickr, Twitter, and many others, users have become familiar with a plethora of approaches to information organization. As previously stated, this type of information organization has many flaws, but there is also a great deal we can learn from it. Comparing the class folksonomy with other subject access systems shows the benefits of both controlled vocabulary and social tagging.
Rise of the Folksonomy
            In an era of social networking, we are constantly learning and exchanging new ideas, and this has led to an increase in social tagging. Wichowski suggests that folksonomies grew out of a need for findability “amidst a changing environment,” and that they were “developed ‘in field’ in response to [that] environmental need.” Understanding that this type of information organization caters to a generation of users immersed in social networking gives us context for the creation of folksonomies. Social tagging offers a quicker, more effective means of exchanging information in a social environment, leading to “an improvement of the folksonomy usage” (Nocera and Ursino).
Social Tagging vs. Subject Headings
            Comparing the class folksonomy with other subject access systems shows how controlled vocabulary in subject headings and social tagging produce different kinds of results. The Internet Movie Database (IMDb) exercises a high level of authority control, and that control serves the end goal of user findability: the site recognizes misspellings and offers suggestions when a term is not found. On a site like Flickr, you can still find concrete terms, such as a person, place, or thing, but because of social tagging you can also find ideas, symbolism, and abstract thoughts. The lack of authority control on sites like this, however, limits their ability to point users in the right direction when they misspell a term or search for something that does not exist in the system. Our class folksonomy has both authority control (Assignment, Contribute #, and Contributor’s name) and social tagging (our three self-determined descriptors), which shows how the two can work together and what weaknesses each possesses.
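The spelling help attributed to IMDb above can be approximated by matching a query against a list of authorized terms. The sketch below uses `difflib.get_close_matches` from Python's standard library; it is not a description of how IMDb actually works, and both the term list and the `lookup` function are hypothetical stand-ins for a real authority file.

```python
import difflib

# Hypothetical list of authorized (controlled) terms.
AUTHORIZED_TERMS = ["Cataloging", "Controlled vocabulary", "Dublin Core",
                    "Folksonomies", "Metadata", "Subject headings"]


def lookup(query, terms=AUTHORIZED_TERMS, cutoff=0.75):
    """Return ("match", term) for an exact case-insensitive hit,
    ("suggest", [terms]) when close spellings exist, or ("none", [])."""
    by_lower = {t.lower(): t for t in terms}
    q = query.strip().lower()
    if q in by_lower:
        return "match", by_lower[q]
    close = difflib.get_close_matches(q, list(by_lower), n=3, cutoff=cutoff)
    if close:
        return "suggest", [by_lower[c] for c in close]
    return "none", []


print(lookup("dublin core"))  # ('match', 'Dublin Core')
print(lookup("metdata"))      # ('suggest', ['Metadata'])
```

A pure folksonomy has no authorized list to compare against, which is why a system without authority control can only return "no results" for a misspelling.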
            Gross and Taylor discuss how “the best subject searching was done by using both natural language searching and controlled vocabulary searching in parallel.” Their research found that removing subject headings entirely would eliminate about one-third of the results users currently receive when doing both subject and keyword searches (Gross and Taylor). Social tagging still suffers from inconsistency and “semantic ambiguity,” which leads information professionals to remain somewhat skeptical of the “value of social tagging” (Lu, Park and Hu). Yet it stands to reason that this growth in user-generated information organization can give information professionals a way to “engage users with information management” (Lu, Park and Hu). Bates and Rowley also argue that “folksonomy might be able to achieve a user-oriented retrieval aboutness.”
            Rolla states that combining controlled vocabulary with user tags will increase user findability. He is adamant that user tags cannot replace controlled vocabularies like LCSH, but says they can “point libraries in the right direction” (Rolla). His research also showed that many user tags overlap with LCSH (Rolla). These findings suggest how user tags can help increase findability within controlled vocabulary settings such as LCSH.
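The idea that user tags overlap with subject headings can be expressed as a simple set comparison. The sketch below assumes tags and headings are plain strings and counts a tag as overlapping only when it matches a heading exactly after case folding, a much cruder test than Rolla's own analysis; the function name is hypothetical. The sample data is Rolla's Aegean Turkey example quoted earlier.

```python
def tag_heading_overlap(tags, headings):
    """Return (shared, tags_only, headings_only) after case folding.
    Only exact string matches count as overlap; a real comparison would
    also need to handle subdivisions, synonyms, and broader/narrower terms."""
    t = {x.strip().casefold() for x in tags}
    h = {x.strip().casefold() for x in headings}
    return t & h, t - h, h - t


# Rolla's example: LC assigns one heading, while his own tags differ.
shared, tags_only, headings_only = tag_heading_overlap(
    ["Turkey", "archaeology"], ["Ionia"])
print(shared)             # set()
print(sorted(tags_only))  # ['archaeology', 'turkey']
```

In this example the overlap is empty, which illustrates the gap between specific and general terms; across a whole collection, the share of tags falling in `shared` gives a rough measure of agreement.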

Conclusion
            Our examination of the class folksonomy created this semester shows how controlled vocabularies in subject headings and user-generated tagging can each have positive and negative effects on the findability of information objects. Lu, Park and Hu mention that “social annotation” and controlled vocabularies like LCSH would benefit from co-existence. Combining the two would provide a more user-friendly search environment with better findability. Steps toward this kind of combination are already being taken in products such as LibraryThing (Thomas, Caudle, and Schmitz). If findability remains a top priority among information professionals, we are likely to see a more user-friendly environment in which we collectively learn and help others learn as well.


Works Cited
Bates, Jo, and Jennifer Rowley. "Social Reproduction and Exclusion in Subject Indexing: A Comparison of Public Library OPACs and LibraryThing Folksonomy." Journal of Documentation 67.3 (2011): 431-48. Electronic.

Cosentino, Sharon L. "Folksonomies: Path to a Better Way?" Public Libraries 47.2 (2008): 42-7. Electronic.

Gross, Tina, and Arlene G. Taylor. "What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results." College & Research Libraries 66.3 (2005): 212-30. Electronic.

Lu, Caimei, Jung-ran Park, and Xiaohua Hu. "User Tags versus Expert-Assigned Subject Terms: A Comparison of LibraryThing Tags and Library of Congress Subject Headings." Journal of Information Science 36.6 (2010): 763-79. Electronic.

Nocera, Antonino, and Domenico Ursino. "An Approach to Providing a User of a 'Social Folksonomy' with Recommendations of Similar Users and Potentially Interesting Resources." Knowledge-Based Systems 24.8 (2011): 1277-96. Electronic.

Rolla, Peter J. "User Tags versus Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections?" Library Resources & Technical Services 53.3 (2009): 174-84. Electronic.

Taylor, Arlene G. and Daniel N. Joudrey. The Organization of Information. 3rd ed. Connecticut: Libraries Unlimited, 2009. Print.

Thomas, Marliese, Dana M. Caudle, and Cecilia Schmitz. "Trashy Tags: Problematic Tags in LibraryThing." New Library World 111.5-6 (2010): 223-35. Computer and Information Systems Abstracts. Electronic.

Wichowski, Alexis. "Survival of the Fittest Tag: Folksonomies, Findability, and the Evolution of Information Organization." First Monday (Online) 4 May 2009. Electronic.

Sunday, November 6, 2011

Representation and Description

Jessica R. Buchanan

Dr. Kazmer

LIS5703 – Information Organization

November 5, 2011

Representation and Description
Introduction
For the purpose of this paper, I will examine “A World of Ideas: Essential Readings for College Writers” 6th ed. and how Machine-Readable Cataloging (MARC) and Dublin Core (DC) each provide a searchable bibliographic record for this item. Through a close examination of the item, I will discuss the aspects I have deemed most important and how they are represented in both a MARC and a DC record. I will then look more closely at the differences and similarities between the two records, the importance of clean metadata for effective usability, and how MARC and DC work toward that goal.
Important Aspects
            In examining “A World of Ideas: Essential Readings for College Writers” 6th ed., it was determined that the following aspects would be important in any descriptive metadata record pertaining to this item.
Title
First and foremost, the title of the book must appear in any record so that the end user has access to the name the book holds. It is an important aspect because one must know the title to determine accurately whether one has found the correct item.
Author
Another important aspect is that of author/editor. Not only does this type of information give you the author for a particular work, but it also gives you access to the name for instances where you would like to read or view additional items created by the same person.
Works Collected
“A World of Ideas” is a book of collected essays. Listing those essays within the bibliographic record would help end users find an item that contains a particular essay they are looking for.
Edition
For the purpose of this assignment, I have chosen to work with the 6th edition of the book, and I therefore consider it important that the edition be visible in any descriptive metadata record. The edition tells the end user whether they are working with the most up-to-date information.
Publication Year
Like the edition, the publication year helps end users know whether they are working with the most up-to-date information, and it can also help them decide whether the item is still a relevant resource.
MARC Record
            The following is a MARC record for “A World of Ideas: Essential Readings for College Writers” 6th ed., taken from the Library of Congress online database. It covers most of the five aspects of importance discussed above:
245 02 |a A world of ideas : |b essential readings for college writers / |c [edited by] Lee A. Jacobus.
250 __ |a 6th ed.
260 __ |a Boston : |b Bedford/St. Martin’s, |c c2002.
504 __ |a Includes bibliographical references and index.
700 1_ |a Jacobus, Lee A.
The record above shows that the 245 field contains the title and statement of responsibility, the 250 field contains the edition statement, and the 260 field contains the publication information: place (Boston), publisher (Bedford/St. Martin’s), and year (2002). For information about the contents of the book itself, this MARC record uses the 504 field to note that the book includes bibliographical references and an index. While I have listed the 504 field because it appeared in the Library of Congress record, the 505 field (formatted contents note) would better serve my initial view that a detailed list of the essays included in the book should be part of the descriptive metadata for this record. The 700 field is an added entry for a personal name; it records the editor’s name in the form established by authority control.
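The field and subfield structure described above can be made concrete by parsing the display lines into tags, indicators, and subfields. This is a minimal Python sketch that handles only the simplified `TAG IND |a ...` display format used in this post (with `_` for a blank indicator), not the MARC 21 transmission format or MARCXML; the function names are hypothetical, and for real records a dedicated MARC library (for example, pymarc) would be the more robust choice.

```python
def parse_display_field(line):
    """Split a display line such as '250 __ |a 6th ed.' into its tag,
    two indicators, and a list of (subfield code, value) pairs.
    Handles only this simplified display format, where '_' marks a
    blank indicator and '|' introduces each subfield."""
    tag, indicators, rest = line.split(" ", 2)
    ind1, ind2 = (" " if c == "_" else c for c in indicators)
    subfields = []
    for chunk in rest.split("|")[1:]:  # text before the first '|' is empty
        subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "ind1": ind1, "ind2": ind2, "subfields": subfields}


def first_subfield(fields, tag, code):
    """Return the first value of subfield `code` in field `tag`, or None."""
    for field in fields:
        if field["tag"] == tag:
            for sub_code, value in field["subfields"]:
                if sub_code == code:
                    return value
    return None


# The display lines of the Library of Congress record shown above.
RECORD = [
    "245 02 |a A world of ideas : |b essential readings for college writers / |c [edited by] Lee A. Jacobus.",
    "250 __ |a 6th ed.",
    "260 __ |a Boston : |b Bedford/St. Martin’s, |c c2002.",
    "504 __ |a Includes bibliographical references and index.",
    "700 1_ |a Jacobus, Lee A.",
]

fields = [parse_display_field(line) for line in RECORD]
print(first_subfield(fields, "250", "a"))  # 6th ed.
print(first_subfield(fields, "260", "c"))  # c2002.
```

Because each piece of data sits in a numbered field and lettered subfield, a program can ask precisely for "the date in 260 $c", which is the kind of precision the DC comparison below lacks for edition.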
DC Record
Now I will examine a DC record for “A World of Ideas: Essential Readings for College Writers” 6th ed. and how its representation differs from the MARC record just reviewed:
<dc:title>A World of Ideas: Essential Readings for College Writers</dc:title>
<dc:creator>Jacobus, Lee A.</dc:creator>
<dc:description>List of essays held within book.</dc:description>
<dc:date>2002</dc:date>
<dcterms:isVersionOf>A World of Ideas: Essential Readings for College Writers 5th Edition</dcterms:isVersionOf>
<dcterms:hasVersion>A World of Ideas: Essential Readings for College Writers 7th Edition</dcterms:hasVersion>
<dcterms:hasVersion>A World of Ideas: Essential Readings for College Writers 8th Edition</dcterms:hasVersion>
The DC record for this item is written in XML-style markup that reads much like HTML, and it is easy to read even for someone unfamiliar with cataloging and bibliographic information. It is immediately clear that the title is “A World of Ideas: Essential Readings for College Writers,” because the element containing it is named title. The same holds for most DC elements, in large part because DC was developed by “experts from many different fields [and] therefore…is a cross-domain standard and can be the basis for metadata for any type of resource in any field” (Taylor and Joudrey 213).
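Because the DC record is ordinary XML, it can be generated with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree` with the usual DCMI namespace URIs for the `dc:` and `dcterms:` prefixes; the wrapping `metadata` element and the `build_dc_record` function are assumptions for illustration (the record above shows no root element), and the values are copied from that record.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_dc_record(elements):
    """Build a DC record from (namespace, element name, value) triples.
    Element names may repeat, since DC elements are repeatable."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for ns, name, value in elements:
        ET.SubElement(root, f"{{{ns}}}{name}").text = value
    return root


record = build_dc_record([
    (DC, "title", "A World of Ideas: Essential Readings for College Writers"),
    (DC, "creator", "Jacobus, Lee A."),
    (DC, "date", "2002"),
    (DCTERMS, "isVersionOf",
     "A World of Ideas: Essential Readings for College Writers 5th Edition"),
])
print(ET.tostring(record, encoding="unicode"))
```

The element names carry the meaning directly (`title`, `creator`, `date`), which is what makes the record readable without cataloging training.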
Similarities and Differences of MARC and DC Records
Similarities
            Both MARC and DC clearly strive to produce effective bibliographic records. Yasser states, “unless a metadata record effectively and accurately represents the resource being described, the resources themselves will remain inaccessible.” It follows that the goal of both MARC and DC is to provide detailed information that serves the user’s end goal: accessibility. Both provide a way to record the same information and to describe accurately what one might need to know about an item.
Differences
            While the end goals of MARC and DC are similar, they reach them in different ways. MARC uses a sophisticated set of fields and subfields to describe the data in a bibliographic record. DC, in comparison, uses a set of 15 elements that can be used to describe any resource. “While it certainly is easier to create 15 elements as opposed to ‘tags and fields and subfields’ it is also less easy to uniquely identify information resources” (Coleman). This can be seen in the way the records above show edition. The MARC record has a clear, dedicated field for edition (250), whereas the DC record is somewhat vague, relying on “isVersionOf” and “hasVersion.” Edition was the only aspect I had difficulty describing with the DC standards: for the DC record I had to do some digging, whereas MARC defines edition clearly.
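The edition gap becomes obvious when one writes a small MARC-to-DC crosswalk. The Python sketch below maps a few fields from the record above to unqualified DC elements; the `RULES` table is a simplified, hypothetical mapping (it follows the author's DC record in sending 700 to `creator`) and is not copied from any published crosswalk. Fields with no rule, such as 250 (edition), are returned separately so the gap stays visible.

```python
import re

# Hypothetical, simplified rules: MARC tag -> list of
# (subfield codes to join, unqualified DC element). Illustrative only.
RULES = {
    "245": [("ab", "title")],
    "260": [("b", "publisher"), ("c", "date")],
    "504": [("a", "description")],
    "700": [("a", "creator")],
}


def clean(value, element):
    """Drop trailing ISBD punctuation (' /', ' :', ',', ';') but keep
    periods, which may end an initial. Reduce dates to a four-digit year."""
    value = value.strip().rstrip(" /:;,")
    if element == "date":
        match = re.search(r"\d{4}", value)
        if match:
            value = match.group()
    return value


def marc_to_dc(fields):
    """Convert [(tag, [(code, value), ...]), ...] into (element, value) pairs.
    Tags without a rule are returned separately so gaps stay visible."""
    dc, unmapped = [], []
    for tag, subfields in fields:
        if tag not in RULES:
            unmapped.append(tag)
            continue
        for codes, element in RULES[tag]:
            parts = [value.strip() for code, value in subfields if code in codes]
            if parts:
                dc.append((element, clean(" ".join(parts), element)))
    return dc, unmapped


fields = [
    ("245", [("a", "A world of ideas :"),
             ("b", "essential readings for college writers /"),
             ("c", "[edited by] Lee A. Jacobus.")]),
    ("250", [("a", "6th ed.")]),
    ("260", [("a", "Boston :"), ("b", "Bedford/St. Martin’s,"), ("c", "c2002.")]),
    ("504", [("a", "Includes bibliographical references and index.")]),
    ("700", [("a", "Jacobus, Lee A.")]),
]

dc, unmapped = marc_to_dc(fields)
for element, value in dc:
    print(f"dc:{element} = {value}")
print("no DC element for:", unmapped)  # no DC element for: ['250']
```

Unqualified DC has no edition element, so the 250 field has nowhere to go; the record above works around this with the `dcterms` relations `isVersionOf` and `hasVersion`.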
Analysis of Bibliographic Records
Metadata
The simplest definition of metadata is “data about data.” For the purpose of this paper, I will work with Howarth’s definition that “metadata is the sum total of what one can say about any information object at any level of aggregation.” The MARC and DC records in this paper show this definition at work: they bring together the sum total of information about the object to form a bibliographic record that gives the end user the most complete picture possible. Howarth explains that “at the system level, metadata can be used to facilitate interoperability and the ability to share among resource discovery tools…at the end-user level, metadata can facilitate the ability to determine what data are available; whether they meet specific needs; how to acquire them; and how to transfer them to a local system.” This shows the importance of good metadata. “Poorly created metadata records result in poor retrieval and limit accessibility to collections, ultimately exercising a detrimental impact on the continuing adoption and use of a digital library” (Yasser). Without clean and concise metadata, the end user could lose access to items valuable to their research.
Metadata that adheres to standards and established vocabularies is more useful and can be shared across different systems. Clear and concise metadata creates multiple access points that a user can search to find a particular resource (Yasser). For this kind of clarity to be achieved, “resource description should be consistent” (Coleman).
Access Points and Authority Control
            An access point is a word or phrase used to search a retrieval system for data related to a query (Taylor and Joudrey 441). “The access point has two basic functions. It enables the catalogue to find the record and it groups together records sharing a common characteristic” (Gorman). These functions help end users by allowing them to run a specific search while also presenting similar objects that could further their research.
            “Authority control is a mechanism for creating consistency in online systems and for allowing greater precision and better recall in searching” (Taylor and Joudrey 187). When the need for clean and concise metadata is combined with authority control, we begin working toward the goal of accessibility. Creating clear and concise metadata requires following a set of standards and guidelines, and adhering to them allows access points to do their job. “Cataloging cannot exist without standardized access points, and authority control is the mechanism by which we achieve the necessary degree of standardization. Cataloging deals with order, logic, objectivity, precise denotation and consistency, and must have mechanisms to ensure these attributes” (Gorman).
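The two functions of an access point that Gorman describes, finding a record and grouping records that share a characteristic, can be sketched with a small authority file that maps variant forms of a name to one authorized heading. Everything in this Python sketch is hypothetical: the variant forms, the record identifiers, the `AUTHORITY` table, and the function names are illustrative and not taken from a real authority record.

```python
from collections import defaultdict

# Hypothetical authority file: normalized variant form -> authorized heading.
AUTHORITY = {
    "jacobus, lee a.": "Jacobus, Lee A.",
    "lee a. jacobus": "Jacobus, Lee A.",
    "jacobus, lee": "Jacobus, Lee A.",
}


def authorized(name):
    """Return the authorized form of a name, or the name unchanged if the
    (lowercased, whitespace-collapsed) form is not in the authority file."""
    return AUTHORITY.get(" ".join(name.lower().split()), name)


def build_index(records):
    """Group record IDs under the authorized form of each name access point."""
    index = defaultdict(list)
    for rec_id, names in records:
        for name in names:
            index[authorized(name)].append(rec_id)
    return index


records = [
    ("rec1", ["Jacobus, Lee A."]),  # hypothetical record IDs
    ("rec2", ["Lee A. Jacobus"]),
    ("rec3", ["Jacobus,  Lee"]),
]
index = build_index(records)
print(index["Jacobus, Lee A."])  # ['rec1', 'rec2', 'rec3']
```

Without the authority file, the three records would sit under three different headings and a search on one form would miss the others; with it, one access point both finds and gathers them.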
Bibliographic Control
            “Dedicated to the creation of [bibliographic records]…the theory and practice of bibliographic control has focused on systematic, uniform and consistent approaches to describing intellectual or artistic content and physical characteristics” (Howarth). The purpose of bibliographic control is to apply a set of standards that facilitates access to the items the end user is trying to reach (Howarth). In “Metadata and Bibliographic Control: Soul-mates or Two Solitudes?” Howarth concludes that the two work together. Without bibliographic control, we are left with metadata that may or may not lead the end user to what they are searching for. If clear and concise metadata is combined with the standards of bibliographic control, however, the result is consistency. Coyle makes the same point: “what library cataloging and catalogs provide is a high degree of conformity in the data captured in the records. This conformity is a service to users, who can move from one library to another comfortably.”
Interoperability
            Once the goal of accessibility is met, the next goal should be interoperability, in which a user can move easily from one system to another without feeling lost in the information they receive. “Without agreement on standards, without consistent approaches, sharing information would be a laborious mapping process and users would be presented time and again with new and conflicting information on nonstandard interfaces” (Allinson). Adhering to standards is therefore vital to reaching the end goals of accessibility and interoperability. Data sharing will succeed only if metadata comes from “complete and consistent resource description” (Park et al.). Inconsistent metadata leads to difficulty finding accurate information across different repositories (Allinson).
Using MARC and DC to Meet These Needs
MARC
“As with most standards, there are both strengths and weaknesses associated with MARC. Strengths include the fact MARC is a mature standard…weaknesses include the fact that MARC is virtually unknown outside of libraries” (Taylor and Joudrey 141). MARC’s chief strength is its longevity: having been around for quite some time, it has left its mark on the library profession and is well known for its ability to describe an information object in great detail. The weakness noted above reminds those who are comfortable with MARC that it does not translate easily into other information fields.
DC
Dublin Core is somewhat easier to understand than the more detailed and complex MARC standard. Though simple by design, with its 15 elements, it is not without problems. “Conceptual ambiguities and semantic overlaps underlying the DC semantics were responsible for various interpretations of DC elements resulting in their incorrect application” (Yasser). Without a strong understanding of the DC elements, catalogers are left to their own interpretations, which produces inconsistent output and confuses the end user. If the elements are not applied appropriately, the result is sloppy metadata that makes it nearly impossible to find accurate information (Yasser).
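Some incorrect applications of DC elements can be caught mechanically before a record is shared. The Python sketch below checks a record, represented as (element, value) pairs, against a few hypothetical local rules: every element name must be one of the 15 DCMES elements, certain elements must be present and non-empty, and `date` must look like a year or ISO-style date. These rules and the `check_dc` function are assumptions for illustration; the Dublin Core standard itself makes every element optional and repeatable, and qualified `dcterms` properties are outside this check.

```python
import re

# The 15 elements of the Dublin Core Metadata Element Set.
DCMES = {"contributor", "coverage", "creator", "date", "description",
         "format", "identifier", "language", "publisher", "relation",
         "rights", "source", "subject", "title", "type"}

REQUIRED = {"title", "creator", "date"}  # hypothetical local policy
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY[-MM[-DD]]


def check_dc(record):
    """Return a list of problems found in [(element, value), ...]."""
    problems = []
    present = set()
    for element, value in record:
        if element not in DCMES:
            problems.append(f"unknown element: {element}")
            continue
        if not value.strip():
            problems.append(f"empty value: {element}")
            continue
        present.add(element)
        if element == "date" and not DATE_PATTERN.match(value.strip()):
            problems.append(f"date not in YYYY[-MM[-DD]] form: {value}")
    for element in sorted(REQUIRED - present):
        problems.append(f"missing required element: {element}")
    return problems


good = [("title", "A World of Ideas: Essential Readings for College Writers"),
        ("creator", "Jacobus, Lee A."), ("date", "2002")]
bad = [("title", "A World of Ideas"), ("date", "c2002"), ("edition", "6th ed.")]
print(check_dc(good))  # []
print(check_dc(bad))
```

The `bad` record shows three typical slips: a MARC-style date copied verbatim, an invented element name for edition, and a missing creator.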
Which One to Use
            “For digital resources to be included in the library catalog integrating new metadata standards such as DC with older standards such as MARC…is necessary” (Coleman). The old and the new must be merged in order to describe accurately and comprehensively the information objects that end users are trying to obtain. What must also be considered is how information professionals can continue to build their knowledge of these standards. Understanding the issues that can arise in metadata records prepares information professionals to take preventive measures (Yasser). Knowing that there are ambiguities in creating bibliographic records, one can see that there “is the need for the cataloguer to be able to negotiate these ambiguities by exercising skill, good judgment and the fruits of experience” (Gorman).
“The pace of change in the metadata environment creates an increased demand for continuing education programs that are designed to allow cataloging and metadata professionals to stay up-to-date with current and emerging standards and technologies for describing networked and digital resources” (Park et al.). Keeping information professionals current in metadata standards will help ensure accurate and comprehensive metadata. If those who create and maintain metadata and bibliographic records are given the means for professional development, the clear and concise metadata needed for accessibility by the end user will continue to increase.
Conclusion
            What can be concluded from this paper is that, in order to ensure accessibility for the end user, there must be clear and concise metadata. In order to ensure clear and concise metadata, there has to be an application of standards among information professionals. And in order to ensure an application of standards among information professionals, there must be a source of continuing education to maintain a working knowledge of current standards and technologies. Understanding both MARC and DC will help to create a more comprehensive set of standards to work with, and it is through this understanding that we can hope to meet our goal of accessibility.


Works Cited
Allinson, Julie. “Describing Scholarly Works with Dublin Core: A Functional Approach.” Library Trends 57.2 (2008): 221-43. Electronic.
Coleman, Anita S. “From Cataloging to Metadata: Dublin Core Records for the Library Catalog.” Cataloging & Classification Quarterly 40.3-4 (2005): 153-81. Electronic.
Coyle, Karen. “Understanding Metadata and Its Purpose.” The Journal of Academic Librarianship 31.2 (2005): 160-3. Electronic.
Gorman, Michael. “Authority Control in the Context of Bibliographic Control in the Electronic Environment.” Cataloging & Classification Quarterly 38.3-4 (2004): 11-22. Electronic.
Howarth, Lynne C. “Metadata and Bibliographic Control: Soul-Mates or Two Solitudes?” Cataloging & Classification Quarterly 40.3-4 (2005): 37-56. Electronic.
Library of Congress. U.S. Govt. 6 November 2011 <http://www.loc.gov/>.
Park, Jung-ran, et al. “From Metadata Creation to Metadata Quality Control: Continuing Education Needs Among Cataloging and Metadata Professionals.” Journal of Education for Library and Information Science 51.3 (2010): 158-76. Electronic.
Taylor, Arlene G. and Daniel N. Joudrey. The Organization of Information. 3rd ed. Connecticut: Libraries Unlimited, 2009. Print.
Yasser, Chuttur M. “An Analysis of Problems in Metadata Records.” Journal of Library Metadata 11.2 (2011): 51-62. Electronic.