A Southern Perspective: April 2007

Monday, April 23, 2007

PSA About Hand Washing

Please wash your hands!

I was waiting my turn when someone left and just ran right out the door.

It really is not that difficult - I have even found some instructions for you to follow.

So Please Wash Your Hands!

Friday, April 20, 2007

Right Said Fred

This is so funny - because I used this song as a radio promo during my undergrad.

Your Stripper Song Is

I'm Too Sexy by Right Said Fred

"And I'm too sexy for your party
Too sexy for your party
No way I'm disco dancing"

Yes, you're super sexy. But you never take yourself too seriously!

What Song Should You Strip To?

Wednesday, April 18, 2007

Do You Know My Mother?

Apparently in my post stating that I was indeed alive - I forgot to mention that my fabulous mother came to visit me over Easter.

The book to the right was one of my favorites when I was little ---->

El Gato Guapo is now in residence and meowing every chance he gets!

That's it for now -- More hopefully after I finish my paper and my presentation!

Thanks to all for the many fabulous birthday wishes I received!

It was a fabulous day and I will now not be getting any older - so stop counting - I will stay where I am thank you very much.

Unfortunately this year my happy day coincided with the tragic happenings in Virginia.

My heart and prayers go out to all those that were touched by this great sadness.

Also to all the family and friends in my life - I Love You! Thanks for being in my life and being the fabulous You!

Till next time - Keep your feet on the ground and keep reaching for the stars. (Casey Kasem)

LS 500 The Notion of Category...

Barite, M. (2000). The Notion of "Category:" Its Implications in Subject Analysis and in the Construction and Evaluation of Indexing Languages. Knowledge Organization 27:4-10.

Introduction

There is not very much written on the topic of categories. This may be, as the author suggests, because categories matter only to classificationists. Barite poses the following question:

What do we designate when we make reference to categories?

Ranganathan brought the concept from Philosophy to the Classification of Knowledge – he constructed a system of classification called the Colon Classification based on his theoretical ideas.

The author does mention that the comprehension of the concept is neither simple nor easily accessible.

Definitions

It is not possible to characterize categories in the Theory of Classification.
Categories are generally abstract expressions, so they can be perceived in any entity, element or object.
Categories simplified as abstractions with the strength of intellectual instruments are used by classificationists to investigate regularities of objects of the physical and ideal world and for representing notions.

Usefulness

Design, planning and structuring of indexing languages or systems of knowledge.
Modification or specification of classification tables.
The evaluation and analysis of indexing languages and systems of concepts through a set of parameters capable of establishing the grade of reciprocal tension among related concepts and their relevance and validity.

Category, Object and Analyst

It is not possible to isolate the notion of category from those of object and analyst. The complexity of any object impedes global, integral and complete analysis. Object attributes that condition its study:

Any object is naturally dynamic and mutable.
The object may be real or ideal.
Some objects have delimitation problems.
A large part of the objects belong to, or occur in a phase of the time-space continuum, or rather flow along a section of that continuum.

Characters of Categories

Every category is a sectorial one.
Every category implies a specific level of analysis.
Categories are levels of analysis external to the object.
Categories are mutually excluding.
Every category is highly generalizable.
Every category may admit, with reference to an object, variable levels of subdivision.
Agreement has not been reached regarding a limited collection of categories.

Conclusion and Reflection

At the end of the article the author proposes greater attention should be shown to this topic. I have to agree with that along with requesting that it be in simpler terms. After two readings my mind is still boggling with some of the terms and ideas. The references to the time-space continuum made me think of a play I know that involved paradoxes. I can only hope that I will understand this topic more after my class tomorrow night.

Wednesday, April 11, 2007

I Am A - L - I - V - E !

Yes I am still here.

I am extremely busy with the new position and the end of the semester!

Let's recap -- shall we?

I found out I got mono right before break and therefore break was not very much fun as I mostly slept and packed the things I wanted to bring back with me. It also snowed three times -- not so much fun. I flew back to school on St. Patty's Day and subsequently got stuck in the Chicago airport for 5 hours. I was not a happy camper as all I wanted to do was be in bed and once I got to my destination I still had an hour drive ahead. While I was in the airport I did however see a sleeping leprechaun -- very cool...very cool indeed! So I finally got home around 1am only to have to go back out and find the 24hr pharmacy to get cough syrup.

I am learning something new every day at the new J O B and am slowly but surely learning my way around town -- soon I'll be able to get here, there and everywhere.

Last night I impressed some folks with my ability to quote Dr. Seuss. It was a rant about Green Eggs and Ham. The things that are in my mind -- some day that will help get me a job -- right??

It was also told to me through several channels that I am not posting enough personal info...and keeping my loving family and friends informed of my goings on...I shall try my best..but you know a girl has to have some mystery right??

Hope everyone had a Happy Easter!!

TTFN~

LS 500 How a Search Engine Works

Liddy, E. (2001). How a Search Engine Works. Searcher 9(5):38-45.

A Search Engine is the more popular term for an Information Retrieval (IR) System. Whichever term you call the system it contains four different elements:

A Document Processor
A Query Processor
A Search and Matching Function
A Ranking Capability

Document Processor

Prepares, processes and inputs the documents, pages or sites that users are searching. Document processors perform some of the following steps:

Normalize the document stream to a predefined format
Break the document stream into retrievable units
Isolate and metatag subdocument pieces
Identify potential indexable elements in documents
Delete stop words
Stem terms
Extract index entries
Compute weights
Create and update the main inverted file against which the search engine searches in order to match queries to documents
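To make the steps above a little more concrete, here is a minimal sketch of a document processor. It is not from Liddy's article: the stop list, the `naive_stem` helper and the sample documents are all assumptions made up for illustration, and the "weights" are just raw term counts.

```python
import re
from collections import defaultdict

# Tiny illustrative stop list (an assumption); real systems use longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def process(text):
    """Tokenize, delete stop words, and stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def build_inverted_file(documents):
    """Map each index term to {doc_id: raw term count}."""
    inverted = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in process(text):
            inverted[term][doc_id] = inverted[term].get(doc_id, 0) + 1
    return dict(inverted)

# Hypothetical two-document collection.
docs = {
    1: "The cat is chasing the dogs",
    2: "Dogs and cats in the library",
}
index = build_inverted_file(docs)
print(index["dog"])  # both documents contain a form of "dog"
```

Looking a term up in `index` returns the documents that contain it, which is the basic job of the inverted file.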

Query Processor

Query processing has seven possible steps:

Tokenize query terms
Recognize query terms vs. special operators
Delete stop words
Stem words
Create query representation
Expand query terms
Compute weights
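The first three query steps can be sketched in a few lines. This is only an illustration under my own assumptions: the operator set, the stop list and the rule that operators must be typed in capitals are hypothetical, and stemming, query expansion and weighting are left out.

```python
import re

# Both sets are illustrative assumptions, not taken from the article.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "is"}
OPERATORS = {"AND", "OR", "NOT"}

def process_query(query):
    """Return (kind, value) pairs, where kind is 'op' or 'term'."""
    parsed = []
    for token in re.findall(r"[A-Za-z0-9]+", query):
        if token in OPERATORS:
            # Operators are recognized only when typed in upper case.
            parsed.append(("op", token))
        elif token.lower() not in STOP_WORDS:
            parsed.append(("term", token.lower()))
    return parsed

print(process_query("history of the library AND cataloging"))
```

The result is a simple query representation that a matching function could then run against the inverted file.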

Search and Matching Function

How systems carry out their search and matching functions changes depending on which theoretical model of information retrieval underlies the system’s design philosophy. Searching the inverted file for documents meeting the query requirements, referred to simply as “matching”, is typically a standard binary search, no matter whether the search ends after the first two, five or all seven steps in the query process. Some search engines use algorithms for scoring not based on document content, but based on the relation among documents or past retrieval history of documents and pages. After the similarity is computed for each document in the subset of documents, the system presents an ordered list to the searcher. The sophistication of the ordering of the documents depends on the model the system uses as well as how advanced the document and query weighting mechanisms are. Some systems that are very sophisticated go the extra mile and let the user provide relevance feedback or modify their query based on the results they were given.

What Document Features Make a Good Match to a Query

Term Frequency

How frequently a term appears in a document is one of the most obvious ways to determine a document’s relevance to a query. However, several situations can undermine this premise. Many words have multiple meanings, such as “pool” or “fire.” Also, in some domains certain words are so common and so frequent that their relevance declines sharply.

Location of Terms

Many search engines give preference to words found in the title or lead paragraph or in the metadata of a document. Terms that occur in the title of a document or page that match a query term are therefore frequently weighted more heavily than terms occurring in the body of the document. Also, query terms that occur in section headings or within the first paragraph of the document may be more likely to be relevant.

Link Analysis

Link analysis works like bibliographic citation practices. Link analysis is based on how well connected each page is as defined by Hubs and Authorities, where Hub documents link to large numbers of other pages (out-links) and Authority documents are those referred to by many other pages, or have a high number of “in-links.”
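The Hubs and Authorities idea is usually associated with the HITS algorithm. Below is a small sketch of that iteration, assuming a made-up link graph of four pages; the article does not give code or say how any particular search engine implements link analysis, so treat this as an illustration only.

```python
def hits(links, iterations=20):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    hub = {p: 1.0 for p in pages}
    auth = {}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize so the scores stay comparable between rounds.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: A and B link out a lot, C is linked to the most.
links = {"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": ["C"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page with the most in-link support
```

In this toy graph, page C ends up as the strongest authority because three pages link to it, while A and B score highest as hubs.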

Popularity

Google and several other search engines use popularity to determine page relevance. Popularity uses data on how often users choose a page to predict its relevance.

Date of Publication

Some search engines assume that the newer the information is the more likely that it will be relevant to the user. These engines present the results beginning with the most current ones first followed by the older results.

Length

When there is a choice with two documents having the same query terms, the search engine chooses the document that has a higher occurrence of the term relative to the length of the document.
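One simple way to express "occurrence of the term relative to the length of the document" is to divide the term count by the number of words. The function name and the two sample documents below are my own assumptions, not something from the article.

```python
def normalized_tf(term, document):
    """Term count divided by document length in words."""
    words = document.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

# Both hypothetical documents mention "catalog" exactly once.
short_doc = "library catalog search"
long_doc = "the library has a very large catalog of books and journals to search"

print(normalized_tf("catalog", short_doc) > normalized_tf("catalog", long_doc))
```

The shorter document wins here because the single occurrence makes up a larger share of its text.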

Proximity of Query Terms

When the terms occur near each other in a document it is more likely that the document is relevant to the query than if the terms occur at a greater distance.
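Proximity can be measured as the smallest gap, in word positions, between two query terms. This is a rough sketch under that assumption; the function name and sample sentence are hypothetical, and real engines may define nearness differently.

```python
def min_distance(words, term_a, term_b):
    """Smallest gap in word positions between two terms, or None if either is missing."""
    last_a = last_b = None
    best = None
    for i, word in enumerate(words):
        if word == term_a:
            last_a = i
        if word == term_b:
            last_b = i
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best

words = "search engines rank documents by matching query terms".split()
print(min_distance(words, "search", "engines"))
print(min_distance(words, "search", "terms"))
```

A ranking function could then give a document a boost when this distance is small.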

Proper Nouns

These sometimes have a greater weight, since many searches are performed on people, places or things.

Summary and Reflection

Up till now search engine providers have primarily opted for less versus more complex processing of documents and queries. This then leaves the bulk of the work to be done by the searcher to pick their way through the results to find what they are seeking. Hopefully this status-quo will not continue and search engines will continue to enhance the quality of the processing.

I have to honestly say it never occurred to me how or what exactly happens when I perform a search. It was interesting to learn exactly how complex the search process is and what all the different components are. Just today I saw an additional article (see below for link) from ZDnet.com that stated that Google is drawing 64 percent of the search queries for the month of March. Overall I found the article very enlightening and informative as to how the whole process works. I certainly won’t look at performing a search the same way again.

http://news.zdnet.com/2100-9595_22-6175248.html?part=rss&tag=feed&subj=zdnn

LS 500 Authority Challenges

Bennett, D.B. & Williams, P. (2006). Name Authority Challenges for Indexing and Abstracting Databases. Evidence Based Library and Information Practice, 1(1).

Introduction

Indexing and Abstracting (I&A) databases generally have not implemented name authority control as is used in many library catalogs. Most I&A databases burden the searcher with identifying and selecting name variations. The use of widely varied forms of authors’ names without reference or links to alternatives causes problems for the searchers. End results may be inaccurate or incomplete, resulting in a decrease in the scientific integrity of the research.

Individual library online catalogs have been applying authority control since the implementation of AACR2. Personal name authorities bring together works by an author, regardless of the variations in name as identified in the work itself.

One large challenge lies in managing author name changes. Few databases have chosen to link the variations or name changes to facilitate searching and retrieval of an author’s works. I&A databases may also move all of an author’s works from the former name to the current name, altering some records so that the author name no longer matches the name displayed on the original article.

Examples of Problems with Name Changes

Authors who publish works under two forms of their name and authors who have changed their names are not easily found in databases, and in most cases not all the relevant citations are found. If database citations do not contain the form of the name used on the article, citation errors are likely to occur.

The Web of Science, the original citation tool, uses the author name exactly as it appears in the citing article. The policy of ISI is not to over-correct “variations” because it cannot check them all and refuses to second guess an author’s intentions.

When the searcher uses only the author name on an article but the I&A database has reformatted the author name and the user selects the name from the I&A database rather than the name on the article, some citations will not be retrieved.

Potential Solutions: Overview

Solutions to the problem of identifying and linking author name changes within I&A databases can take many approaches both in production and in the research modeling stage.

Authority Control through the use or linking of Name Authority files
Uses a file: MathSciNet or WilsonWeb
Proposed file: International Standard Authority Name/Data Number
Linking across files: HoPEC, ANAC Levy Project, LEAF

Name Disambiguation through automated methods

In Practice: Author-ity
Models in development by research teams, including use of social networks

Maintaining name authority files requires a high amount of labor but benefits the user with high recall and high precision results. Automated methods of name disambiguation require less manual labor but cannot match the recall and precision of well-maintained authority files.

Potential Solutions: Authority File in the MathSciNet Database

The MathSciNet database creates and maintains a name authority file to control variations. Much of the identification process is automated; however around 20% of all the items require manual checking. MathSciNet’s solution is workable in small database communities, where it is possible for human indexers to check and correct all problem entries manually. It should be noted that this solution may not work for large databases, but it could prove very useful for databases covering a smaller range of information and topics.

Potential Solutions: More Examples Creating, Using, or Linking Authority Files

I&A Databases may follow Library of Congress (LC) practice but may get an added benefit in looking at the LC Name Authority File (LCNAF) to help with collocating the names in the author databases. I&A Databases would also benefit from the effort that goes into compiling the LCNAF. But I&A databases make the mistake of changing the authors’ names rather than pointing or linking to the variations as given within the articles themselves.

Several projects are currently in the works to build on LCNAF and other authority files.

FRANAR – Functional Requirements and Numbering of Authority Records

“Is working to develop a conceptual model to assist in an assessment of the potential for international sharing and use of authority data both within the library sector and beyond.”

HoPEc System

Implements an author registration component that places the burden on the authors to create and maintain their own authority files if they wish for the papers to be clustered.

Librarians realized long ago that linking methods could be exchanged for authorized forms of names. In the automated environment a system does not have to select a “correct” form as long as all of the variations link to each other.

LEAF Project – Linking and Exploring Authority Files

Links all authority records that pertain to the same person based on the automatic linking rules of the project and includes birth/death dates.

Potential Solutions: Alternative Approaches Using Name Disambiguation

Instead of using name authority files, researchers are aiming for an automated method of examining more than the author name to determine the likelihood that any two papers with similar author names have been written by the same person.

Authority name issues can be grouped into three categories: (1) multiple name variations that signify the same author, (2) similar or homonymic names that belong to more than one author and (3) linear changes when an author alters his/her name.

Disambiguation projects generally share similar attributes. All of them use metadata beyond the author name alone. Most have proven that adding more data elements to the models helps to disambiguate names in a faster manner and with a higher rate of success than solely using single author names. Merging the techniques of adding data elements and relying on disciplines to maintain their own linked name files could garner great success for large, multidisciplinary databases such as I&A.
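To illustrate what "metadata beyond the author name" might look like in practice, here is a toy sketch that scores two records with similar author names by how much their coauthors and keywords overlap. None of this comes from the article or from any of the projects it names: the record fields, the Jaccard measure and the equal weights are all assumptions chosen for illustration.

```python
def jaccard(x, y):
    """Share of items the two collections have in common."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if x | y else 0.0

def same_author_score(rec_a, rec_b):
    """Score in [0, 1]; higher suggests the two records share an author."""
    coauthors = jaccard(rec_a["coauthors"], rec_b["coauthors"])
    keywords = jaccard(rec_a["keywords"], rec_b["keywords"])
    # Equal weights are an arbitrary choice for this sketch.
    return 0.5 * coauthors + 0.5 * keywords

# Hypothetical records that all carry a similar author name.
rec1 = {"coauthors": ["Lee", "Park"], "keywords": ["indexing", "authority"]}
rec2 = {"coauthors": ["Lee", "Park"], "keywords": ["indexing", "metadata"]}
rec3 = {"coauthors": ["Nguyen"], "keywords": ["topology"]}

print(same_author_score(rec1, rec2) > same_author_score(rec1, rec3))
```

Records 1 and 2 share coauthors and a keyword, so they score much higher than records 1 and 3, which share nothing beyond the name.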

Conclusion

Alternative solutions must be implemented to assure access, retrieval and the proper crediting of authors’ works. Without control or linkage to name variations, searchers may retrieve incomplete or inaccurate results. To meet the access needs of the 21st century both catalogs and I&A databases may need to implement options that present a high degree of probability that the items have been authored by the same individual rather than options that provide high precision with manual maintenance. Striving for name disambiguation rather than name authority control may be the best option for catalogs, I&A databases and digital library collections.

Developing automated methods can reduce the searcher’s burden of determining author name variations while ensuring that the author index entries match the names on the article and that the end user can successfully retrieve all of an author’s works from that database.

Reflection

Anything that furthers getting the correct packets of information to the user is something that we should be pursuing in library science. The average user does not know to look under the author name variations unless the search page tells them to do so and even then they may not read the instructions. I believe we need to focus on the quality of the results and ensure that people are finding what they are looking for and gaining good quality information. We must as a profession, do whatever is necessary to deliver the correct information in a quick and timely manner to those that are requesting it.