Thursday, November 20, 2008

Week 13 Notes

Total "Terrorism" Information Awareness (TIA):
-very interesting, scary, and informative site
-2002, Total Information Awareness: intended to detect terrorists by analyzing troves of information.
- TIA purported to capture the "information signature" of people so that the government could track potential terrorists and criminals
- called for the development of "revolutionary technology for ultra-large all-source information repositories"
- develop data-mining or knowledge discovery tools that would sort through the massive amounts of information to find patterns and associations.
- development of biometric technology to enable the identification and tracking of individuals
- September 2003, Congress eliminated funding for the controversial project and closed the Pentagon's Information Awareness Office
-site includes news items, documents, and resources

No Place to Hide site:
-“When you go to work, stop at the store, fly in a plane, or surf the web, you are being watched. They know where you live, the value of your home, the names of your friends and family, in some cases even what you read. Where the data revolution meets the needs of national security, there is no place to hide.”
- No Place To Hide: multimedia investigation by news organizations working together across print and broadcast platforms, to make a greater impact than any one organization could alone
-Interactive site with interviews, reviews, extended learning projects, links
-seems well validated by the sources that have reviewed it

Youtube Video:
-“this video is no longer available due to a copyright claim by Viacom International Inc.”

Tuesday, November 11, 2008

Week 10 Muddiest Point

It seems to me that the regular Google search engine, and especially Google Scholar, return more directly relevant results than Google Images. Is there a difference in how they sort and retrieve, or is it mainly due to how people label images?

Week 10 Comments

This week I commented on the blogs of April: https://www.blogger.com/comment.g?blogID=8747228788318880740&postID=1144674564107026535&page=1, and Rebecca: http://rap70.blogspot.com/2008/11/reading-notes-week-10.html

Assignment 6

http://www.pitt.edu/~jmt99/

Thursday, November 6, 2008

Week 10 notes

David Hawking, Web Search Engines

Part 1:
-modern search engines do more than was ever believed possible
-article focus= go behind the scenes and explain how this data processing "miracle" is possible
- search engines must reject as much low-value automated content as possible; doing so is cost-effective
- Currently, the amount of Web data that search engines crawl and index is on the order of 400 terabytes
- simple crawling algorithm must be extended to address the issues of speed, politeness, excluded/duplicate/continuous content, and spam rejection
- Engineering a Web-scale crawler is not for the unskilled or fainthearted (tag, I'm out)
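
To see what "extending the simple crawling algorithm" means in practice, here is a minimal single-threaded sketch in Python. Everything in it is my own assumption for illustration: the in-memory WEB dictionary stands in for real HTTP fetches, and crawl, the excluded URL prefixes, the one-second per-host delay, and the max_pages cap are hypothetical. It covers only politeness, excluded content, exact duplicates, and a crude guard against endless URL spaces; real crawlers read robots.txt, run many fetchers in parallel for speed, catch near-duplicates, and filter spam.

```python
import hashlib
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web" so the sketch runs offline: URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("home page", ["http://a.example/x", "http://b.example/",
                                        "http://a.example/private/p"]),
    "http://a.example/x": ("page x", ["http://a.example/"]),
    "http://b.example/": ("home page", ["http://b.example/y"]),  # exact copy of a.example's text
    "http://b.example/y": ("page y", []),
    "http://a.example/private/p": ("secret", []),
}

def crawl(seeds, excluded_prefixes, delay=1.0, max_pages=100):
    """Breadth-first crawl with exclusion, per-host politeness, and exact-duplicate rejection."""
    clock = 0.0              # simulated time in seconds; nothing actually sleeps
    next_ok = {}             # host -> earliest simulated time it may be fetched again
    seen_urls = set(seeds)   # never queue the same URL twice
    seen_hashes = set()      # fingerprints of page text already accepted
    frontier = deque(seeds)  # FIFO queue gives breadth-first order
    fetched, log = [], []
    while frontier and len(log) < max_pages:  # hard cap: a guard against endless URL spaces
        url = frontier.popleft()
        if any(url.startswith(p) for p in excluded_prefixes):
            continue  # excluded content is never requested
        host = urlparse(url).netloc
        clock = max(clock, next_ok.get(host, 0.0))  # wait until this host may be hit again
        next_ok[host] = clock + delay
        text, links = WEB.get(url, ("", []))  # stand-in for an HTTP GET
        log.append((clock, url))
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # duplicate content: don't index it or follow its links
        seen_hashes.add(digest)
        fetched.append(url)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                frontier.append(link)
    return fetched, log
```

Calling crawl(["http://a.example/"], ["http://a.example/private/"]) keeps only http://a.example/ and http://a.example/x: the private page is never requested, and http://b.example/ is fetched but rejected as a duplicate, so its link to /y is never followed. That last point shows why duplicate rejection saves so much work at Web scale.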

Part 2:
-focus= “reviews the algorithms and data structures required to index 400 terabytes of Web page text and deliver high-quality results in response to hundreds of millions of queries each day.”
- Search engines use an inverted file to rapidly identify indexing terms
-goes over concepts of scaling up, term lookup, compression, phrases, anchor text (kinda interesting), link popularity scores, and query-independent scores
- a major problem with the simple query processor is that it returns poor-quality results
-technology to speed things up= skipping, early termination, assignment of document numbers, caching (something I knew of before this article, yay)
-now interested in suggestions of generating advertisements targeted to the search query and generating spelling suggestions from query logs
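
The inverted file and the "skipping" idea from Part 2 fit in a few lines of Python. This is a minimal sketch under my own assumptions: a toy three-document collection, whitespace tokenization, AND-only queries with no ranking, and binary search (bisect) standing in for the skip pointers a real engine stores inside its compressed postings lists. The names DOCS, build_index, intersect, and and_query are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy collection; a document's number is its position in the list.
DOCS = [
    "web search engines crawl the web",
    "search engines use an inverted file",
    "an inverted file maps terms to documents",
]

def build_index(docs):
    """Inverted file: map each term to a sorted postings list of document numbers."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc ids arrive in increasing order, so lists stay sorted
    return index

def intersect(short, long):
    """AND two sorted postings lists, jumping ahead in the longer one instead of scanning it."""
    result, pos = [], 0
    for doc in short:
        pos = bisect_left(long, doc, pos)  # skip every entry smaller than doc
        if pos == len(long):
            break  # nothing left in the longer list can match
        if long[pos] == doc:
            result.append(doc)
    return result

def and_query(index, query):
    """Return the documents containing every query term, rarest term first."""
    postings = sorted((index.get(t, []) for t in query.lower().split()), key=len)
    if not postings:
        return []
    result = postings[0]
    for plist in postings[1:]:
        result = intersect(result, plist)
        if not result:
            break  # stop as soon as the intersection is empty
    return result

index = build_index(DOCS)
print(and_query(index, "inverted file"))  # [1, 2]
```

Starting from the shortest (rarest-term) list keeps the intermediate result small, so most of the work becomes jumping through the longer lists rather than reading them end to end.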


Current developments and future trends for the OAI protocol for metadata harvesting:


-the article looks at the developing trend of the Open Archives Initiative Protocol for Metadata Harvesting, originally initiated for the e-print archives community, and mentions the Mellon Foundation; both the article and the development are interesting. Though I am more into paper-based documents, it seems more likely every day that I will have to know and work with these types of documents and databases as the archival community continues to shift and develop
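
For a concrete sense of what "metadata harvesting" involves, here is a minimal Python sketch of an OAI-PMH harvester. The protocol details come from the OAI-PMH 2.0 specification: the harvester sends a ListRecords request to a repository's base URL, and when the response is incomplete it repeats the request with the resumptionToken the repository returned (sent alone with the verb) until the token comes back empty. Everything else is an assumption for illustration: the function names, the oai_dc default, and the injected fetch function, which lets the sketch run against canned responses instead of a real repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None  # an empty token marks the end of the list

def harvest(base_url, fetch):
    """Follow resumption tokens until the repository says the list is complete.

    fetch(url) -> response XML text; hypothetical hook so any HTTP client or a test stub works.
    """
    ids, token = [], None
    while True:
        batch, token = parse_list_records(fetch(list_records_url(base_url, resumption_token=token)))
        ids.extend(batch)
        if not token:
            return ids
```

Against a live repository, fetch could be something like lambda url: urllib.request.urlopen(url).read().decode("utf-8"); a real harvester would also keep the metadata payload and handle OAI-PMH error responses rather than just collecting identifiers.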

The Deep Web: Surfacing Hidden Value:


-This article was very enjoyable: I found it easy to read and enjoyed the metaphors, graphs, and charts (yay pictures). The internet is way too vast for the common person to conceptualize, and this article aided my understanding of a complex tool/resource I typically take for granted

Week 9 muddiest point

With the continual changes in the internet and types of access, have there been any new developments past XML? For example, will iPhones and other devices soon create greater needs, and if so, are there any current developments to prepare for this?

Monday, November 3, 2008

Week 9 comments

This week I commented on the blogs of Andrea, https://www.blogger.com/comment.g?blogID=854093220520038877&postID=2126501190425548356&page=1, and Eric, https://www.blogger.com/comment.g?blogID=6036476990105941684&postID=5103781994586888299