Essay on How Useful Are Search Engines to Academic Historians
Number of words: 3553
The internet is an unorganised mass of data; search engines are an attempt to present that data in an orderly way.[1] As Glossbrenner argues, ‘a search engine is a tool that lets you explore databases containing texts from web pages, when the search engine finds pages that match your search requests it presents it with a brief description and clickable links to navigate you to the desired site’.[2] There are three main types of search engine: general, specialist and meta-engine. In this essay I will evaluate the usefulness of search engines for an academic historian’s research.
The internet is a continually growing phenomenon and sources suggest that search engines have been the most popular tool for searching the internet for over a decade.[3] As a result the use of search engines for academic purposes has increased; this is demonstrated in a survey which showed that 73% of students used the internet before libraries, and 90% used search engines when using the internet.[4] A study in 2004 confirms the importance of search engines to academics, with 45% using Google as their first preference whilst only 10% used their university library catalogue.[5] In the present digital age, the use of search engines for academic purposes is a contemporary issue that needs to be critiqued.
The extent to which search engines can be used as a research tool for historians has not received extensive attention. Because the software is continuously updated, literature critiquing it quickly becomes outdated. Korfhage, for example, does not mention Google when discussing major search engines; it could be argued that his work is outdated, particularly as in 2007 Google managed 75% of all online requests.[6] Basch uses a convoluted metaphor involving painting the Golden Gate Bridge in a chaotic environment to demonstrate how difficult it is to write current information about search engines.[7] Some historians have nevertheless critiqued the subject. Hockey does not believe search engines are useful, describing their use as a cumbersome process due to limited functionality and stating that they are designed by experts with little understanding of humanities research.[8] Cullen likens a search engine to fast food, ‘convenient and cheap but it’s not good for you’.[9] Presnell also supports this view, describing the primary users as novice researchers.[10] I concur with the view expressed by these historians, that search engines are not useful due to their limitations. Conversely, some scholars support the use of search engines: Fernback and Blenkinsop regard them as a misunderstood research tool which should be embraced.[11]
Practical Limitations
Search engines have various practical limitations. They index so many sites containing the keyword entered that there are often too many results for the historian to sift through. Practices such as ‘Google bombing’ mean that a number of the results which appear will be irrelevant.[12] Unless ‘crawlers’ have visited a site recently there is the potential for an inactive site to still be indexed.[13] In order to expedite investigations, search engines use algorithms to rank results according to what they perceive to be the most important websites for the user. It is the search engine’s index which ranks information, but the historian may be applying a different ranking. Rankings are not always useful to the historian, as the highest-ranked link may simply be the most popular.[14] Search engines, including Google, can charge fees to website producers to make their sites rank higher.[15] There are many ways for website administrators to improve their ranking, for example using automated queries. These do not improve the results for the historian’s purpose, meaning historians have to search through results irrelevant to their research, as their criteria may differ from the search engine’s.
There are ways to limit the number of results, such as ‘inclusive’ or ‘exclusive’ searches; however, unless one is properly trained to utilize the search engine’s keyword entry criteria effectively, the user will receive too many results to search through in practice. The link’s description may also be such a small extract that it is of little use. Most search engines offer an advanced search system, but unless the specific website is known it may be difficult to exclude unwanted results. The only way to guarantee manageable results is to use a specialist search engine; however, these are generally linked to a specific site or database and will only search a limited number of indexed sites, which in turn means there will be fewer sources to utilize and valuable data may be missed. This demonstrates that the practical limitation of an unmanageable number of results cannot be avoided unless one uses a search engine which contains only a small number of indexed sites. A library catalogue can also return large numbers of results; however, the nature of the internet means search engine results have the potential to be in the millions, whereas library catalogue results may only be in the hundreds, making the problem more manageable.
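The effect of the ‘inclusive’ and ‘exclusive’ searches mentioned above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not a description of how any real search engine matches queries: the function name filter_results, the sample result titles and the naive whole-word matching rule are all hypothetical.

```python
def filter_results(results, include=(), exclude=()):
    """Keep results containing every 'include' term and no 'exclude' term.

    Matching is a naive, case-insensitive whole-word test (an assumption
    made for illustration; real engines use far richer matching).
    """
    kept = []
    for text in results:
        words = set(text.lower().split())
        has_all = all(term.lower() in words for term in include)
        has_none = not any(term.lower() in words for term in exclude)
        if has_all and has_none:
            kept.append(text)
    return kept


# Hypothetical result titles for a query on Lincoln
results = [
    "Abraham Lincoln biography and speeches",
    "Lincoln car dealership offers",
    "Lincoln cathedral visitor guide",
    "Speeches of Abraham Lincoln archive",
]

# Require both 'lincoln' and 'speeches', and reject anything mentioning 'car'
narrowed = filter_results(results, include=["lincoln", "speeches"], exclude=["car"])
print(narrowed)
```

Even in this small case the user must already know which terms to require and which to reject, which is the training problem described above.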
Bradley estimates there are around 20,000 search engines on the internet.[16] Each search engine will yield a different set of results, so it would be ideal to cross-reference as many as possible; however, this is not viable in practice. One could use a meta-engine such as Ask Jeeves, which will search through several search engines and return the results in one search. However, meta-engines are described as having ‘the lowest common denominator effect on your search’, using only those features that all of the search engines have in common.[17] One can never search all of the different search engines for data, yet each will harvest different results; therefore the search engine is not useful, as it is too difficult to cross-reference results to ensure all potential sources are revealed.
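The ‘lowest common denominator’ effect can be read as a set intersection: a meta-engine can only rely on the features every underlying engine supports. The sketch below assumes this reading; the engine names and feature labels are invented for illustration and do not describe any real product.

```python
# Hypothetical feature sets; engine names and features are illustrative only.
engine_features = {
    "engine_a": {"keyword", "phrase", "boolean", "date_range"},
    "engine_b": {"keyword", "phrase", "domain_filter"},
    "engine_c": {"keyword", "boolean", "phrase", "file_type"},
}


def common_features(features_by_engine):
    """Return the features supported by every engine (the set intersection)."""
    feature_sets = list(features_by_engine.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


shared = common_features(engine_features)
print(sorted(shared))  # ['keyword', 'phrase']
```

Every feature that only one or two engines offer, such as the hypothetical date range or file type filters here, is lost to the combined search.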
The Invisible Web
Search engines do not reach the great majority of the internet. What search engines cannot access is referred to as the invisible web. Lawrence suggests that less than 20% of the web is visible with the use of search engines.[18] However, Bergman is more exact and argues that the invisible web is 500 times larger than the surface web.[19] The invisible web holds valuable information, as demonstrated by a case study in which Yahoo found only 13 history syllabi out of 15,000.[20] This shows search engines are not useful to the academic historian, as they are limited to indexing only a minute percentage of the sources available on the internet.
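The figures quoted above can be put on a common footing with a short calculation. The sketch assumes, purely for illustration, that the web consists only of a surface part and an invisible part, so that an invisible web 500 times the size of the surface web leaves the surface web as 1 part in 501; the variable names are hypothetical.

```python
# Figures taken from the paragraph above
syllabi_found = 13
syllabi_total = 15000
share_found = syllabi_found / syllabi_total  # fraction of syllabi Yahoo located

# Assumption: web = surface + invisible, with invisible = 500 x surface,
# so the surface web is 1 part in 501 of the whole.
invisible_to_surface_ratio = 500
surface_share = 1 / (1 + invisible_to_surface_ratio)

print(f"Syllabi found: {share_found:.2%}")
print(f"Surface web share under the 500x estimate: {surface_share:.2%}")
```

On these assumptions the share of syllabi found is roughly 0.09%, and the 500-times estimate implies a visible share of roughly 0.2%, far below the 20% ceiling in the other estimate.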
There are a number of reasons why a search engine may not find a particular piece of data. Information on the site may be too recent to have been indexed. New websites will not be listed on a search engine until ‘crawlers’ have had time to visit the website and add it to the index.[21] The Googlebot either searches new pages submitted by webmasters through Google’s website or indexes new data found during its routine crawl.[22] Presenting a website to a search engine is not a simple task. Even after a website has been presented, the stringent criteria applied by the search engine’s operators mean there is no guarantee the site will be indexed.[23] This limitation means search engines hold limited results, missing sites which potentially contain useful data.
Search engines do not find new information on already indexed websites. Unless a ‘fresh crawl’ has taken place, new data will be bypassed. There are also issues with the ‘depth of the crawl’ a search engine will perform; Google often limits its crawl to 110 kilobytes, which means not all material will be indexed.[24] There are also particular formats that search engines are unable to access. Although a database may be found via a search engine, its contents may not be indexed; such databases can contain valuable information, and their exclusion can mean search engines are not useful.[25] A similar problem exists with other types of online content, such as untitled images.
General search engines have made inroads into indexing the invisible web. Previously only HTML format could be recognized, but now PDF, XLS, PS, DOC, PPT and RTF files will appear in results.[26] Google has also begun indexing dynamically generated web pages, which were previously inaccessible.[27] However, as search engines make inroads into the invisible web, it continues to grow at an exponential rate.[28] General search engines cannot search the content of some databases such as JSTOR; these require a subscription and do not allow general search engines to index them.[29] Using the specialist search engine on a subscription site will limit the sources which are searched. Google has also launched Google Scholar and Google Books, which search digitized academic works once part of the invisible web; however, these are also limited by subscription-only content, meaning results show only an extract of the source. Despite these advancements, the amount of data which still exists on the invisible web limits the usefulness of search engines, as they are only able to search a small percentage of available material. Although library catalogues are similarly restricted to searching only what is in their collection, at least one can be assured that the results found are of an academic standard.
Quality of Information
Another limitation of the search engine is that it does not test information for its validity or reliability. Himmelfarb maintains that ‘search engines will produce a comic strip or advertising slogan as readily as a quotation from the Bible or Shakespeare’.[30] It is imperative to the study of history that sources are reliable. However, search engines will produce results containing sites and data which are not reliable enough for academic historians to use.
Assessing the authority of a website is essential, particularly as a study in 2004 found that almost half of internet users in America created their own online content, with 13% maintaining their own websites.[31] Anyone can add information to the internet and have their site indexed, so it is important to check that the author is qualified or at least follows academic protocol. Wikipedia generally appears high in the results of a general search engine query, yet it is a site which anyone can edit, meaning its content is questionable; citing it has therefore been banned by academic institutions.
Although professional academics can make mistakes, their work is checked through peer assessment, editors, proofreaders, librarians and academic institutions. No checking system is in place for information found through search engines, meaning it is not useful to the historian as there is no guarantee the information found will be accurate enough to use. This is demonstrated by a case study which showed that a search for ‘Abraham Lincoln’ brought up a site by Roger Norton, an unqualified enthusiast, ahead of the renowned Lincoln historian Professor David Donald.[32] Events of this nature can occur whilst using library catalogues, such as results containing historical fiction. However, the mass of data on the internet in comparison to a library catalogue means this eventuality is far more likely and problematic.
A search engine will also not distinguish between sites based on their domain name. For example, website addresses which end in ‘.edu’ are intended for accredited educational institutions, meaning the information on the site is supported by an institution and has some authority. By contrast ‘.net’, although originally intended for organisations involved in networking, is now a general purpose domain space anyone can use. Although one still needs to apply critical techniques whatever the domain of the site, the fact that a search engine does not distinguish between reliable and less reliable domains shows they are not useful.
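A historian could apply this domain check by hand to a list of results. The following is a minimal sketch under stated assumptions: the URLs are invented, the helper names top_level_domain and group_by_domain are hypothetical, and the final label of the host name is treated as the domain ending. As noted above, the ending is only a first signal and does not replace critical evaluation of the source.

```python
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'edu' (empty if none)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def group_by_domain(urls):
    """Group URLs by their domain ending so they can be reviewed separately."""
    groups = {}
    for url in urls:
        groups.setdefault(top_level_domain(url), []).append(url)
    return groups


# Hypothetical result URLs
urls = [
    "http://history.example.edu/lincoln",
    "http://lincolnfans.example.net/home",
    "http://archive.example.edu/papers",
]

grouped = group_by_domain(urls)
print(grouped["edu"])
```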
There are search engine features which minimise problems with the authority of sources, such as searching sites based upon whether they are linked to trusted sites; however, this is still no assurance that the information is reliable. Similar problems exist with the accuracy of sources, as there is a potential for forgeries to go unrecognised. Forgeries such as the ‘Protocols of the Elders of Zion’ continue to appear on the internet as a primary source. There is a risk that search engines will index these sources and that they will mistakenly be used.[33] Errors and forgeries can be quickly replicated through ‘mirror sites’ and will circulate. Although I acknowledge that parallel events can occur with the use of footnotes, the internet will distribute the mistake far more quickly than a printing press, allowing less time for the mistake to be discovered.
Conclusion
To conclude, search engines have various limitations which mean they are not useful to academic historians. In practice it would take too long to search through the mass of results which a search engine will produce. Even ‘inclusive’ or ‘exclusive’ searches are not enough to prevent a mass of results and irrelevant sites appearing. The ranking system can also be unhelpful and may follow different criteria from the historian’s. Each search engine will show a different set of results, but it is impractical to check each one. An academic specially trained in internet searching would possibly be better equipped; however, most historians will not have had such a skill included in their training.
Search engines only index a small portion of the web. Inroads have been made into the invisible web, with new formats becoming accessible and new search engines like Google Books specifically targeting useful information that was previously unreachable. However, as inroads are made the invisible web continues to grow at an exponential rate, and because potentially valuable information remains invisible to search engines, they are not useful to academic historians. The invisible web also contains images and untitled original documents the historian may wish to use.
Search engines do not differentiate between valid and unreliable information. Although academic historians have been trained in methods which can help check the reliability of a source, dependable sources are mixed with undependable ones, causing potential problems. There are specialist search engines that can ensure that material is reliable, but this significantly limits the quantity of data searched. Due to their limitations, search engines are not sufficient tools for academic historians to use for their research.
Bibliography
Ackermann, Ernest and Hartman, Karen, The information searchers guide to searching, researching on the internet, Second edition (Wilsonville, Oregon, 2001)
Basch, Reva, Researching online for dummies, a reference for the rest of us (Foster City, 1998)
Barber, Sarah and Penniston-Bird, Corinna M, History beyond the text, routledge guides to using historical sources, a students guide to approaching alternative sources (London, 2009)
Bradley, Phil, The advanced internet searchers handbook, second edition (Michigan, 2004)
Cooke, Alison, A guide to finding quality information on the internet, selection and evaluation strategies (London, 2001)
Cullen, Jim, Essaying the past, how to read, write and think about history (Chichester, 2009)
Devine, Jane and Egger-Sider, Francine, Going beyond Google, the invisible web in learning and teaching (London, 2009)
Fielding, Nigel, Lee, Raymond M and Blank, Grant, The Sage Handbook of online research methods (Los Angeles, 2008)
Glossbrenner, Alfred and Glossbrenner, Emily, Search engines for the World Wide Web, Second edition (Berkeley, 1999)
Gralla, Preston, How the internet works, Seventh edition (Indianapolis, 2004)
Hockey, Susan, Electronic texts in the humanities, principles and practice (Oxford, 2000)
Jeanneney, Jean-Noel, Google and the myth of Universal knowledge (Chicago, 2007)
Korfhage, Robert R, Information storage and retrieval (New York, 1997)
Lang, Sean, Tosh, John, The pursuit of history, aims methods and the new directions in the study of modern history, Fourth edition (Harlow, 2006)
Marius, Richard, Page, Melvin E, A short guide to writing history (New York, 2010)
Milstein, Sarah, Dornfest, Rael, Google, the missing manual, the book that should have been in the box (Beijing, Farnham, 2004)
Presnell, L. Jenny, The information literate historian, a guide to research for history students (Oxford, 2007)
Schlein, Alan M, Find it online, the complete guide to online research, 2nd edition (Tempe, Arizona, 2000)
Schneider, Fritz, Blachman, Nancy and Frederickson, Eric, How to do everything with Google (New York, 2004)
Stanford, Michael, The nature of historical knowledge (Oxford, 1986)
Stein, Stuart, Learning, teaching and researching on the Internet, practical guide for social scientists (Harlow, 1999)
Internet Resources
Cohen Daniel J and Rosenzweig, Roy, Digital history, a guide to gathering, preserving and presenting the past on the web (University of Pennsylvania, 2006) <http://chnm.gmu.edu/digitalhistory/exploring/> [Accessed October 2011]
[1] Preston Gralla, How the internet works, Seventh edition (Indianapolis, 2004), p. 191
[2] Alfred and Emily Glossbrenner, Search engines for the World Wide Web, Second edition (Berkeley, 1999), p. 5
[3] Alison Cooke, A guide to finding quality information on the internet, selection and evaluation strategies (London, 2001), p. 15
[4] Jane Devine and Francine Egger-Sider, Going beyond Google, the invisible web in learning and teaching (London, 2009), p. 20
[5] Ibid., p. 25
[6] Jean-Noel Jeanneney, Google and the myth of Universal knowledge (Chicago, 2007) p. 61 and Robert R Korfhage, Information storage and retrieval (New York, 1997), p. 277
[7] Reva Basch, Researching online for dummies, a reference for the rest of us (Foster City, 1998), p. 287
[8] Susan Hockey, Electronic texts in the humanities, principles and practice (Oxford, 2000), p. 8
[9] Jim Cullen, Essaying the past, how to read, write and think about history (Chichester, 2009), p. 37
[10] Jenny L Presnell, The information literate historian, a guide to research for history students (Oxford, 2007), p. 136
[11] Sarah Barber and Corinna M. Penniston-Bird, History beyond the text, Routledge guides to using historical sources, a students guide to approaching alternative sources (London, 2009), p. 125
[12] Presnell, The information, p. 146
[13] For information on how ‘crawlers’ work, see Glossbrenner, Search engines, pp. 5-12
[14] Egger-Sider, Going beyond Google, p. 79
[15] Presnell, The information, p. 146
[16] Phil Bradley, The advanced internet searchers handbook, second edition (Michigan, 2004), p. 18
[17] Basch, Researching online, p. 47
[18] Alan M Schlein, Find it online, the complete guide to online research, 2nd edition (Tempe, Arizona, 2000), p. 30
[19] Egger-Sider, Going beyond Google, p. 7
[20] Daniel J Cohen and Roy Rosenzweig, Digital history, a guide to gathering, preserving and presenting the past on the web (Pennsylvania, 2006) <http://chnm.gmu.edu/digitalhistory/exploring/> [Accessed October 2011]
[21] Egger-Sider, Going beyond Google, p. 12
[22] Fritz Schneider, Nancy Blachman, Eric Fredericksen, How to do everything with Google (New York, 2004), p. 337
[23] Sarah Milstein, Rael Dornfest, Google, the missing manual, the book that should have been in the box (Beijing, Farnham, 2004), p. 221
[24] Egger-Sider, Going beyond Google, p. 11
[25] Ibid., p. 9
[26] Presnell, The information, p. 144
[27] Egger-Sider, Going beyond Google, p. 15
[28] Ibid., p. 8
[29] Presnell, The information, p. 148
[30] Rosenzweig, Digital history, <http://chnm.gmu.edu/digitalhistory/exploring/>
[31] Ibid.
[32] Ibid.
[33] Presnell, The information, p. 7