It’s big, and it’s easy and free to use – but what do we really know about Google Scholar?
Part 1 of this blog gives a short description, before Part 2 looks at what research says about its performance.
The real-life demands of commercial search projects mean that clients often require search output within a week, from free-to-access sources. So one of the key areas of evidence that the specialist searcher needs concerns the nature and general performance of freely available sources.
Enter Google Scholar (and its competitor, Microsoft Academic). This post focusses on Google Scholar. Since its launch in 2004, it has risen from being a curiosity for professional searchers, who were busy searching the ‘real’ bibliographic databases, to being a very useful component of a multi-pronged search approach. Wikipedia gives a useful overview of Google Scholar’s history, specification and limitations. The database includes scholarly journal articles, conference papers, books, theses, dissertations, technical reports, patents and law reports.
Estimates of Google Scholar’s size have included coverage of 80-90% of published scholarly literature (Trend watch, 2014), or around 389 million documents (January 2018, reported by Gusenbauer, 2019). In comparison, Microsoft Academic is smaller, at (only!) 248,455,650 records (Microsoft Academic site, 28 December 2020).
Google Scholar’s features are underpinned by its nature as a web-crawled resource, rather than a systematically-defined collection of indexed literature.
- Wide coverage – shown as greater than other commonly-used sources (Martín-Martín et al, 2020)
- Freely accessible interface; each retrieved citation links to publishers’ abstracts/full text (typically 50% of citations, based on personal experience)
- Easy to use interface; limited functionality to customise search
- Results exportable in a range of formats, including to common reference management apps
- Not transparent on journal title coverage, update frequency or search algorithm (which uses social media-type citation popularity to rank relevance)
- Unreliable/unstable total number of hits per search
- Citation checking required due to frequency of missing or incorrect information
- No subject indexing or more advanced search functions, and no ability to save searches
- No bulk export; records can only be exported one at a time
- Interface detects high volume use (e.g. while carrying out a review) and blocks user access without warning
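To make the citation-checking point above a little more concrete, here is a minimal sketch of how exported records might be screened for missing details before they go into a review. It assumes the records have already been parsed into Python dictionaries; the field names (`title`, `authors`, `year`, `source`) and the sample records are hypothetical and are not Google Scholar's own export format.

```python
# Hypothetical field names; a real export (e.g. BibTeX or RIS) would need parsing first.
REQUIRED_FIELDS = ("title", "authors", "year", "source")


def find_missing_fields(record):
    """Return the required fields that are absent or blank in one citation record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def flag_incomplete(records):
    """Map the index of each incomplete record to its list of missing fields."""
    flagged = {}
    for index, record in enumerate(records):
        missing = find_missing_fields(record)
        if missing:
            flagged[index] = missing
    return flagged


# Illustrative records only, not real Google Scholar output.
sample = [
    {"title": "Example article A", "authors": "Smith J", "year": "2019", "source": "Example Journal"},
    {"title": "Example article B", "authors": "", "year": None, "source": "Example Journal"},
]

print(flag_incomplete(sample))
```

A check like this only catches missing details; incorrect information still needs to be verified by eye against the publisher's record.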
In my opinion, this mixture of great features and severe limitations makes Google Scholar an important source to be aware of, but one to use with caution. Please look out for my next post on what recent research says about Google Scholar’s performance.
Wikipedia. Google Scholar (page accessed 28 December 2020).
Trend watch. Nature. 2014;509(7501):405. Discusses: Khabsa M, Giles CL. The number of scholarly documents on the public web. PLoS ONE. 2014;9:e93949.
Gusenbauer M. Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases. Scientometrics. 2019;118(1):177-214.
Microsoft. Microsoft Academic (page accessed 28 December 2020).
Martín-Martín A, Thelwall M, Orduna-Malea E, López-Cózar ED. Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations. arXiv preprint arXiv:2004.14329. 2020 Apr 29.