Search engine optimization proves Google is not effective as an enterprise research tool
Conducting online research often involves users turning straight to Google or another major search engine. This can be an exasperating exercise, with users continuously reworking their search queries in the hope of producing better results.
The SEO (search engine optimization) industry proves that Google and other search engines are highly vulnerable to manipulation. When results are ranked according to an obscure set of standards favoring advertising budgets, conducting serious research with Google can be not only frustrating, but unfeasible.
Googling companies and individuals is absolutely a worthwhile endeavor. Google can produce millions of results within milliseconds. However, the value of these searches diminishes not long after the milliseconds a search takes. It is a shallow search.
The black hole of false positives
Whilst trained researchers know exactly what to look out for, the manual strain of ignoring false positives is hard to quantify. Nardello recently highlighted that a Google search for ‘Wei Chen’ yields more than 1.3 million results, including singers, actors, professors, activists, journalists, an Australian-Chinese gang member, a US Department of Defense contractor indicted on charges related to theft of classified information, and a former Chinese government official accused of stealing millions from the state and absconding to the US. Part of the difficulty lies in the sheer abundance of information that you can access - Google knows the words that you’re looking for, but not the right context or the right source prioritization. Yes, Google may only take milliseconds for a search, but make no mistake - the time saved will be lost sifting through irrelevant results.
Right to be forgotten
The EU’s right to be forgotten legislation allows entities to have negative information removed from search engine results - in effect, to prevent false negatives. Although some privacy NGOs have advocated this, it poses a serious problem when conducting due diligence. The right to be forgotten was invoked to remove from Google searches 120 reports about company directors published by Dato Capital, a Spanish company which compiles such reports about private company directors, consisting entirely of information they are required by law to disclose.
However, the EU will not allow you to remove your name from the bankruptcy register, for example, or from sanctions lists. Tools which search the deep web are imperative in an age where search engine manipulation is so prevalent.
Can I use the Wayback Machine?
The Wayback Machine captures only snapshots in time, which greatly limits the amount of data it holds - 1,000 pages of a 10,000 page site, once a month, would be an example of Wayback caching. There are alternatives, such as http://www.screenshots.com/ or archive.is, but these are even more limited. Initiatives such as the British Library’s show that no single organization can be entrusted to keep a comprehensive archive. The Wayback archive is not reliable. Moreover, many robots.txt files, such as those of Google websites, refuse archiving by the Wayback Machine.
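To make the robots.txt point concrete, here is a minimal Python sketch of how a crawler that honors robots.txt would decide whether a page may be archived. The sample file contents, the example.com URLs and the helper name can_archive are hypothetical, and the user agent ia_archiver is assumed to be the name the archive's crawler answers to. It is an illustration only, not a description of how the Wayback archive actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one archiving crawler entirely
# and blocks everyone else from a single directory.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_archive(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_archive(SAMPLE_ROBOTS_TXT, "https://example.com/about"))
    print(can_archive(SAMPLE_ROBOTS_TXT, "https://example.com/about", agent="SomeBot"))
```

Under this sample file the archiving agent is refused every page, while other crawlers are refused only the /private/ directory. A site owner can therefore keep an entire site out of an archive with two lines of text.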
Search engine manipulation will continue
What is SEO, and why should you care?
Kissmetrics provides a list of reasons why Google will downrank a website. Consider that none of these directly relates to source reliability and relevance, and that the ubiquity of blackhat SEO techniques demonstrates that many of these attempts to comb and downrank are failing. Conversely, prioritising these algorithmic rules allows sites to drive up their Google ranking, polluting the quality of results. Google does not operate as an effective curator of the web, because curating the web is antithetical to its business model.
As a result, authoritative sources of information rank lower.
- Google is aware of the problem. In 2015, eight Google engineers published an academic paper, ‘Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources’, that describes how trust and factual accuracy could be used to rank websites in the SERPs. However, this approach has not solved the problem, despite Google’s efforts with its Penguin algorithm.
- Trusted sources tend not to worry about external links to outside information, because they are providing primary information themselves. The web is filled with links, some of them paid for by advertisers, and some intended for editorial.
The growth of an industry which serves to manipulate Google for competitive advantage highlights its weaknesses. Media forecaster Borrell Associates predicts that SEO spending in digital marketing is likely to exceed $80 billion by 2020. These numbers are vast, and it is important to be skeptical, but they cannot be far wrong, and it would be erroneous to anticipate that SEO techniques will cease to influence search behavior in the short term.
Google fails to access information that is kept behind paywalls, held in online documents or databases inaccessible to the public, or censored - information that is frequently pertinent to due diligence and investigations, as well as to journalistic research. The Deep Web is not to be confused with the Dark Web - an obfuscation. The Deep Web is absolutely necessary for finding information about the real makeup of an industry or legislative landscape. As Nardello suggests:
In countries with high degrees of government censorship, such as Myanmar, the independent press may be run from abroad or through blogs and chat forums. The media in those countries may be pushing a hidden political or business agenda or have critical information scrubbed by government censors.
The importance of recognizing this for due diligence cannot be overstated:
- Although access is increasing, a majority of the world’s population does not have access to the internet
- Only a third of all people living in Asia are internet users
- More than two thirds of Indian companies have no online presence
- Governments in most jurisdictions operate predominantly offline
Google also has limitations on proximity queries which make it ineffective for searching adverse terms. In Arabic or Chinese, this limit is hit very quickly.
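For readers unfamiliar with proximity queries, the idea can be sketched in a few lines of Python: flag a document only when a subject's name appears within a fixed number of words of an adverse term. The function name, the window size and the sample sentences are illustrative assumptions, and this is not how Google or any particular product implements the feature. The simple word tokenizer also assumes space-separated text and single-word adverse terms, so it would not segment Chinese correctly.

```python
import re
from typing import Iterable


def within_proximity(text: str, name: str, adverse_terms: Iterable[str],
                     window: int = 10) -> bool:
    """Return True if `name` occurs within `window` words of any adverse term."""
    tokens = re.findall(r"\w+", text.lower())
    name_tokens = name.lower().split()
    n = len(name_tokens)

    # Start positions of every occurrence of the full name.
    name_positions = [
        i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == name_tokens
    ]
    # Positions of every single-word adverse term.
    adverse = {term.lower() for term in adverse_terms}
    term_positions = [i for i, tok in enumerate(tokens) if tok in adverse]

    return any(
        abs(p - q) <= window for p in name_positions for q in term_positions
    )


if __name__ == "__main__":
    hit = "Wei Chen, a former official, was accused of fraud in 2014."
    print(within_proximity(hit, "Wei Chen", ["fraud", "bribery"], window=10))
```

A narrow window keeps unrelated mentions on the same page from matching, which is the behaviour an adverse-term search depends on.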
Increasing concerns that Google, Yahoo and Bing have become the Internet’s gatekeepers are being investigated by the FCC and the FTC on the basis of network neutrality. However, little has changed since 2012, when the FTC complained that Google’s lack of neutrality was posing ‘real harm to consumers and to innovation’.
The European Commission are going further - they have brought three antitrust cases against Google in the past two years, in particular a case against AdSense in July 2016. European Commissioner Margrethe Vestager is set to bring her antitrust case against Google to a close this year, and may wish to set a precedent. But the case is more likely to result in a fine which, though vast, is affordable to Google, and not a true incentive to overhaul its search algorithms.
Will SEO halt, making Google more reliable?
While the ‘S’ and ‘E’ of SEO continue to exist, SEO will always be a problem when researching sensitive information. SEO is also considered a science, and will continue to be taught in computer science or marketing degrees. Search engines will always rely on algorithms to crawl pages and rank results, which will rarely be transparent or customizable. As Digital Due Diligence wrote several years ago:
“…there will always be black-hat SEO. It’s a game theory kind of thing; the more people abandon it, the more it pays off for the folks who do it right.”
These problems fundamentally cannot be solved by, say, switching to Bing. They are the consequences of using a consumer-grade search engine. Google’s business model is designed to support advertisers, and ultimately push more products and services to a personalized ranking, not provide sophisticated, trustworthy information in real-time. A data-agnostic search tool would be entirely useless for Google. Google will continuously attempt to improve its search algorithms to provide higher quality results for consumers, but researchers are not consumers of information - they are the opposite. A good researcher consumes as little information as possible pertaining to a task; their experience affords them source knowledge for this very purpose.
Google has become an indispensable tool in so many areas of ordinary life, but it has severe weaknesses for those conducting professional research, which SEO seeks to exploit. SEO is a heightened problem for those conducting high-volume research or highly sensitive corporate checks. An enterprise-grade, integrated search engine stops firms falling down the rabbit holes of false positives.
Arachnys Investigator is a bespoke search platform which pairs with existing enterprise-grade technology to address these research objectives for legal firms.