Is it possible to delete search data from a search engine's servers?

By Ed Skoudis | Oct 10, 2008

I've heard that some search engines enable users to automatically delete their search data from the search engine's servers. Is this really possible, and do you think archived search data represents a corporate data security or intellectual property threat?

Search engine history is a very controversial subject. People often search for very sensitive personal items, including information about their medical conditions and love life. Many people also search for their own names, performing vanity searches to see what information is available about them on the Internet. Search engines store all of this information, usually associating all searches with the source IP address of the person making the search, and a cookie put on the user's browser by the search engine. Thus, if a user searches for their name, plus certain medical conditions, someone viewing this history can reasonably infer that they have the given condition.

Back in 2006, AOL released the search history of about 650,000 users, roughly 20 million searches in all, and made it freely available on the Web; copies are still available at about half a dozen mirrors. AOL's goal was to make the search information available to researchers, but the company didn't properly consider the invasion of privacy that such a release entails. AOL tried to anonymize the information, passing it through software that remapped each user's identity into another value before releasing it publicly. Thus, for a hypothetical example, you can't tell that user Fred Smith did a search for "Fred Smith" and later searched for "halitosis." However, even with this remapping, a person can tell that the same anonymous number performed both searches, implying pretty heavily that good old Fred suffers from bad breath.
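The weakness of this kind of remapping can be shown with a minimal Python sketch. It is an illustration only: the function names, the sample log, and the use of random UUIDs as replacement identifiers are all assumptions, not a description of the software AOL actually used.

```python
import uuid
from collections import defaultdict


def pseudonymize(log):
    """Replace each user name with a stable random identifier.

    `log` is a list of (user, query) pairs. The same user always maps to
    the same identifier, which is what keeps the searches linkable.
    (Hypothetical scheme for illustration, not AOL's actual process.)
    """
    mapping = {}
    released = []
    for user, query in log:
        if user not in mapping:
            mapping[user] = uuid.uuid4().hex
        released.append((mapping[user], query))
    return released


def group_by_pseudonym(released):
    """Collect every query issued under each pseudonym, in original order."""
    grouped = defaultdict(list)
    for pseudonym, query in released:
        grouped[pseudonym].append(query)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample log
    log = [
        ("Fred Smith", "Fred Smith"),
        ("Alice Jones", "weather"),
        ("Fred Smith", "halitosis"),
    ]
    for pseudonym, queries in group_by_pseudonym(pseudonymize(log)).items():
        print(pseudonym, queries)
```

The real names never appear in the released data, yet grouping by pseudonym puts the vanity search and the medical search side by side, which is enough to re-identify the user.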

But that's personal information. To get to the point of your question, how does this impact corporate data security and intellectual property? Enterprise employees, especially those associated with some of the most important intellectual property assets of a company, frequently research new applications of their products, new markets they are considering entering, the competitors' products, potential merger and acquisition targets, and so on. Imagine looking at the search engine history for all IP addresses associated with some large company and sorting the searches by user, with users differentiated by the cookie left on their browsers by the search engine. Surely, some very sensitive information about the organization's plans would be revealed.

Because of this concern about the sensitivity of search queries, Google announced in March 2007 that it would anonymize its search logs after 18 to 24 months. That's better than keeping all search queries around forever, but it's a pretty long time. Also, even after that timeframe, Google doesn't delete user searches; it merely anonymizes them. Google has said that this anonymization process involves dropping some of the bits of a user's IP address as well as changing the cookie value, but details are murky.
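To make "dropping some of the bits" concrete, here is a minimal Python sketch that zeroes the low-order bits of an IPv4 address. The function name and the choice of 8 bits are assumptions made for illustration; Google has not published exactly how many bits it drops or how.

```python
import ipaddress


def truncate_ipv4(address: str, bits_to_drop: int = 8) -> str:
    """Zero the lowest `bits_to_drop` bits of an IPv4 address.

    The default of 8 bits is an assumption for illustration, not a
    documented value from any search engine.
    """
    if not 0 <= bits_to_drop <= 32:
        raise ValueError("bits_to_drop must be between 0 and 32")
    ip = ipaddress.IPv4Address(address)
    # Build a 32-bit mask with the low-order bits cleared.
    mask = (0xFFFFFFFF << bits_to_drop) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


if __name__ == "__main__":
    print(truncate_ipv4("192.0.2.57"))      # 192.0.2.0
    print(truncate_ipv4("192.0.2.57", 16))  # 192.0.0.0
```

With 8 bits dropped, 256 addresses collapse to the same stored value, which hides an individual machine but may still point to a single organization if that organization owns the whole address block.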

To address this issue, other search engine companies have jumped on board the privacy bandwagon, offering users an option to avoid storing search history on their servers entirely. In July 2007, Ask.com announced their AskEraser feature, which allows users to configure the Ask.com search engine to not log any search history on their servers. By default, Ask.com logs search queries for 18 months. To change this, when accessing Ask.com, simply click on the "AskEraser" link near the top of their page. A message pops up asking if you want to turn on AskEraser. The service is pretty easy to use, and it's a helpful option for those people who desire more anonymity. While Ask.com hasn't revealed the detailed technical underpinnings of how they omit or destroy search history on their servers, such functionality is certainly possible.

Please note that the discussion above is associated with the search history stored on the search engine company's own servers. Even with AskEraser and Google's 18 to 24 month anonymizing process, browsers still maintain a browsing history that includes all recent searches -- completely independent of what the search engine itself does with that information.
