Defending Against Web Scraping Attacks – Dark Reading


Web scraping attacks, like Facebook’s recent data leak, can easily lead to more significant breaches.

Web scraping is as old as the Internet, but it’s a threat that rarely gets its due. Companies frequently underestimate its risk potential because it is technically not a “hack” or “breach.” 

A recent example is Facebook, which has tried to downplay its latest massive data leak by claiming the scraping impacted public information only. The company overlooks the risk this type of personal data exposure poses for the victims and the ultimate value of harvesting this data on such a massive scale, particularly for social engineering attacks.  

Scraping sites for user data is nothing new; Facebook has faced this issue on multiple occasions. In 2013, I disclosed two methods of scraping Facebook user data. One involved a tool I created called Facebook Harvester, which utilized the then-recently released Graph Search feature to perform a brute-force search of phone numbers and return any associated user profile. 

Meanwhile, Facebook is still partially vulnerable to malicious scraping through its password-reset page. By entering a phone number, it is possible to pull up privately listed people on the platform — including their full name and profile photo. This is notably different from the method used in the recent data dump, in that the end user does not need to be publicly searchable. While Facebook has tightened the data revealed on this page, it may still prove a useful tool for malicious actors. 

But scraping isn’t just a social media problem. It’s an issue that affects many types of organizations across various industries. Scraping is one of the methods malicious hackers use to collect intel on companies before they target them with more significant attacks. 

Here is a closer look at this undervalued threat. 

How Attackers Use Web Scraping
Web scraping can easily lead to more significant attacks. At my company, we routinely use Web scraping as one of the initial steps in a red team or phishing engagement. By pulling the metadata from posted documents, we can find employee names, usernames, and deduce username and email formats, which is particularly helpful when the username format would otherwise be difficult to guess. Mix this with scraping a list of current employees from sites like LinkedIn, and an adversary can perform targeted phishing and credential brute-force attacks. 
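To show how little effort the first step takes, here is a minimal Python sketch that reads the author fields from a Word (.docx) file. It assumes the standard Office Open XML layout, in which a .docx is a ZIP archive whose docProps/core.xml part holds dc:creator and cp:lastModifiedBy. The function name read_core_authors and the sample values are hypothetical. PDFs, legacy .doc files, and images store metadata differently and would need their own parsers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

# Namespaces used in the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_core_authors(docx_bytes: bytes) -> Dict[str, str]:
    """Return the author fields found in a .docx file's core properties."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    fields = {}
    for key, path in (("creator", "dc:creator"), ("last_modified_by", "cp:lastModifiedBy")):
        el = root.find(path, NS)
        if el is not None and el.text:
            fields[key] = el.text
    return fields

if __name__ == "__main__":
    # Build a tiny stand-in document in memory (sample values are made up).
    core = (f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
            "<dc:creator>John Smith</dc:creator>"
            "<cp:lastModifiedBy>josmith1</cp:lastModifiedBy>"
            "</cp:coreProperties>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    print(read_core_authors(buf.getvalue()))
    # {'creator': 'John Smith', 'last_modified_by': 'josmith1'}
```

Run across every document a public site links to, a function like this turns a pile of downloads into a list of names and account handles.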

In one recent example, we determined the client’s unique username configuration by collecting documents scraped from the company’s public-facing sites. These documents contained the author’s first and last name and the file path; because the file was saved within the user’s profile path, the path also contained the username. In this case, the format was two letters of the first name, the last name, and a digit. So, if the user’s name were John Smith, the username would have been josmith1. Once we found this, it was easy enough to perform credential brute-forcing: we combined lists of common first and last names into candidate usernames that followed the discovered format. By running the attack with just a few common passwords per username, we gained access to at least one account, which gave our red team an initial foothold. 
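The inference step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's engagement tooling: username_from_path assumes a Windows profile path of the form C:\Users\<username>\..., and matches_format and candidate_usernames hard-code the format from this example (first two letters of the first name, the last name, and one digit). All three function names are hypothetical.

```python
import re
from pathlib import PureWindowsPath
from typing import Iterable, Iterator, Optional

def username_from_path(path: str) -> Optional[str]:
    """Return the folder that follows 'Users' in a Windows path, if any."""
    parts = PureWindowsPath(path).parts
    # e.g. ('C:\\', 'Users', 'josmith1', 'Documents', 'report.docx')
    for i, part in enumerate(parts[:-1]):
        if part.lower() == "users":
            return parts[i + 1]
    return None

def matches_format(first: str, last: str, username: str) -> bool:
    """Check the hypothesized format: 2 letters of first name + last name + 1 digit."""
    pattern = re.escape((first[:2] + last).lower()) + r"\d"
    return re.fullmatch(pattern, username.lower()) is not None

def candidate_usernames(first_names: Iterable[str], last_names: Iterable[str],
                        digits: str = "0123456789") -> Iterator[str]:
    """Expand name lists into usernames that follow the same hypothesized format."""
    last_names = list(last_names)  # iterated once per first name
    for first in first_names:
        for last in last_names:
            for d in digits:
                yield f"{first[:2]}{last}{d}".lower()

if __name__ == "__main__":
    user = username_from_path(r"C:\Users\josmith1\Documents\report.docx")
    print(user, matches_format("John", "Smith", user))  # josmith1 True
    print(list(candidate_usernames(["John"], ["Smith"], digits="12")))  # ['josmith1', 'josmith2']
```

A single matching document is weak evidence for a format; several documents by different authors that all fit the same pattern make the guess far more reliable.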

Scraping document metadata is also useful for detecting internal hostnames and software versions in use at the targeted company. This enables an attacker to customize the attack to exploit vulnerabilities specific to that company, and it is an important part of victim reconnaissance.  
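A rough first pass at spotting these leaks is plain pattern matching over extracted metadata text, and it works equally well for an attacker doing reconnaissance or a defender auditing published files. The patterns below are hypothetical heuristics chosen for illustration: UNC paths (\\host\share) for internal hostnames, C:\Users\... profile paths for usernames, and dotted numbers for software versions. They will produce false positives (an IP address also looks like a version number) and should be tuned to your own environment.

```python
import re
from typing import Dict, List

# Hypothetical heuristics; tune them for your own naming conventions.
LEAK_PATTERNS = {
    # \\FS01\share -> "FS01" (internal hostname from a UNC path)
    "internal_host": re.compile(r"\\\\([A-Za-z0-9._-]+)\\"),
    # C:\Users\josmith1\ -> "josmith1" (username from a profile path)
    "profile_user": re.compile(r"[A-Za-z]:\\Users\\([^\\]+)\\", re.IGNORECASE),
    # 16.0000, 2.4.1 -> version-like strings (also matches IP addresses)
    "version_like": re.compile(r"\b\d+\.\d+(?:\.\d+){0,2}\b"),
}

def find_leaks(text: str) -> Dict[str, List[str]]:
    """Return the unique matches for each pattern that fires on `text`."""
    hits = {}
    for name, pattern in LEAK_PATTERNS.items():
        found = sorted(set(pattern.findall(text)))
        if found:
            hits[name] = found
    return hits

if __name__ == "__main__":
    sample = (r"Template: \\FS01\templates\memo.dotx; "
              r"saved from C:\Users\josmith1\Desktop\draft.docx; "
              r"Application: Microsoft Office Word 16.0000")
    print(find_leaks(sample))
    # {'internal_host': ['FS01'], 'profile_user': ['josmith1'], 'version_like': ['16.0000']}
```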

Adversaries can also use scraping to collect gated information from a website if that information isn’t properly protected. Take Facebook’s password-reset page: Anyone can find privately listed people through a simple query with a phone number. While a password-reset page may be necessary, does it really need to confirm or, worse, return a user’s private information? 

While this may be a worst-case scenario, many websites are still vulnerable to user enumeration via simple error messages. I often see registration, login, or password-reset pages return a message such as “the username could not be found” when someone submits an unregistered username. Although this seems innocent enough, attackers can abuse this message to determine which usernames or email addresses exist as registered accounts for the service. A list of valid usernames can be used for more targeted credential brute-force attacks, and valid email addresses can be used in targeted phishing attacks.  

Controlling the Threat
There are several ways to reduce the risk of Web scraping.

First, organizations should regularly audit their websites to make sure they are not unintentionally exposing sensitive information on public-facing pages, whether through published documents or through information stored in back-end databases that the website links to. 
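One starting point for such an audit is an inventory of which documents your pages actually link to. The sketch below uses only Python's standard-library html.parser and urllib.parse; the DocumentLinkFinder class, the find_document_links helper, and the list of extensions are hypothetical choices. A real audit would also need to crawl across pages, respect authentication boundaries, and check other sources such as sitemaps.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

# Hypothetical list of file types worth checking for metadata leaks.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx")

class DocumentLinkFinder(HTMLParser):
    """Collect absolute URLs of linked documents from one HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.documents: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)
        # Check the path only, so a query string does not hide the extension.
        if urlparse(url).path.lower().endswith(DOC_EXTENSIONS) and url not in self.documents:
            self.documents.append(url)

def find_document_links(html: str, base_url: str) -> List[str]:
    parser = DocumentLinkFinder(base_url)
    parser.feed(html)
    parser.close()
    return parser.documents

if __name__ == "__main__":
    sample = '<a href="/files/Q3-Plan.docx">Plan</a> <a href="/about">About</a>'
    print(find_document_links(sample, "https://www.example.com/"))
    # ['https://www.example.com/files/Q3-Plan.docx']
```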

Organizations should also have a process in place to strip metadata from documents before they are published externally. They should prevent exposing things such as usernames, file paths, print queues, and software versions, as these can all be useful in mounting an attack. 
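For Office files, part of that process can be automated. The Python sketch below assumes the Office Open XML layout (a ZIP archive containing docProps/core.xml and docProps/app.xml parts) and blanks a hypothetical starting list of fields: the author, last-modified-by, company, and manager values. It does not touch file paths or hostnames that may hide in other parts such as linked templates or embedded objects, so it complements rather than replaces a dedicated tool such as Microsoft Office's built-in Document Inspector. Test the output in your own toolchain before relying on it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
EP = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"

# Keep familiar prefixes when core.xml and app.xml are re-serialized.
for prefix, uri in (("cp", CP), ("dc", DC),
                    ("dcterms", "http://purl.org/dc/terms/"),
                    ("xsi", "http://www.w3.org/2001/XMLSchema-instance"),
                    ("vt", "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes")):
    ET.register_namespace(prefix, uri)

# Hypothetical starting list of fields to blank in each part; extend as needed.
SCRUB = {
    "docProps/core.xml": [f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"],
    "docProps/app.xml": [f"{{{EP}}}Company", f"{{{EP}}}Manager"],
}
# app.xml puts its elements in a default (unprefixed) namespace.
DEFAULT_NS = {"docProps/app.xml": EP}

def strip_docx_metadata(docx_bytes: bytes) -> bytes:
    """Return a copy of a .docx with the SCRUB fields emptied; other parts are copied as-is."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            tags = SCRUB.get(item.filename)
            if tags:
                root = ET.fromstring(data)
                for tag in tags:
                    for el in root.iter(tag):
                        el.text = ""  # keep the element, drop its value
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True,
                                   default_namespace=DEFAULT_NS.get(item.filename))
            dst.writestr(item, data)
    return out.getvalue()
```

Usage would be a matter of reading the file's bytes, passing them to strip_docx_metadata, and writing the result to the copy that gets published, keeping the original internally.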

Password-reset pages often contain verbose messages that reveal if a submitted username is valid or not. Going back to the Facebook example, should the password-reset page return the full name and profile picture associated with a phone number before sending a reset link? In these instances, the password-reset page reveals unnecessary information. Where possible, pages should return a generic message after a person submits information for a password reset, letting them know a text or email will be sent to the account if it exists. The key is that the page should not indicate whether the account or information is valid.  
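A minimal Python sketch of that advice: the hypothetical request_password_reset handler looks up the submitted identifier, queues a reset only when an account exists, and returns the same message either way. The accounts dictionary and outbox list are stand-ins for a real user store and message queue.

```python
import secrets
from typing import Dict, List, Tuple

GENERIC_MESSAGE = ("If an account matches the information you entered, "
                   "we will send a reset link by text or email.")

def request_password_reset(identifier: str,
                           accounts: Dict[str, dict],
                           outbox: List[Tuple[str, str]]) -> str:
    """Hypothetical reset handler that never reveals whether an account exists."""
    account = accounts.get(identifier.strip().lower())
    if account is not None:
        # Stand-in for sending a link: queue (contact, single-use token).
        outbox.append((account["contact"], secrets.token_urlsafe(32)))
    # Same text for both cases; no name, photo, or contact details are echoed back.
    return GENERIC_MESSAGE
```

The same principle applies to the status code, page layout, and response time. If a matching account triggers noticeably slower processing (for example, sending the email inline), that difference can itself become an enumeration signal, so the delivery work is best handed off asynchronously.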

Rate limiting and CAPTCHAs are standard defenses against scraping, but a determined attacker may still be able to bypass these measures by using CAPTCHA-solving services or rotating through a list of IP addresses. These measures should make things more difficult for Web scraping but are not a substitute for the proper protection of sensitive data. 
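As one example of the rate-limiting piece, here is an in-memory token-bucket limiter in Python; token bucket is a common algorithm choice, not something the article prescribes. Each key may burst up to capacity requests and then regains rate tokens per second. The class name, the allow_reset_lookup helper, and the "ip:"/"id:" key prefixes are hypothetical, and a multi-server deployment would need shared storage instead of a dictionary.

```python
import time
from typing import Callable, Dict, Tuple

class TokenBucketLimiter:
    """In-memory token bucket per key (an IP address, a phone number, ...)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill for the time elapsed since this key was last seen, up to capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)
        return allowed

def allow_reset_lookup(limiter: TokenBucketLimiter, client_ip: str, identifier: str) -> bool:
    """Charge both the client IP and the queried identifier; both must be under limit."""
    ip_ok = limiter.allow("ip:" + client_ip)
    id_ok = limiter.allow("id:" + identifier)
    return ip_ok and id_ok
```

Keying on the submitted identifier as well as the IP helps against an attacker who rotates IP addresses while hammering one account, but it does little against bulk enumeration of many different phone numbers or emails. That matches the point above: these controls raise the cost of scraping rather than protect the data itself.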

Recognize the Threat
While Web scraping has long been viewed more as an annoyance than a security risk, it is widely used by attackers to gain critical insights into a company, particularly for user enumeration attacks. Implementing some of these security measures can greatly reduce a company’s risk.

Rob Simon is a Principal Security Consultant at TrustedSec, where he specializes in Web and mobile applications, as well as hardware security. Rob has more than a decade of experience in information security, with roles ranging from software development to penetration …