Extensive scraping of demanding web content
We offer premium technology services for business.
DevsData LLC is a boutique software & recruitment agency, with Google-level engineers and a vast network of senior expert contractors.
You name the source and the data to extract, and we will come up with a solution tailored to your needs.
- Product pricing & details
- Social Media
- Real Estate Data
- Media News
Battle-tested at web scraping
Our engineers have an in-depth understanding of complex databases and broad experience in processing them.
We can extract data from many challenging sources, even websites with anti-scraping protection. To achieve such results, we use advanced techniques such as:
- Human browsing simulation using Selenium WebDriver
- Premium mobile proxy & VPN usage
- Automatic CAPTCHA solving (apart from Google reCAPTCHA)
Over the years, our engineers have gathered extensive hands-on experience.
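One part of simulating human browsing is pacing: a person does not click at perfectly regular intervals. The sketch below shows one possible way to generate irregular pauses between browser actions. The function name `human_like_delays` and all of its parameters and default values are illustrative assumptions, not a description of the production setup.

```python
import random


def human_like_delays(n_actions, base=1.5, jitter=0.8, long_pause_every=5, seed=None):
    """Return a list of pause lengths (in seconds) to wait between actions.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    delays = []
    for i in range(1, n_actions + 1):
        # Random jitter around the base pause so actions are not evenly spaced.
        delay = base + rng.uniform(-jitter, jitter)
        # Every few actions, pause longer, as a person reading a page might.
        if i % long_pause_every == 0:
            delay += rng.uniform(2.0, 5.0)
        delays.append(round(delay, 2))
    return delays


delays = human_like_delays(10, seed=42)
```

With Selenium WebDriver, each value could be passed to `time.sleep()` between calls such as `driver.get(...)` or `element.click()`.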
Creating and maintaining several small scrapers on short notice
We designed and maintained several small scrapers for a company project on short notice. Our task was to extract the data as quickly as possible and filter it to obtain only the essential information. One of the projects was a Natural Language Processing scraping engine for a London-based hedge fund – it scraped and scored news articles based on precise criteria given by the client.
Scraping and processing confidential data
We created a scraper for a US-based client. It had to collect responses for 300m SSNs from behind a protected form on the website, using as few requests as possible. The obvious choice of scraping technology was a low-level request package.
The system ran on ten small machines on Google Cloud.
Scraping data stored deep inside HTML
We worked on extracting data from Filmweb – the second biggest movie database in the world. The task was to collect all data about every movie/TV series on the website with as few requests as possible. The data was stored deep inside the HTML. Beautiful Soup was used to collect the essential information and parts of the website.
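As a rough illustration of pulling fields out of deeply nested markup, here is a minimal sketch. To stay dependency-free it uses Python's standard-library `html.parser` rather than Beautiful Soup, which the project itself used. The sample HTML, the class names (`film`, `film-title`, `film-year`), and the `FilmParser` class are all hypothetical and do not reflect the real structure of the site.

```python
from html.parser import HTMLParser

# Hypothetical markup: the fields of interest sit several levels deep.
SAMPLE_HTML = """
<div class="page"><div class="content"><section>
  <div class="film"><h2 class="film-title">Example Movie</h2>
    <div class="meta"><span class="film-year">1999</span></div></div>
  <div class="film"><h2 class="film-title">Another Title</h2>
    <div class="meta"><span class="film-year">2004</span></div></div>
</section></div></div>
"""


class FilmParser(HTMLParser):
    # Maps an (assumed) CSS class to the output field it fills.
    FIELDS = {"film-title": "title", "film-year": "year"}

    def __init__(self):
        super().__init__()
        self.films = []
        self._field = None  # field whose text we expect next, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "film" in classes:
            self.films.append({})  # a new record starts here
        for cls in classes:
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._field and self.films and text:
            self.films[-1][self._field] = text
            self._field = None


parser = FilmParser()
parser.feed(SAMPLE_HTML)
films = parser.films
```

Because the parser reacts to class names rather than to nesting depth, the same approach keeps working if the wrapping `div` layers change.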
Communication is the key
We always make sure to be on the same page with our clients as we strongly believe that communication is the key to fruitful cooperation.
Most of our specialists work remotely from our European office; however, we are open to permanent, cross-border relocation of selected engineers. For longer projects, we usually start full-time engagement with two weeks of onboarding on-site at the client’s office.
We took part in the maintenance and modification process of many scraping engines.
Scraping unstructured data from Wikipedia
We created a scraper running on Wikipedia to collect a data set of movies/television series and their cast. The biggest challenge in this project was that the website was unstructured, so links to other subpages could be located anywhere. Scrapy, which keeps track of visited subpages and schedules the pages still to visit, was the most efficient technology to use.
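The idea of remembering visited pages while scheduling new ones can be sketched in plain Python. This is not Scrapy's implementation, only a minimal illustration of the same principle: a queue of pages to visit plus a set of pages already seen. The `crawl` function, the `LINKS` graph, and the page paths are hypothetical stand-ins for real page fetching.

```python
from collections import deque

# Hypothetical link graph standing in for real pages and the links found on them.
LINKS = {
    "/wiki/Film_A": ["/wiki/Actor_1", "/wiki/Actor_2", "/wiki/Film_B"],
    "/wiki/Actor_1": ["/wiki/Film_A", "/wiki/Film_B"],
    "/wiki/Actor_2": ["/wiki/Film_A"],
    "/wiki/Film_B": ["/wiki/Actor_1", "/wiki/Actor_3"],
    "/wiki/Actor_3": [],
}


def crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first, never scheduling the same page twice."""
    visited = {start}        # pages already scheduled or processed
    queue = deque([start])   # pages waiting to be processed
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in visited:  # skip links seen on earlier pages
                visited.add(link)
                queue.append(link)
    return order


order = crawl("/wiki/Film_A", lambda page: LINKS.get(page, []))
```

Even though the pages link back to each other, each one is processed exactly once, which is what keeps a crawl over loosely structured pages from looping.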
Boosting the efficiency of an existing scraping engine
We took part in the maintenance and modification of the company’s scraping engine. It was responsible for collecting profile data about people and companies from about ten confidential sources. Some of the data had been purchased earlier, so our task was to collect what was not yet available to buy and to extend the data the company already held.
Gathering data from numerous websites
Our client needed to collect data on clothing products, with the main focus being their categorization and prices. There were about 30 websites with varying depth of information and protection against scraping.