Scraping eHow.com Website: How to prevent data-scraping a valuable data web service?

I have a great idea for a Windows Store app, and I'd like to build it. However, it requires a large and valuable database, which I will need to expose through a service so that people cannot easily steal it. My thinking is to host a mobile service on Azure (which I've never tried) and create a .NET Web API project that takes requests and dishes out JSON like candy to a Windows 8 MVVM client. What I don't want is someone sniffing the traffic between app and service, figuring out how to GET/POST data the way my app does, and then setting up their own app or website to display this data, using my bandwidth to make themselves money.

How can I protect my app-to-database data access so that it can't be reverse-engineered?

Also, is this the best setup for developing a high-volume Windows 8 app like this? Do you have a better suggestion?

EDIT: I know I can use SSL etc. to encrypt traffic in both directions. What I am trying to prevent is someone using Firebug or Fiddler to figure out which parameters can be posted to get a particular record back, then creating their own site that simply uses my service as the endpoint, siphoning my data and hogging my bandwidth. For example, just by using Firebug I know I can use https://www.google.com/search?q=dallas to search for the word "dallas" on Google. Even if the page is encrypted, they can see that much in their browser. So if someone issues the same GET/POST from their own application, they get the same records back, and they are using my stuff.

3 Answers

The most straightforward thing you can do is to set up authentication for your users using something like OAuth. This ensures that no communication with your service happens anonymously.

Once requests are authenticated, you can place controls on them that won't affect a normal user. You could rate-limit or throttle requests, or use any number of other tactics, to make it very expensive in time to siphon off large portions of your data set.

For instance, you can start blocking requests when you notice a large number of users clustering from a single IP address. You could place sensible limits on each user (like 10 API calls per minute with a result set limited to 50). You get the idea I'm sure.
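As a rough illustration of the per-user limits mentioned above, here is a minimal Python sketch of a sliding-window limiter. It is not tied to Azure or Web API; the names `RateLimiter` and `handle_search` and the in-memory storage are assumptions made for the example, and a real service would likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque

MAX_CALLS_PER_MINUTE = 10   # per-user limit suggested in the answer
MAX_RESULTS = 50            # cap on the size of any single result set
WINDOW_SECONDS = 60.0


class RateLimiter:
    """Sliding-window limiter keyed by an authenticated user id (hypothetical)."""

    def __init__(self, max_calls=MAX_CALLS_PER_MINUTE,
                 window=WINDOW_SECONDS, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock                  # injectable so it can be tested
        self.calls = defaultdict(deque)     # user id -> recent call timestamps

    def allow(self, user_id):
        now = self.clock()
        recent = self.calls[user_id]
        # Forget calls that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True


def handle_search(limiter, user_id, records):
    """Return (status, payload); 429 is HTTP 'Too Many Requests'."""
    if not limiter.allow(user_id):
        return 429, []
    return 200, records[:MAX_RESULTS]
```

With these numbers, a single account can pull at most 500 records per minute, which is harmless for a normal user but slows down bulk copying of a large data set.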

I think we share the same concern. I'm developing a Windows 8 application that talks to a web service built on top of Windows Azure Web Sites. I don't want a bad guy firing fake requests at my service after intercepting the traffic with a tool like Fiddler.

I asked this question on a mailing list and got a tip. I've never tried it, so take it just for your information. If your application requires user login, then the user's password is a good seed for data/traffic protection. You can use the password to generate a key pair, sign each request, and send the signature to the server along with the public key. The server can then verify the signature using the public key.
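To make the signing idea concrete, here is a minimal Python sketch. Note that it swaps in a simpler symmetric scheme (HMAC with a key derived from the password via PBKDF2) rather than the key-pair approach described above, because that can be shown with the standard library alone. The function names, the message layout, the salt, and the 300-second clock-skew limit are all assumptions for illustration, and the server is assumed to have stored the derived key when the user registered.

```python
import hashlib
import hmac
import time


def derive_key(password, salt):
    """Derive a signing key from the user's password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def sign_request(key, method, path, body, timestamp):
    """Client side: sign the parts of the request the server will check."""
    message = "\n".join([method, path, str(timestamp), body]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(key, method, path, body, timestamp, signature,
                   now=None, max_skew=300):
    """Server side: reject stale timestamps, then compare signatures."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # too old or too far in the future; limits replay
    expected = sign_request(key, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

A request captured in Fiddler cannot simply be edited and replayed later, since changing the path or body invalidates the signature and old timestamps are rejected. It does not stop a legitimate account holder from signing their own requests, so it works best combined with per-user limits.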

Using HTTPS is another approach. But as you know, a bad guy can still see the actual data through Fiddler even over HTTPS, since on their own machine they can let Fiddler decrypt the traffic.

Using a client certificate might be another solution, I think. But I couldn't find the relevant documentation on how to install and pick a certificate on the client's machine.

HTH

Just serve it over HTTPS; then they can't sniff it.

Source: http://stackoverflow.com/questions/14350298/how-to-prevent-data-scraping-a-valuable-data-web-service


Monday, 18 August 2014
