I'm new to requesting work on Mechanical Turk and am looking for input on how to structure a data collection task.
I'd like to collect addresses and phone numbers for the district offices of US Senators and Representatives. I have a URL for each legislator's website, but the sites are all different and generally resistant to automated scraping. An example is the set of addresses at the bottom of this page: https://www.boxer.senate.gov/
As you can see, there are several offices. Each legislator has at least one but fewer than ten, and I don't know the exact number ahead of time.
Any advice or examples on how to design HITs to collect data for this sort of task? How can I encourage workers to gather a complete set of addresses? Assuming each HIT corresponds to getting data for a single legislator, is it possible to compensate workers more if they need to enter more addresses?
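To make the compensation question concrete, here is a rough sketch of the kind of scheme I have in mind: a fixed base reward per HIT plus a bonus for each office beyond the first. The dollar amounts, the constant names, and the function name are all placeholders I made up for illustration, and I'm assuming the bonus portion would have to be paid separately (for example, through MTurk's bonus mechanism) once I've reviewed the submitted work.

```python
from decimal import Decimal

# Placeholder amounts, not recommendations.
BASE_REWARD = Decimal("0.10")             # paid for every approved HIT
BONUS_PER_EXTRA_OFFICE = Decimal("0.05")  # paid for each office after the first
MAX_OFFICES = 9                           # "at least one but fewer than ten"


def payment_for_hit(num_offices):
    """Return (base_reward, bonus) for a HIT covering one legislator.

    num_offices is the number of district offices the worker entered.
    """
    if not 1 <= num_offices <= MAX_OFFICES:
        raise ValueError("expected between 1 and 9 offices")
    bonus = BONUS_PER_EXTRA_OFFICE * (num_offices - 1)
    return BASE_REWARD, bonus


# A legislator with four offices: base reward plus three extra-office bonuses.
base, bonus = payment_for_hit(4)
print(base, bonus)
```

I don't know whether this is a sensible way to do it, or whether paying per address would just encourage workers to pad their answers, so corrections are welcome.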
Thanks in advance!