




Thread: Advice on collecting multiple pieces of data

  1. #1

    Advice on collecting multiple pieces of data

    New to requesting work on 'Turk, and looking for input on how to structure a data collection task.

    I'd like to collect addresses and phone numbers for the district offices of US Senators and Representatives. I have a URL for each legislator's website, but the sites are all different and generally resistant to automated scraping. An example is the set of addresses at the bottom of the page here: https://www.boxer.senate.gov/ .

    As you can see, there are several offices -- always at least one but fewer than ten, and I don't know how many ahead of time.

    Any advice or examples on how to design HITs to collect data for this sort of task? How can I encourage workers to gather a complete set of addresses? Assuming each HIT corresponds to getting data for a single legislator, is it possible to compensate workers more if they need to enter in more addresses?

    Thanks in advance!

  2. The Following User Says Thank You to ungiddy For This Useful Post:


  3. #2
    I'm a newbie as well, but I figure you may want a few different ideas, and I have at least one to share, so I'll give it a go.

    Any advice or examples on how to design HITs to collect data for this sort of task?
    Do the links you have point to the precise page with the complete contact information, or to home pages? If they point to home pages, I would create two projects:
    P-I. Tasks for workers:
    1. Navigate to ${HomePages} (for each HIT, this pulls one link from an Excel sheet saved as .csv, with the column header "HomePages" and the links beneath it).
    You can also include ${SenateOrHouse} ${LegislatorName}; ${State} and ask workers to find the site themselves if the link is broken.

    2. Navigate to the "Contacts" page where addresses/phone numbers are listed, if not the first page

    3. Enter the URL of the contact page into the text box (include it even if it is the same as the link you clicked):
    <input class="form-control" name="ContactPages" placeholder="Contact page URL" size="120" type="text" />

    4. Count the distinct office addresses and enter the number here:
    <input class="form-control" name="Set1Total" placeholder="Number of offices" size="120" type="text" />

    P-II. Tasks for workers:
    1. Navigate to ${HomePages}
    1.a Checkbox: check if the above link DID NOT lead to the contacts page for ${LegislatorName} (this checks the first part of the Project I work), and enter the correct link into this text box (this fills any gaps in your data set).

    2. Enter the number of distinct offices/addresses on the page here:
    <input class="form-control" name="Set2Total" placeholder="Number of offices" size="120" type="text" /> (this checks the second part of Project I against Project II. Where the two count columns disagree, you can flag the rows with conditional formatting in Excel, sort them, and use the resulting discrepancies to create a third HIT. If it were me, I would likely just check these myself at this point.)

    3. Copy and paste the addresses and phone numbers into the fields below. (You will want to find the formatting that works best. You can probably use JavaScript to make the number of address fields depend on the input from the previous question.)

    If the links you have go directly to the contact pages, then use the second project above as your first project, and create a second project that is, in essence, "check X data against the info on Y website."
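    The comparison of the two count columns described above could also be done outside Excel. The following is only a rough sketch, assuming the two results files have been downloaded as CSV and that their columns are named Input.HomePages, Answer.Set1Total, and Answer.Set2Total (check the headers in your own files; the sample rows and URLs here are made up):

```python
import csv
import io


def find_count_discrepancies(project1_rows, project2_rows):
    """Return home pages whose office counts differ between the two projects."""
    # Index Project I counts by the home page URL each HIT was created from.
    first_counts = {
        row["Input.HomePages"]: row["Answer.Set1Total"].strip()
        for row in project1_rows
    }
    mismatches = []
    for row in project2_rows:
        url = row["Input.HomePages"]
        second = row["Answer.Set2Total"].strip()
        first = first_counts.get(url)
        # A missing or different count means this legislator needs a manual check.
        if first != second:
            mismatches.append((url, first, second))
    return mismatches


# Made-up sample data standing in for the two downloaded results files.
project1_csv = io.StringIO(
    "Input.HomePages,Answer.Set1Total\n"
    "https://a.example,4\n"
    "https://b.example,2\n"
)
project2_csv = io.StringIO(
    "Input.HomePages,Answer.Set2Total\n"
    "https://a.example,4\n"
    "https://b.example,3\n"
)

discrepancies = find_count_discrepancies(
    list(csv.DictReader(project1_csv)), list(csv.DictReader(project2_csv))
)
print(discrepancies)
```

    The counts are compared as text, so an answer like "four" or "04" is also flagged, which is probably what you want when deciding which rows to re-check by hand.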


    Regarding varying the pay rate:
    1. Straightforward option: Within a batch of HITs created from a single project, I don't believe there's a way to vary the pay rate between like HITs ex ante, especially where the amount of data for a given HIT won't be known until after it is complete. That makes the bonus feature the natural mechanism for post-HIT price adjustments. In your position, if I wanted to offer compensation this way, I would specify $x per additional piece of data, and then pay it as a bonus after the fact by sorting the answer file according to the number entered for the total number of contact info fields.
    2. A little more complicated: In the first example above, the concern is with the second project (the first one doesn't seem to raise concerns about the fairness of a static payment). However, once Project I is complete, you will know ex ante which legislators require more data entry than others. You can group your data by the number of input fields expected based on the first worker's answers, and create batches of HITs by the amount of data expected (i.e., all with 3 address fields grouped in a project paying $.03 per HIT, all with 8 address fields grouped in a project that pays $.08 per HIT, etc.).
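    As a rough illustration of the second option, the grouping could be sketched like this. The $0.01-per-address rate is only inferred from the $.03 and $.08 figures above, and the function name and sample URLs are made up:

```python
from collections import defaultdict
from decimal import Decimal

# Assumption: one cent per expected address, inferred from the example figures.
REWARD_PER_ADDRESS = Decimal("0.01")


def group_into_batches(office_counts):
    """Map each expected address count to its reward and list of home pages."""
    by_count = defaultdict(list)
    for url, count in office_counts.items():
        by_count[count].append(url)
    return {
        count: {"reward": REWARD_PER_ADDRESS * count, "urls": sorted(urls)}
        for count, urls in sorted(by_count.items())
    }


# Made-up counts, standing in for the totals collected in the first project.
office_counts = {
    "https://a.example": 3,
    "https://b.example": 8,
    "https://c.example": 3,
}
batches = group_into_batches(office_counts)
for count, batch in batches.items():
    print(count, batch["reward"], batch["urls"])
```

    Each entry would then become its own project and input .csv, with the reward set to match the number of address fields in that project's template.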

    I hope you find this helpful. Please let me know if you have any questions about what I've proposed!

  4. The Following 2 Users Say Thank You to RCC Ventures For This Useful Post:


  5. #3
    Moderator RippedWarrior's Avatar
    Join Date
    Sep 2011
    Gender
    Male
    Location
    Canada
    Posts
    2,025
    Thanks
    1,535
    Thanked 2,402 Times in 1,008 Posts

    You can compensate workers extra by sending them bonus payments. For example, offer a $0.05 bonus for each extra office they gather. This would have to be calculated and paid manually (unless you are using the API, in which case you can do it programmatically).
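    Whether you pay manually or through the API, the amounts have to be worked out first. A minimal sketch of that calculation, assuming the base reward covers the first office and each office beyond it earns the example $0.05 (the field names and IDs below are hypothetical):

```python
from decimal import Decimal

# The example rate from the post; adjust to whatever you actually offer.
BONUS_PER_EXTRA_OFFICE = Decimal("0.05")


def compute_bonuses(assignments):
    """Return {assignment_id: (worker_id, bonus)} for assignments owed a bonus."""
    bonuses = {}
    for row in assignments:
        # Assumption: the base reward already pays for the first office.
        extra = max(int(row["office_count"]) - 1, 0)
        if extra:
            bonuses[row["assignment_id"]] = (
                row["worker_id"],
                BONUS_PER_EXTRA_OFFICE * extra,
            )
    return bonuses


# Hypothetical approved assignments with the number of offices each one entered.
assignments = [
    {"assignment_id": "A1", "worker_id": "W1", "office_count": "1"},
    {"assignment_id": "A2", "worker_id": "W2", "office_count": "4"},
]
bonuses = compute_bonuses(assignments)
print(bonuses)
```

    Decimal is used instead of float so the cents add up exactly. The resulting table can be paid by hand or handed to whatever bonus call your API client provides.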

    I would recommend using your own custom qualification to create a curated list of workers who understand your task and your expectations. This way, you can train a smaller group of workers to complete the task correctly, and not have to worry about whether people are collecting the extra addresses where they should.

    You can do this by creating a set of "qualification" HITs, where you already know the answers. Grant your custom qualification to those who perform well on the "test" HITs.
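    Scoring the "test" HITs against the known answers might look something like the sketch below. The 90% accuracy threshold, the three-HIT minimum, and all IDs are assumptions for illustration only, not recommended values:

```python
def workers_to_qualify(known_answers, submissions, min_accuracy=0.9, min_hits=3):
    """Return sorted worker IDs whose test-HIT accuracy meets the threshold."""
    totals = {}   # worker_id -> number of test HITs submitted
    correct = {}  # worker_id -> number of answers matching the known answer
    for worker_id, hit_key, answer in submissions:
        if hit_key not in known_answers:
            continue  # not one of the test HITs
        totals[worker_id] = totals.get(worker_id, 0) + 1
        if answer == known_answers[hit_key]:
            correct[worker_id] = correct.get(worker_id, 0) + 1
    return sorted(
        worker
        for worker, n in totals.items()
        if n >= min_hits and correct.get(worker, 0) / n >= min_accuracy
    )


# Known office counts for three test legislators (made-up values).
known_answers = {"leg-a": 4, "leg-b": 2, "leg-c": 1}
submissions = [
    ("W1", "leg-a", 4), ("W1", "leg-b", 2), ("W1", "leg-c", 1),
    ("W2", "leg-a", 4), ("W2", "leg-b", 3), ("W2", "leg-c", 1),
    ("W3", "leg-a", 4),
]
qualified = workers_to_qualify(known_answers, submissions)
print(qualified)
```

    Here only the office count is compared. Checking full addresses would need some normalization first, since workers will format the same address in different ways.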
    Eschew obfuscation, espouse elucidation. Batch HITs for newbs. 1000 req'd. 5000 req'd.

  6. The Following 2 Users Say Thank You to RippedWarrior For This Useful Post:


  7. #4
    Thank you for the suggestions! I'll post back with results.

  8. The Following 2 Users Say Thank You to ungiddy For This Useful Post:


  9. #5
    How would you go about creating a curated list of workers like you've mentioned here?

  10. #6
    Moderator RippedWarrior's Avatar

    Quote: Originally posted by brybrows
    How would you go about creating a curated list of workers like you've mentioned here?
    Hi brybrows, there are several ways you can do this.

    1. If you are using the API, you can create a qualification test that auto-grades the answers and automatically grants the qualification, or create a qualification test that records the answers, which you then grade yourself before granting the qualification.

    2. You can (API or GUI) create a qualification HIT. Use the HIT to test workers, and grant the qualification to those who meet your requirements.

    3. You can post HITs for workers to complete, and then identify the top performers and grant them the qualification.

    4. You could reach out on the forum, introduce yourself, and recruit workers that way.

    5. You could contact SpamGirl and she can recommend workers who are known to be trusted.

  11. The Following 2 Users Say Thank You to RippedWarrior For This Useful Post:

