I've played around with some Daredevil fandom stats in the past: I scraped all the data for the first prompt post of the kinkmeme a long, long time ago. (Like, when there were just two prompt posts.)
There is a lot of interesting stuff that I, and anyone else interested in fandom stats, could do with that kind of data if it were collected over a longer period of time. It's just that, that first time, scraping 1,000 prompts took all day long. (That's only a little bit of an exaggeration.) And now there are 5+ kinkmeme prompt posts.
I was thinking I'd make a spreadsheet for the information I want to collect, and then assign anyone interested in helping me a prompt post and page numbers within that prompt post to fill in.
That's 151 prompt pages (when I guessed on my Tumblr, I guessed 150), for a total of between 3,650 and 3,775 prompts on the kinkmeme, since each page has a maximum of 25 prompts on it.
Anyone willing to do this with me would be a huge help.
But to give some perspective, if I had 30 people working on this, they would each read through and take notes on about 5 pages (125 prompts), which would take less than two hours. It could be done in a day.
If I had this all sorted out and people got back to me within a week, it would be done in that time.
If fifteen people were willing to work with me for two hours a day, once a week, for two weeks, it would be done in half a month, which is about how long it would take me working my ass off almost every single day in that time span.
And so on, and so forth.
Five people doing five pages a week would take about 6 weeks. (And, actually, since I would be one of those people, I'd only need four others willing to help.)
So, again, this is just my outline of how much work is involved. If anyone is interested in helping, I would feel comfortable assigning a minimum of one page (25 prompts).
What this means for anyone helping is that they get to reread a random part of the kinkmeme that they maybe haven't seen or don't remember well. I remember that the last time I did this, I discovered tons of new prompts that I had missed the first time I read through the kinkmeme.
And, for reference, here is some of the data I am collecting. I absolutely want: date posted, main pairing, secondary pairing(s)*, other mentioned pairings**.
I would be interested to look at what gets filled: Has it been filled? What is the word count of the fill? What pairing(s) from the original prompt does the fill use?
Also: Is it a crossover? (If so, with what?) Is a character disabled who isn't normally? Is it RPF? Is it asking for a poly pairing, or is it open to one?
*This is like "Matt/Foggy preferred, but Matt & Foggy is fine." ** This is where the prompt talks about, say, Karen/Claire but also mentions that Matt/Foggy is a thing in the background.
However, I know that the mod is also trying to get all the prompts into Delicious. So maybe we reach out and see if we could do both at the same time? It would make sense, since we're already going through the prompts. Then we could add a column for the tags on Delicious and a checkbox to show that the prompt is there?
Also, what are you planning on doing with the data? Is it just a curiosity thing? Were you planning to post it publicly here on the meme? I have friends who look at fandom fairly academically and might be interested in it too, since it could have real research applications.
Also, maybe some cross-scraping with data from AO3? I'd be curious to see how many fics here get cross-posted there and, if they do, which authors write a lot of fills for this meme under their own name there. Also, the percentage of DD stories on AO3 that were generated by this meme vs. not, since those from this meme are generally in that collection.
Oh! And we should be tracking how many prompts here fall into being technically "kink" vs. non-kink, since this was originally supposed to be a kinkmeme (or even just sexual vs. non-sexual).
I mean, if you're going to collect data, collect data, right? lol
Reach out to me on my Tumblr. I am enthusiasmgirl (yes the one running the challenge on the Challenge Post). :D
That would make sense! I don't know how to use Delicious, though.
My fandom stats tag (from a couple of months back) covers a lot of my thoughts about this. I'd been trying to rescrape the data ever since I finished page 1, but I burned myself out really bad and haven't been able to get back into it. (Because it's just such a huge, huge project. I looked into bot scraping, but people much more knowledgeable than me said that Dreamwidth's API is trash and half the stuff I wanted to scrape couldn't be done that way. So.)
OP: I've done more work. There are about 4851 prompts as of the closing of prompt post 5 on this kinkmeme. It would be really cool to get an accurate understanding of what the fills-per-prompt ratio is.
But anyway, I have my spreadsheet all outlined.
I think going by the word count of a fill is hard, considering that many are WIPs and may never be finished. So I'm trashing that idea. (Plus, I scraped some data that way, and I mean, it's not that much more difficult, but it really isn't useful. At least, that's what I think.)
Right now my spreadsheet looks like this: http://dusty-soul.tumblr.com/post/126696410137/whyy-self-why
I love that the internet exists and provides a corner of the world where scraping data about Daredevil fandom and doing a podcast on the ins and outs of fanfic writing are considered cool.
OP: More info on the project, AKA, I scraped what we have on prompt 6 so far. http://dusty-soul.tumblr.com/post/126699016407/this-is-from-the-daredevil-kinkmeme-prompt-round
You should upload the spreadsheet to Google Drive if lots of people are going to be working on it.
Then, not only could everyone see what everyone else has done, but we could also hold impromptu chat parties when we find ourselves working the doc with other people at the same time. :D
I'm still working out the kinks and writing an explanation of how I want others to fill in the spreadsheet, so it can be as consistent as possible across all data gatherers. But once I work it out, that's my plan. I'm hoping I can have it up before move-in.
I'm also a little worried about someone, like, accidentally deleting a huge portion of stuff due to computer error or being new to Google Drive or something. So I'm still thinking about that...
OP: Also, the fact that the spreadsheet is 5,000 rows long, with the full number of columns, means that it takes forever to load and would be likely to crash if there were too many collaborators.
You can actually go back to previous versions in Google Docs if you have to. And you can lock documents down so that only specific people, who have to be logged into Google, can edit them or see them.
And it should be able to handle a ton of lines of data.
Just FYI. I used to use Google Docs all the time for convention running, so I know you can throw a lot at it.
OP: I like the chat party idea a lot, actually. I broke the spreadsheet up into sections by prompt post. I think I may just have free-for-alls on each section until they are filled. That way the loading problem isn't a problem, and chat parties can still be a thing.
I still wouldn't want to leave editing open to anyone with the link, I don't think :/.... I would be much more comfortable sending it over email. Especially since then I can explain to people a bit about how it works and make sure that no one hops on with an email that shows their real name when they don't want that, or something.
It's a concern I'm personally very paranoid about.
Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 08:05 pm (UTC)
There is a lot of interesting stuff that I, and anyone else interested in fandom stats, could do with that kind of data if it were collected over a longer period of time. It's just that, that first time, scraping 1,000 prompts took all day long. (That's only a little bit of an exaggeration.)
And now there are 5+ kinkmeme prompt posts.
I was thinking I'd make a spreadsheet for the information I want to collect, and then assign anyone interested in helping me a prompt post and page numbers within that prompt post to fill in.
The workload right now looks like this.
Number of pages in each prompt post:
Prompt 1: 39
Prompt 2: 31
Prompt 3: 26
Prompt 4: 24
Prompt 5: 31
That's 151 prompt pages (when I guessed on my Tumblr, I guessed 150), for a total of between 3,650 and 3,775 prompts on the kinkmeme, since each page has a maximum of 25 prompts on it.
Anyone willing to do this with me would be a huge help.
But to give some perspective, if I had 30 people working on this, they would each read through and take notes on about 5 pages (125 prompts), which would take less than two hours. It could be done in a day.
If I had this all sorted out and people got back to me within a week, it would be done in that time.
If fifteen people were willing to work with me for two hours a day, once a week, for two weeks, it would be done in half a month, which is about how long it would take me working my ass off almost every single day in that time span.
And so on, and so forth.
Five people doing five pages a week would take about 6 weeks. (And, actually, since I would be one of those people, I'd only need four others willing to help.)
So, again, this is just my outline of how much work is involved. If anyone is interested in helping, I would feel comfortable assigning a minimum of one page (25 prompts).
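For anyone who wants to check the workload math above, here is a rough sketch in Python. The page counts and the 25-prompts-per-page maximum come from the post; the function names are made up for illustration, and the rounding up is an assumption about how leftover pages get handled.

```python
import math

# Page counts per prompt post, as listed in the post above.
PAGES_PER_POST = {1: 39, 2: 31, 3: 26, 4: 24, 5: 31}
MAX_PROMPTS_PER_PAGE = 25  # stated maximum per page


def total_pages(pages_per_post):
    """Total number of pages across all prompt posts."""
    return sum(pages_per_post.values())


def pages_per_helper(pages, helpers):
    """Pages each helper takes if the work is split evenly (rounded up)."""
    return math.ceil(pages / helpers)


def weeks_needed(pages, helpers, pages_per_helper_per_week):
    """Whole weeks needed to cover every page (rounded up)."""
    return math.ceil(pages / (helpers * pages_per_helper_per_week))


pages = total_pages(PAGES_PER_POST)
max_prompts = pages * MAX_PROMPTS_PER_PAGE

print(pages)                        # 151
print(max_prompts)                  # 3775
print(pages_per_helper(pages, 30))  # 6
print(weeks_needed(pages, 5, 5))    # 7
```

The numbers land slightly above the round figures in the post (5 pages each, 6 weeks) only because of the one page past 150: someone has to pick up the extra page.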
What this means for anyone helping is that they get to reread a random part of the kinkmeme that they maybe haven't seen or don't remember well. I remember that the last time I did this, I discovered tons of new prompts that I had missed the first time I read through the kinkmeme.
And, for reference, here is some of the data I am collecting. I absolutely want: date posted, main pairing, secondary pairing(s)*, other mentioned pairings**.
I would be interested to look at what gets filled: Has it been filled? What is the word count of the fill? What pairing(s) from the original prompt does the fill use?
Also: Is it a crossover? (If so, with what?) Is a character disabled who isn't normally? Is it RPF? Is it asking for a poly pairing, or is it open to one?
*This is like "Matt/Foggy preferred, but Matt & Foggy is fine."
** This is where the prompt talks about, say, Karen/Claire but also mentions that Matt/Foggy is a thing in the background.
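To make the list of fields above a bit more concrete, here is one possible way a single spreadsheet row could be written down in Python and exported as CSV. This is only a sketch: the class name, the column names, and the "no / requested / open" values for poly pairings are all assumptions for illustration, not the actual layout of the spreadsheet, and the date in the example is a placeholder.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields


@dataclass
class PromptRow:
    """One prompt, with hypothetical column names based on the list above."""
    prompt_post: int
    page: int
    date_posted: str                  # placeholder format, e.g. "2015-08-14"
    main_pairing: str
    secondary_pairings: list = field(default_factory=list)  # see footnote *
    other_pairings: list = field(default_factory=list)      # see footnote **
    filled: bool = False
    fill_pairings: list = field(default_factory=list)
    crossover_with: str = ""          # empty means not a crossover
    disabled_character: bool = False
    rpf: bool = False
    poly: str = "no"                  # assumed values: "no", "requested", "open"


def to_csv(rows):
    """Render rows as CSV text, joining list columns with '; '."""
    buf = io.StringIO()
    names = [f.name for f in fields(PromptRow)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for row in rows:
        record = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in asdict(row).items()
        }
        writer.writerow(record)
    return buf.getvalue()


# The footnote ** case: a Karen/Claire prompt with background Matt/Foggy.
example = PromptRow(
    prompt_post=1,
    page=1,
    date_posted="2015-08-14",  # made-up placeholder date
    main_pairing="Karen/Claire",
    other_pairings=["Matt/Foggy"],
)
print(to_csv([example]))
```

Keeping pairings as separate columns, rather than one free-text cell, is what would let everyone's rows be counted the same way later.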
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 08:06 pm (UTC)
Also, any help thinking about this is good.
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:42 pm (UTC)
However, I know that the mod is also trying to get all the prompts into Delicious. So maybe we reach out and see if we could do both at the same time? It would make sense, since we're already going through the prompts. Then we could add a column for the tags on Delicious and a checkbox to show that the prompt is there?
Also, what are you planning on doing with the data? Is it just a curiosity thing? Were you planning to post it publicly here on the meme? I have friends who look at fandom fairly academically and might be interested in it too, since it could have real research applications.
Also, maybe some cross-scraping with data from AO3? I'd be curious to see how many fics here get cross-posted there and, if they do, which authors write a lot of fills for this meme under their own name there. Also, the percentage of DD stories on AO3 that were generated by this meme vs. not, since those from this meme are generally in that collection.
Oh! And we should be tracking how many prompts here fall into being technically "kink" vs. non-kink, since this was originally supposed to be a kinkmeme (or even just sexual vs. non-sexual).
I mean, if you're going to collect data, collect data, right? lol
Reach out to me on my Tumblr. I am enthusiasmgirl (yes the one running the challenge on the Challenge Post). :D
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:48 pm (UTC)
My fandom stats tag (from a couple of months back) covers a lot of my thoughts about this. I'd been trying to rescrape the data ever since I finished page 1, but I burned myself out really bad and haven't been able to get back into it. (Because it's just such a huge, huge project. I looked into bot scraping, but people much more knowledgeable than me said that Dreamwidth's API is trash and half the stuff I wanted to scrape couldn't be done that way. So.)
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:38 pm (UTC)
There are about 4851 prompts as of the closing of prompt post 5 on this kinkmeme.
It would be really cool to get an accurate understanding of what the fills-per-prompt ratio is.
But anyway, I have my spreadsheet all outlined.
I think going by the word count of a fill is hard, considering that many are WIPs and may never be finished. So I'm trashing that idea. (Plus, I scraped some data that way, and I mean, it's not that much more difficult, but it really isn't useful. At least, that's what I think.)
Right now my spreadsheet looks like this:
http://dusty-soul.tumblr.com/post/126696410137/whyy-self-why
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:44 pm (UTC)
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:49 pm (UTC)
and then I do something like revisit this thing.
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 09:57 pm (UTC)
Where was that when I was in high school? lol
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 10:17 pm (UTC)
http://dusty-soul.tumblr.com/post/126699016407/this-is-from-the-daredevil-kinkmeme-prompt-round
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 10:26 pm (UTC)
Then, not only could everyone see what everyone else has done, but we could also hold impromptu chat parties when we find ourselves working the doc with other people at the same time. :D
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-14 11:00 pm (UTC)
I'm also a little worried about someone, like, accidentally deleting a huge portion of stuff due to computer error or being new to Google Drive or something. So I'm still thinking about that...
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-15 12:56 am (UTC)
Also, the fact that the spreadsheet is 5,000 rows long, with the full number of columns, means that it takes forever to load and would be likely to crash if there were too many collaborators.
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-17 02:40 am (UTC)
And it should be able to handle a ton of lines of data.
Just FYI. I used to use Google Docs all the time for convention running, so I know you can throw a lot at it.
Re: Does anyone want to help scrape data by hand?
(Anonymous) 2015-08-15 12:59 am (UTC)
I still wouldn't want to leave editing open to anyone with the link, I don't think :/....
I would be much more comfortable sending it over email. Especially since then I can explain to people a bit about how it works and make sure that no one hops on with an email that shows their real name when they don't want that, or something.
It's a concern I'm personally very paranoid about.