Then try running what's there in the readme:
sudo cpanm Mojo::UserAgent Carp DBI Date::Manip
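If cpanm finishes but you're not sure everything actually got installed, a throwaway script can try to load each module and list any that fail. This is a small sketch, not something that ships with taparip. It uses plain `require` inside `eval`, so it still runs when some modules are missing.

```perl
#!/usr/bin/env perl
# Sketch: report which of the modules from the cpanm line can be loaded.
use strict;
use warnings;

my @modules = qw(Mojo::UserAgent Carp DBI Date::Manip);
my @missing;

for my $module (@modules) {
    # A string passed to require must be a file path: Foo::Bar -> Foo/Bar.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    # eval catches the "Can't locate ..." error instead of dying
    push @missing, $module unless eval { require $file; 1 };
}

if (@missing) {
    print "Missing: @missing\n";
}
else {
    print "All modules found\n";
}
```

If anything shows up under "Missing", rerun cpanm for that module and read its error output.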
After that, you need to open up taparip.pl and start changing the configuration. I was lame and lazy, so you have to edit the script directly. Maybe if I ever care enough, I'll move it into an INI or YAML file or something. Anyway, you need to edit this section:
https://github.com/labster/taparip/blob/...rip.pl#L20
$domain should point to the forum you're trying to rip. Change the URL to your forum's address.
$apipath is probably still the same. If it's wrong, the script will break as soon as you run it, so you'll know, and then I'd actually have to look into it.
$dbfile is where the output gets saved. Pick some directory you can write to, or just use sudo, lol (this software comes with no warranty for any purpose, lol).
Next, set $endtopic to the number of the most recent thread. You can get it from the thread URLs around the forum.
$repeat_thread is, well... okay, here's how the software works: it jumps around the list of thread numbers at random to simulate organic traffic, so someone looking at the server logs won't find it obvious that they're getting crawled. However, if you restart the script, the order is re-randomized, so it might try to download threads you've already downloaded. If it's set to 0, you get every thread once. If it's set to 1, threads can be downloaded again, which is most useful when you need to pick up new posts.
$delay - the wait between requests. 2 seconds seems like we're not trying to murder their servers, right?
$verbose - turns on extra output. You like words, don't you?
$username and $password - if there are private forums you're trying to capture, you'll need these. I believe someone said it worked, but I migrated this forum without needing to log in, since it's all public.
$authorz - um, some guy added this to my script later. Leave it alone, I guess.
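Put together, an edited config section might look roughly like the sketch below. It is a hypothetical example, not a copy of taparip.pl. The URL, file path, and thread number are placeholders. It assumes the settings are plain `my` scalars, that $verbose is an on/off flag, and that empty credentials mean "don't log in". $apipath and $authorz are left out because you should keep whatever values the script ships with.

```perl
use strict;
use warnings;

# Hypothetical example of the edited settings; the real section of
# taparip.pl may declare these differently. Keep $apipath and $authorz
# at the script's own values.
my $domain        = 'https://forum.example.com/';  # placeholder: your forum's URL
my $dbfile        = '/home/you/forum-rip.db';      # placeholder: any writable path
my $endtopic      = 12345;  # placeholder: number of the newest thread
my $repeat_thread = 0;      # 0 = get each thread once, 1 = allow re-downloads
my $delay         = 2;      # seconds to wait between requests
my $verbose       = 1;      # assumed: 1 = print progress, 0 = quiet
my $username      = '';     # assumed: leave empty for a public forum
my $password      = '';
```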
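To make the $repeat_thread behavior concrete, here's a tiny simulation of random-order crawling. It is not taparip's actual code. The %already_saved hash stands in for whatever check the script really does against its database, fetch_thread is a hypothetical stub instead of a real HTTP request, and the values are tiny (with no delay) so it runs instantly.

```perl
#!/usr/bin/env perl
# Illustrative sketch only, NOT taparip.pl's real loop: visit thread
# numbers 1..$endtopic in random order, skipping already-saved threads
# unless $repeat_thread is set.
use strict;
use warnings;
use List::Util qw(shuffle);

my $endtopic      = 10;  # hypothetical: newest thread number
my $repeat_thread = 0;   # 0 = skip threads already saved, 1 = fetch again
my $delay         = 0;   # seconds between requests (0 here so it runs fast)

# Hypothetical stand-in for "threads already in the output database".
my %already_saved = map { $_ => 1 } (3, 7);

# Hypothetical stand-in for the HTTP fetch plus database write.
my @fetched;
sub fetch_thread {
    my ($id) = @_;
    push @fetched, $id;
    $already_saved{$id} = 1;
}

# shuffle gives a fresh order on every run, which is why a restart
# re-randomizes the crawl.
for my $id (shuffle 1 .. $endtopic) {
    next if !$repeat_thread && $already_saved{$id};
    fetch_thread($id);
    sleep $delay if $delay;
}

# prints "fetched 8 threads: 1 2 4 5 6 8 9 10"
printf "fetched %d threads: %s\n", scalar @fetched,
    join ' ', sort { $a <=> $b } @fetched;
```

The visiting order changes every run, but the set of fetched threads doesn't. Flip $repeat_thread to 1 and all ten get fetched, including 3 and 7.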
Running perl -c taparip.pl compiles the script without running its main code, so you can check that all the modules are installed and that you didn't mess up the syntax before actually running it.
"It'll surely be all right." - Sakura Kinomoto