Download and browse content you love!

Content Collector

Use this awesome tool to download

  • Images

  • Gifs

  • Image Albums

  • Videos

  • Comments

…which you've marked as Liked, Hearted, or Saved from

  • Reddit

  • Tumblr

  • Pixiv

…to disk! You can then browse the results.

Directions

0. Check Releases

Check the Releases page for a ready-to-use version of Content Collector. If you find a release for your system that works, you can skip straight to the Usage section.

1. Clone this repository

git clone https://github.com/makuto/Liked-Saved-Image-Downloader

2. Install Python dependencies

The following dependencies are required:

pip install praw pytumblr ImgurPython jsonpickle tornado youtube-dl git+https://github.com/ankeshanand/py-gfycat@master git+https://github.com/upbit/pixivpy

You'll want to use Python 3, which for your environment may require you to specify pip3 instead of just pip.
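
To confirm the install worked on the interpreter you plan to run the server with, a small check like the following can help. This is only a sketch: the import names on the right-hand side (for example imgurpython, youtube_dl, gfycat, and pixivpy3) are assumptions about how each pip package is imported, and find_missing is a hypothetical helper, not part of Content Collector.

```python
import importlib.util

# Assumed import names for the pip packages listed above; the pip name and
# the importable module name differ for several of them.
REQUIRED_MODULES = {
    "praw": "praw",
    "pytumblr": "pytumblr",
    "ImgurPython": "imgurpython",
    "jsonpickle": "jsonpickle",
    "tornado": "tornado",
    "youtube-dl": "youtube_dl",
    "py-gfycat": "gfycat",
    "pixivpy": "pixivpy3",
}


def find_missing(modules):
    """Return the pip names whose module cannot be located by this interpreter."""
    missing = []
    for pip_name, module_name in modules.items():
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it with the same python3 you will use for the server, since pip and pip3 may install into different environments.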

Login-Protected Server

If you want to require users to log in before they can interact with the server, you must also install passlib along with its bcrypt and argon2_cffi backends:

pip install passlib bcrypt argon2_cffi

3. Generate SSL keys

cd Liked-Saved-Image-Downloader/
./Generate_Certificates.sh

This step is only required if you want to use SSL, which ensures you have an encrypted connection to the server. You can disable this by opening LikedSavedDownloaderServer.py and setting useSSL = False.
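
As a rough illustration of what a flag like useSSL typically controls, the sketch below builds the options a Tornado server could be given. It is not the code in LikedSavedDownloaderServer.py: build_ssl_options is a hypothetical helper, and the certificates directory and the server.crt / server.key file names are placeholders for whatever Generate_Certificates.sh actually writes.

```python
import os


def build_ssl_options(use_ssl, cert_dir="certificates"):
    """Return a Tornado-style ssl_options dict, or None for plain HTTP.

    The directory and file names are hypothetical placeholders; substitute
    the files that Generate_Certificates.sh produces.
    """
    if not use_ssl:
        return None
    return {
        "certfile": os.path.join(cert_dir, "server.crt"),
        "keyfile": os.path.join(cert_dir, "server.key"),
    }
```

Tornado's HTTPServer accepts such a dict (or an ssl.SSLContext) through its ssl_options argument; passing None leaves the connection unencrypted.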

4. Run the server

python3 LikedSavedDownloaderServer.py

Usage

Access the server

Open localhost:8888 in any web browser.

If your web browser complains about the certificate, you may have to click Advanced and add the certificate as trustworthy, because you've signed the certificate and trust yourself :).

(Explanation: your browser doesn't trust this certificate because you created it yourself rather than obtaining it from a recognized certificate authority. It will still protect your traffic from people snooping on your LAN.)

If you want to get rid of this warning, you'll need a certificate authority such as Let's Encrypt to issue your certificate, and you'll need to host the server under a proper domain.

Set password

When first running the server, you will be prompted to set a password.

If you forget your password, simply delete passwords.txt.

Home page

The home page provides access to all server features:

[Screenshot: images/Homepage.png]

Set up accounts

Use Settings to configure the script:

[Screenshot: images/LikedSavedSettings.png]

Make sure to click "Save Settings" before closing the page.

You don't have to fill in every field, only the accounts you want.

Download content

Go to the Download Content page and click "Download new content":

[Screenshot: images/DownloadContent.png]

Wait until the downloader finishes (it will say "Finished" at the bottom of the page). While the downloader is running, the "Download new content" button will disappear.

Browse content

Enjoy! Use Browse Content to jump to random content you've downloaded, or browse your output directory:

[Screenshot: images/LikedSavedBrowser.png]

The browser should scale nicely to work on both mobile and desktop.
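
If you would rather pick random content from the output directory yourself, the idea can be sketched in a few lines. This is not how the Browse Content page is implemented; pick_random_file and the list of extensions are assumptions made for illustration.

```python
import os
import random


def pick_random_file(output_dir, extensions=(".jpg", ".png", ".gif", ".mp4")):
    """Return the path of one randomly chosen media file under output_dir, or None."""
    candidates = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            # str.endswith accepts a tuple, so one call checks every extension.
            if name.lower().endswith(extensions):
                candidates.append(os.path.join(root, name))
    return random.choice(candidates) if candidates else None
```

For example, pick_random_file("output") walks the default output directory and returns None if nothing has been downloaded yet.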

Login management

The server requires you to log in before running the script, changing settings, or browsing downloaded content.

If you host Content Collector on the internet, you should rely on a more robust authentication scheme (e.g. use a reverse proxy which won't proxy requests to Content Collector until you have authenticated with the proxy server). Content Collector was designed for LAN use.

Note that all login cookies are invalidated each time you restart the server. If you don't restart the server, your browser should remember your login indefinitely.
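
This README does not say how the cookies are implemented. One common cause of this behaviour is a signing secret that is regenerated at every start, so cookies signed by the previous process no longer verify. The sketch below illustrates that mechanism with the standard library only; every function name in it is hypothetical.

```python
import hashlib
import hmac
import secrets


def new_server_secret():
    """Generate a fresh signing key, as a server might do once per start."""
    return secrets.token_bytes(32)


def sign_cookie(secret, value):
    """Return 'value|signature', where the signature is an HMAC-SHA256 hex digest."""
    signature = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}|{signature}"


def verify_cookie(secret, cookie):
    """Return the original value if the signature matches, else None."""
    value, _, signature = cookie.rpartition("|")
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(signature, expected) else None
```

A cookie signed with one secret verifies against that secret but not against a newly generated one, which is the effect a restart would have under this assumption.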

Managing password(s)

The web interface will automatically prompt for a new password when first starting up.

You can also use PasswordManager.py to generate a file passwords.txt with your hashed (and salted) passwords:

python3 PasswordManager.py "Your Password Here"

You can create multiple valid passwords, if desired. There are no separate accounts, however.

If you want to reset all passwords, simply delete passwords.txt.
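
To show what "hashed (and salted)" means in practice, here is a minimal sketch using only the standard library. It is a stand-in: Content Collector relies on passlib, and the actual format of passwords.txt is not described here, so the PBKDF2 scheme, the entry layout, and both function names are assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=200_000):
    """Return 'iterations$salt_hex$hash_hex' using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # a new random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored_entries):
    """Return True if password matches any stored entry (there are no accounts)."""
    for entry in stored_entries:
        iterations, salt_hex, hash_hex = entry.split("$")
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
        )
        if hmac.compare_digest(digest.hex(), hash_hex):
            return True
    return False
```

Because each entry carries its own salt, hashing the same password twice gives different entries, and any one of several stored passwords can unlock the server.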

Disabling Login

Open LikedSavedDownloaderServer.py and find enable_authentication. Set it equal to False. You must restart the server for this to take effect.

Running the script only

This is deprecated. You should use the web server to configure and run the script instead.

  1. Copy settings_template.txt into a new file called settings.txt

  2. Open settings.txt

  3. Fill in your username and password

  4. Set SHOULD_SOFT_RETRIEVE to False when you are ready to actually download files; otherwise the script only reports what it would have downloaded

  5. Run the script: python redditUserImageScraper.py

  6. Wait for a while

  7. Check your output directory (the default is output relative to where you ran the script) for all your images!

If you want more images, set Reddit_Total_Requests and/or Tumblr_Total_Requests to a higher value, up to the maximum of 1000. Unfortunately, Reddit does not allow you to get more than 1000 submissions of a single type (1000 liked, 1000 saved).

Seeing the console say it downloaded images, but not actually getting any? Make sure SHOULD_SOFT_RETRIEVE=False is set in settings.txt.

settings.txt has several additional features. Read the comments to know how to use them.
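
If you want to check a settings file before running the script, something like the sketch below could work. It assumes settings.txt consists of KEY=value lines with '#' comments and that soft retrieval is the default when the key is absent. Neither assumption is confirmed here, and parse_settings and will_download are hypothetical helpers rather than functions from settings.py.

```python
def parse_settings(text):
    """Parse KEY=value lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def will_download(settings):
    """True only when SHOULD_SOFT_RETRIEVE is explicitly set to False."""
    return settings.get("SHOULD_SOFT_RETRIEVE", "True").lower() == "false"
```

For example, will_download(parse_settings(open("settings.txt").read())) tells you whether a run is expected to write files to disk under these assumptions.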

OSX Python issues

On OSX, running the downloader from the Content Collector server may cause this error:

Output: output
objc[29889]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.

This is caused by a clash between Python and OSX's security model. See this issue for an explanation.

To work around it, you need to first run

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

…before running the Content Collector server in that same terminal.

Or add the bash profile suggested in this answer.
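
Another option is a small launcher that sets the variable only for the server process, so you don't have to remember the export. This is a sketch under the assumption that the server is started from the repository directory; server_environment and run_server are hypothetical names.

```python
import os
import subprocess
import sys


def server_environment(base_env=None):
    """Return a copy of the environment with the fork-safety workaround set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return env


def run_server(script="LikedSavedDownloaderServer.py"):
    """Start the server as a child process that inherits the workaround."""
    return subprocess.run([sys.executable, script], env=server_environment())
```

Calling run_server() starts the server with the same interpreter that runs the launcher, and the variable does not leak into the rest of your shell session.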

Issues

Feel free to create Issues on this repo if you need help. I'm friendly so don't be shy.