pixiv downloader 20150328

Change Log for 20150328:

  • Implement #69: Create the .ugoira format; open it with HoneyView.
  • Implement #69: Add local .ugoira file checking.
  • Implement #70: Call os._exit() with a non-zero error code when an exception occurs.
  • Fix Issue #71: Always overwrite files even if the local file is larger; set backupoldfile = True to back up the old file.
  • Fix Issue #72: Update the download-by-tags parser.
  • Fix Issue #69: Add an option to delete the zip file; set deleteZipFile = True.
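For reference, the options mentioned in these notes would sit in config.ini along these lines (the comments and values here are illustrative; see readme.txt for the authoritative option list):

```
deleteZipFile = True     ; delete the source zip once the .ugoira is created
backupoldfile = True     ; keep a copy of the old file before overwriting
```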

Download link for pixiv downloader 20150328; source code on GitHub.

Donation link on the sidebar ==> 😀

59 thoughts on “pixiv downloader 20150328”

  1. Hello, I am sorry but I’m a noob and don’t know how to use this program. The readme didn’t help either.

    Where is the “tags.txt” ? I can’t find it anywhere. Do I have to create it myself?

  2. Hello, is there a way to download only images with, for example, ten or more bookmarks?
    I tried entering “10” when it asked about bookmarks in option 3, yet it doesn’t find any images.

      1. I’ve already gone through this issue; it seems to fail about 7 times out of 10 when I use it normally, so I just retry a few times. It also never works if I try to combine tags with “OR”. If it matters, I used Japanese characters, and since I’m not on a Japanese locale I used the encoder/decoder from readme.txt. I’ve already moved off my desktop, so I can’t submit any old logs, but if I hit this issue again I’ll upload them.

    1. Nope, that is not possible, as the folder structure might have changed or moved.

      Use the list feature, either using list.txt (for member id) or tags.txt (for tag based query). See readme.txt for more details.
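      As a rough illustration of what those list files contain (the IDs and tags below are made up; readme.txt documents the exact format):

```
list.txt   - one pixiv member id per line:
  123456
  789012

tags.txt   - one tag query per line:
  touhou
  kancolle
```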

      1. And couldn’t you use that SQL database you have there? Just store the IDs and then load them all.
        I haven’t really studied the database, since I got rid of it as soon as possible, but adding another table to it would be easy.

        I kind of solved this myself. My bash script puts the IDs into an extra file and takes care of it.
        Updating my database (1145 IDs in total) took 2 hours, and it was downloading loads of stuff.

        I know I’m comparing an unfinished factory to a hand-crafter. I’m just trying to help >.<

        1. Yeah, the member IDs are loaded into the DB from the text file (useful the first time), and assuming he sets processfromdb = True, the actual list is then selected from the DB.

          If you key them in from the console, the member ID will not be saved to the DB (only the image IDs).

          This way I can separate the members I want to update from time to time (loaded from list.txt) from the members I just want to download once (keyed in manually).

  3. Would it be possible to add some sort of interrupt protection? That is, a check to make sure it has actually downloaded the full file? For some reason (probably my crappy ISP) I occasionally get a file that just stops downloading partway through. It gets to the ‘Start downloading… xxxxx of XXXXX bytes’ display, stalls before reaching the full file size (something like ‘37000 of 50000’), and then just proceeds as if the file finished, leaving me with an incomplete image. It seems particularly prone to doing this with larger files, like the ugoiras.

    Thanks for everything you do.

    1. If you know a little Python and have an interpreter for it (and maybe an IDE), then doing that yourself, or with someone’s guidance, wouldn’t be a problem.
      But it’s strange, though: the program downloads the file in 8192-byte chunks, and each chunk is written to disk as it arrives, so it should wait both for the buffer to fill and for the whole file to finish downloading.

      The funny thing is that the check you want is actually already written in the code, but it sits inside an exception handler, which means that if the program doesn’t notice any real error, the check never runs. You/the author would have to move that “except:” and “raise” elsewhere, and then it would actually report that it f*cked up, lol.

        1. The exception is on line #522.
          I meant: raise the exception immediately and let it check whether the file was downloaded correctly.

          Basically, put line #522 between #526 and #527, and maybe add something to it (e.g. actually raise an exception when the download is incomplete, or check the file with os.path.getsize(filename)).
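          The kind of check being discussed could look roughly like this in Python (a sketch only: `fetch`, `is_complete`, and `download_with_check` are made-up names, not PixivUtil2’s actual functions; the expected size would come from the Content-Length header):

```python
import os

def is_complete(path, expected_size):
    """True if the file on disk matches the size the server reported.

    expected_size would come from the Content-Length header; pass -1
    when the server did not report a length (nothing to compare)."""
    if expected_size < 0:
        return True
    return os.path.exists(path) and os.path.getsize(path) == expected_size

def download_with_check(fetch, path, expected_size, retries=3):
    """Call fetch(path) until the on-disk size matches expected_size.

    fetch stands in for the actual HTTP download; it is a placeholder,
    not a PixivUtil2 function."""
    for _ in range(retries):
        fetch(path)
        if is_complete(path, expected_size):
            return True
    raise IOError("incomplete download after %d attempts: %s" % (retries, path))
```

          With the size comparison outside the exception handler, a silently truncated file triggers a retry instead of being reported as complete.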

      1. That’s the thing, though: it isn’t even registering as something that needs to be retried. It gets to ‘less than X of X bytes’ downloaded, stalls, skips right to the ‘Completed in xx seconds at xx k/Bs’ notice, and moves on to the next download. It doesn’t regard it as an error at all.

  4. First of all, thanks for the awesome program.

    One thing, though: when overwrite mode is on, does that mean it just overwrites no matter what, without checking the file size or anything?

    If I want it to simply only overwrite changed files, then I should set ‘alwayscheckfilesize = True’ and if I want it to backup the old files ‘backupoldfile = True’ too? And just leave overwrite mode off?

    Am I understanding this correctly?

  5. I noticed that some of the images I downloaded have been changed into manga, but the program has not updated or downloaded the new revision.
    For example, I have xxx.jpg, but when I check the illust id it is now a manga like xxx_p0, xxx_p1, etc.

    How would I have it check for a change like that?

    My settings are:
    timeout = 80
    uselist = True
    processfromdb = False
    overwrite = False
    tagsseparator = ,
    daylastupdated = 0
    rootdirectory = C:\PixivDownloaded
    retry = 10
    retrywait = 15
    createdownloadlists = False
    downloadlistdirectory = .
    irfanviewpath = C:\Program Files\IrfanView
    startirfanview = False
    startirfanslide = False
    alwayscheckfilesize = True

  6. Thanks so much for making this program! One question, how do I download bookmarked pictures from a certain tag? For example, I only want to download my bookmarks that I have tagged as ‘bismarck’

    1. Nice soft you made, to be honest. Shame it isn’t fast for large-scale downloads.

      (a little story of mine, not really worth reading; I just describe how I sped everything up. You can skip to the last 3 paragraphs, where the fruits of my suffering and a big thanks are.)

      # First problem – I need to download pictures
      Found PixivDownloader. ’nuff said, I guess…

      I tried running it multiple times (on Windows) at the same time with different galleries to download, but that was somewhat awkward and slow.
      Getting the IDs took about 5 minutes, and the download itself took a couple of hours.

      # And so here I had the second problem – slow selection of pictures / galleries / member ids
      Before, I had to click through every picture, click on the author’s profile, and get the ID from the URL, which is annoying in itself. And when you want to download about 50 galleries at once, it’s a pain.

      Well, since I’m an eager programmer, a big anime lover, and up for anything dirty, I decided to remake the whole process to download pictures much faster.
      (btw, doing it manually is my last-place option)

      First I tried to learn JavaScript to make browser extensions. That failed because I realized I need to be logged in when fetching the HTML from a URL, and jQuery can’t do that, or at least I didn’t want to bother with it.
      Then I discovered iMacros, and that was a success, although I had to go through some documentation to understand “its” code, which was also new to me.
      I created a script which extracts the profile IDs that the pictures I opened belong to, also grabs the IDs of recommended profiles, and puts it all into the clipboard.

      So now getting the IDs takes about 20 seconds (including actually clicking on pictures and opening tabs with them). Much better than 5 minutes, but the download was still the same.

      # Third problem – slow download
      I get a pretty long delay just connecting to pixiv.net, so I figured that optimizing the connection process itself would be pointless. Parallel downloading came up as an option instead.

      So I learned Python, starting from knowing nothing about it. (I also moved to Linux in the process.)

      (you can skip this paragraph – I’m just stupid)
      Don’t even get me started on the effort it took to run it from source -.-‘
      Learning the syntax, the types, the declarations, and the imports took me about 1-2 hours, BUT running it was worse. Obtaining the packages was easy, although installing them was a different story. My package manager went nuts when it realized what I was about to do (damaging package integrity or something; honestly, I’d never seen that before). After fighting it a little I found a handy command, easy_install… worked like magic. The only thing left was running the code itself. My IDE, PyCharm, used Python 3 and I didn’t know/realize that, and I was really hopeless, but I still wasn’t giving up. (The whole process was taking a few weeks at this point, because school… you know.) One day I was hopelessly trying to run it from the command line… python PixivUtil2.py… nope. Then out of nowhere I thought it could be written in Python 2 (honestly, I didn’t recognize the difference between Python 3 and Python 2. The syntax should be the same, right? But no.)
      python2 PixivUtil2.py
      “Eureka! It works!” (I nearly flipped the table, though. A single “2” made it work… -.-‘)

      Anyway, everything was working and I could start fiddling with the code and testing it.
      I had to strip the code down to just the useful part and redirect the input to command-line parameters (there is some argument/parameter handling in the code, but as far as I can tell it doesn’t serve any purpose). I didn’t study the code too deeply, but there was still an issue where multiple loggers seemed to be writing to the same log file, causing errors. I just had to toy with it.

      At last, when everything was finished, I made a nice bash script which can handle a huge load of IDs and run the downloads in parallel in the background within one terminal.

      And the final results?

      Before:
      Lots of clicking, copying, and pasting to get the IDs, then splitting the ID list manually, and more fiddly pasting of what to download.
      Grabbing IDs: a couple of minutes
      Downloading galleries: a couple of hours

      Now:
      I click on as many pics as I want. With a single click I get all the IDs, plus the IDs of recommended people. Then, by typing a single command, it downloads every gallery in parallel.
      Grabbing IDs: almost instant
      Downloading galleries: a matter of minutes (depends on the longest gallery), but I can cook on my CPU while downloading… (800 background processes isn’t really for a laptop with a 4-core i5, lol)

      Anyway, I gotta give great thanks for this soft. It really opened another gate for my hobby, which is collecting… anime stuff and all. c:

        1. Here’s a little update – something actually worth mentioning, I guess.

          So I managed to limit the active download streams; my bash script does the trick, and luckily it saves the IDs that downloaded successfully and those that failed into separate files (here I learned something about putting a whole piece of code into the background, not just a single process). I added a little switch to the script which instantly filters out already-downloaded galleries, without actually launching a download stream.
          I also achieved something similar to your SQL database, with the difference that it doesn’t touch the HDD at all (only when really needed). This was very convenient: with 120k files, my poor Linux box was searching for each one specifically… between 15 s and 0.07 s there really is a difference, lol.
          Today I even had to create a script which helps view images faster (instantly, like before), because I really couldn’t stand the load time for each image in my image viewer (0.5 s is still fine, but god, it is 5 s now).
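          In Python terms (the commenter’s version is a bash script, so every name below is made up for illustration), the two tricks boil down to keeping the set of already-downloaded IDs in memory so each membership test is a hash lookup rather than a disk search, and capping the number of concurrent workers instead of forking one process per ID:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def downloaded_ids(root):
    """Scan the download directory once and keep the IDs in a set,
    so later membership tests never touch the disk."""
    ids = set()
    for name in os.listdir(root):
        # assumes files are named like "<id>.jpg" or "<id>_p0.jpg"
        ids.add(name.split('_')[0].split('.')[0])
    return ids

def download_missing(download, wanted_ids, have_ids, max_workers=8):
    """Filter out what is already on disk, then run `download` over the
    rest with a bounded worker pool; `download` is a placeholder for
    whatever actually fetches a gallery (e.g. shelling out to PixivUtil2)."""
    todo = [i for i in wanted_ids if i not in have_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download, todo))
```

          Bounding the pool avoids the 800-background-processes problem while still keeping the downloads parallel.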

          Anyway, the only thing left is to speed up the part where it gets an image_id and then has to check/download the info about it.
          Maybe I could do that by checking whether the next picture is downloaded or not. Manga pictures could mix up the process a little, but I don’t see much of a problem there. Also, what about that thing… “ugoira”, was it? Still no clue what it actually means or where it’s stored, lol.

          Btw, if you think this kind of topic should be moved elsewhere, like GitHub, I’m fine with that. Not that I’m planning on spamming my failures here; I’m just happy to tell you what crazy stuff I’m doing with your soft, lol.

          Cheers

          1. I was thinking about posting some of my things on GitHub once it works as nicely and as fast as I originally wanted.
            But there are a few things to state first.

            – I focused my modifications on downloading whole galleries by member_id.
            So I launch one stream/the program like “python2 PixivUtil2.py 123456”, or I run “pixivgrabber” and everything in the clipboard is sent off to streams to handle.

            – I replaced some important things (e.g. the database, which gets locked up once several processes fiddle with it) that are spread across the entire code, and I doubt the author would be dying of pleasure as he erases his own code and replaces it with mine, lol.

            – I used the Linux shell to make it run in parallel, and the shell now also has control over the processes. (Nothing complex, really.)
            Yet there is this thing where the request queue fills up, and until those requests get responses, no other stream can request info/data. Nothing to worry about too much, though; it may be caused by my slow connection. (I’ve only studied networking a little, so nothing detailed, but the point I’m making is that the communication is serial.)
            If you are not as crazy as me, someone who likes to stress-test his PC until flames show up, then don’t mind the last lines at all.

            I’d be freaking happy if I could combine the powers of torrent clients and this soft. That would be like… the ultimate downloader for pixiv, lol. I’m getting closer, haha.

            If anybody is interested in this, I could share it somewhere or put one big enhancement request on GitHub. But it may take time… I’m trying to deal with my university stuff now.
            That iMacros script comes in handy when you want to get IDs and are too lazy to copy-paste them from URLs.

            Sorry that I write such long essays compared to the other comments. >.<

            Cheers

            PS: derpy me, I can't even calculate right. It doesn't take 0.07 s to check for a file; that would be way too long. It is actually 1.2e-05 s ~ 0.000012 s.

          2. > code removal

            as long as it doesn’t interfere with other functions, it should be OK if it gives a performance boost (and I can learn new stuff).

            > linux

            hopefully it is compatible with Windows, as that is my only dev PC.

            > ugoira

            it is pixiv’s format for animation: basically zipped images, with the timing information in a JSON file inside. The only viewer that supports it is HoneyView, as far as I know.

            > github

            yep, please use GitHub and create a pull request for the enhancement 😀

          3. I posted it as an issue on GitHub, since the changes are too big and my experience with GitHub is limited. I also improved the checking, so now… I think it could match the functionality of the SQL database, with a couple of bugs, but these don’t damage files; they are just speed issues (a second here, a second there… everything matters).
            Basically, it downloads the whole image list first, filters out already-downloaded illustrations, then with a little bit of thinking filters out manga, and what is left gets downloaded. The last manga is always checked, because the filtering is based on the success of previous downloads. Anyway, if it misses, one extra manga check won’t ruin anything.

            I’ll repeat myself:
            There are a lot of changes, and some limit the options available to users. It is Linux-only (the scripts for launching it in parallel). And again, it was modified for large-scale downloading, not just for better performance.

            So yeah, I doubt there will be anything useful for you to implement from my version, but you can take a look at what I’ve done with it. The source code is linked there.

  7. Thanks so much for making this program! One question, how do I download bookmarked pictures from a certain tag? For example, I only want to download my bookmarks that I have tagged as ‘bismarck’

  8. Dear Nanaka-san, I have a question: I want to download all my pixiv bookmarks into one single folder, but when I use your downloader the program makes many folders, one for each of the bookmarks, around 500 folders. Can you please tell me how to configure the downloader so that all my bookmarks, public and private, go into a single folder? Thanks in advance.

    1. Change the filenameformat and filenamemangaformat in config.ini; remove the \ so it doesn’t create a subfolder.
      See readme.txt for more details.

  9. Thank you so much for this program; it makes downloading all the images I’d bookmarked that much easier 🙂

    Is there a way to download images I’ve bookmarked based on their tag? For example, I put a “DL” tag on some images to remind myself to download them later, and I’d like to be able to download only those bookmarked images with that tag. Is there a combination of options that will let me do that? Thanks again!

  10. Can I download ugoira files only? If yes, how? Tags don’t work for it 🙁
    I need only the animation files.

  11. Hi, I had a question, if I want to download only R-18 items when choosing a tag, what do I do?

    I enter 3, then I put the ‘tag’, then a space, and then ‘R-18’, but it fails. I thought leaving a space is how you process multiple tags.

    Also I wouldn’t for example want ALL items with an R-18 tag (There would be tonnes) but only R-18 items that also come under the heading of the tag I’ve put in.

    What do I do?

    Excellent program by the way.

    1. What you are doing should be correct.
      1. Make sure R-18 is enabled on the pixiv website.
      2. Remove the cookie value from config.ini.
      3. Set r18mode = True; this will show only the R-18 works.
      4. Then log in again.

      If it still fails, give me the URL generated by the application. You can check the log file (find a line like Looping… for http://www.pixiv.net/search.php?).

    1. UgoIra = Ugoku Irasuto = Moving Illustration = Animation

      Pixiv implements animation using their own format (zip + JSON).

      You can use the latest HoneyView application to view the animation (the JSON file needs to be inserted into the zip as animation.json, which PixivDownloader does automatically).
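      The packaging itself is easy to sketch with Python’s standard zipfile module (the JSON layout below is illustrative, not necessarily the exact structure pixiv emits, and `zip_to_ugoira` is a made-up name):

```python
import json
import os
import zipfile

def zip_to_ugoira(zip_path, frame_delays):
    """Append the timing data as animation.json and rename .zip -> .ugoira.

    frame_delays maps frame filenames to display time in milliseconds,
    e.g. {"000000.jpg": 100, "000001.jpg": 100}."""
    frames = [{"file": f, "delay": d} for f, d in sorted(frame_delays.items())]
    with zipfile.ZipFile(zip_path, 'a') as zf:
        # only add the timing file if it is not already in the archive
        if 'animation.json' not in zf.namelist():
            zf.writestr('animation.json', json.dumps({"frames": frames}))
    ugoira_path = os.path.splitext(zip_path)[0] + '.ugoira'
    os.rename(zip_path, ugoira_path)
    return ugoira_path
```

      With animation.json sitting alongside the frames inside the archive, a viewer like HoneyView can play the result as an animation.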

      1. Man, as soon as you find a favorite image viewer it gets deprecated 😛
        Thanks for the info, I guess I’ll try HoneyView.
        So if I set deleteZipFile = True in the settings, files will be saved as .ugoira instead of .zip?

          1. Great, works perfectly.
            An unrelated question: is it possible to download a list of individual pictures instead of users? I have around 400 assorted pictures I’d like to download, and it gets tiresome getting them one at a time 😛

  12. The ugoira support is a good addition. Two questions, though:
    1: What’s the best way to convert old ugoira files?
    2: Can HoneyView replace IrfanView as the download list viewer?

Comments are closed.