pixiv downloader 20120210

Change log:

  • Fix no limit for end page as reported by Troid.
  • Fix parseTags method as reported by gsilver.
  • Add start/end date for download by tag as requested by UnemployedIncubator.
  • Fix member bookmarks parsing as reported by Anonymous.
  • Fix new illustration download if run from console as reported by ToadadnoChikara.

Download link here, source code here or in my GitHub. Donate link on the sidebar 🙂

26 thoughts on “pixiv downloader 20120210”

  1. Yeah that’s the date I’m looking for.

    I was talking about having the date tagged in the file (EXIF in jpeg) but don’t worry about it if you’re not familiar with that. Would love to have the %date% from the works_Data div.

  2. Would it be possible to add a %date% variable? I would like to be able to include the date updated (since not all artists date their images in the metadata).

    Also, would it be possible to add a way to write the date into the metatag? i.e. place all of the %date% into ‘date created’ or wherever?

    1. Would it be possible to add a %date% variable?

      for the filenameFormat? I can get the value from the works_data div if this is what you want (the date portion on the top left side, where the image resolution and the tool are listed).

      Also, would it be possible to add a way to write the date into the metatag?

      which metatag? I don’t quite understand.
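A %date% placeholder in filenameFormat could be handled by plain string substitution. The following is only a minimal sketch: the helper name `apply_filename_format` and the placeholders %member_id% and %image_id% are assumptions for illustration, not taken from the downloader's source, and the date is assumed to have already been read from the page.

```python
import re


def apply_filename_format(filename_format, values):
    """Replace %name% placeholders with entries from `values`.

    Unknown placeholders are left untouched so that a typo in the
    config stays visible in the resulting file name.
    """
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"%(\w+)%", substitute, filename_format)


# Hypothetical placeholder names and values, for illustration only.
example = apply_filename_format(
    "%member_id%/%date% %image_id%",
    {"member_id": 2761932, "date": "2012-02-10", "image_id": 12345},
)
```

Leaving unknown placeholders in place is a design choice; raising an error instead would be equally reasonable.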

  3. Issues I encountered, none of which are really your responsibility to fix. Hopefully this comment is of interest to future users of the source code who have trouble getting it working:

    In some setups the application will be set to place all downloaded files in the current directory. In such scenarios, mkdir will be passed the empty string and will throw an exception.

    Not using os.path.split (or taking the lazy and slightly less reliable route of using os.sep) means it won’t work well on non-Windows machines. A global find/replace of \ with / in PixivUtil2, as well as removing / from __badchars__ in pixivhelper, is enough for a quick and dirty patch.

    I needed to add “req._tunnel_host = None” below every call to “req = urllib2.Request(url)” to prevent attribute exceptions. This is a compatibility fix for issues using some versions of Mechanize on some versions of Python.

    1. In some setups the application will be set to place all downloaded files in the current directory. In such scenarios, mkdir will be passed the empty string and will throw an exception.

      Can I get your OS/setup details? I assume you are using Linux. I will probably add a check for os != winnt for the __badchars__.

      I needed to add “req._tunnel_host = None” below every call to “req = urllib2.Request(url)” to prevent attribute exceptions. This is a compatibility fix for issues using some versions of Mechanize on some versions of Python.

      Can I get the detail for this one, maybe a link?

      1. OS/Setup: OSX 10.6.8, Python 2.6.1.

        The current directory exception is probably not OS-specific. That will happen if you use a filenameformat that contains no path separators and use “.” as your rootdirectory, as the downloader will see the directory is “.”, which is not a directory that it makes sense to call mkdir on. You might be making things worse by sanitizing “.”, thus passing an empty string to mkdir.

        Don’t check if os is winnt to fix __badchars__. Just check the value of os.sep. If os.sep is equal to /, then __badchars__ should not remove / from paths. For joining and combining paths, the Python documentation recommends os.path.split/os.path.join over using os.sep. That said, switching from a hard-coded separator to os.sep would be a 99% solution. On Windows, os.sep is ‘\’, so from the perspective of windows users this is not a meaningful code change.

        It would be nice if the application noticed if os.sep and filenameformat didn’t match, but that’s only for the sake of fixing paths in the case that a user completely ignores the config file.

        I have no idea where the documentation is for the req._tunnel_host fix I mentioned. Googling the error was not very informative. I added that line because I was getting attribute exceptions and None was the only sensible default value (and any other value caused a new exception, when I tested it). My best guess is that some versions of urllib2 don’t have a _tunnel_host attribute and some versions of Mechanize check the value of that attribute. That’s probably an idiosyncrasy of my setup rather than something you should change in your application.
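The two path issues raised above (mkdir receiving an empty string, and __badchars__ stripping the separator) can be sketched as follows. This is an illustration under assumptions, not the downloader's code: `BAD_CHARS`, `sanitize_path` and `ensure_parent_dir` are hypothetical names, and the character set is simply the one Windows rejects in file names.

```python
import os

# Characters Windows rejects in file names; which of the two slashes
# counts as the separator is decided per call, not hard-coded.
BAD_CHARS = '\\/:*?"<>|'


def sanitize_path(path, sep=os.sep):
    """Drop bad characters from `path` but keep the separator `sep`."""
    return "".join(ch for ch in path if ch == sep or ch not in BAD_CHARS)


def ensure_parent_dir(filename):
    """Create the directory part of `filename` if there is one."""
    directory = os.path.dirname(filename)
    # os.path.dirname("image.jpg") is "", and creating "" raises an
    # exception, so only create a directory when there is a name.
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    return directory
```

Passing `sep` explicitly makes the behaviour easy to check for both platforms on one machine; by default it follows os.sep as suggested above.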

      2. As an aside, Windows does allow the use of ‘/’ as a path delimiter, so a really lazy way to fix OS compatibility (at least among Windows and *nix systems) would be to do a find/replace of \ with /. I was thinking my initial comment about “global find/replace of \ with / in PixivUtil2” would break PixivUtil2 on Windows, but I don’t believe that is actually the case.

      3. Sure I’ll check. Thank you for being responsive; I didn’t know you cared about platform independence 🙂

        The only oddity in my own usage is that folder.jpg is now ending up in a subfolder instead of being at the top level of the artist’s directory. You should use os.sep in the line of PixivUtil2.py:
        filenameFormat = filenameFormat.split(‘\’)[0]

        Your GitHub update seems to have fixed every other issue I encountered. I’m a little suspicious that there’s still an issue with list.txt, but I didn’t hit that part of the code during use so I’m not sure. Maybe just replace all your ‘\’ in PixivUtil2.py with os.sep to be safe.

        If this were a professional project I’d recommend taking the hit and globally replacing all your string-based path manipulation with calls to functions in the os.path module. However, for hobby code it’s probably not worth the effort.

    2. Note that if you do switch to using os.sep in “filenameFormat.split(‘\’)[0]”, you’ll also need to call sanitize on filenameFormat, assuming you want the default filenameFormat not to have the same issue I am encountering with my copy. Maybe just do that replacement in your loadConfig function. :::shrug:::
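One way to make the “first component of filenameFormat” step independent of which separator the config uses is to normalize before splitting. A minimal sketch; `top_level_component` is a hypothetical helper, not a function from PixivUtil2.py, and the placeholder names in the example are assumptions.

```python
def top_level_component(filename_format):
    """Return the first path component of a filename format.

    Both a backslash and a forward slash are accepted as separators,
    so a Windows-style config value also works on *nix.
    """
    normalized = filename_format.replace("\\", "/")
    return normalized.split("/")[0]


# Hypothetical format strings, one per separator style.
windows_style = top_level_component("%member_id%\\%image_id%")
unix_style = top_level_component("%member_id%/%image_id%")
```

Note that a format with no separator at all returns the whole string, so a caller would still need to decide what the top-level folder is in that case.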

      1. Updated in github and you can check it 🙂

        It is being called after splitting the filenameFormat, but the config must be correct (e.g.: for Windows it must be ‘\’, for *nix ‘/’).

        Anyway, this is only a hobby project and I am still learning more Python. Any comment/fix/update/pointer is gladly accepted 🙂

      2. It’s working successfully on my machine without me making any modifications.

        Ah, I didn’t notice that config.ini is not included in the release. Assuming a correct config is fine in that case.

  4. Sorry for the late reply. I checked again and it seems it works fine, as you said, I was inputting the page numbers and dates in reverse. Regarding the ‘No more images’ message, I did what you asked and it appeared again, but maybe that’s just me being retarded again.
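Reversed start/end input of this kind could be tolerated by ordering the two values before use. A small sketch; `ordered_range` is a hypothetical helper and nothing here reflects how the downloader actually validates its input.

```python
from datetime import date


def ordered_range(first, second):
    """Return the two values as (earlier, later), swapping if needed."""
    return (first, second) if first <= second else (second, first)


# Dates and page numbers entered in reverse order.
start_date, end_date = ordered_range(date(2012, 2, 12), date(2009, 1, 1))
start_page, end_page = ordered_range(30, 1)
```

If an end page value is used to mean “no limit”, that value would have to be handled before the swap, since it would otherwise be treated as the smaller bound.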

  5. This version seems to have a problem reading lists. I have it set to download lists and it keeps giving me the same error: 2012-02-18 15:27:50,720 – PixivUtil20120210 – ERROR – Unknown Error: ‘decimal’ codec can’t encode character u’\u3046’ in position 0: invalid decimal Unicode string.

    I have the lists correctly typed out as said in the readme:

    #うみょんげ
    2761932

    The problem is that I think Pixiv mistakenly thinks that the username is a User ID. Is there any way to fix this? (I can always change my list, but it’s VERY long.)

    1. Sorry, I’ve managed to find a solution: all I had to do was begin my list.txt with a number. (Any User ID would do.) PixivUtil2.exe treats it like an ordinary list and I don’t get the same error.
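An error like the one above appears when a non-numeric line reaches the integer conversion. The sketch below shows a tolerant way to read such a list; it does not claim to identify the actual cause in this version. `parse_member_list` is a hypothetical name, and stripping a byte order mark from the start of a line is a precaution based on an assumption, not something confirmed in this thread.

```python
def parse_member_list(lines):
    """Return (member_ids, skipped) from the lines of a list file.

    Lines starting with '#' and blank lines are ignored. Lines that
    are not valid integers are collected in `skipped` together with
    their line number instead of aborting the whole run.
    """
    member_ids, skipped = [], []
    for number, raw in enumerate(lines, start=1):
        # A UTF-8 byte order mark may sit in front of the first line.
        line = raw.lstrip("\ufeff").strip()
        if not line or line.startswith("#"):
            continue
        try:
            member_ids.append(int(line))
        except ValueError:
            skipped.append((number, line))
    return member_ids, skipped


ids, skipped = parse_member_list(["\ufeff#うみょんげ", "2761932", "", "うみょんげ"])
```

Reporting skipped lines with their numbers makes a very long list easier to correct than a single fatal error would.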

  6. Hi there.

    I can’t download some galleries from pixiv (like this member: 14157). After the 3rd image it says that there is a Windows error (something’s wrong with the path or the filename).

  7. Sorry to bother again, but it seems it doesn’t go past page 30. I tried with and without wildcard and using different date ranges and tags, but it stops and says there’re no more images.

      1. 火薬少女あけみ☆ほむら Stops at page 30, actual page count is 48. Dates were from 2012-02-12 to 2009-01-01 and 2012-02-12 to 2009-10-01, tested with and without wildcard.

        ほむほむ Started search at page 30, looped it and then said “No more images. Done”, actual page count is 223. Dates were from 2012-02-12 to 2000-01-01, without wildcard.

        キュアピース (this one has a lot of porn) Stopped at page 30, actual page count was 38, with and without wildcard. Dates from 2012-02-12 to 2011-10-01 and 2012-02-12 to 2000-01-01. I tested this one with your pixiv downloader 20111102 and worked just fine.

        And then I tried プリキュア, started at page 35, actual page count is 1311, no wildcard, dates from 2012-02-13 to 2000-01-01 and it worked, so maybe I’m not using it correctly.

        1. I assume you are using no 3 (Download by tags). First, you put the start and the end page in reverse.

          >> 火薬少女あけみ☆ほむら Stops at page 30, actual page count is 48. Dates were from 2012-02-12 to 2009-01-01 and 2012-02-12 to 2009-10-01, tested with and without wildcard.
          If you enter only the search word (火薬少女あけみ☆ほむら), the range 2009-01-01 to 2012-02-12 yields only 2 pages, and the same goes for the other range. My input was the search word (%E7%81%AB%E8%96%AC%E5%B0%91%E5%A5%B3%E3%81%82%E3%81%91%E3%81%BF%E2%98%86%E3%81%BB%E3%82%80%E3%82%89) with wildcard (y) and the other inputs left empty (just Enter), and it loops through to page 48.

          >> ほむほむ Started search at page 30, looped it and then said “No more images. Done”, actual page count is 223. Dates were from 2012-02-12 to 2000-01-01, without wildcard.
          With wildcard the result is 264 pages, without wildcard 223; both were parsed correctly without using the dates. With the dates it yields only 10 and 8 pages, and those were also parsed OK.

          Second, technically it shouldn’t show the ‘No more images’ message, because it will parse and check if the current page is the last one (no next button ‘>’) and should have an image. Can you try to extract the application into a different folder (fresh config & DB) and run it again?
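The percent-encoded search word quoted in the reply above is simply the UTF-8 form of the tag, which can be checked with the standard library. A short sketch in Python 3 syntax; it shows only the encoding and makes no assumption about how the downloader builds its search URLs.

```python
from urllib.parse import quote, unquote

tag = "火薬少女あけみ☆ほむら"

# Each UTF-8 byte of the tag is written as %XX.
encoded = quote(tag)

# Decoding the percent-encoded form gives the tag back.
decoded = unquote(encoded)
```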

  8. Thank you so much for this, but I have a couple of questions:
    1.- What is the wildcard?
    2.- When downloading by tags it asks for a date, does it refer to the dates when the pictures were uploaded?

    1. You’re welcome.
      1. Nothing special; it actually refers to a different search method: one uses search.php (wildcard) and the other uses tags.php (http://www.pixiv.net/help.php#3-3). Maybe it needs a better name.
      2. Yes, if you enter the start/end date, it will tell pixiv to search for images that were uploaded in that range.
