pixiv downloader 20120210

Change log:

Fix no limit for end page as reported by Troid.
Fix parseTags method as reported by gsilver.
Add start/end date for download by tag as requested by UnemployedIncubator.
Fix member bookmarks parsing as reported by Anonymous.
Fix new illustration download if run from console as reported by ToadadnoChikara.

Download link here, source code here or on my GitHub. The donate link is on the sidebar 🙂

26 thoughts on “pixiv downloader 20120210”

Yanoflies says:

February 28, 2012 at 16:29

Yeah that’s the date I’m looking for.

I was talking about having the date tagged in the file (EXIF in JPEG), but don’t worry about it if you’re not familiar with that. I would love to have the %date% from the works_data div.
1. Nandaka says:
  
  February 28, 2012 at 16:34
  
  Updated on 20120224. EXIF access needs another library for Python 🙂
Yanoflies says:

February 21, 2012 at 14:09

Would it be possible to add a %date% variable? I would like to be able to include the date updated (since not all artists date their images in the metadata).

Also, would it be possible to add a way to add the date into the metatag? i.e. place all of the %date% into ‘date created’ or wherever?
1. nandaka says:
  
  February 21, 2012 at 14:18
  
  Would it be possible to add a %date% variable?
  
  for the filenameFormat? I can get the value from the works_data div if this is what you want (the date portion on the top left side, where the image resolution and the tool are listed).
  
  Also, would it be possible to add a way to add the date into the metatag?
  
  which metatag? I don’t quite understand.
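
A minimal sketch of how a %date% placeholder could be filled into a filename format, written for Python 3. The function name `apply_filename_format`, the `%image_id%` placeholder and the sample values are assumptions for illustration only; they are not taken from the downloader's source.

```python
from datetime import date


def apply_filename_format(filename_format, values):
    """Replace each %name% placeholder with its value from `values`."""
    result = filename_format
    for name, value in values.items():
        result = result.replace("%" + name + "%", str(value))
    return result


# Hypothetical values; in the thread the date would come from the works_data div.
upload_date = date(2012, 2, 21)
name = apply_filename_format(
    "%date%_%image_id%",
    {"date": upload_date.strftime("%Y-%m-%d"), "image_id": 12345},
)
print(name)
```

Placeholders that have no entry in `values` are left untouched, which makes a misspelled variable easy to spot in the resulting filename.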
Joe says:

February 19, 2012 at 20:19

Issues I encountered, none of which are really your responsibility to fix. Hopefully this comment is of interest to future users of the source code who have trouble getting it working:

In some setups the application will be set to place all downloaded files in the current directory. In such scenarios, mkdir will be passed the empty string and will throw an exception.

Not using os.path.split (or taking the lazy and slightly less reliable route of using os.sep) means it won’t work well on non-Windows machines. A global find/replace of \ with / in PixivUtil2, as well as removing / from __badchars__ in pixivhelper, is enough for a quick and dirty patch.

I needed to add “req._tunnel_host = None” below every call to “req = urllib2.Request(url)” to prevent attribute exceptions. This is a compatibility fix for issues using some versions of Mechanize on some versions of Python.
1. nandaka says:
  
  February 19, 2012 at 20:48
  
  In some setups the application will be set to place all downloaded files in the current directory. In such scenarios, mkdir will be passed the empty string and will throw an exception.
  
  Can I get your OS/setup details? I assume you are using Linux. I will probably add a check for os != winnt for the __badchars__.
  
  I needed to add “req._tunnel_host = None” below every call to “req = urllib2.Request(url)” to prevent attribute exceptions. This is a compatibility fix for issues using some versions of Mechanize on some versions of Python.
  
  Can I get the details for this one, maybe a link?
  1. Joe says:
    
    February 20, 2012 at 07:24
    
    OS/Setup: OSX 10.6.8, Python 2.6.1.
    
    The current directory exception is probably not OS-specific. That will happen if you use a filenameformat that contains no path separators and use “.” as your rootdirectory, as the downloader will see the directory is “.”, which is not a directory that it makes sense to call mkdir on. You might be making things worse by sanitizing “.”, thus passing an empty string to mkdir.
    
    Don’t check if os is winnt to fix __badchars__. Just check the value of os.sep. If os.sep is equal to /, then __badchars__ should not remove / from paths. For joining and combining paths, the Python documentation recommends os.path.split/os.path.join over using os.sep. That said, switching from a hard-coded separator to os.sep would be a 99% solution. On Windows, os.sep is ‘\’, so from the perspective of windows users this is not a meaningful code change.
    
    It would be nice if the application noticed if os.sep and filenameformat didn’t match, but that’s only for the sake of fixing paths in the case that a user completely ignores the config file.
    
    I have no idea where the documentation is for the req._tunnel_host fix I mentioned. Googling the error was not very informative. I added that line because I was getting attribute exceptions and None was the only sensible default value (and any other value caused a new exception, when I tested it). My best guess is that some versions of urllib2 don’t have a _tunnel_host attribute and some versions of Mechanize check the value of that attribute. That’s probably an idiosyncrasy of my setup rather than something you should change in your application.
  2. Joe says:
    
    February 20, 2012 at 07:43
    
    As an aside, Windows does allow the use of ‘/’ as a path delimiter, so a really lazy way to fix OS compatibility (at least among Windows and *nix systems) would be to do a find/replace of \ with /. I was thinking my initial comment about “global find/replace of \ with / in PixivUtil2” would break PixivUtil2 on Windows, but I don’t believe that is actually the case.
    1. nandaka says:
      
      February 20, 2012 at 10:58
      
      updated in my Github, can you check it?
  3. Joe says:
    
    February 20, 2012 at 12:16
    
    Sure I’ll check. Thank you for being responsive; I didn’t know you cared about platform independence 🙂
    
    The only oddity in my own usage is that folder.jpg is now ending up in a subfolder instead of being at the top level of the artist’s directory. You should use os.sep in this line of PixivUtil2.py:
    filenameFormat = filenameFormat.split('\\')[0]
    
    Your GitHub update seems to have fixed every other issue I encountered. I’m a little suspicious that there’s still an issue with list.txt, but I didn’t hit that part of the code during use so I’m not sure. Maybe just replace all your ‘\’ in PixivUtil2.py with os.sep to be safe.
    
    If this were a professional project I’d recommend taking the hit and globally replacing all your string-based path manipulation with calls to functions in the os.path module. However, for hobby code it’s probably not worth the effort.
2. Joe says:
  
  February 20, 2012 at 12:21
  
  Note that if you do switch to using os.sep in “filenameFormat.split('\\')[0]”, you’ll also need to call sanitize on filenameFormat, assuming you want the default filenameFormat not to have the same issue I am encountering with my copy. Maybe just do that replacement in your loadConfig function. :::shrug:::
  1. nandaka says:
    
    February 20, 2012 at 14:01
    
    Updated in github and you can check it 🙂
    
    It is being called after splitting the filenameFormat, but the config must be correct (e.g. for Windows the separator must be ‘\’, for *nix ‘/’).
    
    Anyway, this is only a hobby project and I am still learning more Python. Any comment/fix/update/pointer is gladly accepted 🙂
  2. Joe says:
    
    February 21, 2012 at 01:21
    
    It’s working successfully on my machine without me making any modifications.
    
    Ah, I didn’t notice that config.ini is not included in the release. Assuming a correct config is fine in that case.
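
The path-handling points Joe raises in this thread can be sketched roughly as follows (Python 3). This is not the code from PixivUtil2: `BAD_CHARS` is a hypothetical stand-in for the real `__badchars__`, and the three helper names are invented for illustration.

```python
import os

# Hypothetical set of characters to strip; the real __badchars__ may differ.
BAD_CHARS = '\\/:*?"<>|'


def sanitize_path(path, sep=os.sep):
    """Remove bad characters but keep the platform's own path separator."""
    return "".join(c for c in path if c == sep or c not in BAD_CHARS)


def ensure_parent_dir(filename):
    """Create the directory part of `filename`, if there is one.

    os.path.dirname returns '' for a bare filename, and os.makedirs('')
    raises an error, so the empty case is skipped.
    """
    directory = os.path.dirname(filename)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    return directory


def top_level_folder(filename_format, sep=os.sep):
    """Return the first path component of a filename format string."""
    return filename_format.split(sep)[0]


print(sanitize_path("artist/pic:1.jpg", sep="/"))
print(top_level_folder("artist/image", sep="/"))
```

Passing `sep` explicitly keeps the helpers testable on any platform; in normal use the default `os.sep` applies, which is the check Joe suggests instead of testing for `winnt`.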
Ikarum says:

February 19, 2012 at 11:06

Sorry for the late reply. I checked again and it seems it works fine, as you said, I was inputting the page numbers and dates in reverse. Regarding the ‘No more images’ message, I did what you asked and it appeared again, but maybe that’s just me being retarded again.
Bob says:

February 18, 2012 at 15:30

This version seems to have a problem reading lists. I have it set to download lists and it keeps giving me the same error: 2012-02-18 15:27:50,720 - PixivUtil20120210 - ERROR - Unknown Error: 'decimal' codec can't encode character u'\u3046' in position 0: invalid decimal Unicode string.

I have the lists correctly typed out as said in the readme:

#うみょんげ
2761932

The problem is that I think Pixiv mistakenly thinks that the username is a User ID. Is there any way to fix this? (I can always change my list, but it’s VERY long.)
1. Bob says:
  
  February 18, 2012 at 18:32
  
  Sorry, I’ve managed to find a solution: all I had to do was begin my list.txt with a number. (Any User ID would do.) PixivUtil2.exe treats it like an ordinary list and I don’t get the same error.
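
For readers hitting the same error, here is a rough sketch (Python 3) of a list reader that tolerates comment lines and other non-numeric text. It is not the downloader's actual parser; `parse_member_ids` is a hypothetical name, and stripping a byte order mark is only an assumption about what might precede the first `#`, since the thread does not confirm the cause.

```python
def parse_member_ids(lines):
    """Collect numeric member IDs, skipping blanks, comments and other text."""
    member_ids = []
    for raw in lines:
        # Assumption: drop a byte order mark (if any) and surrounding whitespace.
        line = raw.replace("\ufeff", "").strip()
        if not line or line.startswith("#"):
            continue
        try:
            member_ids.append(int(line))
        except ValueError:
            # Not a number (e.g. a username without '#'): skip instead of failing.
            continue
    return member_ids


sample = ["\ufeff#うみょんげ", "2761932", ""]
print(parse_member_ids(sample))
```

Skipping unparseable lines trades strictness for robustness; a stricter variant could log each skipped line so typos in a long list are not silently ignored.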
asm_demon says:

February 15, 2012 at 10:22

Hi there.

I can’t download some galleries from pixiv (like this member: 14157). After the 3rd image it says that there is a Windows Error (something’s wrong with the path or the filename).
1. nandaka says:
  
  February 15, 2012 at 10:55
  
  I will check it.
Ikarum says:

February 12, 2012 at 12:47

Sorry to bother again, but it seems it doesn’t go after page 30. I tried with and without wildcard and using different date ranges and tags, but it stops and says there’re no more images.
1. nandaka says:
  
  February 12, 2012 at 20:10
  
  what is the actual search page count? Can you give me the queries/keyword for me to check?
  1. Ikarum says:
    
    February 13, 2012 at 04:00
    
    火薬少女あけみ☆ほむら Stops at page 30, actual page count is 48. Dates were from 2012-02-12 to 2009-01-01 and 2012-02-12 to 2009-10-01, tested with and without wildcard.
    
    ほむほむ Started search at page 30, looped it and then said “No more images. Done”, actual page count is 223. Dates were from 2012-02-12 to 2000-01-01, without wildcard.
    
    キュアピース (this one has a lot of porn) Stopped at page 30, actual page count was 38, with and without wildcard. Dates from 2012-02-12 to 2011-10-01 and 2012-02-12 to 2000-01-01. I tested this one with your pixiv downloader 20111102 and worked just fine.
    
    And then I tried プリキュア, started at page 35, actual page count is 1311, no wildcard, dates from 2012-02-13 to 2000-01-01 and it worked, so maybe I’m not using it correctly.
    1. nandaka says:
      
      February 13, 2012 at 10:00
      
      I assume you are using no. 3 (Download by tags). First, you put the start and the end page in reverse.
      
      >> 火薬少女あけみ☆ほむら Stops at page 30, actual page count is 48. Dates were from 2012-02-12 to 2009-01-01 and 2012-02-12 to 2009-10-01, tested with and without wildcard.
      If you only put the search word (火薬少女あけみ☆ほむら), the 2009-01-01 to 2012-02-12 range only yields 2 pages, and the same goes for the other range. My input was the search word (%E7%81%AB%E8%96%AC%E5%B0%91%E5%A5%B3%E3%81%82%E3%81%91%E3%81%BF%E2%98%86%E3%81%BB%E3%82%80%E3%82%89) with wildcard (y); the other inputs were left empty (just Enter), and it can loop until page 48.
      
      >>ほむほむ Started search at page 30, looped it and then said “No more images. Done”, actual page count is 223. Dates were from 2012-02-12 to 2000-01-01, without wildcard.
      With wildcard the result is 264 pages, without wildcard 223; both parsed correctly without using the date. With the dates it only yields 10 and 8 pages, and those also parsed OK.
      
      Second, technically it shouldn’t show the ‘No more images’ message, because it will parse and check if the current page is the last one (no next button ‘>’) and should have an image. Can you try to extract the application into a different folder (fresh config & DB) and run it again?
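
The percent-encoded search word nandaka quotes in this thread is the UTF-8 URL encoding of the tag itself. A small Python 3 check using the standard library (`urllib.parse`; the downloader of the time ran on Python 2, so this is only an illustration, not its code):

```python
from urllib.parse import quote, unquote

search_word = "火薬少女あけみ☆ほむら"

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
encoded = quote(search_word)
print(encoded)

# unquote() reverses the encoding, so either form names the same tag.
print(unquote(encoded) == search_word)
```

So typing the tag directly or pasting its encoded form should refer to the same search word, provided the text is handled as UTF-8.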
Ikarum says:

February 12, 2012 at 06:08

Thank you so much for this, but I have a couple of questions:
1.- What is the wildcard?
2.- When downloading by tags it asks for a date, does it refer to the dates when the pictures were uploaded?
1. nandaka says:
  
  February 12, 2012 at 08:08
  
  You’re welcome.
  1. None; it actually refers to a different search method: one uses search.php (wildcard) and the other uses tags.php (http://www.pixiv.net/help.php#3-3). Maybe it needs a better name.
  2. Yes, if you enter the start/end date, it will tell pixiv to search for images that were uploaded in that range.
  1. Ikarum says:
    
    February 12, 2012 at 10:22
    
    Alright, I see. Thanks again.
nhorus says:

February 11, 2012 at 18:05

Thank you.
