Change log:
- Add feature to download a member’s bookmarked images.
- Add new filename format tokens for member’s bookmark mode:
- %bookmark% ==> in bookmark mode, adds the ‘Bookmarks’ string.
- %original_member_id% ==> in bookmark mode, inserts the original member id.
- %original_member_token% ==> in bookmark mode, inserts the original member token.
- %original_artist% ==> in bookmark mode, inserts the original artist name.
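To make the new tokens concrete, here is a minimal sketch of how such a substitution could work. It is not the code from PixivUtil2: the function name and parameters are hypothetical, and expanding the tokens to an empty string outside bookmark mode is an assumption.

```python
def expand_bookmark_tokens(name_format, bookmark_mode,
                           original_member_id="",
                           original_member_token="",
                           original_artist=""):
    """Replace the four bookmark-related tokens in a filename format.

    Hypothetical helper for illustration only. Assumption: outside
    bookmark mode every token expands to an empty string.
    """
    replacements = {
        "%bookmark%": "Bookmarks",
        "%original_member_id%": str(original_member_id),
        "%original_member_token%": str(original_member_token),
        "%original_artist%": str(original_artist),
    }
    for token, value in replacements.items():
        name_format = name_format.replace(token, value if bookmark_mode else "")
    return name_format


print(expand_bookmark_tokens(
    "%bookmark%/%original_artist% (%original_member_id%)",
    bookmark_mode=True,
    original_member_id=27517,
    original_artist="SomeArtist",
))
# prints "Bookmarks/SomeArtist (27517)"
```

Other tokens such as %artist% or %member_id% are left untouched by this sketch.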
Download link for pixiv downloader 20121108, source code in GitHub.
On another note, this blog has reached 150k+ views 😀
Thanks for the great work >__< I really appreciate it ^^
When running the client with -n 1 or numberofpage = 1 it seems to check the first two pages instead of just the first page.
Can you give me the full command you are using?
PixivUtil2.exe -n 1
Then from the menu I select 4 (download from list)
I use list.txt to add artists that I follow and first download their gallery.
Every week I run list download with -n 1 to get the recent updates.
Checking the 2nd page slows down the update process since it isn’t needed.
I’m guessing it’s as simple as having the loop that cycles through the pages stop one page sooner.
(Like < instead of <=)
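To illustrate the guess above, here is a small sketch of how an inclusive comparison in a paging loop would visit one page too many. This is only a model of the suspected off-by-one, not the actual loop from PixivUtil2; the function and parameter names are hypothetical, and the "0 means no limit" case is not modelled.

```python
def pages_to_check(start_page, number_of_page, inclusive=False):
    """Return the page numbers a paging loop would visit.

    Hypothetical model: inclusive=True mimics a loop written with <=,
    inclusive=False mimics the same loop written with <.
    """
    end_page = start_page + number_of_page
    visited = []
    page = start_page
    while (page <= end_page) if inclusive else (page < end_page):
        visited.append(page)
        page += 1
    return visited


print(pages_to_check(1, 1, inclusive=True))   # prints [1, 2]
print(pages_to_check(1, 1, inclusive=False))  # prints [1]
```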
Shouldn't affect it, but just in case here is my config file. I'm using a proxy and a custom naming format.
[Settings]
proxyaddress = 127.0.0.1:8123
useproxy = True
useragent = Mozilla/5.0 (X11; U; Unix i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1
debughttp = False
userobots = False
filenameformat = %artist% (%member_id%)%urlFilename% – %title%
filenamemangaformat = %artist% (%member_id%)%image_id% – %title%%urlFilename%
timeout = 60
uselist = True
processfromdb = True
overwrite = False
tagsseparator = ,
daylastupdated = 7
rootdirectory = .
retry = 3
retrywait = 5
createdownloadlists = False
downloadlistdirectory = .
irfanviewpath = C:\Program Files\IrfanView
startirfanview = False
startirfanslide = False
alwayscheckfilesize = False
checkupdatedlimit = 0
downloadavatar = True
createmangadir = False
usetagsasdir = False
useblacklisttags = False
usesuppresstags = False
tagslimit = -1
writeimageinfo = False
[Pixiv]
numberofpage = 0
[Authentication]
username = xxxxx
password = xxxxx
cookie = xxxxx
usessl = False
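For anyone scripting around a config like the one above, here is a minimal sketch of reading a few of its values with Python 3's standard configparser module. It is not taken from PixivUtil2's source. SAMPLE is a shortened, hypothetical copy of the file, and interpolation is switched off because the default interpolation would treat the % signs in filenameformat as interpolation syntax.

```python
import configparser

# Shortened, hypothetical copy of the config for illustration.
SAMPLE = """\
[Settings]
useproxy = True
filenameformat = %artist% (%member_id%)%urlFilename% - %title%
checkupdatedlimit = 0

[Pixiv]
numberofpage = 0
"""

# interpolation=None keeps values such as %artist% literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

use_proxy = parser.getboolean("Settings", "useproxy")
filename_format = parser.get("Settings", "filenameformat")
check_updated_limit = parser.getint("Settings", "checkupdatedlimit")
number_of_page = parser.getint("Pixiv", "numberofpage")

print(use_proxy, check_updated_limit, number_of_page)
print(filename_format)
```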
You can use checkUpdatedLimit to skip after n images have already been downloaded. Try setting it to 19 (1 page).
That does exactly what I need. I’ll just use that instead of the page option since it is a bit smarter. Thanks.
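The idea behind the checkUpdatedLimit option mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the application's real code: the function name is hypothetical, 0 is assumed to disable the check, and the comparison is assumed to be "more than the limit", which would fit the suggestion of 19 for a page of 20 images.

```python
def select_new_images(image_ids, downloaded_ids, check_updated_limit):
    """Collect image ids that still need downloading, newest first.

    Hypothetical sketch: stop scanning once more than
    check_updated_limit already-downloaded images have been seen.
    Assumption: a limit of 0 disables the early stop.
    """
    new_images = []
    already_downloaded = 0
    for image_id in image_ids:
        if image_id in downloaded_ids:
            already_downloaded += 1
            if check_updated_limit != 0 and already_downloaded > check_updated_limit:
                break  # everything older is assumed to be downloaded too
            continue
        new_images.append(image_id)
    return new_images


# Ids 4, 3 and 1 are already on disk; with a limit of 1 the scan stops at id 3.
print(select_new_images([5, 4, 3, 2, 1], {4, 3, 1}, 1))  # prints [5]
print(select_new_images([5, 4, 3, 2, 1], {4, 3, 1}, 0))  # prints [5, 2]
```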
I try to download image 22073711 and I get:
Processing Image Id: 22073711
Image ID (22073711): ‘An error occurred!’
Any error html generated? Can you give more details? Can you retry again?
There is no “Error medium page for image ######.html” and no other message. The log looks like this:
2012-11-20 04:21:04,108 – PixivUtil20121108 – INFO – Image id mode.
2012-11-20 04:21:11,296 – PixivUtil20121108 – INFO – Image ID (22073711): ‘An error occurred!’
what can I try?
Found out the cause, I’ll fix it this weekend 😀
cool, thanks!
I’m getting an error message on Linux when using a proxy (not only for this image_id). I searched on Google but can’t figure out why… (link: http://bytes.com/topic/python/answers/31490-help-w-htmlparser-lib )
And these are the versions of the software:
python: Python 2.6.6
mechanize: 1.64-1
beautifulsoup: 3.1.0.1-2
Error message:
sh-4.1$ ./PixivUtil2.py -s 2 30000001
PixivDownloader2 version 20121108
https://nandaka.wordpress.com/tag/pixiv-downloader/
Reading /…/PixivUtil2-master/config.ini …
done.
Using proxy: 127.0.0.1:4321
Creating database… done.
Only process member where day last updated >= 7
Using Username: …
logging in with saved cookie
Trying to log with saved cookie
done.
Processing Image Id: 30000001
Traceback (most recent call last):
  File "./PixivUtil2.py", line 545, in processImage
    parseMediumPage = BeautifulSoup(mediumPage.read())
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"", at line 78, column 114
Error at processImage(): (, HTMLParseError(), )
Dumping html to: Error Medium Page for image 30000001.html
Traceback (most recent call last):
  File "./PixivUtil2.py", line 1433, in main
    menuDownloadByImageId(mode, opisvalid, args)
  File "./PixivUtil2.py", line 1106, in menuDownloadByImageId
    processImage(mode, None, int(image_id))
  File "./PixivUtil2.py", line 545, in processImage
    parseMediumPage = BeautifulSoup(mediumPage.read())
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"", at line 78, column 114
The error html file generated seems fine when opened by my browser.
Thanks :p
Weird, I tried to run the command but it ran successfully.
Can you try again? Most likely it is caused by the proxy.
It may be the proxy… but I also wonder what the error means…
It said “HTMLParseError: bad end tag: u"", at line 78, column 114”, so I went to the dumped HTML file and found this string:
window.jQuery || document.write(”);
Why would this possibly cause trouble ? this ‘
Thanks 🙂
I don’t know why part of the string won’t show up in WordPress, so here’s the missing string on pastebin:
http://pastebin.com/raw.php?i=yzaAz7BX
Can you upload the whole HTML to mediafire? I have checked the original HTML from pixiv, and it also has those strings inside it. I can parse it just fine.
Can you try to run the application without using a proxy?
EDIT: Just noticed that you are running from the script, not the compiled application. Can you update your mechanize version? See the readme for the recommended versions:
– Running from source code:
– Python 2.7.2++
– mechanize 0.2.5
– BeautifulSoup 3.2.0
The html files:
http://www.mediafire.com/?y1tf6674uoh7x9d
And sadly I can’t connect to pixiv directly (they probably banned my IP address).
Yeah… I’ll try updating my software as well. XD
Question! Can you help me out with an error I am getting with this release?
I am trying to dl pictures using the first option (by pixiv user ID) and my output ends up like this:
—
Input: 1
Member id: 猫兎
Start Page (default=1):
End Page (default=0, 0 for no limit):
Processing Member Id: 猫兎
Reading C:\Python27\pixiv_utility\config.ini …
done.
Page 1
http://www.pixiv.net/member_illust.php?id=猫兎&p=1
‘NoneType’ object has no attribute ‘ul’
1 2 3 4
http://www.pixiv.net/member_illust.php?id=猫兎&p=1
‘NoneType’ object has no attribute ‘ul’
1 2 3 4
http://www.pixiv.net/member_illust.php?id=猫兎&p=1
‘NoneType’ object has no attribute ‘ul’
1 2 3 4
http://www.pixiv.net/member_illust.php?id=猫兎&p=1
‘NoneType’ object has no attribute ‘ul’
1
—
No .html error page or similar was generated in the directory, and no picture was downloaded either. Is there anything else you need from me to help with this?
Enter the ID, which is the numeric part from the url, not the artist name.
For example:
http://www.pixiv.net/member.php?id=27517
==>27517
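If you want to pull the ID out of a member URL automatically, a small sketch using Python 3's standard library could look like this. The helper name is hypothetical and this is not part of PixivUtil2; it simply reads the id query parameter and rejects anything that is not purely numeric, such as an artist name.

```python
import re
from urllib.parse import urlparse, parse_qs


def member_id_from_url(url):
    """Return the numeric member id from a pixiv member URL.

    Hypothetical helper: raises ValueError when the id parameter
    is missing or is not made up of the digits 0-9 only.
    """
    values = parse_qs(urlparse(url).query).get("id", [])
    if not values or not re.fullmatch(r"[0-9]+", values[0]):
        raise ValueError("no numeric member id in URL: %s" % url)
    return int(values[0])


print(member_id_from_url("http://www.pixiv.net/member.php?id=27517"))  # prints 27517
```

Passing a URL whose id is an artist name raises ValueError instead of sending a request that cannot be parsed.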
Gah! Nevermind, I googled a solution you gave someone else who had the same problem. I used the member name instead of the numbers in the url, so nevermind my query!
Btw, now it works like a charm!
But thanks regardless! 😀
awesome app you have here. you have given me an excuse to start collecting artwork again. <3
Does this program report to you my password.. ?
Nope, you can check the source code in GitHub :D.
As long as you download the application from this site or my GitHub, your password stays safe with you.
With both the previous version and this one, I’m getting this error when I launch the program:
—
2012-11-08 14:19:15,046 – PixivUtil20121108 – INFO – Starting…
2012-11-08 14:19:15,078 – PixivUtil20121108 – INFO – Only process member where day last updated >= 7
2012-11-08 14:19:15,092 – PixivUtil20121108 – INFO – Using Username: cpgendo
2012-11-08 14:19:15,108 – PixivUtil20121108 – INFO – Log in using secure form.
2012-11-08 14:19:47,078 – PixivUtil20121108 – ERROR – Error at pixivLoginSSL(): (, <httperror_seek_wrapper (urllib2.HTTPError instance) at 0xe67ca8 whose wrapped object = <closeable_response at 0xe7b260 whose fp = <response_seek_wrapper at 0xe79ad0 whose wrapped object = <closeable_response at 0xe7b3c8 whose fp = >>>>, )
Traceback (most recent call last):
  File "PixivUtil2.py", line 303, in pixivLoginSSL
  File "mechanize\_mechanize.pyc", line 203, in open
  File "mechanize\_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 504: Gateway Time-out
2012-11-08 14:19:47,078 – PixivUtil20121108 – ERROR – Unknown Error: HTTP Error 504: Gateway Time-out
Traceback (most recent call last):
  File "PixivUtil2.py", line 1413, in main
  File "PixivUtil2.py", line 303, in pixivLoginSSL
  File "mechanize\_mechanize.pyc", line 203, in open
  File "mechanize\_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 504: Gateway Time-out
2012-11-08 14:19:52,062 – PixivUtil20121108 – INFO – EXIT
—
Since all worked fine until a couple of hours ago, it seems like a problem that has just popped in. The only way to solve it, right now, is to set “usessl = False”.
Looks like your ISP has a problem reaching the pixiv server. I’m currently running the app with useSSL = True.
Try using a proxy to connect to pixiv, use a different ISP/PC, or check your date/time settings.
Python’s SSL is quite sensitive to the PC’s date/time.
The catch is, it happened on both my PCs, each of which has an Internet Key from a different ISP! But I will check whether it keeps happening (consider that some time ago, for a day, I was unable to use the downloader on the other PC for mysterious reasons – the day after, it was all back to normal).
Is there a site where I can search for good proxies?
Google 😀 or try to use Tor-Vidalia bundle.