
Need help with a Python script that imports user-posted genre tags from dlsite.com by HTML scraping.

sonicstream

New member
May 18, 2022
I have hundreds of doujin folders to manage, so I wrote a Python script that outputs an Excel file in which every doujin folder is hyperlinked for easier access and can be sorted by author and name.

I'm a novice at Python programming, so I put the script together with the help of ChatGPT. The script can extract genres from the official tags, but it cannot extract the genres voted for in user reviews.

My code is meant to collect the text inside <a href="/maniax/fsr/=/genre/\d+/"></a> and <li class="meny selected "></li> and combine it into a single output string.

The <a href="/maniax/fsr/=/genre/\d+/"></a> part works as intended and extracts the work's genres, but the <li class="meny selected "></li> part fails to extract the genres voted for in reviews.

What else do I need to modify?
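One way to narrow this down is to check whether the markup the script looks for is present at all in the HTML that requests receives. If a class name shows up in the browser's inspector but not in the downloaded text, the element is probably added by JavaScript after the page loads, or the name in the selector is misspelled. The following is a minimal sketch of that check. `find_markers` and `sample_html` are hypothetical names, and the sample markup is invented for illustration rather than copied from DLsite; in the real script, `product_response.text` would take the place of `sample_html`.

```python
def find_markers(html, markers):
    """Report which marker strings occur anywhere in the raw HTML text."""
    return {marker: marker in html for marker in markers}


# Hypothetical stand-in for product_response.text (not real DLsite markup).
sample_html = """
<a href="/maniax/fsr/=/genre/123/">Genre A</a>
<div id="review_genres"></div>
"""

# Markers taken from the script: the genre link path and the <li> class name.
report = find_markers(sample_html, ["/maniax/fsr/=/genre/", "many_selected_item"])

for marker, present in report.items():
    print(f"{marker!r}: {'found' if present else 'missing'}")
```

A marker reported as missing means no selector based on it can match, whichever parser is used.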

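import re

import requests
from bs4 import BeautifulSoup

# product_urls (a list of <a> tags) is built earlier in the full script
for url in product_urls:
    product_link = url['href']
    if "product_id" in product_link:  # only process links that contain product_id
        product_response = requests.get(product_link)
        product_soup = BeautifulSoup(product_response.content, "html.parser")
        genres = set()  # assumed: one fresh set of genres per product page

        # Get links matching /maniax/fsr/=/genre/<number>/ (with or without class="btn_default")
        genre_links = product_soup.find_all("a", href=re.compile(r'/maniax/fsr/=/genre/\d+/'))

        # Add the text of each matching <a> tag to the genres set
        for genre_link in genre_links:
            genres.add(genre_link.text)

        # Look for <li class="many_selected_item"> inside <ul class="meny_selected_list">
        # NOTE: these class names differ from the <li class="meny selected "> described above;
        # a selector only matches if it uses the class names in the page's actual HTML
        selected_items = product_soup.select("ul.meny_selected_list > li.many_selected_item")

        # Add the text of the <a> tag inside each <li> to the genres set
        for item in selected_items:
            if item.a:  # only when the <li> contains an <a> tag
                genres.add(item.a.text)

        # Join the genre names into one comma-separated string (sorted for stable output)
        if genres:
            genres_text = ", ".join(sorted(genres))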
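The description above mentions `<li class="meny selected ">`, which is an element carrying two separate classes, `meny` and `selected`, while the selector in the script looks for a single class called `many_selected_item`. The sketch below illustrates how class matching treats those two cases. It swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; `ClassTextCollector` is a hypothetical helper, and the sample markup is invented from the class names in this post, not verified against DLsite.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of each <tag> whose class attribute has all required names."""

    def __init__(self, tag, required_classes):
        super().__init__()
        self.tag = tag
        self.required = set(required_classes)
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []    # text pieces of the element currently being read
        self.results = []   # finished text of every matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: only track nesting of the same tag.
            if tag == self.tag:
                self.depth += 1
            return
        if tag == self.tag:
            # The class attribute is a whitespace-separated list of names.
            classes = set((dict(attrs).get("class") or "").split())
            if self.required <= classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self.buffer).strip()
                if text:
                    self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


# Hypothetical markup built from the class names mentioned in this post.
sample_html = (
    '<ul>'
    '<li class="meny selected "><a href="#">Genre A</a></li>'
    '<li class="meny"><a href="#">Genre B</a></li>'
    '<li class="many_selected_item"><a href="#">Genre C</a></li>'
    '</ul>'
)

collector = ClassTextCollector("li", ["meny", "selected"])
collector.feed(sample_html)
print(collector.results)
```

With BeautifulSoup, the comparable query for an element that has both classes would be `product_soup.select("li.meny.selected")`, assuming those really are the class names in the downloaded page.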
 

Attachments

  • 同人作品ファイルリスト整理_タグ追加.zip
    2.5 KB · Views: 117