Need help with a Python script that imports user-posted genre tags from dlsite.com by HTML scraping

sonicstream

New member
May 18, 2022
I have hundreds of doujin folders to manage, so I wrote a Python script that outputs an Excel file in which every doujin folder is hyperlinked for easier access and can be sorted by author and name.

I'm a novice in Python programming, so I put the script together with the help of ChatGPT. The script can extract genres from the official tags, but it cannot extract the genres voted on in user reviews.

The code is meant to collect the text inside <a href="/maniax/fsr/=genre/\d+/"></a> and <li class="meny selected "></li> elements and combine it into a single output string.

The <a href="/maniax/fsr/=genre/\d+/"></a> part works as intended and extracts the work's genres, but the <li class="meny selected "></li> part fails to extract the review-voted genres.

What else do I need to modify?

# Imports used by this excerpt
import re
import requests
from bs4 import BeautifulSoup

for url in product_urls:
    product_link = url['href']
    if "product_id" in product_link:  # only process links that contain product_id
        product_response = requests.get(product_link)
        product_soup = BeautifulSoup(product_response.content, "html.parser")

        # Get links matching the pattern /maniax/fsr/=/genre/<number>/ (with or without class="btn_default")
        genre_links = product_soup.find_all("a", href=re.compile(r'/maniax/fsr/=/genre/\d+/'))

        # Extract the text of each <a> tag found and add it to the genres set
        for genre_link in genre_links:
            genres.add(genre_link.text)

        # Look for <li class="many_selected_item"> inside <ul class="meny_selected_list">
        selected_items = product_soup.select("ul.meny_selected_list > li.many_selected_item")

        # Add the text of the <a> tag inside each <li> to the genres set
        for item in selected_items:
            if item.a:  # only if an <a> tag exists
                genres.add(item.a.text)

        # Join the genre names into one comma-separated string
        if genres:
            genres_text = ", ".join(genres)
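The description above names the element as <li class="meny selected ">, while the selector in the code asks for ul.meny_selected_list > li.many_selected_item, so one thing worth checking is which class tokens the fetched HTML actually carries. The following is only a minimal diagnostic sketch: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the ClassTokenCollector class, the class_tokens helper, and the sample markup are all hypothetical names and data made up for illustration, not taken from the real dlsite.com page.

```python
from html.parser import HTMLParser


class ClassTokenCollector(HTMLParser):
    """Collect the set of class tokens seen on each tag name."""

    def __init__(self):
        super().__init__()
        self.tokens_by_tag = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        class_value = dict(attrs).get("class") or ""
        # A class attribute is a whitespace-separated list of tokens
        self.tokens_by_tag.setdefault(tag, set()).update(class_value.split())


def class_tokens(html):
    """Return a dict mapping tag name -> set of class tokens found in html."""
    collector = ClassTokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens_by_tag


# Hypothetical markup modelled on the <li class="meny selected "> described in the post
sample_html = '<ul><li class="meny selected "><a href="#">Example tag</a></li></ul>'

tokens = class_tokens(sample_html)
print(sorted(tokens["li"]))                  # ['meny', 'selected']
print("many_selected_item" in tokens["li"])  # False
```

In the real script, product_response.text could be passed to class_tokens in place of sample_html. If the class names used in the selector never appear in the result, either the selector is misspelled or that part of the page is not present in the HTML that requests receives.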
 

Attachments

  • 同人作品ファイルリスト整理_タグ追加.zip
    2.5 KB · Views: 104