

41·
1 month agoThe data is non critical and doesn’t contain indentifying info so I use ocr.space API. You could probably find ways to use the tesseract libraries locally.


The data is non critical and doesn’t contain indentifying info so I use ocr.space API. You could probably find ways to use the tesseract libraries locally.


A governmental-ish site I’m required to use doesn’t push notifications as mails, so you have to login daily to check for updates. Updates may happen multiple times daily or once a month. I automated my server to access the site once a day with my credentials, screenshot the notifications, parse them with ocr, and send myself a mail.


One of the reasons I switched to YunoHost (the other being backups).
Since the dawn of LLMs it’s virtually impossible to scrape web content. Headless browsers have become basically useless. I actually have to automate keyboard inputs to simulate the navigation. I could maybe try to write the javascript cache to file but honestly it’s just faster that way.