My Sig Ads thread was deleted? WTF? o_X
This thread was full of all users' signatures. =/ It took quite a while to prepare the data. 'Tis disappointing that it was so easily dismissed as spam/deletion-worthy.
https://bitcointalk.org/index.php?topic=20333.0
The policy to not remove anything worked when the forum was small. Now that we have thousands of posts a day, we can't afford 50% of them being junk. The moderators are now instructed to be less tolerant of low-value posts.
Some guidelines:
1. Free speech - you can say anything as long as it is relevant and presented in a calm and polite manner. Swearing, SHOUTING etc. make your post more likely to be removed.
2. No zero value posts or threads, like "SELL SELL SELL"
3. No pointless or uninteresting threads.
4. No referral code spam
5. No NSFW content
1. Okay.
2. Okay. The thread had some value/significance, as it allowed users to see all users' signatures in a single page view (e.g. view all pages, for threads with 25 pages or fewer).
3. Okay. See #2.
4. Okay.
5. Okay.
So why was the thread deleted?
Here is the method I used to compile the signatures:
#!/usr/bin/python
# Python 2 script (httplib, print statement): fetch every profile page and
# save each non-empty signature to g2/<user id>.
import httplib, logging, os, re, socket, ssl, stat

# Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
logging.basicConfig(level='DEBUG')
log = logging.getLogger(":")
headers = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008121819 Gentoo Firefox/3.0.4'}

def writeDataToFile(data, file):
    f = open(file, 'w')
    os.chmod(f.name, stat.S_IREAD | stat.S_IWRITE | stat.S_IRGRP | stat.S_IROTH)
    f.write(data)
    f.close()

for x in range(44000):
    url = "https://bitcointalk.org/index.php?action=profile;u="+str(x)
    url = re.search('(https?)://([^/]*)(.*)', url)
    if url.group(1) == "http": connection = httplib.HTTPConnection(url.group(2), timeout=10)
    elif url.group(1) == "https": connection = httplib.HTTPSConnection(url.group(2), timeout=10)
    try:
        connection.request("GET", url.group(3), '', headers)
        response = connection.getresponse()
        html = response.read()
    # socket.timeout and ssl.SSLError subclass socket.error, so they go first.
    # On any error, log it and skip this profile instead of using a stale response.
    except socket.timeout, e: log.error("socket timeout:"+url.group()+" "+str(e)); continue
    except ssl.SSLError, e: log.error("ssl error:"+url.group()+" "+str(e)); continue
    except socket.error, e: log.error("socket error:"+url.group()+" "+str(e)); continue
    except httplib.BadStatusLine, e: log.error("http status code error:"+url.group()+" "+str(e)); continue
    if html[283:304] != "An Error Has Occurred":
        #<td colspan="2" width="100%" class="smalltext"><div class="signature"></div></td>
        sightml = '<table width="100%" cellpadding="0" cellspacing="0" border="0" style="table-layout: fixed;">\n+'
        sightml += ' <tr>\n+'
        sightml += ' <td style="padding-bottom: 0.5ex;"><b>Signature:</b></td>\n+'
        sightml += ' </tr><tr>\n+'
        sightml += ' <td colspan="2" width="100%" class="smalltext"><div class="signature">(.*)</div></td>\n+'
        sightml += ' </tr>\n+'
        sightml += ' </table>'
        sig = re.search(sightml, html)
        # re.search returns None when the profile has no signature table
        if sig and sig.group(1) != "":
            print x,sig.group(1)
            writeDataToFile(sig.group(1), 'g2/'+str(x))
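For illustration only, here is a rough Python 3 sketch of just the extraction step. It is not the script that produced the thread: `extract_signature` and the sample HTML are made up, and it matches only the `<div class="signature">` element instead of the whole table, on the assumption that the signature contains no nested `<div>` (a centered signature would be cut short).

```python
import re

# Hypothetical helper, not part of the original script.
# Assumption: the signature lives in <div class="signature">...</div>
# and contains no nested <div>, so a non-greedy match is enough.
SIGNATURE_RE = re.compile(r'<div class="signature">(.*?)</div>', re.DOTALL)


def extract_signature(html):
    """Return the signature HTML, or None if there is no non-empty signature."""
    match = SIGNATURE_RE.search(html)
    if match is None:
        return None
    signature = match.group(1).strip()
    return signature or None


if __name__ == "__main__":
    # Made-up sample resembling a profile page fragment.
    sample = '<td class="smalltext"><div class="signature"><b>Hi</b><br />there</div></td>'
    print(extract_signature(sample))
```

Returning None for both "no signature element" and "empty signature" keeps the caller's check to a single test, which is the case the original loop cares about.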
Converting the HTML back to BBCode then consisted of running each of these commands a few times, so that the innermost (child) tags were converted before their parents:
for i in g2/*; do iconv -f latin1 < "$i" > "$i."; mv "$i." "$i"; done
sed -i "s|<hr />|[hr]|g" g2/*
sed -i "s|<i>|[i]|g" g2/*
sed -i "s|</i>|[/i]|g" g2/*
sed -i "s|<b>|[b]|g" g2/*
sed -i "s|</b>|[/b]|g" g2/*
sed -i "s|<br />|[br]|g" g2/*
sed -i "s|<sup>|[sup]|g" g2/*
sed -i "s|</sup>|[/sup]|g" g2/*
sed -i "s|<sub>|[sub]|g" g2/*
sed -i "s|</sub>|[/sub]|g" g2/*
sed -i "s|<del>|[s]|g" g2/*
sed -i "s|</del>|[/s]|g" g2/*
sed -i "s|<pre>|[pre]|g" g2/*
sed -i "s|</pre>|[/pre]|g" g2/*
sed -i "s|<tt>|[tt]|g" g2/*
sed -i "s|</tt>|[/tt]|g" g2/*
sed -i "s|<li>|[li]|g" g2/*
sed -i "s|</li>|[/li]|g" g2/*
sed -i "s|<marquee>|[move]|g" g2/*
sed -i "s|</marquee>|[/move]|g" g2/*
sed -i "s|<table style=\"font: inherit; color: inherit;\">|[table]|g" g2/*
sed -i "s|</table>|[/table]|g" g2/*
sed -i "s|<tr>|[tr]|g" g2/*
sed -i "s|</tr>|[/tr]|g" g2/*
sed -i "s|<td valign=\"top\" style=\"font: inherit; color: inherit;\">|[td]|g" g2/*
sed -i "s|</td>|[/td]|g" g2/*
sed -i "s|<ul style=\"margin-top: 0; margin-bottom: 0;\">|[list]|g" g2/*
sed -i "s|</ul>|[/list]|g" g2/*
sed -i "s/<img src=\"\([^\"]*\)\" alt=\"[^\"]*\" border=\"0\" \/>/[img]\1[\/img]/g" g2/*
sed -i "s/<img src=\"\([^\"]*\)\" alt=\"[^\"]*\" height=\"[^\"]*\" border=\"0\" \/>/[img]\1[\/img]/g" g2/*
sed -i "s/<img src=\"\([^\"]*\)\" alt=\"[^\"]*\" width=\"[^\"]*\" border=\"0\" \/>/[img]\1[\/img]/g" g2/*
sed -i "s/<img src=\"\([^\"]*\)\" alt=\"[^\"]*\" width=\"[^\"]*\" height=\"[^\"]*\" border=\"0\" \/>/[img]\1[\/img]/g" g2/*
sed -i "s|<a href=\"\([^\"]*\)\" target=\"_blank\">\([^<]*\)</a>|[url=\1]\2[/url]|g" g2/*
sed -i "s|<a href=\"\([^\"]*\)\">\([^<]*\)</a>|[url=\1]\2[/url]|g" g2/*
sed -i "s|<span style=\"text-decoration: underline;\">\([^<]*\)</span>|[u]\1[/u]|g" g2/*
sed -i "s|<div align=\"center\">\([^<]*\)</div>|[center]\1[/center]|g" g2/*
sed -i "s|<div align=\"left\">\([^<]*\)</div>|[left]\1[/left]|g" g2/*
sed -i "s|<div align=\"right\">\([^<]*\)</div>|[right]\1[/right]|g" g2/*
sed -i "s|<div class=\"code\"><pre style=\"margin-top: 0; display: inline;\">\([^<]*\)</pre></div>|\[code]\1\[/code\]|g" g2/* # <-- Change \[ to [
sed -i "s|<span style=\"color: \([^;]*\);\">\([^<]*\)</span>|[color=\1]\2[/color]|g" g2/*
sed -i "s|<div style=\"text-align: right;\">\([^<]*\)</div>|[right]\1[/right]|g" g2/*
sed -i "s|<div class=\"quote\">\([^<]*\)</div>|[quote]\1[/quote]|g" g2/*
sed -i "s|<div class=\"quoteheader\">Quote from: \([^<]*\)</div>\[quote\]|[quote=\1]|g" g2/*
sed -i "s|<div class=\"quoteheader\">Quote</div>||g" g2/*
sed -i "s|<span style=\"background-color: \([^;]*\);\">\([^<]*\)</span>|[glow=\1,2]\2[/glow]|g" g2/*
sed -i "s|<span style=\"font-size: \([^;]*\); line-height: 1.3em;\">\([^<]*\)</span>|[size=\1]\2[/size]|g" g2/*
sed -i "s|<font size=\"\([^\"]*\)\" style=\"line-height: 1.3em;\">\([^\"]*\)</font>|[size=\1]\2[/size]|g" g2/*
sed -i "s|<span style=\"color: \([^;]*\);\">\([^<]*\)</span>|[color=\1]\2[/color]|g" g2/*
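The "run it a few times" idea can be sketched in Python 3 as a loop that repeats the substitutions until nothing changes. This is only an illustration with a handful of the rules above: `RULES`, `html_to_bbcode` and `max_passes` are names invented for the sketch. Because the wrapping rules only match content with no `<` left in it, an outer tag is converted only after its children have been.

```python
import re

# A small subset of the sed rules, as (pattern, replacement) pairs.
# Parent-style rules are listed first on purpose, to show that several
# passes are needed before they can match.
RULES = [
    (re.compile(r'<span style="color: ([^;]*);">([^<]*)</span>'), r'[color=\1]\2[/color]'),
    (re.compile(r'<a href="([^"]*)"(?: target="_blank")?>([^<]*)</a>'), r'[url=\1]\2[/url]'),
    (re.compile(r'<b>'), '[b]'),
    (re.compile(r'</b>'), '[/b]'),
    (re.compile(r'<i>'), '[i]'),
    (re.compile(r'</i>'), '[/i]'),
]


def html_to_bbcode(html, max_passes=20):
    """Apply every rule repeatedly until a full pass changes nothing."""
    for _ in range(max_passes):
        converted = html
        for pattern, replacement in RULES:
            converted = pattern.sub(replacement, converted)
        if converted == html:
            break  # fixed point: no rule matched in this pass
        html = converted
    return html


if __name__ == "__main__":
    nested = '<span style="color: red;"><a href="http://x.com"><b>hi</b></a></span>'
    print(html_to_bbcode(nested))
```

With this rule order the nested sample needs three converting passes: `<b>` first, then the link, then the colored span.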
Then I split the result into posts under the 64,000 limit (I used 60,000 to be safe, because even posts under 63,000 triggered the 64,000 limit):
c=1
for i in g2/*; do
  if test -f "y/$c"; then
    fs=$(stat -c%s "y/$c")   # bytes already in the current post file
    len=$(wc -m < "$i")      # characters in the next signature
    ts=$((fs + len + 4))     # +4 for the trailing "[hr]"
    echo "$ts"
    # start a new post file once the next signature would reach the limit
    if test "$ts" -ge 60000; then c=$((c + 1)); fi
  fi
  cat "$i" >> "y/$c"
  echo -n "[hr]" >> "y/$c"
done
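The same packing step, sketched in Python 3 for illustration. `pack_into_posts`, `SEPARATOR` and the `limit` parameter are names made up for this sketch, and it counts characters throughout, whereas the shell loop mixes a byte count (`stat`) with a character count (`wc -m`). Whether the forum limit is measured in bytes or characters is an assumption here.

```python
SEPARATOR = "[hr]"


def pack_into_posts(signatures, limit=60000):
    """Group signatures into posts, each followed by [hr], keeping posts under `limit`.

    A signature that is too long on its own still gets a post to itself.
    """
    posts = []
    current = ""
    for signature in signatures:
        chunk = signature + SEPARATOR
        # Start a new post if adding this chunk would reach the limit.
        if current and len(current) + len(chunk) >= limit:
            posts.append(current)
            current = ""
        current += chunk
    if current:
        posts.append(current)
    return posts


if __name__ == "__main__":
    # Tiny limit so the split is visible.
    for post in pack_into_posts(["aaaa", "bbbb", "cc"], limit=17):
        print(post)
```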