I was getting a little agitated that my professional friends Sam and Ben seemed really into their personal programming projects recently. Where the hell was my personal programming project, and why was I not working on it in my spare time like a proper adult person? Well, because I’m not a proper adult person. Not really. But then, of course, I remembered WangBot. It turns out you can be a little juvenile and work on interesting side-projects from work, which is really the best of both worlds. It’s not something I’ll be able to talk about in a job interview without a lot of lies of omission (“oh yeah, I created an interactive IRC bot called wangbot for our channel wangingout. some of its commands include ‘!wangme’, wherein it talks dirty at you, and ‘!greet’ wherein it mixes up topical political commentary with dick jokes and insults. hire me?”) but it is a way to keep busy and productive in my free time instead of mooching around the house burning through the entire Netflix back catalogue.
So, then; wangbot. I don’t really have an end-game with wangbot. Most of the time I am dicking about in the IRC channel with the boys and one of them will say something that stirs some inspiration. I’m pretty sure the bulk of wangbot’s voting code was implemented in one afternoon, after it was first discussed. Then a few subsequent days to iron out the bugs, and now we have the statute book online. Most of this was hacked together in TCL, the scripting language used by the eggdrop IRC bot. I am learning it as I go, which means a lot of browser tabs.
Another thing it can do is tweet to the @wangingout twitter account. That it cannot do in TCL. I guess? If it can, it’s easier to do it in python. I checked. Commandline tweeting was an aim of mine before I went for a four-month stay in China. I wanted a way to quickly SSH into my home server or access a very simple web page on it and I needed a suitable back-end to access twitter and bring its interface to me in the firewalled People’s Rep. But having a way for our IRC channel to shout at our twitter friends not present was a great little side-effect of this. So, there we go, it’s mostly TCL with a few external calls to python scripts to handle some of the modern stuff.
A work in progress is a way to link IRC nicks and twitter handles in wangbot’s memory, so when someone mentions a nickname in a tweet or topic or tabled motion, wangbot can throw in the @ symbol, map usernames and be better-integrated. Same in reverse. When @wangingout gets a mention, wangbot posts it in the channel. It would be good, for those whose IRC clients light up specifically at the sound of their name, to have those @ handles turned into nicks. Have to implement the stages of authentication, though, where the twitter user proves ownership of the IRC nick and the nick holder proves ownership of the twitter handle.
While thinking about all that, I decided to delve into sqlite too. It’s small and lightweight (liteweit?) and would be good for storing the small number of mappings we’d ever need for that kind of stuff.
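For the curious, here is a rough sketch of what that nick-to-handle mapping could look like in Python with the standard `sqlite3` module. It is not wangbot's actual code: the table name `handles`, the function names and the example nick and handle are all made up for illustration, and the ownership-proving authentication step is left out entirely.

```python
import re
import sqlite3


def open_db(path=":memory:"):
    """Open the mapping database (hypothetical schema) and create the table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS handles ("
        " nick TEXT PRIMARY KEY COLLATE NOCASE,"
        " handle TEXT NOT NULL UNIQUE COLLATE NOCASE)"
    )
    return db


def link(db, nick, handle):
    """Record that an IRC nick and a twitter handle belong to the same person."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO handles (nick, handle) VALUES (?, ?)",
            (nick, handle.lstrip("@")),
        )


def nicks_to_handles(db, text):
    """Outgoing tweets: swap known IRC nicks for @handles."""
    def swap(match):
        row = db.execute(
            "SELECT handle FROM handles WHERE nick = ?", (match.group(0),)
        ).fetchone()
        return "@" + row[0] if row else match.group(0)

    # Skip words that already start with @ so we never produce "@@handle".
    return re.sub(r"(?<![@\w])[\w-]+", swap, text)


def handles_to_nicks(db, text):
    """Incoming mentions: swap known @handles for IRC nicks."""
    def swap(match):
        row = db.execute(
            "SELECT nick FROM handles WHERE handle = ?", (match.group(1),)
        ).fetchone()
        return row[0] if row else match.group(0)

    return re.sub(r"@(\w+)", swap, text)
```

After `link(db, "examplenick", "@example_handle")`, a line like "tell examplenick he is late" would go out with the @ handle in place of the nick, and an incoming mention of the handle would come back into the channel as the nick. The lookups are case insensitive because of the `COLLATE NOCASE` on both columns.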
So that’s how wangbot got his backend. Python and sqlite.
One previous suggestion was to use thatcan.be to generate novel greetings for users entering the IRC channel. Trouble is, the @wangingout account doesn’t really talk about that much. There’s no voice there to imitate. Occasionally one of us will !tweet a great out-of-context line, but amongst all the announcements that’s kind of lost. But there is a voice in the IRC channel. We all contribute to it. It’s us. We are #wangingout.
Right, so, let’s learn how to speak like us. First step: harvest the logs for all the words we say.
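Something like the following would do for the harvesting step. It is only a sketch: the log line format (`[HH:MM] <nick> message`) is an assumption about how the channel logs look, and the names `LINE`, `words_from_line` and `harvest` are invented here. It lowercases everything and strips punctuation, which is how "don't" ends up as "dont".

```python
import re

# Assumed log format: "[12:34] <nick> what they said". Joins, parts and
# other non-chat lines do not match and are skipped.
LINE = re.compile(r"^\[\d{2}:\d{2}(?::\d{2})?\] <([^>]+)> (.*)$")


def words_from_line(line):
    """Return the lowercased, punctuation-free words spoken on one log line."""
    match = LINE.match(line.rstrip("\n"))
    if not match:
        return []
    text = match.group(2).lower()
    # Drop everything except ASCII letters, digits and whitespace.
    # This also throws away accented letters, which is a simplification.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return text.split()


def harvest(lines):
    """Yield one list of words per spoken line, skipping empty results."""
    for line in lines:
        words = words_from_line(line)
        if words:
            yield words
```

Feeding it an open log file, `list(harvest(open_file))`, gives a list of word lists, one per thing somebody said.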
Next, the first iteration. Chain words together by counting the frequency of word pairs and using that as a probability to randomly generate from. chrysics pointed out that this was essentially a Markov chain of words. He’s right, but I don’t remember what things are called. Obviously some of that shit sank in though because here we are.
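A minimal sketch of that first iteration, assuming the logs have already been turned into lists of lowercase words. The function names are made up; the idea is just to count how often each word follows another and use the counts as weights for a random pick.

```python
import random
from collections import Counter, defaultdict


def count_pairs(sentences):
    """Map each word to a Counter of the words seen directly after it."""
    pairs = defaultdict(Counter)
    for words in sentences:
        for first, second in zip(words, words[1:]):
            pairs[first][second] += 1
    return pairs


def next_word(pairs, word, rng=random):
    """Pick a follower of `word`, weighted by how often the pair was seen."""
    followers = pairs.get(word)
    if not followers:
        return None  # nothing ever followed this word
    choices = list(followers)
    weights = list(followers.values())
    return rng.choices(choices, weights=weights)[0]


def babble(pairs, start, max_words=20, rng=random):
    """Walk the chain from `start` until it dead-ends or gets too long."""
    words = [start]
    while len(words) < max_words:
        word = next_word(pairs, words[-1], rng)
        if word is None:
            break
        words.append(word)
    return " ".join(words)
```

Because each word only looks one word back, the output wanders off topic almost immediately.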
But mostly it was garbage! Thanks Markov, you douchebag. So I upgraded it today to word triplets. If possible, use the previous two words to pick the next one. If not, use the previous word to pick the next two. If that’s not possible, fall back to each word picking one next word. If THAT’s not possible, end the sentence and start a new one. Fun, right? I’ll post some highlights at the end.
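The fallback order described above could be sketched like this. Again, this is an illustration under assumptions rather than the bot's real code: the three lookup tables and all the names are invented, and the input is assumed to be lists of lowercase words.

```python
import random
from collections import Counter, defaultdict


def build_tables(sentences):
    """Build the three lookup tables used by the fallback chain."""
    after_two = defaultdict(Counter)  # (a, b) -> counts of the next word c
    two_after = defaultdict(Counter)  # a -> counts of the next two words (b, c)
    after_one = defaultdict(Counter)  # a -> counts of the next word b
    for words in sentences:
        for a, b in zip(words, words[1:]):
            after_one[a][b] += 1
        for a, b, c in zip(words, words[1:], words[2:]):
            after_two[(a, b)][c] += 1
            two_after[a][(b, c)] += 1
    return after_two, two_after, after_one


def pick(counter, rng):
    """Choose one key from a Counter, weighted by its count."""
    items = list(counter)
    return rng.choices(items, weights=[counter[i] for i in items])[0]


def next_words(tables, so_far, rng=random):
    """Return the next word(s), or an empty list when the sentence must end."""
    after_two, two_after, after_one = tables
    last_two = tuple(so_far[-2:])
    if len(so_far) >= 2 and last_two in after_two:
        return [pick(after_two[last_two], rng)]      # two words pick one
    if so_far and so_far[-1] in two_after:
        return list(pick(two_after[so_far[-1]], rng))  # one word picks two
    if so_far and so_far[-1] in after_one:
        return [pick(after_one[so_far[-1]], rng)]    # one word picks one
    return []


def generate(tables, start, max_words=20, rng=random):
    """Generate one sentence starting from `start`."""
    words = [start]
    while len(words) < max_words:
        more = next_words(tables, words, rng)
        if not more:
            break
        words.extend(more)
    return " ".join(words[:max_words])
```

Starting a new sentence after a dead end is then just a matter of calling `generate` again with a fresh start word.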
But the thing this post is actually about is not any of the stuff above. It’s the 25 most frequent word triplets (case insensitive, punctuation stripped). I thought it was some interesting data. This isn’t what my friends and I talk about, it’s what everyone says in between talking about stuff:
i|dont|know|114
you|have|to|103
im|going|to|102
i|dont|think|99
i|want|to|89
a|lot|of|89
i|have|a|89
i|need|to|84
one|of|the|79
one|of|those|78
i|have|to|77
im|not|sure|75
be|able|to|73
all|the|time|65
going|to|be|65
it|was|a|63
i|think|i|63
a|bunch|of|59
a|couple|of|58
the|end|of|57
i|have|no|55
it|would|be|55
that|would|be|53
in|the|office|52
to|be|a|52
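A list like that can be pulled out of sqlite with one query. The sketch below assumes a hypothetical `triplets` table with columns `w1`, `w2`, `w3` and a count `n`; the real schema may well differ. It prints rows in the same word|word|word|count shape as above.

```python
import sqlite3
from collections import Counter


def store_triplets(db, sentences):
    """Count every run of three consecutive words and save the totals."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:], words[2:]))
    db.execute(
        "CREATE TABLE IF NOT EXISTS triplets ("
        " w1 TEXT, w2 TEXT, w3 TEXT, n INTEGER,"
        " PRIMARY KEY (w1, w2, w3))"
    )
    with db:  # one transaction for the whole batch
        db.executemany(
            "INSERT OR REPLACE INTO triplets VALUES (?, ?, ?, ?)",
            [(a, b, c, n) for (a, b, c), n in counts.items()],
        )


def top_triplets(db, limit=25):
    """Return the most frequent triplets as 'w1|w2|w3|n' strings."""
    rows = db.execute(
        "SELECT w1, w2, w3, n FROM triplets"
        " ORDER BY n DESC, w1, w2, w3 LIMIT ?",
        (limit,),
    )
    return ["|".join(map(str, row)) for row in rows]
```

Ties are broken alphabetically here, which is an arbitrary choice made only so the output is stable from run to run.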
This is how you get a bot to start sounding like a person. String some fucking triplets of words together. I have this now. It’s in a database. I can use it somehow to make my stupid bot make more effective dick jokes. Just got to figure out how!
And, as promised, some highlights:
- cantona wearing a hat are too many letters to write spoilers.
- joining the irc boys may be for something actually important ill handle this one really good quote on it she was ok i.
- ill be more than one finger sarnie for supper for business continuity solutions.
- u2 are rubbish but thats exactly how this is.
- i would do 16million shades of gay porn film how did the system preferences just come in to.