ORG-tech-vols IRC meeting log 2014 05 20

(19:39:37) korikisulda: Might as well get here early so I don't forget ^.^
(19:41:37) graphiclunarkid: Hey!
(19:41:53) graphiclunarkid: Good to see you korikisulda :)
(19:41:58) korikisulda: Ohaiiii
(19:48:41) ***graphiclunarkid AFK
(19:49:31) korikisulda: Ah, the thing I dreaded
(19:50:00) korikisulda: Convincing Java to accept the certificate that differs from the domain...
(20:07:16) dantheta [~daniel@dsl-217-155-42-217.zen.co.uk] entered the room.
(20:08:06) korikisulda: Hello
(20:13:08) dantheta: Hiya - sorry about that, I signed on and then popped out for a pre-meeting ciggie. How's it going?
(20:14:10) korikisulda: Interestingly
(20:14:41) korikisulda: I'm in the dark depths of working out how to convince apache that bowdlerize is in fact not trying to kill me ^.^
(20:15:39) korikisulda: specifically the SSL cert
(20:16:06) dantheta: Ah right ... is the program caching the old cert?
(20:16:17) korikisulda: Not as such
(20:16:38) korikisulda: It's just that the https cert is for blocked.org.uk, not bowdlerize
(20:16:54) dantheta: yep - the hostname that is used to connect to it has changed too, so they should agree
(20:16:59) dantheta: api.blocked.org.uk
(20:17:18) korikisulda: You get a redirect from a browser, so...
(20:17:26) dantheta: Ah, I didn't remember the redirect
(20:17:34) dantheta: I should fix that now
(20:17:43) korikisulda: Awesomeness :D
(20:19:11) dantheta: There we go - all done. I forgot about browser access (since there's nothing really there for browsers)
(20:19:49) dantheta: It's probably quite handy for the odd bit of testing though!
(20:22:33) korikisulda: Ah, there we are
(20:22:48) korikisulda: Had to change the URL to be correct, but it now seems to work :D
(20:23:55) korikisulda: Actually, never mind :s
(20:24:21) dantheta: Still broken?
(20:24:31) korikisulda: I suspect the problem is mine
(20:25:02) dantheta: You're getting a 403 from /register/probe
(20:25:41) korikisulda: A 403? I'm disregarding status codes entirely atm :s
(20:25:45) dantheta: and you got a 400 from prepare/probe just before that
(20:26:01) dantheta: I think you've got a missing parameter in the prepare/probe call, and that's why register probe isn't working
(20:26:29) dantheta: Nope, that's the problem - timestamp out of range on prepare probe. The timestamps have to be in UTC
(20:26:40) dantheta: (just in case that's the problem)
(20:27:04) korikisulda: Ah, that's possibly it
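A minimal sketch of the UTC point above: build the request timestamp from an explicitly UTC clock so it does not shift when the local clocks change. The function name and the `YYYY-MM-DD HH:MM:SS` layout are assumptions for illustration; the log does not state the format the API expects.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(now=None):
    """Format a moment in time as a UTC string (layout is an assumption)."""
    if now is None:
        # Ask for UTC directly instead of relying on the machine's local zone.
        now = datetime.now(timezone.utc)
    # Convert timezone-aware values (e.g. British Summer Time) to UTC first.
    # A naive datetime would be treated as local time by astimezone().
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # A wall-clock time in BST (UTC+1) is one hour earlier in UTC.
    bst = timezone(timedelta(hours=1))
    print(utc_timestamp(datetime(2014, 5, 20, 21, 26, 29, tzinfo=bst)))
```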
(20:27:39) vasilis [~vasilis@gateway/tor-sasl/vasilis] entered the room.
(20:27:44) mkillock [~matt@78-32-160-53.static.enta.net] entered the room.
(20:28:04) dantheta: Yeah, the API royally soiled its pants when the clocks changed, and then I fixed it.
(20:28:28) korikisulda: I remember a TV channel had it wrong for a few days afterwards
(20:28:32) korikisulda: That was pretty funny to me
(20:29:10) mkillock: hi all
(20:29:20) dantheta: I tend to prefer to run my servers on UTC all year round
(20:29:31) dantheta: Oop - back in a mo, shopping arrives at the worst possible time
(20:29:33) dantheta: back in 5
(20:29:35) korikisulda: It's a good policy imo
(20:29:39) korikisulda: Eeep, okay
(20:30:03) ***graphiclunarkid back
(20:30:05) graphiclunarkid: Hi folks
(20:30:17) mkillock: Hi
(20:30:23) korikisulda: Welcome back
(20:30:35) vasilis: Hi everyone
(20:31:16) mkillock: Can I start with a very quick question?
(20:31:32) graphiclunarkid: Shoot
(20:31:32) vasilis: mkillock: go ahead
(20:32:05) mkillock: Is Alex the website HTML designer for blocked.org.uk, and is he here?
(20:32:20) graphiclunarkid: Yes and no. In that order :)
(20:32:27) mkillock: he he, ok
(20:32:35) graphiclunarkid: Alexxx has done almost all the front-end work so far.
(20:32:37) korikisulda: Okaies, got my probe using UTC and it's fine now
(20:33:41) mkillock: well, I don't want to tread on toes, but I will have a go at copying his design for the URL history results page
(20:33:46) mkillock: if that's ok
(20:34:25) graphiclunarkid: mkillock: Not seen Alex for a while actually - I think he was away for a bit and now not sure where he is. Do feel free to make changes though. We're getting close to launch now so I think it's all hands to the pumps!
(20:34:41) dantheta: korikisulda: cool!
(20:34:46) graphiclunarkid: If you're particularly unlucky I might even have a go at closing a few tickets ;-)
(20:34:47) dantheta: Back now, sorry about that
(20:35:20) mkillock: what needs to be done still?
(20:36:02) graphiclunarkid: Here's the list: waffle.io/openrightsgroup/cmp-issues
(20:36:31) graphiclunarkid: If you filter it on Version 2.0 milestone that's what we're aiming to achieve before the end of May.
(20:36:45) graphiclunarkid: http://waffle.io/openrightsgroup/cmp-issues
(20:36:56) graphiclunarkid: https://waffle.io/openrightsgroup/cmp-issues even!
(20:37:05) mkillock: got it
(20:37:06) ***graphiclunarkid fails at hyperlinks.
(20:37:25) mkillock: my irc client requires me to copy and paste anyway!
(20:37:43) graphiclunarkid: mkillock: Do you think I can close this bug now? I don't think we're relying on FormSave any more after your changes... https://github.com/openrightsgroup/blocked-org-uk/issues/8
(20:38:01) dantheta: The submission form is working a treat.
(20:38:23) graphiclunarkid: dantheta: Can you remind me of the URL for that please? I have a couple of interesting URLs to add to the list!
(20:38:38) dantheta: http://stage.blocked.org.uk/
(20:38:38) mkillock: We don't rely on FormSave but we are using it to store a backup copy of the submissions
(20:39:07) graphiclunarkid: mkillock: Ah ok. I'm not expecting my pull request to be acted upon any time soon unfortunately - the FormSave project looks dead.
(20:39:07) mkillock: we could amend the submission process to save data to a custom table, but I guess there are more important things to do first?
(20:39:14) graphiclunarkid: mkillock: Indeed.
(20:39:21) graphiclunarkid: dantheta: Thanks
(20:40:20) mkillock: ok - issue 10 on waffle.io
(20:40:35) mkillock: it's done, except the HTML needs sorting
(20:40:53) dantheta: Cool
(20:40:55) Allll [Allll@cpc1-sals3-2-0-cust147.15-1.cable.virginm.net] entered the room.
(20:40:55) mkillock: 10 and 29 seem to be the same thing
(20:41:49) mkillock: mmm, the same code can be used for 10 and 29, if you want a separate page for just getting history?
(20:42:45) dantheta: I think they used to have slightly different meanings, but I can't remember what they were
(20:43:15) dantheta: Perhaps one was for a search page and one was to show in response to a submission, I'm not sure
(20:43:29) dantheta: You're right though, the API will treat them the same
(20:43:59) mkillock: the snippet I've done can be used to collect the data and display differently using different chunks on each page
(20:44:04) graphiclunarkid: Yeah, one was for the front-end changes to display the information, the other was for the back-end call to the API to retrieve the data.
(20:44:13) mkillock: oh ok
(20:44:31) mkillock: ok then the back-end call thing has been done
(20:44:40) graphiclunarkid: Just in case dantheta was working on one and Allll the other!
(20:44:49) dantheta: hehe :)
(20:45:08) graphiclunarkid: Actually, I might have got my Allll and my Alexxx mixed up earlier, but since Allll has arrived I'm sure I'll be corrected if I was wrong!
(20:45:50) Allll: both the same, Alex is fine tho! sorry I haven't had much time the last week or so to do much, I'm just catching up on progress. Let me know whatever you need me to do
(20:46:10) graphiclunarkid: mkillock: Cool - want me to close #10 then? Or you can if you like.
(20:46:19) graphiclunarkid: Allll: Ah - that explains it, thanks :)
(20:46:29) graphiclunarkid: Allll: Good to see you BTW :) How's tricks?
(20:46:53) Allll: good cheers, all back to normal now :)
(20:47:09) graphiclunarkid: Allll: :)
(20:47:10) Allll: looks like everything's made a lot of progress in the last few weeks
(20:47:31) mkillock: graphiclunarkid: I'll do it
(20:47:45) graphiclunarkid: Yeah - there's been a lot going on. mkillock has been killing the Modx snippets stuff and dantheta has got the API into a pretty complete state!
(20:48:05) dantheta: Yep - we've got two more mobile ISP results for the alexa 10k too!
(20:48:07) graphiclunarkid: vasilis has rebuilt the Raspberry Pi image-building scripts from scratch and there's a new repo up.
(20:48:20) Allll: sweet :)
(20:48:24) graphiclunarkid: Real data is now coming in thick and fast - as dantheta says.
(20:48:31) mkillock: Allll: I should explain what I've done with respect to the submission/url history to you
(20:48:53) vasilis: to ad
(20:49:01) graphiclunarkid: In case you missed the news, plett and the team at A&A have given us access to 6x broadband lines, covering the major UK ISPs.
(20:49:07) Allll: I know I need to move some of the stuff from the repo (images etc) that you committed to the modx, but I've not seen any content to add in - or is that being done straight into the server?
(20:49:25) graphiclunarkid: Allll: My fault - most of the content issues are still open.
(20:49:36) Allll: that's good, looks like we're on for getting it running by the end of the month?
(20:49:51) vasilis: *add in what graphiclunarkid I have also added "ooniprobe superpowers" on the original raspi-config
(20:49:52) graphiclunarkid: Allll: However I have a long day of boring travelling to do tomorrow so I was planning on doing some writing then.
(20:50:05) Allll: glk: that's cool, just let me know what you want me to do and I'll work through it
(20:50:27) Allll: should kill the time!
(20:50:43) graphiclunarkid: Allll: There's a list of open issues here: https://waffle.io/openrightsgroup/cmp-issues
(20:51:09) mkillock: graphiclunarkid: Can't see how to close issue 10 but I have commented
(20:51:37) plett: graphiclunarkid: I'm still working on getting all the login details for the broadband lines so I can set all the parental filters as we want them
(20:51:37) Allll: cool, I'm assuming Matt and Dan have all the middleware/API side of things sorted by the sounds of it
(20:51:42) graphiclunarkid: vasilis: dantheta: we need to have a think about what commands the Raspberry Pi super-powers should have to fit in with the API. I was going to raise issues for each after the last meeting however I realised a discussion might be useful first.
(20:52:16) mkillock: Allll: hopefully! Seems to work anyway. Main thing left is issue 29, which I have part completed
(20:52:41) graphiclunarkid: Hi plett - thanks for that. We are testing the lines "as they are" at the moment - and that should provide a useful baseline. We'll want to change them all to be at their "default on filtering" settings before launch, though, I think.
(20:52:44) dantheta: plett: that sounds great, thanks!
(20:52:44) Allll: matt: cool, if you need any HTML/styling for it just let me know
(20:53:03) dantheta: glk: and hopefully have time to run the alexa1m as well
(20:53:11) Allll: glk: do you have an ETA for moving to the same server as the main Modx site?
(20:53:47) mkillock: Allll: there is a test-submission page here: http://stage.blocked.org.uk/test.html
(20:53:58) graphiclunarkid: mkillock: I have closed #10 (on waffle.io you can drag issues to the "done" column to close them on github).
(20:54:09) mkillock: graphiclunarkid: ta
(20:54:21) mkillock: Allll: that page doesn't need styling
(20:54:41) mkillock: Allll: It's just for testing the code, but the page it redirects to needs styling
(20:54:57) graphiclunarkid: Allll: I think this is pretty much the last unknown. I think you, me, mkillock and Lee from ORG probably need to discuss it via email. It needs to be done before we go live!
(20:55:24) vasilis: I run the alexa top1m in all AAISP broadband lines, and quite strangely all of the probes couldn't connect to the backend after probing ~40000 URLs
(20:55:37) mkillock: Guys, I've got to go for food, will be back later
(20:55:53) graphiclunarkid: TTYL mkillock
(20:56:00) Allll: matt: ok, just checked that out, I'll get styling the page it's submitted to. Presumably once you're happy with the form you'll copy it to the text home page in place of the non functioning one - or let me know and I'll do it
(20:56:19) dantheta: vasilis: ooniprobe is going to have resource problems on the AA VMs
(20:56:35) mkillock: Allll: basically the three HTML chunks under Chunks/CMPMiddleware need styling
(20:56:51) Allll: I'll get that done
(20:56:53) graphiclunarkid: dantheta: vasilis: maybe 1M is too many to run at once? We might need a script that carves the list up into batches?
(20:57:09) mkillock: Allll: If you need changes to how the chunks work, then let me know
(20:57:23) vasilis: dantheta: yep disk space limitations for sure, but I could config them to save nothing there..
(20:57:35) mkillock left the room (quit: Quit: Leaving).
(20:57:36) Allll: matt: will do
(20:57:43) dantheta: ooniprobe is cpu-bound, and the HP microserver is getting caned. It has two physical cores, running 6 VMs
(20:57:58) vasilis: graphiclunarkid: I have a script already for that..
(20:58:11) dantheta: I didn't check for memory leaks, but I did see that it was having a bit of trouble.
(20:58:22) vasilis: dantheta: I don't think that ooniprobe uses high CPU...
(20:58:47) vasilis: It's running pretty well on RasPis!!
(20:58:52) Allll: glk: re server move - that's fine drop me an email when you're ready, if you know roughly when it is let me know and I'll make sure I have some time free for then
(21:00:18) dantheta: vasilis: It's better now than it was yesterday - all 6 VMs had ooniprobe at 99%cpu
(21:01:50) graphiclunarkid: Allll: OK will do. I'll start a thread to discuss it shortly.
(21:02:50) vasilis: dantheta: The issue could be related to Tor, since ooniprobe makes heavy use of Tor.
(21:03:09) dantheta: You could cut the 1m file into 40 * 25000, and run a new instance of ooniprobe for each file (but only one at a time). That way you don't hit 40k.
(21:03:49) dantheta: That also means that if you're leaking memory or file descriptors, they get cleaned up when the process dies
(21:04:11) dantheta: supervisor probably isn't the right taskmaster for that though.
(21:04:51) vasilis: dantheta: True... supervisor is quite easy to manage from ansible playbooks though.
(21:05:39) dantheta: Yes, but not always the right job for batch processing
(21:06:03) dantheta: s/job/tool
(21:06:30) vasilis: dantheta: ..any recommendations?
(21:07:13) dantheta: It could be as simple as a 4 line script, you can kick it off using ansible's script command (in the files module)
(21:07:39) dantheta: The script forks to the background and keeps running. The 'at' command can also be used to schedule one-off jobs
(21:09:51) dantheta: supervisor is better for recurring worker jobs
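The batching idea above (cut the 1m list into 40 files of 25000 and run a fresh process for each, one at a time) could be sketched roughly as follows. This is not the script vasilis mentions; the file naming, the function names and the way the probe command is passed in are all assumptions, and no real ooniprobe options are shown.

```python
import subprocess
from itertools import islice
from pathlib import Path


def split_into_batches(list_path, out_dir, batch_size=25000):
    """Cut a one-URL-per-line file into numbered files of batch_size lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batches = []
    with open(list_path) as src:
        while True:
            lines = list(islice(src, batch_size))
            if not lines:
                break
            # Hypothetical naming scheme: batch-0000.txt, batch-0001.txt, ...
            batch = out_dir / f"batch-{len(batches):04d}.txt"
            batch.write_text("".join(lines))
            batches.append(batch)
    return batches


def run_batches(batches, command):
    """Run `command` once per batch file, strictly one process at a time.

    `command` is a list such as the probe invocation already in use; the
    batch path is appended as the last argument (an assumption about how
    the tool takes its input). Each process exits before the next starts,
    so leaked memory or file descriptors are released between batches.
    """
    for batch in batches:
        subprocess.run([*command, str(batch)], check=False)
```

With a batch size of 25000 a one-million-line list yields 40 files, so no single process gets anywhere near the 40k mark where the probes lost their backend connection.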
(21:09:58) vasilis: dantheta: are you sure that this is
(21:10:21) vasilis: quite stable to run via ansible playbooks?
(21:10:46) graphiclunarkid: korikisulda: I really want to see if we can use your library to get the Android probe running with the v1.2 API.
(21:10:54) graphiclunarkid: korikisulda: If I wanted to look into that where would I start?
(21:11:08) korikisulda: Not going to work at the moment
(21:11:26) korikisulda: Just from the perspective of my library
(21:11:39) korikisulda: A lot of stuff is either unimplemented or now broken
(21:12:07) dantheta: vasilis: It can be. The trouble with having the alexa1m under supervisor is, if the job gets stopped and started the 1m starts again at the beginning.
(21:12:52) graphiclunarkid: korikisulda: But if the library were complete it would be something we could use for Android, right?
(21:12:58) korikisulda: If, yeah
(21:13:11) dantheta: korikisulda: the copy of the 1.1 api code (server side) that we got way,way back at the beginning was incomplete. The dev server has got some extra files that weren't in github
(21:13:18) graphiclunarkid: korikisulda: OK cool.
(21:13:45) graphiclunarkid: korikisulda: Is there some way we can help you develop the library?
(21:14:00) vasilis: dantheta: but this would happen anyway if the ooniprobe job gets stopped...
(21:14:32) dantheta: korikisulda: glk: I'm going to be re-implementing the request queueing stuff inside the API, but the interface will stay the same. It will help close #6 too.
(21:14:41) graphiclunarkid: korikisulda: I just want you to know I'm really keen that we make use of it :)
(21:14:52) korikisulda: At the moment, I'm having problems with apparently failing a signature check... Annnd... I'm not sure why tbh
(21:15:05) korikisulda: It was perfectly fine a while ago :s
(21:15:30) graphiclunarkid: korikisulda: Annoying :s
(21:15:59) korikisulda: I've checked the middleware code, and it doesn't look like that bit was changed since then.... *sigh*
(21:16:51) dantheta: korikisulda: I'll take a look in a bit
(21:17:07) graphiclunarkid: korikisulda: If there *is* anything that others could help with, if you raise github issues for those things, we might be able to get others to contribute pull requests (hell, I might even have a go!)
(21:18:36) dantheta: vasilis: this is true - but this means supervisor isn't adding anything
(21:19:18) dantheta: vasilis: If you were running <n> ooniprobes and feeding batches of URLs to them using a queue or the submission API, that would be very different
(21:20:04) graphiclunarkid: vasilis: dantheta: I've just forked lepidopter raspi-config to the ORG account so we can add blocked.org.uk API-specific features.
(21:20:12) vasilis: dantheta: It's actually giving a better interface to check the status of the process... correct me if i'm wrong but this can be done so easy with ansible. I found long async and poll times not such a neat solution.
(21:20:15) graphiclunarkid: (Should have done so last week but forgot - sorry).
(21:21:08) vasilis: dantheta: True but I'm not sure how stable the ooni-backend is ATM for providing URLs to probes..
(21:21:58) vasilis: I wanted to run the tests AFAP...
(21:22:19) dantheta: vasilis: It's perfectly fine to keep using supervisor for it, it is working!
(21:22:35) dantheta: vasilis: I just thought I'd mention a lighter alternative
(21:22:57) dantheta: How far has ooni got with the alexa1m
(21:22:59) dantheta: ?
(21:23:33) dantheta: korikisulda: I'll be about for a bit to help debug, if you like :)
(21:23:59) vasilis: dantheta: btw there is even a resume support in ooniprobe, but not sure if it *really* does what it's supposed to do ;)
(21:24:18) dantheta: Oh good - now *that* makes supervisor worthwhile.
(21:24:33) dantheta: Your process dies - supervisor restarts it where it left off.
(21:24:35) dantheta: Very good.
(21:24:39) vasilis: dantheta: I have restarted the test earlier today.. I don't really know how far the test is.
(21:24:53) mkillock [4e94f5bf@gateway/web/freenode/ip.78.148.245.191] entered the room.
(21:25:08) vasilis: dantheta: I'm not sure that the resume function really works!
(21:25:27) dantheta: With a 1 million line job, it's worth having it!
(21:26:27) vasilis: The weird thing is that ooniprobe has never stopped running.. just connection errors with Tor in all nodes, at the same time!!
(21:26:58) vasilis: that's why I restarted the process..
(21:27:20) graphiclunarkid: vasilis: That sounds like it could be an issue with networking on the hypervisor or something. If it really happened at the same time on all 6 VMs.
(21:27:43) graphiclunarkid: s/hypervisor/host\ machine/
(21:27:55) ***graphiclunarkid knows almost nothing about how virtualisation works!
(21:28:21) dantheta: It's not a bad point, by any means
(21:28:35) ***korikisulda knows nothing at all
(21:29:17) mkillock: You could ping the Tor entry node, if no response then that's the problem
(21:29:24) mkillock: ah
(21:29:32) mkillock: well, supposing you know the tor entry node
(21:29:55) vasilis: If the problem persists I'll cut the alexa top 1m list
(21:29:56) mkillock: and it responds to ping
(21:30:09) dantheta: The 40,000 number seems like too big a coincidence. If I'm online next time it happens, give me a ping.
(21:30:11) vasilis: mkillock: The tor entry node was working correctly.
(21:30:48) vasilis: dantheta: You can check the ooniprobe logs
(21:31:21) mkillock: it's possible to do a tcp 'ping' if that helps
(21:31:25) graphiclunarkid: I think having a resume feature would be a good idea regardless. It could be that a network connection goes down temporarily, and ooniprobe decides to give up, then we'd want to resume from where we left off when it's restored.
(21:32:04) dantheta: I was thinking more of checking the state of the system when it happens - TCP ports, file descriptors, memory, that sort of thing
(21:32:06) vasilis: when graphiclunarkid enables access to the server where the git repo is hosted I'll share all config files, playbooks and setup scripts.
(21:32:16) dantheta: Cool.
(21:32:25) graphiclunarkid: vasilis: Sorry - I'm on it. Just trying to do 10 things at once here!
(21:32:28) mkillock: so you could send a SYN ping to a remote TCP port, through Tor
(21:32:38) vasilis: graphiclunarkid: no worries ;)
(21:32:59) dantheta: brb
(21:36:09) dantheta: back
(21:36:54) vasilis: Does it make sense to have a hackday for the censorship project next week?
(21:37:12) graphiclunarkid: vasilis: That could be fun!
(21:37:40) dantheta: If it's a weekday evening or weekend, that could work very well! We had an impromptu one of those the other week when a bunch of us were online at the same time coincidentally
(21:37:47) mkillock: Does Tor have any kind of inbuilt scraping detection/protection?
(21:38:25) graphiclunarkid: dantheta: Indeed!
(21:38:49) vasilis: mkillock: scraping?
(21:39:01) graphiclunarkid: mkillock: I haven't heard it does anything like that inherently - though nodes can set their own policies about traffic throughput to a certain extent.
(21:39:23) graphiclunarkid: I would guess each of the VMs has its own path through the Tor network though.
(21:39:28) dantheta: I'm looking again at the ooni scripts in the near future, so that could be quite a good thing to get done when we're assembled.
(21:40:05) graphiclunarkid: vasilis: I will start a thread on the mailing list asking people to say when they'd be available for a hackday/evening.
(21:40:06) mkillock: vasilis: basically - systematically going through a list of connections (usu to same website) - e.g. using Google to try 1000's of search terms to do SEO investigation
(21:41:16) mkillock: graphiclunarkid: yeah, different path except perhaps the entry node?
(21:42:29) graphiclunarkid: mkillock: I'm not sure. Is there anything that would cause them all to pick the same entry node?
(21:43:21) mkillock: graphiclunarkid: I thought that's how it worked? You have to connect to the node using a known entry point, then each TCP connection picks a random path through the network?
(21:43:45) vasilis: mkillock: I think twisted (https://twistedmatrix.com) could probably do this
(21:44:52) mkillock: vasilis: ok, not sure I understand? What would you use that for?
(21:45:43) mkillock: TCP ping can be done with hping3 or tcping in windows
(21:45:48) korikisulda left the room (quit: ).
(21:45:56) dantheta: or nmap in Linux
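For a quick check without hping3 or nmap, a TCP "ping" can be approximated with a plain connect attempt. This is only a sketch: it completes a full TCP handshake rather than sending a bare SYN, it does not route through Tor, and the function name is made up for illustration.

```python
import socket
import time


def tcp_ping(host, port, timeout=3.0):
    """Return the connect time in seconds, or None if the port is unreachable."""
    start = time.monotonic()
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return None
```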
(21:45:57) graphiclunarkid: mkillock: But there are many known entry points - does Tor have a system for picking the fastest based on local network location and conditions?
(21:46:14) vasilis: mkillock: [..]"systematically going through a list of connections"[..]
(21:46:15) graphiclunarkid: That could cause all 6 VMs to pick the same entry point.
(21:47:27) vasilis: graphiclunarkid: The 6 VMs run in different networks.
(21:48:50) mkillock: graphiclunarkid: I don't know
(21:49:20) vasilis: brb
(21:50:02) mkillock: graphiclunarkid: there are lots of entry points, just I can't see how a Tor user joins the network without having some kind of entry point, or at least a place to get a list of entry points to choose from
(21:50:34) graphiclunarkid: vasilis: Yeah, but if they're all on the same physical box, it's possible that with few enough entry nodes available they would all see the same one as closest - if such a mechanism exists. Unlikely though!
(21:51:15) graphiclunarkid: mkillock: I don't know how initial node selection is done, but if they *were* all connected to the same one, it could explain the simultaneous death of all six connections that vasilis observed.
(21:51:40) graphiclunarkid: Of course it could be something in the nature of whatever was returned by the 40,000th URL that killed the probe too!
(21:52:18) dantheta: I think another possibility is local resource exhaustion - all of the ooniprobes are started at the same time, but they will drift apart from each other over time. 40,000 makes me think of ephemeral port usage or file descriptors.
(21:53:06) graphiclunarkid: Yeah - something like that occurred to me too - especially if such a limit were shared between all VMs on the box.
(21:53:38) mkillock: they will have a 65535 limit if all go through the same IP
(21:53:52) mkillock: but it's 6 ISPs, right?
(21:54:04) mkillock: so 6 IPs
(21:55:09) dantheta: Yep.
(21:55:20) dantheta: Each VM has its own subnet bridging back through the VM host
(21:56:09) mkillock: every VM has its own default GW?
(21:56:16) dantheta: yep
(21:56:33) mkillock: and only ORG are using each ISP connection?
(21:56:45) vasilis: I think we would know for sure tomorrow..
(21:56:59) vasilis: 40000 are crawled pretty fast
(21:57:19) dantheta: About 8 hours, isn't it?
(21:57:36) dantheta: 120,000 a day is 8 days for the full 1m
(21:59:09) dantheta: I'd really like to get the A&A VM filtering switched on soon though, so that we've got time for running the 1m against the filters
(21:59:24) vasilis: dantheta: not sure..
(21:59:34) vasilis: let me have a look..
(21:59:35) dantheta: It would be very cool to have the results of the most popular websites ready and waiting in the database when people search
(22:00:20) graphiclunarkid: dantheta: I think there will be diminishing returns as we work down that list - 1M is a lot of websites!
(22:00:35) graphiclunarkid: I agree we should try to get as many as we can in there though.
(22:00:47) graphiclunarkid: But it won't be a disaster if we don't process all 1M on all lines in time.
(22:01:16) dantheta: We've already got the 100k on the two fixed lines that arrived with their chosen filtering options (talktalk & AA)
(22:01:29) dantheta: We've got 250,000 on the other 4 lines in their unfiltered state
(22:01:35) Allll: gotta run, I'll have a look at styling the chunks soon and will speak over email about server move - cya
(22:01:53) mkillock: Allll: email me on the list if you need explanation
(22:01:59) Allll: cheers
(22:02:00) vasilis: ~25500 probed URLs the last 9 hours
(22:03:00) mkillock: :) very cool
(22:03:08) dantheta: Around 13 days for the 1m, if I calculate correctly
(22:03:22) dantheta: Perhaps if we just do the top 100k or 250k instead?
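The run-time estimates above are simple rate arithmetic, sketched here with a hypothetical helper. It assumes the probing rate stays constant, which vasilis points out is not quite true since some sites are probed faster than others.

```python
def days_to_finish(total_urls, probed_so_far, hours_elapsed):
    """Estimate days needed for total_urls at the observed probing rate."""
    urls_per_day = probed_so_far / hours_elapsed * 24
    return total_urls / urls_per_day


if __name__ == "__main__":
    # 40000 URLs in about 8 hours is 120000 a day.
    print(round(days_to_finish(1_000_000, 40_000, 8), 1))   # 8.3
    # ~25500 URLs in the last 9 hours is 68000 a day.
    print(round(days_to_finish(1_000_000, 25_500, 9), 1))   # 14.7
    # The shorter lists suggested as alternatives.
    print(round(days_to_finish(250_000, 25_500, 9), 1))     # 3.7
    print(round(days_to_finish(100_000, 25_500, 9), 1))     # 1.5
```

At the measured rate the full list comes out at roughly two weeks, the same ballpark as the figure quoted above, while the top 100k or 250k would finish in a few days.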
(22:03:34) mkillock: ah - question about the stats
(22:03:44) dantheta: mkillock: sure!
(22:04:01) mkillock: we're stating: "14002 SITES WRONGLY BLOCKED"
(22:04:12) vasilis: we shouldn't forget that some websites are probed faster
(22:04:13) dantheta: Yes, I was wondering about that
(22:04:16) mkillock: do we know they are all wrongly blocked?
(22:04:46) graphiclunarkid: mkillock: I think if we're going to claim that we'd want to have checked them manually first.
(22:04:54) dantheta: No, we don't. We know that they are blocked, the database doesn't contain any value judgment on the merits of the blocking.
(22:05:00) graphiclunarkid: mkillock: Maybe it would be better to say "9001 sites blocked"
(22:05:08) vasilis: [ooniprobe configuration] measurement_timeout: 60 measurement_retries: 2 measurement_concurrency: 10 reporting_timeout: 80 reporting_retries: 3 reporting_concurrency: 15
(22:05:08) dantheta: I don't know where that statement came from, I'm afraid.
(22:05:18) mkillock: Also: "1531111 sites reported" suggests that lots of people have been very busy posting URLs!
(22:05:24) dantheta: I have, yes.
(22:05:36) mkillock: yeah, but programmatically!
(22:05:50) graphiclunarkid: mkillock: dantheta: can we distinguish between sites reported through the front end and those we've monkeyed in ourselves?
(22:05:52) dantheta: I've fed it the alexa 1m, in addition to half a million social URLs that the original author left in the database
(22:05:52) vasilis: I consider this a quite safe configuration
(22:05:57) dantheta: glk: yes
(22:06:17) graphiclunarkid: dantheta: Then maybe we should expose just the former as a statistic on the front page?
(22:06:30) vasilis: just remembered some valuable url lists
(22:06:36) dantheta: Sure. it will drop to about 18 :)
(22:06:48) mkillock: This is accurate, however: "319951 sites tested"
(22:06:54) graphiclunarkid: dantheta: Yeah - but we expect that to increase after we go live ;)
(22:06:54) dantheta: Yes.
(22:06:57) mkillock: how about we change the language
(22:07:08) graphiclunarkid: Maybe something like:
(22:07:19) graphiclunarkid: 18 sites reported
(22:07:25) mkillock: what does the 1531111 actually represent?
(22:07:26) graphiclunarkid: >9000 sites checked
(22:07:48) mkillock: oh, got it
(22:07:52) dantheta: 1531111 urls listed in the database.
(22:07:53) graphiclunarkid: Hmm. Reported isn't the right word though. It implies we've reported them to someone, not people reporting them to us.
(22:07:56) mkillock: yeah the queue
(22:08:07) vasilis: https://github.com/citizenlab/test-lists
(22:08:46) dantheta: mkillock: not quite the same as the queue - the URLs in the database are periodically requeued for retesting
(22:08:56) graphiclunarkid: "Visitors have asked us to check 18 sites so far and our probes have tested 1531111 sites in total."
(22:09:02) mkillock: we can accurately say: "14002 sites blocked" - right?
(22:09:22) dantheta: 14002 out of 319951 sites - definitely accurate
(22:09:32) dantheta: 319951 is the number of URLs we've tested on one or more ISPs
(22:09:42) graphiclunarkid: Ah, OK.
(22:10:55) mkillock: "319951 checked" or checked against some ISPs
(22:10:58) dantheta: 6 mobile networks and 2 fixed line ISPs (talktalk and AA). I don't think the current alexa1m list on bt,plusnet,sky and virgin should count until we're running it against their filters
(22:11:08) graphiclunarkid: So: 18 requests for testing submitted by users; 319951 URLs tested on at least one ISP; 14002 sites blocked by at least one ISP; 1531111 tests performed in total across all ISPs.
(22:11:39) dantheta: all good apart from the last bit "1531111 tests performed in total across all ISPs."
(22:11:59) dantheta: Over time, the 319951 urls will count up to 1531111.
(22:12:18) dantheta: When we've reached the end of the queue, the number of tested urls will equal the number of urls in the database
(22:12:19) mkillock: so it's a bit like a queue
(22:12:55) dantheta: Yes, you are right. Sorry, I was thinking a little too rigidly earlier
(22:12:57) mkillock: I could do some calcs - 1531111-319951 is the number of URLs waiting to be tested?
(22:13:05) graphiclunarkid: So 1531111 is the total number of URLs to be tested, of which 319951 have been tested so far, on at least 1 ISP?
(22:13:22) dantheta: yep - that's right.
(22:14:36) mkillock: For brevity, if it's ok, we could say 1531111 - 319951 'queued' ?
(22:15:06) mkillock: mm
(22:15:10) mkillock: thesaurus
(22:15:17) dantheta: Yes, that'd be accurate. "pending"?
(22:15:24) mkillock: pending!!
(22:15:28) mkillock: excellent
(22:15:50) graphiclunarkid: I think we should say "tests pending" not "URLs pending".
(22:15:58) mkillock: ok
(22:16:10) graphiclunarkid: When do we consider a site to have been tested? When it's been checked on a single ISP or on all of them?
(22:16:11) mkillock: so 319951 sites tested
(22:16:24) mkillock: and a calc for the 'tests pending'
(22:16:36) mkillock: graphiclunarkid: when tested against one?
(22:16:46) mkillock: I think that's what the figure represents
(22:16:59) graphiclunarkid: mkillock: Is that what a visitor to the site would expect it to mean though?
(22:17:04) dantheta: When we have a result from one ISP.
(22:17:19) mkillock: graphiclunarkid: probably not, fair point
(22:17:25) dantheta: Our total number of ISPs changes over time, and we have overlapping result sets from multiple ISPs
(22:17:54) dantheta: It would be very difficult to maintain that figure when we add an ISP, because then all of the existing tests are no longer against a complete set of ISPs
(22:18:07) graphiclunarkid: Maybe "tested" doesn't have meaning unless we're examining results for a single ISP?
(22:18:57) graphiclunarkid: We can say a site is either "tested" or "not tested" for a single ISP but I'm not sure we can say that a URL has been "tested" in general.
(22:19:11) dantheta: We can apply a threshold - when we have results from 1, 2, 3 or more ISPs. The only one that's tricky is trying to use "all" ISPs.
(22:19:18) mkillock: graphiclunarkid: unless we check every ISP, it will never be 'tested' in that sense
(22:19:28) dantheta: I mean every ISP that we've got a connection for
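
The threshold idea above can be sketched as follows. This is a minimal illustration, assuming results are available as (URL, ISP) pairs; the function name, the data layout and the ISP labels are hypothetical and are not taken from the project's database or API.

```python
from collections import defaultdict


def urls_meeting_threshold(results, min_isps=1):
    """Count URLs that have results from at least `min_isps` distinct ISPs.

    `results` is an iterable of (url, isp) pairs; this layout is assumed
    for illustration only.
    """
    isps_by_url = defaultdict(set)
    for url, isp in results:
        # A set means repeated tests on the same ISP only count once.
        isps_by_url[url].add(isp)
    return sum(1 for isps in isps_by_url.values() if len(isps) >= min_isps)


# Made-up sample data with placeholder ISP labels.
results = [
    ("http://example.com", "isp-a"),
    ("http://example.com", "isp-b"),
    ("http://example.org", "isp-a"),
    ("http://example.org", "isp-a"),
]

print(urls_meeting_threshold(results, min_isps=1))
print(urls_meeting_threshold(results, min_isps=2))
```

A fixed threshold stays stable when a new ISP is added, whereas an "all ISPs" count would have to be re-evaluated against every existing result.
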
(22:20:18) mkillock: we can be more wordy
(22:20:36) mkillock: "tested with at least one ISP"
(22:20:55) mkillock: ?
(22:21:13) mkillock: saves making API changes
(22:21:47) graphiclunarkid: mkillock: How about "results available for NNN sites"?
(22:21:52) dantheta: I'm happy to make the changes, it's all good :)
(22:22:19) mkillock: graphiclunarkid: sure
(22:22:32) mkillock: I like that
(22:22:48) dantheta: I'm wary of things which cause us to have to re-evaluate all of the old test data, or stats based upon it.
(22:22:49) graphiclunarkid: TBH I'm not sure what it should say - probably best just to pick *something* and then share it around to see if anyone has a bad reaction to it ;)
(22:23:16) mkillock: I think what's above is fine
(22:23:17) mkillock: i.e.
(22:23:24) mkillock: AAA SITES BLOCKED
(22:23:37) mkillock: BBB tests pending
(22:23:52) mkillock: results available for CCC sites
(22:23:57) mkillock: right?
(22:24:14) dantheta: Looks good to me.
(22:24:25) mkillock: graphiclunarkid: ok with you?
(22:24:48) graphiclunarkid: Yeah, great :)
(22:24:56) mkillock: ok, I'll make those changes
(22:25:03) graphiclunarkid: Nice one!
(22:25:07) dantheta: Many thanks!
(22:25:34) mkillock: :)
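
As a worked example of the three figures agreed above, the sketch below derives them from the numbers quoted earlier in the meeting (1531111 URLs in total, 319951 with at least one result, 14002 blocked). The function name and the output wording are assumptions for illustration, not the site's actual code.

```python
def summary_lines(total_urls, urls_with_results, sites_blocked):
    """Build the three headline figures discussed in the meeting.

    "Pending" is the total number of URLs in the queue minus the number
    that already have a result from at least one ISP.
    """
    pending = total_urls - urls_with_results
    if pending < 0:
        raise ValueError("urls_with_results cannot exceed total_urls")
    return [
        f"{sites_blocked} SITES BLOCKED",
        f"{pending} tests pending",
        f"results available for {urls_with_results} sites",
    ]


# Figures quoted in the log above.
for line in summary_lines(1531111, 319951, 14002):
    print(line)
```

With those inputs the pending figure is 1531111 - 319951 = 1211160.
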
(22:28:45) dantheta: While we're on the subject of stats, I think it's worth removing some of the tested sites from that counter - they're the baseline test for the unfiltered BT, Sky, Virgin and Plusnet lines
(22:29:08) dantheta: about 275,000 of them, anyway.
(22:29:35) graphiclunarkid: dantheta: Agreed. That data is useful as a baseline for "as installed" results but we really want the answers for the default filters.
(22:29:46) graphiclunarkid: It's interesting that those lines didn't have the filters installed by default though!
(22:30:17) dantheta: I wondered if that was because they were being installed at a business address. TalkTalk arrived with full filtering (above and beyond!)
(22:30:54) dantheta: It blocks open comments, alcohol, tobacco. Even youtube is blocked on there.
(22:31:05) graphiclunarkid: I'm not sure where each ISP is in their roll-out of default-on filters. ISTR they've all announced "active choice" for new connections though.
(22:31:33) graphiclunarkid: It would be worth recording how they were set up initially so we can include that in our written reports.
(22:31:58) dantheta: Yep. Hopefully plett can get the details for the line. The thing is, ooni has between 12 and 15 days to run to get to 1m. Should we cut the alexa1m down to 250k for speed?
(22:32:02) graphiclunarkid: i.e. TalkTalk: explain it like I'm 5. Virgin: anything goes ;-)
(22:33:18) dantheta: :)
(22:33:42) dantheta: At current rate, ooni should do 250k in about 90 hours
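
The run-time estimate can be checked with a little arithmetic. The sketch below assumes the testing rate stays constant at the quoted 250k URLs per 90 hours; the function name is hypothetical.

```python
def hours_to_test(url_count, urls_per_hour):
    """Estimate how many hours a run takes at a constant testing rate."""
    return url_count / urls_per_hour


# Rate implied by "250k in about 90 hours" (roughly 2778 URLs per hour).
rate = 250_000 / 90

for url_count in (250_000, 1_000_000):
    hours = hours_to_test(url_count, rate)
    print(f"{url_count} URLs: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At that rate a full million URLs would take about 360 hours, or roughly 15 days, which matches the upper end of the estimate given above.
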
(22:33:43) mkillock: ok - http://stage.blocked.org.uk/
(22:34:31) dantheta: Hmmm, that database query is getting a bit slower, isn't it :).
(22:34:40) mkillock: a bit!
(22:34:48) graphiclunarkid: mkillock: Looks great to me!
(22:35:27) dantheta: Are you caching the numbers your side?
(22:35:34) mkillock: no, but could do?
(22:35:42) dantheta: I'd expected to see them decreasing with each page load
(22:35:44) mkillock: could check every minute or so
(22:35:51) mkillock: oh
(22:36:02) mkillock: well, I'm not caching deliberately
(22:36:04) dantheta: I'm happy for them to be live (in fact, I'd prefer that!)
(22:36:22) vasilis: dantheta: We have more reports for UK from various networks.
(22:37:28) vasilis: Do you think that we could add them as well?
(22:37:50) dantheta: If you've got a list of URLs that are blocked and on what ISPs, I can import those
(22:40:08) vasilis: Hm.. we should first parse the reports..
(22:41:40) vasilis: dantheta: Does your script make them ready to be imported into the DB?
(22:42:20) dantheta: I guess so - although we also need to extract from those files which ISP the test was conducted on.
(22:42:30) dantheta: The AS number is in the dump file, isn't it?
(22:42:39) vasilis: Yes
(22:42:45) dantheta: That'll do nicely.
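
A minimal sketch of mapping a report's AS number to an ISP name is shown below. The `probe_asn` key, the report layout and the lookup table are all assumptions for illustration; the AS numbers come from the range reserved for documentation and the ISP names are placeholders, not real assignments.

```python
# Hypothetical lookup table: documentation-range AS numbers, placeholder names.
ASN_TO_ISP = {
    "AS64496": "example-isp-a",
    "AS64497": "example-isp-b",
}


def isp_for_report(report, asn_to_isp=ASN_TO_ISP):
    """Return the ISP name for a parsed report, or None if the ASN is unknown.

    `report` is assumed to be a dict parsed from a report file, with the
    AS number stored under a "probe_asn" key (an assumed field name).
    """
    return asn_to_isp.get(report.get("probe_asn"))


print(isp_for_report({"probe_asn": "AS64496"}))
print(isp_for_report({"probe_asn": "AS64511"}))
```

Reports whose AS number is not in the table return `None`, so they can be set aside for manual review rather than imported under the wrong ISP.
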
(22:44:40) dantheta: Hmm, that sites tested figure doesn't seem to be increasing in the database. Most odd. I'll look into that.
(22:45:36) graphiclunarkid: dantheta: Slight topic derailment... I'm seeing some strange ssh behaviour on dev-censor-1. Need to consult your superior CentOS knowledge!
(22:45:59) dantheta: Yes, we've been aggressively probed by servers in China!
(22:46:43) graphiclunarkid: Not to do with that - I'm having trouble logging in with my SSH key, and so is vasilis.
(22:46:53) graphiclunarkid: If I do this: $ ssh richard@ssh dev-censor-1.default.orgtech.uk0.bigv.io
(22:46:58) graphiclunarkid: I get prompted for a password.
(22:47:11) graphiclunarkid: If I do this: $ ssh dev-censor-1
(22:47:24) graphiclunarkid: I'm logged in straight away without entering a password.
(22:47:26) vasilis: graphiclunarkid: Let's go off IRC?
(22:47:27) mkillock: dantheta: it has changed over longer periods
(22:47:46) graphiclunarkid: Any ideas?
(22:47:49) dantheta: Oh good - that is a relief. Perhaps I was expecting rather too much :)
(22:47:56) dantheta: Yes, check your ~/.ssh/config
(22:48:03) dantheta: See if you've got a host entry in there
(22:48:27) graphiclunarkid: dantheta: I have - and it's the same as the one I typed above (possibly ill-advisedly, vasilis has just reminded me).
(22:52:48) graphiclunarkid: (Conversation about SSH server moved to a private chat)
(22:59:52) mkillock: gonna call it a night, guys, so... night!
(23:00:06) mkillock left the room (quit: Quit: Page closed).
(23:03:41) dantheta: Oops. Cheers mkillock!
(23:04:14) dantheta: OK, so - how many alexa urls shall we do in the standard test?
(23:06:02) vasilis: dantheta: 100k?
(23:06:12) graphiclunarkid: Maybe a balance between time and popularity somewhere?
(23:06:12) dantheta: 100k is good.
(23:06:46) dantheta: 100k sounds good. For the mobile networks, 10k should be fine, since they are quite a bit slower. 10k I can do in a few hours.
(23:07:06) dantheta: Although the limiting factor on mobile broadband is credit
(23:07:36) vasilis: You should consider using "smarter" url lists as well.
(23:07:54) dantheta: Yep, that sounds good. smart > popular, I always say.
(23:08:22) graphiclunarkid: dantheta: ORG has a budget for this, so if you want, I can arrange for you to be reimbursed for the credit.
(23:09:05) graphiclunarkid: dantheta: In any case, I should now have the time to arrange for ORG to buy its own probes, at which point we can join them to the botnet :D
(23:09:11) graphiclunarkid: *mobile probes
(23:09:49) dantheta: That's very kind, but I'm sure it will be fine. For a couple of months, I can manage it :)
(23:10:03) dantheta: It's been quite enjoyable running a mobile broadband lab :)
(23:10:22) graphiclunarkid: dantheta: Heh :)
(23:11:02) dantheta: Plus I know now more than anyone ever needs to about data plan tariffs. None of them are easy.
(23:12:04) dantheta: I'll shut down the PyProbes that are currently running - they've gathered 300,000 results, which is way over the 100k mark. When we've got the filters enabled, I'll run them again.
(23:12:24) graphiclunarkid: dantheta: Sounds like a plan.
(23:12:28) vasilis: dantheta: There are no "unlimited" mobile plans?
(23:12:52) graphiclunarkid: Your knowledge of data plan tariffs would be very useful to help decide what ORG should buy for this purpose!
(23:13:35) dantheta: hehe!
(23:14:05) graphiclunarkid: dantheta: Seriously - if you have any clear recommendations please let me know!
(23:14:15) dantheta: I'll put together a mail
(23:14:21) graphiclunarkid: Thanks!
(23:14:44) graphiclunarkid: Sorry folks - it's 23:15 here and I have an early flight to catch tomorrow, so I'm gonna have to call it a night I'm afraid.
(23:14:56) graphiclunarkid: I'll be on IRC and email while travelling so drop me a line if you need anything.
(23:15:02) dantheta: I was about to do the same. It's been a good one!
(23:15:18) graphiclunarkid: Yeah - it has :)
(23:15:23) dantheta: Many thanks, once again.
(23:15:35) graphiclunarkid: I'm back in Norway on Tuesday - apologies for any comms delays between now and then.
(23:15:48) graphiclunarkid: See you all soon :)
(23:15:55) dantheta: Seeya, and have a good trip.
(23:15:59) vasilis: see you graphiclunarkid
(23:16:14) dantheta left the room.