ORG-tech-vols IRC meeting log 2014 05 28

(20:32:37) graphiclunarkid: Hey gang - sorry I'm a bit late. I was plumbing in a washing machine 0_o
(20:33:23) ***graphiclunarkid waves to Allll atsampson dantheta _guy_ hellais_ NAMOS plett xarlekino
(20:33:40) dantheta: Greetings!
(20:33:45) dantheta: Sorry, was reading PHP docs
(20:34:56) graphiclunarkid: Well, I guess we both have fun lives then ;-)
(20:35:16) vasilis [~vasilis@gateway/tor-sasl/vasilis] entered the room.
(20:35:20) dantheta: How's it going? Busy day on the github issues!
(20:35:22) graphiclunarkid: Hey vasilis
(20:35:30) vasilis: Hi everyone
(20:35:35) graphiclunarkid: dantheta: Yeah!
(20:36:08) graphiclunarkid: I've spent much of the day improving information on github issues, labelling them up, assigning them to people etc.
(20:36:33) Allll: hi
(20:36:34) graphiclunarkid: jimkillock also added a bunch, most of which seem to be feature requests for future releases, but some we'll have to address now.
(20:36:37) graphiclunarkid: Hi Allll :)
(20:37:14) graphiclunarkid: One thing I was going to ask you all to do is review the board at waffle.io and move things up or down the list depending on how important you think they are.
(20:37:30) Allll: glk: I'm just going through the commits you made to the repo for the logos (now uploaded) and minor text changes - did you copy the text changes into modx? Just checking to see if I need to move anything other than images over
(20:38:00) graphiclunarkid: You can also add and remove issues from the "version 2.0" milestone, as well as changing labels such as "minimum viable product" and "enhancement" if you think I've got those wrong.
(20:38:35) graphiclunarkid: Allll: I haven't updated Modx at all - I thought it would be best to let you manage that since you've done all the content creating there so far and you know what state it's in.
(20:38:55) Allll: cool, in that case I'll have a look through and copy bits over.
(20:39:08) Allll: is there any content signed off that needs doing?
(20:39:19) graphiclunarkid: Allll: How about if I notify you in the github issues when I've checked in content changes so you know which ones are ready to move?
(20:39:42) Allll: sounds good, just assign me the issue and I'll get it done!
(20:39:48) graphiclunarkid: Allll: I don't think there's a formal sign-off process. We will just edit until we think it's pretty good then the ORG staff will make their own changes once it's in the live Modx instance.
(20:39:54) graphiclunarkid: Allll: Will do.
(20:39:57) Allll: cool
(20:40:23) graphiclunarkid: I meant to get some more content written today but there has been something of a firehose of emails and notifications so I've just been treading water really ;)
(20:40:38) graphiclunarkid: I'm working on this tomorrow and Friday too though so expect changes Any Time Now [TM]
(20:41:05) Allll: no worries, I've a few bits that need tidying up tonight and I'll be able to get content done fri night/the weekend
(20:41:08) graphiclunarkid: vasilis: I am literally just about to start raising issues for features in the raspi-config repo. I have a list in a notepad here and am just updating labels etc.
(20:41:23) vasilis: graphiclunarkid: go for it
(20:41:49) vasilis: I was off this week due to hw issues :\
(20:42:03) graphiclunarkid: vasilis: :(
(20:42:20) vasilis: but everything is up and running again ;)
(20:42:28) vasilis: dantheta: how's going?
(20:42:46) graphiclunarkid: I live in fear of this laptop dying.
(20:43:04) dantheta: glk: World backup was last month, wasn't it? :)
(20:43:10) dantheta: vasilis: Pretty good - was working on comparing ooni's talktalk results to the Org DB ones
(20:43:26) graphiclunarkid: I have good data backups but downtime would be bad news!
(20:44:13) graphiclunarkid: Current emergency plan is to boot my wife's macbook using a TAILS usb stick!!
(20:44:29) vasilis: any time critical tasks due today/tomorrow?
(20:44:56) graphiclunarkid: vasilis: Anything marked as Version 2.0 needs to be done ASAP so if you can help with any of those issues that would be great!
(20:45:23) graphiclunarkid: I think we only have a few critical things to sort out though.
(20:45:33) vasilis: OK
(20:45:51) graphiclunarkid: 1) Filtering settings for networks at A&A not already configured to the defaults (plett emailed me on this earlier I think).
(20:46:13) graphiclunarkid: 2) Content (everything tagged as copywriting; plus the header and footer)
(20:46:26) graphiclunarkid: 3) Migrate to the live server (in progress)
(20:47:27) graphiclunarkid: vasilis: When the RPis arrive at ORG's office we'll need you and dantheta to commission them as probes.
(20:47:41) graphiclunarkid: Hopefully the ansible scripts will help!
(20:48:28) dantheta: The script to drive OONI from the ORG queues is almost done!
(20:48:36) graphiclunarkid: dantheta: Awesome!
(20:48:44) vasilis: I can also provide separate imgs, with (almost) everything preinstalled.
(20:48:58) dantheta: (https://github.com/openrightsgroup/Blocking-Middleware/issues/6)
(20:48:58) graphiclunarkid: vasilis: That might be useful if we are pressed for time.
(20:49:12) vasilis: dantheta: I have some updated ansible script as well, will update the repo..
(20:49:20) dantheta: Cool.
(20:52:18) dantheta: I've done some stuff for hardening API server - firewall, MySQL, denyhosts, a little bit of package uninstallation
(20:52:39) dantheta: Are the API & API DB going to stay on dev-censor-1, or is there a live host for that too?
(20:53:09) vasilis: dantheta: nice!
(20:53:45) graphiclunarkid: dantheta: At the moment they're staying where they are. There is a possible issue with data protection that I raised on the list earlier though. https://github.com/openrightsgroup/cmp-issues/issues/9
(20:53:56) vasilis: graphiclunarkid: we should ask Lee where the API and API DB should stay..
(20:54:19) graphiclunarkid: vasilis: Yeah - it may be that he suggests we re-host them on one of ORG's production servers.
(20:54:39) graphiclunarkid: I think we could leave moving them until after we've launched v2.0 though.
(20:54:52) dantheta: glk: Yeah, I saw that one. If we're feeling uneasy about user data in the API Database, then in the short term we can turn off the API features that accept and store it, and rely on reconciling with the ModX DB later.
(20:55:57) dantheta: I've got ansible scripts for recreating the API server (based on the ones I did for the vagrant image), but they aren't fully tested yet.
(20:56:05) graphiclunarkid: dantheta: That's a possibility, though Jon suggested on the mailing list that a sequence of URLs submitted by the same user over a short period of time might constitute personal data, even without an email address or other identifier.
(20:56:46) dantheta: Without an email address, the API can't possibly discern one user from another. They're all submitted using the website's credentials.
(20:56:52) graphiclunarkid: (s/Jon/John/ sorry)
(20:57:14) dantheta: Just in case that's a useful misfeature :)
(20:57:36) graphiclunarkid: But as someone with access to the database, you could figure out which URLs were submitted by the same user session, then use the content of those websites to deduce who it was (potentially).
(20:58:08) graphiclunarkid: Not saying it's a big risk - just something that John highlighted.
(20:58:26) dantheta: Indeed, it's a good call.
(20:58:31) graphiclunarkid: Anyway, I don't know what the solution to that is yet, so we will need to take advice from ORG's board.
(20:59:05) dantheta: I did find a paper produced by a Scottish volunteer group discussing options where you've got volunteers and customer/client/PII data
(20:59:29) graphiclunarkid: That would be interesting to pass on if you can find it again. Maybe in reply to the mailing list thread so that John Elliot sees it too.
(20:59:41) graphiclunarkid: (I don't think he's following the issue on github).
(21:00:13) dantheta: OK, will do. I was worried that it was too obviously a google search result :)
(21:00:19) graphiclunarkid: vasilis: What's on the other dev server at the moment?
(21:00:35) graphiclunarkid: dantheta: At least you bothered googling it - which I haven't yet ;-)
(21:01:08) graphiclunarkid: vasilis: I'm wondering whether we could turn dev-censor-1 into our production server and move development to the other one (is it dev-censor-2? I don't have an account on it).
(21:01:55) vasilis: The ooni reports data repository (official mirror) and an ooni backend.
(21:02:10) dantheta: An additional option is to have dev servers provided by vagrant on developers' local machines
(21:02:34) vasilis: graphiclunarkid: I wouldn't consider this a "dev" server!
(21:03:26) dantheta: (just referring above: http://www.volunteeredinburgh.org.uk/organise/GPG_Store/GPG_19_Volunteers_Data_Protection)
(21:03:33) graphiclunarkid: So we have a mixture of development and production stuff on both servers.
(21:03:46) dantheta: A fine tradition :P
(21:04:03) graphiclunarkid: dantheta: Vagrant is a good idea.
(21:04:30) dantheta: I can get the image updated with the newest database, I have to admit that I haven't republished the original image as often as I should have!
(21:05:47) graphiclunarkid: dantheta: In the short-term, if we have to lock people out of dev-censor-1, at least dev could continue via vagrant in the mean-time. We can set up git with a post-receive hook to update the API code on the server without allowing shell access so that we can push changes to the live environment.
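A minimal Python sketch of the post-receive hook idea. Assumptions not taken from the log: the deploy branch is `master` and the API work tree lives at the hypothetical path `/srv/api`. git passes the hook one `<old-rev> <new-rev> <ref-name>` line per updated ref on standard input; an all-zero new revision means the ref was deleted.

```python
#!/usr/bin/env python3
# Hypothetical post-receive hook for a bare repo on the API server.
# Inside a hook, git has already set GIT_DIR to the bare repository.
import subprocess
import sys

DEPLOY_BRANCH = "master"   # assumed branch that represents live code
DEPLOY_DIR = "/srv/api"    # hypothetical work tree the API is served from


def should_deploy(lines, branch=DEPLOY_BRANCH):
    """True if this push updated (not deleted) the deploy branch."""
    target = "refs/heads/" + branch
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        # An all-zero revision means the branch was deleted: nothing to deploy.
        if ref == target and set(new) != {"0"}:
            return True
    return False


def deploy(branch=DEPLOY_BRANCH, work_tree=DEPLOY_DIR):
    # Force-checkout the branch into the work tree; pushers never need a shell.
    subprocess.run(
        ["git", "--work-tree=" + work_tree, "checkout", "-f", branch],
        check=True,
    )


def main(stdin=sys.stdin):
    if should_deploy(stdin):
        deploy()
```

Saved as `hooks/post-receive` in the bare repo (made executable, ending with a call to `main()`), this lets people push to live with only git access, for example with `git-shell` as their login shell, rather than an interactive account.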
(21:06:34) graphiclunarkid: vasilis: Is there any personally-identifying data stored in the oonib or ooni reports repo?
(21:07:37) dantheta: Hmmm, it's a tough call that one. The first week of live operation is not the best time to have the admins unable to access, unless Lee can do it all week. Not intending to scaremonger, it's just the sort of thing that worries me about newly live systems
(21:08:15) graphiclunarkid: dantheta: I agree. I don't want to do it if we don't have to. On the other hand ORG needs to eat its own dog-food when it comes to personal data and privacy.
(21:08:39) dantheta: This was VCE's opinion: "Volunteers are bound by the organisation’s confidentiality requirements and this should be stated in the volunteering policy, if you have a separate confidentiality policy this should be mentioned in the volunteering policy too. A confidentiality statement should also form part of the volunteer agreement. "
(21:08:47) graphiclunarkid: An alternative might be that we ask people with shell access to sign an agreement about how they will treat personally-identifying data.
(21:08:58) graphiclunarkid: Yeah, basically that.
(21:09:16) vasilis: graphiclunarkid: The OONI data reports by default report the AS of the probe.
(21:09:18) graphiclunarkid: Would people be happy to do that? vasilis, Allll, dantheta?
(21:09:18) dantheta: That's pretty much what I thought - I'm happy to sign up
(21:09:26) dantheta: Yep
(21:09:37) graphiclunarkid: vasilis: AS?
(21:09:56) vasilis: Autonomous System
(21:10:07) graphiclunarkid: Sorry, I don't know what that means.
(21:10:17) dantheta: Corresponds to a range of IP addresses belonging to an organization
(21:10:24) graphiclunarkid: Ah, I see.
(21:10:51) graphiclunarkid: So that would be like the ISP detection NAMOS is using in the android probe?
(21:11:30) vasilis: hm.. I'm not really aware of what NAMOS is
(21:11:56) graphiclunarkid: I guess that's not very personally identifying - unless you're the only person using that ISP and contributing tests (I bet I'm the only one using NetCom - or worse, Hughes telecom, from the plane back to Norway the other day!)
(21:11:56) dantheta: @NetworkString :)
(21:12:18) graphiclunarkid: NAMOS is Gareth :)
(21:12:33) graphiclunarkid: He's lurking - he's travelling ATM but will catch up with the scrollback when he arrives.
(21:12:35) dantheta: I'm not 100% sure, but I think the android probe gets its network identification from the API, which does do Autonomous System lookups
(21:12:50) dantheta: It was a way better option than whois :)
(21:12:58) graphiclunarkid: dantheta: That's very cool!
(21:13:21) graphiclunarkid: It occurred to me earlier that we should centralise it and provide it through the API - but I never got round to making the suggestion. Nice to see you're way ahead of me!
(21:13:22) vasilis: In any case at the moment the ooni backend is being used only by us (mostly on AA servers).
(21:13:26) dantheta: Which reminds me, it might be nice to give a shout out to http://www.team-cymru.org/, who provide the AS lookup API
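For reference, one way to do the Autonomous System lookup mentioned above is Team Cymru's DNS interface: the IPv4 octets are reversed and suffixed with `origin.asn.cymru.com`, and the TXT answer reads `ASN | BGP prefix | country | registry | allocation date`. The log does not say whether the API uses this interface or Team Cymru's whois service, so treat this Python sketch as an assumption. It only builds the query name and parses the answer; the DNS query itself is left to whatever resolver library is in use.

```python
import ipaddress


def cymru_origin_query(ip):
    """DNS name to query (TXT record) for the origin AS of an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for bad input
    return ".".join(reversed(str(addr).split("."))) + ".origin.asn.cymru.com"


def parse_origin_txt(txt):
    """Split a Team Cymru origin TXT answer into named fields."""
    fields = [f.strip() for f in txt.strip().strip('"').split("|")]
    asns, prefix, country, registry, allocated = fields
    return {
        "asns": asns.split(),  # a prefix can be announced by more than one AS
        "prefix": prefix,
        "country": country,
        "registry": registry,
        "allocated": allocated,
    }
```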
(21:13:30) Allll: fine with me
(21:13:40) Allll: fine with me - sorry was looking at github
(21:13:54) graphiclunarkid: dantheta: Excellent idea. Maybe add it to the github ticket for crediting people?
(21:14:14) dantheta: Yep, will do.
(21:14:16) graphiclunarkid: Allll: No worries - I saw the emails coming in as you were working away in the background there :)
(21:14:37) graphiclunarkid: Actually I wanted to turn this meeting into more of a collaborative hacking session anyway.
(21:14:45) graphiclunarkid: Since we're so close to finishing v2.0 and all that.
(21:15:06) graphiclunarkid: Is there anything in particular that people are working on right now where they could use some help or need to bounce ideas around?
(21:15:46) dantheta: I did see a couple of tickets regarding the difference between submitting a site for checking, and looking up historical checks on a site.
(21:16:37) dantheta: The API call for searching URL history was modified to give a nice result for the just-submitted usecase, so I was wondering if it would be worth creating an additional endpoint for getting more history about a URL?
(21:16:58) graphiclunarkid: dantheta: Yeah - there was some discussion of this early in the project.
(21:17:15) graphiclunarkid: I'm not sure we need both calls to be separate now though.
(21:17:45) graphiclunarkid: If we have a single landing page, and when people submit a URL it both checks again and returns any historical results, that would cover both use-cases.
(21:17:56) dantheta: The just-submitted page won't show any historical information - only the most recent status and the date of the last "blocked" status. People wouldn't be able to see a site get blocked and unblocked
(21:18:11) dantheta: Well, unless they looked at exactly the right time :)
(21:18:25) graphiclunarkid: Ah, I see.
(21:18:26) Allll: I think Matt mentioned that we could reuse the same code to pull historical results
(21:18:41) dantheta: Perhaps that's more a use case for the analytically inclined (who can get a CSV if they want one)
(21:18:49) graphiclunarkid: Then maybe there should be a "more detail" button on the just-submitted page that fetches the full history?
(21:19:10) graphiclunarkid: So a "download raw data" button on the page that shows the full history?
(21:19:53) dantheta: That might be useful for some people, but I think on-screen presentation might have more impact for the regular user.
(21:20:45) dantheta: I think a "more detail" link to a longer historical view would be cool, but it's something that could be considered for the future, beyond go-live
(21:22:57) dantheta: unless there's a case for adding it sooner, which I'm happy to do. All good, really.
(21:25:03) graphiclunarkid: dantheta: I agree it's not MVP material. Maybe raise an issue for it so we can come back to it in a future version?
(21:25:13) dantheta: Sure, that's cool.
(21:27:07) graphiclunarkid: Ok - I've got something else to run past you folks: https://github.com/openrightsgroup/cmp-issues/issues/2
(21:27:18) vasilis: In any case the ooni reports will be public as well.
(21:29:09) graphiclunarkid: vasilis: Is there a case for including in the raspi config tool, or the installer, a message to the user describing what data is collected and warning that it will be made public? Then people can choose whether to participate on the basis of how comfortable they are sharing that data.
(21:31:10) vasilis: graphiclunarkid: could you please raise an issue?
(21:31:31) vasilis: ooniprobe displays a warning on every run
(21:31:31) graphiclunarkid: Yep. Against the config-script repo or lepidopter?
(21:31:53) vasilis: graphiclunarkid: any
(21:31:56) graphiclunarkid: ok
(21:32:01) ***graphiclunarkid adds it to the list.
(21:33:12) graphiclunarkid: So with reference to https://github.com/openrightsgroup/cmp-issues/issues/2, how do people feel about emails from the blocked project being signed by me, and coming from an address like blocked at org dot org?
(21:33:50) graphiclunarkid: Apparently naming a person from whom the emails come would improve engagement - but I'm nervous about appearing to be taking credit for all your hard work!!
(21:34:56) dantheta: It's cool with me :)
(21:36:08) vasilis: Yep not an issue ;)
(21:37:55) graphiclunarkid: I'll probably arrange for the signature to be something like...
(21:38:14) graphiclunarkid: Richard King, project coordinator (on behalf of the blocked.org.uk team)
(21:39:12) mkillock [~matt@78-32-160-53.static.enta.net] entered the room.
(21:39:45) graphiclunarkid: Hey mkillock :)
(21:39:50) mkillock: hey!
(21:40:03) mkillock: busy!?
(21:40:26) graphiclunarkid: Just about keeping ahead of the github firehose ;-)
(21:41:14) graphiclunarkid: It's seriously awesome to see so much happening though - soooo close to getting everything working for v2.0 too :D
(21:42:02) graphiclunarkid: mkillock: Since we're hacking as well as talking, is there anything you're doing right now that we can help with, or that you're blocked on?
(21:42:06) mkillock: good stuff!
(21:42:07) graphiclunarkid: (no pun intended!)
(21:42:15) mkillock: hehe
(21:42:25) mkillock: umm, looking at this http:// issue right now
(21:42:39) mkillock: is webal on here?
(21:42:42) dantheta: Yep, I'm looking at that one on the backend too
(21:42:43) graphiclunarkid: mkillock: Yeah, thanks for that.
(21:42:53) graphiclunarkid: mkillock: He's on as Allll, I think.
(21:43:01) mkillock: right, yes, of course
(21:43:25) mkillock: well it worked on staging by just removing type="url" from the input tag
(21:43:31) mkillock: from here: <input id="4DRXNE97LE" class="form-control" type="url" name="4DRXNE97LE" placeholder="http://" value="" />
(21:43:54) graphiclunarkid: Right, yell my nick if you need me - I'm going to go raise some github issues for vasilis.
(21:43:59) ***graphiclunarkid attention elsewhere.
(21:44:03) Allll: yeah here
(21:44:05) mkillock: now - I don't know if it's wanted for a good reason
(21:44:09) mkillock: hi Allll
(21:44:13) mkillock: ok graphiclunarkid
(21:44:19) Allll: hi
(21:44:25) ***graphiclunarkid AFK 5 mins
(21:44:43) mkillock: ok, so this type=url thing
(21:44:44) Allll: I've changed it to type='text' now so the browser won't try to validate it
(21:45:25) mkillock: oh, ok... perhaps I'm looking at a varnished version of the page
(21:45:35) mkillock: did a ctrl-u and saw the above
(21:46:04) Allll: for a moment I thought I'd forgotten to hit save :O
(21:46:37) mkillock: I've seen something like this before with varnish
(21:46:47) mkillock: let's change the length of the html page
(21:48:15) mkillock: it's still type="url" in manager
(21:48:22) mkillock: <input id="4DRXNE97LE" class="form-control" type="url" name="4DRXNE97LE" placeholder="http://" value="!+fi.4DRXNE97LE" />
(21:48:24) Allll: weird!
(21:48:30) Allll: on staging?
(21:48:41) mkillock: well, on new.blocked.org
(21:48:47) mkillock: ah...
(21:48:56) Allll: ah, I'm on staging - which should I be using now?
(21:49:16) mkillock: good question!
(21:49:27) mkillock: I can confirm it's changed on staging!
(21:49:34) mkillock: and yeah, let's close that
(21:49:41) mkillock: how do we ping graphiclunarkid ?
(21:49:42) Allll: cool
(21:49:51) mkillock: let's ask him about which we're working on
(21:49:57) Allll: not sure, say his name I think graphiclunarkid!
(21:50:06) Allll: haven't used IRC since the nineties
(21:50:36) mkillock: oh away from keyboard for 5 mins he said
(21:50:58) ***graphiclunarkid back
(21:50:59) dantheta: If you put a URL parameter on the end of the page, you should get a fresh version through varnish
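A small Python sketch of dantheta's suggestion: append a throwaway query parameter so the request cannot match an existing cache entry. This assumes Varnish keys its cache on the full request URL including the query string, which is what the default `vcl_hash` does; the parameter name `nocache` is arbitrary.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, param="nocache", value=None):
    """Return `url` with an extra query parameter that defeats URL-keyed caches."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Default to the current Unix time so each call yields a fresh URL.
    query.append((param, str(int(time.time()) if value is None else value)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `cache_busted("http://stage.blocked.org.uk/thankyou.html")` gives the same page with `?nocache=<timestamp>` appended, which Varnish treats as a separate object.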
(21:51:13) Allll: guess he'll be back, I'll hold off working on server till we find out, I was going to have a crack at some HTML on repo anyway
(21:51:25) mkillock: there's no varnish problem, as far as I can tell, dantheta
(21:51:26) graphiclunarkid: Yeah, you just say graphiclunarkid to get my attention. Tab complete is your friend!
(21:51:55) graphiclunarkid: If you say it five times while standing in front of a mirror I appear behind you and stab you to death with a github issue though, so be careful ;)
(21:52:03) Allll: :O
(21:52:04) dantheta: Sorry, was reading from the top of the page, oops!
(21:52:04) mkillock: ok! So gr-tab graphiclunarkid , should we be editing new.blocked or stage.blocked?
(21:52:22) mkillock: no worries, thanks for taking an interest, dantheta
(21:53:12) graphiclunarkid: So... today jimkillock copied the existing site at stage.blocked.org.uk over to ORG's main Modx instance
(21:53:31) graphiclunarkid: He pointed the subdomain new.blocked.org.uk at this site.
(21:54:22) graphiclunarkid: I asked him to send a message to the mailing list detailing the process for performing updates to new.blocked.org.uk since we have not yet finished development, but I don't think he did, nor did he suggest to me how this should work.
(21:54:30) graphiclunarkid: Therefore we're making this up as we go along.
(21:54:36) mkillock: ha ha, ok!
(21:54:50) mkillock: well it doesn't look complete on new.blocked.org.uk anyway
(21:54:58) mkillock: and there is a cookie issue apparently
(21:55:07) Allll: so we'll stick with editing on stage.blocked?
(21:55:16) graphiclunarkid: mkillock: Quite. I think he did what he could and then threw it back over to us when it proved not to be working.
(21:55:30) mkillock: it's working now
(21:55:36) graphiclunarkid: Let's see what everyone thinks. There are two options:
(21:55:58) Allll: also I updated the ORG logos in the assets folder this evening so they'll need to be updated on new.blocked - I did mention it in the issue as well
(21:56:19) graphiclunarkid: 1) Continue to develop on stage.blocked - this would mean that server is always the latest development version but we still have work to do to get it working on new.blocked.
(21:57:00) graphiclunarkid: 2) Switch to developing on new.blocked - this would mean stage.blocked falls behind the development curve which will probably cause us a problem once we've gone live. It would mean we'll get the live version working quicker though.
(21:57:21) graphiclunarkid: My vote is for (1) since the short-term speed gain will be far less than the long-term dev problems in my experience!
(21:57:28) Allll: how tricky is it to do a database dump from one to another?
(21:57:39) dantheta: Really easy
(21:57:44) graphiclunarkid: We should also write down what we had to do to move from stage to new so that we can repeat it in future.
(21:57:45) Allll: I'd agree with 1
(21:58:09) graphiclunarkid: And figure out why there's a difference between the two environments, then change stage.blocked to match new.blocked, if possible.
(21:58:12) mkillock: I'm happy to fit in with y'all
(21:58:26) mkillock: since I have only made a minor contribution!
(21:58:59) graphiclunarkid: mkillock: Stop being so modest ;)
(21:59:23) mkillock: too kind ;)
(21:59:27) Allll: you saved me hours of banging my head against the desk getting modx working!
(21:59:27) graphiclunarkid: OK, let's continue working on stage.blocked.org.uk then.
(21:59:31) Allll: cool
(21:59:51) graphiclunarkid: The migration can remain a separate issue. At least then we'll have one working version that's up to date!
(22:01:06) mkillock: I was encouraged that most of the home page worked without much fuss - it's only this cookie issue with the main site that caused a problem for one snippet function that we don't use anywhere else
(22:01:16) graphiclunarkid: I'd love to be able to automate more of this using git (or something else if necessary). Modx just doesn't seem to work in a way that supports that easily, though, being as it stores lots in the database.
(22:01:35) graphiclunarkid: mkillock: That's pretty good going, then, all things considered.
(22:01:57) graphiclunarkid: We might be able to replicate the cookie issue on dev if that Modx instance were serving more than one domain...
(22:02:15) graphiclunarkid: (That's one obvious difference between the two environments).
(22:02:29) Allll: glk: did you have a couple of the personal stories I can use to mock up the personal stories page? or at least an idea of how long they are (1/2/few/many paragraphs) & what information they have - name, position, etc
(22:03:11) Allll: yeah, I'm not a fan of it saving to the db; there doesn't seem to be any history, so we could easily overwrite each other
(22:03:15) graphiclunarkid: Allll: I'm working on that at the moment. We're sending an email out to people who have submitted blocked sites previously asking them to contribute stories.
(22:03:26) Allll: cool - I'll make it up then
(22:04:22) mkillock: graphiclunarkid: modx can be migrated by copying the directory structure wholesale, and then modifying some config.inc.php files
(22:04:27) graphiclunarkid: Allll: Try mocking up something like a small photo (headshot); a header including the name of the site that was blocked, a link to the site, the networks on which it was blocked and the date of the check; and about three paragraphs of text.
(22:04:39) mkillock: graphiclunarkid: so whether that can be done with git somehow
(22:04:46) graphiclunarkid: mkillock: Should we be keeping the whole directory structure in git then?
(22:04:57) graphiclunarkid: Less the config.inc.php files, of course.
(22:05:08) mkillock: graphiclunarkid: well, we could just keep the bits we need to merge
(22:05:21) graphiclunarkid: Allll: What's the github ticket for this - I can update it with that outline if that would be useful.
(22:05:27) Allll: you'd have to be careful that only non-sensitive tables were committed to git, but I guess a script could easily pull them out and ignore the ones with personal info in them
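A sketch of the script Allll describes, in Python: build a `mysqldump` command that skips tables holding personal data, using `mysqldump`'s `--ignore-table=db_name.tbl_name` option, which has to be repeated once per table. `--skip-extended-insert` writes one INSERT per row, which keeps git diffs readable. Which Modx tables actually hold personal data would need checking against the real schema; the names in the usage line are hypothetical.

```python
import subprocess


def mysqldump_args(database, sensitive_tables):
    """Command line for a dump of `database` without the sensitive tables."""
    args = ["mysqldump", "--skip-extended-insert"]
    for table in sorted(sensitive_tables):  # sorted for a stable command line
        args.append(f"--ignore-table={database}.{table}")
    args.append(database)
    return args


def dump_to_file(database, sensitive_tables, path):
    # Credentials are assumed to come from an option file such as ~/.my.cnf.
    with open(path, "w") as out:
        subprocess.run(mysqldump_args(database, sensitive_tables), stdout=out, check=True)
```

Usage, with hypothetical table names: `dump_to_file("modx", {"modx_users", "modx_user_attributes"}, "site.sql")`, then commit `site.sql`.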
(22:05:44) Allll: https://github.com/openrightsgroup/blocked-org-uk/issues/23
(22:05:52) graphiclunarkid: We could use a private git repo if there was a risk of private things leaking to github etc.
(22:07:11) mkillock: graphiclunarkid: I have some concerns about how we correctly merge directories from a single site modx instance to a multi site instance
(22:07:26) mkillock: though I think that's all handled in the database
(22:07:39) mkillock: which - actually brings another issue
(22:08:07) graphiclunarkid: mkillock: Hmm, ok.
(22:08:08) mkillock: the page ids are unique across all sites, so for instance home page is ID 1 on staging but 7000-odd in live
(22:08:21) mkillock: and page ids are defo in the database
(22:08:27) graphiclunarkid: Yeah, we will need to update the references on the live server, which is a bit of a pain. They don't seem to be portable.
(22:09:10) Allll: could we switch staging to use ids from 800x onwards and I can go through and amend the links when I'm putting content in
(22:09:39) mkillock: It's possible to add lines to the modx content table directly, without having to manually copy/paste
(22:09:51) mkillock: I've done this with wordpress imports
(22:10:05) mkillock: basically, just find the max(id)
(22:10:11) mkillock: and increment from there
(22:10:47) mkillock: though this probably isn't all that compatible with github - just a guess? maybe you can
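A Python sketch of the "find max(id) and increment from there" approach: give each staging page a fresh id above the live site's current maximum, then rewrite internal links to match. It assumes links are written as Modx link tags such as `[[~12]]` (optionally with parameters after a `?`); ids referenced elsewhere, such as in snippet parameters or template variables, are not covered. Reading max(id) and inserting the rows should happen in one go, or a page created on the live site in between could take one of the planned ids.

```python
import re

# Matches the id in Modx link tags like [[~12]] or [[~12? &scheme=...]].
LINK_TAG = re.compile(r"\[\[~(\d+)")


def plan_id_map(staging_ids, live_max_id):
    """Map each staging page id to a new id above the live max(id)."""
    return {old: live_max_id + i for i, old in enumerate(sorted(staging_ids), start=1)}


def rewrite_links(content, id_map):
    """Replace staging ids in link tags; ids not in the map are left alone."""
    def swap(match):
        old = int(match.group(1))
        return "[[~%d" % id_map.get(old, old)
    return LINK_TAG.sub(swap, content)
```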
(22:11:36) dantheta: It gets messy later on - unless you set the sequence number using ALTER TABLE, the auto_inc will run into a used value sooner or later.
(22:12:02) dantheta: Then you get a unique constraint violation when adding a row
(22:12:44) mkillock: dantheta: is that if you start at 8000 or something?
(22:13:07) mkillock: I don't remember having that problem by just adding rows
(22:13:11) Allll: apparently you can do something like:
(22:13:12) Allll: ALTER TABLE tbl AUTO_INCREMENT = 1000;
(22:13:32) mkillock: perhaps I just added rows, and let mysql increment the id value
(22:13:51) Allll: I know when you create a table you can specify the auto inc value, so we could take a dump of the table, drop it, manually edit ids in SQL and set auto id then import table
(22:14:34) Allll: When I've done it in the past I thought auto inc started from the highest existing number, but it was a long time back so could be completely wrong
(22:16:01) dantheta: Just tried it, and it works the way you described. Perhaps I'm thinking too much of replicated setups. Ignore me!
(22:16:29) mkillock: well, I guess it all depends on whether we want to automate this process or not. With a dozen pages, I wonder if it's worthwhile
(22:17:51) Allll: probably as easy to do manually
(22:18:27) ajb44 [~chatzilla@130.43.175.22] entered the room.
(22:20:53) graphiclunarkid: mkillock: I don't think it matters as long as we can repeat the process easily so that we can release updates.
(22:21:21) graphiclunarkid: So I guess that means the process needs to be either automated or documented. Preferably both ;-)
(22:21:22) dantheta: I've just added protocol handling (http:// prefixing) to the backend
(22:21:48) graphiclunarkid: dantheta: Is testing just a matter of bunging a url into the site without the prefix and seeing what happens?
(22:22:43) dantheta: Yep! If you enter example.com, it should add http://example.com to the queue and return previous results for http://example.com.
(22:22:53) dantheta: It doesn't do anything with www. vs bare domain ATM.
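The prefixing rule dantheta describes, sketched in Python (the actual backend and Modx snippet code may differ): leave anything that already has a scheme alone, otherwise prepend `http://`. As in the backend, `www.` versus the bare domain is not normalised. The handling of protocol-relative `//host` input is an addition of this sketch.

```python
import re

# A URL scheme per RFC 3986: a letter, then letters, digits, "+", "-" or ".".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def with_http_prefix(url):
    """Prepend http:// to user input that has no scheme."""
    url = url.strip()
    if _SCHEME.match(url):
        return url
    if url.startswith("//"):  # protocol-relative input such as //example.com
        return "http:" + url
    return "http://" + url
```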
(22:23:18) dantheta: I need to remove all the error results for bare hostnames, since pyprobe didn't handle those and returned an error result.
(22:23:40) dantheta: Not that anyone will ever see them again (mwahaha)
(22:24:45) graphiclunarkid: OK, I entered bugseverywhere.org since it was to hand (fairly sure it's not in the Alexa top 1M!)
(22:24:51) graphiclunarkid: I got this:
(22:24:57) graphiclunarkid: Thank you for submitting bugseverywhere.org for checking. This request has been queued by our testing suite, more results maybe shown if you recheck the site in the future as our probes complete. If you provided an email you will receive a response by email shortly.
(22:25:00) graphiclunarkid: And this:
(22:25:02) graphiclunarkid: We have no previous test results for http://bugseverywhere.org, if this is the first time that the site has been submitted it can take some time for our system to check it on the various ISPs.Check back shortly and you should start to see the results as they filter through.If you entered your email we will send you a notification.
(22:25:18) graphiclunarkid: Note: http:// prefix appears in the second block of text but not the first.
(22:25:30) graphiclunarkid: That's on http://stage.blocked.org.uk/thankyou.html
(22:25:32) mkillock: ah
(22:25:34) mkillock: I can explain that
(22:25:44) mkillock: the process on the frontend is
(22:26:24) mkillock: fill in form -> click submit -> Submit URL (no http checking) -> redirect to thanks page -> GetURLHistory (has http checking)
(22:26:32) mkillock: -> display results
(22:26:47) mkillock: I'll add the same check to the submit url snippet
(22:27:23) dantheta: There's also something strange in the backend code
(22:27:33) dantheta: I think I was a little ambitious with some logging
(22:28:39) graphiclunarkid: mkillock: Cool. I think we should be consistent; I don't particularly mind which way round it is, though I think it looks cleaner without the prefix ;-)
(22:29:03) graphiclunarkid: mkillock: Possibly perfect would be to never display http:// as a prefix but to always display it if it's https:// - but that's just me.
(22:29:31) graphiclunarkid: (I think that's how Chromium displays urls in the omnibar these days...)
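graphiclunarkid's display idea as a Python sketch; this is not what the site does, since it currently shows whatever was typed. Only the displayed string changes: the stored and submitted URL keeps its `http://` prefix, so history lookups still match.

```python
def display_url(url):
    """Hide a plain http:// prefix for display, but keep https:// visible."""
    if url[:7].lower() == "http://":
        return url[7:]
    return url
```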
(22:29:37) mkillock: graphiclunarkid: that can be arranged, but at the moment what is typed in is what is submitted and displayed
(22:29:57) graphiclunarkid: mkillock: Sure - that's fine.
(22:30:52) graphiclunarkid: mkillock: My suggestion is hardly MVP material ;-)
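The display rule graphiclunarkid floats above is small enough to sketch: hide a plain http:// prefix but keep https:// visible. This is a minimal Python version. The function name display_url is hypothetical. At this point the frontend simply shows whatever was typed in.

```python
def display_url(url: str) -> str:
    """Return the URL as it might be shown to a visitor.

    Sketch of the suggested rule, not the site's current behaviour:
    a plain http:// prefix is hidden, https:// is kept visible.
    """
    prefix = "http://"
    if url.lower().startswith(prefix):
        return url[len(prefix):]
    return url


print(display_url("http://bugseverywhere.org"))   # bugseverywhere.org
print(display_url("https://bugseverywhere.org"))  # https://bugseverywhere.org
```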
(22:31:58) dantheta: OK, the probing part is mostly fixed, but I can only see previous results when I prefix http:// when submitting from test.html
(22:32:31) graphiclunarkid: I just got a bit of valuable feedback from my wife BTW: when a visitor hits thankyou.html, the "Blocked!" title in the header looks like the result of the check, not the name of the site.
(22:32:38) graphiclunarkid: That's probably not the effect we were after!
(22:33:04) graphiclunarkid: Entirely my bad since I changed the name.
(22:33:43) Allll: I was thinking of having a crack at a logo rather than just text once I'm done with the other bits - that might help the wife out :D
(22:34:07) graphiclunarkid: I have suggested to NAMOS that the android app should use Blocked! as its header and "Censorship Monitoring Project" as a sub-title. Perhaps both should use blocked.org.uk as the header and "Censorship Monitoring Project" as a sub-title.
(22:34:29) graphiclunarkid: Allll: A logo would be >9000 times more awesome, of course, so go ahead if you're feeling inspired!
(22:34:39) Allll: cool, got a couple of ideas
(22:34:50) mkillock: graphiclunarkid: dantheta: I've changed the submission process to prepend http:// when it's not present in the form submission
(22:35:03) graphiclunarkid: I might make that small change in the mean-time, though, just as a backstop.
(22:35:08) dantheta: OK, that's cool.
(22:35:14) mkillock: so now if you type www.example.com, http://www.example.com is submitted to the API
(22:35:20) ***graphiclunarkid goes to test mkillock's changes.
(22:35:45) mkillock: and the gethistory snippet will always be using the http:// prepended version
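mkillock's fix amounts to normalising the URL once, before it is submitted to the API and before it is used for the history lookup, so both steps see the same string. The real change lives in the Modx snippets. The Python sketch below works under that assumption, and normalize_submission is a hypothetical name.

```python
def normalize_submission(url: str) -> str:
    """Prepend http:// when the visitor typed no scheme.

    Whitespace is trimmed first so " www.example.com" and
    "www.example.com" normalize to the same string.
    """
    url = url.strip()
    lowered = url.lower()
    if lowered.startswith("http://") or lowered.startswith("https://"):
        return url  # scheme already present: submit exactly as typed
    return "http://" + url


# The same normalized value would feed both the submit and history calls.
print(normalize_submission("www.example.com"))  # http://www.example.com
```

Because https:// input is left alone, a visitor who deliberately types an HTTPS address still gets that exact URL tested.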
(22:35:53) graphiclunarkid: mkillock: Perfect :)
(22:35:58) graphiclunarkid: Works for me.
(22:36:13) dantheta: Yep, checks out for me too.
(22:36:27) dantheta: I'm only seeing results for T-Mobile though (http://bugseverywhere.com)
(22:36:41) mkillock: oh
(22:36:51) graphiclunarkid: dantheta: It was a new URL when I submitted it. No previous results.
(22:36:53) mkillock: is there more? I can switch on debugging
(22:38:09) dantheta: Only t-mobile in the database - don't worry, it's a problem my end!
(22:38:29) mkillock: ok! Can I ask about the stats on the homepage dantheta ?
(22:38:42) graphiclunarkid: I'm still getting zero results for http://bugseverywhere.org (not .com, note!)
(22:39:12) graphiclunarkid: Is there some caching somewhere that I need to deal with?
(22:39:35) dantheta: There shouldn't be any caching, but I am currently mucking about with URL normalization, which might be upsetting it
(22:39:40) mkillock: dantheta: I'm wondering whether the tests pending figure is changing as you expect it to?
(22:39:56) mkillock: the sites blocked figure keeps going up
(22:40:05) mkillock: it was 14002 last week
(22:40:11) NetworkString [~NetworkSt@43.80.187.81.in-addr.arpa] entered the room.
(22:40:21) dantheta: That's pretty much expected -
(22:40:24) dantheta: sorry, back in a minute!
(22:40:31) graphiclunarkid: Hey NetworkString :)
(22:40:38) NetworkString: hello
(22:41:25) graphiclunarkid: NetworkString: Sorry for filling your inbox up with github issues this afternoon!
(22:41:50) NetworkString: No worries, the more bugs / issues we find now the better :)
(22:42:43) graphiclunarkid: NetworkString: Indeed. Most of them were suggestions for enhancements but I did have one bug to do with automatic URL polling not working. Possibly just for me though.
(22:44:56) mkillock: graphiclunarkid: general question - the check a site page: do we want a single field form with URL: which can be used to get historical data?
(22:45:12) NetworkString: yeah, I just replied to that one, not quite sure what's going on, will need to make sure all code paths eventually call onProbeFinish()
(22:45:15) dantheta: Aha! We broke the results recorder!
(22:45:30) mkillock: dantheta: oh?
(22:46:46) dantheta: Yeah - I mistakenly re-used a variable in the submit/url procedure, which passed a PHP dictionary in place of a URL string, which got json encoded, sent to a probe, sent back, and failed to decode throwing an exception in the decoder, which died enough times that supervisor stopped restarting it
(22:47:17) dantheta: I've kicked the bad entries off the queue, and it's all happy now. You should get results when you next search for bugseverywhere.
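In the failure dantheta describes, a dictionary was sent where a URL string was expected. The decoder then threw an exception, and after repeated crashes supervisor stopped restarting the process. One defence is to validate each message and drop bad ones, so a single poison message cannot kill the recorder. This is a hedged sketch. The message layout with a "url" key and the function name decode_result are assumptions, not the recorder's actual format.

```python
import json
import logging

log = logging.getLogger("results-recorder")


def decode_result(raw: bytes):
    """Decode one probe result message, or return None if it is malformed."""
    try:
        msg = json.loads(raw)
    except ValueError:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        log.warning("dropping undecodable message: %r", raw[:80])
        return None
    if not isinstance(msg, dict) or not isinstance(msg.get("url"), str):
        # e.g. a dictionary was passed where a URL string was expected
        log.warning("dropping message with non-string url: %r", msg)
        return None
    return msg
```

Dropping the message and logging it is the simplest option. Parking bad messages on a separate dead-letter queue would keep them available for inspection.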
(22:48:22) dantheta: Excellent, it's fine now.
(22:48:43) mkillock: good!
(22:49:06) mkillock: so umm, my very minor question, dantheta ... should the tests pending figure be decreasing faster?
(22:49:53) graphiclunarkid: mkillock: I think we might want to do a review of pages and their purpose. Historically we envisaged two modes of operation: checking a new site and getting historical results for a URL. With the way the site's now working, though, I'm not sure we need to separate these out. If we do still want to, we should probably rename "check a site" to "check blocking history" or something.
(22:50:00) ***graphiclunarkid AFK 1 minute.
(22:50:08) dantheta: mkillock: Probably not - newly added URLs will get processed within a few seconds, but at the moment only the android probes are eating at the backlog, and they do ~1 site per minute per probe
(22:50:58) dantheta: Mainly because the mobile pyprobes are running on PAYG credit, and if we enabled them on the A&A pyprobes they'd all be gone in 8 hours, but only show results for talktalk & AAISP.
(22:51:17) dantheta: I'm waiting for BT, Sky and the others before letting the pyprobes do the public backlog
(22:52:11) ajb44 left the room (quit: Quit: ChatZilla 0.9.90.1 [Firefox 29.0.1/20140506152807]).
(22:52:43) ***graphiclunarkid back
(22:53:04) dantheta: Hope nobody minds, I'm going to take a short break. My SO hasn't seen me all evening!
(22:53:14) mkillock: ah! ok, and newly added URLs still get tested by all the available probes in due course
(22:53:27) dantheta: Yep, newly added ones go to the front of the queue
(22:53:30) graphiclunarkid: dantheta: No problem! See you in a bit :)
(22:53:34) dantheta: Then we've got the social backlog
(22:53:39) dantheta: Back in a bit!
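dantheta describes the queue ordering as newly submitted URLs first, then the social backlog. That can be pictured as two tiers served in order. The Python sketch below is illustrative only: the class and method names are hypothetical, and the production system uses its own message queues.

```python
from collections import deque


class ProbeQueue:
    """Two-tier work queue: fresh submissions are served before the backlog."""

    def __init__(self):
        self.new = deque()      # URLs just submitted by users
        self.backlog = deque()  # older URLs, e.g. the social backlog

    def submit(self, url: str) -> None:
        self.new.append(url)

    def add_backlog(self, url: str) -> None:
        self.backlog.append(url)

    def next_url(self):
        """Return the next URL to test, or None when both tiers are empty."""
        if self.new:
            return self.new.popleft()
        if self.backlog:
            return self.backlog.popleft()
        return None


q = ProbeQueue()
q.add_backlog("http://old.example")
q.submit("http://bugseverywhere.org")
print(q.next_url())  # http://bugseverywhere.org
```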
(22:53:46) mkillock: bye for now!
(22:55:13) mkillock: graphiclunarkid: ok, well I guess a history check page isn't required, but perhaps nice to have
(22:56:37) mkillock: perhaps someone might want to check history/progress but feel reluctant to re-submit on the home page
(22:56:37) graphiclunarkid: mkillock: Yeah - I'm conflicted between providing the extra facility and making the site just slightly less simple for people to understand.
(22:56:57) NetworkString: BTW, a thought I had on the way home: what do we do if the private companies, who may not take kindly to us making their commercial property (block lists) public, sue us or block the reporting endpoint?
(22:57:14) graphiclunarkid: mkillock: Perhaps we could put a check-box on the front-page form, set by default, that if cleared would skip submitting the URL to the checking queue?
(22:57:38) mkillock: graphiclunarkid: yeah... that could work! :)
(22:57:39) Allll: I think it's worth making it clear to people they can check history, in the future at least, I did try a few variations of the form in older commits adding a button 'view results for site' next to the submit site for checking button to make it clear, even though it went the same way but it never seemed great
(22:58:03) graphiclunarkid: NetworkString: My opinion is that we (1) open a bottle of champagne; (2) let ORG's press and legal teams have a field day ;)
(22:58:22) NetworkString: lol :)
(22:59:20) graphiclunarkid: NetworkString: It's a concern we should consider again when we add features to allow bulk data-download.
(23:00:05) graphiclunarkid: NetworkString: I think it's unlikely to be a problem while people can only check one URL at a time though - even if they could also run a script against the API to grab data in bulk.
(23:00:18) mkillock: graphiclunarkid: Allll: How about a link to a hidden page that can be used to check progress that is revealed in the email?
(23:00:51) mkillock: i.e. the hidden page is a simple form for entering a url which they can bookmark or keep the email
(23:01:03) graphiclunarkid: mkillock: If that's to the full history then it could be a useful additional feature.
(23:01:39) mkillock: well, the full history isn't available from the API yet
(23:01:49) graphiclunarkid: mkillock: possibly related: https://github.com/openrightsgroup/cmp-issues/issues/6
(23:01:49) Allll: would be worth having a direct link, but we may be able to kill two birds with one stone if the 'history' page accepts a GET var of the url as this could be used by a form to check the history of a site
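Allll suggests a history page that accepts the URL as a GET variable. That would let the same link serve both the notification email and a bookmark. This is a sketch of building such a link. The page address history.html and the parameter name url are assumptions, since no such page exists yet.

```python
from urllib.parse import urlencode

# Hypothetical address for the proposed history page.
HISTORY_PAGE = "https://www.blocked.org.uk/history.html"


def history_link(url: str) -> str:
    """Build a bookmarkable link that shows stored results for one URL."""
    return HISTORY_PAGE + "?" + urlencode({"url": url})


print(history_link("http://bugseverywhere.org"))
```

urlencode percent-encodes the submitted URL, so the characters in http:// survive intact as part of the query value.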
(23:02:29) graphiclunarkid: Allll: mkillock: We could actually make displaying the last result the default. How about a process like this....
(23:02:54) mkillock: Allll: That's easy to add, though do we want the text to say something different?
(23:03:15) NetworkString: graphiclunarkid in regards to changing the name to blocked thats done for the launcher icon and action bar etc. - However for the notification bar it's a bit strange to have "Blocked! - Last URL wasn't blocked"
(23:03:26) NetworkString: What should go in the title bar?
(23:03:36) mkillock: I think the suggestion was blocked.org.uk
(23:03:40) NetworkString: Just the status e.g. "Waiting" / "Requesting URL"
(23:03:45) graphiclunarkid: Visit homepage > submit URL > if there are no existing results site behaves as it does now > if there are existing results site displays them alongside a "schedule a new test" button > new test is only scheduled if the button on the second page is clicked.
(23:04:25) graphiclunarkid: NetworkString: heh - we had that discussion a little earlier too. My wife pointed out that exact same issue on the website!
(23:04:30) Allll: matt: different text might be good, but we could prob come up with something generic, or add an extra var to show/hide any different text
(23:04:48) graphiclunarkid: NetworkString: I suggested we use blocked.org.uk as the title and "censorship monitoring project" as the sub-title.
(23:05:13) mkillock: graphiclunarkid: ah ok, so we still present one form on the site for submitting URLs
(23:05:20) NetworkString: OK
(23:05:35) mkillock: Allll: what do you reckon to graphiclunarkid's idea?
(23:06:06) graphiclunarkid: mkillock: Yes. Just do different things depending on (a) whether we already have results, or alternatively, (b) the state of a checkbox on the form.
(23:06:31) graphiclunarkid: mkillock: In that way we remove the need for a separate history page and form while retaining the feature to check previous results without scheduling a new test.
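graphiclunarkid lists two options. Option (a) decides by whether results already exist. Option (b) uses a checkbox that is ticked by default. Each can be written as a small decision function. This is a hypothetical Python sketch: PageAction and its field names are illustrative, and offering a retest button in option (b) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PageAction:
    queue_new_test: bool  # send the URL to the probes now?
    show_history: bool    # display stored results?
    offer_retest: bool    # show a "schedule a new test" button?


def plan_by_history(has_results: bool) -> PageAction:
    """Option (a): decide from whether results already exist."""
    if has_results:
        # Show what we have; a new test runs only if the button is clicked.
        return PageAction(queue_new_test=False, show_history=True, offer_retest=True)
    # No results yet: behave as the site does now.
    return PageAction(queue_new_test=True, show_history=False, offer_retest=False)


def plan_by_checkbox(schedule_checked: bool) -> PageAction:
    """Option (b): a default-on checkbox controls whether a test is queued."""
    return PageAction(queue_new_test=schedule_checked, show_history=True,
                      offer_retest=not schedule_checked)
```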
(23:07:10) graphiclunarkid: NetworkString: We could actually use your "censor census" name for the project if others thought that would be OK. I quite like it :)
(23:07:16) Allll: matt: glk: could do that, although if someone is submitting a site I would expect them to be worried it is currently blocked so prob worth pushing for a new test anyway
(23:08:12) Allll: I thought the system would automatically show the latest result for each ISP that has been test
(23:08:18) Allll: ed
(23:08:18) graphiclunarkid: Allll: There would be no way of seeing the blocking history of a URL without also commissioning a new test if we did that, though, unless we set up a separate form for that.
(23:09:32) graphiclunarkid: NetworkString: Using blocked.org.uk as the name of the app and the site title has the advantage that people will always know where to go to see info about the project, I guess.
(23:09:34) mkillock: graphiclunarkid: Allll: is there significant cost to resubmitting a url for testing?
(23:09:50) mkillock: I mean to the probes
(23:10:19) Allll: ah, think I misunderstood - it's good to have a 'new test' button if they were deliberately looking at historical results - I thought you meant if someone filled out the main form, clicked 'check site' it wouldn't submit the site for testing
(23:11:48) graphiclunarkid: Allll: That is what I meant - but only if there were historical results to display. Otherwise we'd show the results of the last test and a "check again now" button. However perhaps you're right that the majority use-case will be people wanting to schedule a new test.
(23:12:44) mkillock: Seems to me that the current form is fine as long as there is no cost to always resubmitting
(23:12:46) graphiclunarkid: Allll: In which case we could maybe have two different submission modes for the form: the default being to behave as it does now, and an option to skip the scheduling of a fresh test and show only historical results.
(23:13:05) Allll: To be honest if people are checking a site's historical data there's probably no harm submitting it again for checking anyway
(23:13:28) graphiclunarkid: mkillock: My only worry is that if this gets popular we might find e.g. google.com being scheduled for testing every few seconds - which is obviously excessive.
(23:14:11) graphiclunarkid: Is there any kind of rate-limiting on the frequency at which a particular URL can be queued for testing? (Might need dantheta to answer this one).
(23:14:16) Allll: could we build into the backend not to resubmit sites that were checked less than xx minutes ago or are already in a queue to be tested
(23:14:42) graphiclunarkid: Allll: Indeed. Maybe skip the rescheduling if the site has been checked in the last n minutes?
(23:14:55) mkillock: graphiclunarkid: we can do that frontend or backend - basically need to check for results and if the most recent is only an hour ago then forget it
(23:15:59) graphiclunarkid: mkillock: Yeah. Could even say something like "This site was checked recently so we have not scheduled another check. If you believe its state should have changed and wish to force a re-check click here..." if we wanted to cover all bases.
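The rate-limiting rule discussed here fits in one function: skip a re-check if the URL is already queued or was tested in the last n minutes, unless the visitor forces one. The one-hour window follows mkillock's example and is otherwise an assumption, as are all names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Window length is an assumption; the discussion only says "n minutes".
RECHECK_WINDOW = timedelta(minutes=60)


def should_queue(last_checked: Optional[datetime], now: datetime,
                 already_queued: bool = False, force: bool = False,
                 window: timedelta = RECHECK_WINDOW) -> bool:
    """Return True if the URL should be sent to the probes again."""
    if already_queued:
        return False  # a test is pending anyway; another adds nothing
    if force:
        return True   # the "force a re-check" link bypasses the window
    if last_checked is None:
        return True   # never tested before
    return now - last_checked >= window
```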
(23:17:34) mkillock: graphiclunarkid: yes I suppose it would be nice to be able to check whether TalkTalk claim a block has been lifted
(23:17:36) Allll: sounds good
(23:18:59) mkillock: I mean check the truth of a claim from an ISP's representative
(23:19:11) mkillock: ok
(23:19:18) graphiclunarkid: So I would suggest we remove the "check a site" page for now, and rely on the existing history feature of the front-page form for the MVP, building the rate-limiting refinement into a future update (especially if we find we're getting hammered, doubly especially once we've got the mobile probes running as they have limited bandwidth).
(23:19:47) ***dantheta catches up
(23:20:01) Allll: ok
(23:20:43) Allll: I've unpublished it on staging
(23:22:57) dantheta: Allll: graphiclunarkid: At the moment, every submission of a URL results in a test being run on one probe per ISP, plus the android clients
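dantheta's fan-out rule is one probe per ISP, plus the android clients. It can be sketched as below. The probe record layout and the choice of the first probe seen for each ISP are assumptions. The real backend may pick probes differently.

```python
from typing import Dict, List


def pick_probes(probes: List[Dict[str, str]]) -> List[str]:
    """Choose one fixed-line probe per ISP, plus every android client."""
    chosen: Dict[str, str] = {}
    android: List[str] = []
    for p in probes:
        if p["kind"] == "android":
            android.append(p["id"])
        elif p["isp"] not in chosen:
            chosen[p["isp"]] = p["id"]  # first probe seen for this ISP
    return list(chosen.values()) + android
```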
(23:23:43) dantheta: rate limiting is easy enough to do though.
(23:24:16) graphiclunarkid: dantheta: I think we should consider it just in case we get popular. I can foresee e.g. google.com being submitted quite often!
(23:24:24) dantheta: It would be better if the frontend could know whether it was genuinely submitting a URL vs. getting history
(23:24:52) graphiclunarkid: dantheta: Maybe we should extend the submission endpoint to return a status code that tells the front end whether the URL was queued or rate limited?
(23:24:59) dantheta: and only make API calls as necessary
(23:25:22) graphiclunarkid: That would let us do the rate-limiting in the back-end so it would work for all clients while still letting each front-end tell the user what happened.
(23:25:56) dantheta: Yep, we can do that!
(23:26:11) dantheta: I'll add an issue for it
(23:26:24) graphiclunarkid: Nice one. I think it's an enhancement rather than MVP and should come after v2.0.
(23:26:38) graphiclunarkid: (Possibly soon after, though, depending on how popular we get!)
(23:26:55) dantheta: The pyprobes are very, very fast though, and we can monitor the queue depth
(23:27:10) dantheta: BTW, do you still want issues in Blocking-Middleware repo, or would you prefer new ones to go in cmp-issues?
(23:27:36) graphiclunarkid: Raise it in Blocking-Middleware if you know it's to do with that component.
(23:27:43) dantheta: Cool, will do.
(23:27:57) graphiclunarkid: cmp-issues is useful for general things or for people new to the project who we can't expect to know which repo does what.
(23:28:52) graphiclunarkid: Both are aggregated on waffle.io so it doesn't matter where they're added. But in practice adding issues to the repos themselves means referencing them in commit messages is less tedious because you can type #n rather than openrightsgroup/repo#n ;-)
(23:29:18) dantheta: Right, that's added.
(23:30:00) mkillock: dantheta: will you put something in the JSON response or use HTTP code?
(23:31:09) mkillock: I think I have a preference for something in the JSNO
(23:31:11) graphiclunarkid: Allll: I've deleted the results.html page from the github repo to match stage.blocked.org.uk
(23:31:11) dantheta: Probably JSON. The 201 HTTP response would refer to the requests table entry, which always gets created.
(23:31:13) mkillock: *JSON
(23:31:54) dantheta: Perhaps "queued" => false/true ?
(23:32:28) mkillock: dantheta: that would be fine, yeah
(23:32:57) Allll: glk: I'd leave it in as it has the HTML I'm using to display results - I'm trying to keep examples of all the code used in the HTML repo
(23:33:01) dantheta: mkillock: Cool. I've noted that.
(23:33:36) Allll: glk: actually it's pretty much the thank you page - perhaps rename it
(23:34:18) graphiclunarkid: Allll: Damn, knew I should have asked you first! OK, I'll change it back...
(23:35:16) mkillock: dantheta: queued=false means we've rate limited?
(23:35:27) Allll: no worries
(23:35:42) dantheta: mkillock: Yep. We've recorded the request, but we haven't posted it to the queues for retesting.
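The agreed response change puts a "queued" flag in the JSON body, because the 201 status always refers to the newly created requests table entry. This is a sketch of both sides. Only the "queued" key comes from the discussion. The request_id key and the function names are assumptions.

```python
import json


def submission_response(request_id: int, queued: bool) -> str:
    """Body returned alongside the HTTP 201 for a URL submission."""
    return json.dumps({"request_id": request_id, "queued": queued})


def user_message(body: str) -> str:
    """What a frontend might tell the visitor after submitting."""
    data = json.loads(body)
    if data.get("queued"):
        return "Your URL has been queued for testing."
    # queued == false: the request was recorded but rate limited
    return ("This site was checked recently so we have not scheduled "
            "another check.")
```

Keeping the flag in the body lets every client (website, Android app, scripts) apply the same backend rate limit while still explaining the outcome in its own words.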
(23:38:15) NetworkString: graphiclunarkid is there copy of the credits text available anywhere?
(23:39:56) graphiclunarkid: NetworkString: No. We need to write some.
(23:40:14) Allll: This is the sort of thing I was thinking of for the logo (it's only VERY rough ATM) : https://dl.dropboxusercontent.com/u/12755204/logo.png
(23:40:19) graphiclunarkid: NetworkString: If you need to cheat for time reasons you could always reference a page like http://www.blocked.org.uk/credits.html and we could then write that into the main site - which also needs such a page ;-)
(23:40:37) graphiclunarkid: NetworkString: Or feel free to make up something appropriate.
(23:40:52) Allll: do you want me to add a placeholder page for this?
(23:41:53) graphiclunarkid: Allll: Yes please, that would be handy, though if we could do it via git that would facilitate a discussion of its contents before we move it to Modx.
(23:43:44) graphiclunarkid: Allll: would it be better if I pulled the content of thankyou.html from stage.blocked.org.uk rather than restoring results.html into the git repo?
(23:51:45) Allll: Prob best to leave it as is at the moment, pasting in the code from modx will result in a load of extra code that breaks the design/HTML - it's easier for me to create it using a proper IDE then cut and paste into modx and add the cmodx code
(23:52:11) Allll: could you just delete your commit where you delete the page - I'm trying to do a rebase and it's crapping out :(
(23:53:19) graphiclunarkid: Allll: OK, hang on a tick...
(23:54:54) graphiclunarkid: Allll: If you get the latest it should be back to how it was before now.
(23:55:45) Allll: cheers :) I've renamed the page and added a placeholder for the credits
(23:56:08) graphiclunarkid: Allll: Thanks :)
(23:56:19) graphiclunarkid: Allll: The logo looks good BTW!
(23:56:54) Allll: cool, if you're happy with that style I'll see if I can get it actually looking good tomorrow
(23:58:14) graphiclunarkid: If you mean the general style, then yes, I think it looks great :)
(23:58:33) graphiclunarkid: It looks like the style of index.html is different to the rest of the pages in git though.
(23:59:20) graphiclunarkid: Not sure why since the stylesheet references seem to be the same.
(23:59:37) Allll: yeah, it's getting to be a pain to manually cut and paste every change to the header/footer on all pages so I'm just keeping index up to date, once that's all settled down I'll update all pages.
(23:59:53) Allll: should only be minor differences tho
(00:00:00) graphiclunarkid: Oh I see! Fair enough.
(00:02:38) graphiclunarkid: Right folks, it's gone midnight here, so I think I need to call it a night.
(00:03:21) graphiclunarkid: I'll be working on the site again tomorrow. Finalising the page content is top of my list but if there's anything else you need just yell.
(00:04:31) mkillock: it's late for me too, night all!
(00:04:57) graphiclunarkid: Cheers mkillock, see you later
(00:05:25) graphiclunarkid: Goodnight Allll, dantheta, vasilis, NetworkString
(00:05:51) NetworkString: night
(00:06:03) Allll: night