Thursday, November 15, 2012

Fixing the proxy problem in Mendeley Developer preview 1.7 on Ubuntu 12.04 LTS

I am a big fan of Mendeley (www.mendeley.com) as a tool for storing and organizing my references. In particular, its pretty robust extraction of bibliographic data from PDFs saves me a couple of steps compared to the workflow I was used to with EndNote.
I recently received my new (proper) workstation and have switched from running Ubuntu 10.04 LTS in a virtual machine to using Ubuntu 12.04 LTS as the main OS, with a virtual machine instance of Windows 7 for the few remaining pieces of software that need a Windows OS.

Since switching over to Linux, I had not been able to use the Linux version of Mendeley, as it steadfastly refused to connect to the internet via our work proxy server. While it is possible to configure the proxy settings under "Tools/Options/Connection", the connection test (which can be invoked by pressing Ctrl+Shift+D to bring up the debug console and selecting "Test Internet Connection") reported that it was not able to provide proxy authentication. I tried several fixes as outlined in the Mendeley support forums (e.g. http://feedback.mendeley.com/forums/4941-general/suggestions/1051563-generic-linux-bug-connecting-through-proxy-does-n), but none of them worked...

Eventually, a combination of three things fixed the problem: defining system-wide proxy settings as described here:

http://askubuntu.com/questions/150210/how-do-i-set-systemwide-proxy-servers-in-xubuntu-lubuntu-or-ubuntu-studio

adding some GNOME-specific settings as described here:

http://developer.gnome.org/ProxyConfiguration/

and setting the "Tools/Options/Connection" settings in Mendeley to "No Proxy".

Now Mendeley uses the system-wide proxy settings for connecting to the server and nicely syncs my library the way I was used to. Unfortunately I cannot tell which environment variable Mendeley actually reads, but as I am using a number of applications which need internet access, and every one of them seems to read a different variable, the safest approach seems to be to set all of the common proxy variables with the relevant connection information.
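As a sketch of what this looks like, the lines below set the common lower- and upper-case variants; the proxy host and port are placeholders to be replaced with your site's values. In /etc/environment the variables go in as plain key=value pairs (prefix each line with "export " if you put them in ~/.profile instead):

http_proxy="http://proxy.example.com:8080/"
https_proxy="http://proxy.example.com:8080/"
ftp_proxy="http://proxy.example.com:8080/"
HTTP_PROXY="http://proxy.example.com:8080/"
HTTPS_PROXY="http://proxy.example.com:8080/"
FTP_PROXY="http://proxy.example.com:8080/"
no_proxy="localhost,127.0.0.1"
NO_PROXY="localhost,127.0.0.1"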


Thursday, October 25, 2012

Installing "mixOmics" under R-2.15.1 - fixing missing dependencies


The package "mixOmics" depends on a number of packages that render some of the 3D graphics output, among them "rgl", which in turn needs the OpenGL libraries/header files for successful compilation. These can be installed by issuing the following command in a separate terminal window:
sudo apt-get install libglu1-mesa-dev libgl1-mesa-dev
Once the system packages have been installed, issuing the following command in your R session will install the "mixOmics" package:
install.packages("mixOmics")
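If you want to check that "rgl" was actually able to compile against the freshly installed OpenGL libraries, a quick smoke test (just an illustrative snippet of my own, not part of the installation itself) is:

library(rgl)
# should open an interactive OpenGL window with a 3D scatterplot
plot3d(rnorm(100), rnorm(100), rnorm(100))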

Installing R-2.15.1 source package on Ubuntu 12.04 LTS

When attempting to install R-2.15.1 from source on a freshly installed Ubuntu 12.04.1 desktop, I ran into several problems stemming from unmet dependencies. By far the most challenging to resolve was the missing "libg2c0" library, which is required to run mixed C/FORTRAN code.

The error message from running ./configure in the R source directory:
checking whether mixed C/Fortran code can be run... configure: WARNING: cannot run mixed C/Fortran code
configure: error: Maybe check LDFLAGS for paths to Fortran libraries?
After spending approximately two hours trying to resolve this unmet dependency via several different approaches, the command that finally solved it was:
sudo apt-get install cfortran
I also installed some additional packages to resolve issues with building the HTML and PDF format R manuals:
sudo apt-get install texlive-fonts-extra fig2ps
"fig2ps" in particular will install a lot of dependencies if TEX has not previously been installed on the system.


Thursday, September 6, 2012

The #ENCODE buzz on Twitter

Monday, July 16, 2012

As I am a fan of Ed Yong's work and fully agree with his opinion on the "cuddle hormone", I am linking to the storification of his recent #schmoxytocin rant on Twitter:

Tuesday, June 5, 2012

Learning to think in R... I

A developmental process.

I have recently started working in a new position which gives me the opportunity to closely collaborate with a "real" statistician. This new colleague is like a treasure trove of knowledge to me when it comes to the stringent analysis of genome-scale datasets - which is what I now do on a daily basis. Of course, this involves using R (http://cran.r-project.org/) and the Bioconductor packages (http://bioconductor.org). I have previously made use of the R environment for different types of analyses, but these were always restricted to dealing with almost-finalized datasets, i.e. all the data tidying, pruning and preparation had already been done, usually with a combination of SQL (Postgres), scripting languages (Ruby, Python, Perl) and shell scripts (bash/awk/sed).

Recently I have begun moving my datasets straight into R and performing the re-formatting and organization within the R environment. As anyone who uses R on a regular basis knows, the language is actually extremely powerful for these particular tasks. However, for someone who originally learned programming using procedural languages (object orientation was in its infancy when I learned Unix SysV programming...), it does require some changes in one's way of thinking. My initial ineptitude is nicely illustrated by the following example:

Objective: To re-organize data from four individual tables (samples in columns, CNV probabilities per BAC in a row, one table each for "normal", "loss", "gain" and "amplification", all organized in the same fashion) into matrices for each sample, with BACs in columns and a row per status containing the probabilities. This information is then supposed to be collapsed into a single "status" variable.

My first (non-working) approach looked like this:

for ( i in 1:length(colnames(Probgain))) {
        name <- paste("matrix", colnames(Probgain[i]), sep=".")
        for ( j in 1:length(rownames(Probgain[i]))) {
                if (j == 1) {
                        # first row: create the named matrix
                        assign(name, as.matrix(c(Probnorm[j,i], Probloss[j,i], Probgain[j,i], Probamp[j,i])))
                }
                else {
                        print(j)
                        # bug: this overwrites "name" instead of growing the matrix
                        name <- cbind(as.matrix(c(Probnorm[j,i], Probloss[j,i], Probgain[j,i], Probamp[j,i])))
                }
        }
}

This is essentially a tell-tale sign of a procedural thought process: create two nested loops which work through the columns and rows, include a check for which iteration the loop is in, and change the behaviour based on that. The end result was a properly named matrix per sample, which only contained a single column...

After a lot of introspection, I finally arrived at this solution:

for ( i in 1:length(colnames(Probgain))) {
        name<-paste("matrix", colnames(Probgain[i]), sep=".")
        assign(name,rbind(Probnorm[,i],Probloss[,i],Probgain[,i],Probamp[,i]))
}

This does the job. The end result is one matrix per sample, containing the probabilities of each status for each BAC. It also nicely illustrates that R's way of dealing with data objects really lends itself to re-formatting data...
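The collapsing into a single "status" variable mentioned in the objective above could then look something like this sketch (the status labels and the way I pull out one of the generated matrices are my own assumptions):

# the four rows were bound in the order norm, loss, gain, amp above
status.labels <- c("normal", "loss", "gain", "amplification")
# fetch the per-sample matrix created in the loop (here: the first sample)
m <- get(paste("matrix", colnames(Probgain)[1], sep="."))
# for each BAC (column) pick the status (row) with the highest probability
status <- status.labels[apply(m, 2, which.max)]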

I am planning to blog regularly about my experiences delving deeper into the R environment, and I hope thereby to provide some useful pointers for other people in the field who find themselves in similar situations.

Monday, June 4, 2012


Code:
# This command will create a new list containing the value "gain" for all
# entries with a numerical value between 0.5 and 1, as well as NA for all
# entries with values between 0 and 0.5
library(car)   # recode() comes from the car package
probgain.recoded <- lapply(probgain,
        function(x)
        recode(x, recodes="0.5:1='gain'; 0:0.5=NA", as.factor.result=FALSE))
# now we turn this list back into a data.frame and re-attach the
# column and row names from the original data frame
probgain.recoded<-data.frame(probgain.recoded)
colnames(probgain.recoded)<-colnames(probgain)
rownames(probgain.recoded)<-rownames(probgain)
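A quick sanity check I like to add after a recoding step like this (purely illustrative): the recoded data frame should keep the dimensions of the original, and a frequency table shows how the values were mapped:

# dimensions must be unchanged by the recoding
stopifnot(all(dim(probgain.recoded) == dim(probgain)))
# overview of the recoded values, including the introduced NAs
table(unlist(probgain.recoded), useNA="ifany")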