Correlating technical replicates #18

Closed
yaaminiv opened this issue Oct 11, 2017 · 29 comments

Comments

@yaaminiv
Collaborator

As @emmats suggested, I regressed my technical replicates against each other for each transition to see if some transitions were messier than others. You can see my work in my lab notebook entry.

There are definitely some transitions with lower adjusted R squared values than others. My first instinct is to establish some sort of R-squared cutoff, remove transitions lower than this cutoff, and then remake my NMDS plot. While I'm going through each transition, I can also see if there are certain outliers or leverage points that could be influencing the R-squared values (for those close to the cutoff).

Any suggestions for what that cutoff should be?
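A minimal Python sketch of the cutoff idea above, assuming the data can be arranged as (transition, sample ID, rep 1 peak area, rep 2 peak area) rows. The function names, transition names, and peak areas are made up for illustration; this is not the script from the notebook.

```python
from collections import defaultdict


def adjusted_r_squared(x, y):
    """Adjusted R^2 for the simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("peak areas must vary across samples")
    r2 = sxy ** 2 / (sxx * syy)
    # With one predictor, the adjustment factor is (n - 1) / (n - 2).
    return 1 - (1 - r2) * (n - 1) / (n - 2)


def filter_transitions(rows, cutoff):
    """Score each transition and return (scores, transitions at or above cutoff)."""
    by_transition = defaultdict(lambda: ([], []))
    for transition, _sample_id, rep1, rep2 in rows:
        xs, ys = by_transition[transition]
        xs.append(rep1)
        ys.append(rep2)
    scores = {t: adjusted_r_squared(xs, ys) for t, (xs, ys) in by_transition.items()}
    kept = sorted(t for t, score in scores.items() if score >= cutoff)
    return scores, kept


# Hypothetical toy data: one tight transition and one messy transition.
rows = [
    ("pepA_y4", "S1", 100, 102), ("pepA_y4", "S2", 200, 198),
    ("pepA_y4", "S3", 300, 305), ("pepA_y4", "S4", 400, 396),
    ("pepB_y5", "S1", 100, 300), ("pepB_y5", "S2", 200, 100),
    ("pepB_y5", "S3", 300, 400), ("pepB_y5", "S4", 400, 150),
]
scores, kept = filter_transitions(rows, cutoff=0.6)
```

Here only the tight transition survives the cutoff. Adjusted R-squared can go negative for very poor fits, so a messy transition may score below zero.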

@emmats

emmats commented Oct 11, 2017

What is the range? I would go pretty high with the cut-off. Your replicates should be right on top of each other. Maybe have @laurahspencer run the same script and figure out what her range of R-squared values is? Off the cuff, I would say the cut-off should be at least 0.85.

@yaaminiv
Collaborator Author

The range is .2 to .9 (there are examples of each in my notebook), with the majority being above 0.6.

@yaaminiv
Collaborator Author

Some examples for context. Peak area from the first batch of technical replicates on the x-axis, peak area from the second batch of technical replicates on the y-axis. Points are labelled with the oyster sample ID.

bad-R

good-R

@sr320
Member

sr320 commented Oct 11, 2017

To me it should be some defined range around a line that has a slope of 1.
Thus it would be based on replicates and not proteins.

@laurahspencer

I did a quick work-up using Yaamini's script. Summary data for R^2:

  • Mean: 0.8636
  • Min^: 0.6507
  • Max: 0.9679
  • Median: 0.9016
    ^One peptide from Superoxide Dismutase had an awful R^2 for 2 transitions (<0.1), which were outliers.

NOTE: This wasn't using the full data set. I have 17 samples with 3 reps, and 3 samples with 4 reps; only the first 2 reps run are represented here, which likely skews things a bit (didn't want to dig too deep into modifying the code).

@yaaminiv
Collaborator Author

yaaminiv commented Oct 12, 2017

@emmats Maybe I can start with a 0.65 R-squared cutoff. If that doesn't improve anything, work up to a 0.85 cutoff?

@sr320 can you elaborate on your suggestion? From what I understand, I would plot x = y line in addition to a linear regression, and then consolidate the two somehow?

@sr320
Member

sr320 commented Oct 12, 2017 via email

@emmats

emmats commented Oct 12, 2017

I think the 0.6 cut-off sounds safe. But, of course, @sr320 makes the final call.

This is pretty informative for me. I've never done this before.

@yaaminiv
Collaborator Author

@emmats Here's my plan:

  1. Normalize all my values by TIC to reduce any external variation (the plots I've made so far are not normalized...what are your thoughts on this step?)
  2. Use a 0.6 cutoff and discard any transitions with an adjusted R-squared value below this. Remake an NMDS plot and examine clustering

AT THE SAME TIME...

  1. Plot an x = y line on each plot, as well as a 95% confidence interval. @sr320 and I discussed the value of this during class. Since our technical replicates should have the same protein abundances, we expect the best fit model to be a 1:1 ratio.
  2. Discard transitions with fewer than 95% of the points (43 points) within the confidence interval. Remake an NMDS plot and examine clustering
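
One way to sketch the second part of this plan in Python is to count the points that fall inside a fixed band around the x = y line. The band here is a relative tolerance (`rel_tol`), which is an assumption standing in for the interval discussed above; the function names and the 20% default are hypothetical.

```python
def fraction_within_band(rep1, rep2, rel_tol):
    """Fraction of points whose rep 2 value is within rel_tol of the x = y line."""
    if len(rep1) != len(rep2) or not rep1:
        raise ValueError("need two non-empty series of equal length")
    inside = sum(1 for x, y in zip(rep1, rep2) if abs(y - x) <= rel_tol * abs(x))
    return inside / len(rep1)


def keep_transition(rep1, rep2, rel_tol=0.2, min_fraction=0.95):
    """Keep a transition only if enough points sit inside the band (assumed thresholds)."""
    return fraction_within_band(rep1, rep2, rel_tol) >= min_fraction


# Hypothetical peak areas: three points close to x = y, one far off.
rep1 = [100, 200, 300, 400]
rep2 = [105, 190, 310, 600]
fraction = fraction_within_band(rep1, rep2, rel_tol=0.2)  # 0.75
```

With these toy values one of four points is outside the band, so the transition would be discarded under a 95% requirement. The width of the band is the real decision and would need to be justified from the data.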

Thoughts?

@emmats

emmats commented Oct 12, 2017

I think that sounds good. I don't think you need to normalize by TICs. If your TICs vary widely between technical replicates, then you have other problems.
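
For reference, a small Python sketch of what TIC normalization and a TIC sanity check between replicates might look like. It assumes each run has a single TIC value and that normalization means dividing every peak area in the run by it; the function names and numbers are hypothetical.

```python
def normalize_by_tic(peak_areas, tic):
    """Divide each peak area from one run by that run's total ion current."""
    if tic <= 0:
        raise ValueError("TIC must be positive")
    return [area / tic for area in peak_areas]


def tic_relative_difference(tic_rep1, tic_rep2):
    """Absolute TIC difference between two replicate runs, relative to their mean."""
    mean_tic = (tic_rep1 + tic_rep2) / 2
    return abs(tic_rep1 - tic_rep2) / mean_tic


# Hypothetical values for one sample run twice.
normalized = normalize_by_tic([200.0, 400.0], tic=1000.0)
difference = tic_relative_difference(1000.0, 1200.0)
```

A large relative difference between the TICs of two technical replicates would point to a problem upstream of the statistics.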

@yaaminiv
Collaborator Author

@emmats @sr320
Notebook

I went through the first part of my plan and used R-squared cutoffs to eliminate transitions and remake NMDS plots. I used a combination of three cutoffs (0.6, 0.7 and 0.8) and normalized/nonnormalized data. I found normalizing made my plots look a little better. Overall this helped a bit, but the technical replication still doesn't look fantastic.

0.6, normalized:

0.6-normalized-NMDS

0.6-normalized-distances

0.7, normalized:

0.7-normalized-NMDS

0.7-normalized-distances

0.8, normalized:

0.8-normalized-NMDS

0.8-normalized-distances

I'll try the second part soon, but it may take me a bit longer since making a confidence interval around a line in a for loop is a bit more tedious. Any thoughts about these results?

@sr320
Member

sr320 commented Oct 15, 2017 via email

@emmats

emmats commented Oct 16, 2017

I'm still pretty suspicious of these data. It just doesn't make sense that the technical replicates don't look the same.

@sr320
Member

sr320 commented Oct 24, 2017

@yaaminiv can you please provide a csv with the respective technical replicates in adjacent columns?

@yaaminiv
Collaborator Author

@sr320 csv

I just tried playing with slopes and confidence intervals. I'm going to try one more thing on that front and then write it up in a lab nb post/possibly post a new issue

@yaaminiv
Collaborator Author

Notebook

I was following @sr320's suggestion to look at slopes and plot a 95% confidence interval around an x = y line. Ran into some issues doing that (more details in my nb), so I can only really plot an x = y line and a prediction line (same intercept as regression, but a slope of 1) along with my data.

example

Any suggestions for how to move forward? A few of my issues: the regressions have large intercepts, so an x = y line is far removed from the data, and creating a confidence interval around an x = y line/prediction line is essentially impossible with my skill set because neither of those has any error (so plotting a CI would just lead to an upper and lower bound falling directly on top of the original line). I could look at the slope of the original regression and, if it falls outside some cutoff (1 ± some undetermined error value), remove the transition and remake an NMDS?
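
Since a fixed line has no error of its own, one alternative is to put the interval on the fitted slope and ask whether it contains 1. Below is a hedged Python sketch of that check. It uses a normal approximation for the interval rather than the t distribution, which is only reasonable with a fair number of points; the function names and toy data are hypothetical.

```python
from statistics import NormalDist


def slope_interval(x, y, confidence=0.95):
    """Least-squares slope of y on x with an approximate confidence interval."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept fitted).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = (sse / (n - 2) / sxx) ** 0.5
    # Assumption: normal quantile instead of a t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, slope - z * se_slope, slope + z * se_slope


def slope_consistent_with_one(x, y, confidence=0.95):
    """True if the approximate interval for the slope contains 1."""
    _slope, low, high = slope_interval(x, y, confidence)
    return low <= 1.0 <= high


# Hypothetical data: slope near 1 with a large intercept, then slope near 2.
x = [100, 200, 300, 400, 500, 600]
y_offset = [152, 248, 351, 449, 552, 648]
y_double = [202, 398, 601, 799, 1002, 1198]
offset_ok = slope_consistent_with_one(x, y_offset)
double_ok = slope_consistent_with_one(x, y_double)
```

The first toy series passes even though its intercept is far from zero, because only the slope is tested. Whether a large intercept between technical replicates is acceptable is a separate question.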

Thoughts? (esp from @emmats since you think this data is suspicious?) I'm stumped, and the only thing I think may work now might be rerunning samples (but I don't know how possible that is)...

@yaaminiv
Collaborator Author

There are also transitions that have poor R squared values but slopes close to 1. What should I do about those?

choyp_psa 1 1 m 27259 yfqiayplpk y4 confint

@sr320
Member

sr320 commented Oct 24, 2017 via email

@yaaminiv
Collaborator Author

@sr320 The one with tech reps in adjacent columns? I linked you to that!

@sr320
Member

sr320 commented Oct 24, 2017

Sorry - I need it in just two columns with the sample IDs in a column...

@yaaminiv
Collaborator Author

@sr320 I think I'm confused...so sample IDs in one column, transitions in another column?

@sr320
Member

sr320 commented Oct 24, 2017

Col1-transition | Col2-sampleID | Col3-rep1 | Col4-rep2

@sr320
Member

sr320 commented Oct 24, 2017

Column1 and Column2 could be switched....

@yaaminiv
Collaborator Author

normalized or not normalized?

yaaminiv reopened this Oct 24, 2017

@sr320
Member

sr320 commented Oct 24, 2017

How about both...

@yaaminiv
Collaborator Author

Normalized
Not normalized

@sr320
Member

sr320 commented Oct 24, 2017

Use this data to start making graphs - simply average reps.

http://d.pr/f/OtdTD

This is just the normalized data with a coefficient of variation less than 20.
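
A rough Python sketch of this kind of filter followed by averaging, for anyone reproducing it. It assumes the CV is computed across the technical replicates of each transition and sample, and that it is expressed as a percentage; neither detail is stated above, and the function names and values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of replicates must be non-zero")
    return 100 * stdev(values) / m


def average_low_cv(rows, max_cv=20.0):
    """Average the replicates for each (transition, sample) whose CV is below max_cv."""
    averaged = {}
    for transition, sample_id, rep1, rep2 in rows:
        reps = [rep1, rep2]
        if cv_percent(reps) < max_cv:
            averaged[(transition, sample_id)] = mean(reps)
    return averaged


# Hypothetical rows in the transition | sampleID | rep1 | rep2 layout.
rows = [
    ("pepA_y4", "S1", 100.0, 110.0),  # CV is about 6.7%, kept
    ("pepA_y4", "S2", 100.0, 200.0),  # CV is about 47%, dropped
]
averaged = average_low_cv(rows)
```

With only two replicates per sample the CV is a coarse measure, so a threshold of 20 versus 10 may change the surviving data set a lot.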

@yaaminiv
Collaborator Author

yaaminiv commented Oct 26, 2017

Notebook

Used CV filtering to redo NMDS/ANOSIM analyses. Slight improvement in technical replication, ANOSIM/NMDS indicates no significant clustering pattern.

Going to filter data with CV ≤ 10 and repeat. Will also look at expression of individual proteins making boxplots, etc. Interested in your thoughts @emmats.

@yaaminiv
Collaborator Author

Seeing how we've answered my original question, I'm going to continue the current conversation in #35.


4 participants