Ming "Tommy" Tang

@tangming2005

Sep 12

Bioinformatics one-liner Day 18
1/

delete the blank lines
```
sed '/^$/d' file.txt
```
delete the last line
```
sed '$d' file.txt
```
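A quick way to check the two commands above on a throwaway file. `sample.txt` and `clean.txt` are hypothetical names. Note that `/^$/` only matches truly empty lines, so a line holding just spaces survives; the `[[:space:]]` variant removes those too.

```shell
#!/usr/bin/env bash
# Hypothetical test file: one empty line and one whitespace-only line.
printf 'chr1\t100\n\nchr2\t200\n   \nchr3\t300\n' > sample.txt

# Deletes only empty lines; the "   " line is kept.
sed '/^$/d' sample.txt

# Deletes empty and whitespace-only lines.
sed '/^[[:space:]]*$/d' sample.txt > clean.txt

# Deletes the last line; single quotes stop the shell from expanding $d.
sed '$d' clean.txt
```

Without the quotes, `sed $d` lets the shell replace `$d` with the (usually empty) variable `d`, so the last line is not deleted.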
extract the second column of every CSV file, drop the header with `sed '1d'`, and save each result as a `.list` file:

```
ls *.csv | parallel "cut -d, -f2 {} | sed '1d' > {/.}.list"
```
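If GNU parallel is not installed, a plain `for` loop does the same job serially. This is a sketch with two hypothetical CSV files; `${f%.csv}` strips the extension much like `{/.}` does for files in the current directory.

```shell
#!/usr/bin/env bash
# Two hypothetical CSV files, each starting with a header row.
printf 'gene,sample\nTP53,s1\nMYC,s2\n' > a.csv
printf 'gene,sample\nKRAS,s3\n' > b.csv

# Serial version of the parallel one-liner:
# keep column 2, drop the header line, write <name>.list.
for f in *.csv; do
  cut -d, -f2 "$f" | sed '1d' > "${f%.csv}.list"
done

cat a.list
```

The loop is simpler to debug; the `parallel` version pays off when there are many large files and several CPU cores.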

2/ print the second line of a LARGE file and quit:

```
sed -n '2{p;q}' file.txt
```
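Why the `q` matters: `sed -n '2p'` prints line 2 but keeps reading to the end of the file, while `'2{p;q}'` stops right after printing. A sketch on a generated file (`big.txt` is a hypothetical name); an `awk` version with `exit` behaves the same way.

```shell
#!/usr/bin/env bash
# Hypothetical large file: one million numbered lines.
seq 1 1000000 > big.txt

# Prints line 2, then keeps scanning all remaining lines.
sed -n '2p' big.txt

# Prints line 2 and quits immediately.
sed -n '2{p;q}' big.txt

# awk equivalent: print record 2, then stop reading.
awk 'NR == 2 { print; exit }' big.txt
```

On BSD/macOS sed the closing brace may need a preceding semicolon, i.e. `sed -n '2{p;q;}' big.txt`.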

#unix #oneliner #bioinformatics #sed

That's a wrap!

If you enjoyed this thread:

1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience

https://twitter.com/tangming2005/status/1569324488377909249


Ming "Tommy" Tang

@tangming2005

Sep 9

1/ How I started my bioinformatics journey. A thread:
Back in my PhD, I was studying gene regulation in the context of cancer. My first paper was on CTCF functioning as an enhancer blocker at the VEGFA locus: divingintogeneticsandgenomics.rbind.io/publication/20… Yeah, CTCF and VEGFA are my two favorites!

2/ My second paper identified SFMBT as a cofactor of the LSD1 complex: divingintogeneticsandgenomics.rbind.io/publication/20… Both papers are pure biochemistry studies, and I am so proud that I did Western blots, Northern blots, lentiviral knockdowns, ChIP-qPCR, etc.

3/ It was around 2012. Sequencing technology was booming, and one assay in particular, ChIP-seq, which maps transcription factor binding sites genome-wide, was really popular. I naturally wanted to find where CTCF, LSD1, and histone modifications are located across the genome.


Ming "Tommy" Tang

@tangming2005

Sep 9

1/ Bioinformatics one-liner Day 15
get the size of every folder in the current folder: `du -h --max-depth=1`
the total size of the current directory: `du -sh .`
display disk space: `df -h`
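To see which folder is the biggest, the `du` output can be piped to `sort -h`, which understands the K/M/G suffixes. This sketch assumes GNU coreutils; on macOS the BSD `du` takes `-d 1` instead of `--max-depth=1`.

```shell
#!/usr/bin/env bash
# First-level folder sizes, smallest to largest (GNU du and sort).
du -h --max-depth=1 . | sort -h

# Only the five largest entries; the total for "." is usually among them.
du -h --max-depth=1 . | sort -h | tail -n 5

# Free space on the filesystem that holds the current directory.
df -h .
```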

2/
memory usage: `free -h` (human-readable units; use `free -m` or `free -g` for fixed MB or GB).
open `top -M` to show memory sizes in human-readable units (MB, GB).
install htop (htop.dev) for better visualization.
#unix #oneliner
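For a non-interactive snapshot (handy on a cluster login node or inside a script), the commands below show overall memory and the processes using the most of it. This assumes the Linux procps versions of `free` and `ps`; the `--sort` option is not available in BSD/macOS `ps`.

```shell
#!/usr/bin/env bash
# Total, used and available memory in human-readable units.
free -h

# Header line plus the five processes with the largest memory share.
ps aux --sort=-%mem | head -n 6
```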

That's a wrap!

If you enjoyed this thread:

1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience

https://twitter.com/tangming2005/status/1568237585079881728


Ming "Tommy" Tang

@tangming2005

Jul 27

1/ "What's the most important factor for your success?"

2/ I have heard many answers.

The most common one I got is "luck".

3/ A little story:

Qi Lu was earning $27 a month when he was 27 years old. At 47, he was a division president at Microsoft.


Ming "Tommy" Tang

@tangming2005

Jul 26

1/ Using random forest to calculate feature importance?
The importance score might be biased. #machinelearning #featureimportance Thanks @Matthew_N_B for pointing it out
A thread 👇

2/ explained.ai/rf-importance/

The takeaway from this article is that the most popular RF implementation in Python (scikit-learn) and R's default RF importance strategy do not give reliable feature importances.

3/
This happens when “... potential predictor variables vary in their scale of measurement or their number of categories” (Strobl et al.). Rather than figuring out whether your dataset is one that gives accurate results, simply use permutation importance.
