I would like to convert the timestamp in a specific column to a date in a specific format.
Here is my input :
+----------+
| timestamp|
+----------+
|1532383202|
+----------+
What I would expect :
+------------------+
| date |
+------------------+
|24/7/2018 1:00:00 |
+------------------+
If possible, I would like to set the minutes and seconds to 0 even if they are not 0.
For example, if I have this :
+------------------+
| date |
+------------------+
|24/7/2018 1:06:32 |
+------------------+
I would like this :
+------------------+
| date |
+------------------+
|24/7/2018 1:00:00 |
+------------------+
What I tried is :
from pyspark.sql.functions import date_format, unix_timestamp
table = table.withColumn(
'timestamp',
unix_timestamp(date_format('timestamp', 'yyyy-MM-dd HH:MM:SS'))
)
But I get NULL.
It is best to avoid a udf when there is an equivalent API function. Depending on the size of your data, it may not make a noticeable difference, but it is much more efficient to let all of the processing happen inside the JVM.